Non-prehensile manipulation of diverse objects remains a core challenge in robotics, driven by unknown physical properties and the complexity of contact-rich interactions. Recent advances in contact-implicit model predictive control (CI-MPC), which embeds contact reasoning within the trajectory optimization, have shown promise in tackling such tasks efficiently and robustly, yet demonstrations have been limited to narrowly curated examples. In this work, we showcase the broader capabilities of CI-MPC through precise planar pushing tasks over a wide range of object geometries, including multi-object domains. These scenarios demand reasoning over numerous inter-object and object-environment contacts to strategically manipulate and de-clutter the environment, challenges that were intractable for prior CI-MPC methods. To achieve this, we introduce Consensus Complementarity Control Plus (C3+), an enhanced CI-MPC algorithm integrated into a complete pipeline spanning object scanning, mesh reconstruction, and hardware execution. Compared to its predecessor C3, C3+ achieves substantially faster solve times, enabling real-time performance even in multi-object pushing tasks. On hardware, our system achieves an overall 98% success rate across 33 objects, reaching pose goals within tight tolerances. The average time-to-goal is approximately 0.5, 1.6, 3.2, and 5.3 minutes for 1-, 2-, 3-, and 4-object tasks, respectively.
We present the Push Anything framework (above), a pipeline integrating object perception with a novel controller. Our framework operates in two phases. In the offline phase, we build an object library by scanning objects to generate meshes (via BundleSDF) and URDFs, assuming the same mass and inertia for every object. In the online phase, our controller uses robot and object state estimates (the latter from FoundationPose) to compute end effector trajectories. Following the approach of Yang and Posa, these trajectories are tracked by a low-level operational space controller (OSC). Our controller is implemented in C++ within the Drake systems framework.
We use three computers in our experiments: (i) an Intel Core i9-13900KF (13th gen, 32 threads) dedicated to the sampling-based controller; (ii) an Intel Core i7-9700K running the operational space controller and robot drivers on a real-time kernel for Franka communication; and (iii) an Intel Core i9-14900K paired with an NVIDIA GeForce RTX 4090 for FoundationPose. All computers communicate via LCM.
Our controller is built upon the original framework from Approximating Global CI-MPC. In this approach, candidate end effector positions are sampled and evaluated by solving the CI-MPC problem, augmented with a travel cost from the current location. The candidate with the lowest overall cost is selected, and if it differs from the current pose, the robot first moves along a collision-free path to that position before executing CI-MPC. As illustrated in the adjacent figure, this strategy steers the system toward configurations that expand the reach of local MPC, helping the controller overcome local minima and achieve goals that require long-horizon reasoning.
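The candidate-selection step above can be sketched as follows. Here `ci_mpc_cost` stands in for solving the local CI-MPC problem from a candidate end effector position, and the travel cost is taken to be a weighted Euclidean distance; both are illustrative assumptions, not the exact costs used in our controller.

```python
import numpy as np

def select_candidate(candidates, current_pos, ci_mpc_cost, travel_weight=1.0):
    """Pick the end effector candidate minimizing CI-MPC cost plus travel cost.

    `ci_mpc_cost(c)` is a hypothetical callable standing in for the optimal
    cost of the local CI-MPC problem started from candidate position c.
    """
    best, best_total = None, np.inf
    for c in candidates:
        total = ci_mpc_cost(c) + travel_weight * np.linalg.norm(c - current_pos)
        if total < best_total:
            best, best_total = c, total
    return best, best_total
```

In the full controller, the selected candidate additionally triggers a collision-free repositioning move whenever it differs from the current end effector pose.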
The success of the sampling-based CI-MPC framework hinges on two key components: a robust sampling strategy for selecting candidate end effector locations, and a fast, effective local CI-MPC. While this approach is effective, applying it to arbitrary objects, particularly in multi-object scenarios, requires significant adaptations.
We pre-process object meshes by storing body-frame face locations, areas, and normal vectors. Given world-frame object pose estimates, we generate a candidate end effector location through a sequence of random sampling steps: 1) select an object uniformly, 2) select a stored face of this object weighted by area, and 3) sample a point lying on this face. This surface point is offset a fixed distance along the face's outward normal and then projected to a fixed world height (see below). We reject samples which, even after projection away from one face, are too close to any of the objects. This can occur due to object non-convexity, the presence of multiple objects, or the selection of a face whose normal is too vertical. We repeat this process until the desired number of end effector candidates is obtained.
Above is a visualization of the sampling strategy for end effector locations. The gray plane indicates the ground, and the orange planes represent local tangent planes to the mesh surfaces. Blue arrows project surface samples outwards from the mesh along the face normals, then purple arrows project those to a fixed height in the world, generating candidate samples (green dots). Samples located too close to object surfaces (e.g. red dot) are discarded.
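Using the stored-face representation described above, one rejection-sampling step can be sketched as follows. The face data layout, the `clearance` proxy (distance to the nearest stored face center rather than to the true surface), and all parameter names are simplifying assumptions for illustration, not our exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def clearance(p, objects):
    """Crude clearance proxy: distance from p to the nearest stored face center."""
    return min(np.linalg.norm(p - obj['centers'], axis=1).min() for obj in objects)

def sample_candidate(objects, offset, ee_height, min_clearance, max_tries=100):
    """One rejection-sampling step for an end effector candidate.

    `objects` is a list of dicts with world-frame face data per object:
    'vertices' (F,3,3) triangle vertices, 'normals' (F,3) outward unit
    normals, 'areas' (F,), and 'centers' (F,3).
    """
    for _ in range(max_tries):
        obj = objects[rng.integers(len(objects))]      # 1) uniform over objects
        w = obj['areas'] / obj['areas'].sum()
        f = rng.choice(len(w), p=w)                    # 2) face weighted by area
        b = rng.dirichlet(np.ones(3))                  # 3) barycentric point on the face
        p = b @ obj['vertices'][f]
        p = p + offset * obj['normals'][f]             # offset along outward normal
        p[2] = ee_height                               # project to fixed world height
        if clearance(p, objects) >= min_clearance:     # reject samples too close
            return p
    return None
```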
C3+ is an enhanced version of C3, which leverages the Alternating Direction Method of Multipliers (ADMM) to solve a Quadratic Program with Complementarity Constraints (QPCC), where the system dynamics are modeled as a Linear Complementarity System (LCS). C3 decouples the contact scheduling problem, which involves solving the complementarity constraints, across planning steps, enabling parallelization where each subproblem solves a small-scale Mixed-Integer Quadratic Program (MIQP). Building on this idea, our proposed C3+ algorithm introduces a new slack variable that further decouples the contact problem across individual contacts. This reformulation is equivalent to solving multiple independent 1D MIQPs, each of which has a simple closed-form solution, thereby transforming the costly, coupled exponential-time MIQP into a constant-time analytical computation and achieving a substantial overall speedup. More detailed derivations can be found in our manuscript.
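To illustrate the flavor of the per-contact closed form, consider projecting a single scalar pair onto the complementarity set {x ≥ 0, y ≥ 0, xy = 0}: only two candidates can be optimal (zero out one variable and clip the other at zero), so the projection reduces to comparing two costs. This scalar sketch with diagonal weights is an illustration of the idea, not the exact formulation from our manuscript.

```python
def project_complementarity(a, b, wa=1.0, wb=1.0):
    """Closed-form projection of (a, b) onto {(x, y): x >= 0, y >= 0, x*y = 0},
    minimizing wa*(x - a)**2 + wb*(y - b)**2.

    The feasible set is the union of the two nonnegative half-axes, so the
    minimizer is the better of the two axis projections.
    """
    cand1 = (max(a, 0.0), 0.0)   # keep x, zero y
    cand2 = (0.0, max(b, 0.0))   # keep y, zero x
    cost = lambda x, y: wa * (x - a) ** 2 + wb * (y - b) ** 2
    return cand1 if cost(*cand1) <= cost(*cand2) else cand2
```

Because each contact's projection is this cheap, the projection step of ADMM becomes an analytical per-contact computation rather than a coupled mixed-integer solve.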
We evaluated our method in 701 hardware trials, testing 25 objects, with each object run until 28 successful trials were obtained. The system achieved a 99.9% success rate (700/701), with the only failure occurring when the large egg carton was pushed out of the robot’s reach. More details can be found in the plot below.
We conducted a total of 227 trials, comprising 10 experiments for the 2-object case, 6 for the 3-object case, and 5 for the 4-object case. Each experiment was run until 10 successful trials were achieved. Our method achieved a 92.5% success rate (210/227), with detailed results under both tight and loose tolerances presented in the table below. All failures occurred when an object moved beyond the robot's reach. The mean time-to-goal for these tasks was approximately 96.4s, 191.1s, and 315.7s for the 2-, 3-, and 4-object settings, respectively. These numbers do not scale linearly with the number of objects because goal assignments are permuted across objects in each trial, requiring object rearrangements that introduce additional time.
| # Objs | Object Names | Success Rate | Control Rate (Hz) | Tight Tol. (2cm, 0.1rad): Mean ± σ (s) | Tight: Min, Max (s) | Loose Tol. (5cm, 0.4rad): Mean ± σ (s) | Loose: Min, Max (s) |
|---|---|---|---|---|---|---|---|
| 2 | Lotion & Letter R | 100/102 (98.0%) | 14.06 | 97.01 ± 45.92 | 47.49, 204.35 | 74.26 ± 39.33 | 17.91, 171.49 |
| | Baby toy & Letter E | | 14.34 | 106.91 ± 47.62 | 60.98, 232.75 | 54.13 ± 14.28 | 20.95, 71.86 |
| | Letter B & Letter 3 | | 14.31 | 85.31 ± 28.30 | 41.97, 129.53 | 53.64 ± 24.39 | 17.98, 114.74 |
| | Chicken Broth & Expo Box | | 13.84 | 99.35 ± 38.26 | 63.78, 180.85 | 75.94 ± 28.79 | 53.77, 158.16 |
| | Chicken Broth & Wood Block | | 14.12 | 74.43 ± 20.54 | 41.69, 107.64 | 54.62 ± 14.29 | 32.24, 76.12 |
| | Clamp & Letter I | | 14.14 | 87.19 ± 26.44 | 42.45, 132.87 | 44.15 ± 15.77 | 28.52, 80.74 |
| | Book & Letter S | | 14.43 | 92.88 ± 27.27 | 67.60, 168.00 | 81.51 ± 27.38 | 61.09, 154.14 |
| | Tape & Letter A | | 14.36 | 119.07 ± 44.10 | 69.88, 231.70 | 73.62 ± 24.24 | 50.63, 138.86 |
| | Letter T & Letter H | | 11.76 | 79.63 ± 21.47 | 44.14, 119.52 | 47.17 ± 11.19 | 34.06, 69.34 |
| | Letter G & Xbox | | 13.93 | 104.37 ± 23.78 | 75.97, 149.74 | 64.50 ± 14.89 | 45.15, 89.71 |
| 3 | Letter R & Letter A & Letter S | 60/62 (96.8%) | 14.70 | 185.94 ± 63.01 | 113.44, 299.09 | 141.28 ± 48.89 | 94.30, 227.94 |
| | Letter C & Letter 3 & Letter + | | 14.85 | 173.43 ± 29.56 | 135.00, 219.33 | 116.54 ± 21.72 | 80.66, 148.72 |
| | Letter A & Letter N & Letter Y | | 14.72 | 159.63 ± 37.27 | 111.57, 237.65 | 121.65 ± 26.65 | 92.85, 175.08 |
| | Letter I & Letter N & Letter G | | 14.84 | 173.98 ± 46.20 | 115.48, 275.02 | 120.18 ± 40.28 | 68.20, 189.52 |
| | Letter D & Letter I & Letter Y | | 14.72 | 188.96 ± 38.93 | 139.52, 247.20 | 137.41 ± 19.84 | 106.46, 173.66 |
| | Clamp & Lotion & Book | | 15.19 | 224.03 ± 53.01 | 148.22, 307.58 | 157.98 ± 41.55 | 97.30, 228.89 |
| 4 | PUSH | 50/63 (79.3%) | 9.32 | 312.34 ± 60.07 | 160.34, 394.28 | 248.64 ± 60.06 | 159.82, 366.18 |
| | ICRA | | 8.63 | 267.85 ± 59.80 | 176.62, 396.33 | 208.86 ± 44.50 | 126.65, 278.78 |
| | URDF | | 9.10 | 269.01 ± 93.90 | 149.50, 465.47 | 204.92 ± 84.96 | 121.98, 401.88 |
| | C3PO | | 9.31 | 281.66 ± 85.97 | 120.23, 458.00 | 192.54 ± 38.75 | 120.23, 246.98 |
| | DAY+ | | 9.12 | 326.67 ± 117.10 | 202.43, 597.04 | 213.84 ± 58.18 | 149.72, 333.82 |
We benchmark our CI-MPC algorithm C3+ against its predecessor, C3, to highlight its substantial speedup. Solve times are reported for 1-, 2-, 3-, and 4-object scenarios, totaling 103,959, 42,306, 78,129, and 40,161 solves, respectively. As shown in the table below, C3+ achieves faster overall performance: while its quadratic step is roughly twice as slow, its projection step is three to four orders of magnitude faster on average.
| # Objs | # Solves | Step | C3: Mean ± σ | C3: Max | C3+ (ours): Mean ± σ | C3+ (ours): Max |
|---|---|---|---|---|---|---|
| 1 | 103,959 | Quadratic | 1.67 ± 0.39 | 5.45 | 3.09 ± 0.12 | 5.67 |
| | | Projection | 10.38 ± 3.84 | 41.27 | 0.007 ± 0.001 | 0.085 |
| 2 | 42,306 | Quadratic | 3.87 ± 0.94 | 7.69 | 9.13 ± 0.44 | 13.50 |
| | | Projection | 37.2 ± 9.12 | 131.98 | 0.011 ± 0.003 | 0.043 |
| 3 | 78,129 | Quadratic | 2.74 ± 0.47 | 5.82 | 7.97 ± 0.02 | 13.36 |
| | | Projection | 40.39 ± 11.17 | 1241.85 | 0.006 ± 0.001 | 0.038 |
| 4 | 40,161 | Quadratic | 4.59 ± 0.67 | 8.56 | 10.10 ± 0.69 | 16.02 |
| | | Projection | 44.07 ± 11.92 | 704.23 | 0.007 ± 0.002 | 0.041 |
This work was supported by an NSF CAREER Award under Grant No. FRR-2238480 and the RAI Institute.
If you find this work useful, please consider citing: (bibtex)
@article{bui2025push,
title={Push Anything: Single- and Multi-Object Pushing From First Sight with Contact-Implicit MPC},
author={Hien Bui* and Yufeiyang Gao* and Haoran Yang* and Eric Cui and Siddhant Mody and Brian Acosta and Thomas Stephen Felix and Bibit Bianchini and Michael Posa},
year={2025},
journal={arXiv preprint arXiv:2510.19974},
website={https://push-anything.github.io/}
}
Source code for this page was taken from Approximating Global CI-MPC's website, which is based on SceneComplete's website.