Push Anything

Single- and Multi-Object Pushing From First Sight with Contact-Implicit MPC




*The first three authors contributed equally to this work. University of Pennsylvania



TL;DR: Our enhanced contact-implicit MPC (CI-MPC) algorithm, C3+, enables contact-rich manipulation of single or multiple diverse objects modeled with unprecedented numbers of contact pairs.


Overview Video


Abstract

Non-prehensile manipulation of diverse objects remains a core challenge in robotics, driven by unknown physical properties and the complexity of contact-rich interactions. Recent advances in contact-implicit model predictive control (CI-MPC), with contact reasoning embedded within the trajectory optimization, have shown promise in tackling the task efficiently and robustly, yet demonstrations have been limited to narrowly curated examples. In this work, we showcase the broader capabilities of CI-MPC through precise planar pushing tasks over a wide range of object geometries, including multi-object domains. These scenarios demand reasoning over numerous inter-object and object-environment contacts to strategically manipulate and de-clutter the environment, challenges that were intractable for prior CI-MPC methods. To achieve this, we introduce Consensus Complementarity Control Plus (C3+), an enhanced CI-MPC algorithm integrated into a complete pipeline spanning object scanning, mesh reconstruction, and hardware execution. Compared to its predecessor C3, C3+ achieves substantially faster solve times, enabling real-time performance even in multi-object pushing tasks. On hardware, our system achieves an overall 98% success rate across 33 objects, reaching pose goals within tight tolerances. The average time-to-goal is approximately 0.5, 1.6, 3.2, and 5.3 minutes for 1-, 2-, 3-, and 4-object tasks, respectively.


Uncut Single Object Manipulation Videos (1X speed)

Tight tolerances: 2cm, 0.1rad (5.7 degrees)


Push T - 28 consecutive SE(2) goals
Push T, Xbox Controller, Gallon Milk, Baby Toy, Book, Egg Carton, Chicken Broth, Clamp, Milk Bottle, Eraser, Tape, Wood Block, Lotion, Expo Box, Letter I, Letter C, Letter R, Letter A, Letter Y, Letter G, Letter B, Letter H, Letter 3, Letter E, Letter S

Uncut Two-Object Manipulation Videos (3X speed)

Tight tolerances: 2cm, 0.1rad (5.7 degrees)


Uncut Three-Object Manipulation Videos (10X speed)

Tight tolerances: 2cm, 0.1rad (5.7 degrees)


Uncut Four-Object Manipulation Videos (10X speed)

Tight tolerances: 2cm, 0.1rad (5.7 degrees)


System Diagram



We present the Push Anything framework (above), a pipeline integrating object perception with a novel controller. Our framework operates in two phases. In the offline phase, we build an object library by scanning objects to generate meshes (via BundleSDF) and URDFs, assuming the same mass and inertia for all objects. In the online phase, our controller uses robot and object state estimates (the latter from FoundationPose) to compute end effector trajectories. Following the approach in Yang and Posa, these trajectories are tracked by a low-level operational space controller (OSC).

Our controller is implemented in C++ within the Drake systems framework. We use three computers in our experiments: (i) an Intel Core i9-13900KF (13th-gen, 32 threads) dedicated to the sampling-based controller, (ii) an Intel Core i7-9700K running the operational-space controller and robot drivers on a real-time kernel for Franka communication, and (iii) an Intel Core i9-14900K paired with an NVIDIA GeForce RTX 4090 for FoundationPose. All computers communicate via LCM.


Sampling-based MPC Controller using Enhanced Local CI-MPC

Our controller is built upon the original framework from Approximating Global CI-MPC. In this approach, candidate end effector positions are sampled and evaluated by solving the CI-MPC problem, augmented with a travel cost from the current location. The candidate with the lowest overall cost is selected, and if it differs from the current pose, the robot first moves along a collision-free path to that position before executing CI-MPC. As illustrated in the accompanying figure, by steering the system toward configurations that expand the reach of local MPC, this strategy helps the controller overcome local minima and achieve goals that require long-horizon reasoning. The success of the sampling-based CI-MPC framework hinges on two key components: a robust sampling strategy for selecting candidate end effector locations, and a fast, effective local CI-MPC. While this approach is effective, applying it to arbitrary objects, particularly in multi-object scenarios, requires significant adaptations.
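The selection rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not our C++ implementation: `mpc_cost` is a hypothetical stand-in for the cost returned by a CI-MPC solve from a given end effector position, and the travel cost is assumed here to be a weighted Euclidean distance (`travel_weight` is a made-up parameter).

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]


def select_candidate(
    current: Point,
    candidates: Sequence[Point],
    mpc_cost: Callable[[Point], float],  # hypothetical: CI-MPC cost from a position
    travel_weight: float = 1.0,          # assumption: linear penalty on distance
) -> Tuple[Point, float]:
    """Return the lowest-cost end effector position and its total cost.

    The current position competes with the sampled candidates and pays no
    travel cost, since the robot is already there.
    """
    best, best_cost = current, mpc_cost(current)
    for candidate in candidates:
        total = mpc_cost(candidate) + travel_weight * math.dist(current, candidate)
        if total < best_cost:
            best, best_cost = candidate, total
    return best, best_cost
```

If the returned position differs from `current`, the robot would first relocate to it along a collision-free path; otherwise it keeps executing the local plan. A larger `travel_weight` makes the controller more reluctant to relocate.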


Sampling Strategy: Applicable to Any Mesh

We pre-process object meshes by storing body-frame face locations, areas, and normal vectors. Given world-frame object pose estimates, we generate a candidate end effector location by performing several random sampling steps in series: 1) select an object uniformly, 2) select a stored face of this object weighted by area, and 3) sample a point lying on this face. This surface point is first projected a fixed distance along the face's outward normal vector and then projected to a fixed world height (see below). We reject samples that, even after projection away from one face, are too close to any of the objects. This can occur due to object non-convexity, the presence of multiple objects, or the selection of a face whose normal is too vertical. We repeat this process until the desired number of end effector candidates is obtained.



Above is a visualization of the sampling strategy for end effector locations. The gray plane indicates the ground, and the orange planes represent local tangent planes to the mesh surfaces. Blue arrows project surface samples outwards from the mesh along the face normals, then purple arrows project those to a fixed height in the world, generating candidate samples (green dots). Samples located too close to object surfaces (e.g. red dot) are discarded.
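The sampling steps described above can be sketched as follows. This is a simplified illustration, not our implementation: faces are assumed to be triangles already transformed into the world frame, `clearance_fn` is a hypothetical callable returning the distance from a point to the nearest object, and `max_tries` is an added safeguard so that the loop terminates even if no valid sample exists.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Face:
    vertices: Tuple[Vec3, Vec3, Vec3]  # triangle corners (world frame in this sketch)
    normal: Vec3                       # unit outward normal
    area: float


def sample_on_triangle(face: Face, rng: random.Random) -> Vec3:
    """Uniformly sample a point on a triangle using barycentric weights."""
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)
    w = (1.0 - s, s * (1.0 - r2), s * r2)
    a, b, c = face.vertices
    return (
        w[0] * a[0] + w[1] * b[0] + w[2] * c[0],
        w[0] * a[1] + w[1] * b[1] + w[2] * c[1],
        w[0] * a[2] + w[1] * b[2] + w[2] * c[2],
    )


def sample_candidates(
    objects: Sequence[Sequence[Face]],
    clearance_fn: Callable[[Vec3], float],  # hypothetical distance-to-objects query
    n: int,
    offset: float,         # fixed distance along the outward normal
    height: float,         # fixed world height of the end effector
    min_clearance: float,  # samples closer than this to any object are rejected
    rng: random.Random,
    max_tries: int = 10000,
) -> List[Vec3]:
    candidates: List[Vec3] = []
    for _ in range(max_tries):
        if len(candidates) == n:
            break
        faces = rng.choice(objects)                                      # 1) object, uniform
        face = rng.choices(faces, weights=[f.area for f in faces])[0]    # 2) face, by area
        p = sample_on_triangle(face, rng)                                # 3) point on face
        # Push out along the normal, then drop to the fixed height.
        q = (p[0] + offset * face.normal[0], p[1] + offset * face.normal[1], height)
        if clearance_fn(q) >= min_clearance:
            candidates.append(q)
    return candidates
```

Note that no explicit test on the face normal is needed in this sketch: a sample from a near-vertical face lands above or inside the object footprint after the height projection, so the clearance check rejects it.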


C3+ Algorithm: Fast, Effective Local CI-MPC

C3+ is an enhanced version of C3, which leverages the Alternating Direction Method of Multipliers (ADMM) to solve a Quadratic Program with Complementarity Constraints (QPCC), where the system dynamics are modeled as a Linear Complementarity System (LCS). C3 decouples the contact scheduling problem, which involves solving the complementarity constraints, across planning steps, enabling parallelization where each subproblem solves a small-scale Mixed-Integer Quadratic Program (MIQP). Building on this idea, our proposed C3+ algorithm introduces a new slack variable that further decouples the contact problem across individual contacts. This reformulation makes the projection equivalent to solving multiple independent 1D MIQPs, each of which has a simple closed-form solution, thereby transforming the costly, coupled exponential-time MIQP into a constant-time analytical computation and achieving a substantial overall speedup. More detailed derivations can be found in our manuscript.
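To give some intuition for why a per-contact subproblem admits a closed form, the sketch below projects a single scalar pair onto the complementarity set 0 ≤ λ ⊥ η ≥ 0 under a weighted squared distance. It is a generic illustration, not the exact C3+ projection: the objective, the weights `w_lam` and `w_eta`, and the variable names are assumptions, and the actual formulation with the slack variable is given in the manuscript.

```python
from typing import Tuple


def project_complementarity(
    lam0: float, eta0: float, w_lam: float = 1.0, w_eta: float = 1.0
) -> Tuple[float, float]:
    """Solve min w_lam*(lam-lam0)^2 + w_eta*(eta-eta0)^2
    subject to lam >= 0, eta >= 0, lam * eta = 0.

    The single binary choice (which variable is zero) has only two branches,
    so both are evaluated analytically and the cheaper one is returned.
    """
    # Branch A: lam = 0, eta clipped to be non-negative.
    eta_a = max(eta0, 0.0)
    cost_a = w_lam * lam0 ** 2 + w_eta * (eta_a - eta0) ** 2
    # Branch B: eta = 0, lam clipped to be non-negative.
    lam_b = max(lam0, 0.0)
    cost_b = w_lam * (lam_b - lam0) ** 2 + w_eta * eta0 ** 2
    return (0.0, eta_a) if cost_a <= cost_b else (lam_b, 0.0)
```

Because each pair is handled independently with a fixed number of arithmetic operations, the cost of this kind of projection grows linearly with the number of contacts rather than exponentially.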


Experimental Results

Single-Object Pushing

We evaluated our method in 701 hardware trials, testing 25 objects, with each object run until 28 successful trials were obtained. The system achieved a 99.9% success rate (700/701), with the only failure occurring when the large egg carton was pushed out of the robot’s reach. More details can be found in the plot below.

Multi-Object Pushing

We conducted a total of 227 trials, comprising 10 experiments for the 2-object case, 6 for the 3-object, and 5 for the 4-object case. Each experiment was run until 10 successful trials were achieved. Our method achieved a 92.5% success rate (210/227), with detailed results under both tight and loose tolerances presented in the table below. All failures occurred when an object moved beyond the robot’s reach. The mean time-to-goal for these tasks was approximately 96.4s, 191.1s, and 315.7s for the 2-, 3-, and 4-object settings, respectively. These numbers do not scale linearly with the number of objects because goal assignments are permuted across objects in each trial, requiring object rearrangements that introduce additional time.

| # Objs | Object Names | Success Rate | Control Rate (Hz) | Time to Goal (s), Tight (2cm, 0.1rad): Mean ± σ | Tight: Min, Max | Time to Goal (s), Loose (5cm, 0.4rad): Mean ± σ | Loose: Min, Max |
|---|---|---|---|---|---|---|---|
| 2 | Lotion & Letter R | 100/102 (98.0%) | 14.06 | 97.01 ± 45.92 | 47.49, 204.35 | 74.26 ± 39.33 | 17.91, 171.49 |
| | Baby toy & Letter E | | 14.34 | 106.91 ± 47.62 | 60.98, 232.75 | 54.13 ± 14.28 | 20.95, 71.86 |
| | Letter B & Letter 3 | | 14.31 | 85.31 ± 28.30 | 41.97, 129.53 | 53.64 ± 24.39 | 17.98, 114.74 |
| | Chicken Broth & Expo Box | | 13.84 | 99.35 ± 38.26 | 63.78, 180.85 | 75.94 ± 28.79 | 53.77, 158.16 |
| | Chicken Broth & Wood Block | | 14.12 | 74.43 ± 20.54 | 41.69, 107.64 | 54.62 ± 14.29 | 32.24, 76.12 |
| | Clamp & Letter I | | 14.14 | 87.19 ± 26.44 | 42.45, 132.87 | 44.15 ± 15.77 | 28.52, 80.74 |
| | Book & Letter S | | 14.43 | 92.88 ± 27.27 | 67.60, 168.00 | 81.51 ± 27.38 | 61.09, 154.14 |
| | Tape & Letter A | | 14.36 | 119.07 ± 44.10 | 69.88, 231.70 | 73.62 ± 24.24 | 50.63, 138.86 |
| | Letter T & Letter H | | 11.76 | 79.63 ± 21.47 | 44.14, 119.52 | 47.17 ± 11.19 | 34.06, 69.34 |
| | Letter G & Xbox | | 13.93 | 104.37 ± 23.78 | 75.97, 149.74 | 64.50 ± 14.89 | 45.15, 89.71 |
| 3 | Letter R & Letter A & Letter S | 60/62 (96.8%) | 14.70 | 185.94 ± 63.01 | 113.44, 299.09 | 141.28 ± 48.89 | 94.30, 227.94 |
| | Letter C & Letter 3 & Letter + | | 14.85 | 173.43 ± 29.56 | 135.00, 219.33 | 116.54 ± 21.72 | 80.66, 148.72 |
| | Letter A & Letter N & Letter Y | | 14.72 | 159.63 ± 37.27 | 111.57, 237.65 | 121.65 ± 26.65 | 92.85, 175.08 |
| | Letter I & Letter N & Letter G | | 14.84 | 173.98 ± 46.20 | 115.48, 275.02 | 120.18 ± 40.28 | 68.20, 189.52 |
| | Letter D & Letter I & Letter Y | | 14.72 | 188.96 ± 38.93 | 139.52, 247.20 | 137.41 ± 19.84 | 106.46, 173.66 |
| | Clamp & Lotion & Book | | 15.19 | 224.03 ± 53.01 | 148.22, 307.58 | 157.98 ± 41.55 | 97.30, 228.89 |
| 4 | PUSH | 50/63 (79.3%) | 9.32 | 312.34 ± 60.07 | 160.34, 394.28 | 248.64 ± 60.06 | 159.82, 366.18 |
| | ICRA | | 8.63 | 267.85 ± 59.80 | 176.62, 396.33 | 208.86 ± 44.50 | 126.65, 278.78 |
| | URDF | | 9.10 | 269.01 ± 93.90 | 149.50, 465.47 | 204.92 ± 84.96 | 121.98, 401.88 |
| | C3PO | | 9.31 | 281.66 ± 85.97 | 120.23, 458.00 | 192.54 ± 38.75 | 120.23, 246.98 |
| | DAY+ | | 9.12 | 326.67 ± 117.10 | 202.43, 597.04 | 213.84 ± 58.18 | 149.72, 333.82 |

The # Objs and Success Rate cells apply to every row in their group; each success rate is pooled over all experiments with that number of objects.
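The reported trial counts can be cross-checked with a few lines of arithmetic. The sketch below simply pools the success and trial counts quoted in the text and table; treating the overall rate as the pooled single- and multi-object trials is an assumption about how the summary figure was computed.

```python
# (successes, trials) as reported in the text and the table.
RESULTS = {
    "1-object": (700, 701),
    "2-object": (100, 102),
    "3-object": (60, 62),
    "4-object": (50, 63),
}


def pooled_rate(keys):
    """Percentage of successful trials pooled over the given settings."""
    successes = sum(RESULTS[k][0] for k in keys)
    trials = sum(RESULTS[k][1] for k in keys)
    return successes, trials, 100.0 * successes / trials


multi = pooled_rate(["2-object", "3-object", "4-object"])
overall = pooled_rate(list(RESULTS))

print(f"multi-object: {multi[0]}/{multi[1]} = {multi[2]:.1f}%")
print(f"overall:      {overall[0]}/{overall[1]} = {overall[2]:.1f}%")
```

The multi-object total reproduces the 210/227 figure, and pooling all settings gives 910/928, consistent with the overall success rate of about 98%.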

C3+ vs. C3 Solve Time Comparison

We benchmark our CI-MPC algorithm C3+ against its predecessor, C3, to highlight its substantial speedup. Solve times are reported for 1-, 2-, 3-, and 4-object scenarios, totaling 103,959, 42,306, 78,129, and 40,161 solves, respectively. As shown in the table below, C3+ achieves faster overall performance: while the quadratic step is about two to three times slower in the mean, the mean projection time drops by three to four orders of magnitude (roughly 1,500x to 6,700x).

| # Objs | # Solves | Step | C3: Mean ± σ | C3: Max | C3+ (ours): Mean ± σ | C3+ (ours): Max |
|---|---|---|---|---|---|---|
| 1 | 103,959 | Quadratic | 1.67 ± 0.39 | 5.45 | 3.09 ± 0.12 | 5.67 |
| 1 | 103,959 | Projection | 10.38 ± 3.84 | 41.27 | 0.007 ± 0.001 | 0.085 |
| 2 | 42,306 | Quadratic | 3.87 ± 0.94 | 7.69 | 9.13 ± 0.44 | 13.50 |
| 2 | 42,306 | Projection | 37.2 ± 9.12 | 131.98 | 0.011 ± 0.003 | 0.043 |
| 3 | 78,129 | Quadratic | 2.74 ± 0.47 | 5.82 | 7.97 ± 0.02 | 13.36 |
| 3 | 78,129 | Projection | 40.39 ± 11.17 | 1241.85 | 0.006 ± 0.001 | 0.038 |
| 4 | 40,161 | Quadratic | 4.59 ± 0.67 | 8.56 | 10.10 ± 0.69 | 16.02 |
| 4 | 40,161 | Projection | 44.07 ± 11.92 | 704.23 | 0.007 ± 0.002 | 0.041 |
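The speedups implied by the mean values in the table can be computed directly. The sketch below uses only the tabulated means; summing the quadratic and projection means as a proxy for combined cost is an assumption made for illustration, since the table does not report a combined figure.

```python
import math

# (C3 mean, C3+ mean) per step, copied from the solve-time table.
MEAN_TIMES = {
    1: {"quadratic": (1.67, 3.09), "projection": (10.38, 0.007)},
    2: {"quadratic": (3.87, 9.13), "projection": (37.2, 0.011)},
    3: {"quadratic": (2.74, 7.97), "projection": (40.39, 0.006)},
    4: {"quadratic": (4.59, 10.10), "projection": (44.07, 0.007)},
}


def projection_speedup(n_objects: int) -> float:
    """Ratio of mean projection times, C3 over C3+."""
    c3, c3_plus = MEAN_TIMES[n_objects]["projection"]
    return c3 / c3_plus


def combined_speedup(n_objects: int) -> float:
    """Ratio of summed mean step times (assumed proxy for combined cost)."""
    steps = MEAN_TIMES[n_objects].values()
    return sum(c3 for c3, _ in steps) / sum(c3_plus for _, c3_plus in steps)


for n in MEAN_TIMES:
    ratio = projection_speedup(n)
    print(
        f"{n} object(s): projection {ratio:,.0f}x "
        f"(10^{math.log10(ratio):.1f}), combined {combined_speedup(n):.1f}x"
    )
```

Under this proxy, the slower quadratic step is more than offset by the projection step, which dominates the C3 cost in every setting.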

Acknowledgments

This work was supported by an NSF CAREER Award under Grant No. FRR-2238480 and the RAI Institute.


Citation

If you find this work useful, please consider citing: (bibtex)

@article{bui2025push,
 title={Push Anything: Single- and Multi-Object Pushing From First Sight with Contact-Implicit MPC},
 author={Hien Bui* and Yufeiyang Gao* and Haoran Yang* and Eric Cui and Siddhant Mody and Brian Acosta and Thomas Stephen Felix and Bibit Bianchini and Michael Posa},
 year={2025},
 journal={arXiv preprint arXiv:2510.19974},
 website={https://push-anything.github.io/}
}

Credits

Source code for this page was taken from Approximating Global CI-MPC's website, which is based on SceneComplete's website.