Non-prehensile manipulation of diverse objects remains a core challenge in robotics, driven by unknown physical properties and the complexity of contact-rich interactions. Recent advances in contact-implicit model predictive control (CI-MPC), which embeds contact reasoning within the trajectory optimization, have shown promise in tackling such tasks efficiently and robustly, yet demonstrations have been limited to narrowly curated examples. In this work, we showcase the broader capabilities of CI-MPC through precise planar pushing tasks over a wide range of object geometries, including multi-object domains. These scenarios demand reasoning over numerous inter-object and object-environment contacts to strategically manipulate and de-clutter the environment, challenges that were intractable for prior CI-MPC methods. To achieve this, we introduce Consensus Complementarity Control Plus (C3+), an enhanced CI-MPC algorithm integrated into a complete pipeline spanning object scanning, mesh reconstruction, and hardware execution. Compared to its predecessor C3, C3+ achieves substantially faster solve times, enabling real-time performance even in multi-object pushing tasks. On hardware, our system achieves an overall 98% success rate across 33 objects, reaching pose goals within tight tolerances. The average time-to-goal is approximately 0.5, 1.6, 3.2, and 5.3 minutes for 1-, 2-, 3-, and 4-object tasks, respectively.
We present the Push Anything framework (above), a pipeline integrating object perception with a novel controller. Our framework operates in two phases. In the offline phase, we build an object library by scanning objects to generate meshes (via BundleSDF) and URDFs, assuming the same mass and inertia for every object. In the online phase, our controller uses robot and object state estimates (the latter from FoundationPose) to compute end effector trajectories. Following the approach of Yang and Posa, these trajectories are tracked by a low-level operational space controller (OSC). Our controller is implemented in C++ within the Drake systems framework.
We use three computers in our experiments: (i) an Intel Core i9-13900KF (13th gen, 32 threads) dedicated to the sampling-based controller; (ii) an Intel Core i7-9700K running the operational space controller and robot drivers on a real-time kernel for Franka communication; and (iii) an Intel Core i9-14900K paired with an NVIDIA GeForce RTX 4090 for FoundationPose. All computers communicate via LCM.
Our controller is built upon the original framework from Approximating Global CI-MPC. In this approach, candidate end effector positions are sampled and evaluated by solving the CI-MPC problem, augmented with a travel cost from the current location. The candidate with the lowest overall cost is selected, and if it differs from the current pose, the robot first moves along a collision-free path to that position before executing CI-MPC. As illustrated in the adjacent figure, this strategy steers the system toward configurations that expand the reach of local MPC, helping the controller overcome local minima and achieve goals that require long-horizon reasoning.
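The candidate-selection step above can be sketched as follows. Here `ci_mpc_cost` stands in for solving the local CI-MPC problem from a candidate end effector position, and the travel cost is taken to be a weighted Euclidean distance; both are illustrative assumptions, not the exact costs used in our controller.

```python
import numpy as np

def select_candidate(candidates, current_pos, ci_mpc_cost, travel_weight=1.0):
    """Pick the end effector candidate minimizing CI-MPC cost plus travel cost.

    `ci_mpc_cost(c)` is a hypothetical callable standing in for the optimal
    cost of the local CI-MPC problem started from candidate position c.
    """
    best, best_total = None, np.inf
    for c in candidates:
        total = ci_mpc_cost(c) + travel_weight * np.linalg.norm(c - current_pos)
        if total < best_total:
            best, best_total = c, total
    return best, best_total
```

In the full controller, the selected candidate additionally triggers a collision-free repositioning move whenever it differs from the current end effector pose.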
The success of the sampling-based CI-MPC framework hinges on two key components: a robust sampling strategy for selecting candidate end effector locations, and a fast, effective local CI-MPC. While this approach is effective, applying it to arbitrary objects, particularly in multi-object scenarios, requires significant adaptations.
We pre-process object meshes by storing body-frame face locations, areas, and normal vectors. Given world-frame object pose estimates, we generate a candidate end effector location through a sequence of random sampling steps: 1) select an object uniformly, 2) select a stored face of this object weighted by area, and 3) sample a point lying on this face. This surface point is offset a fixed distance along the face's outward normal and then projected to a fixed world height (see below). We reject samples which, even after projection away from one face, are too close to any of the objects. This can occur due to object non-convexity, the presence of multiple objects, or the selection of a face whose normal is too vertical. We repeat this process until the desired number of end effector candidates is obtained.
Above is a visualization of the sampling strategy for end effector locations. The gray plane indicates the ground, and the orange planes represent local tangent planes to the mesh surfaces. Blue arrows project surface samples outwards from the mesh along the face normals, then purple arrows project those to a fixed height in the world, generating candidate samples (green dots). Samples located too close to object surfaces (e.g. red dot) are discarded.
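Using the stored-face representation described above, one rejection-sampling step can be sketched as follows. The face data layout, the `clearance` proxy (distance to the nearest stored face center rather than to the true surface), and all parameter names are simplifying assumptions for illustration, not our exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def clearance(p, objects):
    """Crude clearance proxy: distance from p to the nearest stored face center."""
    return min(np.linalg.norm(p - obj['centers'], axis=1).min() for obj in objects)

def sample_candidate(objects, offset, ee_height, min_clearance, max_tries=100):
    """One rejection-sampling step for an end effector candidate.

    `objects` is a list of dicts with world-frame face data per object:
    'vertices' (F,3,3) triangle vertices, 'normals' (F,3) outward unit
    normals, 'areas' (F,), and 'centers' (F,3).
    """
    for _ in range(max_tries):
        obj = objects[rng.integers(len(objects))]      # 1) uniform over objects
        w = obj['areas'] / obj['areas'].sum()
        f = rng.choice(len(w), p=w)                    # 2) face weighted by area
        b = rng.dirichlet(np.ones(3))                  # 3) barycentric point on the face
        p = b @ obj['vertices'][f]
        p = p + offset * obj['normals'][f]             # offset along outward normal
        p[2] = ee_height                               # project to fixed world height
        if clearance(p, objects) >= min_clearance:     # reject samples too close
            return p
    return None
```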
C3+ is an enhanced version of C3, which leverages the Alternating Direction Method of Multipliers (ADMM) to solve a Quadratic Program with Complementarity Constraints (QPCC), where the system dynamics are modeled as a Linear Complementarity System (LCS). C3 decouples the contact scheduling problem, which involves solving the complementarity constraints, across planning steps, enabling parallelization where each subproblem solves a small-scale Mixed-Integer Quadratic Program (MIQP). Building on this idea, our proposed C3+ algorithm introduces a new slack variable that further decouples the contact problem across individual contacts. This reformulation is equivalent to solving multiple independent 1D MIQPs, each of which has a simple closed-form solution, thereby transforming the costly, coupled exponential-time MIQP into a constant-time analytical computation and achieving a substantial overall speedup. More detailed derivations can be found in our manuscript.
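To illustrate the flavor of the per-contact closed form, consider projecting a single scalar pair onto the complementarity set {x ≥ 0, y ≥ 0, xy = 0}: only two candidates can be optimal (zero out one variable and clip the other at zero), so the projection reduces to comparing two costs. This scalar sketch with diagonal weights is an illustration of the idea, not the exact formulation from our manuscript.

```python
def project_complementarity(a, b, wa=1.0, wb=1.0):
    """Closed-form projection of (a, b) onto {(x, y): x >= 0, y >= 0, x*y = 0},
    minimizing wa*(x - a)**2 + wb*(y - b)**2.

    The feasible set is the union of the two nonnegative half-axes, so the
    minimizer is the better of the two axis projections.
    """
    cand1 = (max(a, 0.0), 0.0)   # keep x, zero y
    cand2 = (0.0, max(b, 0.0))   # keep y, zero x
    cost = lambda x, y: wa * (x - a) ** 2 + wb * (y - b) ** 2
    return cand1 if cost(*cand1) <= cost(*cand2) else cand2
```

Because each contact's projection is this cheap, the projection step of ADMM becomes an analytical per-contact computation rather than a coupled mixed-integer solve.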
We evaluated our method in 701 hardware trials, testing 25 objects, with each object run until 28 successful trials were obtained. The system achieved a 99.9% success rate (700/701), with the only failure occurring when the large egg carton was pushed out of the robot’s reach. More details can be found in the plot below.
We conducted a total of 227 trials, comprising 10 experiments for the 2-object case, 6 for the 3-object case, and 5 for the 4-object case. Each experiment was run until 10 successful trials were achieved. Our method achieved a 92.5% success rate (210/227), with detailed results under both tight and loose tolerances presented in the table below. All failures occurred when an object moved beyond the robot's reach. The mean time-to-goal for these tasks was approximately 96.4s, 191.1s, and 315.7s for the 2-, 3-, and 4-object settings, respectively. These numbers do not scale linearly with the number of objects because goal assignments are permuted across objects in each trial, requiring object rearrangements that introduce additional time.
| # Objs | Object Names | Success Rate | Control Rate (Hz) | Tight Tol. (2cm, 0.1rad): Mean ± σ (s) | Tight: Min, Max (s) | Loose Tol. (5cm, 0.4rad): Mean ± σ (s) | Loose: Min, Max (s) |
|---|---|---|---|---|---|---|---|
| 2 | Lotion & Letter R | 100/102 (98.0%) | 14.06 | 97.01 ± 45.92 | 47.49, 204.35 | 74.26 ± 39.33 | 17.91, 171.49 |
| | Baby toy & Letter E | | 14.34 | 106.91 ± 47.62 | 60.98, 232.75 | 54.13 ± 14.28 | 20.95, 71.86 |
| | Letter B & Letter 3 | | 14.31 | 85.31 ± 28.30 | 41.97, 129.53 | 53.64 ± 24.39 | 17.98, 114.74 |
| | Chicken Broth & Expo Box | | 13.84 | 99.35 ± 38.26 | 63.78, 180.85 | 75.94 ± 28.79 | 53.77, 158.16 |
| | Chicken Broth & Wood Block | | 14.12 | 74.43 ± 20.54 | 41.69, 107.64 | 54.62 ± 14.29 | 32.24, 76.12 |
| | Clamp & Letter I | | 14.14 | 87.19 ± 26.44 | 42.45, 132.87 | 44.15 ± 15.77 | 28.52, 80.74 |
| | Book & Letter S | | 14.43 | 92.88 ± 27.27 | 67.60, 168.00 | 81.51 ± 27.38 | 61.09, 154.14 |
| | Tape & Letter A | | 14.36 | 119.07 ± 44.10 | 69.88, 231.70 | 73.62 ± 24.24 | 50.63, 138.86 |
| | Letter T & Letter H | | 11.76 | 79.63 ± 21.47 | 44.14, 119.52 | 47.17 ± 11.19 | 34.06, 69.34 |
| | Letter G & Xbox | | 13.93 | 104.37 ± 23.78 | 75.97, 149.74 | 64.50 ± 14.89 | 45.15, 89.71 |
| 3 | Letter R & Letter A & Letter S | 60/62 (96.8%) | 14.70 | 185.94 ± 63.01 | 113.44, 299.09 | 141.28 ± 48.89 | 94.30, 227.94 |
| | Letter C & Letter 3 & Letter + | | 14.85 | 173.43 ± 29.56 | 135.00, 219.33 | 116.54 ± 21.72 | 80.66, 148.72 |
| | Letter A & Letter N & Letter Y | | 14.72 | 159.63 ± 37.27 | 111.57, 237.65 | 121.65 ± 26.65 | 92.85, 175.08 |
| | Letter I & Letter N & Letter G | | 14.84 | 173.98 ± 46.20 | 115.48, 275.02 | 120.18 ± 40.28 | 68.20, 189.52 |
| | Letter D & Letter I & Letter Y | | 14.72 | 188.96 ± 38.93 | 139.52, 247.20 | 137.41 ± 19.84 | 106.46, 173.66 |
| | Clamp & Lotion & Book | | 15.19 | 224.03 ± 53.01 | 148.22, 307.58 | 157.98 ± 41.55 | 97.30, 228.89 |
| 4 | PUSH | 50/63 (79.3%) | 9.32 | 312.34 ± 60.07 | 160.34, 394.28 | 248.64 ± 60.06 | 159.82, 366.18 |
| | ICRA | | 8.63 | 267.85 ± 59.80 | 176.62, 396.33 | 208.86 ± 44.50 | 126.65, 278.78 |
| | URDF | | 9.10 | 269.01 ± 93.90 | 149.50, 465.47 | 204.92 ± 84.96 | 121.98, 401.88 |
| | C3PO | | 9.31 | 281.66 ± 85.97 | 120.23, 458.00 | 192.54 ± 38.75 | 120.23, 246.98 |
| | DAY+ | | 9.12 | 326.67 ± 117.10 | 202.43, 597.04 | 213.84 ± 58.18 | 149.72, 333.82 |
We benchmark our CI-MPC algorithm C3+ against its predecessor, C3, to highlight its substantial speedup. Solve times are reported for 1-, 2-, 3-, and 4-object scenarios, totaling 103,959, 42,306, 78,129, and 40,161 solves, respectively. As shown in the table below, C3+ achieves faster overall performance: while its quadratic step is roughly twice as slow, its projection step is three to four orders of magnitude faster on average.
| # Objs | # Solves | Step | C3: Mean ± σ | C3: Max | C3+ (ours): Mean ± σ | C3+ (ours): Max |
|---|---|---|---|---|---|---|
| 1 | 103,959 | Quadratic | 1.67 ± 0.39 | 5.45 | 3.09 ± 0.12 | 5.67 |
| | | Projection | 10.38 ± 3.84 | 41.27 | 0.007 ± 0.001 | 0.085 |
| 2 | 42,306 | Quadratic | 3.87 ± 0.94 | 7.69 | 9.13 ± 0.44 | 13.50 |
| | | Projection | 37.2 ± 9.12 | 131.98 | 0.011 ± 0.003 | 0.043 |
| 3 | 78,129 | Quadratic | 2.74 ± 0.47 | 5.82 | 7.97 ± 0.02 | 13.36 |
| | | Projection | 40.39 ± 11.17 | 1241.85 | 0.006 ± 0.001 | 0.038 |
| 4 | 40,161 | Quadratic | 4.59 ± 0.67 | 8.56 | 10.10 ± 0.69 | 16.02 |
| | | Projection | 44.07 ± 11.92 | 704.23 | 0.007 ± 0.002 | 0.041 |
This work was supported by an NSF CAREER Award under Grant No. FRR-2238480 and the RAI Institute.
If you find this work useful, please consider citing: (bibtex)
@article{bui2025push,
title={Push Anything: Single- and Multi-Object Pushing From First Sight with Contact-Implicit MPC},
author={Hien Bui* and Yufeiyang Gao* and Haoran Yang* and Eric Cui and Siddhant Mody and Brian Acosta and Thomas Stephen Felix and Bibit Bianchini and Michael Posa},
year={2025},
journal={arXiv preprint arXiv:2510.19974},
website={https://push-anything.github.io/}
}
Source code for this page was taken from Approximating Global CI-MPC's website, which is based on SceneComplete's website.