Learning from direct human demonstrations
Teaching robots to grasp delicate objects by learning from human hand demonstrations with tactile feedback.
Hand2Rob teaches a Franka robot to grasp delicate objects by watching human hand demonstrations captured with a stereo camera pair. MediaPipe and CoTracker extract hand and object keypoints, which are triangulated into 3D trajectories the robot can follow directly. The catch is that spatial imitation alone isn’t enough for fragile things, so I integrated a ReSkin tactile sensor into custom gripper fingertips to give the robot a sense of force.
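To make the triangulation step concrete, here is a minimal sketch of linear (DLT) triangulation of one keypoint from the two calibrated camera views. The function name and interface are my own illustration, not the project's actual code; it assumes the standard 3x4 projection matrices from stereo calibration.

```python
import numpy as np

def triangulate_point(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two pixel observations.

    P1, P2: 3x4 camera projection matrices (intrinsics @ extrinsics).
    uv1, uv2: (u, v) pixel coordinates of the same keypoint in each view.
    Returns the 3D point in the world frame.
    """
    u1, v1 = uv1
    u2, v2 = uv2
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

Running this per keypoint per frame turns the two 2D track streams into the 3D trajectories the robot follows.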
This project builds on Point Policy and Feel The Force by Siddhant Haldar and Lerrel Pinto. Big thanks to them for the original work. I adapted and extended both systems with my own changes to get them running on the Franka Panda, including modifications to the trajectory execution, force control integration, data collection, and the custom gripper hardware.
Without force feedback
With ResKin force control
Without force feedback, the robot has no way to modulate its grip strength: it simply closes the gripper until the binary close command is fully executed. For fragile objects like eggs, this can crush them. Here the robot successfully reaches and grasps the egg along the learned trajectory, but the uncontrolled gripper force crushes it. This failure motivates closed-loop force control during the grasp phase.
With the ResKin sensor in the loop, the robot can feel how much force it is applying and stop closing once a target threshold is reached. The force controller reads the live tactile signal and adjusts the gripper incrementally, holding the egg securely without exceeding the pressure that would crack it. This demonstrates that combining learned spatial policies with real-time tactile feedback enables safe manipulation of objects that would otherwise be damaged.
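The grasp loop described above can be sketched as a simple incremental controller. The function below is a hedged illustration, not the project's actual controller: `read_force`, `set_gripper_width`, and all parameter values stand in for the real ReSkin and Franka gripper interfaces.

```python
def force_controlled_close(read_force, set_gripper_width,
                           start_width=0.08, target_force=1.5,
                           step=0.001, min_width=0.0):
    """Close the gripper in small increments until the tactile force
    reading reaches target_force, then stop and hold.

    read_force       -- callable returning the current contact force (N)
    set_gripper_width -- callable commanding a gripper opening (m)
    All names and defaults are placeholders for the real robot/sensor APIs.
    """
    width = start_width
    while width > min_width:
        if read_force() >= target_force:
            return width  # target contact force reached: hold here
        width -= step
        set_gripper_width(width)
    return width  # fully closed without reaching the threshold
```

Because the loop checks the live force reading before every small closing step, the gripper stops as soon as contact force crosses the threshold rather than executing a blind binary close.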
Above are two evaluation runs showing the full pipeline end-to-end: the robot approaches the egg, descends to the grasp position, and closes with a force-controlled grip. Each camera view is shown separately, with the live force reading overlaid in blue. The consistency across runs shows that the learned policy transfers reliably from the human demonstrations.
Figure 1. Overview of the Hand2Rob pipeline. Stereo camera footage is processed with MediaPipe hand tracking, CoTracker, and stereo triangulation to build the dataset. A policy trained on this data learns both the end-effector trajectory and the grasp force, and is deployed on the Franka robot with ReSkin tactile feedback for manipulating fragile objects.
Gripper without sensor
Gripper with ResKin tactile sensor
Special thanks to Miguel Pegues for helping me design custom gripper fingertips to house the ReSkin magnetometer-based tactile sensor. The left model shows the standard Franka gripper fingers, while the right model adds a recessed pocket that secures the ReSkin sensing pad flush against the contact surface.