Learning from direct human demonstrations
Teaching robots to grasp delicate objects by learning from human hand demonstrations with tactile feedback.
Hand2Rob teaches a Franka robot to grasp delicate objects by watching human hand demonstrations captured with a stereo camera pair. MediaPipe and CoTracker extract hand and object keypoints, which are triangulated into 3D trajectories the robot can follow directly. The catch is that spatial imitation alone isn’t enough for fragile things, so I integrated a ReSkin tactile sensor into custom gripper fingertips to give the robot a sense of force.
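To make the triangulation step concrete, here is a minimal sketch of linear (DLT) triangulation of one keypoint from the two calibrated camera views. The function name and interface are my own illustration, not the project's actual code; it assumes the standard 3x4 projection matrices from stereo calibration.

```python
import numpy as np

def triangulate_point(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two pixel observations.

    P1, P2: 3x4 camera projection matrices (intrinsics @ extrinsics).
    uv1, uv2: (u, v) pixel coordinates of the same keypoint in each view.
    Returns the 3D point in the world frame.
    """
    u1, v1 = uv1
    u2, v2 = uv2
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

Running this per keypoint per frame turns the two 2D track streams into the 3D trajectories the robot follows.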
This project builds on Point Policy and Feel The Force by Siddhant Haldar and Lerrel Pinto. Big thanks to them for the original work. I adapted and extended both systems with my own changes to get them running on the Franka Panda, including modifications to the trajectory execution, force control integration, data collection, and the custom gripper hardware.
Without force feedback
With ResKin force control
Without force feedback, the robot has no way to modulate its grip strength: it simply closes the gripper until the binary close command is fully executed. For fragile objects like eggs, this can crush them. Here the robot successfully reaches and grasps the egg along the learned trajectory, but the uncontrolled gripper force crushes it. This failure motivates closed-loop force control during the grasp phase.
With the ResKin sensor in the loop, the robot can feel how much force it is applying and stop closing once a target threshold is reached. The force controller reads the live tactile signal and adjusts the gripper incrementally, holding the egg securely without exceeding the pressure that would crack it. This demonstrates that combining learned spatial policies with real-time tactile feedback enables safe manipulation of objects that would otherwise be damaged.
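The grasp loop described above can be sketched as a simple incremental controller. The function below is a hedged illustration, not the project's actual controller: `read_force`, `set_gripper_width`, and all parameter values stand in for the real ReSkin and Franka gripper interfaces.

```python
def force_controlled_close(read_force, set_gripper_width,
                           start_width=0.08, target_force=1.5,
                           step=0.001, min_width=0.0):
    """Close the gripper in small increments until the tactile force
    reading reaches target_force, then stop and hold.

    read_force       -- callable returning the current contact force (N)
    set_gripper_width -- callable commanding a gripper opening (m)
    All names and defaults are placeholders for the real robot/sensor APIs.
    """
    width = start_width
    while width > min_width:
        if read_force() >= target_force:
            return width  # target contact force reached: hold here
        width -= step
        set_gripper_width(width)
    return width  # fully closed without reaching the threshold
```

Because the loop checks the live force reading before every small closing step, the gripper stops as soon as contact force crosses the threshold rather than executing a blind binary close.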
Above are two evaluation runs showing the full pipeline end-to-end: the robot approaches the egg, descends to the grasp position, and closes with a force-controlled grip. Each camera view is shown separately, with the live force reading overlaid in blue. The consistency across runs shows that the learned policy transfers reliably from the human demonstrations.
Figure 1. Overview of the Hand2Rob pipeline. Stereo camera footage is processed with MediaPipe hand tracking, CoTracker, and stereo triangulation to build the dataset. A policy trained on this data learns both the end-effector trajectory and the grasp force, and is deployed on the Franka robot with ReSkin tactile feedback for manipulating fragile objects.
Gripper without sensor
Gripper with ResKin tactile sensor
Special thanks to Miguel Pegues for helping me design custom gripper fingertips to house the ReSkin magnetometer-based tactile sensor. The left model shows the standard Franka gripper fingers, while the right model adds a recessed pocket that secures the ReSkin sensing pad flush against the contact surface.