FingerEye: Continuous and Unified Vision-Tactile Sensing for Dexterous Manipulation
Zhixuan Xu, Yichen Li, Xuanye Wu, Tianyu Qiu, Lin Shao
TLDR
FingerEye is a novel sensor providing continuous vision-tactile feedback for dexterous manipulation, enabling seamless pre-contact to post-contact perception.
Key contributions
- Introduces FingerEye, a compact sensor for continuous vision-tactile feedback throughout manipulation.
- Combines binocular RGB cameras for pre-contact vision with compliant ring deformation for tactile sensing.
- Develops a vision-tactile imitation learning policy using multiple FingerEye sensors and a digital twin.
- Achieves robust dexterous manipulation across diverse objects by fusing real and simulated data.
Why it matters
Existing tactile sensors, such as GelSight and its variants, provide feedback only after contact is established, which limits a robot's ability to precisely initiate contact. FingerEye addresses this by offering a continuous vision-tactile stream, letting a robot adapt its actions across the full interaction, from pre-contact approach through post-contact manipulation. This makes dexterous manipulation more robust and versatile across diverse objects and tasks.
Original Abstract
Dexterous robotic manipulation requires comprehensive perception across all phases of interaction: pre-contact, contact initiation, and post-contact. Such continuous feedback allows a robot to adapt its actions throughout interaction. However, many existing tactile sensors, such as GelSight and its variants, only provide feedback after contact is established, limiting a robot's ability to precisely initiate contact. We introduce FingerEye, a compact and cost-effective sensor that provides continuous vision-tactile feedback throughout the interaction process. FingerEye integrates binocular RGB cameras to provide close-range visual perception with implicit stereo depth. Upon contact, external forces and torques deform a compliant ring structure; these deformations are captured via marker-based pose estimation and serve as a proxy for contact wrench sensing. This design enables a perception stream that smoothly transitions from pre-contact visual cues to post-contact tactile feedback. Building on this sensing capability, we develop a vision-tactile imitation learning policy that fuses signals from multiple FingerEye sensors to learn dexterous manipulation behaviors from limited real-world data. We further develop a digital twin of our sensor and robot platform to improve policy generalization. By combining real demonstrations with visually augmented simulated observations for representation learning, the learned policies become more robust to object appearance variations. Together, these design aspects enable dexterous manipulation across diverse object properties and interaction regimes, including coin standing, chip picking, letter retrieving, and syringe manipulation. The hardware design, code, appendix, and videos are available on our project website: https://nus-lins-lab.github.io/FingerEyeWeb/
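The abstract describes the tactile principle: external forces and torques deform a compliant ring, the deformation is recovered by marker-based pose estimation, and that pose offset serves as a proxy for the contact wrench. A minimal way to realize such a proxy is a linear elastic model, wrench = K · deformation, where the 6-D deformation stacks the ring's translational and rotational offsets from its rest pose. The sketch below is an illustration of that idea only; the stiffness values, function name, and small-angle rotation representation are assumptions, not taken from the paper.

```python
import numpy as np

# Assumed diagonal stiffness of the compliant ring:
# first three entries in N/m (translation), last three in N*m/rad (rotation).
# These numbers are illustrative placeholders, not calibrated values.
K = np.diag([50.0, 50.0, 80.0, 2.0, 2.0, 1.5])

def wrench_from_deformation(dt, drot):
    """Map ring pose deformation to a 6-D contact-wrench estimate.

    dt   : (3,) translational offset of the ring from rest, in metres
    drot : (3,) rotational offset (small-angle axis-angle), in radians

    Returns [fx, fy, fz, tx, ty, tz] under a linear elastic model.
    """
    deformation = np.concatenate([dt, drot])
    return K @ deformation

# Example: a 1 mm push along z with a slight tilt about x.
w = wrench_from_deformation(np.array([0.0, 0.0, 1e-3]),
                            np.array([5e-3, 0.0, 0.0]))
# With the assumed stiffness, this yields fz = 0.08 N and tx = 0.01 N*m.
```

In practice the deformation-to-wrench mapping would be calibrated against a force/torque reference rather than assumed diagonal, but the linear form captures how a pose-estimation signal can stand in for wrench sensing.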