Yicheng Ma

Master's student in Machine Learning at Nanyang Technological University.
B.Eng. in Optoelectronic Information Engineering, Zhejiang University.

My research focuses on robot learning for manipulation, with a particular interest in data-efficient robot learning methods. Previously, I conducted research at the Grasp Lab at Zhejiang University, where I worked on robotic perception and grasping. I am currently exploring PhD opportunities in robot learning and manipulation and welcome inquiries via email.

Research Interests

Robot Learning, Robot Manipulation, Data-efficient Robot Learning Methods

Publications

(* equal contribution, † corresponding author)

FD-VLA
FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation
Ruiteng Zhao, Wenshuo Wang, Yicheng Ma, Xiaocong Li, Francis E.H. Tay, Marcelo H. Ang Jr. and Haiyue Zhu†
International Conference on Robotics and Automation (ICRA), accepted
Abstract
Force sensing is a crucial modality for Vision-Language-Action (VLA) frameworks, as it enables fine-grained perception and dexterous manipulation in contact-rich tasks. We present Force-Distilled VLA (FD-VLA), a novel framework that integrates force awareness into contact-rich manipulation without relying on physical force sensors. The core of our approach is a Force Distillation Module (FDM), which distills force by mapping a learnable query token, conditioned on visual observations and robot states, into a predicted force token aligned with the latent representation of actual force signals. During inference, this distilled force token is injected into the pretrained VLM, enabling force-aware reasoning while preserving the integrity of its vision-language semantics. This design provides two key benefits: first, it allows practical deployment across a wide range of robots that lack expensive or fragile force-torque sensors, thereby reducing hardware cost and complexity; second, the FDM introduces an additional force-vision-state fusion prior to the VLM, which improves cross-modal alignment and enhances perception-action robustness in contact-rich scenarios. Surprisingly, our physical experiments show that the distilled force token outperforms direct sensor force measurements as well as other baselines, which highlights the effectiveness of this force-distilled VLA approach.
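The Force Distillation Module described above can be illustrated with a minimal sketch: a learnable query token cross-attends over visual/state tokens to produce a predicted force token, which a distillation loss aligns with the latent of real force signals. This is an assumption-laden toy (single-head attention, L2 alignment loss, made-up dimensions), not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distill_force_token(query, context, Wq, Wk, Wv):
    # Single-head cross-attention: the learnable query attends over
    # visual/state tokens and outputs a predicted force token.
    q = query @ Wq                                   # (1, d)
    k = context @ Wk                                 # (T, d)
    v = context @ Wv                                 # (T, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (1, T)
    return attn @ v                                  # (1, d)

def distillation_loss(pred_token, force_latent):
    # Align the predicted token with the latent of actual force signals.
    return float(np.mean((pred_token - force_latent) ** 2))

rng = np.random.default_rng(0)
d = 8                                   # hypothetical token dimension
query = rng.normal(size=(1, d))         # learnable query token
context = rng.normal(size=(5, d))       # visual + robot-state tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
force_token = distill_force_token(query, context, Wq, Wk, Wv)
loss = distillation_loss(force_token, rng.normal(size=(1, d)))
```

At inference, only `force_token` would be injected into the VLM, so no physical force sensor is needed.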
Hybrid Gripper
Construction of Bin-picking System for Logistic Application: A Hybrid Robotic Gripper and Vision-based Grasp Planning
Zhian Su, Yicheng Ma, Haotian Guo, and Huixu Dong†
IEEE Robotics and Automation Letters (RA-L), accepted
Abstract
An autonomous bin-picking system that grasps varied, cluttered packages can significantly benefit logistics by reducing manual labor and streamlining processing. We propose a bin-picking system comprising a novel multi-mode hybrid gripper that combines suction and pinch grasping, together with a corresponding vision-based grasp planning strategy built on unseen object instance segmentation. In simulation, the system achieved a 71.4% success rate, compared with 53.9% for a suction-only gripper and 39.3% for the Hand-E gripper. Real-world experiments further validated its practicality in logistics scenarios.
Gaussian Spotlight
Gaussian Spotlight: Enhancing Visuomotor Policy Learning via Latent Spatial Keypoint Embedding
Mohan Liu*, Yicheng Ma*, Chang Su, Zhiyuan Yang, Shijun Yan, Pey Yuen Tao, and Haiyue Zhu†
IEEE Robotics and Automation Letters (RA-L)
Abstract
Generative visuomotor policies rely heavily on the conditioning representation to guide the synthesis of accurate and stable control sequences. Yet, standard visual encoders produce high-dimensional embeddings that often lose fine-grained spatial information while retaining substantial redundancy. To address this bottleneck, we propose Gaussian Spotlight, designed to construct a latent spatial keypoint embedding that serves as a more precise and manipulation-aware conditioning signal. Gaussian Spotlight first generates a state-conditioned anisotropic Gaussian Attention Field that selectively amplifies spatial regions critical for interaction. It then transforms these enhanced regions into implicit, latent keypoint embeddings via an attention-guided skip-layer aggregation pathway. Extensive experiments across diverse real-world manipulation tasks and simulation benchmarks demonstrate that Gaussian Spotlight consistently enhances policy performance.
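The two stages above can be sketched in a few lines: an anisotropic 2-D Gaussian field amplifies interaction-critical regions of a feature map, and a spatial softmax over the attended map yields a latent keypoint. This is a simplified illustration under assumed shapes (the mean, covariance, and aggregation here are placeholders, not the paper's learned, state-conditioned modules).

```python
import numpy as np

def gaussian_field(h, w, mu, cov):
    # Anisotropic 2-D Gaussian attention field over an h x w grid,
    # normalized so its peak value is 1.
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([ys, xs], axis=-1).reshape(-1, 2) - np.asarray(mu)
    inv = np.linalg.inv(cov)
    m = np.einsum('ni,ij,nj->n', pts, inv, pts)   # Mahalanobis distances
    field = np.exp(-0.5 * m).reshape(h, w)
    return field / field.max()

def spotlight_keypoint(feat, field):
    # Amplify features under the spotlight, then take a spatial softmax
    # expectation to obtain a soft (latent) keypoint location.
    att = (feat * field[None]).sum(axis=0)        # (H, W) attended map
    w = np.exp(att - att.max())
    w = w / w.sum()
    ys, xs = np.mgrid[0:field.shape[0], 0:field.shape[1]]
    return np.array([(w * ys).sum(), (w * xs).sum()])

field = gaussian_field(17, 17, mu=(8, 8), cov=4.0 * np.eye(2))
feat = np.ones((3, 17, 17))                       # dummy feature map
keypoint = spotlight_keypoint(feat, field)        # ~ (8, 8) by symmetry
```

In the paper the field is state-conditioned and the keypoints stay implicit in latent space; here the soft-argmax simply makes the idea concrete.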
3D-LOT Policy
3D-LOT Policy: Latent Optimal Transport Flow Matching for One-Step Action Generation
Yicheng Ma*, Mohan Liu*, Chang Su, Ruiteng Zhao, Zhiping Lin, and Haiyue Zhu†
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Abstract
Real-time efficiency is critical for visuomotor policy learning, as any delay in action generation can accumulate over sequential control steps. In this work, we introduce 3D-LOT Policy, a latent prototype-guided optimal transport flow-matching framework for effective single-step action generation. Our approach encodes 3D observations into a compact latent space that preserves task-relevant spatial information and induces prototype structures to serve as anchors for policy learning. Our experiments demonstrate that 3D-LOT achieves lower latency while maintaining or even surpassing baseline performance, offering a practical solution for fast and robust visuomotor policy learning.
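The core recipe, optimal-transport pairing of noise and actions, a linear flow-matching path, and single-Euler-step generation, can be sketched as below. This is a hedged toy (brute-force OT, identity-like velocity field, latent encoding omitted), not the actual 3D-LOT implementation.

```python
import numpy as np
from itertools import permutations

def ot_pairing(noise, actions):
    # Mini-batch optimal-transport assignment by brute force over
    # permutations; only viable for tiny batches (stand-in for a
    # Hungarian or Sinkhorn solver).
    n = len(noise)
    cost = lambda p: sum(np.sum((noise[i] - actions[p[i]]) ** 2)
                         for i in range(n))
    best = min(permutations(range(n)), key=cost)
    return actions[list(best)]

def fm_pair(x0, x1, t):
    # Linear path x_t = (1 - t) x0 + t x1; flow-matching target
    # velocity is the constant x1 - x0.
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    return xt, x1 - x0

def one_step_action(x0, velocity_fn):
    # Single Euler step from noise straight to the action; with a
    # well-trained (near-straight) field this replaces iterative sampling.
    return x0 + velocity_fn(x0, 0.0)

noise = np.array([[0.0], [10.0]])
actions = np.array([[9.0], [1.0]])
paired = ot_pairing(noise, actions)      # crossing pairs get swapped
xt, target_v = fm_pair(noise, paired, np.array([0.0, 1.0]))
```

The OT pairing straightens the noise-to-action transport, which is what makes a single integration step a reasonable approximation at inference time.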

Research Experience

A*STAR SIMTech ARM, Singapore — Research Intern Sep 2024 – Dec 2025
Grasp Lab, Zhejiang University — Graduation Project & Thesis Sep 2023 – Jun 2024