Jin Yang / 杨进

Ph.D. Candidate · Robotics · Multimodal Learning

Biography

I am a Ph.D. student in the Institute of Artificial Intelligence and Robotics (IAIR) at Xi'an Jiaotong University, supervised by Prof. Ping Wei. I received my M.S. degree from Huazhong University of Science and Technology, supervised by Assoc. Prof. Gang Peng.

My research interests focus on robotics, embodied intelligence, multimodal learning, and video understanding. I aspire to develop AI technologies that genuinely benefit society.

Feel free to contact me for discussion or collaboration.

Robotics · Embodied Intelligence · Multimodal Learning · Video Understanding

Research

I explore unmanned systems and embodied agents that perceive scenes, understand multimodal signals, and plan reliable actions under challenging real-world conditions.

Imaging

Reliable multimodal signal acquisition for RGB, event, depth, LiDAR, and robot state.

Perception

Discriminative representations for temporal grounding, manipulation, and scene understanding.

Understanding

Reasoning with multimodal large models, task-driven feedback, and embodied knowledge.

Planning

Robotic action planning, trajectory optimization, and real-world challenge systems.

Publications

Selected publications. This list may not be up to date; please see Google Scholar for my latest publications.

Differential Amplifier-Inspired AmpAttention for Multi-View Robotic Manipulation

Jin Yang, Ping Wei, Nanning Zheng

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026)

A Mechanically Decoupled Six-Axis Spherical Rolling Robot For Stable Propeller-Driven Rolling

Xijian Deng, Leqi Ding, Jiayi Chen, Ping Wei, Jin Yang

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026) · Project leader

Lightweight Communication for Collaborative Perception via Wavelet Feature Distillation

Erdemt Bao, Jin Yang

IEEE International Conference on Robotics and Automation (ICRA 2026) · Co-first author & Project leader

Hyperbolic Multiview Pretraining for Robotic Manipulation

Jin Yang, Ping Wei, Yixin Chen

2026

Learning Visual-Audio Dissonance for Moment Retrieval and Highlight Detection

Jin Yang, Ping Wei, Nanning Zheng

IEEE Transactions on Multimedia (TMM 2025)

Learning Unified Patterns of Multimodalities for Joint Moment Retrieval and Highlight Detection

Jin Yang, Ping Wei

Pattern Recognition (PR 2025)

Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection

Jin Yang, Ping Wei, Huan Li, Ziyang Ren

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Cross Time-Frequency Transformer for Temporal Action Localization

Jin Yang, Ping Wei, Nanning Zheng

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT 2024)

Perceptual Consistency-Driven Abstraction via Minimax Optimization for Joint Moment Retrieval and Highlight Detection

Jin Yang, Ping Wei, Huan Li, Nanning Zheng

2023

Gated Multi-Scale Transformer for Temporal Action Localization

Jin Yang, Ping Wei, Ziyang Ren, Nanning Zheng

IEEE Transactions on Multimedia (TMM 2023)

Patents

CN Patent 魏平, 杨进. 基于多模态统一表征的视频语言时序定位方法及系统. No. 2025102048215

Method and system for video-language temporal grounding based on unified multimodal representations

CN Patent 彭刚, 杨进. 一种基于深度强化学习的机械臂运动规划方法和系统. No. 2022105019028

Robotic arm motion planning method and system based on deep reinforcement learning

CN Patent 彭刚, 杨进, 黎莉, 尹智. 一种智能清洗机器人路径规划方法及系统. No. 2021104000462

Path planning method and system for an intelligent cleaning robot

Awards

2025-06

First Prize, CVPR 2025 RoboTwin Dual-Arm Collaboration Challenge

Real-World Track

2025-05

Second Prize, CVPR 2025 RoboTwin Dual-Arm Collaboration Challenge

Simulation Round 1

2025-06

Invited Speaker at ICCIR 2025

Oral presentation on video multimodal learning

Projects

Robotics

HyperMVP

Hyperbolic Multiview Pretraining for Generalizable Robotic Manipulation

Multiview pretraining for robotic manipulation with geometry-aware representation learning.

Robotics

RVAF

Multi-Task Robotic Manipulation

Multi-task robotic manipulation system for robust perception and action under embodied settings.

Video Understanding

WebUI-MR&HD

Moment Retrieval and Highlight Detection System

A web interface for video-language temporal grounding, moment retrieval, and highlight detection demos.

Planning

Deep Reinforcement Learning for Robot Planning

Dense reward and stage incentive mechanisms

Reinforcement learning systems for robotic trajectory planning and intelligent cleaning robot path planning.