I am a Ph.D. student at the Institute of Artificial Intelligence and Robotics (IAIR), Xi'an Jiaotong University, supervised by Prof. Ping Wei. I received my M.S. degree from Huazhong University of Science and Technology, where I was supervised by Assoc. Prof. Gang Peng.
I focus on multimodal learning and video understanding. My work uses multiple modalities (e.g., image, audio, and text) to understand long, untrimmed videos and to support downstream tasks such as video moment retrieval, highlight detection, and robot perception. Because the human brain is the most marvelous organ in the universe, I aim to develop multimodal learning from the perspective of human brain intelligence and cognitive science, with the goal of emulating the brain's cognitive reasoning process. Currently, I am investigating the zero-shot potential of multimodal large language models (MLLMs). I aspire to develop AI technologies that genuinely benefit society.
My research interests include: (1) Multimodal Learning, (2) Video Understanding, (3) Multimodal Large Language Models, (4) Robotics, and (5) Deep Reinforcement Learning.