Gongwei Chen

I am currently an associate professor at the School of Information Science and Technology, Harbin Institute of Technology (Shenzhen). Before that, I worked as a postdoc in Prof. Liqiang Nie’s group. I received my PhD degree from the Institute of Computing Technology, Chinese Academy of Sciences in 2023, supervised by Prof. Shuqiang Jiang, and my bachelor's degree from the University of Science and Technology Beijing in 2016. I have also collaborated closely with Prof. Rui Shao, Prof. Miao Zhang, and Prof. Xinhang Song.

My research interests cover the broad areas of multimodal learning, AI agents, efficient learning, and scene understanding. Recently, I have focused on:

Multimodal Large Language Models (MLLM)
MLLM-based Agent
MLLM-based Multimodal Retrieval
Dataset Distillation

news

Feb 21, 2026	Three papers about GUI Agent and Embodied Agent are accepted by CVPR 2026!
Oct 14, 2025	One paper about GUI Agent Evaluation is accepted by NeurIPS 2025!
Jul 06, 2025	One paper about MLLM-based Unified Multimodal Retrieval is accepted by ACM MM 2025!
Jun 26, 2025	Two papers about MLLM and GUI Agent (Highlight) are accepted by ICCV 2025!
May 16, 2025	One paper about GUI Agent is accepted by ACL main 2025!
Feb 11, 2025	One paper about GUI Agent Benchmark is accepted by ICLR 2025 as Spotlight!
Feb 05, 2025	Two papers about AI Agent and Dataset Distillation are accepted by CVPR 2025!

selected publications

Enhancing GUI Agent with Uncertainty-Aware Self-Trained Evaluator

Gongwei Chen, Lirong Jie, Lexiao Zou, Weili Guan, Miao Zhang, and Liqiang Nie

In Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

PDF Code

Less is More: Empowering GUI Agent with Context-Aware Simplification

Gongwei Chen, Xurui Zhou, Rui Shao, Yibo Lyu, Kaiwen Zhou, Shuai Wang, Wentao Li, Yinchuan Li, and 2 more authors

In International Conference on Computer Vision (ICCV), Highlight, 2025

PDF Code

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

Leyang Shen^*, Gongwei Chen^*, Rui Shao, Weili Guan, and Liqiang Nie

In Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024

PDF Code

LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

Gongwei Chen, Leyang Shen, Rui Shao, Xiang Deng, and Liqiang Nie

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

PDF Code