Publications

* denotes equal contribution, † denotes corresponding author.

2025

  1. Enhancing GUI Agent with Uncertainty-Aware Self-Trained Evaluator
    Gongwei Chen, Lirong Jie, Lexiao Zou, Weili Guan, Miao Zhang, and Liqiang Nie
    In Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
  2. PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning
    Yibo Lyu, Rui Shao, Gongwei Chen, Yijie Zhu, Weili Guan, and Liqiang Nie
    In ACM International Conference on Multimedia (ACM MM), 2025
  3. Less is More: Empowering GUI Agent with Context-Aware Simplification
    Gongwei Chen, Xurui Zhou, Rui Shao, Yibo Lyu, Kaiwen Zhou, Shuai Wang, Wentao Li, Yinchuan Li, and 2 more authors
    In International Conference on Computer Vision (ICCV), Highlight, 2025
  4. FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
    Renshan Zhang, Rui Shao, Gongwei Chen, Kaiwen Zhou, Weili Guan, and Liqiang Nie
    In International Conference on Computer Vision (ICCV), 2025
  5. GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
    Bin Xie, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Jie Liu, Min Zhang, and Liqiang Nie
    In Annual Meeting of the Association for Computational Linguistics (ACL), 2025
  6. Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
    Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Dongmei Jiang, and Liqiang Nie
    In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025
  7. Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation
    Yanda Chen*, Gongwei Chen*, Miao Zhang, Weili Guan, and Liqiang Nie
    In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025
  8. D2AF: A Dual-Driven Annotation and Filtering Framework for Visual Grounding
    Yichi Zhang, Gongwei Chen, Jun Zhu, and Jia Wan
    arXiv preprint, 2025
  9. Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills
    Yuquan Xie, Zaijing Li, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Dongmei Jiang, and Liqiang Nie
    arXiv preprint, 2025
  10. Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts
    Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Weili Guan, Dongmei Jiang, and Liqiang Nie
    arXiv preprint, 2025
  11. Spa-bench: A comprehensive benchmark for smartphone agent evaluation
    Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao Yang, Gongwei Chen, Zhihao Wu, Li Yixing, Xurui Zhou, and 3 more authors
    In The Thirteenth International Conference on Learning Representations (ICLR), Spotlight (5.1%), 2025

2024

  1. Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
    Qi Lv, Xiang Deng, Gongwei Chen, Michael Yu Wang, and Liqiang Nie
    In Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
  2. MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
    Leyang Shen*, Gongwei Chen*, Rui Shao, Weili Guan, and Liqiang Nie
    In Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
  3. Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
    Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Dongmei Jiang, and Liqiang Nie
    In Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
  4. LION: Empowering multimodal large language model with dual-level visual knowledge
    Gongwei Chen, Leyang Shen, Rui Shao, Xiang Deng, and Liqiang Nie
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  5. Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding
    Renshan Zhang, Yibo Lyu, Rui Shao, Gongwei Chen, Weili Guan, and Liqiang Nie
    arXiv preprint, 2024
  6. Enhancing the emotional generation capability of large language models via emotional chain-of-thought
    Zaijing Li, Gongwei Chen, Rui Shao, Dongmei Jiang, and Liqiang Nie
    arXiv preprint, 2024
  7. Object-to-Manipulation Graph for Affordance Navigation
    Xinhang Song, Bohan Wang, Liye Dong, Gongwei Chen, Xinyun Hu, and Shuqiang Jiang
    CAAI Artificial Intelligence Research, 2024

2023

  1. Composite Object Relation Modeling for Few-Shot Scene Recognition
    Xinhang Song, Chenlong Liu, Haitao Zeng, Yaohui Zhu, Gongwei Chen, Xiaorong Qin, and Shuqiang Jiang
    IEEE Transactions on Image Processing, 2023

2021

  1. See More for Scene: Pairwise Consistency Learning for Scene Classification
    Gongwei Chen, Xinhang Song, Bohan Wang, and Shuqiang Jiang
    Advances in Neural Information Processing Systems, 2021

2020

  1. Scene recognition with prototype-agnostic scene layout
    Gongwei Chen, Xinhang Song, Haitao Zeng, and Shuqiang Jiang
    IEEE Transactions on Image Processing, 2020
  2. Amorphous Region Context Modeling for Scene Recognition
    Haitao Zeng, Xinhang Song, Gongwei Chen, and Shuqiang Jiang
    IEEE Transactions on Multimedia, 2020

2019

  1. MUCH: Mutual Coupling Enhancement of Scene Recognition and Dense Captioning
    Xinhang Song, Bohan Wang, Gongwei Chen, and Shuqiang Jiang
    In Proceedings of the 27th ACM International Conference on Multimedia, 2019
  2. Deep patch representations with shared codebook for scene classification
    Shuqiang Jiang, Gongwei Chen, Xinhang Song, and Linhu Liu
    ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2019
  3. Scene Recognition with Comprehensive Regions Graph Modeling
    Haitao Zeng and Gongwei Chen
    In International Conference on Image and Graphics, 2019
  4. Image representations with spatial object-to-object relations for RGB-D scene recognition
    Xinhang Song, Shuqiang Jiang, Bohan Wang, Chengpeng Chen, and Gongwei Chen
    IEEE Transactions on Image Processing, 2019
  5. Learning scene attribute for scene recognition
    Haitao Zeng, Xinhang Song, Gongwei Chen, and Shuqiang Jiang
    IEEE Transactions on Multimedia, 2019