Publications
\* denotes equal contribution; † denotes corresponding author.
2025
- D2AF: A Dual-Driven Annotation and Filtering Framework for Visual Grounding. arXiv preprint, 2025
- Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills. arXiv preprint, 2025
- Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts. arXiv preprint, 2025
2024
- Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding. arXiv preprint, 2024
- Enhancing the emotional generation capability of large language models via emotional chain-of-thought. arXiv preprint, 2024
- Object-to-Manipulation Graph for Affordance Navigation. CAAI Artificial Intelligence Research, 2024
2023
- Composite Object Relation Modeling for Few-Shot Scene Recognition. IEEE Transactions on Image Processing, 2023
2021
- See More for Scene: Pairwise Consistency Learning for Scene Classification. Advances in Neural Information Processing Systems, 2021
2020
- Scene recognition with prototype-agnostic scene layout. IEEE Transactions on Image Processing, 2020
- Amorphous Region Context Modeling for Scene Recognition. IEEE Transactions on Multimedia, 2020
2019
- MUCH: Mutual Coupling Enhancement of Scene Recognition and Dense Captioning. In Proceedings of the 27th ACM International Conference on Multimedia, 2019
- Deep patch representations with shared codebook for scene classification. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2019
- Scene Recognition with Comprehensive Regions Graph Modeling. In International Conference on Image and Graphics, 2019
- Image representations with spatial object-to-object relations for RGB-D scene recognition. IEEE Transactions on Image Processing, 2019
- Learning scene attribute for scene recognition. IEEE Transactions on Multimedia, 2019