About Me
Hi there! I am a research scientist at Salesforce AI Research, directed by Silvio Savarese and Caiming Xiong.
Previously, I completed my PhD at UC San Diego, working with Julian McAuley. My research interests are in vision & language, with a current focus on building and understanding scalable vision-language models such as multimodal LLMs and video diffusion models.
Selected Research
Multimodal LLMs
BLIP-3: A Family of Open Large Multimodal Models
• Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, many others, Juan Carlos Niebles, Caiming Xiong, Ran Xu
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
• Jieyu Zhang, Le Xue, Linxin Song, Jun Wang, Weikai Huang, Manli Shu, An Yan, Zixian Ma, Juan Carlos Niebles, Caiming Xiong, Zeyuan Chen, Ranjay Krishna, Ran Xu
Trust but Verify: Programmatic VLM Evaluation in the Wild
• Viraj Prabhu, Senthil Purushwalkam, An Yan, Caiming Xiong, Ran Xu
• International Conference on Computer Vision 2025 (ICCV 2025)
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
• An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang
• Conference on Language Modeling (COLM 2024)
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
• An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang
Visual Understanding & Interpretability
Learning Concise and Descriptive Attributes for Visual Recognition
• An Yan, Yu Wang, Yiwu Zhong, Chengyu Dong, Zexue He, Yujie Lu, William Wang, Jingbo Shang, Julian McAuley
• International Conference on Computer Vision 2023 (ICCV 2023)
Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models
• An Yan, Yu Wang, Petros Karypis, Zexue He, Amilcare Gentili, Chun-Nan Hsu, Julian McAuley
• Conference on Neural Information Processing Systems, Medical Imaging workshop (NeurIPS 2023)
Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving
• Jessica Echterhoff, An Yan, Kyungtae Han, Amr Abdelraouf, Rohit Gupta, Julian McAuley
• Winter Conference on Applications of Computer Vision (WACV 2024)
PA3D: Pose-Action 3D Machine for Video Recognition
• An Yan, Yali Wang, Zhifeng Li, Yu Qiao
• IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Language Models & Text Generation
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law
• Zhiyu Zoey Chen, Jing Ma, Xinlu Zhang, Nan Hao, An Yan, Armineh Nourbakhsh, Xianjun Yang, Julian McAuley, Linda Petzold, William Yang Wang
• Transactions on Machine Learning Research (TMLR 2024)
MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation
• Zexue He, Yu Wang, An Yan, Yao Liu, Eric Y Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
• Empirical Methods in Natural Language Processing (EMNLP 2023)
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
• Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
• European Chapter of the Association for Computational Linguistics (EACL 2023)
RadBERT: Adapting Language Models to Radiology
• An Yan, Chun-Nan Hsu, Amilcare Gentili, Julian McAuley
• Radiology: Artificial Intelligence (RSNA Journal 2022)
Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation
• An Yan, Zexue He, Xing Lu, Jiang Du, Eric Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
• Empirical Methods in Natural Language Processing (EMNLP 2021)
Personalization & Recommendation
Bridging Language and Items for Retrieval and Recommendation
• Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, Julian McAuley
Personalized Showcases: Generating Multi-Modal Explanations for Recommendations
• An Yan, Zhankui He, Jiacheng Li, Tianyang Zhang, Julian McAuley
• The International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)
Personalized Complementary Product Recommendation
• An Yan, Yan Gao, Chaosheng Dong, Jinmiao Fu, Tong Zhao, Yi Sun, Julian McAuley
• The ACM Web Conference (WWW 2022)
2D Convolutional Neural Networks for Sequential Recommendation
• An Yan, Shuo Cheng, Wang-Cheng Kang, Mengting Wan, Julian McAuley
• ACM International Conference on Information and Knowledge Management (CIKM 2019)
Work Experience
Research Intern at Microsoft, Redmond, WA.
Hosts: Zhengyuan Yang, Jianwei Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Zicheng Liu, Lijuan Wang.
GPT-4V as Agents. Data recipe and training of Multimodal LLMs.
Sep 2023 - Mar 2024.
Research Intern at Adobe, San Jose, CA.
Hosts: Raghav Addanki, David Arbour, Zhao Song, Tong Yu.
Gradient-based constrained sampling from LMs.
Jun 2023 - Sep 2023.
Research Intern at Meta, Menlo Park, CA.
Hosts: Cem Akkaya, Licheng Yu, Charlie Zhu, Yang Bai.
Multi-modal pre-training for ads understanding and generation.
Jun 2022 - Sep 2022.
Applied Scientist Intern at Amazon, Seattle, WA.
Hosts: Chaosheng Dong, Yan Gao, Jinmiao Fu, Tong Zhao.
Personalized complementary recommendation. The resulting paper was among the top 10 most viewed publications of 2022 at Amazon Science.
Jun 2021 - Sep 2021.
Applied Scientist Intern at Amazon, Santa Barbara, CA.
Hosts: Craig Bennett, Nic Jedema.
QA quality evaluation with BERT.
Jun 2020 - Sep 2020.
Education
University of California San Diego
Ph.D. & M.S. in Computer Science
Sep 2018 - Mar 2024.
University of Science and Technology of China
B.E. in Electronic Engineering & Information Science
Sep 2014 - Jun 2018.