About Me

Hi there! I am a research scientist at Salesforce AI Research.

Previously, I completed my PhD at UC San Diego, working with Prof. Julian McAuley. My research interests are in vision & language, with a current focus on building and understanding scalable vision-language models.

Selected Research:

Multimodal LLMs & Data Recipe

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
• Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, many others, Juan Carlos Niebles, Caiming Xiong, Ran Xu

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
• Jieyu Zhang, Le Xue, Linxin Song, Jun Wang, Weikai Huang, Manli Shu, An Yan, Zixian Ma, Juan Carlos Niebles, Caiming Xiong, Zeyuan Chen, Ranjay Krishna, Ran Xu

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
• An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang
• Conference on Language Modeling (COLM 2024)

Visual Understanding & Text Generation

Learning Concise and Descriptive Attributes for Visual Recognition
• An Yan, Yu Wang, Yiwu Zhong, Chengyu Dong, Zexue He, Yujie Lu, William Wang, Jingbo Shang, Julian McAuley
• International Conference on Computer Vision 2023 (ICCV 2023)

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
• Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
• European Chapter of the Association for Computational Linguistics (EACL 2023)

PA3D: Pose-Action 3D Machine for Video Recognition
• An Yan, Yali Wang, Zhifeng Li, Yu Qiao
• IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)

Multimodal Agents & Systems

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
• An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang

Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving
• Jessica Echterhoff, An Yan, Kyungtae Han, Amr Abdelraouf, Rohit Gupta, Julian McAuley
• Winter Conference on Applications of Computer Vision (WACV 2024)

Evaluation & Analysis

Trust but Verify: Programmatic VLM Evaluation in the Wild
• Viraj Prabhu, Senthil Purushwalkam, An Yan, Caiming Xiong, Ran Xu

A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law
• Zhiyu Zoey Chen, Jing Ma, Xinlu Zhang, Nan Hao, An Yan, Armineh Nourbakhsh, Xianjun Yang, Julian McAuley, Linda Petzold, William Yang Wang
• Transactions on Machine Learning Research (TMLR 2024)

MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation
• Zexue He, Yu Wang, An Yan, Yao Liu, Eric Y Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
• Empirical Methods in Natural Language Processing (EMNLP 2023)

Personalization & Recommendation

Bridging Language and Items for Retrieval and Recommendation
• Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, Julian McAuley

Personalized Showcases: Generating Multi-Modal Explanations for Recommendations
• An Yan, Zhankui He, Jiacheng Li, Tianyang Zhang, Julian McAuley
• The International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)

Personalized Complementary Product Recommendation
• An Yan, Yan Gao, Chaosheng Dong, Jinmiao Fu, Tong Zhao, Yi Sun, Julian McAuley
• The ACM Web Conference (WWW 2022)

2D Convolutional Neural Networks for Sequential Recommendation
• An Yan, Shuo Cheng, Wang-Cheng Kang, Mengting Wan, Julian McAuley
• ACM International Conference on Information and Knowledge Management (CIKM 2019)

Machine Learning for Healthcare

Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models
• An Yan, Yu Wang, Petros Karypis, Zexue He, Amilcare Gentili, Chun-Nan Hsu, Julian McAuley
• Conference on Neural Information Processing Systems, Medical Imaging workshop (NeurIPS 2023)

RadBERT: Adapting Language Models to Radiology
• An Yan, Chun-Nan Hsu, Amilcare Gentili, Julian McAuley
• Radiology: Artificial Intelligence (RSNA Journal 2022)

Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation
• An Yan, Zexue He, Xing Lu, Jiang Du, Eric Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
• Empirical Methods in Natural Language Processing (EMNLP 2021)

Work Experience

Research Intern at Microsoft, Redmond, WA.
Hosts: Zhengyuan Yang, Jianwei Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Zicheng Liu, Lijuan Wang.
GPT-4V as Agents. Data recipe and training of Multimodal LLMs.
Sep 2023 - Mar 2024.

Research Intern at Adobe, San Jose, CA.
Hosts: Raghav Addanki, David Arbour, Zhao Song, Tong Yu.
Gradient-based constrained sampling from LMs.
Jun 2023 - Sep 2023.

Research Intern at Meta, Menlo Park, CA.
Hosts: Cem Akkaya, Licheng Yu, Charlie Zhu, Yang Bai.
Multi-modal pre-training for ads understanding and generation.
Jun 2022 - Sep 2022.

Applied Scientist Intern at Amazon, Seattle, WA.
Hosts: Chaosheng Dong, Yan Gao, Jinmiao Fu, Tong Zhao.
Personalized complementary recommendation. The resulting paper was among the top 10 most viewed publications of 2022 at Amazon Science.
Jun 2021 - Sep 2021.

Applied Scientist Intern at Amazon, Santa Barbara, CA.
Hosts: Craig Bennett, Nic Jedema.
QA quality evaluation with BERT.
Jun 2020 - Sep 2020.

Education

University of California San Diego
Ph.D. & M.S. in Computer Science
Sep 2018 - Mar 2024.

University of Science and Technology of China
B.E. in Electronic Engineering & Information Science
Sep 2014 - Jun 2018.