About Me
Hi there! I am a research scientist at Salesforce AI Research.
Previously, I completed my PhD at UC San Diego, working with Prof. Julian McAuley. My research interests are in vision & language, with a current focus on building and understanding scalable vision-language models.
Selected Research:
Multimodal LLMs & Data Recipe
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
• Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, many others, Juan Carlos Niebles, Caiming Xiong, Ran Xu
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
• Jieyu Zhang, Le Xue, Linxin Song, Jun Wang, Weikai Huang, Manli Shu, An Yan, Zixian Ma, Juan Carlos Niebles, Caiming Xiong, Zeyuan Chen, Ranjay Krishna, Ran Xu
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
• An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang
• Conference on Language Modeling (COLM 2024)
Visual Understanding & Text Generation
Learning Concise and Descriptive Attributes for Visual Recognition
• An Yan, Yu Wang, Yiwu Zhong, Chengyu Dong, Zexue He, Yujie Lu, William Wang, Jingbo Shang, Julian McAuley
• International Conference on Computer Vision 2023 (ICCV 2023)
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
• Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
• European Chapter of the Association for Computational Linguistics (EACL 2023)
PA3D: Pose-Action 3D Machine for Video Recognition
• An Yan, Yali Wang, Zhifeng Li, Yu Qiao
• IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Multimodal Agents & Systems
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
• An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang
Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving
• Jessica Echterhoff, An Yan, Kyungtae Han, Amr Abdelraouf, Rohit Gupta, Julian McAuley
• Winter Conference on Applications of Computer Vision (WACV 2024)
Evaluation & Analysis
Trust but Verify: Programmatic VLM Evaluation in the Wild
• Viraj Prabhu, Senthil Purushwalkam, An Yan, Caiming Xiong, Ran Xu
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law
• Zhiyu Zoey Chen, Jing Ma, Xinlu Zhang, Nan Hao, An Yan, Armineh Nourbakhsh, Xianjun Yang, Julian McAuley, Linda Petzold, William Yang Wang
• Transactions on Machine Learning Research (TMLR 2024)
MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation
• Zexue He, Yu Wang, An Yan, Yao Liu, Eric Y Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
• Empirical Methods in Natural Language Processing (EMNLP 2023)
Personalization & Recommendation
Bridging Language and Items for Retrieval and Recommendation
• Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, Julian McAuley
Personalized Showcases: Generating Multi-Modal Explanations for Recommendations
• An Yan, Zhankui He, Jiacheng Li, Tianyang Zhang, Julian McAuley
• The International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)
Personalized Complementary Product Recommendation
• An Yan, Yan Gao, Chaosheng Dong, Jinmiao Fu, Tong Zhao, Yi Sun, Julian McAuley
• The ACM Web Conference (WWW 2022)
2D Convolutional Neural Networks for Sequential Recommendation
• An Yan, Shuo Cheng, Wang-Cheng Kang, Mengting Wan, Julian McAuley
• ACM International Conference on Information and Knowledge Management (CIKM 2019)
Machine Learning for Healthcare
Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models
• An Yan, Yu Wang, Petros Karypis, Zexue He, Amilcare Gentili, Chun-Nan Hsu, Julian McAuley
• Conference on Neural Information Processing Systems, Medical Imaging workshop (NeurIPS 2023)
RadBERT: Adapting Language Models to Radiology
• An Yan, Chun-Nan Hsu, Amilcare Gentili, Julian McAuley
• Radiology: Artificial Intelligence (RSNA Journal 2022)
Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation
• An Yan, Zexue He, Xing Lu, Jiang Du, Eric Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
• Empirical Methods in Natural Language Processing (EMNLP 2021)
Work Experience
Research Intern at Microsoft, Redmond, WA.
Hosts: Zhengyuan Yang, Jianwei Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Zicheng Liu, Lijuan Wang.
GPT-4V as Agents. Data recipe and training of Multimodal LLMs.
Sep 2023 - Mar 2024.
Research Intern at Adobe, San Jose, CA.
Hosts: Raghav Addanki, David Arbour, Zhao Song, Tong Yu.
Gradient-based constrained sampling from LMs.
Jun 2023 - Sep 2023.
Research Intern at Meta, Menlo Park, CA.
Hosts: Cem Akkaya, Licheng Yu, Charlie Zhu, Yang Bai.
Multi-modal pre-training for ads understanding and generation.
Jun 2022 - Sep 2022.
Applied Scientist Intern at Amazon, Seattle, WA.
Hosts: Chaosheng Dong, Yan Gao, Jinmiao Fu, Tong Zhao.
Personalized complementary recommendation. Among the top 10 most viewed publications of 2022 at Amazon Science.
Jun 2021 - Sep 2021.
Applied Scientist Intern at Amazon, Santa Barbara, CA.
Hosts: Craig Bennett, Nic Jedema.
QA quality evaluation with BERT.
Jun 2020 - Sep 2020.
Education
University of California San Diego
Ph.D. & M.S. in Computer Science
Sep 2018 - Mar 2024.
University of Science and Technology of China
B.E. in Electronic Engineering & Information Science
Sep 2014 - Jun 2018.