Gaurav Verma

Senior Applied Scientist at Microsoft | Georgia Tech CS PhD | Research in Multimodal GenAI and LLMs


[ 📄 curriculum vitae (pdf) ]


Updates
April, 2025 | I defended my thesis!
Feb, 2025 | Honored to be serving as an Area Chair for ACL ARR (Multimodality and Language Grounding track) this year!
Dec, 2024 | Check out AdaptAgent: our new work on efficiently adapting multimodal agents to unseen tasks and domains. [arXiv] [NeurIPS AFM Poster]
Oct, 2024 | Check out work led by Mohit Chandra on evaluating how well LLMs align with expert responses in psychiatric healthcare; to appear in NAACL 2025. [arXiv]
July, 2024 | Work led by Sejoon Oh on LLM-based rewriting for evaluating the robustness of leading recommender systems will appear in CIKM 2024. [paper] [code]
May, 2024 | Two papers will appear at ACL 2024 (Main), covering work on investigating the role of cross-modal projection in multimodal LLMs [web] and developing community-centric AI approaches for advancing online safety [web]. One paper on benchmarking of multimodal LLMs [preprint], led by Yiqiao Jin, will appear in the Findings of ACL 2024.
Feb, 2024 | Policy + AI work on using LLMs for large-scale analysis of societal impacts of AI innovation is published in Quantitative Science Studies (QSS)! [pdf, code]
Jan, 2024 | Work on cross-lingual evaluation of LLMs for healthcare queries, co-led by Yiqiao Jin and Mohit Chandra, will appear at WebConf 2024! [pdf, webpage, code]
October, 2023 | Work on modeling text visualness using large vision-language models will appear at EMNLP 2023! [pdf, webpage]
July, 2023 | Honored to be awarded the JPMorgan Chase AI Research Fellowship (2023)! [link]
May, 2023 | Two papers accepted at ACL 2023 (Main and Findings), covering robustness of multimodal learning [pdf] and adversarial robustness of few-shot learning in NLP [pdf]!
Dec, 2022 | Glad to be one of the recipients of the 2022 Snap Research Fellowship! [link]

I am a Senior Applied Scientist at Microsoft, where I work on understanding and steering the impact of Generative AI on Productivity. I am a part of the Applied Science group in Microsoft IDEAS.

I received my Ph.D. in Computer Science from Georgia Tech, where I developed robust, efficient, and adaptable multimodal AI models. My work has been recognized by leading media outlets and multiple doctoral fellowships and grants, and has resulted in top-tier academic publications as well as 10+ patents with the USPTO that have influenced industry products (TechCrunch). [ 📄 CV (pdf)].

Previously, I was a researcher at Adobe Research (India), where I worked on applications of AI/ML to multimodal content generation (like this Sneak @ Adobe Summit). I completed my undergraduate studies at IIT Kanpur. During my doctoral studies, I was a Research Intern at JPMorgan AI Research, Microsoft Research, and Adobe Research.


Recent awards and honors
JP Morgan AI Research Ph.D. Fellow; GT news (2023)
Snap Research Fellow (2022)
College of Computing Rising Star Doctoral Student Research Award (2022)
Adobe Research Ph.D. Fellowship Finalist (2022)
Work covered in TechCrunch, Forbes, Scientific American, The World, ...
AAAI ICWSM-2021 Best Reviewer Award


Selected recent papers (Complete list on Google Scholar)
A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models. Gaurav Verma, Jiawei Zhou, Mohit Chandra, Srijan Kumar, Munmun De Choudhury. arXiv preprint 2504.02793; under review. [pdf]

AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations. Gaurav Verma, Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Tucker Balch, Manuela Veloso. arXiv preprint 2411.13451; under review. [pdf] [NeurIPS 2024 Workshop on Adaptive Foundation Models Poster]

Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space. Gaurav Verma, Minje Choi, Kartik Sharma, Jamelle Watson-Daniels, Sejoon Oh, Srijan Kumar. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). [pdf] [code] [webpage]

Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning. Shivaen Ramshetty*, Gaurav Verma*, Srijan Kumar. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). [pdf] [code]

Learning the Visualness of Text Using Large Vision-Language Models. Gaurav Verma, Ryan A. Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). [pdf] [webpage]


Service
Area Chair: ACL ARR (Multimodality and Language Grounding)
Journal Reviewing: ACM Transactions on Computer-Human Interaction (TOCHI), IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Journal of Medical Internet Research (JMIR), Data Mining & Knowledge Discovery
Conference Reviewing (most iterations between 2021 and 2024): AAAI, ACL, EMNLP, ACL ARR, NeurIPS (Ethics reviewer), FAccT, CHI (💐 2023: Special Recognition for Outstanding Reviews), COLM, KDD, TheWebConf, ICWSM (💐 2021: Best Reviewer Award)