Gaurav Verma

Senior Applied Scientist at Microsoft | Georgia Tech CS PhD | Research in Multimodal GenAI and LLMs


[ 📄 curriculum vitae (pdf) ]


Updates
April, 2025 | I defended my thesis!
Feb, 2025 | Honored to be serving as an Area Chair for ACL ARR (Multimodality and Language Grounding track) this year!
Dec, 2024 | Check out AdaptAgent: our new work on efficiently adapting multimodal agents to unseen tasks and domains. [arXiv] [NeurIPS AFM Poster]
Oct, 2024 | Check out work led by Mohit Chandra on evaluating how well LLMs align with expert responses in psychiatric healthcare; to appear in NAACL 2025. [arXiv]
July, 2024 | Work led by Sejoon Oh on LLM-based rewriting for evaluating the robustness of leading recommender systems will appear in CIKM 2024. [paper] [code]
May, 2024 | Two papers will appear at ACL 2024 (Main), covering work on investigating the role of cross-modal projection in multimodal LLMs [web] and developing community-centric AI approaches for advancing online safety [web]. One paper on benchmarking of multimodal LLMs [preprint], led by Yiqiao Jin, will appear in the Findings of ACL 2024.
Feb, 2024 | Policy + AI work on using LLMs for large-scale analysis of societal impacts of AI innovation is published in Quantitative Science Studies (QSS)! [pdf, code]
Jan, 2024 | Work on cross-lingual evaluation of LLMs for healthcare queries, co-led by Yiqiao Jin and Mohit Chandra, will appear at WebConf 2024! [pdf, webpage, code]
October, 2023 | Work on modeling text visualness using large vision-language models will appear at EMNLP 2023! [pdf, webpage]
July, 2023 | Honored to be awarded the JPMorgan Chase AI Research Fellowship (2023)! [link]
May, 2023 | Two papers accepted at ACL 2023 (Main and Findings), covering robustness of multimodal learning [pdf] and adversarial robustness of few-shot learning in NLP [pdf]!
Dec, 2022 | Glad to be one of the recipients of 2022 Snap Research Fellowship! [link]

I am a Senior Applied Scientist at Microsoft, where I work on understanding and steering the impact of Generative AI on Productivity. I am a part of the IDEAS Research group led by Scott Counts.

I received my Ph.D. in Computer Science from Georgia Tech, where I developed robust, efficient, and adaptable multimodal AI models. My work has been recognized by leading media outlets and multiple doctoral fellowships and grants, and has resulted in top-tier academic publications as well as 10+ patents that have influenced industry products (TechCrunch). [ 📄 CV (pdf)].

Previously, I was a researcher at Adobe Research (India), where I worked on applications of AI/ML to multimodal content generation (like this Sneak @ Adobe Summit). I completed my undergraduate studies at IIT Kanpur. During my doctoral studies, I was a Research Intern at JPMorgan AI Research, Microsoft Research, and Adobe Research.


Recent awards and honors
JP Morgan AI Research Ph.D. Fellow; GT news (2023)
Snap Research Fellow (2022)
College of Computing Rising Star Doctoral Student Research Award (2022)
Adobe Research Ph.D. Fellowship Finalist (2022)
Work covered in TechCrunch, Forbes, Scientific American, The World, ...
AAAI ICWSM-2021 Best Reviewer Award


Selected recent papers (Complete list on Google Scholar)
A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models. Gaurav Verma, Jiawei Zhou, Mohit Chandra, Srijan Kumar, Munmun De Choudhury. arXiv preprint 2504.02793; under review. [pdf]

AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations. Gaurav Verma, Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Tucker Balch, Manuela Veloso. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). [pdf] [NeurIPS 2024 Workshop on Adaptive Foundation Models Poster]

Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space. Gaurav Verma, Minje Choi, Kartik Sharma, Jamelle Watson-Daniels, Sejoon Oh, Srijan Kumar. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). [pdf] [code] [webpage]

Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning. Shivaen Ramshetty*, Gaurav Verma*, Srijan Kumar. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). [pdf] [code]

Learning the Visualness of Text Using Large Vision-Language Models. Gaurav Verma, Ryan A. Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). [pdf] [webpage]


Service
Area Chair: ACL ARR (Multimodality and Language Grounding)
Journal Reviewing: ACM Transactions on Computer-Human Interaction (TOCHI), IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Journal of Medical Internet Research (JMIR), Data Mining & Knowledge Discovery
Conference Reviewing (most iterations between 2021-2024): AAAI, ACL, EMNLP, ACL ARR, NeurIPS (Ethics reviewer), FAccT, CHI (💐 2023: Special Recognition for Outstanding Reviews), COLM, KDD, TheWebConf, ICWSM (💐 2021: Best Reviewer Award)