I am currently a Senior Applied Scientist in Microsoft's IDEAS Research group. My research leverages one of the largest productivity data infrastructures in the world to build better AI agents. Specifically, I aim to (i) understand how AI tools are used in real productivity workflows, and (ii) turn those large-scale, data-driven insights into more capable agents and more intuitive human–AI interactions.

I received my Ph.D. in Computer Science from Georgia Tech, where I developed robust, efficient, and adaptable multimodal AI models. My work has been recognized by leading media outlets (including TechCrunch), multiple doctoral fellowships and grants, and best paper awards, and has led to academic publications and over a dozen patents that have influenced industry products.

Before my Ph.D., I was a researcher at Adobe Research (India), working on multimodal content generation, some of which was featured as a Sneak at Adobe Summit. I completed my undergraduate studies at IIT Kanpur, and during my doctoral studies I interned at JPMorgan AI Research, Microsoft Research, and Adobe Research.

Awards and Honors

Selected Recent Works (complete list on Google Scholar)

Microsoft New Future of Work Report 2025.

Jenna Butler, Sonia Jaffe, Rebecca Janssen, Nancy Baym, Jake Hofman, Brent Hecht, Sean Rintel, Bahar Sarrafzadeh, Abigail Sellen, Mihaela Vorvoreanu, and Jaime Teevan (editors), with additional authors.

Microsoft Research Tech Report MSR-TR-2025-58 (https://aka.ms/nfw2025), 2025.

A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models.

Gaurav Verma, Jiawei Zhou, Mohit Chandra, Srijan Kumar, Munmun De Choudhury.

In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES 2025).

AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations.

Gaurav Verma, Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Tucker Balch, Manuela Veloso.

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025).

Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space.

Gaurav Verma, Minje Choi, Kartik Sharma, Jamelle Watson-Daniels, Sejoon Oh, Srijan Kumar.

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024).

Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning.

Shivaen Ramshetty*, Gaurav Verma*, Srijan Kumar (*equal contribution).

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023).

Learning the Visualness of Text Using Large Vision-Language Models.

Gaurav Verma, Ryan A. Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova.

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).

Service

Area Chair
ACL ARR (Multimodality and Language Grounding)
Journal Reviewing
ACM Transactions on Computer-Human Interaction (TOCHI), IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Journal of Medical Internet Research (JMIR), Data Mining and Knowledge Discovery (DMKD)
Conference Reviewing
Most editions between 2021 and 2024: AAAI, ACL, EMNLP, ACL ARR, NeurIPS (ethics reviewer), FAccT, CHI (Special Recognition for Outstanding Reviews, 2023), COLM, KDD, TheWebConf, and ICWSM (Best Reviewer Award, 2021)