Senior Applied Scientist at Microsoft | Georgia Tech CS PhD | Research in Multimodal GenAI and LLMs
I am a Senior Applied Scientist at Microsoft, where I work on understanding and steering the impact of Generative AI on productivity. I am part of the IDEAS Research group led by Scott Counts.
I received my Ph.D. in Computer Science from Georgia Tech, where I developed robust, efficient, and adaptable multimodal AI models. My work has been recognized by leading media outlets and by multiple doctoral fellowships and grants, and has resulted in top-tier academic publications as well as 10+ patents that have influenced industry products (TechCrunch). [📄 CV (pdf)]
Previously, I was a researcher at Adobe Research (India), where I worked on applications of AI/ML to multimodal content generation (like this Sneak @ Adobe Summit). I completed my undergraduate studies at IIT Kanpur. During my doctoral studies, I was a Research Intern at JPMorgan AI Research, Microsoft Research, and Adobe Research.
Recent awards and honors
• JP Morgan AI Research Ph.D. Fellow; GT news (2023)
• Snap Research Fellow (2022)
• College of Computing Rising Star Doctoral Student Research Award (2022)
• Adobe Research Ph.D. Fellowship Finalist (2022)
• Work covered in TechCrunch, Forbes, Scientific American, The World, ...
• AAAI ICWSM-2021 Best Reviewer Award
Selected recent papers (Complete list on Google Scholar)
→ A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models. Gaurav Verma, Jiawei Zhou, Mohit Chandra, Srijan Kumar, Munmun De Choudhury. arXiv preprint 2504.02793; under review.
[pdf]
→ AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations. Gaurav Verma, Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Tucker Balch, Manuela Veloso. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025).
[pdf] [NeurIPS 2024 Workshop on Adaptive Foundation Models Poster]
→ Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space. Gaurav Verma, Minje Choi, Kartik Sharma, Jamelle Watson-Daniels, Sejoon Oh, Srijan Kumar. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024).
[pdf] [code] [webpage]
→ Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning. Shivaen Ramshetty*, Gaurav Verma*, Srijan Kumar. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023).
[pdf] [code]
→ Learning the Visualness of Text Using Large Vision-Language Models. Gaurav Verma, Ryan A. Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).
[pdf] [webpage]
Service
→ Area Chair: ACL ARR (Multimodality and Language Grounding)
→ Journal Reviewing: ACM Transactions on Computer-Human Interaction (TOCHI), IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Journal of Medical Internet Research (JMIR), Data Mining & Knowledge Discovery
→ Conference Reviewing (most iterations between 2021 and 2024): AAAI, ACL, EMNLP, ACL ARR, NeurIPS (Ethics reviewer), FAccT, CHI (💐 2023: Special Recognition for Outstanding Reviews), COLM, KDD, TheWebConf, ICWSM (💐 2021: Best Reviewer Award)