Senior Applied Scientist at Microsoft | Georgia Tech CS PhD | Research in Multimodal GenAI and LLMs
I am a Senior Applied Scientist at Microsoft, where I work on understanding and steering the impact of Generative AI on Productivity. I am a part of the Applied Science group in Microsoft IDEAS.
I received my Ph.D. in Computer Science from Georgia Tech, where I developed robust, efficient, and adaptable multimodal AI models. My work has been recognized by leading media outlets and by multiple doctoral fellowships and grants, and has resulted in top-tier academic publications as well as 10+ patents with the USPTO that have influenced industry products (TechCrunch). [📄 CV (pdf)].
Previously, I was a researcher at Adobe Research (India), where I worked on applications of AI/ML to multimodal content generation (like this Sneak @ Adobe Summit). I completed my undergraduate studies at IIT Kanpur. During my doctoral studies, I was a Research Intern at JPMorgan AI Research, Microsoft Research, and Adobe Research.
Recent awards and honors
• JPMorgan AI Research Ph.D. Fellow; GT news (2023)
• Snap Research Fellow (2022)
• College of Computing Rising Star Doctoral Student Research Award (2022)
• Adobe Research Ph.D. Fellowship Finalist (2022)
• Work covered in TechCrunch, Forbes, Scientific American, The World, ...
• AAAI ICWSM-2021 Best Reviewer Award
Selected recent papers (Complete list on Google Scholar)
→ A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models. Gaurav Verma, Jiawei Zhou, Mohit Chandra, Srijan Kumar, Munmun De Choudhury. arXiv preprint 2504.02793; under review.
[pdf]
→ AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations. Gaurav Verma, Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Tucker Balch, Manuela Veloso. arXiv preprint 2411.13451; under review.
[pdf] [NeurIPS 2024 Workshop on Adaptive Foundation Models Poster]
→ Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space. Gaurav Verma, Minje Choi, Kartik Sharma, Jamelle Watson-Daniels, Sejoon Oh, Srijan Kumar. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024).
[pdf] [code] [webpage]
→ Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning. Shivaen Ramshetty*, Gaurav Verma*, Srijan Kumar. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023).
[pdf] [code]
→ Learning the Visualness of Text Using Large Vision-Language Models. Gaurav Verma, Ryan A. Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).
[pdf] [webpage]
Service
→ Area Chair: ACL ARR (Multimodality and Language Grounding)
→ Journal Reviewing: ACM Transactions on Computer-Human Interaction (TOCHI), IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Journal of Medical Internet Research (JMIR), Data Mining & Knowledge Discovery
→ Conference Reviewing (most iterations between 2021-2024): AAAI, ACL, EMNLP, ACL ARR, NeurIPS (Ethics reviewer), FAccT, CHI (💐 2023: Special Recognition for Outstanding Reviews), COLM, KDD, TheWebConf, ICWSM (💐 2021: Best Reviewer Award)