Research

I am interested in enabling machines to learn from multiple modalities of data, such as text, audio, video, and semantics, as humans naturally do.

Multimodal Video Understanding
Exploring multimodal summarization, rationalization, and understanding of videos. The How2 dataset largely facilitates this work.
End-to-End and Multimodal Speech Recognition
Building robust direct acoustic-to-word models for audio-only and audio-visual data.

Allen Institute for AI (AI2)
Research Intern, Summer 2021
Interning with Ana Marasović on the AllenNLP team, working on Multimodal Rationalization. More soon!
Facebook AI
Research Intern, Summer 2020
Interning with the Speech and Audio team on multimodal ASR models.
Abridge AI
Research Intern, Summer 2019
Worked with Abridge – an NLP-based healthcare startup in Pittsburgh – on understanding Doctor-Patient dialogs and conversation summaries.
JSALT 2018
As part of the JHU summer workshop (JSALT), I worked with Prof. Lucia Specia and Prof. Raman Arora on the Grounded Sequence to Sequence Transduction Team, which worked on multiview learning, summarization, and speech recognition using the How2 data.
JSALT 2017
I worked with Prof. Emmanuel Dupoux and Prof. Odette Scharenborg as part of the Speaking Rosetta Team, which worked on multimodal speech recognition.

DARPA AIDA
Advised by Prof. Eduard Hovy and Prof. Florian Metze
Topic Modeling for Electronic Medical Records
Advised by Prof. Eric Xing
Class project for Probabilistic Graphical Models (10-708 Spring 2017)

During my undergrad, I was fortunate to work on computer vision problems with Dr. Hyunsung Park and Prof. Ramesh Raskar at the MIT Media Lab-mentored REDX Innovation Labs, on machine translation with Prof. Ganesh Ramakrishnan at IIT Bombay, and on recommender systems with Harshad Saykhedkar at a digital marketing startup, Sokrati Technologies.