I am interested in enabling machines to learn from multiple modalities of data, such as text, audio, video, and semantics, as humans naturally do.
End-to-End and Multimodal Speech Recognition
Building robust direct acoustic-to-word models for audio-only and audio-visual data
Multimodal Video Understanding
Exploring video summarization and understanding to identify differences and similarities between two similar videos. Generalizing abstractive summarization to open-domain videos using the How2 dataset.
Dialog Summarization for Doctor-Patient Conversations
Abstractive dialog summarization for medical conversations to generate Subjective, Objective, Assessment, and Plan (SOAP) notes.
As part of the JHU summer workshop (JSALT), I worked with Prof. Lucia Specia and Prof. Raman Arora on the Grounded Sequence-to-Sequence Transduction team, working on multiview learning, summarization, and speech recognition using the How2 dataset.
Topic Modeling for Electronic Medical Records
Advised by Prof. Eric Xing
Class project for Probabilistic Graphical Models (10-708 Spring 2017)
During my undergrad, I was fortunate to work on computer vision problems with Dr. Hyunsung Park and Prof. Ramesh Raskar at the MIT Media Lab-mentored REDX Innovation Labs, on machine translation with Prof. Ganesh Ramakrishnan at IIT Bombay, and on recommender systems with Harshad Saykhedkar at Sokrati Technologies, a startup.