Building systems that capture commonsense (e.g. why is someone behaving a certain way?) and understand the physical world (e.g. why does a rock make a better hammer than a screwdriver?) requires moving NLP beyond text and building models that integrate vision and robotics to construct rich multimodal linguistic representations. In this talk, I will present work on combining the state-of-the-art in NLP and Computer Vision to address commonsense reasoning, and introduce modeling advances for grounded language learning in robotics. Finally, I will lay out how I plan to grow RoboNLP into a core component of the NLP and broader AI communities. Central to my research agenda is demonstrating how NLP and linguistics are key to advancing multimodal research.
Yonatan Bisk is a Postdoc at the University of Washington working with Yejin Choi. He received his Ph.D. from the University of Illinois at Urbana-Champaign, advised by Julia Hockenmaier, where his thesis focused on unsupervised methods for structure induction. Those results motivated his shift toward helping build the new field of multimodal RoboNLP. He is an active member of the academic community (e.g. conference and workshop organization) and the broader CS community (e.g. ACM Future of Computing Academy initiatives on D&I, Education, and the Future of Work).