ICCV

YouRefIt: Embodied Reference Understanding with Language and Gesture

Abstract: We study the machine's understanding of embodied reference: One agent uses both language and gesture to refer to an object to another agent in a shared physical environment. Of note, this new visual task requires understanding multimodal cues with perspective-taking to identify which object is being referred to. To tackle this problem, we …

VLGrammar: Grounded Grammar Induction of Vision and Language

Abstract: Cognitive grammar suggests that the acquisition of language grammar is grounded within visual structures. While grammar is an essential representation of natural language, it also exists ubiquitously in vision to represent the hierarchical part-whole structure. In this work, we study grounded grammar induction of vision and language in a joint learning framework. Specifically, …