Abstract

Language-conditioned robotic policies allow users to specify tasks using natural language. While much research has focused on improving the action prediction of language-conditioned policies, reasoning about task descriptions has been largely overlooked. Ambiguous task descriptions often lead to downstream policy failures due to misinterpretation by the robotic agent. To address this challenge, we introduce AmbResVLM, a novel method that grounds language goals in the observed scene and explicitly reasons about task ambiguity. We extensively evaluate its effectiveness in both simulated and real-world domains, demonstrating superior task ambiguity detection and resolution compared to recent state-of-the-art methods. Finally, real robot experiments show that our model improves the performance of downstream robot policies, increasing the average success rate from 69.6% to 97.1%.

Code

For academic use, a PyTorch-based software implementation of this project is available in our GitHub repository and is released under the GPLv3 license. For commercial use, please contact the authors.

Publications

If you find our work useful, please consider citing our paper:

Eugenio Chisari, Jan Ole von Hartz, Fabien Despinoy, Abhinav Valada

Robotic Task Ambiguity Resolution via Natural Language Interaction
arXiv, 2025.
(PDF) (BibTeX)

Authors

Eugenio Chisari

University of Freiburg

Jan Ole von Hartz

University of Freiburg

Fabien Despinoy

Toyota Motor Europe

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by Toyota Motor Europe (TME) and by an academic grant from NVIDIA.