EMNLP 2025

November 06, 2025

Suzhou, China


Speakers of unwritten languages have the potential to benefit from speech-based automatic information retrieval systems. This paper proposes a speech embedding technique that facilitates such a system that can be used in a zero-shot manner on the target language. After conducting development experiments on several written Indic languages, we evaluate our method on a corpus of Gormati -- an unwritten language -- that was previously collected in partnership with an agrarian Banjara community in Maharashtra State, India, specifically for the purposes of information retrieval. Our system achieves a Top 5 retrieval rate of 87.9% on this data, giving hope that it may be usable by speakers of unwritten languages worldwide.

