EMNLP 2025

November 06, 2025

Suzhou, China

Large Language Models (LLMs) require high-quality preference datasets to align with human preferences. However, conventional methods for constructing such datasets face significant challenges: reliance on pre-collected instructions often leads to distribution mismatches with target models, while the need to sample multiple stochastic responses introduces substantial computational overhead. In this work, we explore a paradigm shift by leveraging the inherent regulation of LLMs' representation space for efficient and tailored preference dataset construction, a method we name Icon². Specifically, it first extracts layer-wise direction vectors to encode sophisticated human preferences and then uses these vectors to filter self-synthesized instructions based on their inherent consistency. During decoding, bidirectional inherent control is applied to steer token representations, enabling the precise generation of response pairs with clear alignment distinctions. Experimental results demonstrate significant improvements in both alignment and efficiency. Llama3-8B and Qwen2-7B achieve an average win-rate improvement of 13.89% on AlpacaEval 2.0 and 13.45% on Arena-Hard, while reducing computational costs by up to 48.1%.
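
The abstract describes two representation-level operations: extracting layer-wise preference direction vectors and using them both to filter self-synthesized instructions and to steer decoding in opposite directions. The paper's actual Icon² implementation is not reproduced here; the sketch below is only a rough illustration of the general activation-steering idea it alludes to, using toy tensors, and all names (`extract_direction`, `consistency_score`, `steer`, `alpha`) are hypothetical.

```python
# Minimal sketch (NOT the released Icon^2 code): generic activation-steering-style
# extraction of a "preference direction" from paired hidden states, a cosine-based
# consistency score for filtering, and bidirectional steering of a token
# representation to produce a contrastive response pair.
import torch
import torch.nn.functional as F


def extract_direction(h_pos: torch.Tensor, h_neg: torch.Tensor) -> torch.Tensor:
    """Layer-wise direction = normalized mean difference between hidden states of
    preferred (h_pos) and dispreferred (h_neg) examples, each (num_examples, dim)."""
    d = h_pos.mean(dim=0) - h_neg.mean(dim=0)
    return d / d.norm()


def consistency_score(instr_hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between an instruction's hidden representation and the
    preference direction; used here as a stand-in 'inherent consistency' filter."""
    return F.cosine_similarity(instr_hidden, direction, dim=-1)


def steer(hidden: torch.Tensor, direction: torch.Tensor, alpha: float) -> torch.Tensor:
    """Shift a token representation along the direction; alpha > 0 pushes toward
    the aligned side, alpha < 0 away from it, yielding the two sides of a pair."""
    return hidden + alpha * direction


if __name__ == "__main__":
    torch.manual_seed(0)
    dim = 16
    h_pos = torch.randn(32, dim) + 0.5   # toy "preferred" hidden states
    h_neg = torch.randn(32, dim) - 0.5   # toy "dispreferred" hidden states
    v = extract_direction(h_pos, h_neg)

    instr = torch.randn(8, dim)                     # toy instruction representations
    keep = consistency_score(instr, v) > 0.0        # filter by consistency threshold

    tok = torch.randn(dim)                          # a token representation at decode time
    chosen = steer(tok, v, alpha=+4.0)              # steered toward the preference direction
    rejected = steer(tok, v, alpha=-4.0)            # steered away, for the contrastive response
    print(keep.sum().item(), (chosen @ v).item(), (rejected @ v).item())
```

In this toy setup the two steered representations project onto opposite sides of the direction vector, mirroring the abstract's claim that bidirectional control yields response pairs with clear alignment distinctions.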
