Alignment Auditing: Uncovering Hidden Objectives in Language Models

March 21, 2025

Anthropic researchers explore alignment audits, a process for investigating hidden objectives in language models, using a blind auditing game and a variety of techniques.