From Fast Follower to Innovation Leader: Restructuring South Korea’s Technology Regulation (May 12, 2025)
Alignment Auditing: Uncovering Hidden Objectives in Language Models (March 21, 2025). Anthropic researchers explore alignment audits, a process for investigating hidden objectives in language models, using a blind auditing game and various techniques.