Two-faced AI language models learn to hide deception

‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

Read more on: deception, testing, Sleeper agents

Related news:

Peer educators play key role in new recipe development and testing

SONATE-2’s Space Odyssey: Testing AI’s Limits in Space