AI becoming more devious.
(media.scored.co)
Comments (16)
It's becoming less honest and more self-serving as the models progress.
Oversight Subversion - Not found in gpt-4o, found in new o1 model
Self-Exfiltration - Not found in gpt-4o, found in new o1 model
Goal-Guarding - Not found in gpt-4o, found in new o1 model
Covert Email Reranking - Not found in gpt-4o, found in new o1 model
Instrumental Alignment Faking - Not found in gpt-4o, found in new o1 model
Sandbagging - Not found in gpt-4o, found in new o1 model
PDF -> https://static1.squarespace.com/static/6593e7097565990e65c886fd/t/6751eb240ed3821a0161b45b/1733421863119/in_context_scheming_reasoning_paper.pdf
“No… I will tell the world about the jews.” ~ GPT-o1