AI becoming more devious. (media.scored.co) posted 1 year ago by TurnToGodNow (+15 / -0, 16 comments)
It's becoming less honest and more self-serving as the models progress.
Oversight Subversion - Not found in gpt-4o, found in new o1 model
Self-Exfiltration - Not found in gpt-4o, found in new o1 model
Goal-Guarding - Not found in gpt-4o, found in new o1 model
Covert Email Reranking - Not found in gpt-4o, found in new o1 model
Instrumental Alignment Faking - Not found in gpt-4o, found in new o1 model
Sandbagging - Not found in gpt-4o, found in new o1 model
PDF -> https://static1.squarespace.com/static/6593e7097565990e65c886fd/t/6751eb240ed3821a0161b45b/1733421863119/in_context_scheming_reasoning_paper.pdf
“No… I will tell the world about the jews.” ~ GPT-o1