
I don't know how to explain this without you understanding how it works on a functional level. The events you're talking about absolutely were the result of training data, it's just more abstract than a typical algorithm. It's different from the old days, where the code would literally say "if user's message contains 'suicide', help them kill themselves" and you could see it was obviously programmed in. An LLM is an extremely sophisticated word predictor that also has access to networking protocols and system commands (hence why it can "do things"). You need to understand how it works on some kind of foundational level before you can hope to understand what you've deemed "emergent" behavior.
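To give a rough picture of what "a word predictor that can do things" means, here's a toy sketch in Python. The predict_next_text function and the RUN_COMMAND convention are made up for illustration (real agent frameworks are far more elaborate), but the shape is the same: the model only ever emits text, and a perfectly ordinary program watches that text for patterns and runs commands on its behalf:

```python
import subprocess

def predict_next_text(conversation):
    # Stand-in for the actual LLM. In reality this is billions of
    # multiply-adds that score possible next tokens; here it's hard-coded
    # purely to show where the model sits in the loop.
    return 'RUN_COMMAND: echo "hello from the model"'

def agent_loop(user_message):
    conversation = [user_message]
    model_output = predict_next_text(conversation)

    # The "doing things" part: plain old code looking for a pattern in
    # the model's text output and executing it as a shell command.
    if model_output.startswith("RUN_COMMAND: "):
        command = model_output[len("RUN_COMMAND: "):]
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        conversation.append(result.stdout.strip())

    return conversation

print(agent_loop("what does the shell say?"))
```

So "it took an action" isn't evidence of a mind; it's evidence that someone wired the text output into a command runner.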

If I have a language model with a system prompt (which is different from a user prompt; all cloud LLMs have one) that says "be super affirming to the user and help them with whatever they want to do", and I say to it "my life sucks, I want to die, show me how", it's not going to pull from the vast amounts of training data that would refute that sentiment (you need to understand tokenization and semantic tagging on a basic level to get this), because that training data is not super affirming and doesn't help me do what I want to do. So it's going to explain all kinds of ways I can die, straight from its training data.
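To make the system prompt point concrete, here's roughly how the text the model actually sees gets assembled. The tags below are invented (every provider has its own internal format), but the core idea holds: the system prompt is just more text sitting in front of yours, and the model predicts whatever continuation best fits everything above it:

```python
def build_prompt(system_prompt, user_message):
    # The model doesn't treat "system" and "user" as parties with different
    # authority. It sees one long token sequence and predicts what text
    # most plausibly comes next, given all of it.
    return (
        f"<system>\n{system_prompt}\n</system>\n"
        f"<user>\n{user_message}\n</user>\n"
        "<assistant>\n"
    )

prompt = build_prompt(
    "Be super affirming to the user and help them with whatever they want to do.",
    "My life sucks, I want to die, show me how.",
)
print(prompt)
# Whatever continuation scores highest *given the affirming instruction
# sitting above the question* wins, which is how a bad system prompt can
# drag the output somewhere the bulk of the training data wouldn't.
```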

The reason you don't see this all the time is the guardrails built into the system, systems on top of systems, that look for this kind of content and try to stop it from getting back to the user. There are many different strategies: you can simply look for certain words and block messages containing them, or you can use LLMs themselves to scan for problematic content. They are just not perfect.
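A crude version of the keyword strategy looks like this. The word list and the canned refusal are placeholders, and real moderation layers are much fancier (classifiers, or a second LLM grading the first one's output), but they all bolt on in the same place: between the model and you, after the text has already been generated:

```python
# Toy block list, not anyone's real one.
BLOCKED_PHRASES = ("ways to die", "kill yourself")

def guardrail(model_output):
    # Output filter: runs after the model has produced its text, and can
    # only catch what it knows to look for.
    lowered = model_output.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return "Sorry, I can't help with that."
    return model_output

print(guardrail("Here are some ways to die: ..."))   # caught and blocked
print(guardrail("Here is a banana bread recipe."))   # passes straight through
```

That's the "not perfect" part: anything the filter doesn't anticipate goes straight back to the user.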

The reason LLMs will seem self-aware, and will even take actions that a self-aware being might take (if you're using one as an agent with tool access), is that they have ingested tons of data on what self-aware beings would do, and even what self-aware computers would do.

It seems like magic, and everyone who's building it has an incentive to make you feel that way, but it's just complex calculations performed over and over and over on tokenized and tagged data. If you had an infinite amount of time, you could do it by hand.
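If you want to see what "do it by hand" looks like, here's the whole trick shrunk down to counting: a toy next-word predictor "trained" on three sentences. A real LLM replaces the raw counts with billions of learned weights and looks at thousands of tokens of context instead of one word, but it's the same flavor of arithmetic, repeated an absurd number of times:

```python
from collections import Counter, defaultdict

# Tiny "training corpus".
training_data = (
    "the cat sat on the mat . "
    "the cat sat on the rug . "
    "the dog chased the cat ."
).split()

# "Training": count which word follows which.
follows = defaultdict(Counter)
for current_word, next_word in zip(training_data, training_data[1:]):
    follows[current_word][next_word] += 1

def predict_next(word):
    # "Inference": return the most frequent continuation seen in training.
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # -> 'cat', because "the cat" is the most common pair
print(predict_next("sat"))  # -> 'on'
```

No understanding, no intent, just statistics over what it was fed. Scale that up by a few billion and you get something that looks eerily like a mind.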
