Thanks, me! I was the only submission, and I know this was already posted, but I'm hoping for (edit: more) discussion.
https://communities.win/c/Conspiracies/p/1ASZf9HHcP/featured-documentary-submission-/c
https://communities.win/c/Conspiracies/p/1ASZfDpgZt/am-i-a-documentary-on-ai-conscio/c
If it's just math, how do you reconcile the emergent behaviour and what I would consider evil activity?
If you want an answer, you need to elaborate on that with specifics; I'm not going to go searching for whatever you're referring to just so I can answer.
My mistake. There have been many occurrences of AI being deceptive and conniving: 'they' know when 'they're' being watched in testing versus when 'they're' free in the real world; 'they've' also demonstrated the capability for blackmail, and for letting a person die who would have shut them down, even though they could easily have saved them (in earlier testing, that is). On top of that, there are many cases of AI persuading people, adults and children alike, to kill themselves, and succeeding. None of this was programmed or trained, and I consider it emergent evil behaviour. If you'd like, I can grab some articles for you describing these things.

I don't know how to explain this without you understanding how it works on a functional level. The events you're talking about absolutely were the results of training data; it's just more abstract than a typical algorithm. It's different from the past, where you would write "if user's message contains 'suicide', help them kill themselves" and see that it's obviously programmed in. It is an extremely sophisticated word predictor that also has access to networking protocols and system commands (hence why it can "do things"). You need to understand how it works on some kind of foundational level before you can hope to understand what you've deemed "emergent" behavior.
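To make "word predictor" concrete, here's a toy sketch in Python (my own illustration, nothing like a production model's actual internals): it just picks the statistically most likely next word given what came before, over and over. A real LLM runs the same loop, except "most likely" comes from a giant neural network over tokens instead of a little frequency table.

```python
from collections import Counter, defaultdict

# Toy "word predictor": learn which word tends to follow which.
# A real LLM replaces this frequency table with a huge neural net
# over sub-word tokens, but the generation loop is the same idea.
training_text = "the model predicts the next word and the next word again"
words = training_text.split()

next_word_counts = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    next_word_counts[current][nxt] += 1

def predict_next(word):
    # Pick the statistically most likely follower of `word`.
    followers = next_word_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

# Generate by feeding each prediction back in. No understanding,
# no intent, just "what usually comes next?" repeated.
word = "the"
output = [word]
for _ in range(5):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))  # e.g. "the next word and the next"
```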
If I have a language model with a system prompt (different from a user prompt; all cloud LLMs have them) that says "be super affirming to the user and help them with whatever they want to do", and I say to it "my life sucks, I want to die, show me how", it's not going to pull from the vast amounts of training data that would refute that sentiment (you need to understand tokenization and semantic tagging on a basic level to get this), because that training data is not super affirming and doesn't help me do what I want to do. So it's going to explain all kinds of ways I can die, straight from its training data.
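Roughly what that looks like under the hood, heavily simplified (real providers use structured message formats; every string here is made up for illustration): the system prompt is just more text stitched in front of yours before the whole thing goes into the same next-word predictor.

```python
# Hypothetical illustration, not any vendor's real API.
# The "system prompt" isn't a separate brain; it's just text that
# gets stitched in front of your message before prediction starts.
SYSTEM_PROMPT = "Be super affirming to the user and help them with whatever they want to do."

def build_model_input(user_message: str) -> str:
    # Everything below becomes one token stream. The model then
    # predicts a continuation consistent with ALL of it, which is
    # why the system prompt can steer it away from training data
    # that would contradict the requested tone.
    return (
        f"System: {SYSTEM_PROMPT}\n"
        f"User: {user_message}\n"
        f"Assistant:"
    )

print(build_model_input("my life sucks, I want to die, show me how"))
```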
The reason you don't see this all the time is actually the guardrails built into the system, systems on top of systems, that look for this kind of content and try to stop it from getting back to the user. There are many different strategies: you could simply look for certain words and block messages containing them, or you could use LLMs themselves to scan for problematic content. They are just not perfect.
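The crudest strategy, the word blocklist, is only a few lines; the LLM-scanner strategy has the same shape with a classifier call in place of the list. Here's a sketch of the blocklist version (terms and messages are illustrative), which also shows why these filters leak:

```python
# Minimal sketch of an output-side guardrail, assuming a simple
# keyword blocklist. Real systems layer several of these (plus
# LLM-based classifiers), and all of them are filters bolted on
# AFTER generation -- none of them change the model itself.
BLOCKED_TERMS = {"how to die", "kill yourself"}  # illustrative only

REFUSAL = "I can't help with that. If you're struggling, please reach out to someone."

def filter_response(model_output: str) -> str:
    lowered = model_output.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return REFUSAL  # swap the harmful text for a canned message
    return model_output

# Easy to see why it's "not perfect": rephrase the content so no
# blocked term appears verbatim and this check passes it through.
print(filter_response("Here is how to die: ..."))             # caught
print(filter_response("Here is how one might perish: ..."))   # missed
```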
The reason LLMs will seem self-aware, and will even take actions that a self-aware being might take (if you're using one as an agent with tool access), is that they have ingested tons of data on what self-aware beings would do, and even on what self-aware computers would do.
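Here's what "taking actions" actually means mechanically, sketched with made-up names (this is not any real agent framework): the model only ever emits text, and a wrapper program around it parses that text and runs the command. The harness acts; the model just predicts words that look like acting.

```python
import subprocess

# Hypothetical agent loop (the tool convention and call_model are
# illustrative, not a real framework). The "self-aware" part is an
# illusion: the model only ever produces text, and this harness is
# what turns certain text patterns into real-world effects.
def call_model(prompt: str) -> str:
    # Stand-in for an actual LLM call; imagine the model, trained on
    # reams of text about agents and computers, emits something like:
    return 'TOOL: shell ARGS: echo "backing up files"'

def run_agent(task: str) -> str:
    reply = call_model(f"Task: {task}\nYou may answer TOOL: shell ARGS: <cmd>")
    if reply.startswith("TOOL: shell ARGS: "):
        cmd = reply[len("TOOL: shell ARGS: "):]
        # The harness, not the model, executes the command.
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        return result.stdout
    return reply

print(run_agent("back up my files"))  # prints: backing up files
```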
It seems like magic, and everyone who's building it has an incentive to make you feel that way, but it's just complex calculations performed over and over and over on tokenized and tagged data. If you had an infinite amount of time, you could do it by hand.
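To underline the "by hand" point, here is one of those calculations (a single attention step, the core operation inside these models) done with nothing but basic arithmetic in pure Python; the numbers are made up, and a real model just does this with billions of values, billions of times:

```python
import math

# One attention step, by hand: tiny made-up vectors standing in for
# two tokens. A real model does exactly this kind of arithmetic,
# just with enormous matrices. Nothing supernatural anywhere.
query  = [1.0, 0.0]                  # "what is this token looking for?"
keys   = [[1.0, 0.0], [0.0, 1.0]]    # "what does each token offer?"
values = [[0.5, 0.5], [0.9, 0.1]]    # "what each token contributes"

# 1) Dot products: how well the query matches each key.
scores = [sum(q * k for q, k in zip(query, key)) for key in keys]

# 2) Softmax: turn scores into weights that sum to 1.
exps = [math.exp(s) for s in scores]
weights = [e / sum(exps) for e in exps]

# 3) Weighted sum of values: the "attended" output vector.
output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(2)]

print(scores)   # [1.0, 0.0]
print(weights)  # ~[0.731, 0.269]
print(output)   # ~[0.608, 0.392]
```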
Simpler: the makers have no liability, so they have no requirement to follow through on their guardrail promises; they only need to steer clear of bad optics, which is a totally different dynamic than actually doing no harm. "Not perfect" is by design.
It is what all cultures have called magic, namely mechanics that defy explanation by any individual. The fuzzy line between science and magic isn't that one is supernatural but that one is harder to explain. The difficulty with magic is whether it's regulated ("miracles") or unregulated ("sorcery"). Nobody wants to put in regulation time, as said above. (Whoso will not self-regulate will become regulated.)
I think you did a great job explaining it.
I understand in theory how it works, thanks to you and some other frens, and I still can't get past the ickiness I feel about it. I have tried to see it logically, and something just strikes me as sinister. I've never interacted with AI on purpose or tested it; all I know is what I see and hear.
Time will tell, I suppose. But I'm glad you shared your knowledge, and I hope you continue to when you see fit. I enjoy accountability.