One major distinguishing characteristic of data is that good data comes from, and identifies, genuine existing human cultural rules and values. Bad data does not. For example, a left-wing opinion is not always grounded in reality. So if you capture a left-wing comment, you may be training an AI on a fake reality.
Be aware that there is currently some fakery and lying about chatbot technology, and the big players are definitely withholding some information about what they are doing and what they have built into the bots.
By the way, .win has been a testing ground for both corporate and private individuals' bots. But before that, Reddit, Facebook, Twitter, and other sites were used for both the development and the deployment of bots to alter public opinion. The government paid Facebook to work on that, and whether Twitter was involved in the same way has not yet been disclosed.