posted by brahbruh

I made this post as a response to a comment thread (here: https://conspiracies.win/p/16a9h3yYIa/x/c/), because I felt like it was long enough to warrant existence as a post.


Picture a chart with the X axis representing age and the Y axis representing wealth. It would be pretty easy to see that, aside from some outliers, the majority of the people (points) placed on this chart would fall along a diagonal band from the bottom left to the top right (indicating a high degree of correlation), since young people tend to have less wealth and older people tend to have more. This is easy for a human to see when visualizing a chart.

A computer program can't eyeball the chart, so it typically starts from a Euclidean distance calculation, which essentially draws a right triangle whose hypotenuse connects two data points. No triangle is actually drawn, since this is a computer program rather than a chart on paper, but the horizontal and vertical distances between the points are used to calculate the length of the hypotenuse via the Pythagorean theorem. The length of the hypotenuse is the Euclidean distance. The shorter the distance, the more alike the two people are. (Strictly speaking, distance measures how similar two points are, while correlation describes how the two axes move together across all the points.)

This would be the same as looking at such a chart and figuring out "these old guys are all within a few centimeters of each other" by putting a measuring tape on the chart, the distance between each pair of dots being the Euclidean distance.
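The hypotenuse calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the three sample people are made up, and both age and wealth are assumed to already be on a comparable 0-100 scale (otherwise the larger-numbered axis would dominate the distance).

```python
import math

def euclidean_distance_2d(p, q):
    """Length of the hypotenuse between two (age, wealth) points."""
    dx = q[0] - p[0]  # left-to-right distance (age)
    dy = q[1] - p[1]  # up-to-down distance (wealth)
    return math.sqrt(dx * dx + dy * dy)  # Pythagorean theorem

# Hypothetical people as (age, wealth) pairs, both on a 0-100 scale
a = (60, 80)
b = (63, 84)
c = (20, 10)

print(euclidean_distance_2d(a, b))  # 5.0
print(euclidean_distance_2d(a, c))  # much larger: a and c are far apart
```

The two older, wealthier points end up a short distance apart, while the young, poor point is far from both, which is the "measuring tape" result in number form.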

That example is based on two-dimensional data (those dimensions being age and wealth), but imagine a three-dimensional chart, like a transparent cube, where the third dimension is something like a 0-100 value representing political leaning (0 for left, naturally, and 100 for right). Now the data points would be floating in three-dimensional space, but the same concept would apply: you could draw lines between the points and figure out the correlations, such as "old people that have wealth tend to be right-leaning, but old people that are poor tend to be left-leaning". Obviously three-dimensional data is much more valuable than two-dimensional data. You can use two of a person's values to predict the third with a reasonable degree of accuracy (e.g., "this guy is 60 and very wealthy, so he is probably right-leaning").
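One simple way to turn "use two values to predict the third" into code is a nearest-neighbor guess: find the known people closest in age and wealth, and average their leaning. This is my own choice of technique for illustration, not necessarily what any real system does, and the rows and the function name are invented.

```python
import math

# Hypothetical (age, wealth, leaning) rows, each on a 0-100 scale
people = [
    (62, 90, 80),
    (58, 85, 70),
    (65, 15, 30),
    (25, 10, 40),
    (30, 20, 35),
]

def predict_leaning(age, wealth, rows, k=2):
    """Average the leaning of the k rows closest in (age, wealth)."""
    by_distance = sorted(rows, key=lambda r: math.dist((age, wealth), r[:2]))
    nearest = by_distance[:k]
    return sum(r[2] for r in nearest) / len(nearest)

# "This guy is 60 and very wealthy": his two nearest neighbors lean 80 and 70
print(predict_leaning(60, 88, people))  # 75.0
```

With only five made-up rows the answer means nothing, but the mechanism is the point: the guess comes entirely from whoever sits closest in the other two dimensions.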

Think about how much more valuable 12 dimensions would be, compared to three. How about 12,000 dimensions? Unfortunately, three dimensions is about as far as we can go with the human mind, since we can only picture three spatial dimensions.

But in data science there exist N-dimensional arrays. And, despite not being able to observe one of these arrays in a visual chart, the same kind of math applies.
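To show that the same math really does carry over, here is a sketch of the distance calculation written for any number of dimensions. The function name is made up for this post; the formula is just the Pythagorean theorem with more terms under the square root.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points with any number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Square the difference along every dimension, add them up, take the root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))              # 5.0 (the 2D case)
print(euclidean_distance((1, 2, 3, 4), (2, 3, 4, 5)))  # 2.0 (4 dimensions)
print(euclidean_distance([0] * 12000, [1] * 12000))    # 12,000 dimensions, same code
```

Nothing in the code cares whether there are 2 dimensions or 12,000. Only our ability to picture it breaks down.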

Take, for example, a 5-dimensional array:

  • sex
  • age
  • marital status
  • favorite color
  • is gay (self reported: true or false)

Given a set of millions of people and knowing these 5 attributes for all of them, a similar kind of mathematics can be applied to the data set to come up with a deep understanding of the relationships between these attributes. For example, your program would be able to understand things like "males between 24 and 68, whose favorite color is pink, and who are not married, are more likely to be gay", or "females between 18 and 36 whose favorite color is pink are more likely to be gay than females between 36 and 48 whose favorite color is pink", etc.
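The simplest version of this kind of statement is just counting: pick a group by some of the attributes, then see how often another attribute is true inside that group. Below is a rough sketch of that idea. The helper name, the dictionary keys, and all six rows are invented, and I use "married" as the attribute being compared only to keep the toy example simple. Any of the five attributes listed above could take its place.

```python
def rate(rows, condition, target):
    """Share of rows matching `condition` for which `target` is true."""
    matching = [r for r in rows if condition(r)]
    if not matching:
        return None  # no rows in this group, so no rate to report
    return sum(1 for r in matching if target(r)) / len(matching)

# Made-up rows with the five attributes from the list above
people = [
    {"sex": "F", "age": 22, "married": False, "color": "pink", "is_gay": False},
    {"sex": "F", "age": 41, "married": True, "color": "pink", "is_gay": False},
    {"sex": "M", "age": 35, "married": True, "color": "blue", "is_gay": False},
    {"sex": "M", "age": 52, "married": True, "color": "green", "is_gay": True},
    {"sex": "M", "age": 19, "married": False, "color": "blue", "is_gay": False},
    {"sex": "F", "age": 30, "married": True, "color": "blue", "is_gay": True},
]

# Compare how often `married` is true in two age groups
over_30 = rate(people, lambda r: r["age"] >= 30, lambda r: r["married"])
under_30 = rate(people, lambda r: r["age"] < 30, lambda r: r["married"])
print(over_30, under_30)  # 1.0 0.0
```

Six invented rows prove nothing about real people. With millions of real rows and many more attributes, the same comparison of rates between groups is where "more likely" statements come from.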

This is called psychographic analysis, and it's exactly how we are so deeply understood by big tech, and it's the exact knowledge that big tech and the powers that be use against us in every fashion (e.g., trying to make us gay, etc.). The same kind of data science is precisely what Cambridge Analytica used to get Trump elected in 2016. It worked on me, despite me knowing about their use of Cambridge Analytica before the election.

This kind of knowledge gives one a lot of power.