Plato on Analytics, and Why Data != Knowledge

If Plato were a data scientist, he'd have a lot to complain about.

In his classic masterpiece "The Republic", Plato writes about a group of prisoners who are chained in a dark cave for their whole lives.

Knowing nothing about the outside world, the only things these people are exposed to are shadows cast on the cave wall by the light of a fire behind them. They are fascinated by these shadows of animals, plants, and people, and do not realize that these are not the real forms of the originals, just phantoms.

Then one day, by chance, one of these prisoners escapes.

After struggling to adjust his eyes to sunlight, he gets a glimpse of the real world for the first time. The man finally sees the true forms behind the shadows he witnessed in the cave, and is mesmerized by the diversity of creatures, plants, and people he encounters.

To describe the man's journey into freedom, Plato writes:

"Previously he had been looking merely at phantoms; now he is nearer to the true nature of being."

Unfortunately, when the man rushes back into the cave and tries to enlighten everyone with the new knowledge he has gained, he is met with apathy, sarcasm, and ignorance.

Plato wrote this allegory with the intention of comparing "the effect of education and the lack of it in our nature." He believed that it was important to relentlessly strengthen knowledge by constantly subjecting ideas to grueling examination rather than making decisions on impulse. He compared decisions driven by instinct to being dangerously dragged along by a group of horses.

Similarly, whenever we launch a new product, we can spend much of our time in a Platonic "cave" -- designing features we think our users want and solving problems we think people have, limited by the fragile insight inferred from incomplete, fragmented data.

It's not that we explicitly choose to be in the cave; it is simply where we start when we attempt to create something new. The promise of big data analytics gives us hope of finally getting out of the cave and evolving our knowledge of the world, resulting in better-informed decisions.

Of course, the ambition of learning from big datasets is nothing new -- industry and academia have long produced massive datasets. However, collecting, processing, analyzing, and storing such datasets has always been a great challenge -- much of this work is still very manual, resulting in large financial and human costs with little return on investment. Additionally, insights from these datasets have traditionally been derived in tightly controlled ways, using sampling techniques that greatly limit the scope and temporality of the acquired insight.

We often glorify data scientists and shower them with high salaries in the expectation that they will mine gold out of our data. But in reality, they spend 90% of their time and effort on dirty, manual ETL to get terabytes of data into a data warehouse that is inaccessible to the rest of the company.

Even with the millions of dollars thrown at teams of data scientists, the truth is that terabytes of data rarely translate to terabytes of knowledge within the company.

Knowledge is classically defined, following Plato, as "justified true belief." The atomic units we have at our disposal for justifying a belief are data -- discrete points of information we collect by sampling the environment. With data, we may never get to 100% justification of our beliefs, but we can strive towards it. For an employee of a company to be able to curate knowledge, they must be able to propose a hypothesis as a potential belief and accumulate evidence to support the justification of this belief.
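To make "accumulating evidence" a little more concrete, here is a minimal sketch that treats justification as a probability and updates it with Bayes' rule each time a new data point arrives. Bayes' rule is my own choice of illustration, not something Plato (or this post) prescribes, and the function names and the likelihood values `p_if_true` and `p_if_false` are hypothetical numbers picked purely for the example.

```python
def update_belief(prior, likelihood_if_true, likelihood_if_false):
    """Apply Bayes' rule once: return P(hypothesis | one new observation)."""
    numerator = likelihood_if_true * prior
    return numerator / (numerator + likelihood_if_false * (1.0 - prior))


def accumulate_evidence(prior, observations, p_if_true=0.8, p_if_false=0.3):
    """Update a belief over a sequence of observations.

    Each observation is True if the data point supports the hypothesis,
    False if it contradicts it. p_if_true and p_if_false are ASSUMED
    probabilities of seeing a supporting data point when the hypothesis
    is true or false, respectively.
    """
    belief = prior
    history = [belief]
    for supports in observations:
        if supports:
            belief = update_belief(belief, p_if_true, p_if_false)
        else:
            belief = update_belief(belief, 1.0 - p_if_true, 1.0 - p_if_false)
        history.append(belief)
    return history


if __name__ == "__main__":
    # Start undecided (0.5), then observe ten supporting data points.
    for step, belief in enumerate(accumulate_evidence(0.5, [True] * 10)):
        print(f"after {step} observations: {belief:.4f}")
```

Under these assumed numbers, each supporting observation raises the belief and each contradicting one lowers it, and the belief gets close to 1 without reaching it -- which is roughly the "never 100%, but strive towards it" idea.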

But when the only people within the organization who can interface with the data spend almost all of their time data munging, there is close to zero chance for anyone to curate data into knowledge, including the data scientists themselves.

Plato would have despised being a data scientist.
