Audit Your Analytics

Behind most projects to integrate data into a company’s decision making is an ambition to improve the business. You want to find costly inefficiencies, lost opportunities, and hidden potential that will positively impact your bottom line. Whether you’re dabbling with some SaaS analytics or building an internal BI suite, this is hopefully what spurs you on - the idea that you could do better, that you would do better, and that there’s power in knowledge.

Sadly, many analytics endeavors end up being complete vanity projects for some ambitious but disconnected manager.

When done right, analytics is a value-generating process. The hours and dollars spent on reaching insight are more than adequately compensated for with better business outcomes, yet the reality is that analytics is rarely done right - and that it is not for everyone.

The idea behind big data is simple enough: At a large enough scale, when collecting billions of events and terabytes of data, even one-in-a-million insights become predictable and reliable assets in your company’s arsenal. We’ve seen it work for big successful companies like Target, and thousands of blog posts and panels and articles proclaim the bright future of the big data market. The sheer size of our modern marketplace has tilted the odds of finding useful and actionable patterns in our favor.

Twenty years ago, very few could dream of having tens of millions of active users, let alone of having their behavior and habits recorded and stored. The market has changed and our attention has shifted - but in our heart of hearts we know that comparatively little innovation has followed our increased interest in big data. Data-driven design is still incredibly difficult to incorporate effectively into your workflow, and far from every company manages to get a return on its big data investment. Analytics is still difficult because we’re stuck in a paradigm that is as old as I am.

There’s this joke about SQL that made me chuckle:

A SQL query walks into a bar, confidently approaches two tables and asks, “May I join you?”

It’s also humorous to note that there’s a good likelihood that this joke is older than I am.

For those who paid attention, the last 20 years of technological development have been truly remarkable. We’ve gone from floppy disks to the App Store, from dial-up to broadband, from DOS to iOS, from the command line to the touch screen - but SQL is still SQL.

Imagine a world where the graphical operating system never caught on, where business was still conducted primarily in DOS and other command line interfaces. Remember, or imagine, the frustration that came with needing specific training to use some of the most instrumental tools of your trade. Imagine hiring people to operate those machines, instead of buying machines to improve the productivity of the people you hire. It was a nightmare then, and it is a nightmare now, because that’s exactly where we are today with analytics.

Remembering that we want to find costly inefficiencies, lost opportunities, and hidden potential, it is ironic that we so often lack the ambition to audit our analytics in the same way. One of the perennial problems with analytics projects is that most of their costs are hidden away from the people who build them.

We builders often obsess over implementation details that affect our domain, but not necessarily the businesses we work for. We compare databases, SDKs, NoSQL to SQL, Hadoop to Spark, and we pride ourselves on building faster, more performant (and cost-effective) systems - failing to consider the costly inefficiencies of having people interface with data through engineers, the lost opportunities of questions never asked, and the hidden potential of giving the people best suited to make decisions at your company the freedom to truly explore the data without our well-meaning engineering supervision.

Analytics is not about finding some silver bullet that magically transforms your business overnight, it’s about hundreds of questions leading to many small ways to move the needle just enough for it to matter. How many little needles are there to move in your company, and how often do you get to move them based on data driven decisions?

One particularly ghastly company we had the comedic fortune to interact with proudly announced to us that their new in-house system could answer almost any conceivable question in a matter of minutes, across billions of daily events. When asked how people would interact with the system, the confident engineering team told us that the company had plans to educate 100 non-technical staff on how to use SQL within 12 months - and the greatest part, they said, was that the system was built entirely with open source components, so it didn’t cost a penny!

It didn’t cost a penny… This is exactly the overlooked problem. Apart from the ten or so engineers who had worked on the problem for about 15 man-years (coming in at over 2 million dollars), the time it took to re-instrument some of their applications (some additional tens of thousands of dollars), maybe two total man-years spent on the ridiculously ambitious project of teaching a hundred non-technical staff members SQL, and the hardware this cobbled-together in-house system required to run - the project didn’t cost a penny. How soon will this company move enough needles to get a positive ROI on what may well have been close to a 5 million dollar investment? Last we heard, there were still not 100 non-technical staff using SQL at the company.
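The figures in that anecdote invite a quick back-of-envelope tally. The sketch below adds them up; the per-man-year loaded cost and the hardware figure are assumptions for illustration (the source only says “over 2 million dollars” for 15 man-years, and gives no hardware figure at all):

```python
# Back-of-envelope tally of the "free" in-house analytics system.
# Assumed figures are marked as such; the rest come from the anecdote above.

COST_PER_MAN_YEAR = 140_000  # assumed loaded cost; 15 x $140k matches "over 2 million dollars"

engineering = 15 * COST_PER_MAN_YEAR     # ~10 engineers, about 15 man-years total
instrumentation = 50_000                 # "some additional tens of thousands of dollars"
training = 2 * COST_PER_MAN_YEAR         # "maybe two total man-years" teaching staff SQL
hardware = 250_000                       # assumed; the source only says "let's not forget hardware"

total = engineering + instrumentation + training + hardware
print(f"${total:,}")  # $2,680,000 - before ongoing operation and opportunity cost
```

Even with conservative assumptions, the “free” system runs into the millions before a single needle has moved - and ongoing operation and opportunity cost push it toward the 5-million-dollar estimate above.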

The problem is that organizations are often blind to the human costs of analytics - as though we’ve already accepted that analytics is difficult, that it will take experts to deal with it, and that there’s no way around it.

In 2013 the McKinsey Global Institute projected that there would be a shortage of 190,000 data scientists by 2018. There are already 1,000 job openings for data scientists in San Francisco alone.

This tells us two things:

1) We want to make data-driven decisions.

2) We’ve made ourselves painfully reliant on data scientists, because data-driven decision making is hard.

I submit to you that the problem is this: the data scientist is a human keyboard, not unlike the telephone operators from the infancy of telephony. They operate between you and your data, between you and your power to make data-driven decisions - and by the year 2018, there will be a shortage of 190,000 of them. This is not a sustainable state of affairs. Find the real bottlenecks in your company, and get in the habit of reminding yourself that insight is meaningless without impact. Aspire to have dozens of decision makers move the needle every day, so that you don’t have to move mountains later.
