The Four Costly Compromises in Big Data Analytics

Before developing Traintracks, a large part of our team was working in behavioral engineering and big data consulting. We worked for big companies on everything from writing intra-cortical neural interfaces to analyzing the behavior of users en masse, and what we found time and time again is that companies are not extracting enough value from the data they're collecting. Nils Pihl, CEO at Traintracks, shares his thoughts on the roadblocks that prevent companies from being truly data-driven:

Transcript:

Dealing with big data sets and doing analytics today is difficult for four main reasons, which I like to call the four costly compromises. The first is that every question you want to ask has to start in your application code. You can't ask questions about something you didn't instrument for. That means you have to choose between releasing early and having robust analytics, and for many teams that is a very difficult choice to make.

Second, for anything you do send over for analysis, you need to make sure you have parsers that understand the incoming events, and so on. That means you have to choose between having reliable historical access to your data and having the flexibility to ask new questions, and again, this is a very difficult choice for a lot of teams. Many companies end up choosing to release early without very robust analytics, and the bigger a team gets, the more likely it is to choose reliable historical access over the flexibility to ask new questions.

The third costly compromise has to do with how the data is stored. When you build an in-house system, you have to take into account that different databases are good for different things, so in a sense you are trading off scalability, performance, and accessibility. Really talented teams might get two out of three, but more realistically, you're optimizing for one. You're choosing scalability, or accessibility, or performance.

And this leads to costly compromise number four: how you're actually going to access this data. Either you have to train people to use a domain-specific language, and spreading that expertise across an organization can be very costly, or you end up with expensive people acting as human keyboards for everyone else who is asking questions.
