Big Data: Evolution vs Intelligent Design

So you have the ambition to make data-driven decisions. You want concrete knowledge about how your users interact with your product, or to shed some light on some dark nook of a process, and you’ve made the commitment to your company and culture that decisions will be based on informed opinions, hard facts and a respect for science. Great. Where do we go from here?

One of the first, and most important, decisions you’ll have to make is how to capture and store the data. There are plenty of grey and grizzled experts with impressive battle scars who will tell you why SQL is ultimately the right choice, and they’ll be met with fierce resistance from ambitious and agile minds who want to roll up their sleeves and just do it.

Both parties have compelling and contentious arguments, but I submit to you that the decision is much more difficult to make than slogans, talking points and StackOverflow votes would have you believe.

The decision is difficult, because building the system is only a small part of your company’s endeavor to become data-driven. Once you’ve squeezed the data into your database of choice, you’re still left with the problem of turning that data into insight that allows your team to perform better. Insight gets you nowhere without impact.

Choosing between SQL and NoSQL should start not with assessing the features and solutions of either paradigm, but with a long and honest look at your culture and your ambitions.

Don’t start with “What questions do we want to ask of our data?”, but rather with “Do we already know what questions we will want to ask tomorrow?” and “Do we already know who the stakeholders are going to be?”

The promise of SQL is logic, structure and the performance that comes with knowing in advance where everything is - and its commensurate weakness is its upfront cognitive cost and its inflexibility. SQL is for the intelligently designed system, where thought has been spent on the details by a well-meaning pantheon of data engineers.

NoSQL tempts us with its flexibility and almost promiscuous acceptance of data, and allows evolving formats to introduce new questions and new ideas into a vibrant and changing cultural ecosystem - with all of the junk DNA and inefficiencies that come with survival of the fittest rather than designing the perfect.

But NoSQL’s ability to handle format evolution is in many ways also its undoing. The real costs of NoSQL don’t truly start taking their toll until you’re struggling to preserve precious old questions and insights while new formats are being introduced. The effortless introduction of new formats leaves a trail of janitorial work behind it, where engineers have to balance the needs of different stakeholders just as much as they had to in SQL.
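To make that janitorial work concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the event shapes, the field names and the `v` version marker do not come from any particular database. It shows an old question ("how many seconds did each user spend?") that only keeps working because someone wrote a normalizer that knows about every format the store has ever accepted.

```python
from typing import Any

# Hypothetical events accepted by a schemaless store.
# The format evolved between "version 1" and "version 2".
events: list[dict[str, Any]] = [
    {"user": "ada", "duration": 90},                         # v1: flat user, seconds
    {"user": {"id": "bob"}, "duration_ms": 45000, "v": 2},   # v2: nested user, milliseconds
]


def session_seconds(event: dict[str, Any]) -> tuple[str, float]:
    """Normalize every known format into (user, seconds).

    This function is the janitorial work: it has to grow a new
    branch each time the format evolves.
    """
    if event.get("v") == 2:
        return event["user"]["id"], event["duration_ms"] / 1000
    # Events without a version marker are assumed to be v1.
    return event["user"], float(event["duration"])


def total_seconds_per_user(all_events: list[dict[str, Any]]) -> dict[str, float]:
    """The 'old question' that must survive the format change."""
    totals: dict[str, float] = {}
    for event in all_events:
        user, seconds = session_seconds(event)
        totals[user] = totals.get(user, 0.0) + seconds
    return totals


print(total_seconds_per_user(events))
```

Writing the new format cost nothing. The cost moved to read time, where every consumer of the data has to agree on what the old and new shapes mean.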

One of the great ironies of the debate is how much effort is spent emulating the benefits of the other paradigm. How many years and millions in investment have we not collectively spent making SQL a bit more flexible, or NoSQL a bit more structured?

Ultimately, the choice between these paradigms must be informed by your team’s operational needs more than your engineers’ technological preference.

We built a backend for in-house systems, based on a new paradigm of data management that gives you the power of mutable schemas on immutable data. When each stakeholder is free to interact with a single source of truth on their own terms, formats can evolve to fit your needs without your team losing the ability to intelligently design a system that works for them.
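As a rough illustration of the idea only, and not a description of how our backend is implemented, the Python sketch below keeps one read-only log of raw records and lets each stakeholder apply their own schema when reading. All names in it (`RAW_LOG`, `finance_schema`, `product_schema`, `read`) and the record fields are assumptions made up for this example.

```python
from types import MappingProxyType
from typing import Any, Callable, Mapping, Optional

# The single source of truth: a tuple of read-only records.
# MappingProxyType rejects item assignment, so readers cannot alter the data.
RAW_LOG: tuple[Mapping[str, Any], ...] = tuple(
    MappingProxyType(record)
    for record in (
        {"user": "ada", "action": "purchase", "amount_cents": 1250},
        {"user": "bob", "action": "view", "page": "/pricing"},
        {"user": "ada", "action": "view", "page": "/docs"},
    )
)

# A schema is just a function from a raw record to a row, or None to skip it.
Schema = Callable[[Mapping[str, Any]], Optional[dict]]


def finance_schema(record: Mapping[str, Any]) -> Optional[dict]:
    """Hypothetical finance view: only purchases, amounts in whole currency units."""
    if record.get("action") != "purchase":
        return None
    return {"customer": record["user"], "revenue": record["amount_cents"] / 100}


def product_schema(record: Mapping[str, Any]) -> Optional[dict]:
    """Hypothetical product view: only page views."""
    if record.get("action") != "view":
        return None
    return {"user": record["user"], "page": record["page"]}


def read(log: tuple[Mapping[str, Any], ...], schema: Schema) -> list[dict]:
    """Apply a stakeholder's schema at read time; the log itself never changes."""
    return [row for row in map(schema, log) if row is not None]


print(read(RAW_LOG, finance_schema))
print(read(RAW_LOG, product_schema))
```

In this sketch, changing a schema means editing or replacing a function, while the stored records stay exactly as they were written, so one stakeholder's redesign cannot break another's view.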
