Big Data in Little China

When we started consulting in China back in early 2011, the state of things was simply bizarre to a newcomer. In many ways, China has a completely different internet, not only because of the Great Firewall’s access restrictions or differences in language, but because its usage characteristics are fundamentally different. The internet reached mainstream adoption in the West many years before it did in China, driven by economic factors that led to earlier mass adoption of home computers, whereas internet adoption in China was driven by the success of mobile.

I remember struggling with the prejudiced idea that China simply copies Western technology, when in fact the environment something exists in ultimately shapes its character. Weibo was and was not a Twitter clone; Alibaba was and was not eBay. At first, I looked for similarities so that I could pat myself on the back and feel a bit more secure about my home in the face of China’s massive growth, and that kept me from seeing the things that are truly different out here.

Made in China

My first serious wake-up call came when we consulted for Weibo and became privy to some of the industry’s less obvious traits. At the time, it turned out, three of China’s tech titans had to put their enmity and competition aside to collaborate on big data projects: there simply wasn’t enough Hadoop talent to go around, so these three companies quietly pooled their resources and built in-house analytics systems on older versions of Hadoop. There was no comparing Weibo’s data performance to Twitter’s, not at the time. While they looked on, jealous and reverent, at the success of Storm at Twitter and its near-real-time processing, Weibo’s engineers ran overnight batch jobs to handle the three terabytes of data their own microblogging platform generated every day.

But this deficit in data talent turned out to be a blessing in disguise. With almost no sunk costs in the Hadoop paradigm, China adopted Spark and streaming at an incredible pace. In 2013 we closed down our consultancy to develop Traintracks full time. We were one of the very first companies to work with Spark, and we solemnly concluded that we’d have to leave Beijing for Silicon Valley: how could we ever scale up our engineering team here in China? We prepared to move back to the States within a year, but by the time the summer of 2014 came around, it was impossible not to notice one of the most important trends in big data:

Data is bigger in China.

Early in the summer of 2014 we were approached by one of the founding members of Tencent and his VC firm, and we were offered a surprisingly generous valuation for a company that hadn’t even released its product, especially by Chinese standards. How could our prototype be worth 10m pre-money, several times more than the typical seed deal in China at the time (and almost three times the average seed deal in Silicon Valley that year)? Although we declined the offer, we asked ourselves what this Tencent founder knew that we didn’t.

A few months later we heard of a 9,000-node Spark cluster at Tencent, one of many gargantuan deployments that appeared in China almost overnight. We sponsored some of the very first Spark meetups in Beijing and were surprised to see over a hundred talented engineers take the time to travel an hour outside of town to hear our thoughts on the future of the Spark ecosystem. Some of them were even going on to work for Databricks, the company founded by the creators of Spark, and we quickly learned that Silicon Valley was importing large numbers of Spark engineers from China.

It’s easy to forget how big the Chinese internet is, and just how much data there is to capture. My mobile provider in China has more subscribers than there are people in North America, and life in China is constantly plugged into the mobile internet.

Going back to the West for business now feels like stepping back in time, to a place where services are less advanced, life is less convenient, and the innovators of years past are struggling to copy the massive success of the WeChat platform. The Chinese internet, for all its flaws, now has serious advantages over the internet in the West, and the future of big data is being built here in the #BeiArea, not in Silicon Valley.

On a good day, our client PengPeng accounts for a very significant share of all the HTML5 traffic in the world. When we started working with the dating platform Tantan, it was a brand-new app but already produced 50 gigabytes of server logs every day, and it grew to half a terabyte a day within six months.

Meanwhile, a West whose imagination has been damaged by years of misguided Hadoop use continues to invest in a paradigm that was dead before it arrived, and even well-respected venture capitalists like Andreessen Horowitz say things like “Mixpanel has solved big data,” without understanding at all what big data actually means in the world outside their startup bubble.