Is Open Source Alive in China?

Steve Moore
Inside Machine learning
6 min read · Sep 5, 2017

Thankfully the answer is yes — with some caveats.

But first, some background. China has been and still is a complicated place for many reasons, but it’s hard to deny its economic transformation in the last few decades. As just one example, over the last 25 years, the country has sustained a GDP growth rate well over 6%, a staggering run.

And under President Xi Jinping, the last five years have seen a careful effort to sustain that growth by transitioning the country’s focus from manufacturing to consumption and services, notably around technology.

The upshot is that Chinese software developers are playing a larger role in the global world of code, not only at companies like Baidu, Alibaba, and Tencent, but at thousands of smaller tech firms wrestling with transportation, healthcare, energy, e-commerce, and more.

All that makes this a key moment to look at China and open source.

Promising Progress

Happily, Chinese government agencies and corporations have been leveraging open source for a long time, especially since the mid-2000s when the arrival of Android (and Android’s inclusion of Linux) exposed a generation of developers to open source code and, just as important, to the idea that they could use open source as a foundation to innovate.

We know the number of organizations that use some aspect of open source code is increasing year by year, especially when it comes to the adoption of big data infrastructure that typically outperforms proprietary options. That performance is especially key in the high-storage, high-throughput arena of modern China, with its estimated 720 million internet users.

When it comes to China’s own contributions to open source projects, things were slow for a time but are beginning to take off. Giants like Huawei, Baidu, Alibaba, and Tencent have recently contributed significant code around cloud infrastructure and machine learning. Unfortunately, those big-banner contributions obscure a deeper trend: in general, open source contributions from China, while impressive, still pale in comparison to contributions from the US and Europe.

A Look at the Roots

By now, it’s a familiar tale: the tradition of open source grew out of academic environments that thrived on collaboration, and that spirit was inherited by businesses that saw how implementing open source code low in the stack freed up time and energy for innovation higher up. In the US, the tradition created a culture where developers can reasonably expect to be compensated, with a combination of money and reputation, for the contributions they make to open source projects.

Unfortunately, that expectation of reward has yet to penetrate as deeply into the developer community in China. Researcher Matteo Tarantino notes that:

“China did not have the benefit of this formative phase. Instead, it imported a mature model and tried to integrate it through strong top-down support. Results have been mixed: Chinese developers have yet to be provided with an organic system of incentives for FOSS [free and open-source software] contribution in the form of recognisable career benefits, social prestige or symbolic payoffs.”

It’s possible that the tide is beginning to turn, but there looks to be more work to do — especially in the arena of machine learning.

Machine Learning In Particular

The lower rates of contribution from China impact open source projects across the board, but let’s consider the areas of machine learning and AI.

Why should those areas be of particular interest? As several recent articles point out, China is surging forward as a force in the world of machine learning, especially in deep learning. Not surprisingly, Baidu is deeply involved with direct backing for several high-profile initiatives from the Chinese government, most recently the opening of the National Engineering Laboratory of Deep Learning Technology on Baidu’s own campus in Beijing. It’s promising that Baidu has been a substantial code contributor: late last year, for example, the company announced that it was open-sourcing its ML platform, PaddlePaddle, under an Apache license.

Before long, we might see the first open source project initiated in China from scratch, possibly related to machine learning in particular. It would certainly serve to demonstrate a deepening participation.

Why It Matters

To the extent that Chinese firms have been slower to contribute directly to open source, it’s easy to see the logic. They’re paying higher and higher salaries to differentiate their work from competitors. Won’t giving code away mean sacrificing their edge?

To be clear, no one expects organizations to give away the code that truly differentiates their work. And it’s worth being upfront that contributions to open source aren’t always simply gestures of community altruism. When organizations give away code, some part of the motivation can be strategic.

As mentioned above, open source code low in the stack can allow for innovation higher up. At the same time, we’ve seen that the top of the stack is always climbing. That creates a dynamic where those who open source their code can have some influence on where and how the stack grows.

Consider an organization that chooses to copyright some novel infrastructure code and withhold it from open source. These days, there’s a good chance that soon enough someone, possibly a competitor, will open source code that serves a similar purpose. If that new, freely available code functions well, the original organization could stand to lose customers, while the competitor who released the code might be able to sell products, services, or consulting that leverage their deep knowledge of the infrastructure that they made available. The contribution also earns them the opportunity to advocate for specific additions and modifications to the code as it evolves.

These are lessons that big Chinese firms seem to be learning fast, but the dynamics are no less relevant for small and midsize firms.

Briefly Back to Altruism

Open source code has now been around for decades and seems to be going strong. But it’s worth taking a moment to consider how unusual the open source universe really is: Thousands of individuals offer up their valuable time as coders, testers, project administrators — and sometimes referees — to create software that’s freely available to all. That’s both fantastic and bizarre.

And it’s hard to overstate the influence that open source has had on the speed and efficiency of software development over the last 40 years. Without it, we’d likely live in an endless tangle of over-written, fragile, and incompatible code. (Well, even more than we do already.)

But the open source ecosystem is more fragile than we imagine. It depends on a deep, good-faith agreement that giving keeps pace with taking. Without that agreement, resentments could very well collapse the system, as happens so easily in prisoner’s dilemma scenarios.

As Chinese firms and developers grow their influence in the global world of code — including their influence as investors in American firms — the whole community will do well to continue the efforts of education and encouragement.

The stakes are high, but with some foresight and care, we can keep open source alive and thriving.

+++

Learn more about IBM’s own efforts with open source and open data — especially on behalf of Apache Spark™ — with this quick video overview.

Steve Moore

IBM Story Strategist. Machine Learning researcher. Speaker. Teacher. Opinions are my own.