The Only Thing That Matters in Machine Learning is…

Picture from newgrounds.com

The hot trend in machine learning is giving away stuff for free. Tech companies have always been advocates of the open-source community and are happy to release parts of their code as open-source. Over the last year, however, the big players in machine learning have given away complete codebases. Google made TensorFlow open source and Facebook gave away its optimized deep learning modules for Torch, another open-source library. Then, Microsoft released its Distributed Machine Learning Toolkit (DMTK) for free and, not to be outdone, IBM open-sourced its SystemML platform.

These developments have explicitly confirmed what observers already know: tech companies no longer see software and algorithms as valuable assets to keep proprietary. The most valuable asset today is data. The second most valuable asset is the talent to use this data.

2015, the year of open source

Facebook — Deep learning modules for Torch

In January, Facebook was the first to open-source its machine learning code. Facebook’s artificial intelligence (AI) efforts are run out of its AI research lab, known as FAIR. In the lab, Facebook uses Torch, an open-source developer toolkit for machine learning tasks. Torch is used by numerous companies including Twitter, NVIDIA, AMD, and Intel. Torch has been best applied to deep learning and convolutional neural nets, which have been successful in understanding images and video. With the January release, Facebook made its optimized deep learning modules open-source. These modules are significantly faster than the default modules in Torch and allow developers to train larger neural nets in less time.

IBM — SystemML

In June, IBM, a company synonymous with AI thanks to its Deep Blue and Watson systems, contributed SystemML, its machine learning platform, to the fastest-growing open-source community, Apache Spark. IBM will offer Spark as part of its broader IBM Bluemix open cloud technology platform.

Google — TensorFlow

In November, Google released TensorFlow for free. TensorFlow is Google’s second-generation machine learning system, replacing DistBelief. The system represents computations as stateful dataflow graphs, making it easy to run networks across multiple machines with different hardware. Developed by the Google Brain team, including deep learning legend Geoffrey Hinton, it’s used in various Google products including Gmail and Photos. Its most high-profile use is in the RankBrain system, Google’s AI engine that handles a substantial share of Google’s search queries.
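
As a rough illustration of what "computations as stateful dataflow graphs" means, here is a toy sketch in plain Python. It is not TensorFlow's API: the `Graph` and `Node` classes, the `run` method, and the node names are all hypothetical and exist only for this example. Operations are nodes, edges carry values between them, and a variable node reads state that persists between runs.

```python
# Toy dataflow graph for illustration only. This is NOT TensorFlow's API;
# every class and function name here is hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Node:
    """One operation in the graph; `inputs` names the nodes it depends on."""
    name: str
    op: Callable[..., float]
    inputs: Tuple[str, ...] = ()


@dataclass
class Graph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    state: Dict[str, float] = field(default_factory=dict)  # persists across runs

    def add(self, name, op, inputs=()):
        self.nodes[name] = Node(name, op, tuple(inputs))

    def run(self, target, feed=None):
        """Evaluate `target`, computing each upstream node at most once."""
        cache = dict(feed or {})  # fed values act as already-computed nodes

        def evaluate(name):
            if name not in cache:
                node = self.nodes[name]
                cache[name] = node.op(*(evaluate(i) for i in node.inputs))
            return cache[name]

        return evaluate(target)


graph = Graph()
graph.state["w"] = 2.0

# Variable node: reads the persistent state each time the graph runs.
graph.add("w", lambda: graph.state["w"])

# Pure computation node: y = w * x, where x is supplied at run time.
graph.add("y", lambda w, x: w * x, inputs=("w", "x"))


def assign_w(y):
    """Stateful node: overwrite w with the value flowing in from y."""
    graph.state["w"] = y
    return y


graph.add("update_w", assign_w, inputs=("y",))

first = graph.run("y", feed={"x": 3.0})   # 2.0 * 3.0
graph.run("update_w", feed={"x": 3.0})    # w is now 6.0
second = graph.run("y", feed={"x": 3.0})  # 6.0 * 3.0
print(first, second)
```

Because the graph describes the computation separately from running it, a real system is free to decide where each node executes. That separation is what makes it practical to spread work across machines and hardware types.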

Microsoft — Distributed Machine Learning Toolkit (DMTK)

Finally, in November, just three days after Google, Microsoft open-sourced its framework and algorithms for distributed machine learning. The DMTK is designed to allow machine learning tasks to be easily scaled. The toolkit also includes LightLDA, an efficient algorithm for topic model training, and Distributed Word Embedding, a tool for natural language processing.

Software prices tend to zero as the value of data rises

Machine learning tools are making it easier to understand the abundance of data that is being collected. Deep learning techniques are enabling systems to learn from unstructured data. Much of the real world is messy, complex, and rarely fits nicely into the rows and columns that traditional approaches to intelligent machines, software, and databases require. Videos, unlabeled text, and voice are all being analyzed by systems that can now infer context, making insights more accurate and valuable.

“While laggards in the industry debate the merits of on-premise servers versus cloud services and struggle to merge vast numbers of databases, technology leaders are pushing further ahead.”

Intellectual property is being handed over to the open-source community to use as it wants. While most companies are just beginning to devise their Big Data strategies, Google, Facebook, Microsoft, and IBM have already devised theirs, built Big Data and machine learning tools, and are now giving them away for free.

Most companies consider their proprietary software to be a competitive advantage and the way they provide value to customers. Just as traditional hardware companies are slowly trying to become software- and services-based companies, the ground beneath them has shifted.

Telcos are trying to adapt to a world of software-defined networking rather than routers and switches, and manufacturers are moving from providing tools and widgets to usage analytics and predictive maintenance. As they arrive in this new dawn of software and services with the promise of fat margins, they will find it was a mirage. Software on the Internet has almost zero marginal costs. Prices will trend to zero. The real value is data.

Using machine learning tools is hard

Google, Facebook, Microsoft, and IBM have not given away all of their software. Google, Microsoft, and IBM also have machine learning platforms through which they offer machine learning APIs to paying customers. These companies want to attract developers to build on their platforms, which makes the platforms more valuable. They are open-sourcing their tools largely so that developers can learn how to use them. This is great for future hiring, and it fosters a thriving developer ecosystem.

Valuable platforms attract users and developers. Developers have limited resources and will only allocate resources to platforms which generate the greatest revenues. This is why small developers build iOS apps first, Android apps second, and Windows Mobile never. Platform dynamics are winner-takes-almost-all. Companies can court developers, pay them to build for the platform, and take a lower cut of sales; but if the platform doesn’t have users, it doesn’t matter. See Windows Mobile.

“The challenge for non-software companies trying to build platforms for their own customers is that open-source is not part of their culture.”

Customer value is created when third-party developers build machine learning applications that deliver innovative new services. To get developers on board, open source will be the only way. Data will be the only sustainable competitive advantage.

Recent advice to the industry has been to move away from making physical things and toward making digital things. However, charging for digital things on the Internet is harder than ever. With machine learning, making digital things is not even enough. Companies need to give away the digital things. This will be a bitter pill to swallow for the management and boards of many companies going through a digital transformation.

The only thing that matters now is data.

Written by Lawrence Lundy-Bryan.