What is Open Resource and why is it necessary?

AIN Cloud beta release posting series (1/2)

DS
4 min read · Feb 11, 2019

Language: English / Korean

  1. What is Open Resource and why is it necessary? (This article)
  2. Experience Open Resource with AIN Cloud beta

‘Son, I heard that you can write a document with this machine, called a computer. Show me how.’

That is what my parents, who had only basic computer knowledge, asked me. It would have been easy to buy a prebuilt, brand-name computer, but because I was unsure whether they would use it often, I could not justify spending that much. So I purchased the various components and assembled the computer myself. After installing the operating system and software such as a word processor, it was ready for its ‘users’ — my parents (although I still had to teach them how to use it…).

The issue of the execution environment

However, my parents are not the only ones who struggle with computers. Even developers and researchers working in artificial intelligence or machine learning, who would seem to have mastered computer technology, run into problems. In machine-learning communities such as those on Reddit, questions about hardware specifications or code errors come up almost every day. These are, broadly speaking, execution environment problems, and they fall into two categories:

1. Complex and diverse execution environment combinations

GitHub, the world’s largest open-source code-hosting site, publishes statistics and rankings every year. In 2018, the 8th and 9th most popular topics and the 2nd and 3rd fastest-growing subjects were keywords related to machine learning (link). The 2017 statistics (link) show that 25 million source code repositories were created over the course of that year. In other words, there is a great deal of code that can be freely used and modified.

However, because every author defines their problem and writes their code differently, these GitHub projects are often rigid and rarely conform to shared standards and practices. Each project therefore has its own software execution environment, with its own combination of operating system, programming language, library, and framework types and versions.

In addition, the compatibility of hardware combinations such as CPUs, GPUs, memory, motherboards, cooling devices, and power supplies must be considered when building a software execution environment. Developers often end up in a situation where configuring the execution environment for machine learning is harder than the machine learning problem itself!
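To see why these mismatches bite in practice, consider that even a minimal project has to verify the versions it runs on before anything else works. The sketch below (hypothetical function names, Python only) shows the kind of defensive environment check that many shared repositories omit, which is exactly how version-mismatch errors end up on forums:

```python
import sys
import platform


def describe_environment():
    """Collect key facts about the current execution environment."""
    return {
        "os": platform.system(),            # e.g. "Linux", "Darwin", "Windows"
        "os_version": platform.release(),
        "python": platform.python_version(),
    }


def check_requirement(min_python=(3, 7)):
    """Fail fast when the interpreter is older than the code expects."""
    if sys.version_info[:2] < min_python:
        raise RuntimeError(
            "Requires Python >= %s, found %s"
            % (".".join(map(str, min_python)), platform.python_version())
        )
    return True
```

This only covers the interpreter and OS; real projects must repeat the same dance for every library, driver, and framework version, which is where the combinatorial explosion described above comes from.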

2. Scalability of execution environment

In addition to the complexity of execution environments, scaling these environments in order to handle increasingly complex machine learning problems poses a serious challenge for developers.

A good example of these scalability issues is the BERT study, published in October 2018 by a team of Google researchers. In this study, 20 Google-developed Cloud TPUs were run for 4 days just to pre-train the network. To put that in context, if you wanted to run BERT yourself, you would need to buy 8 TESLA P100s for about $6,600 USD and run them constantly for over a year! For further analysis of BERT and its significance, please refer to the article written by our AI Network development team (link).
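The scale of that claim can be sanity-checked with the figures quoted above (20 TPUs for 4 days, versus 8 P100s for a year); every other number below is derived from those, not measured:

```python
# Back-of-envelope reproduction cost for BERT pre-training,
# using only the figures quoted in the text above.
TPUS = 20            # Cloud TPUs used by the Google team
TPU_DAYS = 4         # days of pre-training
GPUS = 8             # TESLA P100 cards in the do-it-yourself setup
GPU_DAYS = 365       # "over a year" of constant running (lower bound)

tpu_device_days = TPUS * TPU_DAYS   # 80 TPU-days of compute
gpu_device_days = GPUS * GPU_DAYS   # 2,920 GPU-days to match it

# Per-device speedup implied by the article's own numbers:
implied_speedup = gpu_device_days / tpu_device_days  # 36.5x

print(tpu_device_days, gpu_device_days, implied_speedup)
```

So matching 80 TPU-days requires nearly 3,000 P100-days, an implied per-device gap of over 36×, which is why the hardware budget alone puts such studies out of reach for most individuals.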

As such, the latest research trend assumes that developers have tremendous computational resources, which makes it almost impossible for individuals or small organizations to reproduce or improve on such findings. Beyond the high cost, the environment specifications for these specialized projects are extremely rigid and take considerable time to configure properly, which makes them non-reusable for other machine learning projects.

I’d rather just go to Google or Facebook…

Open Resource lets you focus more on the essence

The essence is solving important problems by developing better machine learning algorithms; developers shouldn’t have to spend most of their time and energy on environment configuration issues that have nothing to do with the actual machine learning challenges they are trying to solve.

But what if you could run and modify source code whenever needed, paying only for what you actually use? It would be like paying for time on a computer that comes preinstalled with the programs you need.

Furthermore, if the original author of the source code were the only one providing its execution environment, their resources would quickly prove insufficient under multiple requests. But if execution environments could be supplied alongside the source code by anyone, with profits distributed between resource providers and source code developers, this bottleneck would disappear.

That is why, last August, our AI Network team presented our Open Resource Vision: to build an ecosystem in which those who create source code and execution environments, those who provide the resources to run them, and those who need them can freely exchange supply and demand.

You can check the technical background via this article (link).
