Data Science industry perspectives in the cloud

JUNE 8, 2017

In principle, the cloud is becoming commoditized. Corporations such as Google and Amazon are trying to commoditize almost everything, including the tools for data gathering, data storage, and data transformation. In the near future, we should expect platform engineering, which is overly complex now, to become simpler.

On the Data Engineering side, the prospects for such simplification in the near future are less clear. The reason is that right now there is no single simple solution even for ETL (Extract, Transform, Load). Take Amazon Data Pipelines or Amazon Glue (which hasn’t even been fully released): they are very rudimentary beginnings of what people are currently building brick by brick. For now, such pipelines are built by Data Engineers, Architects, and other professionals to solve real-life cases.

From the Data Science point of view, new Machine Learning as a Service solutions are popping up like mushrooms after a spring rain. There is a multitude of them on the market: BigML, DataRobot, Azure ML, Amazon ML, IBM Watson, Google Prediction, and so on. But what is Machine Learning as a Service (MLaaS)?

Let’s look at IBM Watson Analytics as an example:

MLaaS means preset wizards for creating Machine Learning models and ways to use those models through an API. For example, suppose we need to recognize text and extract facts and data from it. We start the service in a couple of clicks, submit the original text, and get back an annotated and analyzed copy. In other words, the service commoditizes NLP (Natural Language Processing). The same goes for VTT (Voice-to-Text): we submit an audio file and get back the text of the talk.
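To make the "submit text, get back an annotated copy" flow concrete, here is a minimal Python sketch. It is not the API of IBM Watson or any other vendor: the JSON field names (`text`, `features`, `entities`, `type`) and the `fake_send` stand-in for the network call are assumptions made purely for illustration.

```python
import json
from typing import Callable, List, Tuple

def build_request(text: str) -> str:
    # Hypothetical request shape: the raw text plus the analysis we want.
    return json.dumps({"text": text, "features": ["entities"]})

def parse_response(body: str) -> List[Tuple[str, str]]:
    # Hypothetical response shape: a list of {"text", "type"} entities.
    data = json.loads(body)
    return [(e["text"], e["type"]) for e in data.get("entities", [])]

def analyze(text: str, send: Callable[[str], str]) -> List[Tuple[str, str]]:
    # `send` is whatever delivers the request to the hosted service
    # (in practice an authenticated HTTP POST) and returns the response body.
    return parse_response(send(build_request(text)))

def fake_send(request_body: str) -> str:
    # Stand-in for the remote service so the sketch runs offline.
    # It simply tags every capitalized word as a "Name".
    text = json.loads(request_body)["text"]
    entities = [{"text": w, "type": "Name"} for w in text.split() if w.istitle()]
    return json.dumps({"entities": entities})

if __name__ == "__main__":
    print(analyze("Grammarly checks text", fake_send))
```

The point of the sketch is that the caller never trains or hosts a model: all of the Machine Learning sits behind the `send` call.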

There are other uses of Machine Learning technology:

Machine Learning works closely with data. But it is one thing to have data and another to have actionable data with which you can help users. As seen from the chart above, one of the key trends is the commoditization of data services.

These basic services do what they are supposed to, and this is great: you don’t need to employ a data scientist for some of these tasks. But if you are a company like Grammarly, whose know-how is in text recognition, recommendations, and text analytics, ready-made services are insufficient. Such companies have to take a hands-on approach, which is very expensive. This is their core business, and relying on third-party services is not viable.

So how would you go about using ML Cloud services?

Even with a large number of “as a service” solutions available, the main challenge is navigating them and understanding “what is better,” “what is best,” and “how it all works.” The issue is that such services are constantly changing. The best choice for you may be to use tool-selection consulting services: you need to understand what is on the market right now and which tools best suit your business. This is the most basic but essential help you may need.

Secondly, integrating your operations with these services requires a certain engineering effort. Indeed, most Data Science tasks right now are solved through these cloud “as a service” solutions. But a consulting company with a strong engineering background can help you make better use of such ready-to-use solutions, which deliver value to the business faster. There is no need to reinvent the wheel, especially when your consultant knows what to use and where.

Amazon Machine Learning

If you go to a data science firm where the majority of employees are data scientists, they will create unique, ingenious custom solutions for you, do great research, and so on. This is what they are good at; it is their bread and butter. If they told you to simply plug in a ready-made service, it would look like cannibalizing their own business.

In turn, the goal of an end-to-end consulting company, as your partner, is to bring value to your business. They are willing to take shortcuts by using ready-made services. It is important to note that taking shortcuts alone is not enough: a deep understanding of how these services work is key to providing productive and effective solutions. Such a consultancy should know how to design and build a proper architecture, so that you can rely on them at any time to substitute one service for another when needed. They should also provide custom development work specific to your services so that your business can scale.
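One common way to keep services substitutable is to hide each one behind a small shared interface. The Python sketch below illustrates that idea only; the `TextAnalyzer` interface and both analyzer classes are hypothetical, and the "vendor" class fakes its result locally instead of calling a real cloud API.

```python
from typing import List, Protocol

class TextAnalyzer(Protocol):
    # The only contract the rest of the application depends on.
    def key_phrases(self, text: str) -> List[str]: ...

class VendorAnalyzer:
    """Hypothetical adapter for a ready-made cloud service.

    A real adapter would call the vendor's API here; this one fakes the
    result by returning every word longer than six characters.
    """
    def key_phrases(self, text: str) -> List[str]:
        return [w for w in text.split() if len(w) > 6]

class InHouseAnalyzer:
    """Hypothetical in-house replacement with slightly different behavior:
    it lowercases, de-duplicates, and sorts the phrases."""
    def key_phrases(self, text: str) -> List[str]:
        return sorted({w.lower() for w in text.split() if len(w) > 6})

def summarize(text: str, analyzer: TextAnalyzer) -> str:
    # Application code sees only the interface, so swapping the service
    # means changing which object is passed in, not rewriting this function.
    return ", ".join(analyzer.key_phrases(text))

if __name__ == "__main__":
    text = "cloud services commoditize machine learning"
    print(summarize(text, VendorAnalyzer()))
    print(summarize(text, InHouseAnalyzer()))
```

With this shape, starting on a ready-made service and later moving to an in-house model is a change at the point where the analyzer is constructed, while the calling code stays the same.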

Finally, evolution is key if you plan to grow your business. It is better to start with readily available solutions. In due time you can switch to in-house solutions built with the help of a group of data scientists within this consultancy. And here we come to an interesting point: cases where out-of-the-box solutions do not work from the start. You can read about this issue in our next article.

STAS IVASCHENKO

Senior DevOps Engineer, Data Science Analyst at SQUADEX