DeepOps for Business: Building an AI-First Company

ODSC - Open Data Science
4 min read · Aug 11, 2020

Enterprises and large companies like Facebook have had AI-first capability for years, but it’s only recently that small businesses could make the transition. Yuval Greenfield of missinglink.ai has developed a ten-point checklist for companies wanting to make the change, giving them both AI capability and the chance to attract top talent by working on these hot projects. Let’s take a look at his talk “DeepOps: Building an AI-First Company” for ODSC’s Accelerate AI.

What Data Scientists Want

Data scientists want to build state-of-the-art models through an iterative process. Neural network developers move the needle through experiments, and through massive trial and error, data scientists refine those models and drive innovation.

Unfortunately, that’s not where most data scientists spend their time. Companies want them focused on the models, but in reality much of their day goes to chores like tracking expenses and keeping machine usage efficient.

These are back-end problems. At many companies, data scientists work with primitive tools instead of cutting-edge AI tooling. As companies transition to AI-driven solutions, time to market becomes a marker of success, which makes those “expensive” tools all the more necessary.

So What’s the Problem?

A company has a lot of machines for running code and moving data, and that creates a processing problem. Some data scientists run machines at their desks on a cobbled-together on-premise setup; others use legacy cloud systems, flipping between on-premise hardware and scaled cloud resources.

Keeping things consistent is tricky. Getting creative can go well, or it can cause a slowdown that blocks the pipeline just when efficiency is critical. Either way, you’re wasting both talent and time.

Wouldn’t it be great if you could automate those tasks? If it involved just one button to launch?

The Risks of Experimentation

Data scientists tweak and tweak again, retrying models without committing because nobody cares about the throwaway changes. Everything is fine until one critical tweak produces a real quality improvement, and now there is no record or systematic documentation of how it happened.

A further risk is turnover. The average tenure for a data scientist is around two years, and when your data scientist leaves, you lose a wealth of historical knowledge, not to mention a quality collaborator for future work.

Sharing notes is vital to collaboration, but done by hand it quickly becomes bloated. Automating those notes is what strikes the right balance between diligence and tedium.
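
As a rough sketch of what automated notes could look like (nothing here is from the talk; the helper and file layout are hypothetical), every run records its code version, parameters, and results without the data scientist lifting a finger:

```python
import json
import subprocess
import time
from pathlib import Path

LOG_DIR = Path("experiment_logs")  # hypothetical home for the automated notes


def log_run(params: dict, metrics: dict) -> Path:
    """Record one experiment: code version, parameters, and results."""
    # Capture the current git commit so the run traces back to the code,
    # even when the data scientist never made a dedicated commit for the tweak.
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip() or "uncommitted"
    except OSError:
        commit = "unknown"

    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "commit": commit,
        "params": params,
        "metrics": metrics,
    }
    LOG_DIR.mkdir(exist_ok=True)
    out = LOG_DIR / f"run_{int(time.time())}.json"
    out.write_text(json.dumps(record, indent=2))
    return out


# Even the "garbage" tweaks get a record, so the one that finally works is never lost.
log_run(params={"lr": 3e-4, "batch_size": 64}, metrics={"val_accuracy": 0.87})
```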

Building the Folder

In a typical setup, a folder holds a pile of data files with a bit of metadata. To try a new architecture, you drag a few files out of the primary folder and run the experiment against that copy. The folder system breaks down, though, if the folder gets deleted or if you want to run many experiments at once.
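
In code, that folder workflow might look like the sketch below (the paths and names are invented for illustration): copy a slice of the primary folder into a per-experiment folder and run against the copy. Everything hinges on those folders surviving.

```python
import shutil
from pathlib import Path

PRIMARY = Path("data/primary")          # the big shared folder of raw files
EXPERIMENTS = Path("data/experiments")  # hypothetical per-experiment copies


def prepare_experiment(name: str, files: list[str]) -> Path:
    """Drag a few files out of the primary folder into a dedicated experiment folder."""
    exp_dir = EXPERIMENTS / name
    exp_dir.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy(PRIMARY / f, exp_dir / f)
    return exp_dir


# Works for one quick experiment...
prepare_experiment("new_architecture_v1", ["train.csv", "labels.csv"])
# ...but the record of what was used lives only in the folder itself: delete it,
# or try to juggle dozens of experiments this way, and the approach falls apart.
```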

Using a Database

A database solves part of the problem, but databases aren’t suitable for every kind of information. If you run a query today and the same query a month later, you may not get the same results because the underlying data has changed.

And if you decide to pull some of the data back into a folder after all, you’re now managing two data sources. So is it worth the hassle to version control your data?
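
One hedged way to picture what versioning data can buy you (a sketch, not any specific product’s API): snapshot the result of a query together with a content hash, so the same query a month later can be checked against, or re-read from, the frozen copy.

```python
import hashlib
import sqlite3
from pathlib import Path

SNAPSHOTS = Path("query_snapshots")  # hypothetical location for frozen query results


def snapshot_query(db_path: str, query: str) -> Path:
    """Run a query and freeze its result so the experiment stays reproducible."""
    rows = sqlite3.connect(db_path).execute(query).fetchall()
    payload = repr(rows).encode()

    # A content hash identifies this exact result set; if the database changes,
    # rerunning the query next month will produce a different hash.
    digest = hashlib.sha256(payload).hexdigest()[:12]

    SNAPSHOTS.mkdir(exist_ok=True)
    out = SNAPSHOTS / f"snapshot_{digest}.txt"
    out.write_bytes(payload)
    return out


# Example (assumes a local SQLite file; any SQL source works the same way):
# snapshot_query("warehouse.db", "SELECT image_path, label FROM training_samples")
```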

Complicating the Issue

Most companies only version control the code, not the model or the data. That makes critical questions hard to answer: when things fall apart, your scientists end up hunting for which data and which model produced which result.
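
One hedged way to close that gap (the file names and manifest format below are invented for illustration) is to commit a small manifest alongside the code that pins the exact data files and model weights a run used, so code, data, and model are versioned together:

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: str) -> str:
    """Hash an artifact (dataset, weights) so its exact version is pinned."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_manifest(data_files: list[str], model_file: str,
                   out: str = "run_manifest.json") -> None:
    # The manifest is tiny, so it can live in git next to the code, even though
    # the datasets and model weights themselves are far too large to commit.
    manifest = {
        "data": {f: fingerprint(f) for f in data_files},
        "model": {model_file: fingerprint(model_file)},
    }
    Path(out).write_text(json.dumps(manifest, indent=2))


# Example (file names are placeholders):
# write_manifest(["data/train.csv", "data/val.csv"], "models/resnet_v3.pt")
```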

The DeepOps Answer

What if we could take everything we’ve learned in DevOps and apply it to questions like these to transform the way we think of version control and data experimentation? If you can’t reproduce your results, the evolution of your product is lost.

As our understanding of deployment changes in the face of continuous intelligence and continuous development, companies must accept that shipping changes can happen in minutes rather than months or years.

Counterintuitively, a 2018 State of DevOps report found that companies taking an hour or less to go from commit to production have a failure rate of less than ten percent, while companies taking one to six months see that number jump to over 50 percent.

Faster and more reliable? It seems too good to be true. But collaboration between developers and ops teams is what makes this possible: a better balance between the two teams enables continuous development that pairs the infrastructure of ops with the innovation of development.

A Culture of Development

The biggest key to AI transformation is creating a culture of responsibility throughout the pipeline, where each person holds the keys to innovation, testing, and production. There are four key building blocks:

  • Version control
  • Test
  • Automate
  • Monitor

These core principles made faster innovation pipelines possible in software, and deep learning can adopt the same principles, giving data scientists a more focused target.
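
As a concrete, hedged example of the “test” block applied to deep learning (PyTorch, the model, and the threshold here are illustrative, not from the talk): an automated smoke test that confirms a model can still learn on a tiny batch before any larger experiment is queued or monitored.

```python
import torch
from torch import nn


def test_model_can_overfit_tiny_batch():
    """Smoke test: the model should drive loss down on a tiny batch."""
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(8, 16)             # a tiny synthetic batch
    y = torch.randint(0, 2, (8,))

    first_loss = None
    for _ in range(50):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        if first_loss is None:
            first_loss = loss.item()

    # If the loss never moves, something in the pipeline is broken.
    assert loss.item() < first_loss, "model failed to learn on a tiny batch"


test_model_can_overfit_tiny_batch()
```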

Greenfield believes companies should hire the right people: they must invest in the engineers and data scientists who will build these solutions.

You’ll also need to invest in methodologies that work. Blending white-box and black-box solutions puts your data scientists back on growth-producing activities, neither burdening them with mundane tasks nor risking poor, unexplained rollouts.

Greenfield’s DeepOps for Business Checklist:

Automating Documentation:

  • Code
  • Params
  • Results
  • Compare

Data:

  • Version Data
  • Query Data
  • Stream Data

Actions:

  • One-click launch
  • Job queue
  • Cost/speed knobs
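
To picture the “Actions” items (a sketch only; the queue directory and tier names are made up for illustration), a one-click launch can be a single function that packages the run, applies a cost/speed knob, and drops the job on a queue:

```python
import json
import time
from pathlib import Path

QUEUE = Path("job_queue")  # hypothetical on-disk queue that training workers poll

# Illustrative cost/speed knob: each tier maps to a machine profile.
TIERS = {
    "cheap": {"gpus": 0, "spot": True},
    "balanced": {"gpus": 1, "spot": True},
    "fast": {"gpus": 4, "spot": False},
}


def launch(experiment: str, params: dict, tier: str = "balanced") -> Path:
    """The 'one click': package everything a run needs and enqueue it."""
    QUEUE.mkdir(exist_ok=True)
    job = {
        "experiment": experiment,
        "params": params,
        "resources": TIERS[tier],
        "submitted": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    job_file = QUEUE / f"{experiment}_{int(time.time())}.json"
    job_file.write_text(json.dumps(job, indent=2))
    return job_file


# One click, one call: workers pick jobs off the queue in submission order.
launch("resnet_sweep", {"lr": 1e-3, "epochs": 20}, tier="fast")
```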

Checking these should help small companies make the transition to an AI-first culture.

Original post here.

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday.
