Data Science Mystery: How to Move from “Death in Dev” to “Prove in Prod”?

Apr 13, 2020

According to Algorithmia’s 2020 State of Enterprise Machine Learning study, machine learning maturity in the enterprise is generally increasing, yet half of the companies surveyed (50%) spend between 8 and 90 days deploying a single machine learning model, and 18% take longer than 90 days. Most blame failure to scale (33%), followed by model reproducibility challenges (32%) and lack of executive buy-in (26%).

Much of the work done in data science dies in dev without ever being promoted to production, for the following reasons:

Lack of Data Science Skills
Lack of an environment that meets the needs of a Data Science project
Lack of Model explainability
Long turnaround time from business requirement to model evaluation
It’s a world of open source: who takes responsibility for maintenance, upgrades, and fixes?

Gartner reported in January that AI implementation grew a whopping 270% in the past four years and 37% in the past year alone. And according to the McKinsey Global Institute, the subsequent labor market shifts will result in a 1.2% increase in gross domestic product (GDP) growth for the next 10 years and help capture an additional 20% to 25% in net economic benefits ($13 trillion globally) in the next 12 years.

Data science clearly has a future, which is why these problems need to be addressed, and addressed the right way.

Let us look at the Data Science Lifecycle.

The CRISP-DM model (Cross Industry Standard Process for Data Mining) has traditionally defined six steps in the data mining life cycle. The Data Science life cycle incorporates all six of these steps and more.

The CRISP-DM model steps are:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

What are the two additional steps in a Data Science Life Cycle?

MLOps:

7. Monitoring: Drift/Bias Detection
8. Feedback: Real-time De-biasing and Model Tuning
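To make the monitoring step concrete, here is a minimal sketch of one common way to flag drift in a single numeric feature: the Population Stability Index (PSI). The article does not name a drift metric, so PSI, the function name `population_stability_index`, the bin count, and the 0.2 alert threshold (a commonly quoted rule of thumb) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a training-time sample and a production sample of one feature."""
    lo, hi = min(expected), max(expected)
    # Equal-width buckets over the training range; fall back to 1.0 if the range is zero.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the training range are clamped into the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bucket is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Illustrative data only: a training sample and a shifted "production" sample.
train = [i / 100 for i in range(100)]
prod = [v + 0.5 for v in train]

psi = population_stability_index(train, prod)
if psi > 0.2:  # assumed alert threshold
    print("Feature has drifted; consider retraining or investigating upstream data.")
```

A check like this would run on a schedule against recent scoring requests, and its result is what feeds step 8: a drift alert is the trigger for re-tuning or retraining.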

Do enterprises just look for a platform that covers all 8 of these steps?

Yes in terms of process, but they expect more in terms of capabilities.

What are these additional capabilities?

9. A platform that bridges the gap between citizen Data Scientists and experts (AutoML, data prep recommendations, etc.)
10. Explainable models, not just locally but also globally (going beyond the local explanations that LIME/SHAP provide)
11. Native Big Data execution environment (Apache Spark is a good example)
12. Scalable and affordable infrastructure (money always matters in Data Science)
13. Model portability: host anywhere, with no vendor lock-in (because of the multi-cloud world we deal with)
14. Governance and access control

Based on this, can we consider the following as KPIs for a Data Science Platform?

Oracle Cloud Infrastructure (OCI) Data Science

OCI Data Science is a collaborative, scalable, and powerful Data Science platform that provides the following:

Scalable Infrastructure
Powerful and diverse compute (Intel Xeon, AMD, NVIDIA Tesla Pascal/Volta GPU)
Easy Environment Setup
Collaborative Workspace/Shared Environment
JupyterLab IDE
IAM-based Access Control + OCI Governance Capabilities
Model Catalog
Transparent Pricing: you are charged only for the compute and storage you use, and can turn resources on or off as required

And most importantly, a homegrown SDK that is provided free: the Accelerated Data Science (ADS) SDK.

Accelerated Data Science (ADS) SDK

The ADS SDK helps Data Science teams innovate faster. It provides capabilities for:

a. Data Connection (Oracle DB, Autonomous DB, MySQL, Object Storage, AWS S3, SQLite, etc.)

b. Data Manipulation (Profiling, Correlations, Feature Selection, Recommendations, etc.)

c. Native Dask support (to learn more about Dask, see https://towardsdatascience.com/why-every-data-scientist-should-use-dask-81b2b850e15b)

d. ML Framework Support (TensorFlow, Keras, XGBoost, scikit-learn, etc.)

e. AutoML

f. Model Evaluation

g. Model Explanation (Oracle MLX — Global & Local)

Now we know what OCI Data Science is capable of. Let us look at the journey from development to production in OCI Data Science.

The journey from Dev to Prod — OCI Data Science

OCI Functions: Oracle Functions is a fully managed, multi-tenant, highly scalable, on-demand, Functions-as-a-Service platform. It is built on enterprise-grade Oracle Cloud Infrastructure and powered by the Fn Project open-source engine. This helps us achieve model portability, since the function artifacts can be ported to any other Functions-as-a-Service provider that is powered by the Fn Project.

OCI API Gateway: The API Gateway service enables you to publish APIs with private endpoints that are accessible from within your network, and which you can expose with public IP addresses if you want them to accept internet traffic.
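The portability argument for serving a model as a function is easier to see in code. The sketch below is not the OCI Functions or Fn Project API. It is a hypothetical, platform-neutral scoring entry point (`handle`, JSON bytes in and JSON bytes out) of the kind a thin provider-specific adapter could wrap. The model, its weights, and the feature names are invented for illustration and stand in for a trained artifact that would normally be loaded from a model catalog.

```python
import json
import math

# Hypothetical model: fixed coefficients standing in for a trained artifact.
WEIGHTS = {"tenure_months": -0.03, "support_tickets": 0.4}
BIAS = -1.0


def predict(features: dict) -> float:
    """Logistic score computed from the hypothetical coefficients above."""
    z = BIAS + sum(WEIGHTS[name] * float(features[name]) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def handle(body: bytes) -> bytes:
    """Platform-neutral entry point: JSON request body in, JSON response body out."""
    try:
        features = json.loads(body)
        score = predict(features)
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON or missing/invalid features become an error payload.
        return json.dumps({"error": str(exc)}).encode()
    return json.dumps({"score": round(score, 4)}).encode()


# Example call, as a function runtime or gateway adapter might make it.
print(handle(b'{"tenure_months": 12, "support_tickets": 3}').decode())
```

Because `handle` depends only on the standard library and on the model itself, the only code that changes when moving between function providers is the small adapter that passes the request body in and returns the response.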

How does OCI Data Science help in moving from “Death in Dev” to “Prove in Prod”?

First, let us see how OCI Data Science maps to the Data Science Platform KPIs.

OCI Data Science helps in

a. Reducing Cost

b. Access to more data

c. Reducing Time

d. Increased Security

e. Increased Flexibility

f. Increased Trust

and thereby helps reduce “Death in Dev” and makes way for you to “Prove in Prod”.

Welcome to the world of Data Science done right!

The views expressed are those of the author and not necessarily those of Oracle. Contact Deepak Sekar

Additional Resources

Migration to the Cloud Made Simple (www.oracle.com)

https://www.oracle.com/a/ocom/docs/cloud/oracle-cloud-infrastructure-platform-overview-wp.pdf

Cloud Compute (www.oracle.com)

Cloud Infrastructure for Data Science | Oracle (www.oracle.com)

Data Science (docs.cloud.oracle.com)
Oracle Data Science is a platform for data scientists to build, train, and manage models on Oracle Cloud Infrastructure…

Oracle Accelerated Data Science SDK (ADS), ADS 1.0.0 documentation (docs.cloud.oracle.com)
The Oracle Accelerated Data Science (ADS) SDK is a Python library that is…

Overview of Functions (docs.cloud.oracle.com)
Oracle Functions is a fully managed, multi-tenant, highly scalable, on-demand, Functions-as-a-Service platform. It is…

Written by Deepak Sekar