When the Gotham City of Data Scientists Needs You!

Sachin Tripathi
5 min read, Jul 27, 2019


A few years after the events of The Dark Knight, Gotham is in a state of peace. Batman has not been seen since the night of Dent’s death. Jim Gordon has nearly eradicated violent and organised crime. However, he still feels guilty about the cover-up of Harvey Dent’s crimes as Two-Face.

Wait, wait, is this the plot of “The Dark Knight Rises”?

I am not sure. You decide for yourself; let me introduce the cast!

Harvey Dent:

In the Gotham of Data Scientists, Harvey Dent was the city’s amiable and courteous district attorney and one of Batman’s strongest allies, until he was given all the power, attention and trust. That was when Data Scientists cared only about how to make Dent happy, i.e. “ways to improve the accuracy of their models”.

State-of-the-art machine learning (ML) is advancing so rapidly that data scientists focus primarily on short-term goals like model precision.

Jim Gordon:

Gordon is the police commissioner of the Gotham City Police Department. He wanted to protect Gotham, “the land of Data Scientists”, even at the cost of his life. But something happened in The Dark Knight, and he still feels guilty about the cover-up of Harvey Dent’s crimes.

He is also known by the name “online courses”, and he is guilty of hiding the actual truth.

Are you ready to meet our Hero?

Batman:

Photo by Umanoide on Unsplash

Data Scientists realize pretty soon that the majority of data science work involves getting data into the format the model needs and deploying the model; the model being developed is just one part of an application for the end user.

Not the hero we deserved,

but the hero we needed.

They need their Batman back, who is also known as “DevOps”.

Some common DevOps functions include:
- Integration
- Testing
- Packaging
- Deployment

I will try to cover the deployment part. This is part 1 of the Deployment series; I will cover different aspects of deployment (such as containerisation with Docker, Kubernetes and CI/CD) in the coming blogs, so stay tuned!

The rest of this blog explains why you should bother learning Docker!

Why Docker:

Docker was ranked #1 “Most Loved Platform”, #2 “Most Wanted Platform” and #3 “Platform In Use” in the Stack Overflow developer survey.

Docker describes itself as the only independent container platform that enables organizations to seamlessly build, share and run any application, anywhere, from hybrid cloud to the edge.

To understand this in simple terms, first understand what a container is:

Like their real-world counterparts, these containers are used to hold stuff and give that stuff an isolated environment. They are also super portable.

In a virtual container we put our application together with all its dependencies, which settles the well-known rivalry between Developer and Tester when the code doesn’t run in another environment.

I will cover this in more detail in the next blog.

These are the main reasons why every Data Scientist should know about Docker:

1- Separate environments across models:

You can put different models in different containers, so each model gets its own isolated environment. If one model needs a different environment setting, this solves your issue.

2- Different library requirements across models:

Maybe one of your models needs TF 1.4.0 and another demands TF 2.0; you can put them in different containers. Both work independently and properly.
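To make the point concrete, here is a minimal sketch of a startup check a model could run inside its own container to confirm it got the library version it expects. The helper names `version_matches` and `installed_version_ok` are hypothetical, and the package name and version in the example call are only illustrations of the TF case above, not a recommendation.

```python
from importlib import metadata  # standard library, Python 3.8+


def version_matches(installed: str, required: str) -> bool:
    """True if `installed` equals `required` or is a patch release of it.

    Example: "2.0.1" matches "2.0", but "2.10.0" does not match "2.1".
    """
    return installed == required or installed.startswith(required + ".")


def installed_version_ok(package: str, required: str) -> bool:
    """Check the version of `package` installed in the current environment."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment at all.
        return False
    return version_matches(installed, required)


if __name__ == "__main__":
    # Hypothetical check for the container that is meant to ship TF 2.0.
    ok = installed_version_ok("tensorflow", "2.0")
    print("tensorflow 2.0 available:", ok)
```

Each container would run the same check with its own expected version, so a mismatch shows up at startup instead of halfway through a prediction.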

3- Different resource requirements across models:

Let’s say one of your models needs GPUs and the rest can work with CPUs; you have the option to add GPUs for specific containers.

4- Scaling of a particular instance:

Kubernetes is mainly used for container management, so if some of your instances need to be scaled up, it can be used for that.

5- Flask’s built-in server is not suitable for production:

If you are planning to serve ML models as a RESTful API using Flask, understand that Flask’s built-in server is not suitable for production (https://flask.palletsprojects.com/en/1.0.x/deploying/#deployment).
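A Flask application is a WSGI application, which is why it can be handed to a production WSGI server instead of the built-in one. The sketch below is not Flask itself: to stay dependency-free it uses a plain WSGI callable from the standard library, and the `predict` function, the `/predict` route and the JSON shape are all assumptions made up for illustration.

```python
import json
from wsgiref.simple_server import make_server


def predict(features):
    """Hypothetical stand-in for a real model: returns the sum of the inputs."""
    return sum(features)


def app(environ, start_response):
    """WSGI callable: POST /predict with a JSON body like {"features": [1, 2]}."""
    json_header = [("Content-Type", "application/json")]
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", json_header)
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        result = predict(payload["features"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing "features" key, or non-numeric inputs.
        start_response("400 Bad Request", json_header)
        return [b'{"error": "bad request"}']
    body = json.dumps({"prediction": result}).encode("utf-8")
    start_response("200 OK", json_header + [("Content-Length", str(len(body)))])
    return [body]


def run_dev_server(port=8000):
    """Local testing only: wsgiref's server is, like Flask's, a development server."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Inside a container, the same `app` object would typically be served by a production WSGI server such as Gunicorn, for example `gunicorn -b 0.0.0.0:8000 model_api:app`, where `model_api` is a hypothetical module name for the file above.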

6- If you don’t want to screw up your system installing CUDA:

If you have ever tried working with GPUs, you can understand the pain.

7- When the application is in a different language:

Say you develop ML models in Python, but the application is written in Go. You can expose the ML model to the app through Docker.

Hopefully this overview has helped you better understand why DevOps is really important and has also given you a brief introduction to the Docker landscape.

If you found this helpful please share it on your favorite social media so other people can find it, too. 👏

I write about Python, Docker, data science, life lessons and more. If any of that’s of interest to you, read more here and follow me on LinkedIn and YouTube, where I try to help budding Data Scientists get their first gig.
