Web Frameworks for Data Scientists, and Why You Should Care.

Matthew Yates
Published in The Startup
5 min read · Sep 7, 2020

Some motivation for data scientists to learn a web framework.

Introduction

I want to start by noting that this is an opinion article, so take it with a grain of salt. At the very least, I hope it gets people thinking. With that, on with the show!

Full Stack Python has a fantastic article on web frameworks. Probably the most powerful quote is:

“Web frameworks encapsulate what developers have learned over the past twenty years while programming sites and applications for the web. Frameworks make it easier to reuse code for common HTTP operations and to structure projects so other developers with knowledge of the framework can quickly build and maintain the application.”

Now that we know what a web framework is, why should Data Scientists know more?

The Trouble with Implementing Models

“I’m a data scientist. Why should I learn a web framework?”

While it’s great to build models, they’re no good to anyone if you can’t get them into production. Companies like VentureBeat, Redapt, and others report that ~90% of machine learning projects don’t make it to production. Why is this true? Let’s first take a look at what kind of team is required for a machine learning project.

Machine Learning Team Breakdown

Image created by the author in MS paint

Machine learning projects need 4 teams to succeed: Data Scientists, Application\Web Developers, Data Engineers, and MLOps\DevOps. If one of those teams is missing, then you have people wearing multiple hats (i.e. playing multiple roles).

So let’s think about where the team could potentially use reinforcement.

Data Scientists can’t do anything without data to analyze, so their relationship with Data Engineers should be strong (if not, then you’re really in trouble). The same goes for Data Engineers and Application\Web Developers: Data Engineers can hardly do their job if they can’t get data from front-end systems. So the right side of the Venn diagram is generally pretty strong. If it is not, it seems logical to assume your company’s business intelligence program is in its infancy. So you’re either in a development phase, or you’re really in trouble.

The middle and left sides of the Venn diagram could be common pain points. In general, Data Engineering and BI have been around for a while. Data Science is an old job (e.g. statistician, statistical modeler, predictive analyst, etc.) with a new title. So the flow of data from an application, to data engineer, to analyst is a flow that has been around for decades now. But MLOps (the merger of DevOps and machine learning) is not a particularly mature field. And so the connection between data scientist and application developer could be weak. Below is an updated Venn diagram to convey my point.

Image created by the author in MS paint

So how do we overcome these problems? We need our application\web developers and data scientists to develop a stronger relationship. This is not the whole solution, but it is an important part of it.

Now let’s list out some potential business reasons a machine learning project could fail (not necessarily all-inclusive, but should hit the more common issues).

Non-Technical Issues

  • Confidence — E.g., the company lacks confidence (in the product, or team, or somewhere in the pipeline) and is hesitant to pull the trigger on ML implementations.
  • Communication — E.g., lack of coordination between different teams programming in different languages and using different tools.

Technical Issues

  • Technical Talent & Tools — E.g., the company struggles to understand the technical requirements for implementing ML models.

With all this in mind, it sounds like companies could fall into 3 buckets:

MLOps Maturity Segments

  • Mature and flexible MLOps — in which case the issue is more likely business-related (non-technical).
  • Mature MLOps but inflexible — most likely your company built a strong process for a limited number of projects that don’t work for anything else (technical issue).
  • Immature MLOps — little to no experience in MLOps (just starting).

If you fall into bullet #2 (Mature MLOps but inflexible), then you’re having technical issues, and in this case it helps to have your data scientists meet your application\web developers halfway. Side note: this can also help with non-technical issues like communication. If your data scientists are more familiar with web frameworks and application development, they have more power to prototype and develop a strong app foundation. This gives the business more comfort in saying “yes, let’s invest more in that”. Over time you can build on this momentum to achieve higher MLOps flexibility. For more information on prototyping, you can check out my Medium article From Prototype to Production for Data Scientists.

Implementing a Model

“So I get that having a strong relationship with your application\web developers seems important, but I still don’t understand why I should specifically learn web frameworks?”

Okay, let’s list some machine learning implementation options (not all-inclusive):

  • A dashboard — dependent on web frameworks
  • A REST API — dependent on web frameworks
  • A batch job
  • Recode the model into the production system

The 2 options requiring web frameworks are 2 of the more powerful implementation options (next to batch jobs). Let’s take a look at a REST API. Let’s say I have a model and I wrap it in a REST API. Other programs can then send my REST API some data, which my model scores before returning a result. With one service I can serve multiple applications!
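To make the REST API idea a little more concrete, here is a minimal sketch in Flask. It rests on several assumptions of mine: the "model" is a hypothetical fixed linear scorer standing in for a real trained model, and the feature names (`age`, `income`), the `/score` route, and the port are made up for illustration. A real service would load a fitted model and add validation, logging, and authentication.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for a trained model: a fixed linear scorer.
# In practice you would load a fitted model here instead.
WEIGHTS = {"age": 0.5, "income": 0.25}
INTERCEPT = 1.0


def score(features):
    """Score one record given as a dict of numeric features."""
    return INTERCEPT + sum(w * float(features[name]) for name, w in WEIGHTS.items())


@app.route("/score", methods=["POST"])
def score_endpoint():
    # Parse the JSON body; silent=True yields None instead of raising on bad input.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict) or not all(name in payload for name in WEIGHTS):
        return jsonify(error="expected JSON with keys: " + ", ".join(WEIGHTS)), 400
    try:
        result = score(payload)
    except (TypeError, ValueError):
        # A feature value could not be converted to a number.
        return jsonify(error="feature values must be numeric"), 400
    return jsonify(score=result)


if __name__ == "__main__":
    # Flask's built-in server is meant for development, not production.
    app.run(port=5000)
```

Any program that can send an HTTP POST with a JSON body (a dashboard, another service, a scheduled script) could call this one endpoint and get a score back. Flask's built-in test client, `app.test_client()`, lets you try the route without starting a server.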

So in short, web frameworks are a powerful way to implement a machine learning model.

I would go a step further to say that it’s also important to understand how other companies implement models. Conferences and podcasts are a great way to study up on infrastructures and architectures that other companies are using. (In some companies, the data science team might own the deployment of the model itself, while IT focuses on the full architecture. But one step at a time.)

Popular Web Frameworks for Data Scientists

Here are some popular frameworks that I’ve personally seen in the industry:

  • Flask (Python)
  • Django (Python)
  • RShiny (R)
  • plumber (R)

Flask is a very simple web framework that is easy to learn and great for REST APIs. Django is a bit more involved, but comes with a lot of power and is a popular framework among web developers. RShiny and plumber are very popular among R programmers.

If you come from a Python background and you want to learn more, Udemy has some great courses that teach Flask and Django. I personally recommend getting started in Flask.

The End!

Thanks for reading and hope you find this helpful! Happy coding and happy modeling!
