Web frameworks in Analytics and Data Science
To make data products fully usable, so that users can act on them, you need to provide an interface that lets users interact with your models and data. Democratising data in an organisation means enabling as many users as possible from outside the Data Science or Analytics teams to self-serve.
Traditionally, Analytics and Data Science teams would be trained in a whole range of algorithms, in relational and NoSQL databases, and in presenting their findings through dashboards. They would usually use tools like Tableau, QlikView or Power BI to surface their work when it is meant for non-technical colleagues. Those reporting tools cannot always meet all the requirements of surfacing data products.
Firstly, these tools have very limited support for the programming languages used in Data Science. They can run data transformation scripts, but they do not let users pass their own parameters to algorithms. End users cannot trigger or schedule model runs from the interface, or adjust the parameters.
Secondly, they are optimised for processes that are less computationally expensive than training models or scoring large datasets. Machine Learning routines can often take a considerable amount of time, and they require user interfaces that allow seamless use of processes that take an hour or more.
Thirdly, the models’ output often feeds into other tools. We need to be able to send POST requests and write to databases. From the interface standpoint, this requires multi-stage workflows that the reporting tools do not currently support.
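As a rough illustration of those two outbound steps, the sketch below writes some model scores to a database and prepares a JSON POST request for a downstream tool. It uses only the Python standard library. The table name, the `customer_id`/`score` columns, the helper names and the `example.invalid` URL are all placeholders invented for this example, and the request is built but deliberately not sent.

```python
import json
import sqlite3
import urllib.request


def save_scores(conn, scores):
    """Write {customer_id: score} pairs to a table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores "
        "(customer_id TEXT PRIMARY KEY, score REAL)"
    )
    # INSERT OR REPLACE keeps one row per customer when a model is re-run.
    conn.executemany(
        "INSERT OR REPLACE INTO scores (customer_id, score) VALUES (?, ?)",
        list(scores.items()),
    )
    conn.commit()


def build_post_request(url, scores):
    """Prepare a JSON POST request with the scores; sending is left to the caller."""
    body = json.dumps({"scores": scores}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    scores = {"c-001": 0.91, "c-002": 0.12}  # made-up model output

    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    save_scores(conn, scores)
    print(conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0])

    request = build_post_request("https://example.invalid/api/scores", scores)
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would send it; omitted because the URL
    # is a placeholder.
```

In a multi-stage workflow, each of these would typically be a separate step that the user confirms in the interface before anything leaves the app.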
Fortunately, modern web frameworks address those problems very well and the advent of Cloud Computing makes it easier than ever to deploy custom app-based solutions.
When to use web-apps?
The purpose of this article is not to say that custom web apps are better than off-the-shelf reporting tools. Both approaches have their pros and cons, and in most cases you will probably be able to satisfy your requirements using just the reporting tools.
If you want to build a simple dashboard or report that does not go beyond what can be built with a reporting tool you already use and have a licence for, do not reinvent the wheel. Use the simpler solution.
Make sure the hype around something more sophisticated does not win over a pragmatic approach. Web apps bring more flexibility and are a more powerful way of visualising data, but this comes at the cost of development time.
On the other hand, when you need to build a tool that requires a greater level of abstraction, sends data out to other systems, or performs more sophisticated computations or visualisation, a custom web app would be a great choice.
Use cases for web apps in Analytics include:
- Multi-stage analytical frameworks
- Reports utilising on-the-go machine learning
- Automation and creating feeds to other tools
- ETL pipelines for ad-hoc data loads
- Custom A/B testing and campaign orchestration platforms
Data Science Stack
Python vs R
Currently the most popular languages for Data Science are Python and R. R is used solely for statistical and Data Science purposes. Python has a far wider range of applications, so to gauge its popularity in Data Science alone we can use Jupyter Notebook, which is primarily used for Python’s Data Science applications, as a proxy (chart below).
Looking at the chart, it is very clear that Python’s popularity relative to other languages used in the field is growing, while R is gradually becoming a niche language. I am writing this without any emotional attachment, as I have used both languages extensively and both have their pros and cons.
From the web development standpoint, it is worth mentioning that the use of JavaScript in data science is increasing, with libraries like TensorFlow now available in that language, enabling in-browser Machine Learning.
Web frameworks
Below is a list of some R and Python frameworks used in Data Science and Analytics. While reporting frameworks are a cost-efficient alternative to Tableau-like tools, web frameworks like Flask and Django, together with API-building tools, are the ones that can fully leverage Python’s functionality.
- Reporting: Plotly, Bokeh, Streamlit (Python), Shiny (R)
- Full Web Frameworks: Flask, Django (Python)
- Back End API: Falcon, FastAPI (Python), Plumber (R)
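To give a feel for what the back-end API category does, here is a minimal sketch of an endpoint that accepts user-supplied parameters and returns a model result as JSON. It is deliberately not written against any of the frameworks listed above: it uses the plain WSGI interface from the Python standard library so that it runs without installing anything. The toy `forecast` function, its `horizon` and `growth` parameters and the `call` helper are assumptions made up for the example. In Flask, FastAPI or Falcon the same idea would be expressed as a route handler.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def forecast(horizon, growth):
    """Toy 'model' (hypothetical): compound a baseline of 100 over `horizon` periods."""
    return [round(100 * (1 + growth) ** t, 2) for t in range(1, horizon + 1)]


def app(environ, start_response):
    """WSGI endpoint: a query string such as ?horizon=3&growth=0.1 returns JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        # Parse and validate the parameters supplied by the user.
        horizon = int(params.get("horizon", ["3"])[0])
        growth = float(params.get("growth", ["0.05"])[0])
        if not 1 <= horizon <= 36:
            raise ValueError("horizon must be between 1 and 36")
    except ValueError as exc:
        body = json.dumps({"error": str(exc)}).encode("utf-8")
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [body]
    body = json.dumps({"forecast": forecast(horizon, growth)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def call(query):
    """Invoke the app in-process, roughly the way a test client would."""
    environ = {"QUERY_STRING": query}
    setup_testing_defaults(environ)  # fills in the remaining WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


if __name__ == "__main__":
    print(call("horizon=2&growth=0.1"))
    print(call("horizon=abc"))
```

To try it in a browser, the same `app` object can be served locally with `wsgiref.simple_server.make_server`. A front end would then call the endpoint with whatever parameters the user chose in the interface.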
Even though R has some really useful and robust frameworks, Shiny being a great example, Python dominates its counterpart in the field of web development.
Though it is enough to know Python and R to develop a simple app, adding a modern design requires knowledge of HTML, CSS, JavaScript and a modern web development framework. Those skills, unfortunately, are not as common among data professionals, so developing more complex apps often requires additions to the team’s skill set.
Cloud deployment
Cloud computing is a really big trend in Data Science, as it enables the use of more powerful machines and more efficient modes of data storage, and offers scalability and robustness.
Since the data used by apps is very often already in the cloud, deployment becomes a real pleasure. Containerisation and ease of deployment massively reduce the barriers to entry to web-app-based solutions for data scientists, who usually do not come from the web development world.
Building capability
Web applications can bring great value to an analytical team, but they come at some cost.
As mentioned earlier, a traditional Data Science or Analytics team does not usually have a front end or web developer skill set. Though many businesses have in-house developers, they usually sit in product-related teams rather than in Analytics. Requesting resource from another team often proves difficult.
The learning curve for dashboarding/reporting frameworks like Streamlit or Dash is quite gentle, but asking a Data Scientist to learn front end JavaScript, CSS and the secrets of the DOM might not be the best use of their time, given how broad the topic is.
A cost-effective solution would be to have Analysts and Data Scientists do the back end modelling and API work, while hiring a Front End developer to work on the front end.
You might also consider using a specialised agency.
Summary
As Data Science is being used in a growing number of applications, there will be an increasing need to provide machine learning on the go to internal non-technical clients through custom UIs, as a natural extension of the self-serve tools already available.
Machine Learning is now standard, and a strong DS team is something expected of every business. This makes it more important to innovate and look for new ways of increasing the usage of models in businesses. One of them is democratising the ML process and giving end users the ability to do ML on the go.
The area is still not sufficiently explored and there is a lot of room for specialised developers and innovative SaaS businesses who have ideas on making the development of user interfaces for data products more seamless.
For more, visit my website www.clusteroneanalytics.com, or follow me on Twitter or LinkedIn.