An average Sunday edition of The New York Times contains more information than a Renaissance-era person had access to in an entire lifetime. We keep getting better at collecting data, but we lag in what we can do with it. Plenty of data is out there, yet it is not being used to its full potential because it is not being visualized as well as it could be.
As legend has it, more than 90% of data science projects never see the light of day, and ML models die slow deaths inside Jupyter notebooks. The absence of a sweet spot where data scientists and business stakeholders intersect prevents two-way communication, and the "we will get back to you" meeting never materializes. What if the meeting instead started with good data-based storytelling, with stakeholders asking follow-up questions and getting answers right there, keeping both parties involved? I have noticed more than once that a good story makes a great first impression. A good story needs to stand by itself, be self-sufficient, and blow the audience off their feet.

A good story certainly cannot be told through a Redshift table or a Jupyter notebook: both raise shareability concerns and have a steep learning curve for business audiences. Business folks may come back with N! questions even from a minimal six-column dataset such as the New York Taxi Fare dataset (https://www.kaggle.com/c/new-york-city-taxi-fare-prediction): ID, Fare, Time of Trip, Passengers and Location. Their questions may include:
1. How have the fares changed year over year?
2. Has the number of trips increased across the years?
3. Do people prefer traveling alone, or do they have company?
4. Have short-distance rides increased as people have become lazier?
5. What time of day and day of the week do people want to travel?
6. Have new hotspots emerged in the city recently, apart from the regular airport pickups and drops?
7. Are people taking more inter-city trips?
8. Has traffic increased, leading to higher fares and longer times for the same distances?
9. Are there clusters of pickup and drop points, or areas that see high traffic?
10. Are there outliers in the data, e.g. zero distance with a fare of $100+, and so on?
11. Does demand change during the holiday season, and do airport trips increase?
12. Is there any correlation of weather, i.e. rain or snow, with taxi demand?
However, if we provide a good storytelling platform, a.k.a. a dashboard hosted on a web app, stakeholders have a secondary platform to quench all such curiosities even after the in-person meeting.
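To give a flavor of how quickly such questions fall out of the data, here is a hedged sketch (pandas assumed) answering question 1 on a tiny hand-made sample with the dataset's columns; the numbers are illustrative, not real taxi records:

```python
import pandas as pd

# Tiny illustrative sample mirroring the dataset's columns
trips = pd.DataFrame({
    "fare_amount": [7.5, 9.0, 11.0, 12.5],
    "pickup_datetime": pd.to_datetime([
        "2013-06-01 08:15", "2013-11-20 19:40",
        "2014-03-05 07:55", "2014-12-24 23:10",
    ]),
    "passenger_count": [1, 2, 1, 3],
})

# Question 1: how have fares changed year over year?
yoy = (trips
       .assign(year=trips["pickup_datetime"].dt.year)
       .groupby("year")["fare_amount"]
       .mean())
print(yoy)  # one average fare per year
```

The same groupby pattern, swapped onto `passenger_count` or trip distance, answers most of the other questions on the list.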
In a similar light, below is a fictitious conversation between tech and business stakeholders on the stock market, foreseeing the impact of Covid on the new world.
Mr. Market Has Already Prophesied What the Post-Covid World Will Look Like
The market knows it all and has prepared itself for the post-Covid world using millions of data points. Have you?
On the impact of powerful storytelling, some dashboards have created ripples right from the word go:
1. The New York Times' portrayal of how the Coronavirus traveled: the visualization demonstrates the lifecycle of the Coronavirus in the US, from the first infection to millions, with a clear virus trajectory showing how unrestrained movement of people let the situation get out of hand. It makes a compelling case for stay-at-home orders and, subsequently, for flattening the curve.
How the Virus Won
Invisible outbreaks sprang up everywhere. The United States ignored the warning signs. We analyzed travel patterns…
2. An average day in the life of an American: this visualization portrays the average American's day from 4 a.m. to midnight, interspersed with sleeping, work, leisure, chores, etc. The animation is right on point in establishing life patterns and could be of great use to technology companies for reaching their target groups, e.g. a push notification from Netflix at 7:00 p.m.
A Day in the Life of Americans
From two angles so far, we've seen how Americans spend their days, but the views are wideout and limited in what you…
Over the course of this article, spread across three parts, we will never leave the Python ecosystem, deploying Plotly-based Dash web apps to showcase to the world. The series of animations and screenshots below represents my handpicked components from working with Dash over the last two years. Over the next few weeks we will build these dashboards together on open-source datasets.
1. Integrating the TensorBoard projector:
I have written a detailed blog post on this here:
Build Floating Movie Recommendations using Deep Learning — DIY in <10 Mins
Recommendations served on your own Webapp.com, Watch on Netflix.com
2. Creating a network diagram using Pyvis:
3. Generating a wordcloud:
4. Travelling Salesman Problem
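As a flavor of the logic behind this component, here is a minimal nearest-neighbour heuristic; a pure-stdlib sketch with illustrative coordinates, not the actual solver used in the dashboard:

```python
import math

# Four illustrative stops on a unit square
stops = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}

def nearest_neighbour_tour(points, start="A"):
    """Greedy TSP heuristic: always visit the closest unvisited stop."""
    tour, remaining = [start], set(points) - {start}
    while remaining:
        last = points[tour[-1]]
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(remaining), key=lambda p: math.dist(last, points[p]))
        tour.append(nxt)
        remaining.remove(nxt)
    return tour

print(nearest_neighbour_tour(stops))  # ['A', 'B', 'C', 'D']
```

The resulting tour can be drawn as a Plotly line trace over a map to produce the animation above.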
5. Playing a YouTube video live:
Code for attaching a YouTube video in Dash
6. Treeplot to dive deep into sub-segments:
8. Integrating Google Trends data:
9. Dynamic drill-down using Dash tables and radio buttons:
10. Displaying images and aggregated information in a Dash table:
I love storytelling, and I have been looking for that perfect dashboard toolkit for the last six years. The journey has stopped at Dash, where I have found a sweet mix of everything. A chronological account of my tryst with dashboards:
2014 - Excel Dashboards: I started fidgeting with spreadsheets, macros and pivot charts. Since it was an extension of regular Excel data analysis, the learning curve was smooth. However, Excel caps the data size at 1M rows, and data joins across tables are slow. And since the reporting is ad hoc, the dashboards carry static data and are hard to share.
2015 - Shiny: Shiny is an R package that makes it easy to build interactive web apps straight from R, combining the computational power of R with the interactivity of the modern web. However, I started facing challenges with scaling data in R, the latest developments being constrained to Python, the difficulty of launching RStudio on Linux servers, and the better community (and my company's) support for Python.
2016–17 - Tableau: I was blown away by the richness of visualizations possible with mere drag-and-drop. In fact, I loved Tableau so much that I made my entire CV in it. Legend has it that my Tableau CV went viral and got me called for an interview at Amazon (to share in another post). I could connect Redshift/CSV files in Tableau, and the charts would update automatically based on the updated data; it seemed just like magic. However, there are costs associated with such rich features (not just the $15.7Bn Salesforce acquisition of Tableau): licences are expensive, and share-ability is difficult (either a TWBX file or Tableau Online). A free version is provided via Tableau Public, but those dashboards cannot be saved on a local machine. I maintain my Tableau Public profile here; I would love to get feedback.
2017–18 - Apache Superset: Since Tableau had difficulty creating calculated columns and the entire dataset had to be fetched every time, I started exploring open-source tools that could integrate seamlessly with cloud-based data warehouses. Superset's key advantages are easy deployment on localhost and Linux, with clean charts on the Flask ecosystem. Since it runs SQL queries in the backend, calculated columns are a breeze with enterprise warehouses (Redshift/BigQuery/Athena). However, the number of chart types is limited, and advanced data visualizations need shuffling between the Python and SQL ecosystems.
2019 - Ongoing - Dash: "This is how I met the love of my dashboarding life," i.e. Dash. Over the last 1.5 years I have spent ~25% of my working time on Dash. It has helped me create scalable solutions, and it has driven my evolution from merely building models in Jupyter notebooks to taking projects live.
Summary table of comparison:
Flowchart for a dashboard deployment:
1. Resourcing and stakeholders: identify the core team driving the dashboard, with clear deliverables, timelines and checkpoints.
2. Design: always start with a rough sketch on paper, as it gives more room for creativity and for scribbling seemingly random dashboard components at the start.
3. Create the components in a modular fashion and keep getting feedback. Also follow coding best practices, such as code comments and scalability through proper classes and modules, as the dashboard must outlive your tenure at the org.
4. Create a scalable pipeline to update the data (connection to Redshift, code run by DJS, batch processing) with minimal manual interference. This is an oft-ignored component, but it starts creating trouble once dependencies are built on the dashboard.
5. Test on localhost/non-prod servers and do a dry run with colleagues to prevent it breaking on prod systems.
6. Deploy using servers (EC2, Heroku, Elastic Beanstalk).
7. Optional: get a professional name; everyone prefers something like www.buyyourbunny.com over www.ec2jjjdxxrrr.com:8080.
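Step 6 often boils down to putting a production WSGI server in front of the app; a hedged sketch, assuming the Dash code lives in app.py and exposes `server = app.server` (file name, port and worker count are illustrative):

```shell
# Install a production WSGI server (Dash's built-in server is dev-only)
pip install gunicorn

# app.py must expose the underlying Flask server: server = app.server
gunicorn app:server --bind 0.0.0.0:8080 --workers 2
```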
- We covered ground on the need for good storytelling
- The potential to improve the success rate of data science projects with compelling visualization
- The shareability power of dashboards, so that stakeholders become a force multiplier in seeking additional funding for the project
- My individual tryst with dashboards across the last six years, and my favorite creations in Dash