Follow along with this series of articles here on Medium, or Star/Fork the entire series on my GitHub. I hope you find something meaningful to study during this pandemic. Stay safe, guys!

“Data is the new Oil” is a saying that has been floating around the web a lot lately. In this era of Big Data, zettabytes of information are being shared across the network as this new Oil, and the volume grows rapidly every single day. Hence, for any business to succeed, it is essential to extract meaningful insights from the data generated across its platforms, be it web or mobile. This demand is extremely high today, and with the development happening across the globe, we can be fairly sure that professionals in the Technology domain with Data Science skills will stay in demand for the next few decades. I am also glad to see the revolution in the Education domain, where experienced professionals are coming forward to share their knowledge with newbies on various platforms.

Somehow I believe that the value of Data Visualization in the Data Science toolkit is constantly being underestimated. Data Science is not all about super cool predictive algorithms; rather, “Data Science is a life cycle”. No matter how well we have pre-processed our data, if we cannot project the inference onto a screen, stakeholders won’t be happy, because they don’t care about our code. Irrespective of how well our algorithm predicts, unless we can project those numbers (preferably in $) efficiently on screen for them to visualize, oh boy, for us “Winter has come”, so get ready to face the Night King. The human brain perceives best what it can comprehend visually, and if stakeholders/clients could have done that on their own, Harvard Business Review wouldn’t have rated Data Scientist as the sexiest job of the 21st century.

Image Source — HBR

This is a facet of Data Science that requires an equal amount of dedicated learning hours, because be it in an interview or a board room meeting, just projecting a Stripplot on top of a Swarmplot with a few nice colors isn’t going to serve our purpose if we want their $120,000. In this process Statistics plays a very important role, for which we have to READ BOOKS, irrespective of format (Paperback, PDF, whatever). If nothing else, at least read ISLR (An Introduction to Statistical Learning), a foundation-level book with the minimal set of concepts needed to kick-start our journey. I shall also add a few other resources for reference at the end of this post or in the next one of this series.

Graphics need to be concise and should account for a range of different stakeholders. However, there are two points here that need heavy focus. The first is “Strategic Focus”: while aesthetically pleasing graphics are great, we also need to ensure that the graph/model presented is in sync with the organization’s objectives. A nice graphic on its own will never garner stakeholder support unless they can see how it connects to the strategic orientation. The second is “Overall Deliverable”: a nice-looking graphic isn’t in itself a competitive advantage or a strategic insight that drives actual business change. Rather, the end product is a deliverable that enables the business to realize a new capability, either to improve its competitive positioning or to explore new insight from static data, which then ties back to our first point.

If we think none of this is important because what matters is how quickly we become an Avenger and save some city, then, without using Google, answer this inane question: “What does the Box in a Boxplot represent, and what are those lines within it?” If we got the answer right, “Bravo! Saved the next 5 minutes for something better than reading this article!”, and if we couldn’t, then we’re doomed (just kidding!). This lack of visualization skills is generally found not among working professionals but among beginners, because the MOOCs they take breeze through Data Visualization topics as if they never even existed. So here is a small attempt to make things a little better for those who struggle to interpret plots.
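For anyone who wants to check their answer: in a conventional box plot the box spans the first to the third quartile, the line inside it marks the median, and the whiskers reach out to the most extreme points that are not flagged as outliers. Here is a minimal sketch of those numbers using only Python's standard library. It assumes the common 1.5 × IQR whisker rule (the default `whis` in Matplotlib and Seaborn); the helper name `box_summary` and the sample values are made up for illustration, and quartile conventions can differ slightly between tools.

```python
import statistics


def box_summary(values, whis=1.5):
    """Return the numbers a conventional box plot draws (illustrative helper)."""
    data = sorted(values)
    # "inclusive" interpolates linearly between the observed data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme observations inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,            # bottom edge of the box
        "median": median,    # line inside the box
        "q3": q3,            # top edge of the box
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up sample with one obvious outlier
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_summary(sample)
print(summary)
```

With this sample, the value 30 falls beyond the upper fence, so it would be drawn as a separate point rather than stretching the whisker.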

Oh yeah! This is how clients look when they are unhappy with our presentation :)

Before I plunge into the actual Viz discussion, let me spend a minute justifying the choice of Seaborn. There are two main reasons: first, I am biased; and second, it doesn’t really matter which package we use, provided we know how to use whatever we choose and can explain our usage to others in the simplest possible language. In Python, Matplotlib is the “grandfather” of visualization, and when topped with Seaborn, the visuals help us easily decipher whatever is being communicated. Seaborn isn’t a core scientific computation package like SciPy, but it is extremely efficient with statistical data graphics. Hence, Seaborn is a widely accepted visualization library in industry, unless a Dashboard is required (though even a widget option is available), so in this series I cover what Seaborn has to offer.
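To make the “Matplotlib topped with Seaborn” idea concrete, here is a minimal sketch, assuming `matplotlib`, `pandas`, and `seaborn` (0.11 or later) are installed. The DataFrame values, column names, and output filename are made up for illustration; the point is only that Seaborn draws onto an ordinary Matplotlib Axes, which we can keep customizing afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values, for illustration only
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "value": [3, 4, 5, 6, 12, 6, 7, 8, 9, 10],
})

sns.set_theme(style="whitegrid")        # Seaborn restyles Matplotlib's defaults
fig, ax = plt.subplots(figsize=(5, 4))  # Figure and Axes are plain Matplotlib objects
sns.boxplot(data=df, x="group", y="value", ax=ax)  # Seaborn draws onto that Axes
ax.set_title("Value by group")          # Matplotlib methods still apply afterwards
fig.savefig("boxplot_example.png", dpi=100)
```

Passing `ax=` explicitly is a design choice rather than a requirement: it keeps the hand-off between the two libraries visible, which helps when we later have to explain the plot to someone else.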

The writer is a Finance graduate with an MS in Applied Statistics. He also holds a Post Graduate Diploma in Cognitive Computing. Professionally, he is experienced in the Predictive Analytics and BI domains and has previously worked with organizations like Dell and VMware. He has trained professionals at his prior companies in his domain and continues to help grow the community. Currently he works as a Cognitive Computing consultant.
