Data Analytics — A Software Engineer’s view

On Thursday 5th Oct I attended ‘Connected with the MET’, a Data Analytics seminar hosted by the Belfast MET Connected programme. The programme promotes business growth, innovation and competitiveness by giving SMEs a knowledge exchange link between academia and industry. (More info here: http://www.belfastmet.ac.uk/support-for-business/employer-support/connected/)

As a Software Engineer currently working with Machine Learning technologies, I was intrigued by this event. For me, Data Analytics is one of the stepping stones before Machine Learning, and a very important one at that. Doing analytics on your data gives you a better understanding of what you’re working with. It helps you make better-informed decisions about which Machine Learning models to use, and it can help when evaluating those models. Being a software engineer, I haven’t done much of this. Therefore, I thought it would be valuable to find out a bit more about this area and see if I could pick up any tips.

Data Maturity — The stepping stones to ML

There were 3 talks, each based on a different tool for performing Data Analysis: Excel, Business Intelligence Software and Python. They were described as beginner, intermediate and advanced respectively, but I disagreed. Coming from a programming background, I’m comfortable with Python even though I’m not an expert in the Data Analytics field, whereas I had little knowledge of Data Analytics with Excel or BI Software. I think each tool is simply suited to different people and different use cases.

Data Analytics in Excel

Excel is familiar to everyone: on the surface, it is a spreadsheet application. It has long had many functions for manipulating and visualising data, and recent updates have added 6 new chart types that provide more detailed insights into data sets. These include: Treemap, Sunburst, Waterfall, Funnel, Histogram and Box & Whisker. In this talk we got an overview of some of these and examples of when they should be used. A few points I took from this:

  • Treemaps are good for large data sets that have a hierarchy, i.e. data that can be split into groups of groups.
  • Funnel charts are an alternative to pie charts and are good for displaying progression.
  • Box & Whisker charts could previously be created in Excel, but only with a lot of manual effort. Now they can be created automatically. They are good for identifying outliers in data and viewing the median and mean at a glance.
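To make the outlier point a bit more concrete, here is a rough Python sketch of the kind of rule a box & whisker chart is usually built on. It assumes the common 1.5 × IQR convention for the whiskers (Excel's exact quartile calculation may differ), and the sample numbers are made up purely for illustration.

```python
import statistics

# Made-up sample: nine typical values and one suspiciously large one.
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]

# With n=4, statistics.quantiles returns the three cut points [Q1, median, Q3].
q1, median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # interquartile range: the height of the "box"

# Assumed convention: anything beyond 1.5 * IQR from the box is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("median:", median, "mean:", statistics.mean(values))
print("outliers:", outliers)
```

In this sample the single large value is flagged as an outlier, and it also drags the mean well above the median, which is exactly the sort of thing the chart lets you spot at a glance.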

More information about these new charts can be found in the following Microsoft blog: https://blogs.office.com/en-us/2015/07/02/introducing-new-and-modern-chart-types-now-available-in-office-2016-preview/?eu=true

Power tools in Excel

We were also introduced to some Power Tools in Excel, which are either included in newer versions or can be installed as add-ons. The Power Tools can be split into 2 groups:

  1. Analytical — Power Pivot, Power Query & Power View
  2. Visual — Power View & Power Map

Most of the tools were briefly mentioned and there was an example of one from each group:

  • Power Pivot: Allows you to bring data together from different sources (e.g. orders data from a database and customer details from a CSV file) and then query it for better insights, e.g. orders per country.
  • Power Map: Shows data distribution as proportionally sized dots on a map. It can also create 3D maps and export a moving map as an mp4 file.

[Image: 3D Power Map]

BI Software

In this talk, 3 popular BI Software applications were explored and compared. Compared with Excel, BI applications are usually favoured when data visualisations need to be presented on dashboards.

[Image: My notes on the comparison of Power BI, Qlik & Tableau]

Data Analysis with Python

The only experience I have had with Data Analysis is in Python, using Matplotlib, Pandas and NumPy, which were all used in this talk. Therefore, I didn’t learn as much from this one as from the others, not because it wasn’t valuable but because I’d done it before. What was covered, though, were the reasons why you would choose Python over Excel or BI Software:

  • It’s flexible, portable and easy to plug into a web front end or backend system.
  • Code scripts are reusable.
  • It’s a natural fit if you come from a programming background.
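As an illustration of the reusable-script point, the ‘orders per country’ example from the Power Pivot section could be sketched in Pandas roughly like this. The column names, the function name and the tiny in-memory data sets are my own assumptions standing in for a real database table and CSV file; this is not what was shown in the talk.

```python
import io

import pandas as pd


def orders_per_country(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.Series:
    """Join orders to customers and count the orders for each country."""
    merged = orders.merge(customers, on="customer_id", how="left")
    return merged.groupby("country")["order_id"].count().sort_values(ascending=False)


# Hypothetical stand-in for an orders table pulled from a database.
orders = pd.DataFrame(
    {"order_id": [1, 2, 3, 4, 5], "customer_id": [10, 11, 10, 12, 11]}
)

# Hypothetical stand-in for a customer details CSV file.
customers_csv = io.StringIO("customer_id,country\n10,UK\n11,Ireland\n12,UK\n")
customers = pd.read_csv(customers_csv)

result = orders_per_country(orders, customers)
print(result)
```

Because the logic lives in a function, the same script can be pointed at next month's data, or called from a backend service, without redoing any manual steps.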

Folium, a Python library for visualising data on maps, was also mentioned. It has the likes of OpenStreetMap already built in, supports GeoJSON and enables you to create interactive maps. The maps can be exported as HTML and therefore easily plugged into a front end dashboard.

The best thing I got out of this talk was learning that you can convert your Jupyter notebook into a slideshow and execute code on the slides! Amazing! Instead of writing up instructions for this myself, I’ll lazily link you to a post that already does that: https://medium.com/@mjspeck/presenting-code-using-jupyter-notebook-slides-a8a3c3b59d67 . I’ll definitely be using this in future presentations.

Takeaways

Overall, this was a great session, and what I’ve taken away is an awareness of the different ways to do Data Analytics. As a programmer, if I’d like to get insight into a data set I will probably still use Python, but having seen the complex visualisations that can be created with Excel, I’d also like to play around with that. BI tools may be overkill for my own data investigations; however, they are good to be aware of and to keep in mind for customers who may need dashboarding.

Finally, my last takeaway was that if you switch around the co-ordinates of Belfast (54.5973° N, 5.9301° W), you end up near Madagascar — always useful to know :)