Perceive your Data

Nikhil Ravindra Akki · Published in Grey Matter AI · 4 min read · Apr 20, 2021

Explore the expanse of your data not just with numbers but with artistic visuals.

Introduction

Visualisation is the craft of creating meaningful messages by leveraging cognitive elements such as animations, diagrams and images. The saying that a picture speaks a thousand words holds true, and we will understand why in the next section, once we look at the types of data we have to deal with in the world of data science and analytics.

Visualisation not only helps you understand certain behaviour within the data or a sample, but also helps you decide which method, algorithm or technique to use to reach your goal. That goal can be anything from data analysis to predictive modelling or even a recommendation; the possibilities are limited only by one's imagination. Now let's look at the different kinds of data we would have to deal with on our journey in Data Science.

Understanding types of data -

  1. Tabular Data: As the name suggests, this is data that fits into a spreadsheet, with rows and columns. Each column header names a field of the observation, and each row holds the actual values. Let's understand this with a simple example -
Sample tabular data

2. Perceptual Data (Image, Audio and Text):

Perceptual data is more easily seen, heard or read than understood from its underlying representation.

Let’s understand this with some examples -

  1. Image: When you see an image, the final output looks like a picture (in fact it is one), but the underlying representation, in other words the pixel values, is nothing but colour intensity values, usually in the range (0, 255), where 0 means no light and 255 means full light. This logic, paired with the different colour channels (Red, Green, Blue), creates what we perceive as a coloured image; for a more in-depth explanation please visit the link.
  2. Audio: Audio data is represented in terms of amplitude and frequency. The amplitude is the value of the sound wave at each instant, and if the frequency falls in the range humans can hear (roughly 20 Hz to 20 kHz) we perceive it as music, sound or noise. In a computer all data needs to be digital, and since sound is an analogue signal it is converted using Analog-to-Digital Conversion (ADC); for more on this topic visit the link.
  3. Text: To the end user, text appears in exactly the form it is seen on the screen. In the backend, each character is mapped to a particular numeric code, and there are a bunch of different encoding standards such as ASCII, Unicode and UTF-8; more on this here. (A toy sketch of these raw representations follows this list.)
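To make the idea concrete, here is a minimal sketch using only the Python standard library. The pixel values, the sample rate and the strings are made up purely for illustration; the point is simply that an image, a sound and a piece of text are all just numbers underneath.

```python
import math

# A tiny 2x2 "image": each pixel is an (R, G, B) tuple of intensities in 0-255.
# These pixel values are invented for illustration.
image = [
    [(255, 0, 0), (0, 255, 0)],       # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],   # blue pixel, white pixel
]
print(image[0][0])  # -> (255, 0, 0): what we "see" as red is just three numbers

# A few samples of a 440 Hz sine tone at an assumed 8 kHz sample rate:
# audio is just a sequence of amplitude values, which is what an ADC produces.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(5)]
print(samples)  # -> small floats between -1.0 and 1.0

# Text: every character is stored as a numeric code under some encoding.
print(ord("A"))                # -> 65 (the ASCII / Unicode code point)
print("café".encode("utf-8"))  # -> b'caf\xc3\xa9' (the raw UTF-8 bytes)
```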

Now that we have understood how data is stored on a computer, let’s look at how these kinds of data would be used for visualisation.

Tools for Data Exploration -

There are a plethora of tools out there both paid and open-source all of which are useful in their own right. In this section, we are going to explore different open source tools which we could use to explore each type of data discussed above.

  1. Pandas — A powerful Python library for working with structured or tabular data. On a decent laptop it can read gigabytes of tabular data (beat that, Excel! ;-))
  2. OpenCV — Another powerful library that helps you clean, filter and transform image data. The original implementation is in C++, but it has nice Python wrappers which expose some neat features for computer vision tasks.
  3. Librosa — A popular Python library for working with audio data, with built-in features for digital audio processing.
  4. Spacy — One of the most widely used NLP (Natural Language Processing) frameworks out there, and also among the fastest. Spacy has some nice out-of-the-box features like entity recognition, vector embeddings, similarity search etc. (A quick-start sketch combining these four libraries follows below.)
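As a quick start, here is a hedged sketch of how each of these libraries is typically invoked. The file names are placeholders for your own data, and each library (plus the small English Spacy model) needs to be installed separately.

```python
import pandas as pd   # tabular data
import cv2            # images (pip install opencv-python)
import librosa        # audio
import spacy          # text / NLP

# Tabular: load a CSV into a DataFrame and peek at its shape and first rows.
df = pd.read_csv("data.csv")          # placeholder file name
print(df.shape)
print(df.head())

# Image: read an image as an array of pixel intensities (BGR order in OpenCV).
img = cv2.imread("photo.jpg")         # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(img.shape, gray.shape)

# Audio: load a waveform as an array of amplitudes plus its sample rate.
y, sr = librosa.load("clip.wav", sr=None)   # placeholder file name
print(len(y), sr)

# Text: run a small English pipeline and inspect the named entities it finds.
nlp = spacy.load("en_core_web_sm")    # requires: python -m spacy download en_core_web_sm
doc = nlp("Apple was founded by Steve Jobs in California.")
print([(ent.text, ent.label_) for ent in doc.ents])
```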

Tools for Data Visualisation -

  1. Matplotlib — Probably the most mature visualisation library in the Python ecosystem, with a highly configurable API.
  2. Plotly — If you are looking for advanced plotting and a jazzy, modern look for your charts, look no further! Plotly comes with an extensive gallery of charts and plots and a robust Python API.
  3. Streamlit — Build a dashboard-like app in the shortest time possible (no kidding!). Streamlit uses the good libraries from the Python stack (pandas, plotly, numpy and matplotlib, to name a few) underneath and puts them on steroids, letting you build a web-based dashboard application by writing just one Python file! (A minimal one-file sketch follows below.)
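To show how these fit together, here is a minimal one-file Streamlit sketch that combines pandas, Matplotlib and Plotly. The data frame here is randomly generated stand-in data, and the file name app.py is an assumption; run it with `streamlit run app.py`.

```python
# app.py - a minimal dashboard sketch; the data is made up for illustration.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import streamlit as st

st.title("Perceive your Data - demo dashboard")

# Stand-in tabular data representing whatever you are exploring.
df = pd.DataFrame({
    "x": np.arange(100),
    "y": np.random.randn(100).cumsum(),
})
st.dataframe(df.head())

# Matplotlib: mature, highly configurable static plots.
fig, ax = plt.subplots()
ax.plot(df["x"], df["y"])
ax.set_xlabel("x")
ax.set_ylabel("y")
st.pyplot(fig)

# Plotly: interactive, modern-looking charts from the same DataFrame.
st.plotly_chart(px.scatter(df, x="x", y="y", title="Interactive scatter"))
```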

The above list is by no means exhaustive; there are far more tools and frameworks than can be covered in one article. We recommend you try them out yourself, as all of them come with great documentation.

Takeaways

The point is to use the right tool for the right job; once you intuitively understand the underlying data and which tool to reach for, it becomes second nature. Remember, this is just the first step: with the right set of tools under your belt, the only question left is how well you use them, and that comes with experience. Most of all, the one key trait needed for this kind of work is a strong sense of curiosity, which will propel you further in your data science journey.

Keep tinkering and do let us know your thoughts. 😁
