Data Science: Data Visualisation on Javascript

Aparna Joshi
Nov 22 · 7 min read
Image for post
Image for post

I have been using javascript for pretty much my whole career, and have started embarking on a new journey of becoming a data scientist. Along with the exciting new things I have been learning, I felt the need to revisit some of the old classics and try to make sense of the survey data.

This survey was conducted this year and it reached nearly 21,717 people. The survey has collected a wide range of data starting from the general demographics of the population using javascript such as the country they live in, the job specifications of the people working with javascript, and what is their proficiency level in frontend or backend development. It also collects the data about the different syntax used by the developers, various frontend frameworks and libraries preferred.

Let’s being by applying some of the basic data visualization techniques available in python and go to the more advanced versions.

Prerequisites

import pandas as pd
pd.read_json("./state_of_js.json").to_csv("state_of_json.csv")

Sampling the data

The survey data usually obtained in real life will be huge, generally in the range of a few ten thousand. The initial task is to look at sample data and identify the structure of the data. The easiest way is to use the following function:

data = read_dataset("state_of_js.csv");
print(data.head())

However, printing this in jupyter notebook doesn't give a better holistic look at the data. A better way would be to sample out a couple of hundred rows and save it in a CSV file.

import pandas as pd

def read_dataset(dataset):
data = pd.read_csv(dataset);
return data

data = read_dataset("state_of_js.csv");
sample_data = data[:1000];

def write_csv(filename, data):
data.to_csv(filename);
return;

write_csv("sample_state_of_js.csv", sample_data);

Now we can visualize the data to see the type of values, and the organization of rows.

Image for post
Image for post

Looking at the sample data we can see that the following categories of data are present. In other words, the descriptive features of the data are as follows:

For this article, we will focus on the demographics and the tools being used by the people, and try to visualize them.

Data preprocessing

One of the important tasks for data visualization and manipulation is to preprocess the data and convert them to the required format. Sometimes it is also better to remove certain columns (descriptive features) from the data that are currently not required.

In the next step, we are doing the following:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import json

def read_dataset(dataset):
folder = "./";
data = pd.read_csv(dataset);
return data

data = read_dataset("state_of_js.csv");
userdf = pd.DataFrame();
toolsdf = pd.DataFrame();
combineddf = pd.DataFrame();
for index, row in data.iterrows():
user_info = row["user_info"];
tools = row["tools"];
if(isinstance(user_info, str) and isinstance(tools, str)):
tempjsonuser = eval(user_info);
tempjsontools = eval(tools);
for key, value in tempjsontools.items():
tempjsontools[key] = value['experience'];
tempjsoncombined = {**tempjsonuser, **tempjsontools}
userdf = userdf.append(tempjsonuser, ignore_index=True);
toolsdf = toolsdf.append(tempjsontools, ignore_index=True);
combineddf = combineddf.append(tempjsoncombined, ignore_index=True);

We have the following content on which data manipulation can be done:

Image for post
Image for post

Sources of the Survey

We have the userdf DataFrame which contains the data of the user. The column of interest under this DataFrame is source_normalized. This shows the various sources from which the survey was taken. We first filter out the data to remove rows with missing values and then plot a bar graph.

filtered_df_source = userdf[userdf["source_normalized"].notna()];

fig, ax = plt.subplots(figsize=[96, 64])
sns.countplot(ax = ax, data=filtered_df_source, x="source_normalized")
plt.savefig("source.png");
Image for post
Image for post

Distribution by country

The user data frame also contains information about the country from which the survey was taken. Let us plot this by filtering the data and plotting a bar graph.

filtered_df_country = userdf[userdf["country"].notna()];

fig, ax = plt.subplots(figsize=[96, 64])
sns.countplot(ax = ax, data=filtered_df_country, x="country")
plt.savefig("country.png");
Image for post
Image for post

Salary distribution of the users based on the country

We have two non-numeric data country and yearly_salary. Usually, the bar graphs and stacked graphs are formed on top of numeric data. However, it is possible to transform the data and grouping them by country and yearly_salary.

sample_data = filtered_combineddf_country[:1000]
df_plot = sample_data.groupby(['country', 'yearly_salary']).size().reset_index().pivot(columns='country', index='yearly_salary', values=0)
fig, ax = plt.subplots(figsize=[12, 8])
df_plot.plot(ax=ax, kind='bar', stacked=True, rotation=40, ha="right")
plt.savefig("country_salary_stacked.png")
Image for post
Image for post

The following observations can be made:

There can be another type of representation for the distribution of salary based on country:

fig, ax = plt.subplots(figsize=[12, 8])
sns.stripplot(ax = ax, x ='country', y='yearly_salary', data = filtered_combineddf_country[:1000], jitter = True, hue ='yearly_salary', dodge = True)
plt.savefig("country_salary.png");
Image for post
Image for post

Number of users of React, Angular, and VueJs

In the following representation, we have 3 different columns of categorical data. These are for Angular, React, and VueJs usage. The data represents 5 different levels of experience users have. There are two options to approach this problem:

sample_data = filtered_combineddf_country[:1000]

fig, ax = plt.subplots(figsize=[12,8])

ax = sns.countplot(x=sample_data.angular, color='green')
ticks = ax.get_xticks()
ticklabels = ax.get_xticklabels()
lim = ax.get_xlim()

sns.countplot(x=sample_data.react, color='red')
ax.set_xlim(lim)
ax.set_xticks(ticks)
ax.set_xticklabels(ticklabels)

sns.countplot(x=sample_data.vuejs, color='blue')
ax.set_xlim(lim)
ax.set_xticks(ticks)
ax.set_xticklabels(ticklabels)
ax.set_xlabel('languages')
plt.savefig("languages.png");
Image for post
Image for post

In the above plot, we are counting the different values under Angular, React and VueJs, and stack them together under the same graph.

Distribution of Angular JS usage by the country

We can now plot the count of different types of angular js users, distributed by their region of work. This would be similar to how the country-wise salary distribution was plotted.

sample_data = filtered_combineddf_country[:1000]
df_plot = sample_data.groupby(['angular', 'country']).size().reset_index().pivot(columns='angular', index='country', values=0)
fig, ax = plt.subplots(figsize=[24, 16])
df_plot.plot(ax = ax, kind='bar', stacked=True)
plt.savefig("country_angular_stacked.png")
Image for post
Image for post

From the graph:

In this article, we explored some real-time data collected about the usage of javascript and applied data visualization techniques to it. This will be a continuous series where more number of statistical analysis would be done on the same data. If you need any further information on data visualization refer to Seaborn and Matplotlib

The Startup

Medium's largest active publication, followed by +733K people. Follow to join our community.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store