Crushing Data in JavaScript (Part 2)

If you haven’t read my first blog post on data in JavaScript, I highly recommend you go back and have a look at it before reading this one. That first piece was light on code and heavy on theory. Expect this piece to contain much more code. First off, as promised, we are going to create visualizations using D3. To do this we need to make sure we are using suitable data structures. Last time we pulled our data into one large object. This is manageable, but I think it will be much more effective to put it in an array of objects. Remember that each row of our imported dataset contained a news headline, whether the reader recalled it, and whether they thought it was accurate. Our intention here is to create an array with one object per headline (A through K), each holding the headline, a count of responses, a count of times it was recalled, and a count of times it was judged accurate.

d3.csv('responses.csv', function(data) {
  var arr = []; // unique headlines
  var ds = [];  // one summary object per headline
  for (var i = 0; i < data.length; i++) {
    if (arr.indexOf(data[i].headline) === -1) {
      arr.push(data[i].headline);
    }
  }
  for (var j = 0; j < arr.length; j++) {
    // Rows that belong to the current headline.
    var currObj = data.filter(function(a) {
      return a.headline === arr[j];
    });
    // Collapse those rows into a single object of counts.
    var reducedObj = currObj.reduce(function(a, b) {
      a.count++;
      if (b.recalled_bool === "True") {
        a.recalled_cnt++;
      }
      if (b.accuracy_bool === "True") {
        a.accuracy_cnt++;
      }
      return a;
    }, { "headline": arr[j], "count": 0, "recalled_cnt": 0, "accuracy_cnt": 0 });
    ds.push(reducedObj);
  }
  renderGraph(ds);
});

First we initialize the array arr, which is where we store the unique values of headline [A, B, … , K]. It drives the main for loop below, where we create one object per entry. We loop through the entire raw dataset, check every row, and push any headline we have not seen yet to arr.
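The same de-duplication step can be tried on a small in-memory sample. This is only a sketch: the rows in sampleRows are invented for illustration, and using a Set is a swapped-in alternative to the indexOf loop, not what the project code does.

```javascript
// Hypothetical sample rows shaped like the parsed CSV (values are invented).
var sampleRows = [
  { headline: "A", recalled_bool: "True",  accuracy_bool: "False" },
  { headline: "B", recalled_bool: "False", accuracy_bool: "False" },
  { headline: "A", recalled_bool: "True",  accuracy_bool: "True"  },
  { headline: "C", recalled_bool: "False", accuracy_bool: "True"  }
];

// Pull out every headline, then let a Set drop the repeats.
// A Set remembers insertion order, so headlines stay in first-seen order.
var uniqueHeadlines = Array.from(new Set(sampleRows.map(function (row) {
  return row.headline;
})));

console.log(uniqueHeadlines); // A, B, C
```

For eleven headlines either approach is fine; the Set version mainly saves a few lines.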

Next we have the main loop. Here we go through every element of the newly created arr array. For each headline in the array we apply a filter to our dataset and assign the resulting subset to currObj.

If you remember, filter was one of those critical components of data management. To consume data you need to be able to filter it, because you need to create smaller subsets that pinpoint the specific information your users are looking for. This particular callback takes a, which is the current row of the dataset, and checks whether its headline equals the headline for the current iteration of the for loop:

var currObj = data.filter(function(a) {
  return a.headline === arr[j];
});
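To see what filter hands back on its own, here is a minimal sketch on made-up data. The rows array and the target variable are assumptions for illustration and are not part of the project.

```javascript
// Hypothetical rows shaped like the parsed CSV; the values are invented.
var rows = [
  { headline: "A", recalled_bool: "True",  accuracy_bool: "True"  },
  { headline: "B", recalled_bool: "False", accuracy_bool: "False" },
  { headline: "A", recalled_bool: "False", accuracy_bool: "False" }
];

// Keep only the rows whose headline matches the one we care about.
var target = "A";
var subset = rows.filter(function (row) {
  return row.headline === target;
});

console.log(subset.length); // 2
console.log(rows.length);   // 3, because filter returns a new array and leaves rows untouched
```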

So for each headline we create a subset of the master dataset. The next piece is the reduce function. This is where we actually flatten our data and summarize it. Remember we were going to create three metrics: count, recalled_cnt, and accuracy_cnt.

  var reducedObj = currObj.reduce(function(a, b) {
    a.count++;
    if (b.recalled_bool === "True") {
      a.recalled_cnt++;
    }
    if (b.accuracy_bool === "True") {
      a.accuracy_cnt++;
    }
    return a;
  }, { "headline": arr[j], "count": 0, "recalled_cnt": 0, "accuracy_cnt": 0 });
  ds.push(reducedObj);
}

Our reduce callback takes two parameters: a, which is the value returned by the previous iteration, and b, which is the current datum we are working with. The elusive second argument to reduce itself (the first being the callback that actually aggregates the data) is the initial value. So we start with an object that contains the headline for the current iteration and our three metrics initialized to zero. The first time the callback is invoked it receives this object as a and the first element of the array as b. We increment count, and increment recalled_cnt and accuracy_cnt where applicable. Finally we return a, which is passed back into the callback on the second call, and so on through the nth call, which handles the last value in the array. Once complete, we push the returned value (reducedObj) to our ds array, which is the final dataset that we will be working with in D3.
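The accumulator pattern described above can be run by hand on three rows. This is a sketch under assumptions: rowsForA is invented sample data, and the names acc and row are just more descriptive stand-ins for a and b. Note that the flags are compared against the string "True" because the rows come out of the CSV as strings, not booleans.

```javascript
// Hypothetical subset for a single headline; the values are invented.
var rowsForA = [
  { headline: "A", recalled_bool: "True",  accuracy_bool: "True"  },
  { headline: "A", recalled_bool: "False", accuracy_bool: "False" },
  { headline: "A", recalled_bool: "True",  accuracy_bool: "False" }
];

var summary = rowsForA.reduce(function (acc, row) {
  acc.count++;                                        // every row counts once
  if (row.recalled_bool === "True") { acc.recalled_cnt++; }
  if (row.accuracy_bool === "True") { acc.accuracy_cnt++; }
  return acc;                                         // handed back in as acc on the next call
}, { headline: "A", count: 0, recalled_cnt: 0, accuracy_cnt: 0 }); // initial value

console.log(summary);
// count: 3, recalled_cnt: 2, accuracy_cnt: 1
```

If the initial value were left out, reduce would use the first row as the accumulator, and the counters would not exist on it, so the second argument matters here.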

Our filtered / reduced dataset

The last thing we do is call the renderGraph function, passing the final and complete ds array. I am not going to go through the entire D3 code, but I will add it here for completeness:

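// cfg (width, barHeight) is assumed to be defined elsewhere in the project.
function renderGraph(B) {
  // Clear anything drawn by a previous call.
  d3.select(".chart").selectAll("*").remove();
  // B holds objects, so d3.max needs an accessor to find the largest count.
  // Note: x is defined here, but the bars below use a fixed recalled_cnt / 5 scaling.
  var x = d3.scaleLinear()
    .domain([0, d3.max(B, function(d) { return d.recalled_cnt; })])
    .range([0, cfg.width]);

  var chart = d3.select(".chart")
    .attr("width", cfg.width)
    .attr("height", cfg.barHeight * B.length);
  // One group per headline, stacked vertically.
  var bar = chart.selectAll("g")
    .data(B)
    .enter().append("g")
    .attr("transform", function(d, i) {
      return "translate(0," + i * cfg.barHeight + ")";
    });
  bar.append("rect")
    .attr("width", function(d) {
      return d.recalled_cnt / 5 + 10;
    }).attr("height", cfg.barHeight - 1);
  bar.append("text")
    .attr("x", function(d) {
      return d.recalled_cnt / 5;
    })
    .attr("y", cfg.barHeight / 2)
    .attr("dy", ".35em")
    .text(function(d) {
      return d.recalled_cnt;
    });
}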
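For readers wondering what a linear scale does with a domain and a range, here is a rough sketch written without D3 so it runs on its own. The linearScale helper is hypothetical and only mimics the basic mapping; it is not D3's implementation, and the counts and chart width are invented.

```javascript
// Hypothetical stand-in for a linear scale: maps a value in [d0, d1] onto [r0, r1].
function linearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1];
  var r0 = range[0], r1 = range[1];
  return function (value) {
    if (d1 === d0) { return r0; } // avoid dividing by zero on an empty domain
    return r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
  };
}

// Invented summary objects shaped like the entries of ds.
var sampleCounts = [
  { headline: "A", recalled_cnt: 50 },
  { headline: "B", recalled_cnt: 200 },
  { headline: "C", recalled_cnt: 125 }
];

// Largest count becomes the top of the domain, as with an accessor passed to d3.max.
var maxCount = Math.max.apply(null, sampleCounts.map(function (d) {
  return d.recalled_cnt;
}));

var chartWidth = 400; // assumed pixel width, standing in for cfg.width
var toPixels = linearScale([0, maxCount], [0, chartWidth]);

var widths = sampleCounts.map(function (d) {
  return toPixels(d.recalled_cnt);
});

console.log(widths); // 100, 400, 250
```

With a scale like this the longest bar always fills the chart width, whereas a fixed divisor such as recalled_cnt / 5 depends on the counts staying in a known range.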

The final output will look something like this:

There are plenty of resources available for learning D3, and I am by no means the definitive expert on it. I highly recommend Curran Kelleher’s YouTube video if you want to learn more. You can reference the full code for my project on GitHub if you would like to play around with it. The final piece in this series covers mapping data.

Thank you for taking the time to read this. Cheers.