Interesting facts about Arctic Ice (or “how to query a CSV file with JavaScript and LokiJS”)
In the period 2003–2014 (*):
- The highest arctic ice extension was reached on March 21st 2003
- The lowest extension was on September 18th 2012
- The highest average extension for the year was 2003
- The lowest average extension for the year was 2012
- The highest standard deviation was 2012, the lowest 2006
(*) Based on the JAXA data, normalizing all faulty/not-recorded records to the average value for the 2000's decade. The results are therefore only approximations, albeit very good ones.
Abstract (what hipsters call a TL;DR)
JavaScript can be a really powerful language for data analysis (as this great post from Nathan Epstein eloquently shows). I wanted to find out interesting information about Arctic Ice using the library I authored, LokiJS (a fast, in-memory NoSQL datastore). In the process I discovered that node.js + LokiJS can become a really handy tool to make sense of the data contained in CSV files, even large ones. By the way, you don’t need Loki: you could use some other tool, maybe a more functional-programming-oriented one like Ramda (❤), or plain JavaScript. I was able to work out all these statistics in a matter of minutes with LokiJS, hence my choice.
I like climatology, I’m not a climatologist.
I love climatology and meteorology. I don’t have the qualifications to speak with authority about either subject; I simply read a lot about them. For years I regularly visited the JAXA Sea Ice Monitor website (now gone), and by complete coincidence I managed to save a copy of the CSV data file only days before the whole website vanished.
I also happened to be working on a few statistical methods for LokiJS at the same time, specifically really basic operations like calculating the maximum, minimum and average values of a particular property across all the objects in a LokiJS collection (these have been included in v1.2.2).
So the two came together beautifully. I wanted to test Loki’s capability to perform statistical operations on collections and I had a subject I was interested in to test it on.
Global Warming & Co.
This post is not for, against or whatever other position you can take towards the theory of Global Warming. That is beside the point. The data will show a negative trend in Arctic Ice; if anybody feels there’s an angle here, they’re wrong. Antarctic Ice is on a positive trend, and next time I can examine that if you want.
Loading the CSV
There are a number of npm packages that work with CSV (and very well, see node-csv), but sometimes I find it faster to write my own mini-tool (csv-loader) than to learn a bigger, more fully featured one. Ultimately, csv-loader is just a glorified fs.readFile that parses each row into an object. In any case you can find the utility at csv-loader; it also contains a demo.js file, which is basically the code for this article.
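To make the "parse each row into an object" step concrete, here is a minimal sketch of the idea. It is not the csv-loader source: the parseCsv name is hypothetical, and it assumes a header line and no quoted fields containing commas. Reading the file from disk (for example with fs.readFile) is left out so the snippet runs on its own.

```javascript
// Hypothetical sketch, not the csv-loader implementation.
// Assumes: first line is the header, fields never contain quoted commas.
function parseCsv(text) {
  // Split into lines and drop blank ones (e.g. a trailing newline).
  var lines = text.split(/\r?\n/).filter(function (line) {
    return line.trim() !== '';
  });
  var headers = lines[0].split(',').map(function (header) {
    return header.trim();
  });
  // Build one object per data line, keyed by the header names.
  return lines.slice(1).map(function (line) {
    var cells = line.split(',');
    var row = {};
    headers.forEach(function (header, i) {
      row[header] = (cells[i] || '').trim();
    });
    return row;
  });
}

// Tiny inline sample instead of a real file.
var sample = 'Month,Day,2003,2004\n03,21,15066086,14374684\n09,18,5933760,5712311\n';
var rows = parseCsv(sample);
console.log(rows);
```

Every value comes back as a string here, so the year columns would still need converting to numbers before any statistics are run on them.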
Analyzing the data
Now for the interesting part. Once the data is loaded, I insert each object into a Loki records collection. From here I can leverage Loki to find out what I am looking for. As I’m only going to use a subset of the years available in the csv, I declared a handy years array that I will use to iterate.
var years = ['2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014'];
The JAXA records are a bit weird (as in, I would not have done it that way): each day of the calendar has data recorded for each year, so effectively as the years pass a new column is added to the existing rows. Each CSV row contains Day, Month, Average, Average 1980s, 1990s, 2000s, then the individual record for 2002–2015, then the Average for that day of the year. If data was not (or has not yet been) recorded, a value of -9999 is assigned. As I mentioned in the beginning, I normalized these sporadic -9999 values to that row’s Average 2000s, as they were throwing off the statistics.
So for clarity I will say the data obtained at the end is not 100% accurate, but that is because the CSV data in the first place is not 100% complete.
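The normalization described above could look roughly like the sketch below. The normalizeRow helper is hypothetical (it is not taken from demo.js), the field name "2000's Average" is the one that appears in the record dumps in this article, and the sample values are made up purely for illustration.

```javascript
// Hypothetical helper: replace the -9999 "no data" marker in the year
// columns with the row's 2000s decade average, converting to numbers.
var MISSING = -9999;

function normalizeRow(row, years, fallbackField) {
  var fallback = Number(row[fallbackField]);
  years.forEach(function (year) {
    var value = Number(row[year]);
    row[year] = value === MISSING ? fallback : value;
  });
  return row;
}

// Made-up sample row: 2003 is missing, 2004 is present.
var row = {
  Month: '02',
  Day: '29',
  '2003': '-9999',
  '2004': '14500000',
  "2000's Average": '14400000'
};
normalizeRow(row, ['2003', '2004'], "2000's Average");
console.log(row['2003'], row['2004']);
```

Substituting the decade average keeps every year at the same number of data points, at the cost of pulling the affected days towards the 2000s mean.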
Absolute maximum
Here we go. The first thing I did was to create a sorting function that allows a bit of currying so I can sort data by property.
function sortBy(property) {
  return function (a, b) {
    return a[property] < b[property] ? -1 : (a[property] === b[property] ? 0 : 1);
  };
}
Then, we use Loki’s maxRecord() function to retrieve the maximum record for each year (remember, each row contains records for all the years).
var maxima = years.map(function (year) {
  return records.maxRecord(year);
}).sort(sortBy('value'));
maxRecord() only returns a very basic summary: an object containing the id of the record with the maximum value for a given field (index), and that value (value). In this case the array of summaries returned is:
Maxima:
[ { index: 75, value: 14127729 },
{ index: 69, value: 14132380 },
{ index: 55, value: 14209677 },
{ index: 66, value: 14396094 },
{ index: 79, value: 14448416 },
{ index: 73, value: 14523635 },
{ index: 61, value: 14657047 },
{ index: 90, value: 14688540 },
{ index: 69, value: 14701388 },
{ index: 66, value: 14709086 },
{ index: 68, value: 14774776 },
{ index: 80, value: 15066086 } ]
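For readers who do not have Loki at hand, the shape of that summary can be approximated in plain JavaScript. This is only a sketch of the idea, not LokiJS's implementation: maxRecordOf is a hypothetical name, and index here is the array position rather than a Loki record id.

```javascript
// Plain-JavaScript approximation of a "max record" lookup.
// Not LokiJS code: `index` is the array position, not a Loki id.
function maxRecordOf(rows, field) {
  // For an empty array this returns { index: -1, value: -Infinity }.
  var best = { index: -1, value: -Infinity };
  rows.forEach(function (row, i) {
    if (row[field] > best.value) {
      best = { index: i, value: row[field] };
    }
  });
  return best;
}

// Two rows using 2003/2004 values quoted in this article.
var sampleRows = [
  { Month: '09', Day: '18', '2003': 5933760, '2004': 5712311 },
  { Month: '03', Day: '21', '2003': 15066086, '2004': 14374684 }
];
var summary = maxRecordOf(sampleRows, '2003');
console.log(summary); // { index: 1, value: 15066086 }
```

Flipping the comparison and starting from Infinity gives the matching minimum lookup.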
The last element is our guy. Now we just have to retrieve that record with our handy get() to find out what went on.
var alltimeMax = records.get(maxima[maxima.length - 1].index);
{ '2003': 15066086,
  '2004': 14374684,
  '2005': 13844442,
  '2006': 13774868,
  '2007': 13817518,
  '2008': 14574178,
  '2009': 14434695,
  '2010': 14524826,
  '2011': 13843563,
  '2012': 14598826,
  '2013': 14320881,
  '2014': 14408834,
  Month: '03',
  Day: '21',
  '1980\'s Average': '15391637',
  '1990\'s Average': '14945313',
  '2000\'s Average': '14472686',
  Average: 0,
  meta: { revision: 0, created: 1426796470020, version: 0 },
  '$loki': 80 }
We could write a function to determine the maximum value among the properties 2003–2014 of the object, but the answer is staring us right in the face: on March 21st 2003, the arctic ice cap had an extension of 15,066,086 square km.
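If you would rather not eyeball it, the function alluded to above might look like this. The name peakYear is hypothetical; the record is the March 21st row shown above, trimmed to its year properties.

```javascript
// Hypothetical helper: return the year whose value is largest in a record.
function peakYear(record, years) {
  return years.reduce(function (best, year) {
    return record[year] > record[best] ? year : best;
  }, years[0]);
}

var yearList = ['2003', '2004', '2005', '2006', '2007', '2008',
                '2009', '2010', '2011', '2012', '2013', '2014'];

// Year values copied from the March 21st record above.
var march21 = {
  '2003': 15066086, '2004': 14374684, '2005': 13844442, '2006': 13774868,
  '2007': 13817518, '2008': 14574178, '2009': 14434695, '2010': 14524826,
  '2011': 13843563, '2012': 14598826, '2013': 14320881, '2014': 14408834
};

var peak = peakYear(march21, yearList);
console.log(peak, march21[peak]); // 2003 15066086
```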
Absolute minimum
Given the above process, we only have to apply two changes to the previous code to find the minimum: map the years with the minRecord() function, then look up the element at position zero instead of the last. This gives:
{ '2003': 5933760,
  '2004': 5712311,
  '2005': 5232747,
  '2006': 5752096,
  '2007': 4074312,
  '2008': 4521068,
  '2009': 5168319,
  '2010': 4623077,
  '2011': 4412475,
  '2012': 3319816,
  '2013': 4838927,
  '2014': 4898064,
  Month: '09',
  Day: '18',
  '1980\'s Average': '7324923',
  '1990\'s Average': '6627664',
  '2000\'s Average': '5497402',
  Average: 0,
  meta: { revision: 0, created: 1426796834978, version: 0 },
  '$loki': 261 }
There we go: on September 18th 2012 the arctic ice cap went down as low as 3,319,816 square km.
Standard Deviations
Standard deviations for each year are easily retrieved with:
years.map(function (year) {
  return {
    year: year,
    stdDev: records.stdDev(year)
  };
}).sort(sortBy('stdDev'));
which yields:
[ { year: '2006', stdDev: 2854052.451217569 },
  { year: '2004', stdDev: 2877107.6396103706 },
  { year: '2005', stdDev: 3014763.9129358763 },
  { year: '2003', stdDev: 3045668.3664291264 },
  { year: '2014', stdDev: 3135944.659122019 },
  { year: '2013', stdDev: 3213957.9332678406 },
  { year: '2009', stdDev: 3244804.0962197566 },
  { year: '2010', stdDev: 3296042.6069022967 },
  { year: '2008', stdDev: 3312263.7334665037 },
  { year: '2011', stdDev: 3349821.8710202053 },
  { year: '2007', stdDev: 3461103.5490136337 },
  { year: '2012', stdDev: 3808234.077127053 } ]
This is consistent with the historical data: 2007 and 2011 are the years in which the 2nd and 3rd lowest values were recorded, and in both the extension stayed low year round.
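As a reminder of what a standard deviation measures, here is a plain-JavaScript sketch. It uses the population formula (dividing by N); whether Loki's stdDev() divides by N or N-1 is an assumption worth checking against the library source. The function names are hypothetical and the sample numbers are a textbook example, not ice data.

```javascript
// Arithmetic mean of an array of numbers.
function mean(values) {
  return values.reduce(function (sum, v) {
    return sum + v;
  }, 0) / values.length;
}

// Population standard deviation: square root of the mean squared
// distance from the mean. (Assumed formula, not taken from LokiJS.)
function populationStdDev(values) {
  var m = mean(values);
  var squaredDiffs = values.map(function (v) {
    return (v - m) * (v - m);
  });
  return Math.sqrt(mean(squaredDiffs));
}

var spread = populationStdDev([2, 4, 4, 4, 5, 5, 7, 9]);
console.log(spread); // 2
```

A year with a large value is one where the daily extension swings far from that year's own average, which is what a deep summer minimum produces.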
Average extensions
Once again, we iterate the years array and use avg().
years.map(function (year) {
  return {
    year: year,
    average: records.avg(year)
  };
}).sort(sortBy('average'));
And this yields:
[ { year: '2012', average: 9959411.569863014 },
  { year: '2011', average: 10046048.802739726 },
  { year: '2007', average: 10082738.82739726 },
  { year: '2014', average: 10324460.457534246 },
  { year: '2010', average: 10327712.84109589 },
  { year: '2006', average: 10351781.408219177 },
  { year: '2013', average: 10419325.241095891 },
  { year: '2005', average: 10489626.232876712 },
  { year: '2009', average: 10543074.794520548 },
  { year: '2008', average: 10573503.49041096 },
  { year: '2004', average: 10810508.410958905 },
  { year: '2003', average: 10990147.832876712 } ]
Again, this seems consistent. 2012 had the lowest average, 2003 the highest.
Where to go from here
LokiJS is a fast, NoSQL, in-memory, document-oriented datastore. It can persist data, so it can be great for offline data in the browser and as an alternative to SQLite on mobile and node-webkit (especially because LokiJS is available on npm and bower). Now you can use it to query a CSV file like you would a database. The era of JavaScript as a data language has just begun.