Exploratory Data Analysis I: The McClay Library

The McClay Library at Queen’s University is an award-winning library. It contains more than a million volumes and more than 2,000 reader spaces. As the exam period approaches, the number of occupants gradually rises. At its peak during exams, finding a PC or a seat can be quite difficult. The library provides occupancy data showing the different floors, the number of available PCs on each, and the total number of occupants in the building. This service is updated once every 10 seconds and can be a useful guide when finding a place to study on a very busy day. I suppose the information on the website is sourced from the library’s controlled entry system.

McClay Library (source: RPP Architects)

Having been a student at Queen’s for quite some time now, I am well aware of how the library becomes saturated with rather disconcerted-looking students when occupancy peaks. I was curious to know how useful the occupancy information could be at such times, so I began exploring the website and later found a way to collect the data on it.

Over the course of the summer exam period, I aim to gather as many datapoints as possible and play with the dataset. I hope to find something that can be beneficial to every Queen’s student, in or out of exam season. In this post I shall discuss what I have done so far with the ever-growing dataset, and how I’m using the Wolfram Language and other technologies such as Wolfram Data Drop and Wolfram|Alpha to store and analyse it. I hope you find this both interesting and useful.


1. Simple Display

I will begin by discussing how I accessed the occupancy information and visualised it the way I wanted. Two URLs are of interest here: go.qub.ac.uk/mcclay-availablepcs and go.qub.ac.uk/mcclay-occupancy. The former shows the number of available PCs on all floors, while the latter shows the total number of people in the building.

Create display

Combining data from both sources, we can create a simple interface similar to the one on the occupancy site. We’ll first extract the data from our two sources and process it, and then represent occupancy and PC usage in the form of a horizontal gauge.

display = Block[{
url = "https://www.qub.ac.uk/directorates/InformationServices/pcs/impero/",
url2 = "https://www.qub.ac.uk/directorates/InformationServices/pcs/sentry",
rawDataPCs, dataPCs, dataOccupancy, timeAndDate, mcclayLib,
currentOccupancy, maxOccupancy, gaugesPCs, gaugesOccupancy, makeGauge},

(* import info. from sites *)
rawDataPCs = Import[url];
(* extract useful info. *)
dataPCs = ImportString[rawDataPCs, "RawJSON"];
dataOccupancy = Import[url2];
{currentOccupancy, maxOccupancy} =
First /@ (
StringCases[
dataOccupancy, ___ ~~ "<" <> # <> ">" ~~ occ___ ~~
"</" <> # <> ">" ~~ ___ :> occ] & /@ {"occupancy",
"maxoccupancy"});

(* extract time and date *)
timeAndDate = dataPCs["dateTimeNow"];
(* associate each floor with its info and sort list *)
mcclayLib =
With[{order = {"Ground Floor Atrium", "Ground Floor Extended Hours",
"First Floor", "Second Floor", "Third Floor"}},
{#[[1]], Row[{#[[3]] - #[[2]], #[[3]]}, " / "]} & /@
SortBy[Values[dataPCs["rooms"][[1 ;;, 2 ;; 4]]],
Position[order, #[[1]]] &]];

(* template for horizontal gauges *)
makeGauge[value_] :=
HorizontalGauge[value, {0, 1},
GaugeMarkers -> Placed[Graphics[Disk[]], "DivisionInterval"],
GaugeFrameSize -> None, GaugeFrameStyle -> None,
GaugeStyle -> {Orange, Lighter[Brown, .8]},
GaugeFaceStyle -> None,
ScaleRangeStyle -> {Lighter[Brown, #] & /@ {.9, .5}},
GaugeLabels -> {None, None},
TicksStyle -> None,
ScaleDivisions -> None, ScaleRanges -> None, ScalePadding -> 0
];

(* gauges for PC usage on each floor *)
gaugesPCs = Table[
makeGauge[
N[(mcclayLib[[fl, 2, 1, 1]]/mcclayLib[[fl, 2, 1, 2]]), 2]],
{fl, Range@Length@mcclayLib}];

(* gauge for total number of occupants *)
gaugesOccupancy =
makeGauge[(FromDigits@currentOccupancy/FromDigits@maxOccupancy)];

(* put everything together *)
Column[{
Framed[timeAndDate, FrameStyle -> Directive[Black, Thin]],
Panel@Grid[
Prepend[
Reverse[Flatten /@
Transpose[{mcclayLib, gaugesPCs}]], {"Total occupancy",
Row[{currentOccupancy, maxOccupancy}, " / "], gaugesOccupancy}],
Alignment -> {"/", Center}, Dividers -> {None, {False, True}},
Spacings -> {Automatic, 2 -> 2}]
}, Alignment -> Center]
]

Note that the values of all gauges are normalised between 0 and 1. This is what we get:

Occupancy and PC usage on all floors.

Deploy Display to Cloud

We can go ahead and deploy this to the Wolfram Cloud to make it accessible to every student.

CloudDeploy[Delayed[
ExportForm[display,
"PNG", ImageFormattingWidth -> 400, ImageResolution -> 250],
UpdateInterval -> 20
], "McClayLibraryOccupancy", Permissions -> "Public"]

The display then gets deployed to this website, which automatically refreshes every 20 seconds. It’s also available via http://go.qub.ac.uk/mcclay-usagedisplay. And of course, you can scan a QR code to access it:
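Incidentally, the QR code itself can be generated directly in the Wolfram Language with BarcodeImage; a one-liner for the short URL above:

(* generate a QR code linking to the deployed display *)
BarcodeImage["http://go.qub.ac.uk/mcclay-usagedisplay", "QR"]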


2. Collecting Data

Create Databin

To analyse the occupancy and PC usage over a period of time, we’ll have to store the data somewhere. I chose to use the Wolfram Data Drop to do this. It allows you to easily create a databin and periodically add data to it using a custom API, email, Arduino, Raspberry Pi, IFTTT and even Twitter!

I used the Wolfram Language to create a databin, but you can also do this online:

(* connect to Wolfram Cloud first *)
If[!$CloudConnected, CloudConnect["username", "password"],
Print["Connected to Wolfram Cloud."]];

(* create bin *)
libraryBin =
CreateDatabin[<|
"Name" -> "McClay Library",
"Interpretation" ->
{
"datetime" -> "DateTime",
"totalNum" -> Restricted["StructuredQuantity", "People"],
"grndFloorATR" -> Restricted["StructuredQuantity", "People"],
"grndFloorEXT" -> Restricted["StructuredQuantity", "People"],
"floor1" -> Restricted["StructuredQuantity", "People"],
"floor2" -> Restricted["StructuredQuantity", "People"],
"floor3" -> Restricted["StructuredQuantity", "People"],
"totalUsingPCs" -> Restricted["StructuredQuantity", "People"],
"totalNotUsingPCs" -> Restricted["StructuredQuantity", "People"]
},
"Administrator" -> $WolframID,
Permissions -> "Public"|>]
Properties of databin as shown in Mathematica and online.

The databin is available to the public, and it automatically interprets and associates input values with their appropriate quantities, e.g., the number of “People” on each floor. Note that this databin is only temporary and expires a month from now; data cannot be added to it after it expires.

The following list explains what the inputs are:

datetime -> date and time
totalNum -> current library occupancy
grndFloorATR -> Ground Floor Atrium
grndFloorEXT -> Ground Floor (Extended Hours)
floor1 -> First Floor
floor2 -> Second Floor
floor3 -> Third Floor
totalUsingPCs -> total no. of students using PCs
totalNotUsingPCs -> total no. of students NOT using PCs

Properties of databin and metadata

To get the properties and metadata of the databin:

props = Grid[Options[libraryBin] /. (property_ -> value_) :> {property, value}, Frame -> All]
metaData = Column@Normal@libraryBin["Information"]
Databin properties
Databin metadata

Prepare Data for Upload

Using the Wolfram Language, data is usually added to the bin in the form of an Association. Since we already know what the required inputs for our databin are, we can extract the occupancy data and immediately process it into our desired association.

getLibraryData[] := Block[
{
urlPCs =
"https://www.qub.ac.uk/directorates/InformationServices/pcs/impero/",
urlOccupancy =
"https://www.qub.ac.uk/directorates/InformationServices/pcs/sentry",
rawDataPCs, PCsUsageData, OccupancyData, occ,
datetime, f0atr, f0ext, f1, f2, f3, totalUsingPCs, totalNotUsingPCs
},

rawDataPCs = Import[urlPCs];
PCsUsageData = ImportString[rawDataPCs, "RawJSON"];
datetime = PCsUsageData["dateTimeNow"];

(* occupancy *)
OccupancyData = Import[urlOccupancy];
occ = First@StringCases[OccupancyData, ___ ~~ "<occupancy>" ~~ occ___ ~~"</occupancy>" ~~ ___ :> FromDigits@occ];

(* PC usage on all floors *)
{f0atr, f0ext, f1, f2, f3} =
Abs@With[{order = {"Ground Floor Atrium",
"Ground Floor Extended Hours", "First Floor", "Second Floor",
"Third Floor"}},
(#[[3]] - #[[2]]) & /@
SortBy[Values[PCsUsageData["rooms"][[1 ;;, {2, 3, 4}]]],
Position[order, #[[1]]] &]];
totalUsingPCs = f0atr + f0ext + f1 + f2 + f3;
totalNotUsingPCs = occ - totalUsingPCs;

(* our desired association *)
<|"datetime" -> datetime, "totalNum" -> occ, "grndFloorATR" -> f0atr,
"grndFloorEXT" -> f0ext, "floor1" -> f1, "floor2" -> f2,
"floor3" -> f3, "totalUsingPCs" -> totalUsingPCs,
"totalNotUsingPCs" -> totalNotUsingPCs|>
];

Running this gets the data and creates an association we can directly send to our databin. For example, evaluating getLibraryData[] gives:

<|"datetime" -> "Saturday, 23rd April 2016 14:40:05", 
"totalNum" -> 630, "grndFloorATR" -> 88, "grndFloorEXT" -> 50,
"floor1" -> 24, "floor2" -> 25, "floor3" -> 33, "totalUsingPCs" -> 220, "totalNotUsingPCs" -> 410|>

3. Uploading Data

There are various ways to upload the extracted data to our bin. We’ll do this using the Wolfram Language, since our data first needs to be extracted from a website.

Scheduled task

Scheduled tasks are great for running code in the background. QUB updates the data on the occupancy website once every 10 seconds; however, a basic Wolfram Data Drop account only allows a maximum of 60 data entries per hour. So we’ll upload our data once every minute to add as many entries as possible to the databin each hour. Note that the short ID for our databin is cbeUoury.

libraryBin = Databin["cbeUoury"]
RunScheduledTask[(
assoc = getLibraryData[];
DatabinAdd[libraryBin, assoc]), 60]
We’ve set up a task to upload info to our databin once every 60 seconds.

As long as we allow this to run without interrupting the task or closing Mathematica, it’ll keep running indefinitely. This is fantastic, given that every Raspberry Pi comes with Mathematica pre-installed! We can also add data to the bin using a custom API, an Arduino and various other methods.
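As an illustration of the custom-API route, here is a minimal sketch; the API name is hypothetical, and it assumes the getLibraryData function defined earlier:

(* deploy a simple API that appends one entry to the databin per call *)
CloudDeploy[
 APIFunction[{},
  (DatabinAdd[Databin["cbeUoury"], getLibraryData[]]; "Entry added") &],
 "McClayLibraryUploadAPI", Permissions -> "Public"]

Requesting the resulting cloud object’s URL would then add a fresh entry, which suits devices that can make HTTP requests but don’t run the Wolfram Language.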

After a few minutes (and entries) our data can be viewed on the databin site. You can also deploy your scheduled task to the Wolfram Cloud, and it’ll keep running until you choose to stop it, although a paid subscription is required to use this service. This would be ideal for gathering data over a lengthy period of time, which is what I aim to do. Unfortunately, I do not have a paid subscription at the moment, but I will look into getting one. For now, data uploads will be made from Mathematica on my PC, a Raspberry Pi, or an Intel Edison.

After a few minutes (and uploads) we can view our data.

Cloud-based scheduled task

If we’re deploying our scheduled task to run autonomously in the cloud, we’d use the following code:

dataUploadCloudObject = Block[{shortid = "cbeUoury", databin, assoc},
databin = Databin[shortid];
CloudDeploy[ScheduledTask[
(assoc = getLibraryData[]; DatabinAdd[databin, assoc]),
600],"McClayLibraryDataUpload", Permissions -> "Private"
]]

4. Data Analysis

I ran the program pretty much throughout the exam period with short intervals of inactivity. The entries amassed into a dataset large enough to spot trends and run some analyses.

Dataset

Let’s see an overview of the data we have collected so far:

(* Data collected so far *)
libraryBin = Databin["cK8HWZj4"];
dataEntriesThusFar = libraryBin["Entries"];

(* Delete Null, Missing and negative cases... Sometimes we get negative values when the number of available PCs isn't updated correctly. *)
dataEntriesThusFar = DeleteCases[dataEntriesThusFar, Null, {1}];
dataEntriesThusFar = DeleteCases[dataEntriesThusFar, _Missing | _?Negative, {2}]
(* create dataset of entries *)
dataset = Dataset[dataEntriesThusFar]
Dataset of all entries, thus far

Now that we have a continuously growing dataset, we can run some analyses and see if we can find anything meaningful or useful in it.

Basic Stats

Find the minimum, maximum, standard deviation, etc. for each library location:

Module[{funcs = {"Location", Mean, StandardDeviation, Median, Max, 
Min}},
Prepend[Flatten@{#,
Table[dataset[
IntegerPart@func[Cases[#, _?QuantityQ]] &, #1], {func,
Rest@funcs}]} & /@ {"totalNum", "totalUsingPCs","totalNotUsingPCs", "grndFloorATR", "grndFloorEXT", "floor1", "floor2", "floor3"},
ToString /@ funcs] // Dataset
]
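The tables and charts in this and later sections are laid out side by side with a helper called gridOfTwoItems, which isn’t defined in the post. A minimal sketch, assuming it simply places a pair of items in one row:

(* hypothetical helper: display two items side by side *)
gridOfTwoItems = Grid[{#}, Spacings -> 2, Alignment -> Top] &;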
(* visualise charts using box charts *)
makeBoxChart[labels_] := Block[{data},
data = Table[DeleteMissing@Normal@dataset[[All, i]], {i, labels}];
BoxWhiskerChart[data,
{{"Outliers", Large}, {"Whiskers", Thin}, {"Fences", Thin}},
AspectRatio -> 1/2, ChartLabels -> labels, ChartStyle -> "StarryNightColors", BarOrigin -> Left,
BarSpacing -> .5, PerformanceGoal -> "Quality", ImageSize -> 400]];

makeBoxChart[#] & /@ {{"totalNum", "totalUsingPCs",
"totalNotUsingPCs"}, {"grndFloorATR", "grndFloorEXT", "floor1", "floor2", "floor3"}} // gridOfTwoItems
Stats of each library level and whole library PC usage

To see the average number of people on each floor (over the period of time we’ve been collecting data, and NOT in real-time):

Table[(dataset[
BarChart[Floor@Mean[QuantityMagnitude[DeleteMissing[#, 1, Infinity]]],
BarOrigin -> Left, ChartStyle -> "StarryNightColors", ChartLegends -> Automatic, PlotLabel -> "Mean number of people\n", PlotTheme -> "Detailed"] &, i]),
{i, {{"grndFloorATR", "grndFloorEXT", "floor1", "floor2",
"floor3"}, {"totalUsingPCs", "totalNotUsingPCs"}}}] // gridOfTwoItems
Mean number of PCs in use and free on all levels

To find when the library had the highest number of people in it:

Query[MaximalBy[#totalNum &]] @ dataset
The library saw its highest number of occupants of the exam period on 17 May, at about 11:20 am.

Time series analysis

Extract the time series for the library floors, and remove the last time series in the list, which holds failed entries that were uploaded to the databin.

tseries = Delete[libraryBin["TimeSeries"], -1];

makeDatePlot[series_List] :=
Block[{colourScheme = Take[ColorData[89, "ColorList"], Length[series]]},
DateListPlot[(QuantityMagnitude[tseries[#1]] &) /@ series,
AspectRatio -> 1/\[Pi], PlotStyle -> ({#1,
Directive[AbsoluteThickness[0.75`], Opacity[1]]} &) /@ colourScheme,
PlotLabels -> MapThread[Style, {(tseries[#1]["LastValue"] &) /@ series, colourScheme}], Filling -> Bottom,
PlotLegends -> Placed[SwatchLegend[colourScheme, series, LegendLayout -> "Row"], Bottom],
FrameLabel -> Automatic, Frame -> True,
FrameTicks -> {{Automatic, None}, {All, Automatic}},
GridLines -> Automatic, ImageSize -> 800,
PlotRangePadding -> {0, {0, Automatic}},
TargetUnits -> {"DimensionlessUnit", "People"}]];

{makeDatePlot[{"floor3", "floor2", "floor1", "grndFloorEXT", "grndFloorATR"}],
makeDatePlot[{"totalNum", "totalUsingPCs", "totalNotUsingPCs"}]} // Column
Library occupancy and PC usage trends (note that I stopped running the program for a while after exams, hence the flat gap; I ran it again after some days to carry out some tests).

Let’s see the timespan of entries, and the latest entry to the databin with its metadata.

Column@Normal@libraryBin["Latest"]
Grid@Transpose[{{"First Entry", "Last Entry"}, libraryBin["TimeInterval"]}]
Some data on the… databin

Location

See where entries were uploaded from:

(* list of geo locations for all entries in the databin *)
geoLocations = libraryBin["GeoLocations"];
geoLocTallyAssoc = Tally[geoLocations] /. {g_GeoPosition, c_} :> {g -> c} // Flatten // Association;

(* plot geo locations on a map *)
pinToMap[highText_] := Column[{highText,
Graphics[
GraphicsGroup[{FaceForm[{Orange, Opacity[.8]}],
FilledCurve[{{Line[
Join[{{0, 0}}, ({Cos[#1], 3 + Sin[#1]} &) /@ Range[-((2 \[Pi])/20), \[Pi] + (2 \[Pi])/20, \[Pi]/20], {{0, 0}}]]},
{Line[(0.5 {Cos[#1], 6 + Sin[#1]} &) /@
Range[0, 2 \[Pi], \[Pi]/20]]}}]}], ImageSize -> 15]}, Alignment -> Center];

findNearestCity[geoCoords_] :=
Row[{First[GeoNearest[Entity["City"], geoCoords]], " (",
geoLocTallyAssoc@geoCoords,
If[geoLocTallyAssoc@geoCoords == 1, " entry", " entries"], ")"}];

GeoGraphics[(GeoMarker[#[[1, 1]], pinToMap[#], "Alignment" -> Bottom,
"Scale" -> Scaled[1]] &[findNearestCity@#]) & /@ (First /@
Tally[geoLocations]),
GeoRange -> "World", GeoProjection -> "Equirectangular",
PlotRangePadding -> Scaled[.05], ImageSize -> 750]
Most of the entries were from my PC (Belfast), while a few were uploaded by calling a custom API I created (I assume the server(s) that handled these are based in Seattle).

5. Distributions

Well done if you’re still reading this — I hope it’s been worth your while so far.

In this section I will discuss my attempt at finding the daily distributions of people using the library and the PCs in it. I followed William Sehorn’s methods in this article to find and visualise the distributions.

Date-value pairs

Shown below are the timestamps and values of entries.

(* convert date-objects to date-strings *)
dateValuePairs = (tseries["totalNum"]["DatePath"]) /. {x_,
y_} :> {DateString@x, QuantityMagnitude@y};
RandomSample[dateValuePairs, 10]

(* Reformat date-strings to date-lists *)
dateValuePairs =
dateValuePairs /. {date_, value_} :> {DateList@date, value}

(* total number of entries *)
Length@dateValuePairs
We have 4866 date-value pairs in total

Daily occupancy distributions

We’ll group all entries into 5-minute intervals over a 24-hour period:

timeIntervals = Tuples[{Range[0, 23], Range[0, 55, 5]}];
timeIntervals[[;; 12]]

Find the number of people within each time interval, accumulated over the entire data-gathering period:

Clear[addOccupancyToTimeInterval, OccupancyAtTimeInterval];
OccupancyAtTimeInterval[_] = 0;
addOccupancyToTimeInterval[{__, hour_, minute_, _}, occupancyCount_] := OccupancyAtTimeInterval[{hour, minute}] += occupancyCount;

addOccupancyToTimeInterval @@@ dateValuePairs;

For example, we can find the total number of people for 5-past-noon, over the entire period:

OccupancyAtTimeInterval[{12, 5}]
We can see that the total number of people between noon and 5-past-noon is 1575
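Note that these figures are totals accumulated over every day of collection, not per-day averages. To recover an averaged daily profile, one could divide by the number of distinct days observed; a sketch, assuming the dateValuePairs list from above:

(* number of distinct days in the dataset *)
numDays = Length@DeleteDuplicates[dateValuePairs[[All, 1, ;; 3]]];
(* hypothetical averaged occupancy for a given {hour, minute} interval *)
avgOccupancyAtTimeInterval[interval_] := OccupancyAtTimeInterval[interval]/numDays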

We can now plot the number of people over 5-minute intervals for the entire day:

OccupancyDistributionData = OccupancyAtTimeInterval /@ timeIntervals;
OccupancyDistributionData =
RotateLeft[OccupancyDistributionData, 12*6];

(* averaged daily distribution of occupancy *)
barChartStyleOptions = With[{yticks =
Map[{#*12,
DateString[3600*(# + 6), {"Hour12Short", " ", "AMPM"}]} &,
Range[0, 24, 2]]},
{
AspectRatio -> 1/5, Frame -> True, FrameTicks -> {{None, None}, {yticks, None}},
FrameStyle -> GrayLevel[0.75], FrameTicksStyle -> Black,
GridLines -> {Range[24, 11*24, 24], None}, GridLinesStyle -> GrayLevel[0.87],
Ticks -> None,
BarSpacing -> 0,
PlotRange -> {{1, 288}, All},
PlotRangePadding -> {{1, 0}, {1, Scaled[.15]}},
ImagePadding -> {{12, 12}, {20, 0}},
ChartElementFunction ->
ChartElementDataFunction["SegmentScaleRectangle",
"Segments" -> 20, "ColorScheme" -> "DeepSeaColors"],
PerformanceGoal -> "Speed", ImageSize -> 750
}
];

BarChart[OccupancyDistributionData, barChartStyleOptions]
Distribution of occupants over a 24-hour period

So… is it the same pattern every day? Apparently not.

(* remove June entries *)
totalNumTimeSeries = 
TimeSeriesWindow[tseries["totalNum"], {{2016, 5, 16}, {2016, 5, 31}}];
wd = WeightedData[totalNumTimeSeries["Dates"],
totalNumTimeSeries["Values"]];
DateHistogram[wd, "Day", DateReduction -> "Week",
ColorFunction -> "StarryNightColors"]

Oddly, there’s a deviation from the otherwise decreasing trend on Saturday.

If we reduce the histogram down to the span of a day, we get a clearer picture of the flow of people, in and out of the library:

DateHistogram[wd, "Hour", DateReduction -> "Day",
ColorFunction -> "StarryNightColors"]
We see a smoother rise and fall here compared to the previous plot, because the data is binned hourly here rather than in five-minute intervals.

Array plot of occupants

We can also view the whole collection period at a glance with an array plot, with days running horizontally and the time of day vertically:

arrayData = dateValuePairs[[All, 2]];
arrayData = Partition[arrayData, 12*24];
arrayData = Transpose[Reverse /@ arrayData];

arrayPlotStyleOptions = Module[{startDate, endDate, xticks, yticks},
startDate = dateValuePairs[[1, 1, ;; 3]];
endDate = dateValuePairs[[-1, 1, ;; 3]];
yticks = {#*12,
DateString[(24 - #)*3600, {"Hour12Short", " ", "AMPM"}]} &/@ Range[0, 24, 4];
xticks =
With[{month = Take[DatePlus[startDate, {#, "Month"}], 2]},
{QuantityMagnitude@DateDifference[startDate, month],
DateString[month, "MonthName"]}
] & /@
Range[0, First@DateDifference[startDate, endDate, "Month"] + 1];
{
AspectRatio -> 1/4,
Frame -> True, FrameTicks -> {{yticks, None}, {xticks, None}},
FrameStyle -> GrayLevel[.75], FrameTicksStyle -> Black,
ImageSize -> 450, PlotRangePadding -> {{0, 0}, {0, 1}},
ColorFunction -> "StarryNightColors"
}
];

ArrayPlot[arrayData, arrayPlotStyleOptions]

Daily and monthly distributions

(* Auxiliary functions *)
occupancyDataEntryToDay[{{year_, month_, day_, __}, _}] := {year,
month, day};
occupancyDataEntryToMonth[{{year_, month_, __}, _}] := {year, month};
getDailyTotals[data_] :=
Module[{groupedByDay},
groupedByDay = GatherBy[data, occupancyDataEntryToDay];
Map[Total@#[[All, 2]] &, groupedByDay]
]

(* daily distribution *)
dailyOccupancyTotals = getDailyTotals[dateValuePairs];
days = DeleteDuplicates[occupancyDataEntryToDay /@ dateValuePairs];
dailyPlotData = Transpose[{days, dailyOccupancyTotals}];

(* plotting style options common to daily and monthly distribution plots *)
sharedDateListPlotStyleOptions =
{
Joined -> True, Filling -> Axis, ImageSize -> 450, Frame -> True,
FrameStyle -> GrayLevel[.75], FrameTicksStyle -> Black,
FillingStyle -> Automatic, PlotStyle -> Directive[Thickness[.002], Black],
PlotRange -> {libraryBin["TimeInterval"], {0, All}},
GridLines -> {DayRange[First@#, Last@#] &@
libraryBin["TimeInterval"], None},
GridLinesStyle -> GrayLevel[.8]
};

(* plotting options for daily distribution *)
dailyDateListPlotStyleOptions =
With[{padding = .1},
{
FrameTicks -> {{Rest@
FindDivisions[{0, (1 + padding)*Max@dailyOccupancyTotals, 1}, 8], None},
{DateString[#, {"DayShort", " ", "MonthNameShort"}] & /@
DateRange[First@#, Last@#, Quantity[3, "Days"]] &@
libraryBin["TimeInterval"][[;; , 1]], None}},
PlotRangePadding -> {{0, 0}, {0, Scaled[padding]}}, ImagePadding -> Automatic, AspectRatio -> 1/4, ColorFunction -> "StarryNightColors"
}
];

(* daily distribution plot *)
dailyPlot = DateListPlot[dailyPlotData, sharedDateListPlotStyleOptions, dailyDateListPlotStyleOptions]
(* and monthly *)
groupedByMonth = GatherBy[dateValuePairs, occupancyDataEntryToMonth];
monthlyOccupancyAverages =
Map[Mean@getDailyTotals@# &, groupedByMonth];
months = DeleteDuplicates[occupancyDataEntryToMonth /@ dateValuePairs];
monthlyPlotData = Transpose[{months, monthlyOccupancyAverages}];

(* monthly distribution plotting options *)
monthlyDateListPlotStyleOptions =
With[{padding = .3},
{
FrameTicks -> {{Most@
FindDivisions[{0, (1 + padding)*Max@monthlyOccupancyAverages,
1}, 4], None}, {months, None}},
PlotRangePadding -> {{0, 0}, {0, Scaled[padding]}},
ImagePadding -> Automatic, AspectRatio -> 1/4, ColorFunction -> "StarryNightColors"
}
];

monthlyPlot = DateListPlot[monthlyPlotData, sharedDateListPlotStyleOptions, monthlyDateListPlotStyleOptions]
Note the sharp peak before May 22 — I presume that this was the busiest day of the exam period. It must have been really difficult to get a seat that day, poor students.
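To pin the date down exactly, one could query the daily totals computed above, for example:

(* the day with the highest total occupancy count *)
First@MaximalBy[dailyPlotData, Last]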

Distributions for each floor

Lastly, we want to see how PC usage evolves over a day on each floor. To generate the following plots, I combined pieces of code from previous sections into a function that can be used repeatedly.

plotDailyDistribution[seriesName_] := Block[
{dateValuePairs, timeIntervals, OccupancyAtTimeInterval,
addOccupancyToTimeInterval, OccupancyDistributionData,
barChartStyleOptions},

(* convert date-objects to date-strings *)
dateValuePairs = (tseries[seriesName]["DatePath"]) /. {x_,
y_} :> {DateString@x, y};
(* reformat date-strings to date-lists *)
dateValuePairs[[All, 1]] = DateList /@ dateValuePairs[[All, 1]];

(* create 5-minute intervals over a 24-hour period *)
timeIntervals = Tuples[{Range[0, 23], Range[0, 55, 5]}];

OccupancyAtTimeInterval[_] = 0;
addOccupancyToTimeInterval[{__, hour_, minute_, _}, occupancyCount_] :=
OccupancyAtTimeInterval[{hour, minute}] += occupancyCount;
addOccupancyToTimeInterval @@@ dateValuePairs;

OccupancyDistributionData = OccupancyAtTimeInterval /@ timeIntervals;
OccupancyDistributionData =
RotateLeft[OccupancyDistributionData, 12*6];

(* bar chart styling options for the averaged daily distribution *)
barChartStyleOptions = With[{yticks =
Map[{#*12,
DateString[3600*(# + 6), {"Hour12Short", " ", "AMPM"}]} &,
Range[0, 24, 2]]},
{
AspectRatio -> 1/5, Frame -> True, FrameTicks -> {{None, None}, {yticks, None}}, FrameTicksStyle -> Black, GridLines -> {Range[24, 11*24, 24], None},
GridLinesStyle -> GrayLevel[0.87], PlotRange -> {{1, 288}, All}, Ticks -> None, BarSpacing -> 0,
PlotRangePadding -> {{1, 0}, {1, Scaled[.15]}}, ChartElementFunction ->
ChartElementDataFunction["SegmentScaleRectangle",
"Segments" -> 20, "ColorScheme" -> "StarryNightColors"], PerformanceGoal -> "Quality", ImageSize -> 450, FrameStyle -> GrayLevel[0.75], ImagePadding -> {{12, 12}, {20, 0}}
}
];

Column[{seriesName, BarChart[OccupancyDistributionData, barChartStyleOptions]}, Alignment -> Center]
]

(* plot distributions *)
{
Column[plotDailyDistribution[#] & /@ {"totalNum", "totalUsingPCs", "totalNotUsingPCs"}, Spacings -> 0],
Column[plotDailyDistribution[#] & /@ {"grndFloorATR", "grndFloorEXT", "floor1", "floor2", "floor3"}, Spacings -> 0]
} // gridOfTwoItems
Daily distribution for each floor of the library, and for PC usage

Conclusion

This post demonstrates (I hope) the enormous power that data holds. This so-called “data” is now seemingly ubiquitous — in fact, there’s more of it than ever before. Nearly all the devices we carry or surround ourselves with produce copious amounts of data, some of which we don’t have direct access to or don’t know what to do with.

Running an analysis such as this over a lengthy period of time, say years, could reveal some very interesting insights beyond the expected rise and fall of occupancy during and outside term time, and these could be genuinely useful to Queen’s students. I originally started this out of the frustration of not finding a seat in the library during the summer exams.

These plots only give an idea of the flow of people in and out of the library, not the full picture. A group of five friends may leave the library for lunch, and the counter drops by five, but perhaps they’ve left their belongings where they were sitting, so the net number of available seats is still zero.

Furthermore, when a PC is logged off automatically (due to prolonged inactivity, for instance), the counter registers one more free PC, but the user could still have their belongings there, or indeed still be sitting in front of that very PC!

However, looking at the totalNum plots, it appears one ought to be there before 11 am to stay ahead of the trend.

First-come, first-served, I guess.


Originally posted on AeroExplora.org, on 2 July 2016.