Recurrent Neural Nets and Nairobi Traffic.
The end game for @safiribot is to create an artificial intelligence agent that navigates Nairobi traffic at a level of expertise no human could ever achieve. The agent would be a routing algorithm that knows which routes to take and which to avoid. It would decide whether to stay put on a congested road or divert from its originally planned route. It would constantly make plans and monitor its progress, adjusting its parameters to optimize an objective function. It would look at all road conditions leading up to the destination (accidents, traffic, demos and closed sections) and plan its way there. The purpose of all this planning is to get the agent to a given destination in the least amount of time and/or using the least amount of fuel (depending on your preference). Importantly, its decisions should be an order of magnitude better than those of the best human driver: the agent should be superhuman in its ability to navigate Nairobi roads.
This seems like an impossible goal. The agent must predict traffic with acceptable error rates and plan around it, and it must also deal with unexpected occurrences such as accidents and demos. Moreover, the routes it takes should not waste fuel. A huge amount of data may be needed to train this agent, and the data that exists today is unstructured, incomplete and skewed. Data is not the only problem: compute power and algorithms may present an even bigger challenge. However, the value of a solution far outweighs the challenges, which is why the problem is worth solving.
Creating a superhuman navigator is the ultimate goal for @safiribot. The first step towards it, we believe, is to understand the data intuitively. We need to deal with inconsistencies, account for incompleteness and extract structure from noisy inputs. The following is a partial solution to that first step.
The data generated from traffic update tweets contains correlations between reports. Traffic is reported on part A of a road because traffic was reported earlier on part B of the same road. A traffic report at 5:30 pm on some part of Thika Road may occur because traffic was reported at 4:30 pm in another area of the road. Similarly, the 4:30 pm report possibly occurred because of a third report made at 3:10 pm. We do not know which report influences which, or to what degree. Traffic occurrences along a given stretch of road can be treated as a sequence in which one event depends on another. If the theory is that traffic propagates through a network, can we find structure in this sequence? Can we predict, with a degree of certainty, when the next traffic report is going to be posted given all the previous reports? It turns out the answer is yes. Consider the following graph, derived from 3000 Thika Road traffic reports from December 2015 to August 2016:
The y-axis represents the time of day and the x-axis represents the position of each report in the sequence, arranged from latest to earliest. Each blue dot represents an individual report. The blue line is drawn to illustrate the data as a sequence. All that remains now is to predict the sequence. This is done by dividing the data into three sets: a training set, a validation set and a test set. An algorithm is let loose on the training set while its parameters are tweaked as guided by evaluations on the validation set. Finally, the test set is used to find out how accurate the algorithm is.
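The three-way split can be sketched in a few lines of Python. This is a minimal sketch, not the code used for @safiribot: the function name, the 60/20/20 proportions and the sample values are all assumptions made for illustration. The one design choice worth noting is that the sequence is cut into contiguous chunks rather than shuffled, so the part used for the final accuracy check never leaks into the part used for fitting.

```python
def chronological_split(sequence, fit_frac=0.6, tune_frac=0.2):
    """Cut an ordered sequence into three contiguous parts, without shuffling.

    The fractions are illustrative defaults, not values from the original post.
    """
    n = len(sequence)
    fit_end = int(n * fit_frac)
    tune_end = fit_end + int(n * tune_frac)
    # Part 1 is used to fit the model, part 2 to tune its parameters,
    # part 3 is held out for the final accuracy check.
    return sequence[:fit_end], sequence[fit_end:tune_end], sequence[tune_end:]


# Hypothetical report times as hours of the day; not real @safiribot data.
report_hours = [7.5, 8.0, 8.25, 16.5, 17.0, 17.5, 18.0, 7.25, 8.5, 17.75]

fit_part, tune_part, held_out_part = chronological_split(report_hours)
print(len(fit_part), len(tune_part), len(held_out_part))
```

With only a few thousand reports, the held-out part is small, so any accuracy figure computed on it should be read as a rough estimate.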
There is a class of neural networks that works very well with this kind of sequential data. Recurrent neural networks have been used in a number of tasks, including object recognition, automatic translation of text and the creation of chatbots. For a more nuanced demonstration of what they can accomplish, consider this: if you feed program source code (PHP, C++, Python or Java), letter by letter, into a certain variant of a recurrent neural network, it can learn syntax rules such as how to define a function. Indeed, it will write a return statement before it ends its function definition. A good read about this experiment and how recurrent neural networks do this can be found here. The idea behind recurrent neural nets is that the output at time t is a function of the inputs at times t-1, t-2 and so on. Thus information is looped through the network from one step of the sequence to the next. It doesn't matter what your data is. It can be stock market prices or the win rate of your favorite football club (Arsenal). Any sequence can be predicted to some degree of accuracy. The variant of recurrent neural network used for our task is the long short-term memory, or LSTM (the one that can write syntactically correct code). This variant is easier to train and can learn longer sequences. It consists of gated layers, so it learns which parts of a sequence to forget and which parts to remember. The details are much more complicated than this. Here is an excellent blog post explaining how they work.
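To make the idea of gates a little more concrete, here is a minimal sketch of one LSTM cell with a single hidden unit, written with the standard LSTM update equations in plain Python. It is not the network trained for this post: the weights are hand-picked, hypothetical numbers (a real network learns them), and the names `lstm_step`, `run_sequence` and `example_weights` are made up for this illustration.

```python
import math


def sigmoid(z):
    """Squash a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, weights):
    """Advance a single-unit LSTM cell by one time step.

    weights maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre_activation(name):
        w_x, w_h, b = weights[name]
        return w_x * x + w_h * h_prev + b

    forget = sigmoid(pre_activation("forget"))          # how much old memory to keep
    write = sigmoid(pre_activation("input"))            # how much new content to store
    expose = sigmoid(pre_activation("output"))          # how much memory to reveal
    candidate = math.tanh(pre_activation("candidate"))  # proposed new content

    c = forget * c_prev + write * candidate  # updated cell memory
    h = expose * math.tanh(c)                # output passed to the next step
    return h, c


def run_sequence(sequence, weights):
    """Feed a sequence through the cell one value at a time."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, weights)
        outputs.append(h)
    return outputs


# Hypothetical, hand-picked weights; a trained network would learn these.
example_weights = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.6, 0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}

outputs = run_sequence([0.2, 0.7, 0.1], example_weights)
print(outputs)
```

Each output depends on the whole history through `h` and `c`, which is the looping described above. A forget gate close to 0 wipes the memory, while one close to 1 carries it forward unchanged.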
So the task at hand is to feed the sequence illustrated by the blue line into our recurrent neural network and tell it to trace a line just like it. Well, after a few minutes of training, here are the points the network produces.
Notice how the red dots try to mimic the relative positions of the blue dots. Note that for every red dot it draws, the LSTM looks at the five previous blue dots in the sequence to determine where to place it. Now let's try to mimic the blue line:
Not bad considering the amount of data we have (not enough). The neural network learns, with acceptable accuracy, to trace a red line on top of the blue line. Where a data point is extreme, such as at point 200 on the x-axis, the network clearly forgets that it ever saw such a value, which is a good thing.
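The five-dot lookback described above amounts to sliding a window over the sequence: each training example is five consecutive report times, and the target is the report time that follows them. A minimal sketch of that preparation step, assuming the sequence is a plain Python list in the order the network reads it (the function name and sample values are hypothetical):

```python
def make_windows(sequence, window=5):
    """Pair every run of `window` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for end in range(window, len(sequence)):
        inputs.append(sequence[end - window:end])  # the previous `window` points
        targets.append(sequence[end])              # the point to predict
    return inputs, targets


# Hypothetical report times as hours of the day; not real @safiribot data.
report_hours = [7.5, 8.0, 8.25, 16.5, 17.0, 17.5, 18.0, 7.25]

window_inputs, window_targets = make_windows(report_hours, window=5)
for past, nxt in zip(window_inputs, window_targets):
    print(past, "->", nxt)
```

A sequence of N reports yields N - 5 examples, which is one reason a small dataset limits how closely the red line can follow the blue one.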
Still, there are some problems that need solutions.
1. The red line could be made more accurate with more data. Ideally, the red line hides the blue line completely.
2. Specific locations are needed to make this model useful. Concretely, we need to know how traffic jams develop along a road: do they start at area A before hitting areas B and C?
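To put a number on how well the red line covers the blue one (the first problem above), one common option is the root-mean-square error between the actual and predicted report times. The post does not say which metric was used, so treat this as an assumed choice; the sample values below are made up.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equally long lists of numbers."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute an error over empty lists")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values in hours of the day: blue = reported, red = predicted.
blue = [7.5, 8.0, 17.0, 17.5]
red = [7.0, 8.5, 17.0, 18.0]

error_hours = rmse(blue, red)
print(round(error_hours, 3))
```

An error of 0 would mean the red line hides the blue line completely. Because the error is in hours, it can be read directly as "the prediction is typically off by about this much". This simple version ignores the fact that the time of day wraps around at midnight.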
This model partially solves the three challenges we identified:
1. It ignores inconsistencies and extreme data points.
2. It learns sequences from noisy inputs. We can go ahead and predict what time the next traffic report will occur.
3. It brings us closer to a superhuman routing agent.
Have questions? My Twitter is @menace_