Leveraging your GPS data using geospatial analytics
The advent of the sharing economy has brought a sea change in the way urban residents commute. Uber, Lyft and many local players have made taxi rides convenient, affordable and safe. These services have emerged as a strong alternative to public transport, clocking millions of rides per month in some cities. The rise of hyper-local delivery models that optimize the supply chain has also led to a large number of daily trips by delivery vehicles.
These developments have mandated the installation of GPS devices, either standalone or smartphone app-based, to track and better regulate these rides and taxi fleets. Collectively, these GPS systems generate a huge amount of data, up to gigabytes per second. With automobile and technology experts predicting that self-driving cars will replace human-driven cars within a decade, the volume and velocity of GPS data are only set to increase. In this context, it becomes imperative to understand GPS data and the kind of insights that can be obtained by analyzing it.
A GPS receiver or GPS-enabled device can produce some or all of the data points below at a specified frequency (typically one record per second):
Coordinates: The latitude and longitude values are the primary data points provided by GPS devices. A latitude and longitude pair is sufficient to locate a point on the earth. For example, (51.5007° N, 0.1246° W) denotes Big Ben in London. As a quick refresher, latitude is the angle between a point and the equatorial plane, measured north or south, while longitude is the angle, measured east or west, between the meridian plane containing the point and the plane containing the prime meridian. A sequence of latitude and longitude values over time reveals the trail followed by the vehicle.
Direction: The geographic direction (heading) in which the vehicle is moving at that instant. North is taken as the reference (0°) and angles increase clockwise, so a direction of 45° means the vehicle is heading north-east, while 225° means it is heading south-west.
Speed: The instantaneous rate at which the vehicle is travelling.
Timestamp: Each record's timestamp can be broken down into year, month, day, hour, minute and second.
Additional data: GPS-enabled devices can also send additional information, such as whether a taxi is carrying a passenger or how much payload a truck is carrying. These fields become very powerful when combined with the coordinates and timestamp data.
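To make these data points concrete, here is a minimal Python sketch of one GPS record and a few basic operations on it: the great-circle (haversine) distance between two fixes, the length of a trail built from consecutive fixes, the mapping of a clockwise-from-north heading to a compass point, and the breakdown of a timestamp. The record layout (`GpsRecord` and its field names) and all helper names are hypothetical; real devices use their own formats and units. The haversine formula with a mean Earth radius of 6,371 km is a common approximation, not the only way to measure distance.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical-Earth approximation


@dataclass
class GpsRecord:
    """Hypothetical record layout; real devices use their own field names and units."""
    lat: float           # decimal degrees, north positive
    lon: float           # decimal degrees, east positive (so 0.1246° W is -0.1246)
    heading_deg: float   # direction of travel, clockwise from north (0°)
    speed_kmh: float     # instantaneous speed
    timestamp: datetime
    occupied: bool       # example of "additional data": taxi carrying a passenger


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trail_length_km(records):
    """Trail length: sum of distances between consecutive fixes in time order."""
    ordered = sorted(records, key=lambda r: r.timestamp)
    return sum(haversine_km(a.lat, a.lon, b.lat, b.lon)
               for a, b in zip(ordered, ordered[1:]))


COMPASS_POINTS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def heading_to_compass(heading_deg):
    """Map a heading (degrees clockwise from north) to the nearest of 8 compass points."""
    return COMPASS_POINTS[int((heading_deg % 360 + 22.5) // 45) % 8]


def time_parts(ts):
    """Break a timestamp into the calendar and clock fields used for grouping."""
    return {"year": ts.year, "month": ts.month, "day": ts.day,
            "hour": ts.hour, "minute": ts.minute, "second": ts.second}


if __name__ == "__main__":
    fix = GpsRecord(51.5007, -0.1246, 45.0, 12.0, datetime(2024, 5, 1, 8, 30, 15), True)
    print(heading_to_compass(fix.heading_deg))  # NE
    print(time_parts(fix.timestamp)["hour"])    # 8
```

Sorting by timestamp before summing matters: fixes often arrive out of order, and summing distances in arrival order would overstate the trail length.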
Since GPS data sets are, more often than not, huge, it makes sense to load them into a distributed file system such as HDFS and then process them using tools like Hive and Spark. The processed results can be visualized in tools like R Shiny, Tableau, D3.js and Excel. If the data is small and the goal is to prototype an analytics use case, Python can be used as well.
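As an illustration of such a small-data prototype, the sketch below combines the passenger-occupancy flag with coordinates and timestamps: it detects pickups (a taxi switching from vacant to occupied) and counts them per hour of day and per grid cell, a simple demand heat map. It uses only the Python standard library. The `Fix` record, the 0.01° grid size and all function names are hypothetical choices made for illustration, not a standard schema.

```python
import math
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class Fix(NamedTuple):
    """Hypothetical minimal GPS record for a taxi fleet."""
    taxi_id: str
    timestamp: datetime
    lat: float
    lon: float
    occupied: bool  # True while the taxi carries a passenger


def detect_pickups(fixes):
    """Yield the fixes at which a taxi switches from vacant to occupied.

    A taxi's first fix is never counted, because its previous state is unknown.
    """
    last_state = {}
    for fix in sorted(fixes, key=lambda f: (f.taxi_id, f.timestamp)):
        if last_state.get(fix.taxi_id) is False and fix.occupied:
            yield fix
        last_state[fix.taxi_id] = fix.occupied


def grid_cell(lat, lon, cell_deg=0.01):
    """Snap a coordinate to a square lat/lon grid cell (0.01° of latitude is about 1.1 km)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def pickups_by_hour_and_cell(fixes, cell_deg=0.01):
    """Count pickups per (hour of day, grid cell)."""
    counts = Counter()
    for p in detect_pickups(fixes):
        counts[(p.timestamp.hour, grid_cell(p.lat, p.lon, cell_deg))] += 1
    return counts


if __name__ == "__main__":
    fixes = [
        Fix("t1", datetime(2024, 5, 1, 8, 0, 0), 51.5007, -0.1246, False),
        Fix("t1", datetime(2024, 5, 1, 8, 0, 1), 51.5008, -0.1246, True),
        Fix("t1", datetime(2024, 5, 1, 8, 0, 2), 51.5009, -0.1246, True),
        Fix("t2", datetime(2024, 5, 1, 9, 0, 0), 51.5007, -0.1246, True),
    ]
    for (hour, cell), n in pickups_by_hour_and_cell(fixes).items():
        print(hour, cell, n)
```

This in-memory version sorts all fixes at once, so it only suits small samples. At fleet scale, the same logic maps naturally onto a LAG window function partitioned by taxi and ordered by time, followed by a GROUP BY on hour and cell, both of which are available in Hive and Spark SQL.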
Posted on 7wData.be.