Advent at Grakn Labs: Predictive Analytics

#GraknLovesTech: 15th December 2016

Source: Photo taken by Outreachhr.com under CC02

Here at Grakn Labs we love technology. So much so that, this month, we’ve decided to share our favourite technology moments from 2016. Each weekday during December, we will open a window on our virtual advent calendar, and peek inside to recall some of the greatest innovations and news that the past year has brought us.

Please recommend and share with hashtag #GraknLovesTech if you enjoy our posts. And if you have any favourite links you’d like us to share, just leave us a comment or tweet us @graknlabs!

So now it’s my turn to peek into our virtual advent calendar, and today’s techie topic is . . . Predictive Analytics. Great, analytics is quite a large field, so today let’s focus on a common goal of the field, which is making predictions based on past observations.

Can I predict the future?

Does this mean we can predict future lottery numbers based on past lottery numbers? Sadly no, but, if anyone wants to prove me wrong, I will require at least 3 successful live demonstrations before I am convinced.

I am not going to get into too many details in this article as the field is quite large and I am far from an expert. I am just going to touch on the general process used when trying to make predictions using historical data. Then I am going to poke my head into some cool tech within this field.

Step 1: Get the Data

Step 2: Analyse the Data

You should also ensure your data is of good quality. A reliable source alone does not ensure quality. What if you scraped your data from Wikipedia on the day someone thought it would be fun to vandalise the articles you were mining? Running your data through existing analysis pipelines can be quite informative and is a simple way of spotting questionable data. More formally, you can use confirmatory factor analysis to ensure your extracted data will at least fit your model. It is also recommended that you apply other statistical techniques to ensure your data can account for variance, false positives, and other issues which often crop up in real-world data.
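As a rough illustration of spotting questionable data, here is a minimal sketch that flags values sitting far from the rest of a series. It uses the modified z-score (based on the median absolute deviation), which is only one of many possible checks. The function name `flag_outliers`, the 3.5 threshold, and the sample numbers are all assumptions made up for this example.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single wild value does not distort the baseline.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # No spread around the median: flag anything that differs from it.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical daily edit counts for a scraped article (made-up numbers).
daily_edits = [12, 15, 11, 14, 13, 240, 12, 16]
suspicious = flag_outliers(daily_edits)
print(suspicious)
```

A flagged index is not proof of bad data, just a prompt to go and look at that day more closely before feeding it into a model.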

Step 3: Model the Data

As with data extraction, your models should undergo the same scrutiny. You should ensure that your models are valid representations of the issue you are trying to predict. Consulting with domain experts is often a good idea. Trying to predict inflation for the next few years? Well, you should probably speak to an economist as a first step when defining the model. When modelling ontologies at Grakn Labs, I cannot count the times having an expert on hand would have saved us from hours of deliberation.

Step 4: Predicting the Future

I would have stuck with the charts.

Your data is extracted, cleaned, quality checked, and fits your model. Time to start peering into your crystal ball and predicting the future. . . Oh wait, there are multiple crystal balls to use and you are not sure which one will work.

This is where the massive field of machine learning can come into play. There are a multitude of ways to start recognising patterns in your data and exploiting those patterns. Neural Networks, Linear Regression, Bayesian Networks, Deep Learning: all of these and many more can help you to start making predictions. Personally, I recommend Graph Based Analytics, but I may be a bit biased here.
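To make one of those crystal balls concrete, here is a minimal sketch of linear regression fitted by ordinary least squares, using nothing but plain Python. The helper names `fit_line` and `predict`, and the yearly figures, are invented for illustration. Real data will rarely sit on a perfect line like this.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of co-deviations between x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Past observations (made-up numbers that happen to rise by 2 each year).
years = [2012, 2013, 2014, 2015, 2016]
values = [10.0, 12.0, 14.0, 16.0, 18.0]

model = fit_line(years, values)
forecast = predict(model, 2017)
print(forecast)
```

The same pattern of fitting on past observations and then applying the fitted model to an unseen input carries over to the other techniques listed above, even though the maths inside each one differs.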

Luckily, data analytics is becoming so desirable these days that many of these tools are available as simple applications. This means that it is now much easier to start analysing your data without the need to understand how each crystal ball works.

If I can’t predict the lottery then why bother?

Predictive analytics also plays a big role in project management. Any large project can fail and estimating the likelihood of that failure is an important part of deciding if it should be attempted or not. Before the days of predictive analytics, we would rely on experience and instinct on these matters. Now we can formally measure the chances of success of a project using these techniques. Although anyone working in a startup has probably learned to ignore these risks.
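As a toy example of putting a number on that risk, the sketch below estimates a failure probability from the outcomes of past, comparable projects using Laplace's rule of succession. The function name `failure_probability` and the history are hypothetical, and a real assessment would take far more than a win/loss record into account.

```python
def failure_probability(past_outcomes):
    """Estimate the chance the next project fails.

    past_outcomes is a list of booleans, True meaning the project succeeded.
    Laplace's rule of succession, (failures + 1) / (n + 2), avoids claiming
    a 0% or 100% risk when the history is short.
    """
    failures = sum(1 for succeeded in past_outcomes if not succeeded)
    return (failures + 1) / (len(past_outcomes) + 2)


# Hypothetical record of eight earlier projects: six successes, two failures.
history = [True, True, False, True, False, True, True, True]
risk = failure_probability(history)
print(risk)
```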

The applications are countless, but one of the reasons I think it is worth pursuing this field is a more fundamental one. Warning: I am going to become a bit of an idealist here. Imagine if we could make predictions on a truly large and accurate scale. What mistakes could we have avoided? The subprime mortgage crisis in 2007? Terror attacks? The Ebola outbreak? World wars? I would like to think that these mistakes could have been avoided with the right tools in place.

Predicting Predictive Analytics

This provides us with more easily accessible tools which enable us to perform analytics. This also means that we are likely to become more accurate in our estimations as we consume more data in making these estimations.

What is interesting to see is that a process which was formerly reserved exclusively for Data Scientists is now becoming accessible to everyone. In fact, the open source community is also starting to produce analytics platforms such as these. Furthermore, we are even seeing open source APIs incorporate these features. Apache TinkerPop’s Graph Computer is a perfect example of something which has the potential to perform predictive analytics.

I believe that, in the future, we will continue to see an adoption of these technologies as demand continues to grow. I am particularly interested to see how we will make these features more accessible and easy to work with.

GRAKN.AI is an open-source knowledge graph data platform that brings knowledge ontologies and transactional data together to enable highly intelligent querying of data. Querying is performed through Graql, a declarative, knowledge-oriented graph query language for retrieving explicitly stored and implicitly derived information, and for performing graph analytics and automated reasoning. Grakn and Graql will help you effectively manage and harness large-scale graph data by allowing you to model it expressively, migrate it efficiently, and to draw insightful knowledge from your deep information network.

Find out more about Grakn Labs from our website!

