Big Data: What Tools Can We Use to Understand It?
At the World Economic Forum in Davos this year, the topic of note was Big Data. The digital revolution is quickly being heralded as the fourth industrial revolution, with the physical, digital and biological worlds merging through technology. Indeed, areas like robotics, artificial intelligence and the Internet of Things have seen remarkable progress.
But the fact that lies at the very core of this revolution is that billions of people are now connected by devices like cell phones, which possess immense processing power and storage capacity. That is what has made the Big Data boom possible beyond the wildest dreams of even the CERN data center, which collects roughly 30 petabytes of data per year from its Large Hadron Collider experiments.
In 2015, Facebook users sent an average of 31.25 million messages and viewed 2.77 million videos per minute. Within five years there will be over 50 billion connected smart devices in the world, all developed to collect, analyze and share data. Digital information doubles roughly every 18–24 months, and the volume of information grows at rates that demand ever more sophisticated tools to analyze and structure it.
That is where approaches like deep learning come into play. The premise of deep learning is to uncover rich, hierarchical models that represent probability distributions over the kinds of data encountered in artificial intelligence applications, such as symbols in natural language corpora. Organizations like Findo are taking part in research that will help create models capable of understanding and then generating text data. This “real understanding” is but one step toward the goal of creating artificial “human” cognition, and the overarching goals of Artificial Intelligence research.
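To make "probability distributions over symbols in natural language corpora" concrete, here is a deliberately tiny sketch: a flat bigram model estimated by counting. It is the shallowest possible version of the idea; deep learning stacks many layers of richer statistical structure on top of it. The corpus and words below are invented for illustration.

```python
from collections import Counter, defaultdict

def bigram_model(corpus):
    """Estimate P(next_word | word) from a toy corpus by counting pairs."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    # Normalize the raw counts into conditional probabilities.
    model = {}
    for prev, ctr in counts.items():
        total = sum(ctr.values())
        model[prev] = {w: c / total for w, c in ctr.items()}
    return model

corpus = [
    "deep learning models text",
    "deep learning models data",
]
model = bigram_model(corpus)
print(model["learning"])  # {'models': 1.0}
print(model["models"])    # {'text': 0.5, 'data': 0.5}
```

Even this trivial model assigns a probability distribution to the next symbol given the previous one; hierarchical models generalize the same principle across many layers of context.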
With regard to this, Findo is working on Natural Language Processing in personal clouds of files, emails, notes and contacts. It is also active in the product field of personal assistants and bots. Most existing assistants are rule-based, but their existence indicates strong demand for such solutions. Findo is therefore working to solve the problem of creating a smart personal assistant that helps individual customers search their personal clouds. It focuses on analyzing text data: email, files, and notes distributed across devices, cloud storage services, tags, and folders.
The hard part is not building the assistant itself but developing an unsupervised learning network that can be trained in different languages. One way around this is to study the pattern of an ideal search, in which information is found not by exact keywords but by description. This is referred to as "smart search". Findo is also working on a system that can recognize patterns in personal data and organize it into folders. The idea is to give users a "smart search", or "knowledge discovery", experience.
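As a rough illustration of ranking by description rather than exact keyword match, the sketch below scores documents against a query using cosine similarity over word counts. A real "smart search" system would use learned embeddings so that, say, "bill" could match "invoice"; this toy version only captures overlapping vocabulary, and the query and documents are invented examples, not Findo's actual pipeline.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a text into a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def smart_search(query, documents):
    """Return the document most similar to the query description."""
    q = vectorize(query)
    return max(documents, key=lambda d: cosine(q, vectorize(d)))

docs = [
    "invoice from the accounting department for March",
    "photos from the company retreat last summer",
    "draft of the quarterly sales report",
]
print(smart_search("the bill accounting sent in March", docs))
# → 'invoice from the accounting department for March'
```

The query never uses the word "invoice", yet the overlapping descriptive terms are enough to rank the right document first; embedding-based systems push this further by matching on meaning instead of shared words.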
Deep statistical models that contain many layers of latent variables and millions of parameters can be learned efficiently. They can expose the high-level feature representations they learn, along with hidden links between pieces of data. In particular, they allow data links to be regenerated dynamically when the data changes quickly: when you meet a new person, are hired by a new company or start a new project. Rigid, hand-written rules cannot keep up with such changes.
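As a minimal, hypothetical sketch of what "layers of latent variables" means (vastly smaller than the deep models described here), the following numpy autoencoder learns a 2-unit latent representation of 4-dimensional data by gradient descent on reconstruction error. The synthetic data and dimensions are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4-dimensional points that actually lie on a 2-D structure.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 4))

# One hidden layer of latent variables (2 units), learned from scratch.
W1 = rng.normal(scale=0.1, size=(4, 2))  # encoder weights
W2 = rng.normal(scale=0.1, size=(2, 4))  # decoder weights

def loss(X, W1, W2):
    H = np.tanh(X @ W1)  # latent representation
    X_hat = H @ W2       # reconstruction from the latents
    return np.mean((X - X_hat) ** 2)

lr = 0.05
initial = loss(X, W1, W2)
for _ in range(500):
    H = np.tanh(X @ W1)
    X_hat = H @ W2
    err = X_hat - X
    gW2 = H.T @ err / len(X)
    gH = err @ W2.T * (1 - H ** 2)  # backpropagate through tanh
    gW1 = X.T @ gH / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2

# Reconstruction error drops as the latent layer learns the structure.
print(initial, loss(X, W1, W2))
```

The hidden layer ends up encoding the 2-D structure hidden in the 4-D data; deep models stack many such layers, which is what lets them uncover representations and links that no fixed rule set anticipates.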