Data Mining: Understanding those billions of 0s and 1s

Every day, several exabytes of data are generated and stored in data centres. Although most of this data is unstructured, it must be processed so that its content can be analysed and interpreted.

Data mining's goal is to go through these vast amounts of data and extract useful information by summarizing it, clustering it, or identifying anomalies and rules. Automatic summarization selects the subset of existing data that best represents the whole set. Anomaly detection, on the other hand, finds abnormal objects that do not fit the usual model or deviate from it; it is often used in fraud detection, medical informatics, and sensor event detection. Data clustering consists of forming groups whose elements are related in some way; this approach is used in fields such as pattern recognition, data compression, and bioinformatics.
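To make the clustering idea concrete, here is a minimal k-means sketch in pure Python. It is purely illustrative (the data points, the fixed iteration count, and the naive "first k points" initialization are all assumptions of this sketch; production code would typically use a library implementation such as scikit-learn's `KMeans`, with smarter initialization like k-means++):

```python
def kmeans(points, k, iters=20):
    """Toy k-means: group 2-D points into k clusters."""
    # Naive initialization: take the first k points as centroids.
    # (Real implementations use smarter schemes such as k-means++.)
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # 1. Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # 2. Update step: move each centroid to its cluster's mean.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if the cluster is empty
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters

# Two visually obvious groups: one near (1, 1) and one near (8.5, 8.5).
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
```

After a few iterations, the two centroids settle near the two natural groups, with three points in each cluster.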

One of the first and most famous examples of data mining in practice is "market basket analysis", which consists of finding relations between the items in people's shopping baskets using association rules and affinity analysis. The objective is to extract frequent combinations of products and use them for marketing purposes to boost the company's sales. This approach is also often applied on the web, to find correlations between users and fill in their profiles, and in medicine, to help doctors derive diagnostic rules.
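A tiny sketch of this idea: find the itemsets that appear in at least half of a handful of toy baskets, then score a rule by its confidence. The transactions and thresholds below are invented for illustration, and the brute-force candidate enumeration stands in for a real algorithm such as Apriori, which additionally prunes candidates whose subsets are infrequent (libraries like mlxtend provide full implementations):

```python
from itertools import combinations

# Invented toy baskets, just for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def frequent_itemsets(transactions, min_support=0.5, max_size=2):
    """Return {itemset: support} for itemsets seen in >= min_support of baskets."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    result = {}
    for size in range(1, max_size + 1):
        # Brute force over candidates; real Apriori prunes any candidate
        # that has an infrequent subset.
        for combo in combinations(items, size):
            support = sum(set(combo) <= t for t in transactions) / n
            if support >= min_support:
                result[combo] = support
    return result

def confidence(antecedent, consequent, freq):
    """conf(A -> B) = support(A and B) / support(A)."""
    return freq[tuple(sorted(antecedent | consequent))] / freq[tuple(sorted(antecedent))]

freq = frequent_itemsets(transactions)
```

Here `{"bread", "milk"}` appears in 2 of 4 baskets (support 0.5) while `{"bread"}` appears in 3 of 4, so the rule "bread → milk" has confidence 0.5 / 0.75 ≈ 0.67: two thirds of bread buyers also bought milk.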

Bioinformatics is another field that actively drives data mining research. Since it aims to solve biological problems by analysing large data sets, data mining methods can be useful for cancer classification, protein structure prediction, or the analysis of gene expression.

Data mining is thus a must-have if you hold large amounts of data: it can help you uncover hidden aspects of your data and assist you in reaching your goals.

Written by: Daniel Antunes, AI Engineer