What’s the relationship between machine learning and data mining?

This is not an easy question, because there is no common agreement on what “Data Mining” means. That said, I disagree with the answer from Wikipedia that Yuvraj Singla points to. I don’t think it is accurate at all to say that machine learning focuses on prediction, although I mostly agree with the definition of Data Mining as focusing on the discovery of properties of the data.

So, let’s start with that: Data Mining is a cross-disciplinary field that focuses on discovering properties of data sets. (Forget about it being the analysis step of “knowledge discovery in databases”, or KDD. That may have been true years ago, but it is not anymore.)

There are different approaches to discovering properties of data sets. Machine Learning is one of them. Another is simply looking at the data sets using visualization techniques or Topological Data Analysis.

On the other hand, Machine Learning is a sub-field of data science that focuses on designing algorithms that can learn from and make predictions on data. Machine learning includes both Supervised Learning and Unsupervised Learning methods. Unsupervised methods start from unlabeled data sets, so, in a way, they are directly concerned with uncovering unknown properties in them (e.g. clusters or rules).
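To make the unsupervised case concrete, here is a minimal sketch of one such method, k-means clustering (Lloyd's algorithm), chosen purely as an illustration. The function name `kmeans` and the toy points are hypothetical. No labels are given; the two groups are a property the algorithm discovers from the data alone.

```python
import random
from math import dist


def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) on a list of 2-D tuples.

    Hypothetical illustration: no labels are used, so any cluster
    structure it finds is inferred from the data itself.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable: converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled data containing two well-separated groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, clusters = kmeans(data, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The choice of k-means here is only for brevity; association-rule mining would be an equally valid example of the "rules" the paragraph mentions.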

It is clear then that machine learning can be used for data mining. However, data mining can use other techniques besides or on top of machine learning.

By the way, to make things even more complicated, we now have a new term, Data Science, that is competing for attention, especially with Data Mining and KDD. Even the SIGKDD group at ACM is slowly moving towards using Data Science. On their website, they now describe themselves as “The community for data mining, data science and analytics” [1]. My bet is that KDD will disappear as a term pretty soon and that data mining will simply merge into data science.

[1] About SIGKDD


Originally published at www.quora.com.