An Insight into Data Mining Algorithms
One of the most instructive lessons is that simple ideas often work very well, and I strongly recommend adopting a simplicity-first methodology when analyzing practical datasets.
There are many different kinds of simple structure that datasets can exhibit.
In one dataset, there might be a single attribute that does all the work, while the others are irrelevant or redundant.
Inferring rudimentary rules
In any event, it is always a good plan to try the simplest things first.
The idea is this:
we make rules that test a single attribute and branch accordingly.
Each branch corresponds to a different value of the attribute.
The best classification to give each branch is obvious: use the class that occurs most often in the training data.
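The scheme just described, known as 1R, can be sketched in a few lines. The following is a minimal illustration, not a faithful reproduction of any particular implementation; the attribute indices and toy weather-style data are assumptions for the example:

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, class_index):
    """For each attribute, build rules mapping each of its values to the
    majority class on that branch; return the attribute whose rule set
    makes the fewest errors on the training data."""
    best = None
    for a in attributes:
        # Tally class frequencies under each value of attribute a.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[a]][row[class_index]] += 1
        # Majority class per branch; errors = rows outside the majority.
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[0]:
            best = (errors, a, rules)
    return best  # (error count, chosen attribute, value -> class rules)

# Toy data, purely illustrative: (outlook, windy, play)
data = [
    ("sunny", "false", "no"), ("sunny", "true", "no"),
    ("overcast", "false", "yes"), ("rainy", "false", "yes"),
    ("rainy", "true", "no"),
]
errors, attr, rules = one_r(data, attributes=[0, 1], class_index=2)
```

On this toy data, the outlook attribute (index 0) is selected with one training error: its sunny and overcast branches classify perfectly, and only the mixed rainy branch is imperfect.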
Missing values and numeric attributes
Although 1R is a very rudimentary learning method, it accommodates both missing values and numeric attributes, dealing with them in simple but effective ways.
Missing is treated as just another attribute value.
For example, if the weather data had contained missing values for the outlook attribute, a rule set formed on outlook would specify four possible class values: one each for sunny, overcast, and rainy, and a fourth for missing.
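Under this convention, preprocessing reduces to mapping absent entries onto an explicit sentinel value before rule construction. A minimal sketch, assuming `None` marks a missing entry (the sentinel name is an assumption for the example):

```python
def normalize_missing(value, sentinel="missing"):
    """Map absent entries (here, None) to an explicit 'missing' value,
    so that rule construction treats missing as just another branch."""
    return sentinel if value is None else value

# Outlook column with two absent entries.
outlooks = ["sunny", None, "overcast", "rainy", None]
branches = sorted({normalize_missing(v) for v in outlooks})
# Four distinct branches result: missing, overcast, rainy, sunny.
```

No other change to the rule-building code is needed: the sentinel flows through frequency counting exactly like any observed value.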
The 1R method uses a single attribute as the basis for its decisions and chooses the one that works best.
Another simple technique is to use all attributes and allow them to make contributions to the decision that are equally important and independent of one another.
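This is the idea behind the naive Bayes method: each attribute contributes an independent factor, weighted equally, to a per-class score. A minimal sketch under those assumptions, using simple frequency ratios without the smoothing a practical implementation would add (the toy data and function names are assumptions for the example):

```python
from collections import Counter, defaultdict

def train_counts(rows, class_index):
    """Tally class frequencies, and value frequencies per (attribute, class)."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)
    for row in rows:
        cls = row[class_index]
        class_counts[cls] += 1
        for a in range(len(row) - 1):
            value_counts[(a, cls)][row[a]] += 1
    return class_counts, value_counts

def predict(instance, class_counts, value_counts):
    """Every attribute contributes an equally important, independent factor:
    the fraction of training rows of that class sharing its value."""
    total = sum(class_counts.values())
    best_cls, best_score = None, -1.0
    for cls, n in class_counts.items():
        score = n / total  # prior: class frequency
        for a, v in enumerate(instance):
            score *= value_counts[(a, cls)][v] / n
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# Toy data, purely illustrative: (outlook, windy, play)
data = [
    ("sunny", "false", "no"), ("sunny", "true", "no"),
    ("overcast", "false", "yes"), ("rainy", "false", "yes"),
    ("rainy", "true", "no"),
]
cc, vc = train_counts(data, class_index=2)
```

Note the contrast with 1R: instead of betting everything on one attribute, every attribute gets an equal multiplicative say in the final decision.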
Constructing decision trees
Decision tree algorithms are based on a divide-and-conquer approach to the classification problem.
They work from the top down, seeking at each stage an attribute to split on that best separates the classes, and then recursively processing the sub-problems that result from the split.
This strategy generates a decision tree, which can, if necessary, be converted into a set of classification rules, although the conversion is not trivial if it is to produce effective rules.
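The divide-and-conquer strategy can be sketched as a short recursion. This is an illustrative sketch only, assuming nominal attributes and a simple error-count split criterion rather than the information-gain measure real tree learners use; the data and function names are assumptions for the example:

```python
from collections import Counter

def split_errors(rows, attribute, class_index):
    """Errors made by majority-class rules on each branch of a split.
    (A stand-in purity measure; real learners use information gain.)"""
    counts = {}
    for row in rows:
        counts.setdefault(row[attribute], Counter())[row[class_index]] += 1
    return sum(sum(c.values()) - max(c.values()) for c in counts.values())

def build_tree(rows, attributes, class_index):
    """Top-down divide and conquer: choose a splitting attribute,
    partition the data on its values, and recurse on each partition."""
    classes = [row[class_index] for row in rows]
    if len(set(classes)) == 1 or not attributes:
        return Counter(classes).most_common(1)[0][0]  # leaf: majority class
    best = min(attributes, key=lambda a: split_errors(rows, a, class_index))
    tree = {"attribute": best, "branches": {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        tree["branches"][value] = build_tree(subset, remaining, class_index)
    return tree

def classify(tree, instance):
    """Follow branches matching the instance's values down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["branches"][instance[tree["attribute"]]]
    return tree

# Toy data, purely illustrative: (outlook, windy, play)
data = [
    ("sunny", "false", "no"), ("sunny", "true", "no"),
    ("overcast", "false", "yes"), ("rainy", "false", "yes"),
    ("rainy", "true", "no"),
]
tree = build_tree(data, attributes=[0, 1], class_index=2)
```

On this data the recursion splits first on outlook, and only the mixed rainy branch needs a further split on windy, which shows how each recursive call faces a strictly smaller sub-problem.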