AI: The Human Way of Data Mining
Have you ever wondered what's going on in the mind of a kid playing with a shape-sorting cube?
In the first step, the kid instantly becomes curious and tries to figure out a way to play with the objects.
In the second step, he/she tries to match attributes such as color and shape. The kid iterates through steps one and two several times until he/she reaches some kind of conclusion.
The third step is a natural preparation for analysis: working out which object fits into which hole by trying to fit them.
The fourth step is to build a pattern and figure out a shape that fits into one of the holes. While iterating between the third and fourth steps, the kid resorts to analytical techniques such as sampling, including, and excluding.
The fifth step is to evaluate the model the kid has built, checking whether it works, until he/she finally fits all the shapes into the right holes.
Wait! What!? How did the kid learn the data mining approach?
That's the natural human learning process. The five steps above can be related to the process diagram below, which shows the relationship between the different phases of CRISP-DM (Cross-Industry Standard Process for Data Mining).
The cross-industry standard process for data mining, known as CRISP-DM, is an open standard process model that describes common approaches used by data mining experts. It is the most widely used analytics model, and it breaks the process of data mining into six major phases.
- 1: Business Understanding. Focuses on understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition and a preliminary plan.
- 2: Data Understanding. Starts with an initial data collection and proceeds with activities to get familiar with the data, to identify data quality problems, to discover first insights into the data, or to detect interesting subsets to form hypotheses for hidden information.
- 3: Data Preparation. Covers all activities to construct the final dataset from the initial raw data.
- 4: Modeling. Modeling techniques are selected and applied. Since some techniques, like neural nets, have specific requirements regarding the form of the data, there can be a loop back here to data preparation.
- 5: Evaluation. Once one or more models have been built that appear to have high quality based on whichever loss functions have been selected, they need to be tested to ensure they generalize to unseen data and that all key business issues have been sufficiently considered. The end result is the selection of the champion model(s).
- 6: Deployment. Generally this means deploying a code representation of the model into an operational system to score or categorize new unseen data as it arises, and creating a mechanism for using that new information to solve the original business problem. Importantly, the code representation must also include all the data preparation steps leading up to modeling, so that the model treats new raw data in the same manner as during model development.
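To make the last few phases a little more concrete, here is a minimal sketch in plain Python, staying with the shape-sorting theme. Everything in it is an assumption made for illustration: the class `PreparedCentroidModel`, the helpers `make_shapes` and `run_demo`, the toy features (number of corners, size in mm), and the choice of a nearest-centroid classifier are all hypothetical and not part of CRISP-DM itself. What it tries to show is that the preparation step (scaling) lives inside the model object, the model is checked on held-out shapes it has never seen, and new raw data gets exactly the same treatment at scoring time.

```python
import random
import statistics


class PreparedCentroidModel:
    """Hypothetical model that carries its own data preparation step."""

    def fit(self, rows, labels):
        # Data preparation: learn per-feature mean and spread from the training rows.
        columns = list(zip(*rows))
        self.means = [statistics.fmean(col) for col in columns]
        self.stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
        # Modeling: one centroid per class, computed on the prepared rows.
        prepared = [self._prepare(row) for row in rows]
        self.centroids = {}
        for label in sorted(set(labels)):
            members = [p for p, l in zip(prepared, labels) if l == label]
            self.centroids[label] = [statistics.fmean(col) for col in zip(*members)]
        return self

    def _prepare(self, row):
        # The same scaling is applied during training and when scoring new data.
        return [(x - m) / s for x, m, s in zip(row, self.means, self.stdevs)]

    def predict(self, row):
        prepared = self._prepare(row)

        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(prepared, self.centroids[label]))

        # Pick the class whose centroid is closest to the prepared row.
        return min(self.centroids, key=squared_distance)


def make_shapes(rng, label, corners, count):
    # Toy rows: [number of corners, size in mm]; the size is random noise.
    return [([corners, rng.uniform(30.0, 50.0)], label) for _ in range(count)]


def run_demo(seed=0):
    rng = random.Random(seed)
    cubes = make_shapes(rng, "cube", 8, 40)
    balls = make_shapes(rng, "ball", 0, 40)
    # Evaluation: hold out the last 10 shapes of each kind as unseen data.
    train = cubes[:30] + balls[:30]
    test = cubes[30:] + balls[30:]
    model = PreparedCentroidModel().fit([r for r, _ in train], [l for _, l in train])
    hits = sum(model.predict(row) == label for row, label in test)
    return model, hits / len(test)


if __name__ == "__main__":
    model, accuracy = run_demo()
    print(f"held-out accuracy: {accuracy:.2f}")
    # Deployment: a new raw row is scored without any manual scaling.
    print(model.predict([8, 42.0]))
```

Because the scaling parameters are stored on the fitted object, whoever deploys it only has to hand over raw rows. In a real project a library pipeline would likely play this role; the hand-rolled class here is just to keep the sketch self-contained.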
(I'm sure you scrolled up to cross-check the steps the kid went through in the shape-sorting cube game, to see if they match the phases listed above ;)
This data mining process plays out in every learning experience a human goes through, at every stage of life.
To learn more about each phase of CRISP-DM conceptually, visit the page below.
To learn how to implement CRISP-DM, please visit the pages below.