Getting Started With Weka
Weka is a widely used tool for running data mining tasks on a dataset and for learning how various data mining techniques work. It is especially handy for research because its graphical interface requires no coding, so experiments can be set up quickly. Most common data mining tasks take only a few clicks. If you are new to the tool and want to explore it further, you are in the right place. Let’s discuss what Weka is and how you can work with datasets in it.
What is Weka?
Weka is a tool developed at the University of Waikato in Hamilton, New Zealand. It is a collection of machine learning algorithms designed for data mining tasks. The algorithms can be applied to a dataset directly or called from your own Java code. Weka includes tools for data preprocessing, classification, regression, clustering, association rule mining, and visualization.
Getting Started With Weka
To start exploring Weka, you first have to install the tool for your OS platform. After installing, launch the software and you will see the GUI Chooser application as depicted below:
The GUI Chooser lets us launch the five applications listed below:
· Explorer
· Experimenter
· KnowledgeFlow
· Workbench
· Simple CLI
We will be using the Explorer for our data mining task. Clicking on Explorer launches the interface shown below:
As highlighted above, Weka Explorer offers six kinds of operations. They are as follows:
· Preprocess: Prepares your data, for example by replacing missing values.
· Classify: Assigns your data to classes using classifiers such as decision trees, naïve Bayes, etc.
· Cluster: Groups your data into clusters using algorithms such as SimpleKMeans, HierarchicalClusterer, etc.
· Associate: Finds association rules in your data using algorithms such as Apriori, FPGrowth, etc.
· Select attributes: Performs feature selection using evaluators such as ClassifierSubsetEval, PrincipalComponents, etc.
· Visualize: Lets you explore the data through graphs.
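To make the Preprocess example a little more concrete, here is a minimal sketch of one way to replace missing values in a nominal column: fill each gap with the most frequent value (the mode). Weka's ReplaceMissingValues filter uses modes for nominal attributes and means for numeric ones, but the code below is not Weka's implementation. The class name ModeImputer, its methods, and the sample column are made up for illustration, and the "?" marker follows the ARFF convention for a missing value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Weka: mode-based filling for one nominal column.
class ModeImputer {

    // ARFF-style marker for a missing value.
    static final String MISSING = "?";

    /** Most frequent non-missing value in the column, or null if every value is missing. */
    static String mode(String[] column) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String value : column) {
            if (MISSING.equals(value)) {
                continue; // missing cells do not vote
            }
            int count = counts.merge(value, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = value;
            }
        }
        return best;
    }

    /** Returns a copy of the column with each missing value replaced by the mode. */
    static String[] fillMissing(String[] column) {
        String fill = mode(column);
        String[] filled = column.clone();
        if (fill == null) {
            return filled; // nothing to fill with
        }
        for (int i = 0; i < filled.length; i++) {
            if (MISSING.equals(filled[i])) {
                filled[i] = fill;
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        // Toy column: two "y" votes, one "n" vote, two missing values.
        String[] votes = {"y", "?", "n", "y", "?"};
        System.out.println(String.join(",", fillMissing(votes)));
    }
}
```

In the toy column, "y" is the most frequent value, so both "?" entries become "y". In the Explorer the same kind of result is obtained by choosing a filter on the Preprocess tab rather than by writing code.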
Now let’s see how you can work with datasets in Weka.
Working With Datasets on Weka
To begin, we will load a dataset into Weka Explorer. The Explorer offers several ways to do this, which are highlighted in the diagram below:
As shown in the above figure, there are seven operations in total that we can perform on datasets. Four of them fetch a dataset and the other three deal with editing it. The operations are as follows:
· Open file: To get the dataset from your local storage.
· Open URL: To get the dataset from the specified URL.
· Open DB: To get the dataset from a database.
· Generate: To create an artificial dataset using one of Weka’s data generators.
· Undo: To undo the most recent change.
· Edit: To edit the dataset, e.g., change individual values.
· Save: To save the changed dataset.
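Open file reads Weka's native ARFF format, among others. For readers who have not seen an ARFF file, the sketch below shows how a small CSV table maps onto one: a column becomes a numeric attribute if all its values look like numbers, and a nominal attribute otherwise, with empty cells written as "?". This is a simplified illustration under stated assumptions (no quoted fields, no commas or spaces inside values), not Weka's own converter. The class name CsvToArff and the toy weather-style rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Weka: naive CSV-to-ARFF conversion.
class CsvToArff {

    /** True when the text is a plain integer or decimal such as 85 or -3.5. */
    static boolean isNumeric(String value) {
        return value.matches("-?\\d+(\\.\\d+)?");
    }

    /** Returns the trimmed cell, or "?" (ARFF missing value) when absent or empty. */
    static String cell(String[] row, int column) {
        String value = column < row.length ? row[column].trim() : "";
        return value.isEmpty() ? "?" : value;
    }

    /** Converts CSV text whose first line is a header into ARFF text. */
    static String convert(String relation, String csv) {
        String[] lines = csv.trim().split("\\R");
        String[] header = lines[0].split(",");
        List<String[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            rows.add(lines[i].split(",", -1)); // -1 keeps trailing empty cells
        }

        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");

        // One @attribute line per column, typed from the observed values.
        for (int c = 0; c < header.length; c++) {
            Set<String> values = new LinkedHashSet<>();
            boolean numeric = true;
            for (String[] row : rows) {
                String value = cell(row, c);
                if (value.equals("?")) {
                    continue; // missing cells do not affect the type
                }
                values.add(value);
                if (!isNumeric(value)) {
                    numeric = false;
                }
            }
            String type = numeric ? "numeric" : "{" + String.join(",", values) + "}";
            out.append("@attribute ").append(header[c].trim())
               .append(' ').append(type).append('\n');
        }

        // The data section repeats the rows, with "?" for missing cells.
        out.append("\n@data\n");
        for (String[] row : rows) {
            List<String> cells = new ArrayList<>();
            for (int c = 0; c < header.length; c++) {
                cells.add(cell(row, c));
            }
            out.append(String.join(",", cells)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy rows made up for this example.
        String csv = "outlook,temperature,play\n"
                   + "sunny,85,no\n"
                   + "overcast,,yes\n"
                   + "rainy,70,yes\n";
        System.out.print(convert("weather", csv));
    }
}
```

For the toy input, outlook and play come out as nominal attributes, temperature as numeric, and the empty temperature cell in the second row is written as "?". Real data with quoted or spaced values would need proper CSV parsing and ARFF quoting, which this sketch leaves out.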
Weka accepts data files in arff, arff.gz, bsi, csv, dat, data, json, and many other formats. If your data is stored in some other format, you can convert it into one of these. Weka also includes several sample datasets such as weather, vote, diabetes, and so on. As an example, we will load the Vote dataset into Weka. Follow the steps below to load the dataset into Explorer:
First of all, click on the “Open file” option described above. A file chooser will open so that you can select your dataset. Browse to the location where the dataset is kept and select it. If you want one of the datasets bundled with Weka, you can find them under C:->program files->weka->data. The folder provides the following datasets:
You can select any of the above datasets to perform data mining tasks. I am loading the Vote dataset. It opens the screen shown below, where the highlighted part gives information about the attributes of the dataset.
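The attribute information shown on that screen comes from the dataset file itself. As a rough sketch of the idea, the code below reads ARFF text, collects the attribute names from the header, counts the instances, and counts how many "?" (missing) values each attribute has. It assumes a simple dense ARFF file with unquoted attribute names. The class name ArffSummary and the sample text are hypothetical: the sample is a made-up file with vote-style columns, not the actual vote.arff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: a tiny summary of ARFF text.
class ArffSummary {
    final List<String> attributeNames = new ArrayList<>();
    final List<Integer> missingCounts = new ArrayList<>();
    int instanceCount = 0;

    /** Parses dense ARFF text with unquoted attribute names. */
    static ArffSummary parse(String arff) {
        ArffSummary summary = new ArffSummary();
        boolean inData = false;
        for (String raw : arff.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            String lower = line.toLowerCase();
            if (!inData) {
                if (lower.startsWith("@attribute")) {
                    // "@attribute <name> <type>": keep only the name
                    String[] parts = line.split("\\s+", 3);
                    summary.attributeNames.add(parts[1]);
                    summary.missingCounts.add(0);
                } else if (lower.startsWith("@data")) {
                    inData = true;
                }
            } else {
                // Every line after @data is one instance.
                String[] cells = line.split(",", -1);
                summary.instanceCount++;
                int limit = Math.min(cells.length, summary.attributeNames.size());
                for (int i = 0; i < limit; i++) {
                    if (cells[i].trim().equals("?")) {
                        summary.missingCounts.set(i, summary.missingCounts.get(i) + 1);
                    }
                }
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        // Made-up sample in the style of voting records.
        String arff = "% toy data\n"
                    + "@relation toy-votes\n"
                    + "@attribute bill-a {n,y}\n"
                    + "@attribute bill-b {n,y}\n"
                    + "@attribute party {democrat,republican}\n"
                    + "@data\n"
                    + "y,n,democrat\n"
                    + "?,y,republican\n"
                    + "n,?,democrat\n"
                    + "?,y,republican\n";
        ArffSummary summary = parse(arff);
        System.out.println("Instances: " + summary.instanceCount);
        for (int i = 0; i < summary.attributeNames.size(); i++) {
            System.out.println(summary.attributeNames.get(i)
                    + " missing: " + summary.missingCounts.get(i));
        }
    }
}
```

On the toy sample this reports four instances, with two missing values for bill-a, one for bill-b, and none for party. The Explorer presents the same kind of per-attribute summary without any code.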
Conclusion
Weka is a useful tool for experimenting with and learning about different data mining tasks. We have successfully loaded the Vote dataset into Weka, and in the next article I will deal with the missing values in this dataset. Stay tuned for updates!