#4 Data Science : Introduction to Orange tool 🍊

SMIT KHOKHARIYA
5 min read · Aug 29, 2021

This blog is about Orange, a tool for data mining. With Orange we can do visual programming, data visualization, data exploration, data mining, and more. Orange is free and open-source, and it is easy to install on any OS.

Orange is an open-source data visualization, machine learning, and data mining toolkit. It features a visual programming front-end for explorative data analysis and interactive data visualization, and can also be used as a Python library.

To download Orange, click here.

Overview:

Home Screen of Orange

Here is the blank canvas of Orange, where you will do all your data exploration. On the left-hand side there are 5 sections, and each section contains different widgets that we will use later for data exploration.

Check out the widget catalog of Orange here.

Dataset:

Orange can load data from several common file formats. Some sample datasets also ship with the tool. We will use the Iris dataset for all the explanations.

To load a dataset, use the File widget, which is available in the Data section. Click on the File widget and it will appear on the canvas. Then double-click the widget and you will get a window like this.

File Widget Properties

As you can see, by default it loads the iris.tab dataset. This window shows information about the dataset as well as about its columns. That is how data is represented in the File widget. Other datasets such as titanic, housing, and heart disease are also available, and you can explore them on your own.

Workflow:

Orange comes with many widgets. We build a workflow by connecting those widgets to each other in a meaningful order. We can also start from one of the built-in example workflows that come with the tool and adapt it to our own task.

Check out the list of workflows that come with the tool here.

Classification Tree workflow

As you can see, the bottom left corner has a workflow examples option. It offers many example workflows to choose from.

Here I loaded the Classification Tree workflow.

After loading the workflow, you can see many widgets connected to each other. We can also modify the widgets to suit our needs.

The Classification Tree workflow is used to explore how data is classified with decision tree methods. Let's see the classification tree for the Iris dataset.

By clicking on Tree Viewer, you get this type of decision tree for your dataset.

Finally, at the end of the workflow there are Scatter Plot and Box Plot widgets connected to the Tree Viewer. First select data in the Tree Viewer, and then you can visualize that selection in the Scatter Plot and Box Plot widgets.
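To give a feel for what each node of such a tree represents, here is a minimal sketch of how a decision tree can choose a threshold on one numeric feature by minimizing Gini impurity. This is a generic textbook illustration in plain Python, not Orange's own implementation; the measurements below are hypothetical values and the function names are made up for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= t' split."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # start from the impurity of the unsplit node
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical petal lengths in cm for two species.
petal_length = [1.4, 1.3, 1.5, 1.7, 4.7, 4.5, 4.9, 5.1]
species = ["setosa"] * 4 + ["versicolor"] * 4

threshold, impurity = best_threshold(petal_length, species)
print("split at petal length <=", threshold, "weighted impurity:", impurity)
```

A full tree repeats this search over every feature and then recurses into the two resulting groups, which is what produces the branching structure shown in the Tree Viewer.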

Data Exploration

With Orange we can easily visualize the dataset and get insights from the data.

I used the workflow below (created by me) to explain.

ML Model Workflow

For the exploration part, I used the following widgets:

(1) Data Table: Shows the dataset in tabular form. Here the iris column is shaded darker, which means it is the target variable.

Data Table Widget

(2) Distributions: Shows how the values of a variable are distributed.

(3) Scatter Plot: Visualizes the data as a scatter plot.

(4) Bar Plot: Represents the data as bars. It is a very simple and basic plot.

Bar chart Widget

(5) Linear Projection: This widget lets you view more than two features at once. It projects higher-dimensional data onto a two-dimensional plane using a linear projection.

(6) Violin Plot: A violin plot is a method of plotting numeric data. It is similar to a box plot, with the addition of a rotated kernel density plot on each side.

Violin plot Widget
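The "kernel density plot" mentioned for the violin plot can be sketched in a few lines. The example below is a minimal illustration in plain Python, assuming a Gaussian kernel with a hand-picked bandwidth; the sample values and the `gaussian_kde` helper are hypothetical, and Orange's Violin Plot widget may compute its curve differently.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centred on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical sepal widths in cm.
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 2.9]
density = gaussian_kde(sepal_width, bandwidth=0.2)

# A violin plot draws this density mirrored on both sides of a central axis.
# Here one side is printed as a crude text chart.
for i in range(8):
    x = 2.5 + 0.25 * i
    print(f"{x:.2f} " + "#" * round(density(x) * 20))
```

The wider the bar at a given value, the more data points lie near it, which is the extra information a violin plot shows compared with a box plot.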

After exploring the data visually, I added the Random Forest widget, which implements the random forest machine learning algorithm. Random Forest is connected to the Scatter Plot, so first we select data points in the scatter plot, and the random forest algorithm then classifies that selection into categories.

To test the model, I used the Test & Score widget. Test & Score is connected to two widgets: the Random Forest widget and a File widget. I used a second File widget here because the first one already had five connections.

Conclusion:

I hope you learned something from this blog. Explore more about Orange here. You can also refer to other materials on the internet to learn more.