Alirezakarimi
3 min read · Sep 28, 2020

A Useful Wrench for Data Scientists

Computer science and programming are growing rapidly, and data science has become one of the most important fields within them.

More importantly, Python is the key language for learning and practicing data science. Although Python is the most widely used programming language among data scientists today, it relies on extra capabilities to be truly useful and interesting for this work. Those capabilities come from tools and libraries, and there are many of them, each designed for a specific need.

We have all heard of and worked with some of them, such as Pandas, NumPy, Seaborn, Matplotlib and many more. I chose Scikit-learn (SKlearn), which we have not covered yet and which we will use in the future as we learn and apply machine learning (ML).

What is SKlearn?

SKlearn is an ML library for Python that interoperates with the Python libraries NumPy and SciPy. Three of its main features are classification, regression and clustering.

What problem is this tool/library designed to solve?

Each feature has its own algorithms for solving problems. These algorithms fall into 2 categories:

1. Supervised learning: the data comes with additional attributes (labels or target values) that we want to predict.

· Classification:

Classification is designed for problems where the samples belong to two or more classes. It learns from data that is already labeled in order to predict the class of unlabeled data. The algorithms include logistic regression, decision tree, random forest, gradient-boosted tree, multilayer perceptron, one-vs-rest, Naive Bayes and K-Nearest Neighbors.

An example of a problem solved with these algorithms is determining the quality of a bottle of wine based on features like acidity and alcohol content.
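As a rough sketch of what a classification workflow can look like, the snippet below trains a random forest on the small wine data set that ships with SKlearn. Note one assumption: that built-in data set labels each wine by cultivar (three classes) rather than by quality score, so it only stands in for the wine-quality example above. The split size and random seeds are arbitrary choices for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Built-in wine data: 13 chemical measurements per sample (alcohol, acids, ...).
# Assumption: the target here is the cultivar (3 classes), not a quality score.
X, y = load_wine(return_X_y=True)

# Hold out a quarter of the labeled samples to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Learn from the labeled training samples.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict the class of samples the model has not seen.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Any of the other classifiers listed above could be swapped in for the random forest, since SKlearn estimators share the same fit and predict methods.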

· Regression:

If the value we want to predict consists of one or more continuous variables, we use a regression model.

An example of regression modeling is examining the relationship between the birth rate and the age of mothers.
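To make the regression idea concrete, here is a minimal sketch that fits a straight line with SKlearn's LinearRegression. The numbers are synthetic and made up purely for illustration (a falling line plus random noise); they are not real demographic data, and the names age and birth_rate are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, illustrative data only (NOT real demographics):
# a birth rate that falls linearly with the mother's age, plus noise.
rng = np.random.default_rng(0)
age = rng.uniform(30, 45, size=200)
noise = rng.normal(0.0, 3.0, size=200)
birth_rate = 250.0 - 5.0 * age + noise

# SKlearn expects a 2D feature matrix: one row per sample, one column per feature.
X = age.reshape(-1, 1)

model = LinearRegression().fit(X, birth_rate)
slope = model.coef_[0]
intercept = model.intercept_
r2 = model.score(X, birth_rate)  # coefficient of determination on the training data

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.3f}")
```

The fitted slope describes how much the continuous target changes for each extra year of age, which is the kind of relationship a regression model is meant to reveal.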

2. Unsupervised learning: the training data consists of a set of input vectors x without any corresponding target values. The goal in such problems may be to discover groups of similar examples within the data, which is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.

· Clustering:

For example, when a data set contains 2 or more types of observations and we have no extra information to label them, we use a clustering model to split the observations into well-separated groups called clusters.
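A small sketch of clustering, assuming k-means as the algorithm (one of several clustering methods in SKlearn) and synthetic points generated with make_blobs. The group centers, the spread and the choice of 3 clusters are all assumptions made for the demonstration. The true group of each point is kept only to check the result afterwards; the model never sees it.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic 2D points drawn around three well-separated centers.
# true_groups is used only for checking; it is never passed to the model.
X, true_groups = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.8,
    random_state=0,
)

# Ask k-means to split the unlabeled observations into 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Adjusted Rand index: 1.0 means the clusters match the true groups exactly
# (cluster numbering does not need to match).
ari = adjusted_rand_score(true_groups, labels)
print(f"Adjusted Rand index: {ari:.2f}")
```

In a real unsupervised problem there are no true groups to compare against, so the number of clusters usually has to be chosen by inspecting the data or trying several values.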

How well does SKlearn solve that problem?

Sklearn is one of the most powerful tools for building models and solving problems in Python. It covers a comprehensive range of the models we might want to build, and the number and variety of its algorithms show how powerful it is. Moreover, it has become very popular, and many major companies, such as Spotify and JPMorgan, use Sklearn to build their models.

What are the main alternatives or competitors to Sklearn?

MLlib, Weka, Google Cloud TPU, and XGBoost.

Who originally made, and who currently maintains, this tool/library?

The original author is David Cournapeau, and the initial release dates to June 2007. Its name stems from the notion that it is a “SciKit” (SciPy Toolkit). Later, in 2010, Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent Michel took over its development and made the first public release in February 2010.

Links to documentation and/or tutorials for Sklearn:

Tutorial

User guide

Links to examples of projects or blog posts:

Examples

Sources and citations:

By The scikit-learn developers — github.com/scikit-learn/scikit-learn/blob/master/doc/logos/scikit-learn-logo.svg, BSD, https://commons.wikimedia.org/w/index.php?curid=71445288

“Scikit-Learn.” Wikipedia, Wikimedia Foundation, 21 Sept. 2020, en.wikipedia.org/wiki/Scikit-learn.

“Learn.” Scikit, scikit-learn.org/stable/.
