Creating a Simulated Dataset from Scikit-learn

You’ll need to create a synthetic dataset.

Suraj Yadav
3 min read · Jul 16, 2022

Scikit-learn offers a variety of functions for generating simulated data. Three of them are especially helpful.

1. make_regression():

make_regression() is a good choice when we want a dataset designed for linear regression: it generates the target as a random linear combination of the features, plus optional Gaussian noise.

Parameters:
n_samples : int, default=100
The number of samples.
n_features : int, default=100
The number of features.
noise : float, default=0.0
The standard deviation of the Gaussian noise applied to the output.
I've focused on the most important parameters here, but there are many others to consider, such as n_informative, bias, and random_state.
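A minimal sketch of make_regression() in use, assuming NumPy and scikit-learn are installed. The sample size, noise level, and random_state are arbitrary choices for illustration. Fitting a LinearRegression model is added here only to show that the generated target really is linear in the features: with coef=True, make_regression() also returns the true coefficients, which the fitted model should approximate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# 200 samples, 5 features, Gaussian noise with standard deviation 10
# (illustrative values). coef=True also returns the true coefficients.
X, y, true_coef = make_regression(
    n_samples=200, n_features=5, noise=10.0, coef=True, random_state=0
)
print(X.shape, y.shape)  # (200, 5) (200,)

# Because y is a linear function of X plus noise, ordinary least squares
# should recover coefficients close to the true ones.
model = LinearRegression().fit(X, y)
print("true:     ", np.round(true_coef, 1))
print("estimated:", np.round(model.coef_, 1))
```

Raising noise makes the estimated coefficients drift further from the true ones, which is a quick way to see what that parameter controls.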

2. make_classification():

Using make_classification(), we can generate a simulated dataset for classification.

Parameters:
n_samples : int, default=100
The number of samples.
n_features : int, default=20
The total number of features.
n_classes : int, default=2
The number of classes (or labels) of the classification problem.
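A short sketch of make_classification() for a three-class problem. The specific values (300 samples, n_informative=3, random_state=0) are illustrative assumptions. One constraint worth knowing: make_classification() requires n_classes * n_clusters_per_class <= 2 ** n_informative, so the default n_informative=2 is too small for three classes with the default of two clusters per class.

```python
import numpy as np
from sklearn.datasets import make_classification

# 300 samples, 20 features (the default), 3 classes.
# n_informative=3 satisfies 3 classes * 2 clusters per class = 6 <= 2 ** 3 = 8.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=3, n_classes=3, random_state=0
)
print(X.shape, y.shape)  # (300, 20) (300,)
print(np.unique(y))      # [0 1 2]
print(np.bincount(y))    # number of samples in each class
```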

3. make_blobs():

If we want a dataset that works well with clustering techniques, scikit-learn gives us make_blobs(), which generates isotropic Gaussian blobs.

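Parameters:
n_samples : int, default=100
The number of samples.
n_features : int, default=2
The number of features for each sample.
centers : int or array-like of shape (n_centers, n_features), default=None
The number of centers to generate, or the fixed center locations. If centers is None (and n_samples is an int), 3 centers are generated.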
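make_blobs() also accepts centers as an array of coordinates. The sketch below assumes three hand-picked, well-separated 2-D centers (an illustrative choice) and uses KMeans and adjusted_rand_score, both from scikit-learn, to check that a clustering algorithm recovers the generated groups.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three hypothetical, well-separated centers given explicitly as an array of
# shape (n_centers, n_features); n_features is inferred from this shape.
centers = np.array([[-5.0, -5.0], [0.0, 5.0], [5.0, -5.0]])
X, y = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)
print(X.shape, y.shape)  # (300, 2) (300,)

# K-means should reproduce the generated grouping almost perfectly,
# giving an adjusted Rand index close to 1.0.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
ari = adjusted_rand_score(y, labels)
print(ari)
```

The adjusted Rand index ignores how the cluster labels are numbered, so it still works even though KMeans may assign label 0 to a different blob than make_blobs() did.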

The centers parameter sets the number of clusters. We can visualize the clusters produced by make_blobs() with the matplotlib library.
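One way to draw such a plot, sketched under a few assumptions: the non-interactive "Agg" backend is selected so the script also runs without a display, and the figure is saved to a file named blobs.png (an arbitrary name) instead of being shown on screen. Points are colored by the cluster label that make_blobs() returns.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend; drop this line to plot interactively
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# 300 samples in 2-D grouped around 3 randomly placed centers.
X, y = make_blobs(n_samples=300, n_features=2, centers=3, random_state=0)

# Scatter plot of the two features, colored by cluster label.
fig, ax = plt.subplots()
scatter = ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis", s=15)
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.set_title("make_blobs(centers=3)")
fig.savefig("blobs.png")  # hypothetical output file name
```

In a notebook or other interactive session, calling plt.show() in place of fig.savefig() displays the plot directly.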

I hope you found this article helpful and learned something new ❤

Clap if you enjoyed this article and follow for more content like this.

