Here’s How You Can Create Custom Datasets in a Second

Build the perfect dataset to test your model within seconds

Retin P Kumar
Geek Culture
3 min read · Dec 8, 2021

Photo by Mika Baumeister on Unsplash

As a beginner, it might be a bit daunting to start working on an already existing dataset.

Or maybe you want to quickly test a model or practice building one with the latest algorithm you’ve learned.

But you don’t want to spend your time on data cleaning and preprocessing.

This article will help you build your own custom dataset in the blink of an eye, so you can start practicing right away.

We will write our own custom function that generates random datasets according to our requirements, using the dataset generator functions in the datasets module of the scikit-learn (sklearn) library.

So, let's begin.

Creating a Custom Dataset

We will start by importing the make_regression() function from the sklearn.datasets module to create a dataset for regression.

Then we will call it with the required parameters to generate the data.

For now, we will pass arguments for creating a dataset with 100 samples, 4 features, and 1 target.
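A minimal sketch of this step could look like the following. The variable names and the `random_state` value are assumptions added here so that the result is reproducible.

```python
from sklearn.datasets import make_regression

# 100 samples, 4 features, 1 target.
# random_state is an assumed value, used only to make the output repeatable.
data = make_regression(n_samples=100, n_features=4, n_targets=1, random_state=42)

# make_regression returns a tuple: (feature matrix, target vector)
X, y = data

print(type(data))        # <class 'tuple'>
print(X.shape, y.shape)  # (100, 4) (100,)
```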

Our dataset is now ready, but it is in the form of a tuple of arrays: one array for the features and one for the target.

We need to convert it to a data frame for easy processing.

So, we will define our column names first.
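One way to define the column names is sketched below. The names `feature_1` to `feature_4` and `target` are only illustrative choices, so feel free to use your own.

```python
# Number of features used when generating the dataset
n_features = 4

# Illustrative column names: feature_1 ... feature_4, plus a single target column
feature_cols = [f"feature_{i}" for i in range(1, n_features + 1)]
target_col = ["target"]

print(feature_cols)  # ['feature_1', 'feature_2', 'feature_3', 'feature_4']
```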

Next, let's build one data frame for the features and one for the target.

Finally, let's concatenate the features and target data frames to obtain our final data frame.
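Put together, these steps can be sketched as follows. This assumes pandas is installed, and the column names and `random_state` value are illustrative assumptions.

```python
import pandas as pd
from sklearn.datasets import make_regression

# Generate the raw arrays (random_state is an assumed value for repeatability)
X, y = make_regression(n_samples=100, n_features=4, n_targets=1, random_state=42)

# Illustrative column names
feature_cols = [f"feature_{i}" for i in range(1, X.shape[1] + 1)]

# One data frame for the features, one for the target
features_df = pd.DataFrame(X, columns=feature_cols)
target_df = pd.DataFrame(y, columns=["target"])

# Concatenate side by side (axis=1) to get the final data frame
df = pd.concat([features_df, target_df], axis=1)

print(df.shape)  # (100, 5)
print(df.head())
```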

That’s it. Our custom data frame is ready, and you can start practicing with your machine learning models right away.

But wait, what about datasets for other tasks?

No worries. I have made a custom function that you can copy and paste into your script or notebook.

Custom Function for Creating Custom Datasets

You can use this function to generate your custom datasets with ease.

Pass the number that corresponds to your problem type, and the function generates the matching dataset.
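As a rough sketch of how such a function could be written, consider the version below. The function name `make_custom_dataset`, the mapping of numbers to problem types (1 for regression, 2 for classification, 3 for clustering), and the default arguments are all assumptions made for illustration. Only the scikit-learn generators themselves are real library functions.

```python
import pandas as pd
from sklearn.datasets import make_blobs, make_classification, make_regression

# Hypothetical mapping from problem number to scikit-learn generator
GENERATORS = {
    1: make_regression,      # regression
    2: make_classification,  # classification
    3: make_blobs,           # clustering
}


def make_custom_dataset(problem, n_samples=100, n_features=4, random_state=None):
    """Return a data frame with feature columns and a 'target' column.

    Illustrative helper, not part of scikit-learn.
    """
    if problem not in GENERATORS:
        raise ValueError(f"problem must be one of {sorted(GENERATORS)}")

    # All three generators accept these keyword arguments and return (X, y).
    # Note: with its default settings, make_classification needs n_features >= 4.
    X, y = GENERATORS[problem](
        n_samples=n_samples, n_features=n_features, random_state=random_state
    )

    cols = [f"feature_{i}" for i in range(1, n_features + 1)]
    df = pd.DataFrame(X, columns=cols)
    df["target"] = y
    return df


# Usage under the assumed numbering
reg_df = make_custom_dataset(1, random_state=0)  # regression dataset
clf_df = make_custom_dataset(2, random_state=0)  # classification dataset
print(reg_df.shape, clf_df.shape)  # (100, 5) (100, 5)
```

Keeping the generators in a dictionary means that supporting another task only takes one more entry, as long as the generator returns a features array and a target array.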

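For example, to make a dataset for regression, call the function with the number assigned to regression. For a classification dataset, call it with the number assigned to classification.

Similarly, you can pass the matching argument for any other supported task and get the dataset you need.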

Hope you liked the post.

Follow me for more actionable data science and machine learning content.

You can visit my GitHub profile at https://github.com/Retinpkumar to access the code files related to my blog posts.

Thank you and have a nice day.
