Here’s How You Can Create Custom Datasets in a Second
Build the perfect dataset to test your model within seconds
As a beginner, it might be a bit daunting to start working on an already existing dataset.
Or maybe you want to quickly test a model, or practice building one with the latest algorithm you’ve learned.
But you don’t want to spend your time on data cleaning and preprocessing.
This article will help you build your own custom dataset in the blink of an eye, so you can start practicing as soon as you want.
We will write our own custom function that generates random datasets according to our requirements, using the dataset generators defined in the Sklearn datasets module.
So, let's begin.
Creating a Custom Dataset
We will start by importing the make_regression() function from the sklearn.datasets module, which creates a dataset for regression.
from sklearn.datasets import make_regression
Then we will call it with the required parameters to generate the data.
For now, we will pass arguments for creating a dataset with 10000 samples, 6 features, and 1 target.
reg = make_regression(n_samples=10000, n_features=6, n_targets=1)
type(reg)  # returns tuple
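To see what that tuple actually holds, here is a minimal sketch that unpacks it into two arrays. It uses the sizes that appear in the printed shapes later in this post; the names X and y and the random_state value are assumptions added here for illustration and repeatability.

```python
from sklearn.datasets import make_regression

# random_state is an addition (not in the original call) so reruns give the same data
X, y = make_regression(n_samples=10000, n_features=6, n_targets=1, random_state=42)

# The first element is the feature matrix, the second is the target vector
print(type(X).__name__, X.shape)  # ndarray (10000, 6)
print(type(y).__name__, y.shape)  # ndarray (10000,)
```

Note that with a single target the second array is one-dimensional, which is why it gets its own single-column data frame in the next steps.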
Our dataset is now ready, but it is in the form of a tuple of arrays.
We need to convert it to a data frame for easy processing.
So, we will define our columns first.
columns = [f"F{i}" for i in range(1, 7)]  # defining our column names: F1 to F6
Now, let's generate our data frame.
import pandas as pd
# features
features = pd.DataFrame(reg[0], columns=columns)
# target
target = pd.DataFrame(reg[1], columns=['Target'])
# checking dataframe shape
print("Dataset shape:", features.shape, target.shape)
# Output: Dataset shape: (10000, 6) (10000, 1)
Now, let's concatenate the features and target data frames to obtain our final data frame.
# concatenating features and target.
df_reg = pd.concat([features, target], axis=1)
df_reg.head() # looking at first 5 observations
That’s it. Our custom dataframe is ready and now you can start practicing with your machine learning models real quick.
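As a quick illustration of practicing on the generated data, here is a minimal sketch that rebuilds the data frame and fits a model on it. The choice of LinearRegression, the 80/20 split, and the random_state values are assumptions made for this example, not part of the original walkthrough.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Rebuild the regression data frame (random_state added for repeatability)
X, y = make_regression(n_samples=10000, n_features=6, n_targets=1, random_state=0)
columns = [f"F{i}" for i in range(1, 7)]
df_reg = pd.concat(
    [pd.DataFrame(X, columns=columns), pd.DataFrame(y, columns=["Target"])],
    axis=1,
)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    df_reg[columns], df_reg["Target"], test_size=0.2, random_state=0
)

# Fit a plain linear model and report R^2 on the held-out rows
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)
print(f"Test R^2: {r2:.3f}")
```

Because make_regression adds no noise by default, a linear model fits this data almost perfectly. Passing a positive noise value to make_regression gives a less trivial practice problem.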
But wait, what about datasets for other tasks?
No worries. I have made a custom function that you can copy and paste into your script or notebook, wherever you like.
Custom Function for Creating Custom Datasets
You can use this function to generate your custom datasets with ease.
Pass the number that corresponds to your problem type, and the function returns the matching dataset.
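The function body itself does not appear in this text, so what follows is only a minimal sketch of what such a helper might look like, not the author's actual implementation. It assumes code 1 means regression and code 3 means classification, as in the calls shown in this post; the other codes are not reconstructed, and the n_samples, n_features, and random_state parameters are hypothetical additions.

```python
import pandas as pd
from sklearn.datasets import make_classification, make_regression


def makedf(problem, n_samples=10000, n_features=6, random_state=None):
    """Return a data frame with columns F1..Fn plus a 'Target' column.

    problem: 1 for regression, 3 for classification (the only two codes
    shown in the post; any other value is rejected in this sketch).
    """
    if problem == 1:
        X, y = make_regression(
            n_samples=n_samples, n_features=n_features,
            n_targets=1, random_state=random_state,
        )
    elif problem == 3:
        X, y = make_classification(
            n_samples=n_samples, n_features=n_features,
            random_state=random_state,
        )
    else:
        raise ValueError(f"Problem code {problem} is not covered by this sketch")

    # Same conversion steps as in the walkthrough above
    columns = [f"F{i}" for i in range(1, n_features + 1)]
    features = pd.DataFrame(X, columns=columns)
    target = pd.DataFrame(y, columns=["Target"])
    print("Dataset shape:", features.shape, target.shape)
    return pd.concat([features, target], axis=1)
```

Keeping the generator choice in one place means every problem type goes through the same data frame conversion, so the returned columns always follow the F1..Fn plus Target layout.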
For example, to make a dataset for regression, call:
makedf(1).head()
# Output: Dataset shape: (10000, 6) (10000, 1)
For a classification dataset, call:
makedf(3).head()
# Output: Dataset shape: (10000, 6) (10000, 1)
Similarly, you can pass the number for any other supported problem type and get the corresponding dataset.
Hope you liked the post.
Follow me for more actionable data science and machine learning content.
You can visit my GitHub profile at: https://github.com/Retinpkumar for accessing the code files related to my blog posts.
Thank you and have a nice day.