Random Link Generator for Datasets

Shraddha Anala
2 min read · May 3, 2020


I’m expanding with more posts on ML concepts + tutorials over at my blog!

Hi, I’m Shraddha and I recently dipped my toes into data science.

One of the main ways I am able to practice and hone my data science skills is by using datasets available on Kaggle and the UCI Machine Learning Repository.

I like working with data from the UCI Repository because it more often gives you a chance to work with unstructured data and improve your data cleansing and preprocessing skills. Real-world data is often messy, and I’ve found many different file formats uploaded on UCI that require careful cleaning.

So to challenge myself every day and keep practising on new data, I’ve programmed a short piece of code that generates a link to a random dataset hosted at the UCI repo.

This is a short implementation of web scraping using the Beautiful Soup library in Python. I run this code once a day and then build a machine learning model for the randomly selected dataset. I intend to post tutorials for these datasets as well.

Here is the code.

Random Dataset Link Generator
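
A minimal sketch of how a generator like this could be written with Beautiful Soup is shown here. It is an illustration under stated assumptions, not necessarily the exact script described in this post: the listing URL, the `/dataset/` link pattern, and the helper names `fetch_html`, `extract_dataset_links` and `pick_random_link` are all hypothetical, and the sketch assumes the listing page is served as static HTML. Check the URL and the pattern against the live site before relying on it.

```python
import random
from urllib.parse import urljoin
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumptions (not taken from the post): the listing URL and the substring
# that marks a dataset link. Adjust both to match the live site.
LISTING_URL = "https://archive.ics.uci.edu/datasets"
HREF_MARKER = "/dataset/"


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text."""
    request = Request(url, headers={"User-Agent": "random-dataset-picker"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_dataset_links(html, base_url=LISTING_URL, href_marker=HREF_MARKER):
    """Return sorted, de-duplicated absolute URLs of dataset links."""
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        # Keep only anchors whose href looks like a dataset page.
        if href_marker in href:
            links.add(urljoin(base_url, href))
    return sorted(links)


def pick_random_link(links, rng=random):
    """Pick one link at random; raise if the list is empty."""
    if not links:
        raise ValueError("no dataset links found; check the URL and marker")
    return rng.choice(links)


if __name__ == "__main__":
    # One request per run, which keeps the load on the site minimal.
    random_link = pick_random_link(extract_dataset_links(fetch_html(LISTING_URL)))
    print(random_link)
```

Keeping the parsing step separate from the download means the link extraction can be tried on a saved copy of the page without sending repeated requests to the site.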

The link is stored as a string in the variable ‘random_link’, which you can then copy and paste into your browser. I was unsure about opening links automatically, as I know some websites may block you if they sense that a script, rather than a human, is trying to open their pages. So be warned, and make sure to go through the website’s Terms & Conditions before scraping to avoid any illegal activity.

This is my first post on Medium, and I hope you find this fun little piece of code useful for developing machine learning models across a wide variety of data.

P.S. Beware of sites that do not allow scraping. This is intended only as a personal challenge and not for any other purpose.
