Dataset and Software License Types You Need to Consider

You can’t use datasets and software however you want. Certain restrictions apply!

Rukshan Pramoditha
Data Science 365


The most restrictive Creative Commons (CC) license type (Photo by Umberto on Unsplash)

One of the biggest reasons data science, machine learning and deep learning have grown so quickly is freely available software and datasets. Most of us would struggle to get hold of the software or datasets we need if they were not freely available to download and use!

The term “free” in free software and free datasets does not mean that you can use them however you want. Certain limitations and restrictions apply!

Most of us do not consider the license type of software or datasets.

See the following examples:

Breast cancer dataset license at https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data/metadata (Screenshot by author)

In the above examples, you can see that popular software like TensorFlow, popular datasets like Breast Cancer Dataset and even the contents of the software documentation pages are protected by some type of license.

In this article, I will explain dataset and software license types along with the “open source” and “public domain” concepts. The MIT, Apache, BSD, GPL, CC, ODC and CDLA licenses are covered.
