Dataset and Software License Types You Need to Consider

You can’t use datasets and software as much as you want. Certain restrictions apply!

One of the biggest reasons for the rise of data science, machine learning and deep learning is the availability of free software and datasets. Most of us would find it hard to get the software or datasets we need if they were not freely available to download and use!

The term “free” in free software and free datasets does not mean that you can use them however you want. Certain limitations and restrictions apply!

Most of us do not consider the license type of software or datasets.

See the following examples:

Breast cancer dataset license at (Screenshot by author)

In the above examples, you can see that popular software like TensorFlow, popular datasets like the Breast Cancer dataset and even the contents of software documentation pages are protected by some type of license.
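As a quick way to check the license of software you already use, the sketch below reads the license fields that a Python package declares in its own metadata. It is a minimal illustration, not a legal check: license_info is a hypothetical helper name, it relies only on Python's standard importlib.metadata module, and it assumes the package author actually filled in the license metadata, which is not always the case.

```python
from importlib import metadata


def license_info(package_name):
    """Return the license fields declared by an installed package.

    Returns None if the package is not installed. Any field may be
    None or empty when the author did not declare it.
    """
    try:
        meta = metadata.metadata(package_name)
    except metadata.PackageNotFoundError:
        return None

    # Trove classifiers such as "License :: OSI Approved :: MIT License"
    classifiers = [
        c for c in (meta.get_all("Classifier") or [])
        if c.startswith("License ::")
    ]
    return {
        "license": meta.get("License"),
        "license_expression": meta.get("License-Expression"),
        "classifiers": classifiers,
    }


if __name__ == "__main__":
    # "tensorflow" is only an example name; use any package you have installed.
    print(license_info("tensorflow"))
```

For example, license_info("tensorflow") returns whatever license fields the installed TensorFlow package declares, or None if TensorFlow is not installed. Treat the result as a pointer only and confirm it against the license file in the project itself.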

In this article, I will explain dataset and software license types along with the “open source” and “public domain” concepts. The MIT, Apache, BSD, GPL, CC, ODC and CDLA licenses are covered.

The information provided in this article is useful for both the licensee (the party receiving the software or dataset usage rights) and the licensor (the party granting those usage rights).

Open source and open-source software (OSS)

First of all, you need to know that “open source” is just an idea, not a license type. The core idea behind open source is that people can learn and develop better when software is free to download, use and share.

In general, open-source software is free to download, use, modify and share. However, because “open source” is an idea rather than a license, specific license restrictions may still apply to open-source software.

Things that do not need to be provided by OSS



