Member-only story
Sparse Matrices: Why They Matter for Machine Learning and Data Science
And why you should care
Introduction
What is sparse data?
When representing data using a matrix, we can quantify the number of empty values it contains. This is referred to as its sparsity. A matrix (or dataset) that mostly contains zeros is called a sparse matrix.
A simple example
Suppose you ask 4 of your friends to give you a rating of 4 different movies from 1 to 5 (or zero if they have not seen it). Now, imagine you get the following ratings:
This means that John has not seen movies 1, 2 and 4 but gave the 3rd one a rating of 2.
The sparsity of this matrix is low - 38 % to be precise (6 zeroes out of 16 values = 3/8 sparsity) and we would actually call it a “dense” matrix. , Now, imagine that you have a lot more movies. Imagine you have 15,000 movies (the size of the Netflix catalogue).
As you can guess, most people have not seen all 15,000 movies on Netflix. Therefore, given that each unseen movie gets a rating of zero, you can…