
Sparse Matrices: Why They Matter for Machine Learning and Data Science

And why you should care

Thomas Le Menestrel
Towards Data Science
6 min read · Apr 26, 2022

Source: Unsplash

Introduction

What is sparse data?

When representing data as a matrix, we can count how many of its entries are empty (zero). The proportion of zero entries among all entries is called the matrix's sparsity. A matrix (or dataset) that mostly contains zeros is called a sparse matrix.
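To make the definition concrete, here is a minimal sketch that computes sparsity as the fraction of zero entries in a matrix stored as a list of lists. The function names are illustrative, and the 50% cut-off in `is_sparse` is an assumption that reads "mostly contains zeros" as "more than half of the entries are zero"; there is no universal threshold.

```python
def sparsity(matrix):
    """Return the fraction of entries in a 2-D list that are zero."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("sparsity is undefined for an empty matrix")
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


def is_sparse(matrix, threshold=0.5):
    """Treat a matrix as sparse when more than `threshold` of its entries are zero.

    The default of 0.5 is an assumed cut-off, not a standard definition.
    """
    return sparsity(matrix) > threshold


print(sparsity([[0, 0], [0, 5]]))   # 3 zeros out of 4 entries -> 0.75
print(is_sparse([[0, 0], [0, 5]]))  # True
```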

A simple example

Suppose you ask 4 of your friends to rate 4 different movies on a scale from 1 to 5 (or 0 if they have not seen a movie). Now, imagine you get the following ratings:

An example matrix (Image by author)

For example, John has not seen movies 1, 2 and 4, but gave movie 3 a rating of 2.

The sparsity of this matrix is low: 6 of its 16 entries are zeros, so its sparsity is 6/16 = 3/8, or about 38%. We would actually call it a “dense” matrix. Now, imagine that you have many more movies, say 15,000 (the size of the Netflix catalogue).
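The arithmetic above can be checked with a short sketch. The original ratings image is not reproduced here, so only John's row (movies 1, 2 and 4 unseen, a 2 for movie 3) comes from the text; the other three rows are hypothetical values chosen so that the matrix contains the same 6 zeros out of 16 entries.

```python
from fractions import Fraction

# Rows are friends, columns are movies 1-4; 0 means "not seen".
# Only John's row is taken from the example; the other rows are hypothetical.
ratings = [
    [0, 0, 2, 0],  # John: has only seen movie 3
    [5, 3, 0, 4],  # friend 2 (hypothetical)
    [4, 0, 1, 5],  # friend 3 (hypothetical)
    [2, 4, 3, 0],  # friend 4 (hypothetical)
]

zeros = sum(row.count(0) for row in ratings)  # number of unseen movies
total = sum(len(row) for row in ratings)      # 4 friends x 4 movies
exact = Fraction(zeros, total)                # 6/16 reduces to 3/8

print(zeros, total, exact, f"{float(exact):.1%}")  # prints: 6 16 3/8 37.5%
```

Rounded to a whole percentage, 37.5% gives the "about 38%" quoted in the text.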

As you can guess, most people have not seen all 15,000 movies on Netflix. Therefore, given that each unseen movie gets a rating of zero, you can…
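To get a feel for the scale, the sketch below simulates a ratings matrix with 15,000 movie columns. The number of users (1,000) and the number of movies each user has rated (50) are hypothetical values picked for illustration. Instead of storing every cell, it keeps only the non-zero ratings in compressed sparse row (CSR) form: the ratings, their column indices, and a row-pointer array marking where each user's ratings start. This is one common sparse layout (SciPy's `scipy.sparse.csr_matrix` uses the same three arrays as `data`, `indices` and `indptr`), written here in plain Python so it runs without extra libraries; it is a sketch, not a production implementation.

```python
import random

N_MOVIES = 15_000   # catalogue size quoted in the text
N_USERS = 1_000     # hypothetical number of users
SEEN_PER_USER = 50  # hypothetical: each user has rated 50 distinct movies


def random_ratings_csr(n_users, n_movies, seen_per_user, seed=0):
    """Build a simulated user x movie rating matrix directly in CSR form.

    values[k]                  -> the k-th non-zero rating (1 to 5)
    col_indices[k]             -> the movie (column) that rating belongs to
    row_ptr[u] .. row_ptr[u+1] -> slice of values/col_indices for user u
    """
    rng = random.Random(seed)
    values, col_indices, row_ptr = [], [], [0]
    for _ in range(n_users):
        # Pick distinct movies for this user, in column order.
        for movie in sorted(rng.sample(range(n_movies), seen_per_user)):
            values.append(rng.randint(1, 5))
            col_indices.append(movie)
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_get(values, col_indices, row_ptr, user, movie):
    """Look up a single rating; movies a user has not seen are implicitly 0."""
    for k in range(row_ptr[user], row_ptr[user + 1]):
        if col_indices[k] == movie:
            return values[k]
    return 0


values, col_indices, row_ptr = random_ratings_csr(N_USERS, N_MOVIES, SEEN_PER_USER)

dense_entries = N_USERS * N_MOVIES  # cells a dense matrix would store
stored_entries = len(values) + len(col_indices) + len(row_ptr)  # numbers CSR keeps
sparsity = 1 - len(values) / dense_entries

print(f"sparsity: {sparsity:.4%}")
print(f"dense cells: {dense_entries:,}  CSR numbers stored: {stored_entries:,}")
```

With these assumed sizes, only 50,000 of the 15,000,000 cells are non-zero, so the CSR arrays hold 101,001 numbers instead of 15 million, and the sparsity is above 99.6%.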
