TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Why We Use Sparse Matrices for Recommender Systems

Introduction to SciPy’s Sparse Module

David Chong
Published in TDS Archive
6 min read · May 9, 2020


In recommender systems, we typically work with very sparse matrices because the item universe is very large, while a single user interacts with only a very small subset of it. Take YouTube for example: a user typically watches hundreds, if not thousands, of videos, compared with the millions of videos in YouTube’s corpus, resulting in a sparsity of more than 99%.
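To make the sparsity figure concrete, here is a back-of-the-envelope calculation for a single user. The numbers (1,000 watched videos, a catalogue of 1,000,000) are illustrative assumptions chosen to match "thousands" and "millions" above, not actual YouTube statistics.

```python
# Illustrative assumptions, not real YouTube figures.
videos_watched = 1_000        # items one user has interacted with
catalogue_size = 1_000_000    # total items in the corpus

# Sparsity = share of entries in this user's row that are zero.
sparsity = 1 - videos_watched / catalogue_size
print(f"Sparsity of this user's row: {sparsity:.1%}")  # 99.9%
```

Even a fairly heavy user leaves almost every entry in their row empty.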

This means that when we represent the users (as rows) and items (as columns) in a matrix, the result is an extremely sparse matrix consisting of many zero values (see below).

Sparse User-item Matrix (Source by Author)
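As a rough sketch of the users-as-rows, items-as-columns layout described above, the snippet below builds a tiny dense user-item matrix with NumPy. The interaction pairs and the matrix size are made up for illustration; a real system would have far more rows and columns.

```python
import numpy as np

# Hypothetical interaction log: (user_index, item_index) pairs.
interactions = [(0, 1), (0, 4), (1, 2), (2, 0), (2, 4), (3, 3)]
n_users, n_items = 4, 6

# Dense user-item matrix: rows are users, columns are items.
user_item = np.zeros((n_users, n_items), dtype=np.int8)
for user, item in interactions:
    user_item[user, item] = 1  # 1 marks an interaction, 0 means none

# Share of entries that are zero, i.e. the sparsity of the matrix.
zero_fraction = 1 - np.count_nonzero(user_item) / user_item.size
print(user_item)
print(f"Fraction of zeros: {zero_fraction:.2f}")
```

Even in this toy case most entries are zero; at the scale of a real catalogue the zero fraction gets much closer to 1.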

In a real-life scenario, how do we best represent such a sparse user-item interaction matrix?

To understand this, we have to understand the two major constraints on computing: time and memory. The former is simply “how much time a program takes to run”, whereas the latter is “how much RAM is being used by the program”. The former is quite straightforward. As for the latter, making sure our program doesn’t consume all our memory is important, especially when we deal with large datasets; otherwise we would encounter the famous “out-of-memory” error.

Source: StackExchange by alessandro308

Yes, every program and application on our PC uses some memory (see the image below). When we run matrix computations and store those sparse matrices as a NumPy array or Pandas DataFrame, they consume memory as well.

Mac’s Activity Monitor (Source by Author)
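To get a feel for how much memory a full NumPy array spends on zeros, the sketch below compares it with a sparse representation from SciPy's sparse module. The matrix size, the 0.1% density, and the choice of the CSR format are assumptions made for illustration, not necessarily the setup used in the rest of the article.

```python
import numpy as np
from scipy import sparse

# Illustrative sizes (assumptions): 1,000 users x 2,000 items, 0.1% of entries nonzero.
n_users, n_items = 1_000, 2_000

# Random matrix stored in CSR format; random_state makes the run repeatable.
sparse_matrix = sparse.random(
    n_users, n_items, density=0.001, format="csr", dtype=np.float64, random_state=0
)

# The same matrix as a full NumPy array, zeros included.
dense_matrix = sparse_matrix.toarray()

dense_bytes = dense_matrix.nbytes
# CSR keeps three arrays: the nonzero values, their column indices, and row pointers.
sparse_bytes = (
    sparse_matrix.data.nbytes
    + sparse_matrix.indices.nbytes
    + sparse_matrix.indptr.nbytes
)

print(f"Dense:  {dense_bytes / 1e6:.2f} MB")
print(f"Sparse: {sparse_bytes / 1e6:.2f} MB")
```

The dense array pays for every entry, while the sparse format pays only for the nonzero entries plus some indexing overhead, so the gap widens as the matrix gets sparser.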

More formally, these two constraints are known as time complexity and space complexity (memory).

Space Complexity

When dealing with sparse matrices, storing them as a full matrix (from this point…


Written by David Chong

Software Engineer @ Shopee; Closet n3rd; Husband & Father; LinkedIn → bit.ly/3CmUbUf; Medium — tinyurl.com/2rk9ub8k; Support me → tinyurl.com/davidcjw
