Understanding Litter Patterns at Large Events, With Data Science and Clustering.

Alexander Kahanek
Published in rubbish stories
8 min read, Jul 21, 2021

Why do we care about litter patterns?

To understand how best to equip ourselves in the constant fight against trash on our planet, we must dig into the patterns of how and why people throw their trash on the ground. This is especially important for large event spaces, which have hundreds (if not thousands) of people constantly running around.

To accomplish this, Rubbish partnered with Startup Grind 2020 to clean up their event space, over the course of 4 days, and tracked everything along the way! A fantastic analysis done by the CEO of Rubbish, Emin Israfil, goes over their journey and how they significantly reduced the event’s litter footprint. However, I came along to try and figure out if we can map the behaviors of littering to the efficacy of Trash Cans, Recycling Cans, and other types of waste bins.

First, who are we?

I am just a person who loves to do things with Data and Machine Learning, from doing research in Applied Natural Language Processing to building nickname generators with trained Deep Learning models.

Rubbish is an amazing company with one big goal: to get other humans involved in cleaning, tracking, and mapping litter. They built an iOS App to encourage users to pick up and track litter in their community, and a Rubbish Beam (a smart litter picker-upper) to make this process easier (and cooler).

TL;DR

Using clustering analysis, we found that the average person will drop their trash on the ground if the nearest waste bin (i.e., Trash Cans, Recycling Cans, etc.) is about 40 feet (12 meters) away.

This article goes into how we came to these conclusions, and some interesting insights and recommendations!

So, where do we start?

Well, first we should get an idea of the litter that was picked up. Before that, though, here is the event timeline: Sunday was Rubbish's baseline cleanup day, where no events were running; Monday was the Startup Grind Opening Night Event; Tuesday was the first event day; Wednesday was the second event day.

Next, let’s get a sense of what the Rubbish Team actually cleaned up over the event.

From this chart, we can see a clear increase in all litter from our baseline day, Sunday. On the first event day (Tuesday), there is an unmistakable 260% increase in paper litter, which is attributed to flyers, business cards, and other marketing materials. Overall, we see quite a large increase (70%) in litter from the event!

This should give us a simple understanding of what litter the Rubbish Team picked up during the Startup Grind 2020 event.

Well, how does this look in the real world?

The total litter and event space for the cleanup of Startup Grind 2020, done by the Rubbish Team.

From this, we can see most of the traffic was right below the Courthouse Square. This is where most of the booths were located; however, there were also booths surrounding the Courthouse and one block to the left and right (ending at Winslow St and Jefferson Ave). We also see a lot of litter going towards Marshall St and some of the other areas away from the main center.

If we look at our waste bins, we actually see two distinct clusters: one right below the Courthouse Square, which is mostly bathrooms, and another above the Redwood City bus stop. At first glance, the placement of these waste bins is pretty decent. They seem to be spread throughout the area fairly evenly, but the main event area below the Courthouse Square could use more of them.

Okay, but how do we determine the efficacy of the waste bins?

First, we need to cluster our litter objects to their closest waste bins. There are a few caveats to this, though.

  • We need to account for the curvature of the earth, as our planet is not flat.
  • Certain litter objects cannot be thrown away in every waste bin. For example, we shouldn’t put paper and plastic trash into a Tobacco Ash Can, nor should we put normal trash into a Recycling Can.

Put simply, we are finding the closest allowable waste bin for each litter object. (For anyone familiar with the K-Means Clustering algorithm, this process is similar to its assignment step, except that our centroids are fixed in number and location.)
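The assignment step described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the `ACCEPTS` rule table, the field names, and the coordinates are all hypothetical, and the haversine formula is one common way to handle the Earth's curvature (the article does not say which method was used).

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

# Hypothetical acceptance rules (illustrative only, not from the event data).
ACCEPTS = {
    "trash": {"paper", "plastic", "food", "tobacco"},
    "recycling": {"paper", "plastic"},
    "ash": {"tobacco"},
}

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearest_allowed_bin(litter, bins):
    """Return (bin_id, distance_m) for the closest bin that accepts this
    litter type, or (None, None) if no bin accepts it."""
    best_id, best_dist = None, None
    for b in bins:
        if litter["type"] not in ACCEPTS[b["kind"]]:
            continue  # this bin is not allowed for this kind of litter
        d = haversine_m(litter["lat"], litter["lon"], b["lat"], b["lon"])
        if best_dist is None or d < best_dist:
            best_id, best_dist = b["id"], d
    return best_id, best_dist

# Made-up coordinates for demonstration.
bins = [
    {"id": "ash-1", "kind": "ash", "lat": 37.48630, "lon": -122.22900},
    {"id": "trash-1", "kind": "trash", "lat": 37.48650, "lon": -122.22900},
]
paper = {"type": "paper", "lat": 37.48631, "lon": -122.22900}
butt = {"type": "tobacco", "lat": 37.48631, "lon": -122.22900}

print(nearest_allowed_bin(paper, bins))  # skips the nearer ash can
print(nearest_allowed_bin(butt, bins))   # the ash can is allowed here
```

Because the bins never move, there is no iterative update as in K-Means: a single pass over the litter is enough.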

From here we can use two simple attributes to measure a waste bin’s efficacy: the litter's distance from the bin, and the amount of litter around the bin.
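As a rough sketch, both attributes can be computed from a list of (bin, distance) assignments. The function name, the input shape, and the sample values below are assumptions made for illustration; a bin with no litter assigned is reported with a count of 0 and an average distance of 0.

```python
from collections import defaultdict
from statistics import mean

def bin_metrics(assignments, all_bin_ids):
    """Summarize each bin's cluster.

    assignments: iterable of (bin_id, distance_m) pairs, one per litter object.
    all_bin_ids: every bin at the event, so empty bins still show up.
    Returns {bin_id: (litter_count, mean_distance_m)}.
    """
    distances = defaultdict(list)
    for bin_id, dist in assignments:
        distances[bin_id].append(dist)
    return {
        b: (len(distances.get(b, [])),
            mean(distances[b]) if b in distances else 0.0)
        for b in all_bin_ids
    }

# Hypothetical assignments for one day.
assignments = [("trash-1", 10.0), ("trash-1", 14.0), ("ash-1", 3.0)]
metrics = bin_metrics(assignments, ["trash-1", "ash-1", "recycle-1"])
for bin_id, (count, avg) in metrics.items():
    print(f"{bin_id}: {count} pieces, {avg:.1f} m on average")
```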

But first, what do the clusters look like?

After calculating each litter object's closest available waste bin, we end up with the following clusters over the course of the event.

Each color corresponds to a different cluster, where the cluster is the closest waste bin. (n = 143)

Here, each color represents a different litter cluster, and each cluster is associated with its closest waste bin. This helps us identify the range of litter each waste bin is accountable for. One note is that the farther away from the event space we go, the larger our clusters get. This is actually a good thing! We want smaller clusters in the more litter-dense areas because this implies that people do not need to walk as far to throw away their trash, which should help reduce the amount of litter overall.

However, this also means littering could be reduced by spreading our waste bins further into the less-traveled areas. Spacing them out more evenly would shorten the distance from litter to the nearest waste bin around the edges. Yet, ideally, we want more waste bins in the heaviest traffic areas.

Finding the perfect way to space our waste bins could be done from data like this; however, we are missing one crucial feature. How full were those waste bins? If waste bins are full, then people are less likely to throw their trash in them, or it might even fall out onto the ground if they do. But this aspect of litter collection is not part of this analysis, as it would be too specific and harder to generalize to future events. Instead, we want to look at the average distance of litter to the waste bin within each cluster.

Speaking of distances …

To see how far a piece of litter is from its closest waste bin on average, look at the following:

A distance of 0 (pure white) means there was no litter collected near the waste bin.

This graph shows the average distance (in meters) of litter to its closest available waste bin, for each waste bin. In theory, waste bins with a lower average distance are better: they are more likely to sit in tighter groups of other waste bins, and the litter around them is easier for staff to clean up. This metric also gives us a gauge of how far away a waste bin has to be (about 12 meters) for the average person to consider the walk over “not worth it” and drop their litter on the ground.

What about the amount of litter per waste bin?

The rows and columns are exactly the same as the above heatmap!

Well, this tells us that we definitely have some outliers. Most waste bins only have 10–20 pieces of litter found near them on a given day; however, a few of them have over 100 pieces found. This could tell us a few things: the waste bins with higher counts had more traffic, they were fuller than others (meaning litter fell out of them), or people were lazier in those areas.

Well, is there a connection between the two?

Surprisingly, there doesn’t seem to be a strong connection between the amount of litter around a waste bin, and the average distance to it.

If we take a look at the above graph, we can see that the two metrics actually have little to do with each other. In fact, using Pearson’s correlation, we find that the R values are:

Pearson Correlation values for the amount of litter and the average distance of litter, per cluster.

In other words, only on Wednesday did the two metrics show a slight correlation. This suggests that the amount of litter already in an area has little influence on a person's decision to get their litter closer to the waste bin. However, we truly don’t have enough data to confirm this.
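For readers who want to run the same kind of check, here is a minimal, dependency-free sketch of Pearson's r between litter count and average distance per cluster. The sample numbers are invented for demonstration and are not the event's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-cluster values for a single day.
litter_counts = [12, 18, 15, 110, 20]
avg_distances_m = [11.0, 13.5, 9.0, 12.0, 14.0]
print(f"r = {pearson_r(litter_counts, avg_distances_m):.2f}")
```

Values of r near 0 indicate little linear relationship, while values near 1 or -1 indicate a strong one. With only a handful of clusters per day, a single outlier bin can swing r considerably, which is one reason to treat these values with caution.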

Okay, so what does this all mean?

We found that in general, an average person will drop their trash on the ground if an available waste bin is 12 meters (40 feet) away. Yet, the amount of litter already around the area does not seem to influence a person’s decision to get their trash closer to the waste bin. We also found that most of our waste bins will be getting roughly the same amount of use, except for a select few that are closer to all the action. We also notice that some waste bins are used more often during non-event time, while others are used more heavily during events.

So, when planning large events, we should try to bring waste bins closer to where most of the foot traffic will be, especially if the ones we move come from areas that will get little-to-no use. The rest of the outlying event space should have some form of waste bin about every 40 feet, to reduce how far a person has to travel to throw their trash away. This logic should help ensure that as much litter as possible ends up in the waste bins, instead of on the ground.
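To turn the spacing recommendation into a rough planning number, the sketch below estimates how many bins a straight walkway would need. The function names, the assumption of a bin at each end of the walkway, and the 100-meter example are all hypothetical; only the 12 meter (about 40 feet) spacing comes from the analysis.

```python
import math

FEET_PER_METER = 3.28084

def meters_to_feet(meters):
    """Convert meters to feet."""
    return meters * FEET_PER_METER

def bins_needed(path_length_m, max_spacing_m=12.0):
    """Bins needed along a straight path so that neighboring bins are at most
    max_spacing_m apart, assuming one bin sits at each end of the path."""
    if path_length_m < 0 or max_spacing_m <= 0:
        raise ValueError("length must be >= 0 and spacing must be > 0")
    return math.ceil(path_length_m / max_spacing_m) + 1

print(f"12 m is about {meters_to_feet(12):.1f} ft")
print(f"A 100 m walkway needs about {bins_needed(100)} bins")
```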
