Cities around the world have a growing number of inhabitants, and as the number of people in an area increases, so does the amount of garbage produced. This creates the challenge of collecting it all. By making this process more efficient, garbage spends less time on city streets and therefore has less negative effect on the environment. When garbage is detected in a timely manner, the local government can react more efficiently and deploy the right resources to solve the problem: for example, a truck that can pick up bulky waste, or enforcers who can act when local regulations are broken.
In this story I will describe how we created a computer vision model that can detect garbage in an image. We hope to use this model to scan our streets for garbage and use the resulting data to increase the livability of the city.
What will be detected?
The first version that we wanted to use in experiments had to be capable of detecting household waste like garbage bags and cardboard. We will also teach it to detect household waste containers. This will, for example, allow determining the proximity of garbage to a container, which can later be used for analysis.
In a later version, other types of garbage will be added: for example bulky waste such as a fridge or a sofa, or litter such as cans or bottles.
What model will be used?
The model that we will use for the garbage detector is YOLOv3. It is not the most accurate model, but it is one of the fastest. Since we want to process a high number of images, this speed is vital for the success of the experiment.
To create a computer vision model, examples are required so that the model can learn to identify the features of the objects of interest. What is required, then, is images of garbage and containers, together with information about where these objects are located in each image.
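This location information is stored in the Darknet/YOLO label format: a plain-text file per image, with one object per line as a class id followed by the box centre and size, normalized to the image dimensions. A minimal sketch of reading such a file (the class ids for our labels are an assumption here):

```python
# Parse a Darknet/YOLO label file: one object per line, in the form
# "class_id x_center y_center width height", with all coordinates
# normalized to [0, 1] relative to the image size.
def parse_yolo_labels(text):
    boxes = []
    for line in text.strip().splitlines():
        class_id, xc, yc, w, h = line.split()
        boxes.append({
            "class_id": int(class_id),
            "x_center": float(xc),
            "y_center": float(yc),
            "width": float(w),
            "height": float(h),
        })
    return boxes

# Example: one garbage bag roughly in the centre of the image.
# (Class ids here are assumptions: 0=container, 1=garbage bag, 2=cardboard.)
labels = parse_yolo_labels("1 0.50 0.60 0.20 0.25")
print(labels[0]["class_id"])  # 1
```

Normalized coordinates make the labels independent of image resolution, which is convenient when images are resized for training.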
For the creation of the first model we have used pictures of waste taken by civil servants.
To prevent the model from producing false positives, images of street scenery without any waste have also been added.
The data has been annotated using CVAT, and the output was converted to the YOLO file format. CVAT is an open-source annotation tool that can easily be deployed using Docker. The XML files output by CVAT can be converted using this script.
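The core of such a conversion can be sketched as follows. This assumes the "CVAT for images" XML export, where each `<image>` element holds `<box>` children with pixel coordinates `xtl`, `ytl`, `xbr`, `ybr`; the label-to-id mapping is an assumption and must match your own class list:

```python
import xml.etree.ElementTree as ET

# Assumed mapping from CVAT label names to YOLO class ids.
CLASS_IDS = {"container": 0, "garbage_bag": 1, "cardboard": 2}

def cvat_to_yolo(xml_text):
    """Return {image_name: [YOLO label lines]} from a CVAT XML dump."""
    root = ET.fromstring(xml_text)
    result = {}
    for image in root.iter("image"):
        img_w = float(image.get("width"))
        img_h = float(image.get("height"))
        lines = []
        for box in image.iter("box"):
            xtl, ytl = float(box.get("xtl")), float(box.get("ytl"))
            xbr, ybr = float(box.get("xbr")), float(box.get("ybr"))
            # YOLO wants the box centre and size, normalized to [0, 1].
            xc = (xtl + xbr) / 2 / img_w
            yc = (ytl + ybr) / 2 / img_h
            w = (xbr - xtl) / img_w
            h = (ybr - ytl) / img_h
            cls = CLASS_IDS[box.get("label")]
            lines.append(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
        result[image.get("name")] = lines
    return result
```

Each entry of the returned dictionary would then be written to a `.txt` file next to the corresponding image, as Darknet expects.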
For smaller annotation jobs that can be done on a single device, or to display and edit the labels, I would advise using LabelImg. LabelImg supports the YOLO file format.
The resulting dataset contains 804 annotated images and can be downloaded here. 140 images contain no garbage or containers. The other images contain 1096 containers, 1027 garbage bags and 841 pieces of cardboard.
To use the model it is not necessary to train it yourself, since pre-trained weights can be downloaded.
For training the model we have used Darknet. A modified version of the repository and the data required for the training can be found here.
The model has been trained for over 3000 iterations as can be seen in the plot below.
For testing the model, a part of the data has not been used for training; this is called the test set. It allows for evaluation of the model's performance.
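Creating such a hold-out set is a simple shuffle-and-split of the image list. A minimal sketch, where the 20% split ratio and fixed seed are assumptions rather than the values actually used:

```python
import random

def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a list reproducibly and hold out a fraction for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]

# Illustrative file names; the dataset contains 804 annotated images.
images = [f"img_{i:03d}.jpg" for i in range(804)]
train, test = train_test_split(images)
print(len(train), len(test))  # 644 160
```

Fixing the random seed keeps the split reproducible, so later experiments are evaluated on exactly the same held-out images.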
In this PyTorch repository I have added the code for testing the model; the readme contains instructions for performing the test.
The model performs well at detecting containers, likely because they have similar shapes and are well represented in the training data. Overall the model has low precision but high recall. This means that garbage is detected often, but objects are also detected that should not be. Adding more background images that contain no garbage helped increase the precision, so the expectation is that adding more of this kind of data will increase the precision further.
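To make the precision/recall trade-off concrete: precision is the fraction of detections that are real garbage, while recall is the fraction of real garbage that gets detected. The counts below are illustrative, not our actual results:

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp)  # how many detections were correct
    recall = tp / (tp + fn)     # how much real garbage was found
    return precision, recall

# Illustrative counts: most garbage is found (high recall),
# but many detections are false alarms (low precision).
p, r = precision_recall(tp=80, fp=60, fn=10)
print(round(p, 2), round(r, 2))  # 0.57 0.89
```

Adding negative background images mainly reduces the false positive count, which is exactly the term that drags precision down.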
To make predictions, follow the instructions in this repository. Predictions can be made on videos, images or a webcam feed.