Object Detection using Tensorflow Lite for Trash

24ProGun
Jun 21, 2022

GitHub: https://github.com/24GUNV/aibuilders

Datasets: https://www.kaggle.com/datasets/asdasdasasdas/garbage-classification and https://www.kaggle.com/datasets/mostafaabla/garbage-classification

Introduction

(Since this isn't the main topic, I'll be brief with the intro.)

One thing that literally everyone produces is trash. Whether it's plastic such as the water bottle you drank from, or cardboard such as the packaging that keeps your items safe during delivery, at one point or another we've all produced some sort of trash.

The problems that come with it

When mismanaged, trash can cause many problems: plastics end up in the ocean, litter spreads across cities and causes hygiene problems, and so on. This leads to the solution people advocate for today: Reduce, Reuse, Recycle. It also helps fight global warming (a big problem in today's world), since fewer new materials need to be produced.

Normally, people just put out different types of trash cans and ask everyone to sort their trash themselves. This helps the workers at sorting facilities separate the different types of objects and allows some trash to be reused. However, I thought: why don't we just let AI do it for us?

[Image: meme about garbage]

Metrics

The framework I decided to go with was TensorFlow Lite, created by the TensorFlow team. I chose it because I had plans to deploy this onto an Orange Pi, which would make for easy deployment if I were to put this onto some sort of trash can. Other models such as RetinaNet or ResNet wouldn't really be practical here since they're too big for the Orange Pi, which means we did trade away some accuracy. Because of this, I hoped the model could at the very least hit something like 70–80%, since TensorFlow Lite models don't have as many layers as RetinaNet or ResNet, and even that would still be better than people not recycling at all.

Data Collection and Cleaning + Data Analysis

To develop this model, I took data from these 2 places:

  1. https://www.kaggle.com/datasets/asdasdasasdas/garbage-classification
  2. https://www.kaggle.com/datasets/mostafaabla/garbage-classification

The datasets contained images of different types of trash. The first one contained 6 categories: cardboard, glass, metal, paper, plastic, and trash. The second one contained 12 categories: battery, biological, brown-glass, cardboard, clothes, green-glass, metal, paper, plastic, shoes, trash, and white-glass. I settled on just 5 categories: cardboard, glass, metal, paper, and plastic.

Each image in these datasets contains a single object class. The images are reasonably sized, ranging from around 154 × 328 to 202 × 249 pixels at 96 dpi, which I think is detailed enough to train the model on.

Example images: glass, metal, cardboard, paper, plastic.

Since I am trying to do object detection, I had to go through the painstaking task of labelling bounding boxes on each image. I used a tool called labelImg to do this; it saves each image's annotations in Pascal VOC XML format.

The labelled data is here if anyone wants to use it: https://github.com/24GUNV/aibuilders/tree/main/object_detection/images

After labelling each image, we get one XML file per image containing the bounding boxes we drew.

For example, cardboard1.xml would look something like this (a reconstructed sketch of labelImg's Pascal VOC output; the exact values are illustrative):
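<!-- Reconstructed sketch of a labelImg Pascal VOC annotation; values are illustrative -->
<annotation>
    <folder>cardboard</folder>
    <filename>cardboard1.jpg</filename>
    <path>aibuilders/object_detection/images/cardboard/cardboard1.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>512</width>
        <height>384</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>cardboard</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>48</xmin>
            <ymin>21</ymin>
            <xmax>423</xmax>
            <ymax>350</ymax>
        </bndbox>
    </object>
</annotation>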

Then we need to parse the information from each of these files and sort it into train, validation, and test sets for training the TF Lite model. We can do it like this:

import glob
import random
import shutil
import xml.etree.ElementTree as ET

import pandas as pd

# Grab the names of all of the class folders
valid_folders = glob.glob('aibuilders/object_detection/images/*')

valid_fnames = []  # List of all of the file locations
for folder in valid_folders:  # Loop through each class folder
    for file in glob.glob(f'{folder}/*'):
        shutil.copyfile(
            file,
            f'aibuilders/object_detection/train/{file.split("/")[-1]}'
        )
        valid_fnames.append(file)

random.shuffle(valid_fnames)  # Shuffle the files for an even split

# Initialize the dataframe using the AutoML CSV layout: set, path, label,
# then the four box corners (left, top), (right, top), (right, bottom), (left, bottom)
df = pd.DataFrame(columns=["type", "filename", "name", "left", "top", "right", "top", "right", "bottom", "left", "bottom"])

# Loop through all of the files
for i, file in enumerate(valid_fnames):
    try:
        # Parse the Pascal VOC annotation (non-XML files raise ParseError and are skipped)
        tree = ET.parse(file)
        root = tree.getroot()
        filename = ''.join(list(file.split(".")[:-1]) + [".jpg"])
        name = root[6][0].text                     # object/name
        width = int(root[4][0].text)               # size/width
        height = int(root[4][1].text)              # size/height
        left = int(root[6][4][0].text) / width     # bndbox/xmin, normalized
        top = int(root[6][4][1].text) / height     # bndbox/ymin, normalized
        right = int(root[6][4][2].text) / width    # bndbox/xmax, normalized
        bottom = int(root[6][4][3].text) / height  # bndbox/ymax, normalized
        if i < 0.8 * len(valid_fnames):    # 80% goes to training
            df.loc[i] = ["TRAINING", filename, name, left, top, right, top, right, bottom, left, bottom]
        elif i < 0.9 * len(valid_fnames):  # 10% goes to validation
            df.loc[i] = ["VALIDATION", filename, name, left, top, right, top, right, bottom, left, bottom]
        else:                              # last 10% goes to testing
            df.loc[i] = ["TEST", filename, name, left, top, right, top, right, bottom, left, bottom]
    except ET.ParseError:
        continue

The result:

1,554 images labelled: 1,212 training images, 170 validation images, and 172 test images, stored in a dataframe.

There are 325 cardboard images, 406 glass images, 433 metal images, 199 paper images, and 191 plastic images.
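For reference, these per-class counts can be read straight off the dataframe, e.g.:

# Count how many labelled images there are per class
print(df["name"].value_counts())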

Next, I used the DataLoader to feed the images into the model. We just pass in the path of the CSV file created from the previous dataframe using df.to_csv().
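One detail worth flagging (an assumption based on Model Maker's AutoML-style CSV format): the loader expects no header row and no index column, so the export would look something like this:

# Export without the pandas index or header row, since the loader
# expects the headerless AutoML CSV layout
df.to_csv('test.csv', index=False, header=False)

Then we load it back: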

from tflite_model_maker import object_detector

train_data, validation_data, test_data = object_detector.DataLoader.from_csv('test.csv')

Modeling, Validation and Error Analysis

To train the model, we just follow the guide from TensorFlow: https://www.tensorflow.org/lite/models/modify/model_maker/object_detection

First, we choose the model architecture we want from a list of different models. I chose EfficientDet-Lite0 as it has the lowest latency.
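The spec referenced in the training call below is created like so (as in the linked guide):

from tflite_model_maker import model_spec

# EfficientDet-Lite0: the smallest and lowest-latency EfficientDet-Lite variant
spec = model_spec.get('efficientdet_lite0')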

Then, we train the model using the training data and validation data

model = object_detector.create(
    train_data,
    model_spec=spec,
    batch_size=8,
    train_whole_model=True,
    validation_data=validation_data
)

By default, the model will train for 50 epochs. batch_size specifies how many images the model trains on at a time, and train_whole_model makes it train the whole model rather than just some of its layers.

We can then evaluate the model on the test data

model.evaluate(test_data)

The results:

{'AP': 0.76572937,
'AP50': 0.86301947,
'AP75': 0.8448816,
'AP_/cardboard': 0.8371167,
'AP_/glass': 0.72834533,
'AP_/metal': 0.832963,
'AP_/paper': 0.8029594,
'AP_/plastic': 0.6272623,
'APl': 0.7666683,
'APm': -1.0,
'APs': -1.0,
'ARl': 0.8609286,
'ARm': -1.0,
'ARmax1': 0.827739,
'ARmax10': 0.8609286,
'ARmax100': 0.8609286,
'ARs': -1.0}

These are the standard COCO detection metrics (https://cocodataset.org/#detection-eval). The -1.0 values simply mean the test set contained no boxes in the corresponding size range (no small or medium objects).

We can see that the model's average precision lands around 0.75–0.85 across the main metrics, which I think is pretty good considering the tradeoffs we made for speed and efficiency.

Deployment

I decided to deploy this model on Streamlit Cloud since it's free, easy to use, and accessible to many people.

https://share.streamlit.io/24gunv/aibuilders/main/streamlit_deploy/app.py
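Before the app can run inference, the trained model has to be exported to a .tflite file and loaded into an interpreter. A minimal sketch, following the Model Maker guide (model.tflite is the default export filename):

import tensorflow as tf

# Export the trained model to a TFLite flatbuffer (written as model.tflite by default)
model.export(export_dir='.')

# Load the exported model for inference in the app
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()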

The first feature lets the user upload an image; then, using code from TensorFlow Lite's website, we can project the predictions onto the frame:

detection_result_image = run_odt_and_draw_results(
    file,
    interpreter,
    threshold=detection_threshold
)
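Putting it together with Streamlit's upload widget, the feature might be wired up roughly like this (a sketch: the widget label and threshold value are assumptions, and run_odt_and_draw_results comes from the TensorFlow Lite example code mentioned above):

import streamlit as st

# Hypothetical wiring for the upload feature
file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
detection_threshold = 0.5  # assumed confidence cutoff

if file is not None:
    # Run detection and show the annotated result back to the user
    detection_result_image = run_odt_and_draw_results(
        file, interpreter, threshold=detection_threshold)
    st.image(detection_result_image)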

Results

There is also a live function (still somewhat glitchy) where I used streamlit-webrtc to get a live camera feed from the user. We need to route the connection through STUN and TURN servers in order to establish a connection between Streamlit Cloud and the user's camera; the Open Relay Project provides free STUN and TURN servers we can use.

import av
from streamlit_webrtc import webrtc_streamer

class VideoProcessor:
    def recv(self, frame):
        # Convert the incoming video frame to a numpy array
        arr = frame.to_ndarray(format="bgr24")

        # Run inference and draw the detection result on a local copy of the frame
        detection_result_image = run_odt_and_draw_results(
            arr,
            interpreter,
            threshold=detection_threshold
        )

        # Return the annotated frame to the stream
        return av.VideoFrame.from_ndarray(detection_result_image, format="bgr24")

webrtc_streamer(
    key="example",
    rtc_configuration={
        "iceServers": [
            {
                "urls": "stun:openrelay.metered.ca:80",
            },
            {
                "urls": "turn:openrelay.metered.ca:80",
                "username": "openrelayproject",
                "credential": "openrelayproject",
            },
            {
                "urls": "turn:openrelay.metered.ca:443",
                "username": "openrelayproject",
                "credential": "openrelayproject",
            },
            {
                "urls": "turn:openrelay.metered.ca:443?transport=tcp",
                "username": "openrelayproject",
                "credential": "openrelayproject",
            },
        ],
    },
    video_processor_factory=VideoProcessor,
)

Thank you very much to the AI Builders program and my mentors for teaching me all of this information!

https://github.com/ai-builders/ai-builders.github.io
