Image Recognition Algos & Models

CleanApp · Published in CleanApp Report · Dec 10, 2022 · 5 min read
Objects detected with OpenCV’s Deep Neural Network module (dnn) using a YOLOv3 model trained on the COCO dataset, which can detect objects from 80 common classes. Credit: Wikimedia Commons

If we have 500M people sending 1B+ waste/hazard reports daily, we need automated and scalable analytics processes for gleaning actionable information from the photos (in addition to contextual cues supplied by the reporters). Let’s assume that for the overwhelming majority of these reports, people have identified MOOP/MOOC (matter out of place / matter out of context): litter.

Now, let’s say we want to perform seemingly simple analytics on these images to accurately predict if reported objects are made from metal or plastic. Naturally, we would want to automate this process to handle a large volume of incoming reports and quickly provide actionable insights to responders. How might we go about doing that with existing machine learning tools?

There are several algorithms and models that could be used for simple image-recognition tasks, such as analyzing an image to determine if the object(s) in the image are made from metal or plastic. Some examples of algorithms and models that may be well-suited to this task include:

  • TensorFlow Object Detection API: This is a powerful open-source framework for training and deploying object detection models. It includes a collection of pre-trained models that can be used out of the box (typically trained on large datasets such as COCO); for a material-level distinction like metal vs. plastic, these models would generally need to be fine-tuned on labeled examples.
  • YOLO (You Only Look Once): YOLO is a real-time object detection algorithm that can be used for image recognition. It is open-source and available on GitHub, along with pre-trained models and tools for training your own custom models.
  • Residual Neural Network (ResNet): ResNet is a deep convolutional neural network (CNN) architecture that has been trained on the ImageNet dataset and is widely used for image classification and object recognition tasks. It is available in TensorFlow and PyTorch, and can be fine-tuned to distinguish different materials in an image (see the sketch after this list).
  • Inception: Inception is another deep CNN architecture that has been trained on the ImageNet dataset and is commonly used for image classification and object recognition tasks. It is likewise available in TensorFlow and PyTorch, and can be fine-tuned for material classification.
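To make the model route concrete, here is a minimal sketch of running a pre-trained ResNet-50 via torchvision. The ImageNet labels it predicts do not include “metal” or “plastic”, so a production CleanApp classifier would replace the final layer and fine-tune it on material-labeled report photos; the image filename below is just a placeholder.

```python
# Minimal sketch: classifying a report photo with a pre-trained ResNet-50 (PyTorch).
# "report_photo.jpg" is a placeholder; the printed label comes from the 1,000
# ImageNet categories, not from a metal/plastic material classifier.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT        # pre-trained ImageNet weights
model = models.resnet50(weights=weights).eval()  # the "model": architecture + learned weights

preprocess = weights.transforms()                # matching resize/crop/normalize pipeline
img = Image.open("report_photo.jpg")
batch = preprocess(img).unsqueeze(0)             # add a batch dimension

with torch.no_grad():
    probs = model(batch).softmax(dim=1)

top_prob, top_class = probs.topk(1)
print(weights.meta["categories"][top_class.item()], float(top_prob))
```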

ML Algorithms v. ML Models

In general, a machine learning (ML) algorithm is a set of instructions or steps that can be used to perform a specific task, such as image recognition or natural language processing. For example, an image-recognition algorithm might include steps such as pre-processing the image data, extracting features from the image, and using those features to classify the image into different categories.

An ML model, on the other hand, is the trained representation of the data that is generated by running the algorithm on a large dataset. This trained model can then be used to make predictions on new data without the need for further training. An ML model helps us make predictions or take actions based on input data.

To summarize: an ML algorithm is a set of instructions or steps that can be used to perform a specific task, while an ML model is a trained representation of data that can be used to make predictions or take actions based on input data.
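To make the distinction concrete, here is a toy sketch using scikit-learn with made-up feature vectors (not CleanApp data): calling fit is the algorithm doing its work over a dataset, and the fitted, saved estimator is the model that can later make predictions without re-training.

```python
# Toy illustration of "algorithm" vs. "model" with scikit-learn.
# The feature vectors and labels below are invented for illustration only.
from sklearn.linear_model import LogisticRegression
import joblib

X_train = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]]   # toy image features
y_train = ["plastic", "metal", "plastic", "metal"]            # toy labels

algorithm = LogisticRegression()          # the algorithm: a recipe for learning from data
model = algorithm.fit(X_train, y_train)   # the model: the trained result of running it

joblib.dump(model, "material_classifier.joblib")              # persist the trained model
reloaded = joblib.load("material_classifier.joblib")          # reuse it without re-training
print(reloaded.predict([[0.15, 0.85]]))                       # -> ['plastic']
```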

Open Source ML Models

ResNet and Inception are both deep convolutional neural network (CNN) architectures whose pre-trained weights (typically learned on ImageNet) are widely used for image classification and object recognition. In this context, those pre-trained networks can be considered models, as they can be applied to a task without training from scratch.

Out of the box, they classify new images into the categories they were trained on; for a material-level distinction such as whether an object in an image is made of metal or plastic, they would serve as a starting point to be fine-tuned on labeled CleanApp images.

At this stage, it seems we can use off-the-shelf algos and models to help us with our image-recognition tasks. Great. But at our hoped-for 500M+ person & 1B+ image/day scale, we also want to train CleanApp models on CleanApp data, ideally in real-time. Additionally, we want to make this data available to other researchers in real-time so they can design even more effective ML algos.

In other words, we want a self-reinforcing learning loop that takes in images, performs image/object recognition analytics, and simultaneously improves the image recognition model. This seems necessary to get to a point where our algos and models are accurate enough that we can extract the actionable data and then, potentially, discard the images. Whether the images are stored on a centralized cloud (cCloud) or a decentralized cloud (dCloud), it does not seem economical to store billions of images of trash in perpetuity. On the other hand, it might make sense to store the valuable metadata gleaned from those images. For instance, it’s unclear whether Tesla currently retains all of its cars’ dashcam footage to train its powerful FSD computers; at greater scales, the storage costs alone for raw data would be astronomical.

How to Train Your Own ML Dragon

If you’re looking for open-source datasets for training your own ML model, Wikipedia maintains an ever-growing list of datasets for machine-learning research:

But how could we use live data streams to improve the models in real time? Basically, how would we build what Tesla does with its FSD models, but for global waste+hazard mapping? And, of course, how would we do so in an open-source way?

Online Deep Learning Models

An image recognition algorithm or model that self-corrects and self-improves in real-time based on the image processing it does is known as an “online learning” or “incremental learning” algorithm or model. Online learning algorithms are able to process data in small batches or individual data points, allowing them to learn and adapt to new data as it becomes available. This allows them to self-improve and make more accurate predictions over time, as they are able to incorporate new data into their training and update their models accordingly.

Some examples of online learning algorithms and models include streaming linear regression, online support vector machines, and online deep learning models. These algorithms and models can be used for a wide range of tasks, including image recognition, natural language processing, and predictive modeling. They are particularly useful in applications where the data is too large or complex to be processed all at once, or where the data is constantly changing and needs to be incorporated into the model in real-time.
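For intuition, here is a minimal sketch of one such online learner: scikit-learn’s SGDClassifier with hinge loss (roughly a linear SVM) updated one mini-batch at a time via partial_fit. The 128-dimensional feature vectors, the random data stream, and the plastic/metal labels are placeholder assumptions; in a real pipeline the features might be embeddings extracted from report photos.

```python
# Hedged sketch of incremental (online) learning with scikit-learn.
# Random mini-batches stand in for a live stream of featurized report photos.
import numpy as np
from sklearn.linear_model import SGDClassifier

classes = np.array([0, 1])             # assumed labels: 0 = plastic, 1 = metal
clf = SGDClassifier(loss="hinge")      # hinge loss ~ a linear SVM trained online

for step in range(100):                # simulate a stream of small batches
    X_batch = np.random.rand(32, 128)  # 32 samples of 128-dim image features (placeholder)
    y_batch = np.random.randint(0, 2, size=32)
    clf.partial_fit(X_batch, y_batch, classes=classes)  # update, don't retrain from scratch

print(clf.predict(np.random.rand(1, 128)))
```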

Please note that simply using the TensorFlow Object Detection API or PyTorch for CleanApp image recognition would not, by itself, give us online deep learning: running inference on new images does not change the underlying models. The loop only closes if newly labeled CleanApp images are fed back into training, for example through periodic or incremental fine-tuning.

If we want to deploy custom online deep learning models, we might also consider:

  • PyTorch Lightning: PyTorch Lightning is a lightweight PyTorch wrapper that makes it easier to train and deploy complex deep learning models. Its streamlined training loops make it straightforward to keep fine-tuning a model incrementally as new data arrives.
  • Keras: Keras is an open-source deep learning library that allows developers to easily build and train complex neural network models. It supports incremental training (e.g., updating a model batch by batch), allowing models to adapt to new data and improve their performance over time (a minimal sketch follows this list).
  • YOLO: YOLO models can also be fine-tuned on newly labeled images in an online learning setting, allowing the model to adapt to new data and improve its performance over time.
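As a hedged sketch of what incremental updates could look like with Keras, the loop below trains a tiny illustrative network one small batch at a time as newly labeled images arrive. The architecture, the 224×224 input size, the two material classes, and the random stand-in data stream are all assumptions for illustration, not the actual CleanApp pipeline.

```python
# Minimal incremental-learning loop in Keras (TensorFlow).
# Random arrays stand in for a live stream of labeled CleanApp report photos.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),   # e.g. metal vs. plastic (assumed)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

def incoming_batches(n_batches=10, batch_size=8):
    """Stand-in generator for a live stream of labeled report photos."""
    for _ in range(n_batches):
        images = np.random.rand(batch_size, 224, 224, 3).astype("float32")
        labels = np.random.randint(0, 2, size=(batch_size,))
        yield images, labels

# Online/incremental loop: update on each small batch instead of
# retraining on the full dataset.
for images, labels in incoming_batches():
    loss, acc = model.train_on_batch(images, labels)
    print(f"batch loss={loss:.3f} acc={acc:.3f}")
```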

tbc.
