An Android Developer’s Introduction to Machine Learning

Lottie Hope
Published in Bumble Tech
5 min read, Nov 16, 2023

As an Android Developer, I have used machine learning frequently, from my university dissertation, where it helped identify fake social media accounts, right up to my current role at Bumble Inc., where it helps keep our users safe.

We have all come across machine learning in our lives, whether we know it or not, and with tools like ChatGPT becoming more and more popular, you might be wondering how you could make use of machine learning.

In this article, I’m not going to attempt to explain how those complex systems work, but I’d love to share how I’ve found it to be useful, in the hopes that it helps you!

There are three main types of machine learning that are commonly used in artificial intelligence systems.

1. Supervised

This is the most detailed kind of learning. You, as the data inputter, must provide a set of data (images, tables, texts, etc.) and label each item with the classification it fits. For example, I could have a set of images of different dog breeds and label each photo with a breed, e.g. Labrador, Spaniel.

List of popular small dog breeds with their breed name and photo
Source: https://dogsbestlife.com/wp-content/uploads/2022/01/small-sized-dog-breeds-scaled.jpeg

The program will take the photo you want to classify (one that has not already been labelled) and compare its characteristics with those of the images in your preloaded dataset. It will then try to find the closest match.
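
To make the "closest match" idea concrete, here is a minimal sketch of a nearest-neighbour classifier. It is only an illustration: the two features (height and weight) and every number in the dataset are invented, and a real image model would work from far richer characteristics.

```python
import math

# Hypothetical labelled dataset: (features, label).
# The features are invented for illustration: (height in cm, weight in kg).
LABELLED_DOGS = [
    ((57.0, 30.0), "Labrador"),
    ((56.0, 32.0), "Labrador"),
    ((40.0, 13.0), "Spaniel"),
    ((38.0, 12.0), "Spaniel"),
]


def classify(sample, labelled_data):
    """Return the label of the labelled example closest to `sample`."""
    _, closest_label = min(
        labelled_data,
        key=lambda item: math.dist(sample, item[0]),
    )
    return closest_label


print(classify((55.0, 29.0), LABELLED_DOGS))  # Labrador
print(classify((39.0, 14.0), LABELLED_DOGS))  # Spaniel
```

The unlabelled sample never joins the dataset; it is only compared against the examples that were labelled up front.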

2. Unsupervised

This is potentially the most complex form of machine learning, as it requires knowledge of machine learning algorithms. For this type, you, as the user, will need to populate an unlabelled dataset. The TLDR of unsupervised learning is that it will take your unlabelled dataset and run it through a particular algorithm that separates it into categories.

Diagram showing how supervised learning takes input and unsupervised learning creates classifications without input
Source: https://www.amygb.ai/blog/unsupervised-learning-in-image-classification

A good example of this would be if you loaded it with images of cats and dogs; it would take those images, run them through an algorithm and then (hopefully!) produce two separate categories: Dogs and Cats (though at this point, these categories are likely unlabelled).

You can then run your image through the trained system and it should classify your image as either a Dog or a Cat.
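
One algorithm that could do this kind of grouping is k-means clustering. The sketch below is a bare-bones version under some loud assumptions: the features (weight and ear length) and all the numbers are made up, and the starting centres are simply the first k points, which real libraries choose more carefully.

```python
import math


def k_means(points, k, iterations=10):
    """Group `points` into `k` clusters; returns (cluster index per point, centres)."""
    # Assumption: use the first k points as the starting centres.
    centres = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centre.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Step 2: move each centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centres[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centres


def predict(sample, centres):
    """Return the index of the cluster whose centre is closest to `sample`."""
    return min(range(len(centres)), key=lambda c: math.dist(sample, centres[c]))


# Hypothetical unlabelled data: (weight in kg, ear length in cm).
animals = [(4.0, 6.0), (30.0, 12.0), (5.0, 7.0), (28.0, 11.0), (3.5, 6.5), (32.0, 13.0)]
assignments, centres = k_means(animals, k=2)
print(assignments)                     # [0, 1, 0, 1, 0, 1]
print(predict((29.0, 12.0), centres))  # 1
```

Notice that the output is only "cluster 0" and "cluster 1". Nothing in the code knows which one is Cats and which is Dogs; that naming is still up to you.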

3. Reinforcement

This is the easiest to understand, and I like to think of it like training a pet. The program is entirely dependent on your feedback. Strictly speaking, reinforcement learning rewards or penalises the actions a program takes, but a classification example shows the idea. The program begins by trying to process your input, let’s say an image of a cat, and classifies it as ‘unknown’, as it has nothing to compare it to yet. You would then step in to tell the program that it is wrong: it is a cat.

Picture of two boys, one with a tick sign saying right and one with a cross sign saying wrong
Source: https://www.vecteezy.com/vector-art/3610086-cute-boy-opposite-words-right-wrong

Then, if the next image is of a dog, it will likely classify it as a cat. You’d then need to correct it and tell it that it’s a dog. You’d need to repeat this multiple times until you’re confident that your program can accurately classify the data. Once you’re at that point, you can run an image (or any data type) through your program and the output should be pretty reliable.
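
The correct-and-repeat loop described above can be sketched in a few lines. To be clear about what this is: a simplified feedback loop that remembers corrections, not a full reinforcement learning algorithm (which would learn from rewards for its actions). The class name, the features and the numbers are all hypothetical.

```python
import math


class CorrectableClassifier:
    """Starts with no knowledge and learns only from corrections."""

    def __init__(self):
        self.examples = []  # (features, label) pairs learned from feedback

    def predict(self, features):
        if not self.examples:
            return "unknown"  # nothing to compare against yet
        nearest = min(self.examples, key=lambda ex: math.dist(features, ex[0]))
        return nearest[1]

    def correct(self, features, true_label):
        """Store the right answer after a wrong guess."""
        self.examples.append((features, true_label))


model = CorrectableClassifier()
# Hypothetical stream of (features, true label); the numbers are invented.
stream = [
    ((4.0, 6.0), "cat"),
    ((30.0, 12.0), "dog"),
    ((5.0, 7.0), "cat"),
    ((28.0, 11.0), "dog"),
]
guesses = []
for features, truth in stream:
    guess = model.predict(features)
    guesses.append(guess)
    if guess != truth:
        model.correct(features, truth)  # this is you stepping in

print(guesses)  # ['unknown', 'cat', 'cat', 'dog']
```

The first two guesses are wrong in exactly the way described: ‘unknown’ for the first cat, then ‘cat’ for the first dog, because a cat is the only thing the program has seen so far.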

“Okay great…” I hear you say, “…but what do I do with that and how can I use it to help do something useful?”

Gather some data

Well, to begin with, you’ll need some data, and that isn’t always easy to gather. One way to do this is to manually collect everything you want to process. For example, in my previous article, Can you hear an image, I explained that to detect hair colour from a photo, you could go to Google, manually download images of different hair colours and save them to train your model. This is very time consuming, though.

Gif of Bill Murray saying no, thank you
Source: https://tenor.com/

Sure, you can do that, but to make better use of your time, I’d suggest an alternative route!

A cleaner approach would be to use an API or dataset platform that provides the data you need. This can be more challenging, as you have to pay for most of these, but there are a few good ones available for free. I like Kaggle.

As a rough rule of thumb, aim for at least 100 samples to train a machine learning program; accuracy generally improves with more data, and harder problems can need far more.

However you decide to go about it, you should end up with a dataset in the form of a CSV, a folder of images separated into further folders for each label, or something similar. The labels are only needed if you’re choosing supervised learning.
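
If you go with the folder-per-label layout, turning it into a labelled list is straightforward. The sketch below assumes each subfolder name is the label; the function name, the accepted file extensions and the throwaway demo folder are all my own choices for illustration.

```python
import tempfile
from pathlib import Path

# Assumption: only these extensions count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def load_labelled_paths(root):
    """Return sorted (image path, label) pairs, using each subfolder name as the label."""
    pairs = []
    for label_dir in sorted(Path(root).iterdir()):
        if not label_dir.is_dir():
            continue
        for image_path in sorted(label_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((image_path, label_dir.name))
    return pairs


# Demo with a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for label, name in [("labrador", "a.jpg"), ("spaniel", "b.png"), ("spaniel", "notes.txt")]:
        folder = Path(tmp) / label
        folder.mkdir(exist_ok=True)
        (folder / name).touch()
    demo = [(path.name, label) for path, label in load_labelled_paths(tmp)]

print(demo)  # [('a.jpg', 'labrador'), ('b.png', 'spaniel')]
```

Files that are not images, like notes.txt here, are skipped rather than treated as training data.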

Categorised images with their labels above their pictures
Source: https://mueller91.github.io/projects/labelfix/

You’ll then need an additional dataset for your input data: the things you want to test. This can be as big or as small as you like, as it is only used once your program has been trained!
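
One common way to get that additional dataset, if you don’t have a separate one to hand, is to hold back part of your labelled data before training. This is a sketch of that idea only; the 20% fraction, the seed and the file names are assumptions for illustration.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle `samples` and return (training set, held-back test set)."""
    shuffled = list(samples)
    # A fixed seed keeps the split the same every time you run it.
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical labelled samples: (file name, label).
samples = [(f"image_{i}.jpg", "dog" if i % 2 else "cat") for i in range(10)]
train, test = split_dataset(samples)
print(len(train), len(test))  # 8 2
```

Because the held-back samples still have their labels, you can later compare the model’s answers against the truth.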

Put your data to the test

At this point, you should have decided what kind of machine learning you want to use, and have a collection of data. There are many ways to create a machine learning model; I’d suggest looking into TensorFlow for starters. It has lots of documentation and is relatively simple to pick up.

From here, follow the instructions for whichever service you’re using (e.g. TensorFlow) and train it with your data. Once you’ve done this, you’ll have a model that will be the basis for how you detect A from B.

Now comes the easy part: run your model on the sample data that you didn’t train it with. You can then check the results to see how accurate it was and, once you’re confident that it’s working, you can run it on a larger, unseen dataset.

NB: If you are not getting the expected results, try increasing the dataset size that you are training the model with.
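
Checking how accurate the model was can be as simple as counting how many of its answers match the true labels. Here is a minimal sketch; the predicted and actual lists are made-up values standing in for your model’s output and your held-back labels.

```python
def accuracy(predictions, true_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must be the same length")
    if not true_labels:
        raise ValueError("need at least one labelled sample")
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)


# Hypothetical results: what the model said vs. what the images really were.
predicted = ["cat", "dog", "dog", "cat", "dog"]
actual = ["cat", "dog", "cat", "cat", "dog"]
print(accuracy(predicted, actual))  # 4 of 5 correct -> 0.8
```

What counts as "accurate enough" depends on what you are building, so treat any threshold as your own judgement call.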

Group of mixed playing pieces that are then separated by colour
Source: https://planetconsulting.com.au/articles/is-there-value-in-segmenting-categorising-your-client-base/

This is a powerful feature of machine learning and it is undoubtedly going to be used more and more in the future. For the couple of hours/days it takes to learn, I’d say it’s definitely worth the investment!

If you’d like to see a more in-depth example of how to implement your own machine learning experiment, see my other post here. It shows how to use machine learning to help describe an image for accessibility; it’s a great starting point for anyone who is interested in giving it a go!

If you have any questions, let me know in the comments, or reach out to me on LinkedIn.
