ML Tools / Platform Stuff

Sigaipurdue
MLPurdue
Oct 15, 2023


By Brian Song

10/10/23

There are lots of structured guides on which ML concepts to learn: learn x, then y, then z. But there isn’t really a structured guide for ML tools and platforms. There is often a guide for one particular tool or platform, but how do you find out those tools exist in the first place so you can go learn more about them? Here is a guide that hopefully solves that problem:

Pretrained Models

You heard about a new ML algorithm and want to test it yourself. You found the code, but there are no model weights, so to make the model do the cool thing it is supposed to do you would have to train it from scratch, which can cost a lot of money or time. You can get “out of the box” models from:

  1. Hugging Face https://huggingface.co/models
  2. Model Zoo https://modelzoo.co/
  3. GitHub. Search for the algorithm you want, copy the code, and hope the author also provides the weights file. Some repos provide a unified way to access lots of different models with weights, like MMAction https://github.com/open-mmlab/mmaction
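As an example, pulling a pretrained model from Hugging Face can look roughly like the sketch below. It assumes the transformers library and PyTorch are installed and that you have a network connection. The model id is just one publicly listed sentiment model picked for illustration, and the helper names load_classifier and top_label are made up for this example.

```python
# Illustrative model id (an assumption, not from this guide); any text-classification
# model id from https://huggingface.co/models could be used instead.
DEFAULT_MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"


def load_classifier(model_id=DEFAULT_MODEL_ID):
    """Download pretrained weights from the Hugging Face Hub and wrap them in a pipeline."""
    # Imported lazily so the rest of this file works without transformers installed.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=model_id)


def top_label(predictions):
    """Pick the highest-scoring label from a list of {"label": ..., "score": ...} dicts."""
    return max(predictions, key=lambda p: p["score"])["label"]


# Usage (downloads the weights on first run):
#   classifier = load_classifier()
#   print(top_label(classifier("This guide saved me a lot of time.")))
```

The point is that no training happens here: the weights are downloaded once and the model is ready to run.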

Sometimes, if you just want to play around with a model and you are lucky, you might be able to find a demo, like https://segment-anything.com/

Or Streamlit/Gradio apps (more info later in this guide)

Dataset

You need data for training. Although you can scroll through research papers in the hope of finding a Google Drive link to the authors’ datasets, you can also find public datasets on:

  1. Kaggle https://www.kaggle.com/datasets
  2. Google Dataset Search https://datasetsearch.research.google.com/
  3. Hugging Face Datasets https://huggingface.co/docs/datasets/index
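To give a feel for how little code a public dataset takes, here is a minimal sketch using the Hugging Face datasets library. It assumes the library is installed and uses the public "imdb" movie review dataset purely as an example. The helper names load_imdb_sample and label_counts are made up for this sketch.

```python
from collections import Counter


def load_imdb_sample(n_rows=200):
    """Download the first n_rows of the IMDB training split (needs network access)."""
    # Imported lazily so label_counts below works without datasets installed.
    from datasets import load_dataset

    return load_dataset("imdb", split=f"train[:{n_rows}]")


def label_counts(rows):
    """Count how many rows carry each label; a quick sanity check before training."""
    return Counter(row["label"] for row in rows)


# Usage:
#   sample = load_imdb_sample()
#   print(label_counts(sample))
```

Checking the label balance first is a cheap way to catch a skewed or oddly sorted slice before you spend GPU time on it.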

Data Labelers

Say you have a custom dataset. Maybe you took pictures of your dog and you need them labeled. There are tools that make labeling your data easier, and services where you can get a contractor to label it for you.

Tools for annotating the data yourself

  1. Roboflow https://roboflow.com/ Tools that make annotating images easier
  2. LabelMe https://github.com/wkentaro/labelme Open-source image annotation tool

Contractors

  1. Amazon Mechanical Turk https://www.mturk.com/
  2. Scale AI https://scale.com/ ← Haven’t personally tried

Web Apps

You trained your ML model and want to share it with others, but you are bad at making websites. You can build simple web apps with:

  1. Streamlit https://streamlit.io/
  2. Gradio https://www.gradio.app/

You really don’t need any web development knowledge.
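Here is a rough sketch of what that looks like with Gradio, assuming gradio is installed. The predict function is a toy keyword matcher standing in for your trained model (an assumption for illustration, not a real classifier), and build_demo is a made-up helper name.

```python
def predict(text):
    """Toy stand-in for a trained model: flags text containing a positive keyword."""
    positive_words = {"good", "great", "love"}
    hits = sum(word.strip(".,!?").lower() in positive_words for word in text.split())
    return "positive" if hits > 0 else "not positive"


def build_demo():
    """Wrap predict in a web UI with one text box in and one text box out."""
    # Imported lazily so predict can be used and tested without gradio installed.
    import gradio as gr

    return gr.Interface(fn=predict, inputs="text", outputs="text", title="Toy classifier")


# To serve the app locally:
#   build_demo().launch()
```

Swapping in a real model means replacing the body of predict; the UI code stays the same.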

Click this link to interact with a Streamlit app called Knowledge GPT: https://knowledgegpt.streamlit.app/

GPUs

Say you want to train a new ML algorithm. You do the calculations and… it would take a month to train on your laptop. There are services that let you rent GPUs to make your training much faster.

  1. Google Colab https://colab.google/ has free GPUs, and you can upgrade for better ones
  2. Lambda Labs https://lambdalabs.com/
  3. Google Cloud / AWS
  4. Kaggle Notebooks https://www.kaggle.com/docs/notebooks
  5. Try to get access to the Purdue GPU cluster?
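The “it would take a month” calculation mentioned above is only a few lines of arithmetic. Every number in this sketch (dataset size, epochs, throughput) is a made-up placeholder, so time a few training steps on your own hardware before trusting the result.

```python
SECONDS_PER_DAY = 86_400


def training_days(num_examples, epochs, examples_per_second):
    """Estimate wall-clock training time in days from measured throughput."""
    total_seconds = num_examples * epochs / examples_per_second
    return total_seconds / SECONDS_PER_DAY


# Hypothetical workload: 1M examples for 50 epochs.
laptop_days = training_days(1_000_000, 50, examples_per_second=20)
gpu_days = training_days(1_000_000, 50, examples_per_second=2_000)

print(f"laptop: {laptop_days:.1f} days")         # about 28.9 days
print(f"rented GPU: {gpu_days * 24:.1f} hours")  # about 6.9 hours
```

The throughput figures are the whole story here, which is why it is worth measuring them on a free tier such as Colab before paying for anything.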

Those scammers you hear about who use college or high school computers for Bitcoin mining should have just used these services:

https://www.cbsnews.com/boston/news/cohasset-cryptocurrency-mining-operation-high-school-nadeam-nahas/

On a more serious note, you can get banned from some of these services if they think you are crypto mining.
