ML Tools / Platform Stuff
By Brian Song
10/10/23
There are lots of structured guides on what ML concepts to learn: learn x, then y, then z. But there isn’t really a structured guide for ML tools and platforms. Often there is a guide for one specific tool or platform, but how do you even find out that it exists in the first place so you can learn more about it? Here is a guide that hopefully solves that problem:
Pretrained Models
You heard about some new ML algorithm and want to test it yourself. You found the code, but there are no model weights, so to make the model do the cool thing it is supposed to do you would have to train it from scratch, which can cost a lot of money or time. You can get “out of the box” models from:
- Hugging Face https://huggingface.co/models
- Model Zoo https://modelzoo.co/
- GitHub. Search for the algorithm you want, copy the code, and hope the author also provides the weights file. Some repos provide a unified way to access lots of different models with weights, like MMAction https://github.com/open-mmlab/mmaction
Sometimes, if you just want to play around with a model and you are lucky, you might be able to find a demo, like https://segment-anything.com/ or a Streamlit/Gradio app (more info later in this guide).
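To make the “out of the box” idea concrete, here is a minimal sketch of pulling a pretrained model from Hugging Face using the transformers library's pipeline helper. The function name classify_sentiment and the optional classifier argument are made up for this example (the argument is just there so you can swap in your own model); the sentiment-analysis task is only one of many you could pick. It assumes you have run pip install transformers torch and have a network connection for the first download.

```python
def classify_sentiment(texts, classifier=None):
    """Return a (label, score) pair for each input string.

    `classifier` is a hypothetical hook for this example: pass any callable
    that maps a list of strings to a list of {"label", "score"} dicts.
    If it is None, a pretrained model is downloaded from Hugging Face.
    """
    if classifier is None:
        # Imported lazily so the rest of the file works without transformers.
        from transformers import pipeline

        # Downloads a default pretrained sentiment model on first use.
        classifier = pipeline("sentiment-analysis")

    results = classifier(texts)
    return [(r["label"], round(r["score"], 3)) for r in results]
```

Calling classify_sentiment(["I love this guide", "This is confusing"]) should give one label and confidence score per sentence, with no training on your side.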
Dataset
You need data for training. Although you can scroll through research papers hoping to find a Google Drive link to the authors’ datasets, you can also find public datasets on:
- Kaggle https://www.kaggle.com/datasets
- Google Dataset Search https://datasetsearch.research.google.com/
- Hugging Face Datasets https://huggingface.co/docs/datasets/index
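As an example of what using one of these looks like, here is a small sketch that loads a public dataset with the Hugging Face datasets library and peeks at the first few rows. The helper name preview_dataset and its loader argument are assumptions made up for this example (the argument just lets you plug in a different loading function); it assumes pip install datasets and a network connection.

```python
def preview_dataset(name, split="train", n=3, loader=None):
    """Return the first `n` examples of a dataset split as a list.

    `loader` is a hypothetical hook for this example; by default the
    real `datasets.load_dataset` function is used.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without the library.
        from datasets import load_dataset

        loader = load_dataset

    ds = loader(name, split=split)
    # Index row by row so we only touch the rows we actually want to see.
    return [ds[i] for i in range(min(n, len(ds)))]
```

For instance, preview_dataset("imdb") would download the IMDB reviews dataset and return its first three training examples, each as a dictionary.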
Data Labelers
Say you have a custom dataset. Maybe you took pictures of your dog or something and you need them labeled. There are tools that make labeling your own data easier, and services where you pay contractors to label it for you.
Tools for annotating it yourself
- Roboflow https://roboflow.com/ has tools that make annotating images easier
- LabelMe https://github.com/wkentaro/labelme is an open-source image annotation tool
Contractors
- Amazon Mechanical Turk https://www.mturk.com/
- Scale AI https://scale.com/ ← Haven’t personally tried
Web Apps
You trained your ML model and want to share it with others, but you are bad at making websites. You can build simple web apps with:
- Streamlit https://streamlit.io/
- Gradio https://www.gradio.app/
You really don’t need any web development knowledge.
For an example, try this Streamlit app called Knowledge GPT: https://knowledgegpt.streamlit.app/
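To show how little code this takes, here is a minimal Gradio sketch. The shout function is a made-up stand-in for your trained model, and build_demo is a hypothetical helper name for this example; it assumes pip install gradio.

```python
def shout(text):
    """Stand-in for a real model: upper-case the input and add '!'."""
    return text.upper() + "!"


def build_demo():
    """Wrap `shout` in a web page with one text box in and one out."""
    # Imported lazily so `shout` can be used without gradio installed.
    import gradio as gr

    return gr.Interface(fn=shout, inputs="text", outputs="text")
```

Running build_demo().launch() starts a local web server and prints a URL you can open in your browser. To share a real model, replace the body of shout with a call to it.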
GPUs
Say you want to train a new ML algorithm. You do the calculations and… it would take a month to train it on your laptop. There are services that let you rent GPUs to make your training much faster.
- Google Colab https://colab.google/ has free GPUs, and you can upgrade for better ones
- Lambda Labs https://lambdalabs.com/
- Google Cloud / AWS
- Kaggle Notebooks https://www.kaggle.com/docs/notebooks
- Try to get access to the Purdue GPU cluster?
Those scammers you hear about who use college or high school computers for Bitcoin mining should have just used these services.
On a more serious note, you can get banned from some of these services if they think you are crypto mining.
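As a rough sketch of the “do the calculations” step from the start of this section, you can estimate training time from how many samples your hardware gets through per second. The dataset size, epoch count, and throughput numbers below are made-up assumptions for illustration, not measurements of any real laptop or GPU; measure your own throughput on a few batches first.

```python
SECONDS_PER_DAY = 86_400


def training_days(num_samples, epochs, samples_per_second):
    """Estimate wall-clock training time in days.

    Assumes throughput stays constant and ignores data loading,
    evaluation, and checkpointing overhead.
    """
    total_seconds = num_samples * epochs / samples_per_second
    return total_seconds / SECONDS_PER_DAY


# Hypothetical numbers: 1M samples, 100 epochs.
laptop = training_days(1_000_000, 100, samples_per_second=40)
rented_gpu = training_days(1_000_000, 100, samples_per_second=2_000)

print(f"laptop:     {laptop:.1f} days")      # roughly a month
print(f"rented GPU: {rented_gpu:.1f} days")  # well under a day
```

With these invented numbers, a 50x faster GPU turns about a month of training into an afternoon, which is the whole argument for renting one.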