⚙️ 4 Ways to Deploy Your Next GenAI App 🚀
🟦 1. Google Colab / AWS SageMaker / Kaggle:
Yes, you read that right: you can deploy your app’s backend by cloning the repo in a notebook, using ngrok to get a public URL, and using that URL in the frontend (see the sketch below).
It won’t stay up for long, but it’s great for testing your apps.
I gave people access to SuperHeroAI using this method.
Disadvantage:
> Auto-disconnects after a while.
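Here’s a minimal sketch of this approach, assuming a FastAPI backend on port 8000 and the pyngrok package; the generate function is a placeholder for your actual model call:

```python
# Minimal sketch: serve a backend from a notebook cell and tunnel it with ngrok.
# Assumes: pip install fastapi uvicorn pyngrok (and an ngrok auth token configured).
import threading

import uvicorn
from fastapi import FastAPI
from pyngrok import ngrok

app = FastAPI()

@app.get("/generate")
def generate(prompt: str):
    # Placeholder: call your actual model here.
    return {"output": f"result for: {prompt}"}

# Run the server in a background thread so the notebook cell doesn't block.
threading.Thread(
    target=uvicorn.run,
    kwargs={"app": app, "host": "0.0.0.0", "port": 8000},
    daemon=True,
).start()

# Open a public tunnel to port 8000; paste this URL into your frontend.
public_url = ngrok.connect(8000)
print(public_url)
```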
🟦 2. AWS EC2 instances:
> Choose a GPU instance; g4dn is the cheapest.
> Choose a Deep Learning AMI with the latest PyTorch & NVIDIA drivers.
> Allocate around 100 GB of storage, expose the backend’s ports, and use the public URL in your frontend (a boto3 sketch follows below).
Disadvantage:
> You’ll need to figure out how to make it serverless on your own.
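The console works fine for this, but here’s a sketch of the same launch with boto3. The AMI ID, key pair, and security group are placeholders you’d replace with your own; look up the current Deep Learning AMI ID for your region first.

```python
# Sketch: launch a g4dn GPU instance with a Deep Learning AMI and 100 GB storage.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder: Deep Learning AMI (PyTorch)
    InstanceType="g4dn.xlarge",                 # cheapest GPU instance family
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",                      # placeholder key pair name
    SecurityGroupIds=["sg-0123456789abcdef0"],  # must allow inbound traffic on your backend port
    BlockDeviceMappings=[{
        "DeviceName": "/dev/xvda",              # root device name varies by AMI
        "Ebs": {"VolumeSize": 100, "VolumeType": "gp3"},
    }],
)
print(response["Instances"][0]["InstanceId"])
```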
🟦 3. Hugging Face 🤗:
You can deploy your model on Hugging Face in two ways.
> First, create a custom model based on your app (you create a new file with a predict function and call your backend logic from there).
⏩ Then deploy it as a Hugging Face Inference Endpoint.
⏩ Or deploy it as a Space, so users can use it via a frontend (made with Gradio) and you can still call it as an endpoint (see the Gradio sketch below).
Disadvantage:
> 15-minute idle timeout: the endpoint spins down after 15 minutes without traffic, so the next request takes a cold start.
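As a sketch of the Space route, a minimal app.py with Gradio might look like this; generate is a placeholder for your model call:

```python
# app.py — minimal sketch of a Gradio app for a Hugging Face Space.
import gradio as gr

def generate(prompt: str) -> str:
    # Placeholder: call your model / backend logic here.
    return f"output for: {prompt}"

# A simple text-in, text-out interface around the function.
demo = gr.Interface(fn=generate, inputs="text", outputs="text")
demo.launch()
```

Once the Space is live, the same function is also reachable programmatically, e.g. via the gradio_client package, which is how you’d use it as an endpoint.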
🟦 4. Replicate:
Replicate also provides a way to deploy your GenAI backend.
Replicate uses Cog, an open-source tool built on top of Docker.
> First, build a Cog model: one new file with a predict function that calls your backend functions (see the sketch after this list).
⏩ This model can then be used via the API or from the frontend. The model takes cold-boot time and has its idle time too.
⏩ To customise this endpoint, create a deployment of the model; you can adjust how many machines you want, and deployments also come with a metrics dashboard.
Disadvantage:
> You don’t get to see metrics without making a deployment.
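A minimal predictor file follows Cog’s documented BasePredictor interface; the model loading and generation below are placeholders for your own backend:

```python
# predict.py — minimal sketch of a Cog predictor (paired with a cog.yaml
# that lists your Python dependencies and GPU/CUDA requirements).
from cog import BasePredictor, Input

class Predictor(BasePredictor):
    def setup(self):
        # Runs once at container start: load your model weights here.
        self.model = None  # placeholder

    def predict(self, prompt: str = Input(description="Prompt for the model")) -> str:
        # Placeholder: call your backend / model here.
        return f"output for: {prompt}"
```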
🟦 So what’s the best method?
In my opinion, you should do your testing on a free GPU notebook, then later move to Replicate.
Replicate handles all the scaling for you and has a shorter idle time than the 15 minutes of Hugging Face endpoints.
⏩ These are my learnings from building SuperHeroAI.pro