Announcing serverless GPUs for the Daisi platform

JM Laigle
Daisi Technology
Oct 4, 2022

Daisi (app.daisi.io) is the go-to community platform for Python cloud functions and apps, deployed in a serverless fashion. At Daisi, we are on a mission to advance the deployment of impactful, game-changing algorithms and ML models. We want them to be readily actionable, so everyone can start using any of them with just one line of code.

The community uses Daisi to deploy many types of algorithms, and a number of them are ML models which benefit from GPUs for fast inference. From now on, Daisi will offer GPUs to the community for the deployment of these models. Below are some guidelines on how to make your code use GPUs (it is straightforward), along with some good practices.

Step 1 — Make the GPUs visible to your Daisi

By default, GPUs are not visible. You need to override this setting with the environment variable "CUDA_VISIBLE_DEVICES" as follows:

# Set this before importing torch or any other CUDA-enabled library
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

Note: each node of the Daisi cluster features two NVIDIA A100 PCIe GPUs with 80 GB of memory each.
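
As a quick sanity check, assuming your code uses torch (see Step 2), you can confirm that both GPUs are indeed visible:

import torch

# With CUDA_VISIBLE_DEVICES="0,1" set as above, torch should report both A100s
print(torch.cuda.is_available())      # True
print(torch.cuda.device_count())      # 2
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA A100 80GB PCIe"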

Step 2 — If your code needs torch, update your requirements file

If your code uses torch, the first two lines of your requirements file should be:

--find-links https://download.pytorch.org/whl/torch_stable.html
torch==1.12.0+cu116
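
Once deployed, you can verify that the CUDA build of torch was picked up:

import torch

# The "+cu116" suffix confirms the CUDA 11.6 build is installed
print(torch.__version__)   # 1.12.0+cu116
print(torch.version.cuda)  # 11.6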

Cold vs warm start

The first execution of a Daisi requires starting the corresponding web service. In the case of a large ML model running on GPUs, loading the model in memory can take some time, up to tens of seconds, so be patient. As soon as the model is loaded, subsequent executions will be as fast as your code can run on the GPUs.

If the Daisi is not used for 60 minutes, the service will be shut down, meaning that the next execution will again be a cold start.
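
To take full advantage of warm starts, load your model once at module level rather than inside the endpoint function, so the loading cost is paid only during the cold start. A minimal sketch, with a placeholder model standing in for your own:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def load_model():
    # Placeholder model for illustration; swap in your own
    model = torch.nn.Linear(128, 10).to(device)
    model.eval()
    return model

# Runs once, when the service starts (the cold start);
# warm invocations reuse the object already in GPU memory
MODEL = load_model()

def predict(features: list) -> list:
    """Daisi endpoint: inference reuses the preloaded model."""
    x = torch.tensor(features, dtype=torch.float32, device=device)
    with torch.no_grad():
        out = MODEL(x)
    return out.cpu().tolist()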

Good practices

The Daisi platform deploys Python functions as serverless microservices. Serverless means that the hardware resources are allocated dynamically at runtime, when needed. When a request is made for the execution of a function, a service is started automatically, and it stays alive for a maximum of 60 minutes following the last execution. Special hardware, such as GPUs, can be allocated for the execution of a Daisi.

Daisi can also host Streamlit apps. Streamlit apps are not deployed in the same way as Daisi functions: they have a much longer lifecycle, as it isn't straightforward to assess whether a user has finished a session in the app. Moving forward, Streamlit apps will not have access to special hardware. Instead, they will delegate resource-intensive computations to Daisi functions, accessed with the pydaisi Python package.

So if a function requires significant resources or hardware to run (a large ML model, for instance), it is a good idea to deploy it as an individual Daisi and put the Streamlit app in a different Daisi which calls the first one. This ensures optimal execution for the resource-intensive Daisi, and optimal usage of the resources of the community Daisi platform.

In addition, it allows others to reuse the function in their own Streamlit app.
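
For illustration, here is a sketch of how a Streamlit app can delegate inference to a GPU-backed Daisi with pydaisi. The Daisi name and its generate endpoint below are hypothetical; replace them with your own:

import pydaisi as pyd
import streamlit as st

# Connect to a separately deployed, GPU-backed Daisi
# (hypothetical name and endpoint; replace with your own)
sd = pyd.Daisi("jmlaigle/Stable Diffusion")

prompt = st.text_input("Prompt", "An astronaut riding a horse")

if st.button("Generate"):
    # The heavy computation runs in the GPU-backed Daisi,
    # not in the Streamlit app's container
    result = sd.generate(prompt=prompt).value
    st.image(result)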

Check out, for instance, these two Daisies:

  1. The first one is the deployment of Stable Diffusion on GPU
  2. The second one is a Streamlit app calling the Stable Diffusion model as a service

Refer to our docs on how to give a microservices backend to your Streamlit app with Daisi!
