If you have an A100, just use the A100.
First of all, what is FLUX? It is a text-to-image generation model, similar to Stable Diffusion.
GitHub project link: https://github.com/black-forest-labs/flux
Let me briefly walk through the deployment process. At present, the two models FLUX.1 [schnell] and FLUX.1 [dev] are already integrated into the diffusers library, so I call the FLUX model through diffusers. In this deployment I used the FLUX.1 [schnell] model.
You can directly look at this code example: https://github.com/black-forest-labs/flux/blob/main/docs/text-to-image.md
It seems that we only need to install diffusers. The diffusers installation instructions are here: https://huggingface.co/docs/diffusers/v0.9.0/en/installation
The specific steps are to create a new virtual environment with conda (I chose Python 3.11), and then:
pip install diffusers["torch"]
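Spelled out, the setup commands look roughly like this (the environment name flux is just my own choice):
conda create -n flux python=3.11 -y
conda activate flux
pip install diffusers["torch"]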
However, after installing it, I still ran into a series of errors.
If you encounter "ImportError: requires the protobuf library but it was not found in your environment", then:
pip install protobuf
If you encounter "ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.", then:
pip install sentencepiece
If you get a similar message telling you to install transformers (I didn't take a screenshot of this one), then:
pip install transformers
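To avoid hitting these one at a time, installing everything up front should work too. This is just the union of the packages mentioned above, not an official requirements list:
pip install diffusers["torch"] transformers protobuf sentencepiece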
If you encounter "No space left on device", don't hesitate! The FLUX model is really big. If you have used huggingface_hub to download models before, go to the path where they are stored and delete the ones you no longer need!!
For example, I went to ~/.cache/huggingface/hub and directly deleted the stable-diffusion-v1-5 folder left over from an earlier attempt.
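If you are not sure what is taking up space, huggingface_hub also ships a cache scanner that can list (and interactively delete) cached models; these commands are just a pointer, and delete-cache may additionally need the huggingface_hub[cli] extra:
huggingface-cli scan-cache
huggingface-cli delete-cache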
In addition, before running the FLUX model, you can try the stable-diffusion-v1-5 model first, because it is small. If that errors, your environment is not configured correctly or there is a problem with the code; after all, it is the sample model the Hugging Face docs use to demonstrate diffusers.
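A minimal sanity check along those lines might look like this (I am assuming the runwayml/stable-diffusion-v1-5 checkpoint here because that is what the old diffusers quickstart used; swap in whichever SD 1.5 repo is available to you, and the output file name is arbitrary):
from diffusers import DiffusionPipeline
import torch

# Smaller model: if this fails, the problem is the environment, not FLUX
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")
image = pipeline("An image of a squirrel in Picasso style").images[0]
image.save("image_of_squirrel_painting_sd15.png")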
Logically speaking, with a 40GB A100 you can generate the picture quickly.
First, let's appreciate the squirrel drawn on the A100 with the FLUX model. The prompt is "An image of a squirrel in Picasso style".
The code is super simple:
from diffusers import DiffusionPipeline
import torch
pipeline = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.float16)
pipeline.to("cuda")
image = pipeline("An image of a squirrel in Picasso style").images[0]
image.save("image_of_squirrel_painting_flux.png")
If the model download fails with an SSL error such as "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)", add the following right after import torch:
import requests
from huggingface_hub import configure_http_backend

# Route all Hugging Face Hub requests through a session that skips SSL certificate verification
def backend_factory() -> requests.Session:
    session = requests.Session()
    session.verify = False
    return session

configure_http_backend(backend_factory=backend_factory)
This solution comes from the post: https://stackoverflow.com/questions/71692354/facing-ssl-error-with-huggingface-pretrained-models
At least it works for me.
Then I thought: since a single A100 (40GB) works, let's try a single 4090 (24GB).
Obviously, countless OOM errors hit me.
No matter what I tried, it didn’t work. I searched online and found that many people have this problem, which fully proves one thing:
A single 4090 (24GB) can't run FLUX!!!!
Then I thought: 24GB doesn't work and 40GB does, so why not try two 4090s? Even though two 4090s apparently cost quite a bit more than a single A100.
It’s Black Friday soon, will anyone buy a graphics card? . . .
A 24GB 4090 on Rakuten costs $4,990: https://www.rakuten.com/products/nvidia-founders-geforce-rtx-4090-24gb/00749988978580_upc
A 40GB NVIDIA A100 on Amazon costs $8,299, and of course it's in stock: https://www.amazon.com/NVIDIA-Ampere-Graphics-Processor-Accelerator/dp/B08X13X6HF
OK, without further ado, I opened a new server, equipped it with two 4090s, and tried to run it.
However, when I ran the previous code unchanged, the model only seemed to run on cuda:0, i.e. it used only one of the two cards.
So my next question was: how do I get both cards working?
What I tried at the time was roughly the following:
accelerate’s infer_auto_device_map
from accelerate import infer_auto_device_map

device_map = infer_auto_device_map(pipeline, max_memory={"cuda:0": "20GiB", "cuda:1": "20GiB", "cpu": "20GiB"})
# Move the pipeline modules to respective devices
pipeline.to(device_map)
This reported an error: cuda:0 was not recognized. The error is: ValueError: Device cuda:0 is not recognized, available devices are integers (for GPU/XPU), 'mps', 'cpu' and 'disk'. If someone has solved this properly, please leave a message!!!
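Reading that error literally, it seems max_memory wants plain integer GPU indices rather than "cuda:0"-style strings; also, infer_auto_device_map expects an nn.Module, so it would presumably have to be pointed at a component such as pipeline.transformer rather than the whole pipeline. A sketch of that guess, which I have not verified end to end:
from accelerate import infer_auto_device_map

# Integer keys for GPUs, as the error message asks for; the memory limits are illustrative
device_map = infer_auto_device_map(
    pipeline.transformer,
    max_memory={0: "20GiB", 1: "20GiB", "cpu": "20GiB"},
)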
Assign different parts of the pipeline to different CUDA devices
pipeline.text_encoder.to("cuda:0")
pipeline.vae.to("cuda:0")
pipeline.to("cuda:1")
This errored too: OOM on cuda:1.
Finally, my solution comes from the post: https://huggingface.co/black-forest-labs/FLUX.1-schnell/discussions/5
pipeline = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.float16)
pipeline.vae.enable_tiling()
pipeline.vae.enable_slicing()
pipeline.enable_sequential_cpu_offload()
At first I changed torch_dtype=torch.float16 to torch_dtype=torch.bfloat16, but I changed it back and found that it runs fine with float16.
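For completeness, combining this workaround with the earlier generation code, the end-to-end script on the dual-4090 box roughly looks like this (the output file name is mine):
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.float16)
pipeline.vae.enable_tiling()
pipeline.vae.enable_slicing()
# With sequential CPU offload, do not also call pipeline.to("cuda");
# accelerate moves each module onto the GPU only while it is needed
pipeline.enable_sequential_cpu_offload()

image = pipeline("An image of a squirrel in Picasso style").images[0]
image.save("image_of_squirrel_painting_flux_4090.png")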
Woo-hoo!!! Finally, let's enjoy the squirrel picture drawn by the dual 4090s: