BLOOM 176B — how to run a real LARGE language model in your own cloud?

Many of us use GPT-3 or other LLMs as SaaS, hosted by their vendors. But what is it like to run a model the size of GPT-3 in your own cloud?

Model size

Hosting Setup

Boot the model

Use the model

"input": "The BLOOM large language model is a",
"gen_kwargs": {
    "min_length": 5,
    "max_new_tokens": 100,
    "temperature": 0.8,
    "num_beams": 5,
    "no_repeat_ngram_size": 2,
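The request fragment above is cut off, so here is a minimal Python sketch of how such a payload could be assembled and wrapped in an HTTP POST. Treat it as an illustration under assumptions, not as the actual client for this setup: the endpoint URL and the helper names `build_payload`, `make_request`, and `send` are hypothetical, and the payload contains only the fields visible in the fragment, which may not be the full set.

```python
import json
import urllib.request

# Hypothetical endpoint (assumption): point this at your own deployment.
ENDPOINT = "http://localhost:8000/generate"


def build_payload(prompt: str) -> dict:
    """Assemble a request body from the fields visible in the fragment."""
    return {
        "input": prompt,
        "gen_kwargs": {
            "min_length": 5,
            "max_new_tokens": 100,
            "temperature": 0.8,
            "num_beams": 5,
            "no_repeat_ngram_size": 2,
        },
    }


def make_request(payload: dict, url: str = ENDPOINT) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response text.

    Needs a running server; the response format is not shown in the
    source, so it is returned here without parsing.
    """
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a server running, `send(make_request(build_payload("The BLOOM large language model is a")))` would submit the prompt. Keeping payload construction separate from the network call makes it easy to inspect or log the exact JSON before anything reaches the model.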



