Running your own dedicated OpenAI Instance

Rohit Vincent
Published in Version 1
Jul 17, 2023 · 4 min read

With the emergence of ChatGPT and the increasing prominence of Generative AI across industries, there is growing demand from customers who want to retain control over the data they send to models for inference via API calls.

There are a couple of ways you could deploy a dedicated instance (an infrastructure setup reserved exclusively for you) to run your own instance of GPT or any other Large Language Model.

OpenAI Dedicated Instances

OpenAI provides dedicated instances for organisations, empowering them with enhanced autonomy over model customisation and system performance optimisation. What does this mean?

Organisations gain comprehensive control over the instance’s load, enabling them to increase throughput at the expense of individual request speed. Opting for dedicated instances also unlocks two valuable features. First, you gain access to extended context limits determined by the capabilities of the resource itself, allowing you to process larger amounts of data and extract more comprehensive insights. Second, you can freeze or take snapshots of the model, protecting it from disruptions caused by upgrades or version changes. Together, these features give you greater control over your AI infrastructure.

As per OpenAI, dedicated instances become an economically viable solution for organisations whose workload exceeds roughly 450 million tokens per day. Assuming 500 words per page and roughly one token per word, that corresponds to approximately 900,000 pages processed daily. Furthermore, dedicated instances let organisations tune their workload to the hardware’s performance, leading to substantial cost savings compared to shared infrastructure. (Source)
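As a quick sanity check on the figures above, here is a back-of-the-envelope sketch. The words-per-token ratio is my assumption (a common rule of thumb for English text), not an OpenAI figure; the ~900,000-page estimate effectively treats one token as one word.

```python
# Back-of-the-envelope check of the break-even workload described above.
# WORDS_PER_TOKEN is an assumed rule of thumb for English text, not an OpenAI figure.

TOKENS_PER_DAY = 450_000_000   # OpenAI's quoted break-even workload
WORDS_PER_PAGE = 500
WORDS_PER_TOKEN = 0.75         # rough rule of thumb: ~0.75 English words per token

# Naive estimate that treats one token as one word (gives the ~900,000 figure):
naive_pages = TOKENS_PER_DAY / WORDS_PER_PAGE
print(f"~{naive_pages:,.0f} pages/day (1 token = 1 word)")      # ~900,000

# More conservative estimate using the words-per-token rule of thumb:
adjusted_pages = TOKENS_PER_DAY * WORDS_PER_TOKEN / WORDS_PER_PAGE
print(f"~{adjusted_pages:,.0f} pages/day (0.75 words/token)")   # ~675,000
```

Either way, the break-even point sits at a very large daily workload, which is why dedicated instances only make sense for heavy users.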

You need to contact the sales team on the OpenAI contact page to get your own dedicated instance.

Screenshot from OpenAI Contact Sales Page

How much does it cost?

Since the OpenAI API runs on Azure, organisations pay for a specified time period to secure a reserved allocation of compute infrastructure exclusively for their use. OpenAI has not published official figures, but here is an idea based on a leak from earlier in the year:

Source (not verified): Link

Dedicated instances come at a considerable price point. A lightweight version of GPT-3.5 entails an investment of $78,000 for a three-month commitment, or $264,000 in total for a one-year commitment. To provide context, Nvidia’s state-of-the-art supercomputer, the DGX Station, is priced at $149,000 per unit.
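Working the leaked figures through to an effective monthly rate makes the commitment discount easier to see (these are the unverified leak numbers, not official pricing):

```python
# Effective monthly cost of the leaked commitment pricing for a
# lightweight GPT-3.5 dedicated instance (unverified leak figures).

three_month_total = 78_000
one_year_total = 264_000

monthly_3mo = three_month_total / 3    # $26,000 per month
monthly_1yr = one_year_total / 12      # $22,000 per month

discount = 1 - monthly_1yr / monthly_3mo
print(f"3-month commitment: ${monthly_3mo:,.0f}/month")
print(f"1-year commitment:  ${monthly_1yr:,.0f}/month ({discount:.0%} cheaper)")
```

In other words, the one-year commitment works out roughly 15% cheaper per month than the three-month option, if the leaked numbers are accurate.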

Azure Fine-tuned Instances

The Azure OpenAI Service has offered an option to fine-tune your models. You can read more about it here. What does that mean?

  1. Exclusive Availability: Customers have the ability to upload their training data to the service and fine-tune a model. The uploaded training data remains accessible exclusively for the customer’s use.
  2. Regional Storage: Both the training data and the fine-tuned models are stored within the same region as the Azure OpenAI resource. This ensures that the data remains in close proximity to the customer’s allocated Azure resources.
  3. Robust Encryption: By default, the uploaded training data and fine-tuned models are double encrypted at rest using Microsoft’s AES-256 encryption. Customers also have the option to apply additional encryption with a customer-managed key for enhanced security.
  4. Customer Deletion Control: Customers retain full control over their data and have the ability to delete the uploaded training data and associated fine-tuned models at any time, providing them with the flexibility to manage their resources as needed.

Fine-tuned models on Azure incur an additional hourly hosting fee, but the cost of training and inference for these models is generally lower than OpenAI’s pricing for similarly fine-tuned models if you process around 3 million tokens. However, none of the models are currently available for fine-tuning, and pricing for new models could change. Fine-tuning should become available later in the year, as per this announcement.

Conclusion

In conclusion, OpenAI’s dedicated instances and Azure’s fine-tuned models both cater to organisations that want greater control over their data and model customisation. With OpenAI’s dedicated instances, organisations gain autonomy over load management for improved throughput, along with extended context limits and model snapshots, and can optimise their workload for cost savings and performance. The pricing can be significant, but it grants exclusive access to dedicated computing infrastructure. Azure’s fine-tuned instances, on the other hand, let customers upload and fine-tune their own models, with the data securely stored and controlled within the same region as the Azure OpenAI resource. Although no models are currently available for fine-tuning, this is expected later in the year, and training and inference for fine-tuned models are generally cheaper than OpenAI’s pricing for similarly fine-tuned models at around 3 million tokens. Pricing and availability may change, so it is advisable to stay updated with the latest announcements.

Version 1 AI Labs is actively exploring the deployment of private, personalised Large Language Model instances built on open-source models. We’re continuously working on this development, so stay tuned and follow us for future updates on this exciting progress.

About the Author

Rohit Vincent is a Data Scientist at the Version 1 AI and Innovation Labs.
