Introduction: Large Language Models (LLMs) such as GPT-3 and BERT, like other deep learning models, often demand significant computational resources, including substantial memory and powerful GPUs. As a result, they are commonly hosted on cloud platforms, which typically charge usage fees based on the processing power used, the resources consumed, and the time taken. Additionally…