On-Premises AI for Engineering Teams

Published in Akvelon · 7 min read · Nov 17, 2023

In modern software development, Large Language Model (LLM) solutions play a bigger role every day. They help accelerate work, automate routine actions, and let teams handle several tasks in parallel. Leading providers such as OpenAI, Microsoft, and Google offer toolkits like ChatGPT, Copilot, and Bard that have made this kind of automation much simpler. However, adopting such tools also brings a responsibility to address the legitimate concerns and challenges that may arise during LLM implementation.

In this article, we’ll dive into common concerns associated with LLM solution usage for business and explore how to address them. We’ll share our insights from testing and evaluating the efficiency of self-hosted LLMs as an alternative that can be more beneficial regarding data protection.

Step 0. Resolving Common Concerns in LLM Solution Usage

Protecting intellectual property and securing the codebase is crucial for businesses that are wary of data exposure risks. Using LLM solutions like GitHub Copilot, ChatGPT, Bing Chat, or Google Bard involves sharing proprietary data and code with third parties. While there are methods to minimize the amount of sensitive data shared with these AI tools, concerns about the confidentiality of valuable information remain. Even with comprehensive data privacy measures in place, no method can completely remove the worry that a third party stores the requests sent to its LLM solution.

Additionally, compliance with HIPAA and GDPR may pose challenges. Industry-specific and general privacy regulations are essential for protecting sensitive data and ensuring user privacy and security, and they must also extend to the use of any LLM solution.

Given these concerns, it may seem that businesses with solid in-house software development have no choice but to drop LLM solutions from their development toolset, with nothing to replace them. Fortunately, there are alternatives that may suit projects with particular security and privacy demands.

Aiming to find a solution that streamlines and enhances development processes without compromising data privacy and security, we explored running self-hosted LLM solutions locally on several types of workstations. In this article, we share the results of our field research and tests on running LLMs on-premises, along with insights that can be applied to enhance your development flow.

Step 1. Unlocking the Potential of Local LLMs

Self-hosted LLMs offer an alternative that addresses data privacy concerns and enhances control by keeping data within the organization’s environment. This option can be especially beneficial for companies with stringent privacy requirements.

We conducted our own survey to test the capabilities of locally run models, particularly focusing on their efficiency in managing tasks typical for key roles like Software Developers, QA Engineers, Business Analysts, and DevOps Engineers.

Key highlights of our model testing and assessment approach

#1 Throughout testing, our primary focus was on investigating the models’ adaptability and accuracy in handling role-specific tasks. For transparent assessments, we created a comprehensive set of prompts reflecting recent tasks for each role in the development team.

#2 We conducted multiple rounds of testing, applying various criteria aimed at assessing the overall accuracy and the efficiency of generated outcomes.

#3 We evaluated model responsiveness across different machines by employing a scoring system based on assessments made using specialized prompts aligned with each role’s typical tasks.

#4 To maintain objectivity and ensure the precision of our evaluations, we compared the model’s responses against those from ChatGPT, which served as an impartial validator in our comprehensive assessment of the LLMs’ responses.
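The scoring approach described above can be sketched as a small aggregation step. This is a minimal illustration, not the tooling used in the tests: the record layout, the function name `average_scores`, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def average_scores(records):
    """Return {(model, role): mean score} across all testing rounds.

    Each record is a hypothetical tuple: (model, role, round_number, score),
    where score is on the 1-10 scale used by the evaluator.
    """
    grouped = defaultdict(list)
    for model, role, _round_number, score in records:
        if not 1 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
        grouped[(model, role)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


if __name__ == "__main__":
    # Placeholder values for illustration only, not results from the article.
    sample = [
        ("model-a", "QA Engineer", 1, 6),
        ("model-a", "QA Engineer", 2, 8),
        ("model-a", "DevOps Engineer", 1, 5),
    ]
    for (model, role), avg in average_scores(sample).items():
        print(f"{model} / {role}: {avg:.1f}")
```

Grouping by model and role keeps a model that is strong for one role from hiding weak results for another.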

The prompt used for ChatGPT to evaluate the responses from the tested models:
Your task is to assign a rating from 1 to 10 (1 being very poor, 10 being excellent) to the response generated by another AI assistant tool. You should compare your own result for the same prompt and rate the provided result from another AI tool. Consider the correctness and usefulness of the answer. Additionally, you should provide an explanation for your score and details about what was incorrect in the response from the AI assistant.
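As a rough sketch of how such an evaluation prompt could be used programmatically, the snippet below combines the instructions with a task and a candidate answer, then extracts the numeric rating from the evaluator's reply. The message layout, the function names, and the parsing rule are assumptions for illustration; the article does not describe how the ratings were collected, and no API call is made here.

```python
import re

# Evaluation instructions quoted from the article.
JUDGE_INSTRUCTIONS = (
    "Your task is to assign a rating from 1 to 10 (1 being very poor, "
    "10 being excellent) to the response generated by another AI assistant "
    "tool. You should compare your own result for the same prompt and rate "
    "the provided result from another AI tool. Consider the correctness and "
    "usefulness of the answer. Additionally, you should provide an "
    "explanation for your score and details about what was incorrect in the "
    "response from the AI assistant."
)


def build_judge_message(task_prompt: str, candidate_response: str) -> str:
    """Assemble one evaluator message (the section labels are an assumption)."""
    return (
        f"{JUDGE_INSTRUCTIONS}\n\n"
        f"Prompt:\n{task_prompt}\n\n"
        f"Response from the other AI tool:\n{candidate_response}"
    )


def parse_rating(reply: str):
    """Return the 1-10 rating that follows 'rating' or 'score', or None."""
    match = re.search(r"(?:rating|score)\D{0,20}(10|[1-9])\b", reply, re.IGNORECASE)
    return int(match.group(1)) if match else None
```

Returning `None` when no rating is found lets a reviewer flag replies that need manual reading instead of silently recording a wrong score.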

Step 2. Assessing Performance Across Various Models

In our comprehensive survey of self-hosted LLMs, we focused on GPT models optimized for standard PCs. These models are tailored to operate smoothly on everyday workstations, including laptops and desktop computers, eliminating the need for high-end hardware.

We conducted a series of tests on various LLMs, from larger to smaller ones, to gather insights suitable for diverse needs. As already mentioned, we evaluated how each model performed across diverse use cases and scenarios, specifically tailored to different roles within the development team.

Figure: The list of assessed models

One of the key insights from the assessment is that larger models, particularly those with over 13 billion parameters, present difficulties when used in local environments. While these models are well-suited for precision tasks like complex code generation or unit test creation, we faced challenges with their smooth local execution. Despite their advanced capabilities and comprehensive features, larger models showed limitations in performance and efficiency during our tests.

Conversely, we found that smaller LLMs with around 7 billion parameters or fewer were better suited for local hosting. A notable example in this category is Zephyr 7B Alpha, which consistently exceeded our expectations, delivering efficient results for various prompts.
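One plausible contributor to this difference is memory. The article does not report memory figures, so the following is only back-of-the-envelope arithmetic: weight memory is roughly the parameter count times the bytes stored per weight. The bit widths below are assumptions about common storage formats, and the estimate ignores runtime overhead such as the context cache.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 weights) * bytes_per_weight / 1e9 bytes per GB
    return params_billions * bytes_per_weight


if __name__ == "__main__":
    for size in (7, 13):
        for bits in (16, 8, 4):  # assumed storage precisions
            print(f"{size}B at {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

Under these assumptions, a 7B model stored at 4 bits per weight needs about 3.5 GB for its weights, while a 13B model at 16 bits needs about 26 GB, which is more than many everyday workstations can spare.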

Recognizing the potential of Zephyr 7B Alpha, we further tested its adaptability and effectiveness on the local machines at our disposal. Our goal was to evaluate how the model performed under different hardware conditions, including variations in CPU and RAM. Throughout this testing, we measured the model’s response speed for a range of developer tasks, and the results were promising.
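A response-speed check of this kind could be sketched as follows. This is an illustrative harness rather than the one used in the tests: `generate` stands for any callable that sends a prompt to a local model, and both it and the statistics chosen here are assumptions.

```python
import time
from statistics import median
from typing import Callable, Iterable


def time_responses(generate: Callable[[str], str], prompts: Iterable[str]) -> dict:
    """Time each call to `generate` and return latency statistics in seconds."""
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        latencies.append(time.perf_counter() - start)
    if not latencies:
        raise ValueError("no prompts supplied")
    return {
        "count": len(latencies),
        "median_s": median(latencies),
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    def fake_generate(prompt: str) -> str:
        # Stand-in for a real local model call.
        time.sleep(0.01)
        return "stub"

    print(time_responses(fake_generate, ["Write a unit test", "Explain this stack trace"]))
```

Running the same prompt set on each machine and comparing the median and the worst case gives a like-for-like view of how hardware affects responsiveness.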

Figure: Performance of Zephyr 7B Alpha across different machines

Its balance of response speed, accuracy, and comprehensiveness made this model a good choice for the various tasks in our testing. This finding opens up new possibilities for using self-hosted LLMs effectively and without heavy hardware demands.

Step 3. Selecting the Right UI Environment

When preparing for our testing, we also conducted research to determine the most suitable environment for running tests. From our deep dive into self-hosted LLM solutions, we understood that the runtime environment is crucial not only for facilitating easy model interactions but also for ensuring that Software Developers, QA Engineers, Business Analysts, and DevOps Engineers can execute their tasks smoothly. Therefore, our objective was to identify a setting that is user-friendly for all user types. Our selection of the runtime environment was guided by a set of important criteria.

Figure: A comprehensive list of criteria for selecting a runtime environment

To ensure fairness in our evaluations, we considered and tested a diverse set of environment tools. Notably, GPT4All showed the most promising results. Unlike some tools that require several setup steps, such as running through Docker, GPT4All is straightforward to install. It’s available for the most popular operating systems, including Windows, Linux, and macOS, and this broad platform support makes it a practical choice. Additionally, GPT4All is compatible with various models, offers a user-friendly interface, and stores the dialogue history. The tool is also actively maintained and licensed for commercial use.
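The testing described here used the GPT4All desktop interface. For teams that want to script the same kind of interaction, GPT4All also publishes Python bindings (the `gpt4all` package). The sketch below is a minimal example under stated assumptions: the wrapper name `ask_local_model` is hypothetical, and `model_file` must be the name of a model file you have chosen yourself, since the article does not name one.

```python
def ask_local_model(prompt: str, model_file: str, max_tokens: int = 256) -> str:
    """Send one prompt to a local model through the gpt4all Python bindings."""
    # Imported here so this module still loads where gpt4all is not installed.
    from gpt4all import GPT4All

    # By default the bindings may try to download a model file
    # that they cannot find locally.
    model = GPT4All(model_file)
    with model.chat_session():
        return model.generate(prompt, max_tokens=max_tokens)


# Example call (requires `pip install gpt4all` and a model file of your choice):
# print(ask_local_model("Write a unit test for a string-reversal function.",
#                       "<your-model-file>.gguf"))
```

Wrapping the call in a plain function keeps the rest of a test script independent of the runtime, so the same prompts can be replayed against another tool later.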

Conclusions

When it comes to privacy and security, self-hosted LLM solutions stand out as a balanced alternative, harmonizing user experience with privacy needs. They enable users to maintain better control over their data and reduce the risks associated with sharing sensitive information with third-party entities.

The performance of self-hosted LLMs depends on various factors, such as the model’s size and the hosting hardware’s capacity. Our testing revealed that while bigger models may pose challenges when deployed in local environments, smaller models like Zephyr 7B Alpha offer a more effective and responsive solution across different tasks. Zephyr 7B Alpha proved to be a good choice for local hosting, offering fast and accurate responses for various software development tasks, which makes it a reliable option for those prioritizing both performance and data privacy.

The runtime environment for self-hosted LLM solutions is also crucial for ensuring effective user interaction. It needs to balance simplicity and user-friendliness while accommodating various licensing models to meet diverse user needs. Applying our selection criteria led us to choose GPT4All as the tool for our testing.

Selecting the right LLM for your specific use case and environment is crucial to achieving good results and making full use of these models. The best LLM solution is one that meets your expectations for data privacy, security, and efficiency. In our examination of self-hosted LLMs, which focused on their performance and their potential to enhance project efficiency, Zephyr 7B Alpha stood out as the strongest choice.