Akvelon’s Survey: On-Premises AI for Engineering Teams

In the modern world of software development, Large Language Model (LLM) solutions are playing a bigger role day by day. They help accelerate work processes, automate actions, and enable simultaneous execution of different tasks. Thanks to the efforts of the leading LLM providers like OpenAI, Microsoft, and Google, which offer top-notch toolkits such as ChatGPT, Copilot, and Bard, automating tasks and speeding up their completion has become much simpler. However, with such an innovation comes the responsibility to address legitimate concerns and challenges that may arise during LLM implementation.

In this article, we’ll dive into common concerns associated with using LLM solutions in business and explore how to address them. We’ll share our insights from testing and evaluating the efficiency of self-hosted LLMs as an alternative that can offer stronger data protection.

Step 0. Resolving Common Concerns in LLM Solution Usage

Protecting intellectual property and securing the codebase is crucial for businesses that are wary of potential data exposure. Using LLM solutions like GitHub Copilot, ChatGPT, Bing Chat, or Google Bard involves sharing proprietary data and code with third parties. While there are methods to minimize the amount of sensitive data shared with these AI tools, concerns about the integrity of valuable information remain. Even with comprehensive data privacy measures in place, no method can fully eliminate the concern that a third party stores the requests sent to its LLM solution.

Additionally, compliance with HIPAA and GDPR may pose challenges. Industry-specific and general privacy regulations are essential for protecting sensitive data and ensuring user privacy and security, and they must also extend to the use of any LLM solution.

Given these concerns, it may seem that businesses with solid in-house software development have no choice but to exclude LLM solutions from their development toolset, with no substitute available. Fortunately, there are alternatives that may suit projects with particular security and privacy demands.

Aiming to find a solution that streamlines and enhances development processes without compromising data privacy and security, we explored running self-hosted LLM solutions locally on several types of workstations. Below, we share the results of our field research and tests on running LLMs on-premises, along with insights that can be applied to enhance your development flow.

Step 1. Unlocking Potential of Local LLMs

Self-hosted LLMs offer an alternative that addresses data privacy concerns and enhances control by keeping data within the organization’s environment. This option can be especially beneficial for companies with stringent privacy requirements.

We conducted our own survey to test the capabilities of locally run models, particularly focusing on their efficiency in managing tasks typical for key roles like Software Developers, QA Engineers, Business Analysts, and DevOps Engineers.

Key highlights of our model testing and assessment approach

#1 Throughout testing, our primary focus was on investigating the models' adaptability and accuracy in handling role-specific tasks. For transparent assessments, we created a comprehensive set of prompts reflecting recent tasks for each role in the development team.

#2 We conducted multiple rounds of testing, applying various criteria aimed at assessing the overall accuracy and the efficiency of generated outcomes.

#3 We evaluated model responsiveness across different machines by employing a scoring system based on assessments made using specialized prompts aligned with each role's typical tasks.

#4 To maintain objectivity and ensure the precision of our evaluations, we compared the model's responses against those from ChatGPT, which served as an impartial validator in our comprehensive assessment of the LLMs’ responses.
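
A role-based scoring approach like the one described above can be sketched in a few lines. This is only an illustration: the survey does not publish its rubric, so the 1 to 5 scale, the two criteria (accuracy and usefulness), the prompt IDs, and every score below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (role, prompt_id, accuracy 1-5, usefulness 1-5).
# The scale, criteria, and values are assumptions, not the survey's data.
SAMPLE_SCORES = [
    ("Software Developer", "dev-01", 4, 5),
    ("Software Developer", "dev-02", 3, 4),
    ("QA Engineer", "qa-01", 5, 4),
    ("DevOps Engineer", "ops-01", 2, 3),
]


def average_by_role(records):
    """Return {role: mean of all criterion scores recorded for that role}."""
    per_role = defaultdict(list)
    for role, _prompt_id, accuracy, usefulness in records:
        per_role[role].extend([accuracy, usefulness])
    return {role: round(mean(values), 2) for role, values in per_role.items()}


if __name__ == "__main__":
    for role, score in average_by_role(SAMPLE_SCORES).items():
        print(f"{role}: {score}")
```

Keeping one record per prompt, rather than only a total, makes it possible to re-aggregate later by machine, by model, or by criterion.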

Step 2. Assessing Performance Across Various Models

In our comprehensive survey of self-hosted LLMs, we focused on GPT models optimized for standard PCs. These models are tailored to operate smoothly on everyday workstations, including laptops and desktop computers, eliminating the need for high-end hardware.

We conducted a series of tests on various LLMs, from larger to smaller ones, to gather insights suitable for diverse needs. As already mentioned, we evaluated how each model performed across diverse use cases and scenarios, specifically tailored to different roles within the development team.

One of the key insights from the assessment is that larger models, particularly those with over 13 billion parameters, are difficult to run in local environments. These models are well-suited for precision tasks like complex code generation or unit test creation, but despite their advanced capabilities they showed limited performance and efficiency when run locally during our tests.

Conversely, we found that smaller LLMs with 7 billion parameters or fewer were better suited for local hosting. A notable example in this category is Zephyr 7B Alpha, which consistently exceeded our expectations, delivering efficient results for various prompts.
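
A back-of-the-envelope estimate helps explain why parameter count matters on a workstation: the weights alone need roughly parameters × bits per weight ÷ 8 bytes. The sketch below applies that arithmetic. The 16-bit and 4-bit precisions are assumptions chosen for illustration (the survey does not state which precision or quantization was used), and the estimate ignores the context cache and runtime overhead, so real memory use is higher.

```python
def estimated_weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GiB.

    Formula: parameters * bits_per_weight / 8 bytes. Excludes context
    cache, activations, and runtime overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Model sizes taken from the text; precisions are illustrative assumptions.
    for label, size in [("7B", 7), ("13B", 13)]:
        for bits in (16, 4):
            gib = estimated_weight_memory_gib(size, bits)
            print(f"{label} at {bits}-bit: ~{gib:.1f} GiB for weights")
```

Comparing the result with the RAM actually free on a laptop or desktop gives a quick first check before downloading a model.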

Recognizing the potential of Zephyr 7B Alpha, we further tested its adaptability and effectiveness on the local machines at our disposal. Our goal was to evaluate how the model performed under different hardware conditions, including variations in CPU and RAM. Throughout this testing, we assessed the model's response speed for a range of developer tasks, and the results were promising.
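
Response speed on a given machine can be measured with a small timing harness like the sketch below. It is not the harness used in the survey: `time_prompts` and `stub_generate` are hypothetical names, and the stub stands in for whatever call sends a prompt to the locally hosted model.

```python
import time
from typing import Callable


def time_prompts(generate: Callable[[str], str], prompts: list[str]) -> list[dict]:
    """Run each prompt through `generate` and record the wall-clock latency."""
    results = []
    for prompt in prompts:
        start = time.perf_counter()
        reply = generate(prompt)
        elapsed = time.perf_counter() - start
        results.append({
            "prompt": prompt,
            "seconds": elapsed,
            "reply_words": len(reply.split()),
        })
    return results


def stub_generate(prompt: str) -> str:
    """Placeholder for a local model call; swap in a real client here."""
    return "placeholder reply for: " + prompt


if __name__ == "__main__":
    for row in time_prompts(stub_generate, ["Write a unit test for a stack class"]):
        print(f"{row['seconds']:.4f}s, {row['reply_words']} words: {row['prompt']}")
```

Running the same prompt list on each workstation and comparing the recorded latencies gives a like-for-like view of how CPU and RAM differences affect responsiveness.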

Performance of Zephyr 7B Alpha across different machines

Its balance of response speed, accuracy, and comprehensiveness made this model a good choice for the various tasks within the scope of our testing. This finding opens up new possibilities for using self-hosted LLMs effectively and resource-efficiently.

Step 3. Selecting the Right UI Environment

When preparing for our testing, we also conducted research to determine the most suitable environment for running tests. From our deep dive into self-hosted LLM solutions, we understood that the runtime environment is crucial not only for facilitating easy model interactions but also for ensuring that Software Developers, QA Engineers, Business Analysts, and DevOps Engineers can execute their tasks smoothly. Therefore, our objective was to identify a setting that is user-friendly for all user types. Our selection of the runtime environment was guided by a set of important criteria.

A comprehensive list of criteria for selecting a runtime environment

Positive Criteria	GPT4All	oobabooga text-generation-webui	LM Studio	Ollama
Supports macOS, Windows, and Linux	Yes	Yes	Yes	Yes
Simple one-click installation (Docker option also acceptable)	Yes	Yes	Yes	Yes
User-friendly interface, resembling ChatGPT with minimal distractions	Yes	No	Yes	Yes
Stores chat history	Yes	Yes	Yes	Yes
Allows editing of chat messages	No	No	Yes	Yes
Enables chat history search	No	No	No	No
Supports formatted responses (e.g., tables displayed as tables, code highlights, etc.)	Yes	Yes	Yes	Yes
Allows adjustments to model parameters, including temperature	Yes	Yes	Yes	No
Licensed for commercial use	Yes	Yes	No	Yes
Released under open-source licenses such as MIT or Apache	Yes	Yes	No	Yes
Active project with recent updates and a robust community (evidenced by numerous stars and forks on the repository)	Yes	Yes	Yes	No
Compatible with GGUF models	Yes	Yes	Yes	Yes
Integrates with llama.cpp	Yes	Yes	N/A	Yes
Supports a prompt library (storing and utilizing prompt templates)	Yes	Yes	No	No
Supports a system message (helps set the behavior of the assistant)	Yes	Yes	Yes	No
Supports online search	No	No	No	No
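
One simple way to read the table is to count how many criteria each tool satisfies. The sketch below does that with the table's own values. Note the assumption: every criterion is weighted equally, which the survey does not claim to have done, so the tally is an aid for reading the table rather than the selection method itself.

```python
TOOLS = ["GPT4All", "text-generation-webui", "LM Studio", "Ollama"]

# One entry per table row, in table order; characters follow the TOOLS order.
# "Y" = Yes, "N" = No, "-" = N/A. Row labels are shortened for readability.
ROWS = {
    "Cross-platform": "YYYY",
    "One-click install": "YYYY",
    "ChatGPT-like UI": "YNYY",
    "Stores chat history": "YYYY",
    "Edit chat messages": "NNYY",
    "Chat history search": "NNNN",
    "Formatted responses": "YYYY",
    "Adjustable parameters": "YYYN",
    "Commercial use": "YYNY",
    "Open-source license": "YYNY",
    "Active project": "YYYN",
    "GGUF models": "YYYY",
    "llama.cpp integration": "YY-Y",
    "Prompt library": "YYNN",
    "System message": "YYYN",
    "Online search": "NNNN",
}


def count_yes(rows: dict, tools: list) -> dict:
    """Count the criteria marked "Y" for each tool (equal weights assumed)."""
    totals = dict.fromkeys(tools, 0)
    for answers in rows.values():
        for tool, answer in zip(tools, answers):
            if answer == "Y":
                totals[tool] += 1
    return totals


if __name__ == "__main__":
    for tool, total in count_yes(ROWS, TOOLS).items():
        print(f"{tool}: {total} of {len(ROWS)} criteria met")
```

If some criteria matter more for a given team, for example commercial licensing, replacing the flat count with per-row weights is a small change.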

To ensure fairness in our evaluations, we considered and tested a diverse set of runtime tools. Notably, GPT4All showed the most promising results. Unlike tools that require several extra steps to get running, for instance through Docker, GPT4All is straightforward to install. It is available for the most popular operating systems (Windows, Linux, and macOS), and this broad platform support makes it a solid choice. Additionally, GPT4All is compatible with various models, offers a user-friendly interface, and stores the dialogue history. The tool is also actively maintained and licensed for commercial use.

Conclusions

When it comes to privacy and security, self-hosted LLM solutions stand out as a balanced alternative, harmonizing user experience with privacy needs. They enable users to maintain better control over their data and reduce the risks associated with sharing sensitive information with third-party entities.

The performance of self-hosted LLMs depends on various factors, such as the model's size and the hosting hardware's capacity. Our testing revealed that while bigger models may pose challenges when deployed in local environments, smaller models like Zephyr 7B Alpha offer a more effective and responsive solution across different tasks. Zephyr 7B Alpha proved to be a good choice for local hosting, offering fast and accurate responses for various software development tasks, which makes it a reliable option for those prioritizing both performance and data privacy.

The runtime environment for self-hosted LLM solutions is also crucial for ensuring effective user interaction. It needs to balance simplicity and user-friendliness while accommodating various licensing models to meet diverse user needs. Evaluating the candidates against our criteria led us to choose GPT4All as the best fit.

Selecting the right LLM for your specific use case and environment is crucial to achieving optimal results and fully utilizing the capabilities of these models. The best LLM solution is one that meets your expectations for data privacy, security, and efficiency. In our examination of self-hosted LLMs, which focused on their performance and potential to enhance project efficiency, Zephyr 7B Alpha stood out as a prominent choice.

Oleg Oparin