A personal AI assistant for developers that doesn't phone home

tl;dr

It’s no surprise that developers are looking for ways to include powerful new technologies like AI Assistants to improve their workflow and productivity. However, many companies are reluctant to allow such technology due to concerns about privacy, security and IP law.

This article addresses the concerns about privacy and security and describes how to deploy and integrate such an AI assistant that doesn’t phone home. We will also discuss open large language models (LLMs) and their training datasets and licenses.

Note that all code used in this article can be found in the following GitHub repository: https://github.com/sa-mw-dach/dev_demos/tree/main/demos/05_ai_assistant

AI assistants for Application Development

Artificial Intelligence (AI) assistants have the potential to revolutionize application development by enhancing productivity and streamlining workflows. For developers, an AI sidekick can act as a coding companion, offering real-time optimizations, automating routine tasks, and debugging on the fly. It can intelligently scan code for errors, suggest best practices tailored to your coding style, and handle version control with ease.

While an AI can provide valuable insights and speed up development, it is not infallible. When we asked the AI that helped write this article whether it hurt to write that last sentence, it answered candidly:

As an AI developed by OpenAI, I don’t have feelings, so there’s no hurt involved in acknowledging my limitations. It’s important for users to understand that while AI can be an incredibly powerful tool, it is designed to augment human capabilities and not replace them. Mistakes can occur, and the oversight of a skilled developer is crucial to ensure the highest quality of software development. It’s all part of creating a balanced and realistic expectation for AI’s role in various domains, including software engineering.

Thus a developer must carefully review all outputs from the AI assistant, as it may not always be correct. Nonetheless, this technology can save a significant amount of time, particularly for experienced software developers, by offloading some of the cognitive load and allowing them to focus on creative problem-solving and innovation, leading to more efficient development cycles and higher quality software.

An AI assistant for developers typically integrates Large Language Models (LLMs) to understand and generate human-like text, code completions, and suggestions within the development environment. These LLMs are trained on extensive datasets that include code and natural language, enabling them to assist with a variety of tasks such as real-time error detection, code optimization, and even writing tests or documentation. The AI operates within the developer’s preferred IDE, offering intelligent support that adapts to individual coding styles and preferences, all while maintaining the efficiency and flow of the software development process.

Using an AI assistant in an air-gapped on-premise environment

The pervasive challenge with most Large Language Models (LLMs), including versions of ChatGPT, is their availability predominantly as cloud-based services. This setup necessitates sending potentially sensitive data to external servers for processing. For developers, this raises significant privacy and security concerns, particularly when dealing with proprietary or sensitive codebases. The requirement to “phone home” to a remote server not only poses a risk of data exposure but can also introduce latency and dependence on internet connectivity for real-time assistance. This architecture inherently limits the use of such LLMs in environments where data governance and compliance standards restrict the transfer of data off-premises or where developers prioritize complete control over their data and intellectual property.

The Visual Studio Code extension Continue offers a compelling answer to the data privacy and security challenge posed by cloud-based LLMs. Continue is marketed as an “open-source autopilot for software development” and, unlike many comparable tools, it can use local LLMs. By running a local LLM instance on an air-gapped, on-premise OpenShift cluster, developers benefit from AI assistance without transmitting data externally. Integrated into OpenShift Dev Spaces, Continue provides a seamless and secure development experience right within VS Code. Sensitive data never leaves the local infrastructure, yet developers still get the assistance of an AI pairing partner. This mitigates privacy concerns and lets developers harness AI’s capabilities in a more controlled and compliant environment.

Local Large Language Models (LLMs)

Open large language models (LLMs) are LLMs that are open source and/or have permissive licenses that allow for commercial use. This means that anyone can access, modify, and distribute these models, and use them to develop their own products and services. Usually this also means that the weights of these models are freely available on sites like Hugging Face and can be downloaded to be run locally.

Open LLMs are trained on massive datasets of text and code, which are often openly available as well. This openness allows anyone to contribute to the development of these models and to build on the work of others, e.g. by further fine-tuning the models and adapting them to their needs.

Some of the most popular open LLMs and their training datasets are:

Llama 2 (Meta AI): Trained on 2 trillion tokens from a mix of publicly available sources; Meta has not published the exact composition.
BLOOM (BigScience): Trained on ROOTS, a dataset covering 46 natural languages and 13 programming languages.
FLAN-T5 (Google AI): T5 pre-trained on C4, a cleaned subset of Common Crawl, and then instruction-tuned on the Flan collection of tasks.
RedPajama-INCITE (Together, University of Montreal, Stanford Center for Research on Foundation Models): Trained on RedPajama-Data-1T, an open reproduction of the LLaMA training data that combines web crawls, books, arXiv papers, GitHub code, Wikipedia and StackExchange.

Code generation models (CGMs) are a type of large language model that is specifically trained to generate code. CGMs can be used to generate code in a variety of programming languages, and they can be used to perform a wide range of tasks, including generating new code from scratch, completing existing code, fixing bugs in code and generating code documentation.

CGMs are also trained on massive datasets of code and text, which allows them to learn the patterns and rules of different programming languages. Many of them have additionally been fine-tuned on instruction datasets, which helps them perform well on the tasks mentioned above.

Open source large language models that are specifically trained to generate code are:

StarCoder(Base): StarCoderBase was trained on a subset of “The Stack 1.2”, a dataset created by BigCode consisting of permissively licensed source code from GitHub. StarCoder is StarCoderBase further fine-tuned on Python code.
Code Llama: a code-focused version of Llama 2, built by fine-tuning Llama 2 on its code datasets and sampling more data from them for longer. It supports many popular programming languages, including Python, C++, Java, PHP, TypeScript, C#, and Bash.
WizardCoder: an instruction-tuned code model trained with the Evol-Instruct method, originally built on StarCoder; its Python-focused variants are based on Code Llama.
CodeGen: a family of code models by Salesforce; its CodeGen2.5 release was trained on StarCoderData with a focus on fast inference.
StableCode: another CGM trained on “The Stack 1.2” dataset combined with instruction tuning, not Apache-2 licensed

Open source large language models for code generation offer a number of advantages over proprietary models. First, they are more transparent and accountable, as their weights and, in many cases, their training data and code can be inspected. Second, they are more accessible, as anyone can download and use them free of charge within the terms of their licenses. Finally, they are more sustainable, as the community can contribute to their development and maintenance.

Even though open source large language models for code generation are still under active development, they have the potential to revolutionize the way we write software. They can help developers be more productive and efficient and improve the quality of software, all while running locally.

Demo of a private personal AI assistant for developers

To showcase a personal AI assistant for developers in an air-gapped on-premise environment, the following demonstration uses this setup of software components:

Red Hat OpenShift (tested on version 4.11.50)
Red Hat OpenShift Dev Spaces (tested on version 3.9)
Continue VS Code extension (tested on version 0.1.26)
Local LLM Code Llama (13 billion parameters) served via Ollama
Optional when using GPUs:
- NVIDIA GPU Operator (tested on version 23.6.1)
- Node Feature Discovery Operator (tested on version 4.11.0)

First, the Continue VS Code extension is installed in OpenShift Dev Spaces. Then a local LLM is deployed to the OpenShift cluster, and Continue is configured to use this newly deployed local LLM. Once everything is set up, the private personal AI assistant is demonstrated on some common development use cases.

Installation of Continue in OpenShift Dev Spaces

We largely follow Continue’s instructions for installing it in an on-prem air-gapped environment, which can be found here.

Open OpenShift Dev Spaces & create an “Empty Workspace” (this will use the Universal Developer Image, UDI).

Download the VS Code extension here (choose “Linux x64”). If an internet connection is available, you can use it directly in OpenShift Dev Spaces; otherwise, store it in an artifact store of your choice that is reachable from OpenShift.

In OpenShift Dev Spaces, open the Extensions view on the left, click the three dots at the top of the “Extensions” panel and select “Install from VSIX”. Then select the previously downloaded Continue VSIX file (Linux x64, version 0.1.26) and confirm with “OK”.

Since the AI assistant is meant to run in an air-gapped environment, a “Continue Server” needs to run locally.

For experimentation purposes, the steps below run the Continue server manually inside the UDI container. In a production developer environment, one would instead start the Continue server automatically, for example in a container attached to the workspace.

First go to the Continue VS Code extension settings and select “Manually Running Server”. Then restart the OpenShift Dev Spaces workspace.

Next, inside the OpenShift Dev Spaces workspace, open a terminal, then install the Continue server from PyPI (version 0.1.73, which matches the Continue extension version 0.1.26) and run it using

pip install continuedev==0.1.73
python -m continuedev
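Before switching back to VS Code, it can be useful to confirm that the server is actually listening. The following minimal sketch uses only Python’s standard library to attempt a TCP connection. It assumes the server listens on localhost:65432, which is our assumption about the default port the 0.1.x extension expects; compare it with the server URL in the extension settings. The helper name is illustrative.

```python
import socket

# Assumption: the Continue server listens on localhost:65432; adjust both
# values if the extension settings point to a different server URL.
CONTINUE_HOST = "localhost"
CONTINUE_PORT = 65432


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out or host not resolvable.
        return False
```

In the workspace terminal, is_port_open(CONTINUE_HOST, CONTINUE_PORT) should return True while python -m continuedev is running.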

Optional: If an internet connection is available, test Continue right away. By default, Continue comes with a “GPT4 Free Trial” model that connects to OpenAI using Continue’s API key, so it is ready to use as soon as it can reach the internet.

This page contains information on how to use Continue and can be used to test Continue’s capabilities.

Adapt Continue to use a local LLM

Since the ultimate goal of this demo is to have a personal AI assistant for application development in a private on-prem air-gapped environment, the next step is to use a local LLM within Continue.

According to its documentation, Continue can work with multiple local models. For the demo at hand, the Ollama framework, which runs the local LLMs, serves as the interface to Continue, as described here and here.

The Ollama web server that provides communication with the local LLMs is deployed in OpenShift as follows.

Inside the OpenShift Dev Spaces workspace, open a terminal and clone the following git repo: https://github.com/sa-mw-dach/dev_demos.git

In the terminal, log in to OpenShift and switch to the namespace/project where the OpenShift Dev Spaces workspace is running (e.g. “user1-devspaces”).

Deploy the resources for the Ollama web server (a PVC for the Ollama data, a deployment running the Ollama web server in a container, and a service that exposes the web server) into this namespace/project by running

oc apply -f /projects/dev_demos/demos/05_ai_assistant/ollama-deployment.yaml
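Since the pod may need some time to pull its image and start, a small readiness poll avoids sending requests too early. This minimal sketch assumes that a plain GET on the service root returns HTTP 200 once Ollama is up. The fetch and sleep functions are injectable so the retry logic can be tested without a cluster; all names are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Service name and port as used in this article's ollama-deployment.yaml.
OLLAMA_URL = "http://ollama:11434"


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused or timeout.
        return 0


def wait_until_ready(url: str, attempts: int = 30, delay: float = 2.0,
                     fetch: Callable[[str], int] = fetch_status,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll url until it answers with HTTP 200 or all attempts are used up."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        # Only wait if another attempt follows.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

From the workspace terminal, wait_until_ready(OLLAMA_URL + "/") returns True as soon as the service answers.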

Note that this deployment pulls a container image from docker.io. In an air-gapped environment, the image needs to be pulled from a container image registry of your choice that is reachable from within the OpenShift cluster.

Now download the local LLM “codellama:13b” into the Ollama web server by opening a terminal in the OpenShift Dev Spaces workspace and executing

curl -X POST http://ollama:11434/api/pull -d '{"name": "codellama:13b"}'
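The same pull can also be scripted from Python, e.g. as part of an automated workspace setup. The sketch below mirrors the curl call with the standard library. It assumes that Ollama streams its progress as newline-delimited JSON objects with a "status" field and reports failures in an "error" field; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Optional

# Service URL from ollama-deployment.yaml in this article's repository.
OLLAMA_URL = "http://ollama:11434"


def build_pull_request(model: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the same POST /api/pull request as the curl command above."""
    body = json.dumps({"name": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def last_status(lines: Iterable[bytes]) -> Optional[str]:
    """Return the final "status" value from a newline-delimited JSON stream."""
    status = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        event = json.loads(raw)
        if "error" in event:
            raise RuntimeError(event["error"])
        status = event.get("status", status)
    return status
```

Something like with urllib.request.urlopen(build_pull_request("codellama:13b")) as resp: print(last_status(resp)) should print success once the model is downloaded, since iterating over the response yields one line per progress update.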

This pull requires the Ollama web server to have an internet connection. In an air-gapped environment, build a new container image that already contains the desired LLMs and use this image in ollama-deployment.yaml.

Lastly, the local LLM needs to be incorporated into Continue. To do so, open Continue’s config.py (~/.continue/config.py) and register the Ollama web server as described here by adding

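# Ollama LLM wrapper provided by the Continue server package
from continuedev.libs.llm.ollama import Ollama

...

config = ContinueConfig(
    ...
    models=Models(
        default=Ollama(
            # Model name as pulled into the Ollama web server
            model="codellama:13b",
            # Service name and port of the Ollama web server in OpenShift
            server_url="http://ollama:11434"
        )
    )
)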
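If Continue does not respond as expected, it can help to query Ollama directly and rule out the model server. This sketch assumes Ollama’s GET /api/tags endpoint (lists local models) and POST /api/generate with "stream": false (returns a single JSON object with a "response" field). It reuses the model name and URL from config.py above; the function names are illustrative.

```python
import json
import urllib.request

# Values mirror the Ollama(...) settings in config.py above.
OLLAMA_URL = "http://ollama:11434"
MODEL = "codellama:13b"


def model_available(tags_json: dict, model: str) -> bool:
    """Check whether `model` appears in the JSON returned by GET /api/tags."""
    return any(m.get("name") == model for m in tags_json.get("models", []))


def list_models(base_url: str = OLLAMA_URL) -> dict:
    """Fetch the list of models stored in the Ollama web server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.loads(resp.read())


def build_generate_request(prompt: str, model: str = MODEL,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to Ollama and return the generated text."""
    # Generous timeout: a 13B model without GPU can take a while to answer.
    with urllib.request.urlopen(build_generate_request(prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

For example, model_available(list_models(), MODEL) confirms that the pull succeeded, and print(ask("Write a Python function that reverses a string.")) returns a completion generated entirely inside the cluster.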

An example of an entire ~/.continue/config.py file can be found here.

This concludes the steps to incorporate a local LLM into Continue and OpenShift Dev Spaces. The result is a personal AI assistant for application development in a private, on-prem, air-gapped environment, which can be used as described on this page.

Alternative: using a local LLM directly within OpenShift Dev Spaces

Instead of deploying the Ollama web server directly in OpenShift as described in the previous section, it can also be deployed inside the OpenShift Dev Spaces workspace. Creating a new workspace from the devfile provided here starts the following two containers:

udi: A container based on the Universal Developer Image described above, which hosts the Continue server and is used for all other development tasks as well (such as running terminal commands).

ollama: A container based on the Ollama container image, which runs the Ollama web server and is additionally configured to use a GPU by setting nvidia.com/gpu: 1 in the container’s resource request. Because of this setting in the devfile, the ollama container (and thus the entire pod) is scheduled on an OpenShift worker node that hosts a GPU. This significantly accelerates inference of the local LLM and hence greatly improves the responsiveness of the personal AI assistant for developers.
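To see which worker nodes can host the GPU-backed ollama container, one can inspect each node’s allocatable nvidia.com/gpu resource, which is advertised once the NVIDIA GPU Operator is set up. The minimal sketch below parses the JSON output of oc get nodes -o json. It assumes you are logged in with permission to list nodes (typically a cluster administrator, not a regular Dev Spaces user); the helper names are hypothetical.

```python
import json
import subprocess
from typing import Dict


def gpu_nodes(nodes_json: dict, resource: str = "nvidia.com/gpu") -> Dict[str, int]:
    """Map node name -> allocatable GPU count for nodes that expose `resource`."""
    result = {}
    for node in nodes_json.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports extended resources as strings, e.g. "1".
        count = int(allocatable.get(resource, "0"))
        if count > 0:
            result[node["metadata"]["name"]] = count
    return result


def load_nodes() -> dict:
    """Run `oc get nodes -o json` (requires a logged-in oc session)."""
    out = subprocess.run(["oc", "get", "nodes", "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)
```

Calling gpu_nodes(load_nodes()) returns an empty dict if no node advertises GPUs, in which case the ollama container with its nvidia.com/gpu: 1 request cannot be scheduled.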

Please note that when running the Ollama web server directly within a workspace using the devfile.yaml, the basic installation and configuration steps described in the other sections of this page remain the same, except that the resources defined in ollama-deployment.yaml (see the previous section) don’t need to be deployed to OpenShift separately.

How to use a personal AI assistant for developers

The video below demonstrates how to use the personal AI assistant for application development set up above.

The repo, on which this article as well as the demo in the recording is based, can be found on GitHub.

Please feel free to contact us (Manuel & Steffen) with questions, remarks or ideas regarding AI assistants for developers, GPU usage on OpenShift and integration of AI into applications.