A Guide To The Booming Landscape Of Coding Assistants

Justin Milner
13 min read · Mar 18, 2024

In my previous article, I outlined the background, use-cases, and components of coding assistants. This article describes the major coding assistant platforms and overviews my experiences with each.

Important notes:

1) This article focuses on coding assistants, not AI software engineers.

2) Browser-based and IDE-based systems occupy significantly different roles. Browser-based systems act more as task/knowledge assistants, whereas IDE-based systems are closer to pair-programmers.

3) ‘Self-hosting’ refers to whether the coding assistant service can be run without access to the external internet.

First, browser-based systems:

ChatGPT

As of the time of writing (March 2024), the most capable coding model is OpenAI’s GPT-4, which can be interfaced with through its API or a ChatGPT Plus subscription.

The external software built around ChatGPT has been expanding rapidly. OpenAI calls customized GPT-4 variants “GPTs”. These variants are fine-tuned or prompted versions of the base model, guided to behave in a certain manner, taught to interface with external software, or both.

In January, OpenAI introduced the GPT Store, a platform for creating and sharing GPTs. There are now over 159,000 public GPTs (source: SEO.ai). A few GPTs that stand out for coders include:

Web Browser: For code/documentation with links to source.

When asked how to upload to Azure Blob Storage, Web Browser provides the answer and links to documentation.

ActionsGPT: For OpenAPI Specs.

Creating an accurate OpenAPI specification given the URL of the API owner’s website

Whimsical: Turn anything (including code logic) into diagrams

“Please create a diagram for the logic of the following python code”

Data Analyst: Can execute Python code and visualize data

Asking Data Analyst to read in a dataset, describe it, and explore a relationship between two attributes.

There are also now Team and Enterprise versions of ChatGPT available. These plans promise that user data won’t be trained on, allow internal creation and sharing of custom GPTs, raise or remove message caps, add security compliance (including SOC 2 compliance verified by external audit), and more. However, for IP-sensitive companies, there is no plan that involves self-hosting.

Takeaway: ChatGPT is the most capable, diverse, and mature coding assistant product on the market. Every programmer can find a way to gain productivity from it.

Anthropic’s Claude

Two weeks ago (March 4th), Anthropic released Claude 3 Opus, which improved dramatically upon the prior generation. The developing consensus is that Opus and GPT-4 are comparable in overall performance, including on coding tasks.

The continuously updated LMSYS leaderboard (a growing collection of over 300,000 human preference votes)

Opus does seem to have an advantage over GPT-4 in precise vision capabilities, as well as in large-context information retrieval. Its most noticeable disadvantage is that Claude currently lacks the large set of external tools that ChatGPT offers.

Takeaway: There are two main decision points:

  1. How often you use image prompts (Claude’s vision capabilities are better than GPT-4’s)
  2. How much you value external tools (ChatGPT has a much greater set of external tools than Claude)

Google Gemini

In February, Google rebranded their chat service from Bard to Gemini. The paid plan, Gemini Advanced, is currently powered by the Gemini 1.0 model.

The technical report for Gemini 1.5, which significantly outperforms Gemini 1.0, was also recently released. The chat service is expected to be updated to this model soon.

Despite the recent improvements in performance, both Gemini models are noticeably less capable at coding tasks than GPT-4 and Claude 3 Opus. Further, while Gemini has valuable extensions into a growing number of Google plugins, and access to certain categories of users’ Google data, it doesn’t have the breadth of external tools that the GPT Store offers.

Takeaway: While Gemini is a highly useful tool and is improving rapidly, ChatGPT and Claude will currently better serve most of the use-cases that browser-based coding assistants address.

Next, IDE-based systems:

GitHub Copilot

GitHub Copilot, and the remainder of the coding assistants in this list, are accessed as IDE plugins. Copilot has three plans, each powered by GPT-4 Turbo:

  • Individual: $10/month. Offers the majority of the development-level features.
  • Business: $19/month/user. Adds enhanced security.
  • Enterprise: $39/month/user. Enables company-specific fine-tuning, documentation search, embedding-based retrieval augmented generation (RAG), and a pull request summary feature.

GitHub groups Copilot’s developer-facing capabilities into three categories:

  • Chat
  • Code Completion
  • Smart Actions: inline chat, documentation search, commit message and pull request summary generation, and slash commands.
Inline chat. Source: github.com
Inline chat, quick fix recipe. Source: github.blog/KedashaKerr
Documentation search. Source: github.com
Commit message generation. Source: github.blog/KedashaKerr
Pull request summary generation. Source: github.com

Takeaway: Copilot is the most mature IDE-based coding assistant available. It is powered by the most capable coding model available (GPT-4) and has a large set of reliable tooling features that capitalize on integration with GitHub products. No self-hosted version is available, so you or your organization must be comfortable with the associated IP exposure.

AWS Code Whisperer + Amazon Q

Amazon’s coding assistant comes in two parts:

  • AWS Code Whisperer — the IDE plugin.
  • Amazon Q — a workplace assistant, which can be accessed from within Code Whisperer, as well as within other Amazon software platforms.

Code Whisperer offers code completion, chat, and a few built-in chat recipes.

Left: Amazon Q chatbar. Right: Shortcuts/recipes feature on highlighted code

Code Whisperer and Amazon Q have been designed with enterprise AWS customers in mind:

  • Their models have been optimized for understanding AWS APIs
  • Code generations include references to their likely source (to help avoid license infringement)
Generations include references with licensing information. Source: Youtube/AWS Developers
  • Managers can set policies on how much risk the models may take, or fully opt out of any suggestions resembling public code
  • Automated Java version upgrades (‘Amazon Q Code Transformation’)
  • Semantic-level security scans
Security scan. Source: Youtube/AWS Developers

AWS Code Whisperer and Amazon Q are free at the individual tier. The professional tier, at $19/user/month, provides policy management, embedding-based RAG, and ‘Amazon Q Code Transformation’. Amazon’s rights to train on user data can be disabled in both tiers.

Amazon Q Code Transformation is intended to automate Java version upgrades. The service is in preview, and AWS has noted that they are in the process of adding more language/framework capabilities. I did not test this feature, and there is a lack of third-party reviews online.

My experience with Code Whisperer has been positive, although it does not have the same performance capabilities as GitHub Copilot or any of the browser-based platforms in this list.

Here are three experiences which detail Code Whisperer’s capabilities:

  1. When given this algorithm problem, Code Whisperer can solve it. When prompted with the follow-up, or even when using the ‘optimize’ recipe, Code Whisperer impressively provides the improved solution.
  2. When given a more ‘real-world’ example of a Go function which extracts frames from a video and uploads each to S3, Code Whisperer does not seem to fully grasp the logic or intention of the code, despite these aspects being well documented in the class. For example: when asked to optimize the function, the model recommends only extracting one frame to improve efficiency, not seeming to understand that the purpose of the class is to upload all the frames.
  3. When tasked with providing a unit test for the following Go function, Code Whisperer generates a quality outline for the test but fails to recognize that the input object defaults to a zero size, which is an essential condition for the function to run fully. When prompted sequentially to ‘correct any issues’, Code Whisperer does not find the issue. However, when the source of the issue is pointed out, Code Whisperer can provide the solution of defining the size parameter.
Left to right: The origin of the conversation, the initial generated unit test, and the fix.
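The Go function itself isn’t reproduced here, but the failure mode in item 3 is easy to illustrate with a hypothetical Python analogue (everything below is invented for illustration): an input object whose size defaults to zero, and a test that only exercises the main code path once the size is set explicitly.

```python
import unittest
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical stand-in for the input object in the example above."""
    size: int = 0  # defaults to zero, the detail the generated test missed


def process(frame: Frame) -> str:
    # The function only runs fully when the input has a non-zero size.
    if frame.size == 0:
        return "skipped"  # the path a naive, defaults-only test never leaves
    return "processed"


class TestProcess(unittest.TestCase):
    def test_process_runs_fully(self):
        # The essential fix: define the size explicitly so the function
        # actually exercises its main code path.
        frame = Frame(size=1024)
        self.assertEqual(process(frame), "processed")
```

A test built only from default values would silently assert on the "skipped" branch and never cover the real logic, which is exactly the gap Code Whisperer failed to notice until it was pointed out.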

Further, Code Whisperer can be quite stubborn: it is more difficult to get the model to do what you want than it is with the platforms mentioned previously in this list. For example, when prompted to “add error handling” and provided a code snippet, Code Whisperer generates a list of how the errors could be handled rather than actually generating the code. Even when given the follow-up “Generate the code”, Code Whisperer doesn’t provide the full code in an easily usable format.

Takeaway: Code Whisperer‘s code completions are effective, it has quality security features, and it’s free for individuals! Further, it can be supplemented with customer-specific data in the professional tier. However, in comparison to alternative options, its chat model’s logical capabilities are lacking.

Tabnine

Tabnine started solely as a code-completion assistant, and has recently expanded their service to include chat and supporting features. Their product has three tiers:

  • Basic: Free. Code completions from a model which runs locally on your machine
  • Pro: 90-day trial, then $12/month/user. Code completions and chat from external models, basic admin tools
  • Enterprise: $39/month/user. Self-hosting (on-premises or VPC), models fine-tuned for your codebase, additional admin tools

Tabnine has perhaps the best documentation available among all coding assistant platforms. Additionally, their blog is extensive and covers Tabnine news as well as industry news.

They target a security-focused market — training their models exclusively on permissively licensed open source repositories, offering multiple deployment environments of varying security (including self-hosted on-prem and VPC), and providing admin tools with strict policy management capabilities.

Diagram of a VPC-hosted service. Source: Tabnine/docs/architecture/deployment-options

Tabnine does not utilize third-party APIs for their code generations. This enables them to provide self-hosted solutions, but it also limits the quality of their models.

In my experience, Tabnine chat was easier to guide and utilize (less stubborn) than Code Whisperer; however, the capabilities of Tabnine’s model were similar to, if not slightly below, Code Whisperer’s.

Takeaway: Tabnine’s models lack the capabilities to compete directly with Github Copilot — as a result, their Pro plan is somewhat of a low-end budget-Copilot. The Tabnine Enterprise plan distinguishes itself as a standout amongst platforms offering self-hosting. However, it is on the expensive end of the spectrum, and doesn’t support bring-your-own-model (meaning performance will be limited to Tabnine’s model).

Sourcegraph Cody

Sourcegraph began as a code/code documentation search company, and recently pivoted much of their focus to a coding assistant product. This history helps explain why their main selling point is focused on their platform’s RAG capabilities.

Cody performs RAG using embeddings, keywords, or a blended solution (recommended). The embeddings are sourced from a pre-built search index that users can initiate at the beginning of their session.

After selecting this option, an embedding index of the current repository is generated. Source: Cody AI Visual Studio Marketplace
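Blended retrieval of this kind can be sketched in a few lines. The following is a toy Python illustration, not Cody’s implementation: bag-of-words counts stand in for learned embeddings, and the 50/50 weighting between the two signals is an arbitrary assumption.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector standing in for a real learned embedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def keyword_overlap(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def blended_search(query: str, docs: list[str], alpha: float = 0.5) -> list[str]:
    # Blend embedding similarity with keyword overlap; alpha controls the mix.
    qv = embed(query)
    scored = [
        (alpha * cosine(qv, embed(d)) + (1 - alpha) * keyword_overlap(query, d), d)
        for d in docs
    ]
    return [d for _, d in sorted(scored, reverse=True)]


docs = [
    "timeout is set in the config loader",
    "the page renderer builds html templates",
]
print(blended_search("where is the timeout set", docs)[0])
```

Real systems use learned embeddings and indexed keyword search (e.g. BM25) over a pre-built index of the repository, but the ranking structure, a weighted blend of semantic and lexical scores, is the same idea.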

Cody offers two plans:

  • Pro: $9/month/user. Choose backend models from a variety of options. Unlimited chat, unlimited autocompletions, embedding-based RAG on local codebases
  • Enterprise: $19/month/user. Everything from Pro, the option to bring your own model, embedding-based RAG on non-local codebases (up to 10 repositories), self-hosting (soon)

Both Cody plans come with tools such as inline chat, a prompt recipe toolkit, custom recipe generation, and natural language file search.

Source: Cody AI Visual Studio Marketplace

I used Cody with its default model, Claude 2. I was very impressed with Cody’s ability to gather context, even in large repositories. Natural-language questions like “where is this variable set?” or “under what circumstances would this function be called?” were answered sufficiently the majority of the time; this is an especially weak area for most other platforms.

While I was very impressed with Cody overall, the platform has much maturing to do. There seem to be numerous bugs, and the user experience is somewhat difficult to understand. The source code does tag many of the features as beta, suggesting the team is fully aware the platform is in its early days.

Takeaway: Regardless of how effective a model is, if it isn’t given the necessary context, it cannot provide valuable responses. Cody’s context awareness not only makes the platform better, but enables/catalyzes numerous use cases. I found Cody to be extremely helpful in scenarios where coding assistants without quality RAG systems completely failed.

Codeium

Note: There are two ‘Cod_ium’ coding assistants, Codeium and CodiumAI. They are not associated.

Codeium is a small, rapidly growing, startup. The platform stands out for multiple reasons including: a free high-quality individual tier, a high-quality embedding-based RAG system, and self-hosting options.

The service has three tiers:

  • Individual: Free! Unlimited autocomplete and chat with their proprietary model (which they claim performs slightly better than GPT-3.5 on coding tasks)
  • Teams: $12/user/month. Admin dashboard/management. Personalization on your codebase. Multi-repository indexing.
  • Enterprise: $50/user/month. Self-hosting (on-prem/VPC). Fine-tuning on codebase(s).

Each tier also includes inline chat, recipes, and code lenses.

Left: The available options after selecting the ‘refactor’ code lens. Right: The generated diff after selecting the ‘add logging statements’ option.

For a free-of-cost product, Codeium’s individual tier is an incredible offering. Their proprietary model is below the level of GPT-4/Claude 3 Opus, but nonetheless provides valuable generations that don’t diminish the performance of the assistant. In my experience, Codeium’s free tier is significantly more useful than Code Whisperer’s.

Codeium’s inline chat
On right: A Codeium-generated (GPT-4) unit test for an LRUCache class (above) and display of ‘Codeium Explain Problem’ button. On left: Output of the button. Codeium recommends adding an import, and provides an insert button to do so.

Similarly to Sourcegraph Cody, Codeium provides an embedding-based context-awareness system. From my experience, Codeium’s context awareness seems just as effective as Cody’s. Further, the indexing seems to take less time, and the scope of the context can be customized within the repository if desired.

Takeaway: Codeium is the most productivity-enhancing coding assistant I have used. The platform has all the valuable features, access to best-in-class models, state-of-the-art RAG, isn’t buggy, and even offers self-hosting.

CodiumAI

CodiumAI has two distinctive products:

  1. “Codiumate”: an IDE-based coding assistant with the specialization of code testing.

Codiumate utilizes very interesting prompting/tooling techniques. When selecting the ‘Test this method’ code lens, Codium generates a code explanation and test suite for the selected code.

Code explanation
Test suite

The generated test suite includes chat bars to adjust the generated tests, as well as a set of possible test-case descriptions which could be generated to add coverage.

Further, for Python, there is an iterative ‘run and auto-fix’ feature in which generated tests can be run, and if a failure occurs, the platform uses the feedback from the failure to adjust the test case. In my experience with the platform, this appears to be highly effective.

Codiumate utilizing a failed test’s feedback to iteratively improve its generation.

  2. PR-agent: a Git extension, currently compatible with GitLab, GitHub, and Bitbucket, which provides auto-documentation, auto-labeling, and more forms of review.

PR-agent has many capabilities and is intended to help expedite the review process.

Takeaway: Python test/documentation generations, which make use of the auto-fix feature, are incredibly useful and unique among coding assistants. Unfortunately, this feature is only available for Python thus far. Further, the proprietary model Codium uses for chat and generation doesn’t seem to be quite on par with the top tier of coding assistants.

RefactAI

Refact is unique in that they provide an open-sourced basic version of their platform. This open-source project allows users to easily set up a local server hosting LLMs, which can be interfaced with through Refact’s IDE plugins.

Screenshot of the Refact server dashboard model hosting page

The Refact platform allows for model assignment, sharding, and GPU sharing, and is compatible with a large selection of LLMs.

Aside from their open-sourced product, they also offer three tiers:

  • Free: Enables the plugin to be configured with GPT-3.5 Turbo.
  • Teams/Pro: $10/month/user. GPT-4 chat.
  • Enterprise: Custom pricing. Load balancing for multi-user/multi-GPU setups. Model fine-tuning. Admin dashboard. Notably, a 3-month trial of the enterprise version is offered.

Takeaway: Refact is more tightly coupled with the open source community than other platforms. They offer more flexibility, and an easier way to test the waters of coding assistants. The platform isn’t flashy, and is working out bugs, but it offers the core components of what developers want in their coding assistant experience.

Continue.dev

Continue’s main product is a fully open-source IDE extension. They offer a request-count-based free trial with access to models like GPT-4, Gemini, and many more. After the trial ends, it’s up to you to host or supply your own model. Continue plans to monetize with a paid data engine, which hosts and continuously trains LLMs on user data.

Diagram of continue’s future data engine. Source: Y Combinator/Continue

Continue’s IDE extension offers code lenses/recipes, embedding-based RAG, autocomplete, and chat. Despite being a fully open-source project, my experience with the VS Code extension has been smooth and easy (the JetBrains extension, however, is significantly buggier). Further, their documentation is quite good.

The platform is highly configurable. Models can be accessed via external APIs, or with a locally-hosted LLM through providers like Ollama. Notably, Continue supports configuration with Azure’s OpenAI service, which allows companies to interface with OpenAI models through a contained VPC.
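As a sketch, a minimal `config.json` entry pointing Continue at a locally hosted Ollama model might look like the following. The field names follow my reading of Continue’s documentation at the time of writing and may have changed; the model tag is simply whatever you have pulled with `ollama pull`:

```json
{
  "models": [
    {
      "title": "CodeLlama 7B (local)",
      "provider": "ollama",
      "model": "codellama:7b"
    }
  ]
}
```

With a configuration like this, generations never leave your machine, which is the core of the cost-savings and privacy appeal described above.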

Takeaway: Continue is a fantastic option for cost savings and gives the user high levels of control over customization of their coding assistant solution.

Hopefully this article was useful to you! Please leave a comment if you think I left something important out, find an error/inaccuracy, or have any other feedback.

Youtube: https://www.youtube.com/@aiwithjustin2897

LinkedIn: https://www.linkedin.com/in/justin-milner-b190467b/
