Building AI-powered software engineering tools: Essential commercial considerations for founders

Published in

Innovation Endeavors

Jun 20, 2024

In 2022, GitHub caught lightning in a bottle by releasing GitHub Copilot, its AI coding assistant. Today, over 37,000 businesses, including a third of the Fortune 500, use the product, which can reportedly help developers code as much as 55% faster. This, however, is only the tip of the iceberg in terms of what’s possible when AI is applied to software engineering. Many aspiring founders, themselves engineers, want to get in on the ground floor of this industry revolution by bringing products to market that drive productivity gains across the software development lifecycle (SDLC).

We spent several months meeting researchers and companies pushing the boundaries of AI-enabled software engineering, from code generation and testing to migrations and beyond. In this two-part series, we share our guide to building AI-powered developer tools and discuss the areas we are most excited to see disrupted by this technology.

In this post, a handbook for founder CEOs, we cover the key business and commercial decisions that shape how startups in this space are built, along with a few opportunities that we think could lead to big companies.

If you’d like to dive deeper into the technical considerations around design patterns and the challenges of bringing AI into software engineering tools, check out our previous post: “Building AI-powered software engineering tools: Essential technical considerations for founders.”

Business model considerations

We believe there are six factors founders should consider when trying to scale a developer productivity tool into a company: value proposition, pricing model, selling tools vs. selling results, proof of value, recurring revenue, and scaling go-to-market.

Value proposition

We generally see three ways developer productivity tools create value for their customers:

They let developers offload less desirable tasks

Engineering tasks that are tedious, lower-impact, or lower-visibility are often considered less desirable than those that are seen as cool, present a high degree of technical challenge, or can lead to bonuses and promotions. For these tasks, engineers are highly motivated to bring in tools that let them better leverage their time by automating substantial portions of the work. Examples include testing (Momentic, CamelQA, and Nova), documentation (Mutable.ai), code reviews (Bito.ai), and security fixes (Pixee). It’s not that these tasks are unimportant; rather, engineers have less motivation to spend time on them than on delivering highly anticipated features or cost-slashing code optimizations.

They pull forward deferred work

In a resource-constrained environment, engineering leaders consider many tasks important but not urgent, and those tasks therefore fall below the current sprint cutline. In these situations, a new company can come in, perform the tasks that would have been deferred to a future sprint, and share in the value created. Code hygiene and tech debt are good examples. Savvy engineering leaders know that all tech debt eventually comes due, but they struggle to prioritize it against immediate feature work. Many try to dedicate a fixed percentage of sprint time to addressing tech debt, but this approach is crude and inefficient. That gap creates a strong incentive to bring in a company like Grit.io, Second.dev, or ModelCode to handle these projects.

They upskill existing team members

A developer’s skill can be captured in many ways, but perhaps the three most relevant are (1) their total coding throughput, (2) the quality of their code, and (3) their ability to solve hard technical problems. For some tools, the core pitch is to broadly improve the engineering team’s skills along these attributes rather than to accomplish specific tasks like a code migration or a security patch. For example, coding copilots like Augment or GitHub Copilot cut down on the time required to plan and write code, increasing the effective throughput of individual devs. Code optimization tools such as Espresso AI and CodeFlash make it easier for devs at all skill levels to write optimized code, which improves the overall quality of the codebase. Debugging tools like Goast or OneGrep help all devs investigate issues like experienced engineers. One benefit of tools with this value proposition is that a recurring contract is relatively easy to justify: the tool becomes deeply ingrained in the developer’s workflow and is used often, so a subscription or recurring license model makes a lot of sense.

Pricing model

Once you’ve considered value creation, the next step is to consider value capture, which is your pricing and revenue model. Ideally, you want to price so that you win when your customer wins.

Seat-based pricing: This works well for products whose value can be clearly attributed to individual developers. Copilots, for example, are used by individual engineers and therefore make sense to price on a per-seat basis.
Outcomes-based pricing: This is ideal for products that deliver value at defined milestones and whose ROI is recognized as a step function, with most of the value accruing once the core job-to-be-done is complete. For example, a code migration tool can be priced on a per-project basis because nearly all of the value is realized once the migration is complete.
Pay-as-you-go: Pay-as-you-go is the bread and butter of infrastructure companies, often manifesting as compute-based or storage-based pricing. This model appeals to customers who prefer not to be locked into a fixed contract. Examples include charging by lines or size of code transformed for code modernization companies, or by bug fixes accepted for automated program repair companies. A variant of this model is a “credits” system, which is effectively compute-based pricing. For example, you may sell a bundle of 50 “credits” that are consumed whenever users invoke AI features; this lets revenue scale with infrastructure costs, which can be significant for AI applications.
Tiered licenses: This is the most freeform of the pricing strategies and generally consists of several tiers of annual licenses based on the size of the customer and their level of pain. Pricing should scale with the size of the customer (so an enterprise pays more than a scale-up), though usually not linearly. This structure lets you expand revenue as customers grow while giving them a predictable bill.
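To make the credits variant of pay-as-you-go more concrete, here is a minimal sketch of a prepaid credit bundle. The 50-credit bundle comes from the example above; the class name `CreditBundle`, the feature names, and the per-feature costs in `FEATURE_COST` are hypothetical and chosen only for illustration, not taken from any vendor's actual scheme.

```python
from dataclasses import dataclass, field

# Hypothetical per-feature credit costs. In practice these would be set to
# track the underlying inference cost of each AI feature.
FEATURE_COST = {
    "autocomplete": 1,
    "code_review": 5,
    "migration_step": 10,
}


@dataclass
class CreditBundle:
    """A prepaid bundle of credits consumed by AI features (illustrative)."""

    credits: int = 50
    usage: list = field(default_factory=list)

    def consume(self, feature: str) -> bool:
        """Deduct the feature's cost; return False if the balance is too low."""
        cost = FEATURE_COST[feature]
        if cost > self.credits:
            # Not enough credits: a real product might prompt an upsell here.
            return False
        self.credits -= cost
        self.usage.append((feature, cost))
        return True


bundle = CreditBundle()
bundle.consume("code_review")     # 50 -> 45
bundle.consume("migration_step")  # 45 -> 35
print(bundle.credits)             # 35
```

The design point worth noting is that heavier features cost more credits, so a customer who leans on expensive AI workflows exhausts the bundle faster and buys more, which is what lets revenue follow infrastructure spend.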

Selling tools versus selling results

Some customers want the end results that a tool promises but are reluctant to allocate the corresponding financial and headcount budget needed to operationalize the tool itself. In short, they really want fish, not a fishing rod.

A concrete example is high-complexity migrations. A segment of the market will happily pay a consultancy like Accenture or Wipro to migrate their IBM mainframes to AWS. They outsource this work today precisely because they don’t have the willingness, skill, or bandwidth in-house to take on that project. For this segment, a tool their team could use to do the migration themselves is uninteresting; they want to pay for outcomes.

At the same time, another segment of this market has been burned by consultants in the past and is deeply skeptical of accepting finished code from third parties. These teams don’t want a “skip the line” option for items lower on their backlog; they would much rather invest in better tooling that helps their own employees be more productive and achieve the organization’s goals faster. Selling a tool to this segment is a higher-velocity, higher-margin business than selling services.

Proof of value

One of the critical stages in the sales process is the proof-of-value step, in which customers verify that your tool can “walk the walk”. We can think about the complexity of proving value across three tiers.

The lowest tier of complexity would be cases where an individual developer can evaluate the tool independently and see its value without needing IT approval. An AI-enabled IDE like Cursor is a great example: a developer can try it for a personal side project and develop conviction in the usefulness of the product before taking it through the approvals necessary to allow it to be used on company data.

The next tier of complexity is when the product needs IT approval but is still relatively easy to test and validate. These products only show their true value once they are integrated with the company’s internal systems: think AI-assisted debugging tools like Goast or OneGrep, or end-to-end testing tools like CamelQA or Blinq. Because they must be integrated with confidential company data, IT needs to be involved to validate the security and compliance of those tools. However, these tools are relatively easy to test and validate because they are simple to integrate, show value soon after integration, and enhance pre-existing engineering workflows.

The most complex proof of value is one that requires IT approval and significant investment to test and validate. One example is the code modernization space (companies like Second and Grit). These tools are authorized to make large, sweeping changes to codebases, so IT will require extensive security reviews and controls before granting full access. Engineers will need to spend time prioritizing and assigning use cases to the AI, guiding the AI agent, and reviewing pull requests. Depending on the complexity of the modernization or migration, the tool’s value may only become clear months or quarters after initial onboarding. Such products often command high annual contract values (ACVs) and strong stickiness once trust develops between customer and vendor, but navigating this complex sales motion requires a skilled go-to-market (GTM) team.

Tactically, the difficulty of proving value affects how companies should think about free trials, proofs of concept (POCs), and paid pilots. If a product falls into the first tier (easy for an individual developer to try), it may be worthwhile to offer a very generous free trial or “free-forever” tier to entice individual developers to kick the tires. In the second tier, where IT needs to get involved, a free POC can encourage your champion to work through internal approvals to onboard your product; because the product is easy to integrate and shows value quickly, the free POC does not dramatically lengthen your sales cycle or raise customer acquisition cost. In the third tier, a paid pilot is usually necessary to secure customer buy-in (skin in the game) and justify your support and onboarding costs.

Recurring revenue

Many workflows in software engineering tend to be one-time tasks or have lower residual value relative to their initial value creation opportunity. As an example, consider the category of “prompt-to-app” tools. In this case, the initial workflow is quite magical because AI automates a substantial portion of the overhead required to set up a web app. Once you have a strong starting point or boilerplate code, though, the job is complete, and the willingness to pay becomes primarily driven by other needs like hosting and maintenance, which may command different ACVs. By contrast, AI issue triage and bug fixes lack this self-cannibalization mechanic and could have more consistent recurring revenues as new bugs and issues are always being created.

Although non-recurring contracts can be quite large, the pressure to re-earn your revenue every year leaves a company vulnerable to disastrous, unpreventable churn. Ideally, SaaS businesses want customer revenue to expand, not decline, with contract renewals supplementing new logo growth. There are, of course, exceptions to every rule: large contract sizes or an extremely long tail of new projects per account are both ways around non-recurring contracts. But, broadly speaking, this is an area where founders should tread lightly.

Scaling go-to-market

There are generally two ways of tackling a market: go after a select few high-ACV contracts or mass-market low-ACV contracts.

In the former case, you should plan on selling to large enterprises. You will most likely engage in a traditional 6–12 month sales cycle involving many senior architects or decision makers (usually the CTO/VP Engineering), countless security and compliance reviews, and rigid, formal procurement processes. The customer is likely to have a fair amount of legacy infrastructure and demand a high degree of support and customizability, but will hopefully be less likely to churn and may scale revenue quickly if your product proves valuable. For your efforts, the customer should pay you on the order of $100K–1M a year. This means, however, that the problem needs to have executive-level visibility, impact, and prioritization.

In the latter case, you should plan on selling to individual developers, most likely at Series B–D high-growth companies. Your product should incorporate some element of self-service so you can grow through product-led growth (PLG) and viral word of mouth via Hacker News or Twitter/X. This also means you should take cues from how consumer companies think about metrics, activation, and retention. The product should be easy to try with limited security and compliance overhead, since individual devs may be less motivated to jump through hoops to onboard it. It should also have a well-orchestrated activation and onboarding funnel to ensure that people who sign up convert into long-term customers. Ideal pricing is likely on the order of $20–100/developer/month, or roughly a $25K–100K contract for mid-size or growth-stage companies.
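As a quick sanity check on those ranges, the arithmetic below estimates how many paid seats each end of the range implies. It is a sketch that assumes flat per-seat pricing billed monthly over a 12-month contract with no volume discounts; the function names are ours, chosen for illustration.

```python
import math


def annual_contract_value(seats: int, price_per_seat_month: float) -> float:
    """Yearly revenue from flat per-seat pricing billed monthly."""
    return seats * price_per_seat_month * 12


def seats_needed(target_acv: float, price_per_seat_month: float) -> int:
    """Smallest number of seats whose yearly billing reaches target_acv."""
    return math.ceil(target_acv / (price_per_seat_month * 12))


# Low end of the range: $20/developer/month toward a ~$25K contract.
print(seats_needed(25_000, 20))    # 105
# High end of the range: $100/developer/month toward a ~$100K contract.
print(seats_needed(100_000, 100))  # 84
```

Under these assumptions, both ends of the quoted range correspond to roughly 80 to 105 paying developers per account, which plausibly matches the engineering headcount at a mid-size or growth-stage company.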

Given these options, here are a few questions to keep in mind when deciding which kind of company you want to build:

Does your product require executive-level approval? If so, does it have executive-level impact?
Does your problem resonate widely in the developer community? Is it something that developers would want to share with their peers?
Is the solution consistent or highly customized from customer to customer? Is it something that can be made easy to try? Is it something a developer might initially test on a side project?
Would you rather manage more account executives or developer relations engineers? Should you think about success in terms of “millions of developers” or “hundreds of companies”?

Exciting future areas

The space of developer productivity tooling is quite nascent, and it is difficult to predict exactly what new capabilities will emerge over even the next few months. However, we thought it would be helpful to discuss four broad areas, one research and three applied, that we find particularly compelling.

Code-specific artifacts in foundation model architectures

From a technical capabilities perspective, one foundational area of research that we expect will garner increased attention in the next few years is incorporating more code-specific artifacts into model architectures. Today, we use the same model architecture to learn both natural language and code. Hence, LLMs like StarCoder treat code as simply another language and rely on higher-quality training data as a primary means of improving coding performance. However, it is intuitive that the “grammar” and behavior of code are unique and deserve special attention in model architectures. As an example, consider the architecture proposed by Ding et al., which adds layers to a coding model to learn intermediate execution states and code coverage, improving performance on certain code understanding tasks relative to competing models.

Services businesses monopolizing specialized markets

Wipro’s gross revenue during the 2023–2024 fiscal year was close to $11B, and we believe some portion of this revenue can be disrupted by AI. Already, we see companies like Mechanical Orchard tackling mainframe-to-cloud migration, a need that spans industries as diverse as manufacturing, mining, accounting, insurance, pharma, chemicals, and oil and natural gas. Many other projects in scope for consultants can likely be automated with AI, including vendor integrations, data migrations, service re-architectures, optimizations, and custom applications. Existing foundation models and AI agents, trained mostly on modern enterprise SaaS codebases from agile engineering cultures, may not be effective in these markets or for these applications, and it is worth exploring whether this gap creates room for new companies.

SRE workflows

Production outages and incidents have several characteristics that make them particularly painful. First, only a small fraction of logs, metrics, and other observability data is ever used, yet total data volumes are increasing exponentially, making root cause analysis an ever harder needle-in-a-haystack problem. Second, the worst outages involve multiple teams and services, and few people understand the full context of every subsystem involved; the current approach is to convene an incident bridge with hundreds of engineers. Third, incidents and outages directly affect revenue. Facebook, for instance, notoriously lost close to $60 million in revenue due to an outage in 2021. Finally, observability costs are skyrocketing, meaning customers are paying more while still suffering outages.

There is a lot more to be said about the new tooling the observability space needs, but suffice it to say that AI agents, such as the one from Cleric, could dramatically disrupt how incidents are triaged and managed today. In particular, we find it interesting that (1) automation can ingest more data at scale and develop a holistic picture of a system’s health, (2) AI agents can iteratively explore complex systems and bring in functional experts as needed by integrating with Slack, and (3) AI can dynamically identify which data is most valuable to keep versus discard, and adjust the volume of data collected in response to particular incidents.

Validation and verification

The methods we use to validate code today consist primarily of unit testing and static analysis. However, as AI-generated (or AI-modified) code comes to make up the majority of our codebases, the next problems to solve will be verifying and validating that code at scale.

We believe the next large company in this category will be one that compounds insights across several related domains (AI, compilers, static analyzers, profilers, and so on) to deliver a complete end-to-end code quality platform. We imagine the core features of this product would look something like this:

Integrates from the IDE through CI/CD to catch issues at all stages of the SDLC
Provides coverage across security, performance, and functional testing
Generates, prunes, and maintains a suite of unit and integration tests
Augments AI-driven evaluation with deterministic evaluation from static analysis, compilers, or formal methods
Leverages various compute optimizations to run evaluations efficiently
Is extensible enough to interact with or guide other AI agents that may be operating on the same codebase
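To illustrate the idea of augmenting AI-driven evaluation with deterministic checks, here is a minimal sketch for Python snippets. It uses the standard library's `ast.parse` as a deterministic syntax check and runs caller-supplied tests as a functional check; the `ai_score` argument is a hypothetical stand-in for a model-based reviewer's confidence, and the function names and 0.8 threshold are illustrative assumptions rather than any product's actual design.

```python
import ast
from typing import Callable, Optional, Tuple


def deterministic_checks(source: str, tests: Callable[[dict], None]) -> Optional[str]:
    """Return None if the code parses and passes tests, else a failure reason."""
    try:
        tree = ast.parse(source)  # deterministic syntax check
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"
    namespace: dict = {}
    try:
        # NOTE: exec runs unsandboxed here; a real system would isolate it.
        exec(compile(tree, "<candidate>", "exec"), namespace)
        tests(namespace)  # caller-supplied functional tests
    except AssertionError:
        return "test failure"
    except Exception as exc:
        return f"runtime error: {type(exc).__name__}"
    return None


def accept_change(
    source: str,
    tests: Callable[[dict], None],
    ai_score: float,  # hypothetical AI reviewer confidence in [0, 1]
    threshold: float = 0.8,
) -> Tuple[bool, str]:
    """Deterministic checks act as a hard gate; the AI score is consulted after."""
    reason = deterministic_checks(source, tests)
    if reason is not None:
        return False, reason
    if ai_score < threshold:
        return False, "AI reviewer confidence too low"
    return True, "accepted"


def check_add(ns: dict) -> None:
    assert ns["add"](2, 3) == 5


print(accept_change("def add(a, b):\n    return a - b\n", check_add, ai_score=0.95))
# (False, 'test failure')
```

The ordering is the key design choice in this sketch: a confident AI reviewer cannot override a failed compile or test, so the deterministic signal bounds how much trust is placed in the model.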

Conclusion

In this piece, we covered the core elements that define how an idea or problem statement can form the basis of your business model for a large, independent company. One way to summarize these learnings would be to think in terms of two guiding questions:

#1: How and when will value get created for your customer?
How will your customer measure ROI? Do you continuously deliver value over time or at discrete intervals? How will you align the value creation (ROI) and value capture (pricing)? How do you ensure the customer embraces the product and is positioned to be successful with it?
#2: Will your champion be the CTO/VP Engineering or an individual developer?
In the former case, are you delivering enough ROI and ACV to justify a long, involved sales cycle? In the latter, can you grow virally through PLG, create value for individual developers, and make sustainable margin on per-developer pricing?

Although the ideas in this post can help you craft a business model to bring an AI-powered developer tool to market, there are a number of nuanced technical decisions and tradeoffs that founders also need to navigate in order to find product-market fit. For that reason, we shared a companion guide for founder CTOs that covers the common design patterns and challenges we see across the industry today.

If you’re building in this space or would like to discuss some of the ideas in this post more, we would love to hear from you. Reach out to us at diyer [at] innovationendeavors [dot] com or harpi [at] innovationendeavors [dot] com.

We would like to thank the many founders and experts who helped us with this research, in particular Momentic, Goast, CamelQA, PolyAPI, Second, Nova, Flytrap, Mutable, and Davis Treybig.