Why I Started Building Apps with LLMs

Jeremy Huff
6 min read · Jul 20, 2023
Droidy McDroidface cartoon: An android rambling on with ChatGPT-like phrases.
Obligatory image of an android (since this is an article about AI). If you’re ever bored, try reading ChatGPT’s responses in the rambling voice of Gavin from Kids in the Hall. Artwork by me.

Software’s shift towards AI in the eight months since ChatGPT’s release has been one of the biggest transitions I’ve seen in my career. It has had an even deeper impact on my own thinking and development processes than the shifts to Mobile and Cloud did. As I started to recognize the novel superpowers of these Large Language Models (or LLMs) earlier this year, I too decided to jump on the AI bandwagon. The lure for me was certainly the potential applications of a powerful new technology, but it was also the neat-o new challenges it presents.

Most of my work focuses on application development (Mobile, Cloud, and Web), but I have developed AI models and have had the technology on my mind for the past seven or eight years. In 2016, our iOS development company, Hello World Engineering, encouraged clients to consider incorporating Machine Learning into their apps for carefully targeted use cases. We argued that small, simple, but effective models could be trained and deployed in a month’s time, provided there was enough good data and a desired feature valuable enough to justify the development effort.

Deep neural networks were just starting to take off, and ML was a game dominated by players with deep pockets for the teams of specialized engineers and the large computing resources they required. There were a few good pre-trained models available to build with; we had good success embedding OpenCV in an iPhone app, for instance. But working successfully with ML and AI generally meant making judicious decisions about which features were worth the upfront effort of data gathering and labeling, feature development, and many rounds of model training to get it right.

Today I have bought into the vision that LLMs will be one of the most transformative technology innovations for the next decade plus. While the old way enabled specific, well-defined AI functionality with an upfront development cost, today’s LLMs offer virtually limitless functionality with a development cost that is orders of magnitude less. What was a hefty technology development problem has happily crossed over into an application development problem.

Allow me to try to explain 🙂. The best overview I’ve seen to date of LLMs (how they work, what they can do, and where they are headed) is The Economist’s April 22nd edition article, Large, creative AI models will transform lives and labour markets (subscription required).

Screenshot of the article from The Economist.
Read this article if you can get your hands on a subscription to The Economist.

The basic story: LLMs are massive models, thoroughly self-trained on the bulk of the quality written content we have access to today. As such, they are black boxes whose inner workings are largely inaccessible to us, and the expanse of their capabilities is described as emergent: discoverable rather than planned and engineered. They have internalized the languages, literature, science, history, and so forth represented in that written corpus, and we can now access that “understanding” through “conversations” in our native language.

This difference is really appealing for someone like me. It fundamentally changes the role of an application developer who leverages AI from that of a planner into that of a prospector. The capabilities we’ll use for years to come already exist in today’s models, and it is our responsibility to think creatively about what those capabilities might be and to uncover, exploit, and apply them. It is an exciting challenge!

Like many, I have spent hours throwing challenges at ChatGPT and Bard in freeform chats to see what they might be able to handle. I have integrated GitHub Copilot into my development process and been delightfully surprised at what it figures out. I have also been dabbling with what these LLMs might be able to figure out from text extracted from photos. What might we be able to do with our smartphone cameras that was prohibitively difficult just a year ago?

Here’s an example that’s pretty typical of my experiments. I took a picture of the product label on a space heater I had lying around and asked ChatGPT and Bard a sequence of questions to see exactly how much data I could draw out of them. In practice, this kind of data flow could power an app that estimates the value of items you have around the house, or any number of other useful things with a little imagination.

To keep Droidy McDroidface from over-explaining everything to you, you often have to request what you want very piecemeal, doing what some call “prompt engineering”. But I was able to get some reasonably accurate and structured data that I could believably use in software and write to a database within maybe an hour of fiddling around.

Flow diagram with steps 1) Photo of Something, 2) Raw Text Extraction, 3) Sequence of Prompts Sent to LLM, and 4) Nicely Structured Data.
With the aid of focused prompts, LLMs can make pretty good sense of unstructured data through their general knowledge of the world and ability to infer. It didn’t get all the details right about this space heater, but it did extract many useful product details and estimate the price accurately.
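For the curious, here is roughly what that flow looks like in code. This is a minimal sketch rather than what I actually ran: it assumes the label text has already been pulled out of the photo (for example by an on-device OCR step), it uses the OpenAI chat completions API from the openai Python package as one possible backend, and the sample label text and the ask() helper are illustrative names of my own.

```python
import json
import openai  # 2023-era `openai` Python package (0.27-style API)

openai.api_key = "sk-..."  # your API key

# Raw text pulled off the product label by an OCR step (illustrative sample,
# not the actual text from my space heater).
extracted_label_text = """
ACME Model SH-1500 Ceramic Space Heater
120V AC  60Hz  1500W
Made in USA of US and imported parts
"""

def ask(messages, prompt):
    """Append one prompt to the running conversation and return the model's reply."""
    messages.append({"role": "user", "content": prompt})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,  # keep answers as repeatable as possible
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

# The "sequence of prompts": piecemeal requests, ending with a strict format.
messages = [{"role": "system",
             "content": "You extract product data from noisy OCR text. Answer concisely."}]
ask(messages, f"Here is text scanned from a product label:\n{extracted_label_text}\nWhat product is this?")
ask(messages, "Estimate a typical retail price in USD for this product.")
structured = ask(messages, "Return only a JSON object with keys: "
                           "brand, model, category, wattage, estimated_price_usd.")

record = json.loads(structured)  # in practice, validate and retry before trusting this
print(record)
```

Asking for the JSON only at the very end, after the model has already talked through the product in earlier turns, is one way to keep Droidy from padding the structured answer with explanation.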

I was able to get a decent level of performance quickly, sidestepping all of the overhead of data gathering and model training, and I didn’t have to write any real code, just prompts! This is a dramatically faster way to develop features, and I can use prompts to go after any objective I can dream up, so long as the models’ super broad and deep training can handle it. That lets us build a lot of valuable functionality within short development cycles.

What’s neat about it is that the core problem to solve is using the English language to guide a human-like system to produce the response that I want. That is a very different problem than writing compiled code, and it has some potential to be pretty fun.

I continue to experiment, build, and learn. In my experience, I have observed that LLMs have three killer features that sit at the core of how they are able to produce value:

  1. Proficiency across all languages: LLMs can work in any written language, whether that’s one of 100+ human languages, a programming language, mathematical notation, or otherwise. And they can work within language contexts such as writing a casual email, a legal document, or a poem.
  2. Generalized logical understanding: Despite their laughably poor grasp of the concept of facts, LLMs have deeply imprinted within them the capability to follow abstract logic and paths of reasoning.
  3. Broad general knowledge of our world: Besides letting them recall facts about nearly every subject, this knowledge gives LLMs the crucial capability to infer: to handle incomplete information, attribute likely meaning, fill in missing details, fix errors, and so on.

When these killer features are combined with what LangChain calls Data-Awareness and Agency, the picture emerges that we’re ready to build a lot of software structure around this new approach. Functionality like the price-guessing bot in the example can be refined to get things really right, and to do so at scale, in real products and services. I’ll get into exactly what I think those needed software structures are in the articles that follow.
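LangChain has its own abstractions for that kind of structure, and I’ll save those for the follow-up articles. As a plain-Python taste of what “refining to get things really right” can look like, here is a hedged sketch that wraps the earlier prompt sequence (reusing the hypothetical ask() helper from the sketch above) in a validation-and-retry loop, so that only well-formed records ever reach the database.

```python
import json

REQUIRED_KEYS = {"brand", "model", "category", "wattage", "estimated_price_usd"}

def extract_product_record(label_text, max_attempts=3):
    """Run the prompt sequence and retry until the model returns valid, complete JSON."""
    messages = [{"role": "system",
                 "content": "You extract product data from noisy OCR text. Answer concisely."}]
    ask(messages, f"Here is text scanned from a product label:\n{label_text}\nWhat product is this?")
    ask(messages, "Estimate a typical retail price in USD for this product.")

    prompt = ("Return only a JSON object with keys: "
              "brand, model, category, wattage, estimated_price_usd.")
    for _ in range(max_attempts):
        reply = ask(messages, prompt)
        try:
            record = json.loads(reply)
            if isinstance(record, dict) and REQUIRED_KEYS <= record.keys():
                return record  # structured data, ready to write to a database
        except json.JSONDecodeError:
            pass
        # Tell the model what went wrong and try again.
        prompt = "That was not valid JSON with the required keys. Return only the JSON object."
    raise ValueError("Could not get well-formed data from the model.")
```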

To be clear, I do think that it’s also an awesome time for training models, and that has become much easier to do. It’s just that I see enough potential in working directly with LLMs to keep me and many others busy for quite a while.

There’s a lot of interesting work to do and power to be unlocked.

Next Up: A deeper dive on the unique engineering challenges of LLM data pipelines.

