Rapid development of production-ready GenAI MVPs with Streamlit

Sam Patterson
ProSiebenSat.1 Tech Blog
9 min read · Oct 17, 2023

🎙️ How we are architecting extensible Streamlit apps to help our media business stay ahead of the curve ⚡️

We are the AI Products department of a large entertainment company, so it probably comes as no surprise that we have a lot of business units lining up to get our help productionising their GenAI use cases.

Photo by Google DeepMind on Unsplash

The challenge for us is to respond quickly and with minimal effort so that we can help our colleagues realise the value of GenAI as fast as it evolves.

Fortunately, using our department’s programming language of choice, Python, we’ve been able to take advantage of Streamlit, a popular web app framework suited perfectly for this situation.

Frontend development with Python … really 🧐

As a team of Python-first Data Scientists and ML Engineers, building a frontend previously meant relying on low-code dashboard solutions or external contractors. Prototypes were most often developed as Jupyter Notebooks that were not easily accessible to end users.

So to make use of the existing Python skillset across our team, get user feedback early, and iterate quickly, the decision to try out Streamlit and its ecosystem of third-party components was clear. But, as a software engineer, I had concerns about how well it would mature after we claimed those early wins. How would we make sure that each use case we developed didn’t turn into a maintenance nightmare that would slow us down in the long run?

Looking online, there are a lot of Streamlit demo apps that I feel justify this concern. They present powerful behaviour to users and took little effort to develop, which is great. But the architecture of their codebases leaves much to be desired: by not applying common software design principles, many of these applications have high coupling, low cohesion, low reusability, and low extensibility, and would be quite difficult to unit test.

Photo by pipop kunachon

If we were to develop MVPs with these design flaws, we would get results quickly but development would eventually grind to a halt if our small team had to maintain and improve our first clunky MVPs while also developing new MVPs for the next use case in line.

So to overcome these technical concerns and empower ourselves to move even faster on new MVP development, we’re using an architecture that promotes separation of concerns, reusability, and extensibility.

A way to keep everything organised 🧹

Below is an overall picture of our architecture.

High-level architecture

Each use case MVP is defined by a package in the backend and a page in the UI.
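As a rough sketch (the directory names here are hypothetical, not our actual repository), this pairing of backend package and UI page could look like:

```
app/
├── backend/
│   ├── core/          # shared storage + LLM client packages
│   ├── use_case_a/    # business logic for MVP A
│   └── use_case_b/    # business logic for MVP B
└── frontend/
    ├── shared/        # login, styling, layout components
    ├── use_case_a/    # Streamlit page for MVP A
    └── use_case_b/    # Streamlit page for MVP B
```

Adding a new use case then means adding one backend package and one frontend page, without touching the others.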

In addition to the MVP backend packages, we have core backend packages that interface with database storage (e.g. AWS RDS & S3) and the LLM APIs (e.g. OpenAI & Vertex AI). These core packages isolate those dependencies and ensure that they are reusable across existing and new use cases.
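To illustrate how the core packages isolate vendor dependencies, here is a minimal sketch (all names are hypothetical, not our actual code): use-case packages depend only on a small interface, so swapping OpenAI for Vertex AI, or a fake client in tests, needs no change to the business logic.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Interface exposed by the core package; vendor SDKs live behind it."""

    def complete(self, prompt: str) -> str: ...


class FakeLLMClient:
    """Stand-in used in unit tests instead of a real API client."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def summarise(client: LLMClient, text: str) -> str:
    # Use-case logic only sees the interface, never the vendor SDK.
    return client.complete(f"Summarise: {text}")
```

A real OpenAI- or Vertex-backed client would implement the same `complete` signature inside the core package.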

A similar pattern is applied on the frontend. Common components and pages like the login page (using Azure Active Directory via the MSAL Streamlit Authentication third-party component), styling, and page layouts are provided by shared UI packages which can be re-used by each use case implementation, resulting in lower cost-of-entry, less duplicate code to maintain, and a consistent look and feel across MVPs.

The architecture additionally includes lightweight middleware services, app configuration, and an app entry-point. These are responsible for bootstrapping the application, sending users through the authentication flow, presenting a menu, and then sending them to their selected page.
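The routing part of that bootstrapping can be sketched as a simple page registry (names hypothetical; in the real app the render callables draw Streamlit widgets and the entry point runs the authentication flow first):

```python
from typing import Callable

# Registry mapping menu entries to page render callables.
PAGES: dict[str, Callable[[], str]] = {}


def register_page(name: str) -> Callable:
    """Decorator each use-case page uses to add itself to the menu."""

    def decorator(render: Callable[[], str]) -> Callable[[], str]:
        PAGES[name] = render
        return render

    return decorator


@register_page("Chat")
def chat_page() -> str:
    # In the real app this would draw Streamlit widgets.
    return "chat UI"


def run(selected_page: str) -> str:
    # The entry point authenticates the user, shows the menu,
    # then dispatches to the selected page.
    return PAGES[selected_page]()
```

New use cases appear in the menu simply by registering a page, so the entry point never needs editing.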

Finally, the way we deliver the application to our users is via an AWS ECS Service, Application Load Balancer (with ACM SSL), and a Route 53 domain.

How to move fast, without the breakage 🧑‍💻

Frontend & backend separation

As explained above, the architecture is designed to separate backend logic from frontend Streamlit code. Although this is common sense, it often seems to be forgotten in Streamlit apps.

For the backend, this lets us test business logic thoroughly and easily slice behaviour out and use it in the backend of other applications.

It also helps us organise our modules in more meaningful ways. The backend is organised to reflect the business domain and flow of data. In the frontend, we are able to organise the code to reflect the visual interface that our users see and interact with. More on that in the next section.
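As a hedged example of what that separation buys us (the function below is hypothetical, not our actual code): backend logic is pure Python with no Streamlit imports, so the frontend page only collects input and displays the result.

```python
def build_prompt(question: str, context: str) -> str:
    """Backend: pure and unit-testable, with no Streamlit imports.

    The Streamlit page would just call this and render the result.
    """
    if not question.strip():
        raise ValueError("question must not be empty")
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Because nothing here touches the UI, the same function could back a CLI, a batch job, or a future FastAPI service unchanged.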

Additionally, although we’re currently only using the middleware layer for the frontend code, it is not directly dependent on Streamlit, so it could easily be utilised by the backend. It is also designed to be extensible, in case we decide to add other middleware services such as logging or a message queue.
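One way such an extensible middleware layer can be sketched (a minimal illustration with hypothetical names, not our implementation) is a small service registry: new services are registered at startup without callers needing to change.

```python
class MiddlewareRegistry:
    """Resolves named services; new ones (logging, a message queue, ...)
    can be registered without touching existing callers."""

    def __init__(self) -> None:
        self._factories = {}

    def register(self, name: str, factory) -> None:
        self._factories[name] = factory

    def get(self, name: str):
        # Each lookup calls the factory; caching could be added here.
        return self._factories[name]()


registry = MiddlewareRegistry()
registry.register("auth", lambda: "auth-service")
```

Since the registry knows nothing about Streamlit, the same instance could serve the backend too.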

UI composition

UI frameworks all take advantage of composition. If you have any frontend development experience, you will be familiar with the concept of building UIs out of small reusable components.

Streamlit is also built with composition in mind, not least through its support for third-party components. Yet many Streamlit apps do little to take advantage of the concept in their codebases.

Our architecture promotes developing small parts of the UI and combining them to make a page. If one of those small parts is general-purpose enough, then it is placed in the shared component library so other use cases can take advantage of it. This helps us move quickly when developing new use cases and keep a consistent look and feel across the wider application.
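To keep this sketch testable without a Streamlit runtime, the hypothetical components below return the text they would render; in the real app each helper would call Streamlit widgets instead. The composition pattern is the same: a page is just small components combined.

```python
def header(title: str) -> str:
    # Shared component: would call st.title(...) in the real app.
    return f"# {title}"


def chat_message(role: str, text: str) -> str:
    # Shared component: would call st.chat_message(...) in the real app.
    return f"**{role}**: {text}"


def chat_page(history: list[tuple[str, str]]) -> str:
    # A use-case page composed from small reusable parts.
    parts = [header("GenAI Chat")]
    parts += [chat_message(role, text) for role, text in history]
    return "\n".join(parts)
```

General-purpose pieces like `header` would live in the shared component library; `chat_page` stays with its use case.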

Photo by Astrid Schaffner on Unsplash

Continuous integration

We apply the following four types of automated testing locally and in our GitLab CI pipeline:

  1. Static testing — linting, auto-formatting, type checking to ensure a high quality standard across the codebase, tracked with Teamscale.
  2. Unit testing — focusing on the isolated chunks of business logic we develop for the backend since unit-testing of the Streamlit frontend components is not trivial.
  3. Integration testing — ensuring the backend works together as a whole to achieve the GenAI workflows that we develop, which also confirms that our backend can be used in other non-Streamlit app products.
  4. Acceptance testing — where we automate UI testing, using Playwright for Python, to ensure users will get exactly what we’ve agreed upon.
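Because the backend has no Streamlit dependency, its unit tests are plain pytest-style functions. A hedged example (the function under test is hypothetical, not our actual code):

```python
def redact_names(text: str, names: list[str]) -> str:
    """Example backend function under test."""
    for name in names:
        text = text.replace(name, "[REDACTED]")
    return text


def test_redact_names():
    # Runs under plain pytest; no Streamlit runtime needed.
    assert redact_names("Hi Sam", ["Sam"]) == "Hi [REDACTED]"
    assert redact_names("Hello", []) == "Hello"


test_redact_names()
```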

When developing a new GenAI use case, we start prototyping with an exploratory mindset since the detailed requirements are not immediately obvious. This means we don’t necessarily follow a full test-driven development process; rather, as soon as we start to land more precise requirements, the test suite is extended with new automated tests to lock in the desired behaviour.

Continuous deployment

After the code is tested, reviewed and merged, our GitLab CI pipeline is set up to automatically release the new version of the app’s Docker image to AWS ECR, update the CloudFormation stacks, trigger ECS to start using the new version of the image, and then finally run Alembic schema migrations and some custom master-data migrations on the production database.

Using ECS also means that we can configure auto-scaling rules to scale the number of ECS Tasks in the service out and in to match the demand of our users.

Since our MVP development is being done with high-frequency interaction with our users, we are using a trunk-based development process that allows all new features to be integrated and delivered to customers as quickly as possible.

Security & auth

In addition to the above, to make this production-ready for the business, we put the deployment behind a Route 53 domain with an AWS ACM SSL certificate for HTTPS, giving a secure connection between client and server. Our VPC and corporate firewall configurations also mean that only on-prem or VPN traffic can access the deployed application.

Then for login, we use the MSAL Streamlit Authentication third-party component to integrate with our company’s Azure Active Directory instance for group- and role-based authentication and authorisation. This means our PO can easily onboard new users, giving them as much or as little access as they require.
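The authorisation side of this can be sketched as a mapping from directory groups to permitted pages (a minimal illustration; the group and page names are hypothetical, and we assume the login component surfaces the user's AAD group claims as a list of strings):

```python
# Hypothetical mapping from AAD groups to the pages they may open.
ROLE_PAGES = {
    "genai-users": {"chat", "summaries"},
    "genai-admins": {"chat", "summaries", "admin"},
}


def allowed_pages(groups: list[str]) -> set[str]:
    """Union of pages permitted by any of the user's groups."""
    pages: set[str] = set()
    for group in groups:
        pages |= ROLE_PAGES.get(group, set())
    return pages
```

The menu then only shows pages in `allowed_pages(...)`, so onboarding a user is just a matter of group membership in Azure AD.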

What we’ve experienced so far 🔎

After building the first two MVPs and refining the architecture, a prototype of the third was accessible to the business within a matter of hours 🔥

With very little cost-of-entry, we are able to deliver our customers a functioning first version of their idea that we can then iterate on based on their concrete feedback. Then, because of the architecture and our DevOps processes, as the prototype evolves into an MVP, there is no extra migration effort required to get the app production-ready because it already is.

The engagement and feedback from our customers have been great so far. Generative AI provides different capabilities than our other AI products, so it is even more important that we work closely with customers to find where exactly it can contribute value. Aside from iterating quickly on their feedback, our developers conduct job-shadowing with users to build a shared understanding of the benefits and limitations of using LLMs in practice.

Photo by John Schnobrich on Unsplash

Another benefit is that we can resource the MVP development effort with our existing Python developer skillsets. There is no requirement for a full-stack or dedicated front-end engineer, so any of the Data Scientists and ML Engineers in our team are able to contribute. And since the architecture focuses on separation of concerns and composition, we can work on it in parallel and onboard developers quickly, mitigating the risk of having only one or two people understand how the application works.

While our experience has mainly been positive, it is not without some pain. Although we aim to separate the backend and frontend, because the app still executes as a single Streamlit runtime, it is easy to trap ourselves into putting presentation details into the backend or business logic into the frontend.

One way that we try to mitigate this pain point is by using the Architecture Conformance feature of Teamscale, which gives us an ‘as-built’ picture of the dependencies across the codebase, and alerts us in Merge Requests if we’ve introduced an architecture violation so that we can resolve it before approving the new feature.

In our next iterations, we will also be exploring the use of a dedicated FastAPI service for the backend, which will help make the decoupling even stricter.

Wrapping it up 🌯

Overall, the architecture is helping us deliver value to customers in rapid and incremental development cycles, with a reliable and secure deployment, and without a hot mess of a codebase. And most importantly, the customers are happy.

If we decide to move an MVP away from Streamlit, then since the underlying business logic is tested and decoupled from the frontend, it will be straightforward to port that behaviour over to being the backend of a different user interface.

We have built three GenAI MVPs, each being used by different business units. A fourth MVP is already underway, and we don’t have plans to stop there or limit ourselves only to GenAI use cases 🚀

Photo by Alexander Sinn on Unsplash


Senior ML Engineer at ProSiebenSat.1 Media SE. PhD in Decision Science, School of Mathematics, QUT.