User Stories in AI Model Development Cycle

John R. Ballesteros
Aug 8, 2023


Traditional software engineering procedures may not be directly applicable to the development of AI models. Why?

Because, compared to a deterministic requirement such as "the resultant value will be the sum of two variables (x = a + b)", AI models are in many cases black boxes, and in recently reported scenarios real data exhibits a different behavior from the training data, which leads the model to perform inaccurately. Still, user stories can be of value if used correctly in AI model development. To describe how we can make use of this concept, let's start with some definitions:

What is a User Story in software development

User stories are simple yet extremely powerful constructs: they describe pieces of functionality from a user’s point of view, expressed in a solid, compact way. They reflect what a particular class of user needs and the value to be gained. The format is straightforward to use:

“As a <particular class of user>, I want to <be able to perform/do something> so that <I get some form of value or benefit>.”

They are a way to communicate and manage user requirements.

User stories have these characteristics:

- Simplicity: User stories provide an excellent way to define your product with clarity.

- Sense of collaboration: Properly written user stories provide a solid basis for communication and collaboration — focusing on what matters most to the user.

- Sense of priority: User stories can help define the entire product as a set of solid and well-prioritised stories. You can draw a line in the sand marking the scope of the current iteration, phase or release.

- Independent: a story should avoid depending on other stories or other teams, since such dependencies cause delays and bottlenecks.

- Negotiable: a story is not a contract but rather an invitation for conversation.

- Valuable: If a story does not have any discernable value, it should not be done.

- Sense of estimation: A story has to be able to be estimated so it can be prioritised.

- Small size: the stories should be small enough to be estimable.

- Testable: each story should include acceptance criteria.

- Vertically layered, not horizontally layered: every story should include pieces of each layer (backend, front end, data storage, etc.), because horizontal slices do not result in working, demonstrable software.

In conclusion, user stories are easy to work with, promote conversation and discussion between the interested parties, the AI team and the stakeholders, and make life easier for the Product Owner when prioritising.
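As an illustrative sketch (the class and field names here are my own, not a standard), a user story with its acceptance criteria and estimate can be modeled as a small record, which also makes the "testable" and "estimable" characteristics above explicit:

```python
from dataclasses import dataclass, field

@dataclass
class UserStory:
    """'As a <user>, I want to <capability> so that <benefit>.'"""
    user: str
    capability: str
    benefit: str
    acceptance_criteria: list = field(default_factory=list)  # makes the story testable
    estimate_points: int = 0                                 # makes it prioritisable

    def __str__(self):
        return f"As a {self.user}, I want to {self.capability} so that {self.benefit}."

# Example drawn from the breakdowns later in this article:
story = UserStory(
    user="financial analyst",
    capability="predict future sales of product A",
    benefit="I can plan next quarter's budget",
    acceptance_criteria=["MAPE below 10% on a held-out real-world sample"],
    estimate_points=3,
)
```

Note that the acceptance criterion is phrased as a measurable model property, which is what makes an AI story testable at all.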

MLOps

MLOps stands for Machine Learning Operations. It is a core function of machine learning engineering, focused on streamlining the process of taking machine learning models to production, and then maintaining and monitoring them. MLOps is a collaborative function, often comprising data scientists, DevOps engineers, and IT. MLOps emphasizes data management and model versioning, while DevOps prioritizes overall application performance and reliability.
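As a minimal illustration of the model-versioning concern (the helper name is hypothetical; real setups would typically use a tool such as MLflow or DVC), a model artifact can be tagged by hashing its hyperparameters together with a fingerprint of the training data, so the artifact is traceable to exactly what produced it:

```python
import hashlib
import json

def model_version_tag(hyperparams: dict, data_fingerprint: str) -> str:
    """Derive a reproducible version tag from the hyperparameters
    and a hash/fingerprint of the training dataset."""
    payload = json.dumps(
        {"params": hyperparams, "data": data_fingerprint},
        sort_keys=True,  # key order must not change the tag
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# The same configuration and data always yield the same tag:
tag = model_version_tag({"lr": 0.01, "max_depth": 6}, data_fingerprint="abc123")
```

Storing this tag next to the deployed model links it back to its training configuration, which is the data-management side of MLOps in miniature.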

AI Models Development Cycle

At the end of the day, not many people keep in mind that artificial intelligence models, mainly those that come from machine learning, output either a probability of belonging to one class (classification), a predicted value with a certain accuracy (regression), human-readable text (LLM), or a human-understandable image (image generator).

Compared to the traditional software development cycle, AI/ML development faces a unique and intrinsic set of requirements:

  • Account for training data.
  • Account for hardware, both during training and during prediction.
  • The limitations associated with the distribution of the data, e.g., if training data is only available for a certain age range or a specific sickness, the model can only predict results within that range or for that sickness.
  • It might be a black-box algorithm.
  • Risk assessment: statistical distribution shift, for instance different intensities sensed in MRI imagery of the prostate obtained by varied equipment. Adversarial attacks can be performed through small changes to the input data used during prediction; for instance, a patient with an implant could result in a bad diagnosis. Cyber security is also an important concern.
  • Data quality control: it must be the same for training and for prediction.
  • User adoption: if the model is not understood or produces strange results, AI adoption will fail.
  • Model monitoring: there are no known answers, just evaluation of the AI model against expert performance.
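To make the distribution-shift and data-quality points concrete, here is a minimal sketch (pure Python, illustrative names and synthetic data) of a drift check that compares live prediction inputs against a sample of the training data, as one might do when imagery starts arriving from different equipment:

```python
import random
import statistics

def drift_score(train_sample, live_sample):
    """Crude drift measure: how far the live mean has moved from the
    training mean, in units of the training standard deviation."""
    mu_train = statistics.mean(train_sample)
    sd_train = statistics.stdev(train_sample)
    return abs(statistics.mean(live_sample) - mu_train) / sd_train

random.seed(0)
train = [random.gauss(0, 1) for _ in range(1000)]         # e.g. training intensities
live_ok = [random.gauss(0, 1) for _ in range(1000)]       # same acquisition setup
live_shifted = [random.gauss(2, 1) for _ in range(1000)]  # different equipment

score_ok = drift_score(train, live_ok)            # small: distributions match
score_shifted = drift_score(train, live_shifted)  # large: flag for review
```

A production system would use a proper statistical test (e.g. a two-sample Kolmogorov-Smirnov test) per feature, but the monitoring pattern is the same: compare, score, and alert before trusting predictions.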

Software design in AI should be separated into two different workflows:

Training the model: since training code and training data do not become part of the final software product but do affect the model's performance, they should be linked and managed appropriately. Reproducibility of the training process should be achieved by documenting random seeds, hyperparameter values, and data normalization or scaling choices.
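A minimal sketch of that reproducibility idea (the names are illustrative): fix the random seed and record it together with the hyperparameter and preprocessing choices, so two runs of the same configuration produce identical results:

```python
import random

# Everything needed to reproduce a training run, recorded explicitly:
TRAINING_CONFIG = {
    "random_seed": 42,
    "hyperparameters": {"learning_rate": 0.01, "n_estimators": 100},
    "preprocessing": {"scaling": "standard"},  # normalization choices matter too
}

def stochastic_training_step(config):
    """Stand-in for any training step that draws random numbers
    (weight initialization, data shuffling, dropout, ...)."""
    random.seed(config["random_seed"])  # fix the source of randomness
    return [round(random.random(), 6) for _ in range(3)]

run_a = stochastic_training_step(TRAINING_CONFIG)
run_b = stochastic_training_step(TRAINING_CONFIG)
# With the seed and hyperparameters documented, the run is repeatable.
```

Persisting `TRAINING_CONFIG` alongside the model artifact is what lets the team later answer "which exact configuration produced this model?".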

Using the model: response time and performance are key quality-control issues. Be aware that training data may not be representative of real-world data, so the model should be tested with real-world data.
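The response-time concern can be checked with a simple wrapper (a sketch; the function names and the 100 ms budget are assumptions, not from any particular library):

```python
import time

def predict_with_budget(predict, x, budget_ms=100.0):
    """Time one prediction call and check it against a response-time budget,
    a key quality-control concern once the model is in use."""
    start = time.perf_counter()
    result = predict(x)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms <= budget_ms

# Trivial stand-in model; a real check would wrap the deployed predictor.
result, within_budget = predict_with_budget(lambda features: sum(features), [1, 2, 3])
```

Running this against real-world inputs (not just the training set) surfaces both latency regressions and distribution problems at once.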

One way to think about AI/ML models is as 'unsafe' components, because they occasionally fail; for example, 99% accuracy means 1% failure. For certain industries like healthcare, performance metrics are more demanding. In such cases, it is recommended to build an overall software system (traditional + ML) that is robust to such failures.
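One common shape for such a traditional + ML system is a confidence-gated fallback: the ML component answers only when it is confident, and everything else is routed to a rule-based component or a human review queue. A minimal sketch (all component names are hypothetical):

```python
def safe_predict(ml_predict, fallback_predict, x, threshold=0.9):
    """Route to the ML model only when its confidence clears the threshold;
    otherwise fall back to the traditional (rule-based) component."""
    label, confidence = ml_predict(x)
    if confidence >= threshold:
        return label, "ml"
    return fallback_predict(x), "fallback"

# Toy components for illustration only:
def toy_model(x):
    return ("high_revenue", 0.95) if x > 10 else ("high_revenue", 0.55)

def rule_based(x):
    return "needs_human_review"

routed_confident = safe_predict(toy_model, rule_based, 20)
routed_uncertain = safe_predict(toy_model, rule_based, 5)
```

The 1% failure case is thereby absorbed by the fallback path instead of reaching the user as a silent wrong answer.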

How to successfully make user stories in AI Models Development

AI-product development is inherently unpredictable, mainly because some Product Backlog Items (PBIs) will require more effort than expected while others will require less, and because the results obtained might not be good enough. If the sprint contains only a few large items, a wrong estimate on even one item will have a significant impact on whether you can meet the commitment made in sprint planning; and since larger items are harder to estimate, the probability of a failed sprint goal increases.

To tackle this, Scrum teams know that the key to success lies in an increasingly refined breakdown of work on the Product Backlog. In AI-based product development it is even more important to have Sprint Backlogs with many small (functional) items instead of just a few large ones. Smaller items improve the flow of development, reduce the risk of failing the sprint goal, and increase velocity.

Refinement of the backlog should be carried out continuously, to make sprints run smoother and to avoid overloading sprint planning. This is possible because, as development proceeds, new information surfaces about the remaining backlog items, and it is the job of the Scrum Master and Product Manager to keep the backlog up to date and organized, re-estimating where necessary. The following are some strategies to break an AI product backlog down into bite-sized user stories.

  • Breaking into vertical slices: horizontal slicing splits the work along software layers: backend, UI, database, etc. This approach has many drawbacks. Individual items do not result in working, demonstrable software; for example, it is tough to add user value from the dataset alone, as this work cannot be released until the model shows good performance on it. Only a combination of all the layers, however small, can be classed as demonstrable, testable, working and potentially releasable software; this is called vertical slicing. Horizontal slicing may also produce bottlenecks: if any stage of dataset development is not working as required, the data is insufficient, or certain features are missing, AI model development must stop. This can create a flow where the team is idle until a specific horizontal slice has finished, e.g. collecting more data or gaining better insights. Last but not least, the Product Owner cannot prioritise horizontal slices of work, as they do not deliver value or a working improvement on their own; horizontal slices tend to become technical stories that distance the Product Owner from the team.

So if a PBI is:

‘Financial department wants to estimate the revenue of product A’

Then the breakdown can be made as:

‘Which features impact the revenue of product A the most?’

‘A regression algorithm that more accurately predicts the revenue of product A’

To avoid idle times and bottlenecks, do the data preparation in one spike during idea incubation, prior to the project time estimation and in parallel with the project story creation.

  • Breaking using workflow steps: this is applicable when a product backlog item involves a journey that can be detailed as a workflow. It allows the Product Owner to prioritise which work should be carried out first; some steps might not be necessary at the moment and can be moved to future sprints.
  • Breaking down by business rules: Backlog items often involve a number of implicit business rules.

Financial department wants to estimate the best value to be paid for an insurance claim.

We can identify a business rule:

An insurance claim made to an insurance company cannot have a payout above 100% of the claimed amount

Business rules are usually implicit, and it requires business intelligence and analysis to express them in requirements. The Product Owner can then prioritise the business rules they think are most important.
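Once a business rule like this is made explicit, it can be enforced as a guard around the model's output rather than trusted to the model itself. A minimal sketch (the function name and amounts are illustrative):

```python
def cap_claim_payout(predicted_payout, claimed_amount):
    """Enforce the business rule: the payout cannot exceed 100% of the claim."""
    if claimed_amount <= 0:
        raise ValueError("claimed amount must be positive")
    return min(predicted_payout, claimed_amount)

capped = cap_claim_payout(12_000.0, 10_000.0)   # model over-predicted: cap at 100%
in_range = cap_claim_payout(8_000.0, 10_000.0)  # prediction already valid: unchanged
```

Each business rule extracted this way becomes its own small, testable, prioritisable backlog item.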

  • Breaking down by platform: it can be beneficial to break down large PBIs by the platforms users might engage with. For instance, the AI model's results might be displayed in Tableau, in another proprietary system, or on the company's web page.

Financial department wants the predicted value by the model to be visualized in a Tableau Report.

We can identify possible platforms:

Financial department wants the predicted value to be visualized in a Tableau report

Financial department wants the predicted value to be visualized on the company's web page

By breaking it down into platforms, the Product Owner can prioritise the platform that is deemed most important. This can be done using previous knowledge of the market and business intelligence.

  • Breaking down by operations: PBIs often involve several default operations like Create, Read, Update and Delete (CRUD). CRUD operations are widespread when functionality involves the management of entities such as users, products, orders, or blog posts, and this strategy can be used whenever such management of entities is needed.

Financial department wants to predict future sales of product A, which clients are likely to consume the product, and the number of possible cancelled orders for the product.

By identifying the specific operations required to fulfil this product backlog item, we can break it down into detailed operations:

Create a time series model to predict future sales

Make a clustering analysis of the customers

Create a regression model to predict the number of cancelled orders for this product based on its cancellation history

  • Breaking down by roles: PBIs often involve a number of different roles that perform parts of the functionality.

Financial department wants to predict future sales of product A

By breaking down into some of the roles that can use these features, we can have the following list:

The financial manager wants to be informed about the profit of product A

A financial analyst wants to predict future sales and costs of product A

Breaking down functionality by role gives a more precise understanding of what functionality is needed and allows it to be estimated more accurately. This can help the Product Owner prioritise, as some roles will be more critical than others, while the rest can be implemented later on.

Other strategies to break down PBIs are breaking down by test case, by data type, and by acceptance criteria.

It usually takes some experience and several sprints to find the most appropriate size for breaking down PBIs. Even when delivering smaller pieces of functionality, you are still delivering value in small increments and with a nice flow and rhythm. In addition, the psychological effect of delivering a finished piece of functionality works as a boost to the team, helping them perform better.

Support me

Enjoying my work? Show your support with Buy me a coffee, a simple way for you to encourage me and others to write. If you feel like it, just click the next link and I will enjoy a cup of coffee!


John R. Ballesteros

Ph.D. Informatics, Assoc. Professor of the UN, Med. Colombia. GenAI Consultant & Researcher in AI & GIS, Serial Entrepreneur, Navione Drone Services Co, Gisco Maps