The Key to Successful AI Solution Development

Claudia Schulz
Thomson Reuters Labs
5 min read · Aug 5, 2024

Bold, Realistic, and Collaborative. Read on to find out more about successful scientist-professional collaboration in AI solution development.

I recently had the privilege of being an invited panelist at the Oxford Future of Professionals (OxFOP) Roundtable, sharing my expertise in developing AI solutions and my experience working with professionals across industries such as legal, medical, media, and education. Drawing on recent examples from my work at Thomson Reuters, I advocated an AI solution development approach in which AI scientists and SMEs (subject matter experts from the industry in question) collaborate closely at every step of the development cycle.

1. Task Definition — The foundation of success

Any AI project starts with an idea of what task the AI is meant to perform.

The story: defining news summarization

Let me start with a recent use case from the Reuters Newsroom: automatic summarization of news articles. Initially, the idea was to automatically create a bullet-point summary for any given Reuters article. We set out with this broad task definition and were able to create an AI model that wrote summaries. But when scientists and journalists looked at the results, we quickly realised that, of course, summaries needed to adhere to the Reuters Trust Principles and the Reuters writing style. With this more refined task definition, we were able to adjust the model to follow these requirements. Again, scientists and journalists looked at the results and identified further areas for improvement, for example that the headline was often reiterated in the summary. And so, this cycle of iterations continued.

The learning: early and precise task definition

Because of the initial broad task definition, multiple model development iterations were necessary to shape the task definition to include key criteria such as adherence to the Reuters Trust Principles, matching the Reuters writing style, no rephrasing of the headline, and attribution of information sources. This project highlighted the importance of defining the AI task as precisely as possible, as early as possible, to reduce the number of development iterations required, thereby saving time and resources. Furthermore, the task definition is best created in collaboration between SMEs and scientists (plus other stakeholders, of course), each bringing their expertise to the table.

2. Model Development — Empowering SMEs with GenAI

Traditionally, model development is primarily the domain of scientists. However, with the advent of Generative AI, SMEs can now actively develop AI models through prompt design. This shift is exciting because it puts the power of development in the hands of those with deep professional understanding and expertise.

The story: generating article headlines

As an example, some of our Reuters editors developed a prompt for generating multiple headlines in Reuters style, from which an editor can then choose. Since headlines are designed to be easily understandable by anyone, you might wonder why an editor’s expertise was required during model development. Let me tell you that it is nearly impossible for me as a scientist to distinguish a good headline from a slightly worse one. Often, changing a single word makes all the difference. The editors’ specialist knowledge of what makes a great headline therefore allowed for rapid development iterations and improvement of the AI model, much faster than the traditional approach of a scientist developing the model and waiting for SME feedback to improve it.
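To make the idea of SMEs developing models through prompt design concrete, here is a minimal sketch of the kind of prompt an editor might iterate on. The wording, constraints, and function name are invented for illustration; they are not the actual Reuters prompt.

```python
# Illustrative only: a headline-suggestion prompt of the kind an editor
# might iterate on. Constraints and wording are hypothetical examples.
def build_headline_prompt(article_text: str, n: int = 3) -> str:
    """Assemble a prompt asking a GenAI model for n candidate headlines."""
    return (
        f"Suggest {n} candidate headlines for the news article below.\n"
        "Constraints:\n"
        "- Use a neutral, factual wire-style tone.\n"
        "- Do not editorialize or speculate.\n"
        "- Put each headline on its own line, at most 10 words.\n\n"
        f"Article:\n{article_text}"
    )

print(build_headline_prompt("Example article text...", n=3))
```

An editor iterating on such a prompt can tweak a single constraint, regenerate headlines, and judge the result immediately, which is exactly the fast feedback loop described above.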

The learning: the consulting role of scientists

Does this mean that AI has made AI scientists redundant? Luckily, not! First, not every AI problem is solvable through prompting. And for those that are, such as headline generation, our scientists were there to support and guide the editors developing the prompt. We initiated pair-programming sessions and regular office hours to help editors quickly overcome challenges and improve their AI models. Both types of sessions proved highly beneficial for SMEs and scientists alike, helping each group gain a deeper understanding of the other’s work and expertise.

3. Model Evaluation — Balancing expertise and subjectivity

Evaluation is perhaps the step in which the collaboration between scientists and SMEs is most obviously essential. Scientists bring expertise in scientific model evaluation methods, while SMEs provide the knowledge needed to assess the quality of model outputs.

The story & the learning: evaluation subjectivity

An important consideration for model evaluation is human subjectivity. In our summary generation project, we gave the same article to three editors and asked them to write what they would consider an ideal summary. We got three different, but still subjectively correct summaries. If editors disagree on what an ideal summary should look like, it is not possible for an AI model to generate a summary that each editor is 100% happy with. This task subjectivity should be used to set realistic expectations for model accuracy: higher task subjectivity implies lower expected model accuracy.
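The effect of subjectivity on achievable accuracy can be made tangible by measuring how much multiple human references agree with each other. This is a rough sketch under stated assumptions: the editor summaries are invented placeholders, and word-level Jaccard overlap stands in for whatever summary metric a team actually uses.

```python
# Sketch: estimate an agreement "ceiling" from several editor-written
# reference summaries. Summaries and the Jaccard metric are illustrative
# stand-ins, not the data or metrics used in the project described above.
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

editor_summaries = [
    "Central bank raises rates by 25 basis points to curb inflation.",
    "Interest rates rise a quarter point as the bank fights inflation.",
    "Bank hikes rates 0.25% citing persistent inflation pressures.",
]

# Mean pairwise agreement among human references approximates an upper
# bound on how closely a model can match any single reference.
pairs = list(combinations(editor_summaries, 2))
ceiling = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
print(f"human agreement ceiling: {ceiling:.2f}")
```

If the human references only agree with each other to some degree, expecting a model to score higher than that against a single reference is unrealistic, which is precisely the expectation-setting argument made above.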

The story & the learning: evaluation bias

We also need to be aware of potential biases in model evaluations. During our evaluations of generated summaries, we included some human-written summaries among the machine-generated ones, without informing the evaluating editors. The goal was to see if human- and machine-generated summaries differed in quality. Interestingly, there was no significant difference in evaluation scores between human- and machine-generated summaries (so no qualitative difference). However, when we told the editors that some of the summaries had been human-written rather than AI-generated, one of them admitted: “Had I known that, I would have evaluated it differently”. This highlights the importance of designing evaluations in a way that minimizes biases in favour of or against AI.
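The blinding protocol described above can be sketched in a few lines: mix the two sources, shuffle, show evaluators only the text, and re-attach provenance after scoring. The summary strings here are invented placeholders, not real data.

```python
# Minimal sketch of a blinded evaluation set: human- and machine-written
# summaries are mixed and shuffled so evaluators cannot tell the source.
# All summary texts below are invented placeholders.
import random

human = ["human summary A", "human summary B"]
machine = ["machine summary 1", "machine summary 2", "machine summary 3"]

# Tag each item with its (hidden) source, then shuffle deterministically.
items = [{"text": t, "source": "human"} for t in human] + \
        [{"text": t, "source": "machine"} for t in machine]
rng = random.Random(42)
rng.shuffle(items)

# Evaluators see only the text; the source label stays hidden until
# all scores are collected.
blinded_view = [item["text"] for item in items]

def unblind(scores):
    """Re-attach sources after scoring, for the human-vs-machine comparison."""
    return [{"source": item["source"], "score": s}
            for item, s in zip(items, scores)]
```

Keeping the source label out of the evaluators' view until after scoring is what prevents the “Had I known that” bias from contaminating the comparison.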

4. Deployment — Assessing risk and building trust

When considering model deployment, we should not only ask “how good is the model?” but also “what is the associated risk of the deployment?”.

The story: editors-in-the-loop

For example, offering editors multiple AI-generated headlines from which they choose one, or none, is low-risk: even if the model sometimes gets it wrong, the human-in-the-loop catches the error, and headline suggestions proved highly valuable to editors. In contrast, if generated headlines were published as-is, without any human checks, the associated risk would be much higher, and with it the performance we should expect from the model.

The learning: advantages of humans-in-the-loop

Human-in-the-loop applications of GenAI offer several advantages. First, lower risk allows for earlier deployment. Second, they provide an opportunity to collect valuable feedback for model improvement. And third, they help build trust in AI systems and can therefore serve as an intermediate step towards an eventual fully automated solution.

Take-Home Message

To summarize, successful AI solutions emerge when SMEs and scientists collaborate at every step of the development cycle. Therefore, my message to professionals and scientists in all industries is:

· Be Bold: Get involved in the development of AI solutions.

· Be Realistic: Understand the expected model performance based on task subjectivity and potential biases.

· Be Collaborative: Work together to create AI solutions that truly meet the needs of your profession.


Claudia Schulz
Thomson Reuters Labs

AI Scientist and Software Engineer | NLP, KR, ML, Data Science