How Ultimate evolved from hundreds of supervised models to UltimateGPT

Podcast: This week, Lutz and Jasper sit down with a special guest — Reetu Kainulainen, cofounder and CEO of Ultimate.

Cherry Ventures
38 min read · May 12, 2023

--

This week we are in for a special treat and we are joined by Reetu Kainulainen of Ultimate, which is helping businesses scale and augment their customer service workflows with, as you can guess, artificial intelligence.

But they have built their own models and just released UltimateGPT, built on their own data and bolstered by very successful customers who have used the product for years. So let's jump in.

With us from Finland, Reetu from Ultimate.AI. Good afternoon.

Reetu: Good to be here. You flew me in five and a half years ago and I’m still here.

Jasper: Long time no see.

Reetu: Yeah.

Jasper: All right, the title of today's podcast is UltimateGPT of Ultimate.AI. So we have a lot of ultimates in here. And since everybody seems to be hopping on this marketing train and putting GPT next to their names, and the name of your company is Ultimate.AI, we want to talk about why you have a GPT there. But maybe we start with a little bit of history, Reetu: how you came up with the company and the name, and then we move into that.

Reetu: Well, let’s start with the name, I think that’s the easy part. So obviously the name came from my brilliant co-founder, Jaakko, who’s the Chief Science Officer, who’s the smart one. And obviously he was really good with names.

Yeah, I remember back in the day when we were starting the AI company. We were looking at .AI domains; I think the TLD was very new, so there were a lot of domains available. And I think Jaakko just mentioned Ultimate. He said there would be jokes. As Finns, we're a bit, let's say, self-deprecating and modest, so why not compensate for that with the name? And ultimate.ai was available, and that was it; it was like a five-minute conversation. So that's the depth that goes into company names, but I think we got lucky in that sense.

Jasper: What’s the plan at the end of the day that you want to build the ultimate artificial intelligence? Or did you have a more modest plan at that time?

Reetu: No, I think we've always been very pragmatic, very modest. Even when we started to build the company... usually people say they had this big vision, right? That since birth they were out to solve customer service as an industry. But I don't think that's the reality. I think you find a cool little problem, you find a cool little solution for it, and you go from there, and then over time you start figuring out, okay, this is where we are going. You learn more about the world and then you can have the vision. So, what is Ultimate ultimately doing? It's a brilliant name, because I think when people hear it, they subconsciously start using the word more. I see it everywhere, ultimate this, ultimate that, or maybe that's my bias. But the thesis is that we are a support automation platform.

We work in the customer service industry. What we are doing is making customer service more accessible, mostly for consumers, which saves a lot of their time, and there are a lot of those people out there like me. It means those people can hopefully use their time for something more productive, and it enables businesses to deliver support more easily. But the thesis behind everything, especially now, is that there are a lot of proof points that with AI, conversational AI especially, you can build a solution that is very human-like. I don't think the point is to be human; the point is to be natural towards the user, who is a human. And by doing that, you can start automating conversations, emails, tickets, et cetera, in customer service at scale. We always said that if you reach the point where you can handle, let's say, 51%, the majority, of level-one customer service, where you can resolve those cases automatically, instantly, with the same or better customer experience than a busy human agent would deliver, that will shift the whole industry. Of course, if there were a one-to-one mapping between users and customer service agents, I think all AI tools would lose. But I don't think that's the reality. Everybody is very busy; that's why support is usually a bit slow and bad. But if you can do it better than, or at the same level as, a human at a tenth of the cost, everybody will obviously use these solutions, and that's where we see the world going. And lastly, the humans who work in support, who are busy, are right now like industrial supply-chain workers putting pieces together as fast as possible. So that's where it's heading. It's still a long journey, but it has been accelerating in recent months.

Lutz: In our last call, we actually compared the interface to an API. Just to close the loop to the last chat that Jasper and I had, would you say that customer service is actually the human API for companies to engage with?

Reetu: Yeah, that’s the very cold way of putting it, I don’t want to take the humanity out of support. I used to work in customer service and IT help desk, so I can talk about this, but I think in support, there are two kinds of cases. You have the transactional cases, “Hey, where’s my order? I just want to know where’s my order. I don’t technically want to connect with a human being.” And then there are the cases where, “I don’t even know what my problem is. I have issues. There’s something wrong with the service, da da, can you help me? Or I was treated wrongly, et cetera.” That’s human, right?

But for those transactional cases, in the end, what the agent is doing, what I was doing, is: I read the question, I understand the question, and I know which system the answer usually comes from, because I can't hold everything in my head. I query the system, the system returns the information, and then I write the information back to the user. And I do that very slowly. So in a way, that's the connection between the consumer and the back office.

Jasper: And Lutz, I would like to stick a little bit with the product here, until the point where we deep dive into, let's say, more of the technology. Because you mentioned earlier that you moved to Berlin five and a half years ago, something like that. Before that you were in Finland, and you were actually part of Techstars. And I remember we had this coaching conversation, and I had this guy sitting in front of me telling me, "I'm going to kill chatbots." And we had a very nice conversation there, because I didn't like chatbots at the time; they were not really smart. And then you started actually building something that was a bit different from a chatbot. Maybe you can tell the listeners just briefly why you did that, why you did it differently. And then I think we should come more to the T in GPT.

Reetu: The reality is that chatbots have been, and mostly still are, at least the ones that haven't adopted this new wave of technology, quite bad. Let's say they create experiences where maybe you can automate 30% of the cases, which means in 30% of the cases they work well, or well enough, to resolve your case. But are they great? Not really.

Jasper: Because they’re rules, right? I mean, they can only do what you tell them to do. They’re not creative

Reetu: And in the back office, whoever is building them is managing this complex house of cards of rules, and then you change something and something breaks. And with a lot of bots, even nowadays... I don't even know what a bot is anymore. People call things bots when there are just buttons, and you click from one button to another. For me, that's a menu or an IVR rather than a bot. So let's say there's more nuance to what a bot is. You have ChatGPT and then you have three buttons in a row; I don't think you can say that these two are both bots. But in general, at least back in the day, there was a huge hype, and one reason for the hype was that it was very easy to build demos, because you were the one having the conversation and showing: look at all the things I say, and it works perfectly. But if the customer asks, "Can I try it?", it's like, no, no, no, no, no.

Jasper: Because we were essentially copy pasting the FAQs into a chat interface. That was [inaudible].

Reetu: But of course the underlying premise is right. If we can have a human-like conversation, it's very easy for the end user to adopt it, to use it. And it's very transactional; it's very easy to map out the process. Just the language part is hard.

Jasper: But if it was so great, why did you guys decide to make it more complex and not just do the same copy paste?

Reetu: We’re always, especially my company, Marcus is very, I love how brutal he’s about value. He hates hype, he hates, he just wants to really focus. But one thing we saw was, hey, it’s very complicated and difficult to create the automated experience where the consumer talks to the bot, so it’s good. So we turned it around and we said, Hey, let’s give this to the agents. So maybe we can recommend agents, some responses and answers. They have more tolerance for failures because they can ignore them. They still can do their job, but if it’s correct, they get value out of it and maybe they can train the system the same time. And then maybe gradually we can move to the phase where, okay, now the technology of the models are good enough that we can now turn it back to the users.

…We use technology to supercharge humans, and we actually help the agents to become faster and better by giving them the tools…

Lutz: What I actually recall about this approach, from when I first met you, is the whole discussion of everybody doing it manually, copy-paste, right? And they assumed that, no matter how fancy or non-fancy the technology was, the chatbot could take over. You guys had a way more nimble approach there, saying: well, we use technology to supercharge humans, and we actually help the agents become faster and better by giving them the tools. It's essentially a fully automated search running on their side and telling them what to say. Which, if you look at how people talk about chess computers, is the same thing. The computer became very, very strong, and nobody can beat the chess computer anymore. However, the strongest players are still a chess player together with a computer.

Reetu: Yeah, exactly. There are difficulties in that approach too, because first of all, when you have a bot versus a human, it usually dumbs down the conversation, and that's good, because it's then easier to map out. When you have a human-to-human conversation and you're on the sidelines trying to predict what to say, it can be challenging. And especially in the beginning, we realized it was easy to get, let's say, the top half of the conversation: the greetings, the first questions, the first context, and the end of the conversation, the resolution and then the goodbyes, et cetera. But in the middle, people start going deep into the nuance, into the chit-chat, and the difficult part is actually holding the whole context. So it's also very challenging, but it's just a less stressful space to operate in. When you're building products, you can iterate, you can learn, you can fail a lot without completely ruining the experience.

Jasper: I mean, you really built nice models there. You had a lot of very, very strong logos, enterprise customers, you were growing, you were raising a lot of money. But I remember we had this conversation of: how can this be more efficient? How can this be better? And then you had a longer discussion, I think also with Lutz and Jaakko. What happened there? Why did you come up with something new?

Reetu: Well, I think a few things. Back in the day, with the architecture we had for helping the agents... I feel like now large language models are excellent for that approach, because they can truly be trained on just the customer conversations. So technically it was a difficult challenge back then. But I think the biggest problem was that at that stage, even if it worked, especially when it came to emails... I think with chat you could cut down maybe 20% or 30% of the conversation, which was meaningful. But email was one of the biggest challenges.

And then if you cut down, let's say, five seconds for the agent by not having to search for a macro that they might already have, and their average response time is four to eight hours, that doesn't really move the needle. So again, agents loved it, this was great. But then every customer was saying, "Hey, but now can we start automating? How do we automate?" So then we said, hey, we have to go to automation. We have to solve that, because if you map out the future, it will keep getting better at actually automating conversations. So the sooner we started that journey, the sooner we'd be there.

Jasper: And I think the really interesting part here is that you could have just stuck with the bots, like all the other companies out there that raised funding, but you really wanted to build something larger, because they only did chat and you tried emails.

Reetu: Yeah, again, it comes down to working with the customers. We're not in love with the technology... we love the technology, but it changes constantly. Even our tech stack is very modular; we keep changing whatever classifier or embedding piece there is as new things come out. So we've never been too in love with the technology. I'm a product person, I love elegant products. But in the end, when customers are saying, "Hey, 60% of our volume is email," and technically you could even say it's an easier channel, because it's not really a conversation, it's just an email: you read it and you give one response, maybe there's one or two rounds back and forth and then it's solved, versus a chat, which might be 20 messages. So it was just obvious for us: okay, let's give it a try.

Lutz: I actually remember quite well, the three of us were sitting in this one [inaudible] in Berlin and chatted about the system. The way you guys had it set up at that time was that you went to a customer, you analyzed what the problems are, what the intents are: somebody is calling, 30% call because of X or because they want a flight number, 20% call because they want a cancellation. So you built a model for each of those intents. And the strength was that you had the automation, the ability to forecast which was the best model to use, and that you actually pinpointed them. But it became pretty annoying, because for every customer you had to run many new models. And the reason why I love this story of your development so much is that the whole world talks about narrow AI and how we go from narrow AI to big models, which have become more and more generalized. And you lived through this: your first models were very narrow.

Reetu: Yeah, very narrow. And one thing we had, and still have, which is still a key value driver, is the ability to analyze and cluster the historical data. Although, Jasper, thank you for giving us money back then. I think it was like 500K; that's what seed rounds were at the time. So we didn't have much to play around with. But what it meant is that we saw a lot of our competitors back then saying, "Hey, we'll do the onboarding, we'll do the building for you." Because first you have to create it; it is a classifier in the end, back then. You have to create what we call the intents: what do you want to classify, what are the categories of questions? You need to create that taxonomy and then you need to create the training data for it.

And usually what happens is, oh, let's take the FAQs from your website, and then somebody, they call them AI trainers, manually creates: "Where's my order?" "Hey, can you give me information about an order?" You create 50 to 100 different ways of asking the question. We tried that in the beginning; we had a Slack bot, and we had this whole team just writing, smashing the bot to give it training data. But it's very difficult. Once you try to say "Where's my order?" 50 different times, I would say after 20 you start really using your imagination. And then what happens in the end is that you're just training for yourself, not for the world.

So we couldn’t afford having AI trainers or outsourcing the people. So of course, again, referring the brains of this company, which is Jaco, we looked into it like, “Hey, well aren’t those conversations already in the CRM in Salesforce or whatever between the agents and the humans? Can we use some other technologies to source them theirs?” So then we build this clustering pipeline. So you find, hey, here’s the 100, 200 most common questions you get. Here’s all the ways those questions been actually asked by a user. So that’s the 30,000 different ways of asking, “Where’s my order?” I can guarantee some of them are really difficult, even as a human to understand. And then we create this pipeline where you can first of all, even show the customer. I think that’s a lot of value that they already get. Hey, here is your actual top hundred frequently asked questions because I can guarantee that they’re different than from the FAQ page because usually the FAQ page solves all easy cases and then the rest goes to support. So we still have that, even now with customers. We say, “Hey, even before you talk to us, just do it even yourself because we don’t want to tell you that we can automate. We don’t want to tell you that we can do everything for your support. It really depends on what kind of questions, what is the process behind those questions, et cetera.” So that was also a key driver of faster setup with a small team.

Jasper: It was time to value. I remember that was one of the KPIs you followed, time to value for the customer, because with other chatbots you would have to create the rules, which was pretty tedious and wouldn't work. So that sounds already pretty cool, but why didn't you stop there? Why did Jaakko still have to do some research and test things?

Reetu: Well, it can always be better. It always should be better. If you want to get to the point where most level-one support cases are handled, across channels, across languages, talking to different systems, it takes a very smart and elegant solution to cover all of those cases. So, for example, we had different models per language, even though the model itself was language-agnostic. But still, if you want to support a hundred languages, you have a hundred different models you need to run per customer.

Lutz: A hundred different models per customer. I mean, for somebody who is doing model ops, that already sounds super painful.

Jasper: American listeners would probably wonder why there are 100 languages. But those guys speak Finnish and...

Reetu: They don’t care. Now, but luckily their product philosophy is more of a business in the front part in the back, I think. So you can make it clean in the product side, but then in the back office, yeah, there’s a lot of things happening. So yeah, you constantly, you need to make it more simpler. So if customers have only, let’s say they have chat on email. If you can only do chat, even if you automate a lot, you limit yourselves. So okay, we want to increase their channels or depth meaning I want to automate more, I want to make the conversation better. I want to do instead of 30%, I want to do 40, 50 or 60. And there’s different challenges you face.

Lutz: To put a bow on that: you were at that time one of the top-notch companies because you did AI models. However, the problem of customer interaction was slightly different for every customer. You were missing a general model for customer interactions, therefore you built a model for every customer, for every use case, for every language, which is a model explosion. Which is nothing other than you manually breaking down the problem, because you were missing the ability to handle it in a generalizable way.

When we talk about AGI, the G is for general, but you had a very narrow AI, and many of them, and you worked very hard to get more and more out of it. But that screwed your time to value, meaning you get a new customer and you now have to run an ever larger volume of models, which makes you an extremely powerful company. But you knew at some point in time, and that was a little bit the conclusion we reached over a beer back then, that it would run into an upper ceiling.

Reetu: Yeah. And in the end, the point was that we didn't optimize for scale. I guess this is "do things that don't scale", on the AI side. We were optimizing for value. We really had to do whatever it took for the end experience to be good. If we have a simpler solution in the backend but it means we cut corners in the experience, that's not going to get us anywhere. So we kind of had to take the hit ourselves.

Jasper: How did you then come to BERT and the transformers? What happened there? How long did it take, and what was the change for the customer?

Lutz: And maybe to add: when did you do it? Because you were so early, which was impressive.

Reetu: Yeah, Jaakko might have the actual timeline, but it was years ago.

Jasper: At least three and a half years ago because-

Reetu: Yeah, I think it was three, four years ago when we switched and started to work on it. So again, simplifying: now we were looking into models that can handle, let's say, all those 109 or so languages in one model. So you go to this, as the Brits would say, polyglot version of the model. It's one-to-many, which was great; again, it simplifies things. Then another thing with transformers is that, well, they became kind of the status quo really quickly. But if you look at the models we started with in the first years, I don't think we should have started with deep learning, maybe. Or maybe we should have, because we always took the painful road and we learned really quickly. But yes, we had this nice deep learning pipeline, and I think Jaakko was working on the hyperparameters for six months, because sometimes you change something and something breaks. It's very frustrating.

Jasper: And how was the experience when you really saw it working? Was it kind of the glue that would tie together all these hundreds of models? Or was it rather, let's say, a slower progression?

Lutz: Hold on, just to make sure, because this is actually the model Jaakko introduced. It was not the glue; it actually replaced the existing models, right? Because it started to work over the database and just [inaudible]. That part was initially introduced in 2018.

Jasper: Pretty sure it was three and a half years ago because I was in Greece by the time you discussed it. So that was three and a half years ago.

Reetu: Yeah, it was a long time ago. But yes, glue or not, it was kind of the clear next stage. It simplified, basically not everything, but a lot. But then in the end, because we always kept the user interface as simple as possible for the user, they didn't see a lot of difference, except that we were able to support more languages more easily; that's for sure, and we were able to say that. I think one cool thing there was that you could do it in a bit of a zero-shot fashion: you train the bot in English and then you ask questions in Finnish, and it was quite good at actually detecting those. So then you have this kind of paradigm where you can train it on one language and run it on all of them.

And then of course that’s never good enough. I’d say Finnish still, if you trained in English and you asked questions and Finnish, you would benefit giving some Finnish training data, but not as much. So I think that was huge. And also we used that of course in the clustering. So we’re able to know, hey, here’s your order question? And you can see all the different languages in the set of example questions. And you could filter search, you didn’t have to worry about the language. You could see all, here’s some Finnish, here’s some Spanish, here’s some German, here’s some English. But they all semantically meant the same thing. That’s very cool.

Lutz: Actually it’s funny, right? Because just to put this in the context of what a transformer is. And let’s geek up for a second, right? Because a transformer is you have a encode and a decoder. You encode the text, you have into a latent image of whatever this is, and then you decode it into whatever language you want to do. Or if you do the original transformer, it’s encoder/decoder pair. Now before you got this technology, you actually tried to create human-like, what are my most important use cases? So you created this map of problem statement manually. Now by using it a encoder/decoder pair, you actually automated half the transformer create this map because the transformer would know map out. Okay, these are the typical questions I’m hearing. I have a vectorial space where they’re very close to each other. So I create for me a latent space. The weights in a model essentially, which describes that. So the complexity you had by mapping out the complexity of every customer’s problems, you transferred that into the model, which is pretty brilliant.

Reetu: Yeah. But one funny thing: in the past we had a model per language, so we immediately knew what language was being used. But when we switched models, the model doesn't really know what the actual language is. So we had to build an extra layer of language detection, because when we send the question to the model, whether it's in Finnish or English, it gives us the right category, "oh, this is an order-status question", but it doesn't really know what the language was. So then we had to build another layer to know, okay, now we should answer in Finnish instead of English, for example. So we did have to add something. And now, of course, LLMs came in and we're kind of throwing that away again. It's a constantly changing process, but that's the exciting part.
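
A tiny sketch of what such an extra layer can look like: detect the language of the incoming message separately from the intent, then pick the reply in that language. The langdetect library, the intent name, and the canned replies are assumptions for illustration, not Ultimate's implementation.

```python
# Sketch: language detection layered on top of a language-agnostic intent model.
from langdetect import detect

CANNED_REPLIES = {  # illustrative intent -> per-language reply templates
    "order_status": {
        "en": "You can track your order with the link we emailed you.",
        "fi": "Voit seurata tilaustasi sähköpostitse saamastasi linkistä.",
    }
}

def reply(user_message: str, predicted_intent: str) -> str:
    lang = detect(user_message)                  # e.g. "fi" for Finnish input
    templates = CANNED_REPLIES[predicted_intent]
    return templates.get(lang, templates["en"])  # fall back to English

print(reply("Missä tilaukseni on?", "order_status"))
```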

Jasper: Lutz and I, in the last podcast, had a discussion around what is actually defensible. And I don't want to jump ahead too much here, but maybe along that path, and maybe teasing Jaakko a little bit: it feels like people started with chatbots and you did it better. Then you did transformers, others did transformers, and now everybody can access those APIs and build their chatbots exactly like Ultimate AI. Now, Lutz and I were debating the moat of proprietary data, or rather data that you can train your models on to ensure output and quality for the customer. So would you say, "Oh no, everybody can now build an Ultimate AI product"? Or is there more defensibility in what you have built and what you can do?

Lutz: That was a very nice description of the language piece. However, there is also a next level, which, knowing you guys, you will probably integrate very soon. Because if we have this latent representation, we essentially have a computer's world language in between. We saw this with the new Google model, which was never trained on Bengali, yet you can actually ask it a question in Bengali, because it just translates it into the latent representation and then answers back in Bengali. Up until now, this had not been seen. So initially you needed this language transfer: you used the transformer to simplify the customer problems and still kept a language layer on top of it. But very soon you will probably be able to use the same model to superimpose different languages as well. You go from narrow, to less narrow, to very general approaches.

Reetu: And actually now, if you use LLMs, you can basically just ask politely, "Can you answer in the language of the user?" It's not always great; the language detection of the model itself is not always perfect. If you just say "Ciao" or "Hi", it's sometimes very difficult. But yeah, technically you can remove that stage completely.
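
In practice, that "ask politely" can be as simple as a system instruction. The snippet below is an assumed illustration using the OpenAI Python client, not Ultimate's production prompt; model name and wording are placeholders.

```python
# Sketch: delegate language handling to the LLM via a system instruction.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0,
    messages=[
        {"role": "system",
         "content": "You are a customer support assistant. Always reply in the "
                    "same language the user writes in."},
        {"role": "user", "content": "Missä tilaukseni on?"},  # Finnish question
    ],
)
print(resp.choices[0].message.content)  # the reply should come back in Finnish
```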

Lutz: Very cool. And back to Jasper now.

Jasper: Yeah, the question for Lutz and me is always: where's the defensibility? If you have an open model, if you have APIs that you can use very easily and integrate, maybe everybody could build Ultimate AI nowadays. But we were also discussing the data moat, so having proprietary data, labeled data, trained models that maybe even understand how the interactions with customers work as an interface. Where do you see the defensibility of your business model right now? And maybe also, where do you see weaknesses in that defensibility, given the new trends?

Reetu: Okay, I have two answers. One is what I think people always want to hear: that there is some silver bullet, some secret that, if you hold it, no matter what the rest of the world does, it's almost like holding IP, you always win. And when it comes to that, with large language models et cetera, I think everything will converge. You have these jokes of OpenAI going down and you look at all these $1 billion companies that are actually [inaudible] working, because everybody's leaning on that one endpoint. So then the question-

Lutz: Are you guys using OpenAI? Is OpenAI empowering Ultimate?

Reetu: With the GPT product, there is OpenAI. We also have access to the Google Vertex platform. And then in the works there's our own version of an LLM. But again, it's modular. We'll see what works best; we keep changing stuff. Even on the OpenAI side, there are different versions of the model itself. GPT-4 is very expensive, for example. So yeah, we'll see. I think for us right now it's about time to market. And by time to market I mean time to learn how our user, and then the end user, gets the most value out of the product. What are the use cases, et cetera? The back office we can switch around later.

Jasper: What are your two answers? Reetu, let’s get back to them.

Reetu: Okay. My two answers are: one is on the data side. I think yes, there are some, let's say, weak-ish moats there in terms of having proprietary data. For example, we stream the live conversations between a lot of big companies and their customers. It's all done in a very secure way, but we have that data, and we have a lot of people looking into that data and asking, "Hey, how can we provide more value for them?" And no one else has access to that data. So if we build our own models, I think we have something that other people might not have.

So if you start today, you don't have that; you don't have those hundreds of millions of conversations, and that's good for us. But what can you accomplish with that? Even if in the beginning you can say, "Well, now I can do 10% more automation, or a few points better experience," et cetera, that's cool; that's a little data point you can sell to the end user, the customer. But if you had to choose one thing you're going to be good at, that you're going to win at, would it be some data moats, some technical moats in the product? I think a lot more comes down to the user experience.

The distribution of the product has a big impact. Is it easy to see the value? Is it fast to see the value? What is the user journey? These products can be complicated, but I love... I think an investor or someone had a Shopify memo, an investment memo. And one of their customers-

Jasper: Yeah.

Reetu: ...loved Shopify because it takes 10 minutes to learn but years to master. And I think someone said Excel is very similar: it's easy to get started, but there's so much depth to it that you can keep learning for years. So you can build a product like that, versus building something that is, for example, really complicated to use but where the models are better, they have some data, the model itself is more accurate. My philosophy is always that the end user experience, the distribution, the brand you build, you make those solid. That matters much more than some little detail in the back office.

Lutz: This is awesome. Let's double down on this, because this has to be underlined: what matters more than the actual model is the user experience. And that is so important, because we see so many companies running around saying, "I can do AI here, I can do an LLM here." But it is a UX challenge overall, because the LLM is an interface. And because it's an interface, it doesn't necessarily mean that the tech which is better in accuracy will help you. It's about how the user, the person interacting, can better use it.

Tech is the thing that enables things. But where the competitive edge is, I think people like to feel that they have some magic secret sauce…

Reetu: Yeah, we even dropped that. We used to be Ultimate.AI as a company name. We dropped the .AI because we’re like, “Let’s stop talking about the AI. Let’s talk about the product, the value, and the company that we are, the people.” I think that in the end… I don’t want to say… And this is not to say AI doesn’t matter. Tech is the thing that enables things. But where the competitive edge is, I think people like to feel that they have some magic secret sauce, the McDonald’s, Coca-Cola secret sauce. That if they just hold onto that, no matter what they do, they will win.

Jasper: Good. But now, we started this interview, Reetu, with UltimateGPT. And I saw that Zalando, I think you're very close with them, also has a GPT. So what about UltimateGPT? Why did you start it, and what does it actually do?

Reetu: I’m going to talk about the generative nature of these models because of course there’s a lot of use case with these models. And what is a large language model in the end? But what is large? I think that’s the question. But the generative nature I think became really interesting towards the end of last year. I think it was… what was it? Like 3.5 or something that came out from OpenAI. It became quite robust. Actually, before I go there, story. The funny thing is when we started the company, back then we built this generative bot. It was like 2016, summer. Just as a cool thing. We just played around. When we started the company, we won a hackathon with a little bit similar idea. We got 3,000 euros from the hackathon. The first thing we bought was NVIDIA’s Titan X Pascal GPU, because you didn’t have really ready, available GPUs in the cloud.

So we bought a real GPU and just shoved it into my own computer, and I was hosting that GPU so Jaakko and the others could use it. With it we trained this generative model that read the Finnish epic called the Kalevala. It's a bit like the Greeks: they have their epics, and we have ours, which I think a lot of The Lord of the Rings drew inspiration from, the language too. It's a collection of poems with a really specific structure, I think something like eight syllables per line, et cetera, structured in a very specific way. And the model learned that really well. Then we created this Twitter bot that started to tweet; we called it Kalevala 2.0. It tweeted these poems and they were correctly structured, so they actually looked exactly like the original poems.

But back then it didn’t really make any sense. But it was old. God, I can’t remember the year, but old school poems. So even the original ones didn’t make a lot of sense. And that got a lot of hype in the Finish media. So we got a lot of press. We got this linguistics and university people commenting on the poems. One professor says… he Tweeted really nicely. He was like, “The style and the grammar is correct, but it’s lacking soul. It’s lacking its soul,” in Finnish. So yeah, that actually was a funny generative, let’s say, start of the whole company.

And that’s how we got a lot of the big enterprise from Finland coming like, “Hey, I saw this hold. You guys are doing Finnish AI, but [inaudible] that. Can you help us automate our support? So that was kind of the origin story of the company. But now, six years later, we jumped back on the train because I think the soul, maybe it’s still not there, but it starts to make sense. You can produce stuff that actually makes sense. So we started to work on those and, yeah, we mapped out all the use cases. If you could generate text in this matter, what could you do? And there’s a lot. You can replace a lot of the stack we have right now.

But with these things, it’s hard. The issue is, can we make them reliable? Can we make them actually do what we need? Because we’re putting this in front of a consumer on behalf of some of the leading enterprises of the world. So they’re going to make sure that we can prove that this thing works. But it’s a Finnish way. We started from the deep end. It was like, “Hey, instead of a lot of people are using this in terms of creating synthetic data for their previous models, they’re maybe also doing some recommendation for their agents.” But we said, “Hey, let’s start with the automation. Let’s start with the thing that actually talks to the user, because that’s the hardest thing to do but we’ll learn really quickly. And let’s just build.”

And that’s how it started. And we got it to a really good point reasonably quickly. But I think with these models, still the issue is that the last 10%, it’s a painful mile. You get to that 90%, but tweaking it towards that 100% takes a while.

Lutz: But let’s actually talk about this quality, because what we saw is… so we totally understand now, specific case generalization is amazing. That drives your time to value. Completely get it, and it makes sense to use the technology here. However, what we see is that those models tend to hallucinate. And how do you avoid that the model doesn’t make up things but they can’t find it?

Reetu: Yeah, and I’ll respond to your question of, why wouldn’t somebody just use Chat GPT instead of comparing with Ultimate this? There is a lot to that. But the hallucination part, well, that could be one reason for example. Because if you take your GPT and you put it to your… I don’t think you should just put it in your support, talking to your consumers. But if you do that, it will hallucinate. It will say God knows what. It will answer questions that you don’t want it to answer. So what do you have to do with these models? You have to build a lot of guardrails.

I think people are even complaining now that ChatGPT has too many guardrails. But in our world there aren't enough. You could say that if someone asks for a pancake recipe, it shouldn't answer. Although we still have some customers saying, "Well, actually we're the kind of brand that would like it to answer that. But if you ask it about our competitor, then don't answer." So then it gets a bit too difficult. So it's better to say: hey, let's ground the model in something. What we did for the first version of the product, for example, is ground the model in the knowledge base of the customer.

It was a collection of articles, the FAQs, et cetera, and we used the model as a QA system against that. So again, it's not replacing the Ultimate product; it handles the simple cases, the FAQ cases. You can ask, "Where's my order?", but you get a textual answer, "Okay, you can check your order from this tracking link," versus the actual way of solving this, which is to talk to the systems. So yeah, we grounded it in the knowledge base, and that helps restrict it so that it should not say anything outside of that. If you ask, "Who's Elon Musk?", it should say, "Hey, my job is not to talk about Elon Musk, but how can I help you with this company's support cases?"
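
For readers who want to see the shape of that grounding step, here is a minimal sketch: retrieve the most relevant knowledge base article, then instruct the model to answer only from it and to deflect anything else. The retrieval method, prompt wording, and model name are illustrative assumptions, not Ultimate's implementation.

```python
# Sketch: ground a chat model in a knowledge base and refuse out-of-scope questions.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

knowledge_base = [
    "Order tracking: customers can track orders via the link in their confirmation email.",
    "Returns: items can be returned within 30 days using the returns portal.",
]
kb_embeddings = encoder.encode(knowledge_base, convert_to_tensor=True)

def grounded_answer(question: str) -> str:
    # 1. Retrieve the single most relevant article for the question.
    scores = util.cos_sim(encoder.encode(question, convert_to_tensor=True), kb_embeddings)[0]
    article = knowledge_base[int(scores.argmax())]

    # 2. Guardrail prompt: answer only from the article, deflect everything else.
    system = (
        "You are a customer support assistant. Answer ONLY using the article below. "
        "If the answer is not in the article, say you can only help with this "
        "company's support questions.\n\nARTICLE:\n" + article
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

print(grounded_answer("Where is my order?"))
print(grounded_answer("Who is Elon Musk?"))  # should be politely deflected
```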

Lutz: I actually have a question there, because words matter. You are using OpenAI; you're using many different language models, and OpenAI is one of them. Now, ChatGPT is a chat interface defined with its own guardrails. Are you using OpenAI with your own guardrails, or are you using ChatGPT as the chat interface, plus guardrails, plus data?

Reetu: Yeah, I think the latter, in the sense that we can use different models. Whether it's OpenAI, whether it's 3.5, 4, or 3.5 Turbo, which are the ChatGPT models, they have their guardrails. But their guardrails are designed for different things, so we need stricter guidelines.

Lutz: The important part here is that you have the model and you say, "Okay, I want the generalizability of a human interaction," which is the chat prompting, the ChatGPT prompting. And then on top of it you still need guardrails, so that people do not talk about Elon Musk or whatever, so that you actually guide the conversation. You want an open, general conversation, that's the ChatGPT prompting of the model, and then on top of it you do your own guardrail prompting to avoid the model going off the rails. Which is pretty amazing stacking.

Reetu: Yeah. And I think the key thing here is also that you have to know in which cases you can actually extract the answer from the model itself, and in which cases you use the model to, let's say, analyze a knowledge base article and then synthesize the answer. So basically the model is just the talking head: it reads an article and then summarizes the exact answer for the customer, but nothing comes from the model itself.

Lutz: I actually have one other question about the use of these models, because, side note, I actually wanted to become your customer and you didn't want me. We can't say that. No, but more that, hey, we actually wanted to use some interface to work more with our members, and we couldn't, because obviously clinical and healthcare data needs to be HIPAA-compliant and so on and so forth. If you talk to your retail customers, and you work with OpenAI, the data you are using currently goes back into the cloud. Is that a problem? Do people react to this? Are you waiting for OpenAI to change that? How do you see this?

Reetu: It’s okay because it’s a problem for everybody. So first of all, when we talk about the future, I’m going to talk about… I don’t think people rely on these OpenAI models, et cetera in the future, at least completely. But first of all, we use the models on serve by Microsoft through Azure. They serve us on Europe. You can actually opt out from the training, so they don’t use that. And Microsoft’s. They serve enterprise customers, so that you can trust. So that.

Reetu: … and it’s Microsoft’s. They serve enterprise customers so that, you can trust. So that actually helps a lot. But it’s more about just all this CISOs and privacy people. Their heads must been spinning. I actually used to study information security because they see this chaos that is ChatGPT and different countries banning it. But then for example, Italy banning ChatGPT doesn’t mean that the API… The API is completely different than ChatGPT. That’s the consumer product. So there’s a lot of this confusion of like, “Oof. ChatGPT is dangerous.”

Lutz: Totally. Even the whole European community is confused at the moment, hunting for where the data resides, but that's not the point. As soon as you train the model and move the model weights, you have already used that data for model training. And people don't get this yet.

Reetu: Yeah, exactly. So it’s the problem in terms you have to go through a couple of hoops and have those conversations and give that information. The good side of it is there’s a lot of hype, so then on the business side people are like, “But we need this.” It’s amazing how when something is so exciting, how sometimes even the logic goes off the… Just whatever, we just want this when something is very shiny. But it’s also great because I think maybe last year was a very difficult for tech. By the end of the year, suddenly there’s a spark of excitement. And again, innovation is here, startups innovating. So it’s also amazing.

But in the long term, I just don't see a world where there are only these big companies providing their models and that's it. So at Ultimate, we're saying that the future is maybe smaller and more narrow models. We don't need a model that knows how to write code. We need models that are very good at having conversations and maybe interacting with systems, or not even that, just very good at having conversations. So in the future, instead of taking this giant model and trying to restrict it to this tiny use case, we're going to have our own models that are trained for this one use case.

Lutz: May I challenge you a bit on this one? Because the discussion we had up to now was that your models were too narrow and that actually created problems for you. So one of the big things you did as Ultimate is that you went to more general models. Maybe a fairer way to say it is that you want specific use cases, and within those specific use cases you want the general approach of a human language interface, so there can be many different ways of phrasing my question. But the use case should be very narrow, because the activity following this use case is a linear activity: I look up your cancellation number, I look up your flight, or whatever. It's a very linear use case, and therefore you want the model to hone in on something linear. But the entry point, the human, should be as broad as possible.

Reetu: Yeah, that’s actually a good point. Because for example, for you to ask questions of can you give me a recipe for pancakes for example? If you do have that, a little bit of that information in the model itself, you can maybe even handle that case in a way that, “Hey, I’m not here to give you recipes,” for example, versus just saying, “I just have no idea what you said.” But then you hone into the actual use case.

But also I think there’s a big impact on the concept of control. So then when you rely on these third party models, they come from an API, you don’t have a lot of control, you don’t really… It’s hard to make them really, really good for your specific use case because you just use whatever is available and that’s not much. And it will improve. And I think there is an argument for those big models, say, “Hey.” But those companies will integrate the models faster.

But I think if you give it enough time, we will start seeing everybody working on their own models. And then maybe there's a modular architecture where you can switch: if you want to use GPT-6 or 7 or whatever, that's fine too. But if you want to use a model that is built for support, run here in Europe or even Germany... That's actually a good thing about being in Germany. When you say we are based in Germany, people start trusting you in terms of security, because you have to be secure. But yeah, I think that's the direction we're going to go; we'll see where we end up.

Jasper: Do you see any other things that will happen in the next two years? I know you think very strategically about this. We already touched upon the fact that maybe there's a bit too much hype, maybe the quality of the output is not good enough for all the applications that people are currently imagining. It's not just the fine-tuning; the prompting of the model is pretty tedious. I'm still trying to tame Midjourney, which is probably not possible, at least not right now. But anything else where you would say, "Dear listeners, be aware this is happening in a positive or negative way"?

But in the end, these models, this technology, it is a new paradigm. It changes the game a lot.

Reetu: Well, I think there’s a lot of hype. Hype as a concept means that there’s maybe expectations or something that is not real. So once you get into the actual weeds, you start realizing, “Oof, that last 10% actually prevents me maybe getting what I originally thought, so I have to compromise.” And then the hype starts to die down a bit. But that’s great.

But in the end, these models, this technology, it is a new paradigm. It changes the game a lot. For example, internally, when we think about wanting to build more capabilities... Let's say we want to build a sentiment classifier, which I know Lutz hates; it's just one big false positive. But you want to build these different use cases. In the past, you'd do research, look at which models are best for this, study them, build them, test them, da, da, da. Maybe now you get pretty good results just by using an LLM, prompting it and fine-tuning it for this specific use case. So it just speeds things up a lot.
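
As a concrete illustration of that shortcut, here is a small sketch of prompting a chat model as a zero-shot sentiment classifier. The prompt wording and model name are assumptions for illustration, not Ultimate's implementation.

```python
# Sketch: a sentiment "classifier" built from nothing but a prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def sentiment(message: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Classify the customer's sentiment as exactly one word: "
                        "positive, neutral, or negative."},
            {"role": "user", "content": message},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

print(sentiment("I've been waiting three weeks and nobody answers my emails!"))  # negative
```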

Lutz: Actually, sentiment classification is an awesome topic. Because again, in the old traditional world where we had sentiment classifiers, it was super narrow: I wanted to understand, is it positive or negative? So what? What can you do with that? Now with LLMs, and we see this in clinical research, where people are testing LLMs as a doctor's voice, the LLM is showing empathy and reacting. So when you trigger an LLM to generate a response, if I come to you, Reetu, and scream at you, you told me the funny story that the machine would then answer, "But we're all trying to do our job here. Please don't scream at me." And in reality, that's a computer-generated answer.

But LLMs don't just measure positive or negative; they have the ability to act in a way we understand as empathy. So actually, I don't like sentiment analysis, because it's zero or one and it's very often completely useless. But for LLMs, which have this more nuanced way of interacting, it actually makes a lot of sense.

Reetu: And I think right now the exciting part is that the vision we had in the beginning, the things we wanted to do back then but that were very difficult to do, now start to become much easier; it's almost like you can see it. Even in our space, in the beginning there was a bit of hesitation around AI, even among our partners and investors, although there was a lot of investment back then. But suddenly the switch flipped and everyone said, "Ooh, this is happening." Now we can see that it's going to happen. And that's why it's nice to be in this position, because we've been building this product for six years. And again, why not just plug in ChatGPT? It's not enough to just have a conversational model that you unleash on your customers and hope it does the work.

I think the actual product is what you have to build on top of it: how do you plug this into different systems? How can you actually command it? How can you analyze the data? There's a lot of value just in the data itself. Companies want to do QA, companies want to do... there's a lot of stuff they actually want to do to control the model. So that's where the value comes in. It's almost like with those transformers: why wouldn't I just take a transformer and plug it into my support? It's just a classifier. Okay, yeah, you can, but again, it's a lower level than where the user actually needs you to operate.

Lutz: I like this. This is so important: it's not about the model. And you said this earlier so neatly: "We test different models, and because we test different models, we become agnostic to the model later on." The model is not the value; it is the UX, as we discussed. And the other thing you said, which the whole world gets excited about, and which I talked about in our last podcast, is connecting those models to different use cases, meaning your ability to connect into the actual systems of your customers and get the right information for those models to act on. That is extremely high value for the customer as well as for you, and which LLM you use, OpenAI, LLaMA, whatever, becomes a secondary concern. It's the UX and it's the connectivity; those are the two main value drivers. I love that.

Reetu: That’s exactly the case. So we do see now… Again, I think it’s one of these things that we’re overestimating the impact in the next six months or 12 months. But I don’t know, maybe in this case, not underestimating, but at least the value that comes in from two to five to 10 years from now, it’s going to be massive. So I think this is the exciting part here. And our problem is almost we can have a hundred different use cases where it’s just thinking what is the chess you want to play in the space in terms of what are the first things you do? Do you build your own models? How do you manage them? Et cetera, et cetera.

So I think the great thing is that you almost have an abundance of value you can create, but you just can't do everything at once, and now it's about the strategy of how you go forward. But again, you keep grinding, you keep grinding; there are waves, they come and go. It just feels very validating now that you have this little Christmas gift that came at a, let's say, economically difficult time for the whole world. Now everybody's excited and you're in the middle of the storm. But yeah, it's going to be a wild ride.

Jasper: Perfect closing words, Reetu. Thank you so much. We went way over time, your precious time, so thank you for taking it, and for all the insights. Thank you so much.

Lutz: Thank you for your thoughts.

Reetu: Bye.
