2 years of AI-lab at Ruter — Part 1

How our innovation hub helps make public transportation better

Ruter is a public transportation company operating in the greater Oslo area. We have loads of historical data from a wide range of business domains. To extract value from this data, one needs to develop a culture of innovation on top of a robust and scalable data infrastructure, without hindering day-to-day operations. Ruter introduced AI-lab in 2020 as a platform to drive these kinds of innovative initiatives.

Photo: Ruter As / Redink, Thomas Haugersveen

We will first cover what defines our AI-lab. Then we will outline what the students worked on during its first iteration in 2020. To read about what the students worked on in 2021, stay tuned for part 2!

What is Ruter’s AI-lab?

The AI-lab at Ruter consists of university students with academic interest and experience in Machine Learning (ML) and Artificial Intelligence (AI). These students come from a variety of different study programs focusing on science, technology, engineering, and math. Each summer, we invite 3–4 students to help us solve business problems using advanced statistics and ML with a high degree of autonomy and independence. The summer internships are often extended into flexible part-time positions that combine well with ongoing university studies.

At the AI-lab, we try to build a culture of openness and experimentation while solving business-critical problems for one of Norway’s biggest public transport authorities. The issues we work on are interdisciplinary; we leverage the technical and domain-specific competence of many different teams. This also gives an excellent learning opportunity in cross-functional team collaboration and application architecture. The AI-lab has a product-centric mindset, but the insights one gains while working and experimenting with the data have often proved to be just as valuable as the finished applications.

The Data Science department handles historical data applications at Ruter.

The AI-lab has been a great asset to the Data Science department at Ruter in identifying novel use-cases for ML and statistics. It has also informed the design of our data infrastructure and produced data applications that have made it into Ruter’s production environment.

2020 was the first iteration of the AI-lab. Four students and an experienced software engineer set out to help our Customer Support team in optimizing their case-handling processes. In the next section, we will discuss the problem at hand and how we used AI to improve the existing solutions.

2020 — Customer inquiries, AI case handler and insights

Ruter receives approximately 4000 customer inquiries every week by phone, email, or our online contact form. A large proportion of these inquiries concern schedule deviations on the different modes of transport we operate. Responding rapidly to these inquiries is essential, as customers often send them while waiting at the stop for an already departed, delayed, or cancelled journey. Case handlers also end up allocating large amounts of time to these kinds of inquiries, time they would rather spend on more complex queries that require human understanding. The project for the 2020 iteration of AI-lab was to find innovative ways to automate parts of these case-handling processes. A case handler AI bot was proposed to give customers instant feedback with relevant information when they complained about schedule deviations.

In parallel with this use case, the team identified the need for a data visualization and insights platform for the inquiry data. Customer inquiries can be highly valuable as a data source for quickly identifying and reacting to problems with Ruter’s services. Still, to make this a reality, the data needs proper preprocessing, aggregation, and enrichment.

To serve both needs, the team started building an extensive Natural Language Processing (NLP) engine that could do several useful and interesting things with the inquiry text.

A high-level view of the solution architecture implemented during AI-lab 2020

Language recognition and spelling correction to ensure data quality
During the project, it quickly became apparent that only Norwegian text should be processed by our NLP engine; we simply did not have enough data to build good ML models with multi-language support. Therefore, a pretrained FastText model was deployed as part of the NLP engine to distinguish Norwegian from other languages, and inquiries in other languages were filtered out. A rudimentary spell checker was also implemented: words within a small edit distance of a recognized spelling were corrected to that spelling. These steps ensured that the data we used in model training and downstream processing steps was of higher quality.
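As a rough sketch of the spell-checking step, the snippet below implements the classic edit-distance comparison against a list of recognized spellings. The vocabulary here is a tiny hypothetical stand-in, and the FastText language-identification step is only indicated in a comment, since it requires downloading the pretrained lid.176 model file.

```python
# Language ID step (not run here): a pretrained FastText model, e.g.
#   fasttext.load_model("lid.176.bin").predict(text)
# was used to keep only Norwegian inquiries.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Hypothetical stand-in for the list of recognized spellings
VOCAB = {"buss", "trikk", "forsinket", "billett"}

def correct(word: str, max_dist: int = 1) -> str:
    """Correct a word to the closest recognized spelling, if close enough."""
    best = min(VOCAB, key=lambda v: levenshtein(word.lower(), v))
    return best if levenshtein(word.lower(), best) <= max_dist else word
```

In practice one would precompute candidate sets (e.g. by word length or prefix) rather than scanning the whole vocabulary for every word.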

Anonymization: concealing personal information to visualize inquiries internally
Email addresses, bank account information and phone numbers can be filtered away using fuzzy regular expressions, as these follow relatively specific patterns. ML was used to remove names and medical conditions from the inquiry text. FastText was used to vectorize all the words in each inquiry. The vectorizations were then evaluated against vectorizations of terms in a “blacklist” (e.g. medical conditions, names) and “whitelist” of words. If a word’s vector representation was sufficiently close to vectorizations from our blacklist or not close enough to vectorizations in our whitelist, it got masked from our output. This provided a form of anonymization to our NLP engine. Automatic anonymization made it possible for us to give access to the inquiry text to a wider audience internally at Ruter. However, this solution cannot guarantee 100% anonymity. One particular problem we had was anonymizing unusual names and medical conditions. Therefore, access to this anonymized data continues to be restricted and regulated.
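A minimal sketch of the two masking steps described above, using plain (not fuzzy) regexes, toy two-dimensional vectors standing in for real FastText embeddings (which are typically 300-dimensional), and hypothetical blacklist/whitelist entries and thresholds:

```python
import math
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{8}\b")  # Norwegian phone numbers are 8 digits

def mask_patterns(text: str) -> str:
    """Mask fields that follow predictable patterns using regexes."""
    return PHONE.sub("<PHONE>", EMAIL.sub("<EMAIL>", text))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy stand-ins for FastText word vectors and for the word lists
VECTORS = {"kari": (1.0, 0.1), "ola": (0.9, 0.2), "buss": (0.1, 1.0)}
BLACKLIST = ["kari"]   # e.g. known names and medical terms
WHITELIST = ["buss"]   # safe, domain-specific vocabulary

def mask_sensitive(word: str, black_t: float = 0.95, white_t: float = 0.5) -> str:
    """Mask a word if its vector is near the blacklist or far from the whitelist."""
    vec = VECTORS.get(word.lower())
    if vec is None:
        return "<MASKED>"  # err on the safe side for unknown words
    near_black = any(cosine(vec, VECTORS[b]) >= black_t for b in BLACKLIST)
    far_from_white = all(cosine(vec, VECTORS[w]) < white_t for w in WHITELIST)
    return "<MASKED>" if near_black or far_from_white else word
```

Note how "ola" gets masked even though it is not on the blacklist, simply because its vector is close to a blacklisted name; that is the point of using embeddings instead of exact string matching.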

Example of spell correction and anonymization of text fields done by our NLP engine.

Sentiment analysis and topic modeling to aggregate and visualize customer satisfaction
Sentiment analysis is the use of algorithms to systematically identify and quantify the emotional polarity in a collection of texts. For us, developing metrics and trends directly related to customer satisfaction could be very useful. A BERT model pre-trained on a large Norwegian corpus from the National Library of Norway was used for sentiment analysis of the inquiry text fields. This pre-trained model was fine-tuned on the downstream task of classifying sentiment scores for different kinds of Norwegian text. We used Norwegian review data scraped from Yelp, Google Play, and Google Maps, together with data downloaded from NoReC: The Norwegian Review Corpus. The idea was that a large dataset spanning multiple review sources would help the model generalize to our domain.

PowerBI dashboard visualizing sentiment per stop for line 37 in Oslo. The size of each bubble represents the number of inquiries and the color reflects average sentiment (scale from 1 to 5, where 1 is very negative and 5 is very positive). Inquiry text is masked for privacy reasons.
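The aggregation behind such a dashboard view is straightforward; a sketch with hypothetical stops and scores:

```python
from collections import defaultdict

# Hypothetical per-inquiry records: (stop name, sentiment score on the 1-5 scale)
inquiries = [
    ("Jernbanetorget", 1), ("Jernbanetorget", 2), ("Jernbanetorget", 1),
    ("Carl Berners plass", 4), ("Carl Berners plass", 3),
]

scores_per_stop = defaultdict(list)
for stop, score in inquiries:
    scores_per_stop[stop].append(score)

# Bubble size = number of inquiries, bubble colour = average sentiment
summary = {stop: (len(s), sum(s) / len(s)) for stop, s in scores_per_stop.items()}
```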

Topic modeling with LDA and clustering with t-SNE were used to visualize newly inferred categories and similarities between different inquiries. Inquiries are often complex and multi-faceted, so the functional categories we already use don’t always reflect their general themes. LDA enables inference of new categories in an unsupervised manner, and t-SNE lets us visualize the vector representations of complaints in two dimensions. When the sentiment of the complaints is layered on top of this, you can make some interesting plots.
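With scikit-learn as a stand-in (the exact tooling isn’t specified above), the LDA-then-t-SNE pipeline looks roughly like this; the corpus, topic count, and perplexity are made up for illustration, and here t-SNE is run on the LDA topic mixtures rather than on the inquiry vector representations used for the plots:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import TSNE

docs = [  # hypothetical, already-anonymized inquiry texts
    "bussen var forsinket i dag",
    "bussen kom ikke i det hele tatt",
    "trikken var forsinket igjen",
    "billetten min ble ikke godkjent",
    "fikk gebyr i billettkontroll",
    "appen godtok ikke billetten",
]

# Bag-of-words counts feed the topic model
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic mixture per document

# t-SNE projects the per-document representations down to 2-D for plotting
coords = TSNE(n_components=2, perplexity=2.0, random_state=0).fit_transform(doc_topics)
```

Varying `n_components` (the number of topics) and re-inspecting the clusters is how new categories like those described below were surfaced.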

t-SNE plot of vector representations of all inquiries we received in 2019, each dot is one complaint. Colors reflect functional topics identified by customer support.
The same t-SNE plot, but with colors reflecting topics identified by LDA with number of topics = 10
t-SNE plot overlaid with sentiment score, the ‘Santa Maria’ scale is used (MILD=positive, X-HOT=very negative). We see that inquiries regarding ticket controls and passengers being left behind are more negative than others.

Running LDA with many different configurations of the ‘number of topics’ parameter, we identified some interesting new categories. Complaints about the doors opening and closing too soon were one category we identified. Another was about customers hearing loud music either from other passengers or from the bus driver.

The results from these modeling steps, especially sentiment analysis, have room for improvement. For example, the vast majority of our customer inquiries are negative, which makes it hard to gain a robust quantitative view of customer satisfaction from this data. Also, what constitutes negative experiences on public transport can be rather domain-specific. Therefore, more granular and domain-specific labelling of our inquiry data is needed to train models that can be used for business decisions, and this is one of the things the 2022 AI-lab will try to tackle.

Classification of inquiries into functional categories to aggregate, visualize and react quickly
Multiple FastText models were trained to automatically label inquiries according to the hierarchy of functional categories already in use by the Customer Support team. This lets us visualize what kinds of inquiries we are receiving in near real-time and reduces the need for manual labelling. The classification can also trigger automatic replies, which is what the AI-bot currently does.
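A sketch of the hierarchical setup: one model predicts the top-level category, and a per-category model predicts the subcategory. Scikit-learn classifiers stand in for the FastText models here, and the categories and training examples are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled inquiries: (text, top-level category, subcategory)
train = [
    ("bussen kom aldri", "schedule_deviation", "cancelled"),
    ("bussen var ti minutter forsinket", "schedule_deviation", "too_late"),
    ("bussen kjørte før tiden", "schedule_deviation", "too_early"),
    ("appen godtok ikke billetten min", "ticketing", "app_error"),
    ("fikk gebyr i kontroll", "ticketing", "inspection"),
]
texts, tops, subs = zip(*train)

# One model for the top level of the hierarchy...
top_clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, tops)

# ...and one subcategory model per top-level category
sub_clfs = {}
for cat in set(tops):
    idx = [i for i, t in enumerate(tops) if t == cat]
    sub_clfs[cat] = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(
        [texts[i] for i in idx], [subs[i] for i in idx]
    )

def classify(text: str):
    """Route an inquiry down the category hierarchy."""
    top = top_clf.predict([text])[0]
    return top, sub_clfs[top].predict([text])[0]
```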

Dashboard showing all inquiries regarding schedule deviations in the inner Oslo area in the period 25.02.22–29.04.22. Inquiry text is masked for privacy reasons.

Send automatic replies when inquiry text satisfies certain conditions
When the NLP engine identifies a complaint as being about schedule deviations, it further tries to predict the subcategory: too late, too early, or cancelled departure. The prediction is then sent via Kafka, Ruter’s internal messaging platform, to our Customer Support system, which sends an automatic email reply if the prediction is above our confidence threshold.
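The gating logic can be sketched as below; the threshold value, subcategory names, and reply texts are illustrative (the real replies are sent in Norwegian by the Customer Support system), and the Kafka publish step is only indicated in a comment:

```python
CONFIDENCE_THRESHOLD = 0.9  # hypothetical value, tuned for precision over recall

# Illustrative canned replies per predicted subcategory
REPLIES = {
    "too_late": "We are sorry that your departure was delayed ...",
    "too_early": "We are sorry that your departure left ahead of schedule ...",
    "cancelled": "We are sorry that your departure was cancelled ...",
}

def auto_reply(subcategory: str, confidence: float):
    """Return a reply to publish downstream, or None to leave the case to a human."""
    if confidence < CONFIDENCE_THRESHOLD:
        return None  # low-confidence predictions fall back to manual handling
    return REPLIES.get(subcategory)

# Downstream (sketched): producer.send("customer-support-replies", reply)
```

Keeping the threshold high means the bot only answers when it is confident, while everything else still reaches a human case handler.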

An automatic reply from our AI-bot when a customer complains about a delay on line 31E.

Simple FastText models were trained to classify cases for our AI-bot, which triggered automatic email replies downstream. FastText was chosen for simplicity, resource efficiency and time constraints. Using transformer-based models, like BERT or GPT-3, could create models that classify with significantly better accuracy and send generated text replies for all kinds of inquiries. However, these models are data-hungry and would probably need more domain-specific labelled data.

Today, our NLP engine and AI-bot, nicknamed Harry Botter, have run for almost 2 years, performing anonymization, classification, sentiment analysis, and automatic replies to customer inquiries. They have reduced the processing time for certain complaints from days to seconds. The NLP engine also powers our Customer Insights dashboard in PowerBI, which is regularly used by multiple teams internally at Ruter, giving valuable insights into how our customers experience the services we provide.

NLP can solve many problems for organizations that receive and rely on textual descriptions of events and experiences. It can especially help lift these experiences out of siloed databases: not everyone at Ruter can view GDPR-protected inquiries, but the aggregated trends in the feedback data can be important for developing new products and services across the organization.

Read part 2 of this article series, where we discuss the work we did in the 2021 iteration of AI-lab in which we tried to predict travel times and delays across Ruter’s huge transport network.

I started as an intern in the 2020 iteration of the AI-lab at Ruter and now work here full time as a developer for the ML team. We try to leverage advanced statistics and ML in combination with a modern data stack to better public transportation in the greater Oslo area.
