Enhancing Wynk’s Search Experience through a Neural Spell Checker [Part 1]

Rohit Rahul
Published in Airtel Digital
11 min read · Mar 18, 2024


Opening Overture

In the dynamic cadence of modern living, music streaming has seamlessly woven itself into our daily chorus. Wynk, one of the leading music platforms, has consistently strived to improve user experiences by offering personalised recommendations and an expansive musical repertoire. However, even the most passionate Wynk aficionados encounter a web of typos and misspellings while composing their queries. These queries often yield incorrect or incomplete search results, orchestrating moments of frustration for the user.

Recognising that accurate spell correction is the bedrock of our natural language solutions, and understanding the frustration faced by users grappling with typos, Wynk has introduced a new feature: a neural spell corrector, meticulously trained on user queries and uniquely attuned to the intricacies of the musical domain.

To gain a deeper understanding of the intricacies, approaches, and challenges that have collectively shaped our spell check solution, we first have to dive into the history of spell correction algorithms.

Chronicles of Spell Correction Algorithms

The roots of spell-checking algorithms date back to the late 1950s and early 1960s, marked by two pioneering papers: “On the Recognition of Information with a Digital Computer” by Herbert Glantz and “A Program for Correcting Spelling Errors” by Charles R Blair. The core idea in these papers was straightforward — maintain a list of words and, for each word typed by the user, compare it against a predefined list. However, these early algorithms had limitations; they could only flag incorrect words without suggesting corrections, offering a foundational but rudimentary approach to spell checking.

In the early 1970s, Ralph Gorin pioneered a spell-checking program that not only identified incorrect spellings but also provided suggested corrections for misspellings. However, a notable limitation of this approach was that its suggestions were limited to words within an edit distance of 1, a clear constraint on its corrective capabilities.

In 2000, Hunspell emerged as an open-source spell-checking and morphological analysis tool, incorporating diverse linguistic algorithms. It has gained widespread adoption as a spell-checking engine integrated into applications such as LibreOffice, Chrome, Adobe Illustrator, and Mozilla Firefox.

Do note that the preceding algorithms lacked any consideration for phonetics or context.

Among the early algorithms crafted to tackle phonetic errors, Metaphone, introduced by Lawrence Philips in 1990, stood out as an advancement over Soundex. The fundamental principle guiding these algorithms is the belief that words with similar sounds should be encoded into the same key. Consequently, Metaphone encodes words into phonetic keys to enhance recognition. However, a notable challenge persists: these algorithms necessitate individual implementations for each language, adding a layer of complexity to their application.

These techniques face a persistent issue: they lack the contextual understanding needed for accurate corrections. Attempts to incorporate N-gram models, which analyse adjacent characters or words, fell short of state-of-the-art accuracy. As we delve into the next section, these challenges prominently impact user queries on Wynk. Today, modern programs leverage natural language processing and understanding to effectively identify and correct errors, which proves crucial not only for longer sentences but also for concise queries, as we’ll explore shortly.

Let’s take a look at some of the challenges Wynk faces in the context of spell correction.

Why Spell Correction in Wynk is not a Trivial Solution

The Challenge of Multilingual Search

Wynk serves a linguistically diverse regional audience, accommodating search queries submitted in multiple languages. Nevertheless, handling misspellings becomes an intricate task without prior knowledge of the query’s language. To effectively address these queries, the model must possess language awareness, discerning the language the user is typing in and constraining its vocabulary accordingly.

Phonetic Variations in Transliteration

Transliterating words from Indian regional languages into English introduces diverse phonetic variations for the same term. Individuals tend to write these words in English as they pronounce them, and there’s no universally agreed-upon definition for such transliterations. Consequently, our catalog and search queries may include variations like “badshah” and “baadshah.” This ambiguity blurs the concept of a “correct word.” Both queries and documents exhibit phonetic variations of a term, exemplified by users querying “bir jara” when intending to search for “veer zaara.”

Challenges in Search-as-You-Type

The functionality of search-as-you-type demands swift processing and immediate responses to user inputs, imposing a considerable strain on real-time systems. This challenge is particularly pronounced at the scale at which Wynk operates, making scalability a significant concern. The spell-check module frequently encounters partial queries as input. Therefore, the model must adeptly generate meaningful corrections even when confronted with limited information, ensuring an efficient and responsive user experience.

Showcasing some sample user queries

Tackling Multilingual Transliterations with a Music-Aware Spell Check

Imagine a scenario where your search model not only comprehends language but also possesses an intimate knowledge of the music domain. This dual capability can be a game-changer.

For instance, when dealing with multilingual queries, a music-savvy model can pick up on subtle cues in the user’s input, recognising the artist names, song titles, and genres specific to the music landscape. This heightened contextual awareness allows it to discern the language being used, mitigating the complexity of handling misspellings. Additionally, in the realm of phonetic variations and transliterations, a music-savvy model understands the nuances of how people transliterate Indian regional words in English, like the variations between “badshah” and “baadshah.” Its knowledge extends beyond language, embracing the unique phonetic patterns inherent in music-related terms.

The fusion of language understanding with a music-specific context equips the model to navigate the challenges seamlessly, offering a comprehensive solution to the multifaceted search landscape in the world of Wynk music.

Building this solution rests on two crucial pillars: constructing an adept model and crafting precise training data from user logs. The latter is particularly intricate, requiring the generation of accurate and diverse training datasets spanning various regional languages. Equally vital is the development of a swift and scalable model, a topic we delve into in the subsequent sections.

Deciding On The Right Architecture

Spell correction can be regarded as a sequence-to-sequence problem, where misspelled queries are considered input sequences, and their corrected forms are treated as output sequences.

After experimenting with various architectures such as LSTMs, we decided to adopt a micro transformer approach for our model. Rigorous testing confirmed the micro transformer as the optimal choice, striking a balance between performance and accuracy. Notably, we streamlined the model by significantly reducing the number of layers from the original architecture outlined in the paper “Attention is All You Need”.

Transformers are particularly well-suited for this task due to their inherent ability to capture long-range dependencies and contextual information, making them ideal for handling language nuances, phonetic variations, and contextual understanding crucial for accurate spell-checking across diverse multilingual contexts.

Architecture of the Model

Micro Transformer: a transformer with a reduced number of parameters
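To make the architecture concrete, here is a minimal sketch of what a micro transformer for spell correction could look like in PyTorch. The layer counts, embedding size, vocabulary size, and class name are illustrative assumptions for this post, not Wynk’s actual configuration, and padding masks are omitted for brevity.

```python
import math
import torch
import torch.nn as nn

class MicroSpellTransformer(nn.Module):
    """A small encoder-decoder transformer for subword-level spell correction.

    All hyperparameters below are illustrative guesses, not Wynk's configuration.
    """

    def __init__(self, vocab_size=8000, d_model=128, nhead=4,
                 num_layers=2, dim_feedforward=256, max_len=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        # Learned positional embeddings keep the sketch simple.
        self.pos_embedding = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward, batch_first=True,
        )
        self.out_proj = nn.Linear(d_model, vocab_size)
        self.d_model = d_model

    def _embed(self, tokens):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        return self.embedding(tokens) * math.sqrt(self.d_model) + self.pos_embedding(positions)

    def forward(self, src, tgt):
        # Causal mask so the decoder cannot peek at future target tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        hidden = self.transformer(self._embed(src), self._embed(tgt), tgt_mask=tgt_mask)
        return self.out_proj(hidden)  # (batch, tgt_len, vocab_size) logits
```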

Encoding Query

We employ byte-pair encoding (BPE) to segment a given query into subwords, enhancing its representation. BPE, a data-driven approach, iteratively merges the most frequent character pairs, effectively capturing both morphological and semantic information. This segmentation method aids in mitigating out-of-vocabulary issues and contributes to the model’s ability to comprehend and generate meaningful corrections for a wide range of queries.

Since most transformer models are pre-trained solely on English text, our objective of accommodating transliterated words from regional languages posed a significant challenge. Given the smaller architecture of our model and the substantial presence of regional transliterated queries, importing pre-trained weights wasn’t feasible. Thus, we undertook the task of thorough training from scratch, leading us to create our own vocabulary to address this unique requirement.
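As a rough illustration of building such a vocabulary from scratch, the sketch below trains a BPE tokenizer on raw query text using the open-source tokenizers library. The corpus file name, vocabulary size, and special tokens are hypothetical placeholders rather than Wynk’s actual settings.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Train a BPE vocabulary directly on (hypothetical) raw query text, so that
# transliterated regional-language subwords are learned alongside English ones.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(
    vocab_size=8000,  # illustrative size, not the production setting
    special_tokens=["[PAD]", "[UNK]", "[BOS]", "[EOS]"],
)
tokenizer.train(files=["search_queries.txt"], trainer=trainer)  # hypothetical corpus file
tokenizer.save("query_bpe.json")

# Example: inspect how a noisy transliterated query is split into subwords.
print(tokenizer.encode("arjit sing tum hi ho").tokens)
```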

Throughout this process, we utilised the Categorical Cross-Entropy (CCE) loss function and integrated the strategy of teacher forcing.
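The snippet below sketches what one such training step could look like, assuming the MicroSpellTransformer sketched earlier, batches of token ids produced by the BPE tokenizer, and a padding id of 0. None of these specifics come from Wynk’s actual code; they simply illustrate teacher forcing with a categorical cross-entropy loss.

```python
import torch
import torch.nn as nn

PAD_ID = 0  # placeholder for whatever padding id the tokenizer assigns
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)  # categorical cross-entropy

def training_step(model, optimizer, src_ids, tgt_ids):
    # Teacher forcing: the decoder is fed the ground-truth target shifted right,
    # and is trained to predict the next ground-truth token at every position.
    decoder_input = tgt_ids[:, :-1]   # [BOS] w1 w2 ...
    decoder_target = tgt_ids[:, 1:]   # w1 w2 ... [EOS]

    logits = model(src_ids, decoder_input)               # (batch, len, vocab)
    loss = criterion(logits.reshape(-1, logits.size(-1)),
                     decoder_target.reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```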

Training Data

A pivotal challenge lies in crafting high-quality training data when creating labels from search logs. The strategies we employed are highlighted below. The emphasis is on cultivating a dataset that is both diverse and representative, mirroring authentic user typing behaviour and the corresponding corrections, covering single-character as well as whole-word corrections.

Prior to delving into the intricacies of training data creation, a comprehensive understanding of prevalent user typing errors in query formulation is essential.

Common Sources of Errors

Let’s categorize common user errors into distinct groups:

  1. Typos: Genuine mistakes made by users.
  2. Phonetic Errors: A substantial portion of errors results from different phonetic variations of the same spelling, such as “daku” and “daaku.”
  3. Keyboard Layout Errors: Some errors arise from the close proximity of keys, leading users to inadvertently press nearby keys while attempting to type a specific character.
  4. Autocorrect Errors: Given the prevalence of mobile users, a notable proportion of errors stem from the device’s autocorrect algorithm introducing inaccuracies in user input.

Training Pairs from user’s search queries

Our emphasis is on crafting domain-specific training data tailored to the intricacies of the music domain, encompassing queries with artist names, album titles, and song names.

With the search-as-you-type feature, we generate logs for each user keystroke, allowing us to discern deletions and observe how users reformulate their queries. By merging this insight with the user’s streaming history, we deduce the appropriate correction for the given query. This comprehensive information is meticulously captured within our search logs.

Additionally, to avoid biasing the model towards making corrections consistently, we include examples where users typed the string correctly and consumed the content.

For a detailed breakdown, refer to the table below, which presents columns such as the user’s initial query, the user’s correction, the user’s consumption, and the label created. The created label is treated as the corrected query and given to the model as the learning target, while the initial query serves as the (incorrect) input.

Table showcasing how training data is created from user logs
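As a simplified illustration of this labeling logic, the sketch below turns one session’s keystroke queries and the title the user eventually streamed into an (input, label) pair. The field names, similarity threshold, and matching heuristic are assumptions for illustration, not the production pipeline.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1]; a stand-in for whatever matcher is used."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def make_training_pair(keystroke_queries, streamed_title):
    """Turn one session's keystrokes plus consumption into an (input, label) pair.

    Heuristic sketch: the first logged query is treated as the (possibly incorrect)
    input; the user's final reformulation becomes the label if it closely matches
    the streamed title, otherwise the streamed title itself does.
    """
    if not keystroke_queries:
        return None
    initial, final = keystroke_queries[0], keystroke_queries[-1]

    if similarity(final, streamed_title) >= 0.8:
        label = final               # the user converged on a good spelling
    else:
        label = streamed_title      # fall back to the consumed title
    # Already-correct queries are also kept, so the model learns when NOT to correct.
    return (initial, label)

# Example session: user types "bir jara", reformulates, then streams "Veer Zaara".
print(make_training_pair(["bir jara", "veer zara"], "Veer Zaara"))
```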

Augmentation using Catalog

The majority of consumption revolves around the most popular items, leaving a considerable portion of the catalog unrepresented in the training data. To ensure our model encounters a more diverse range of queries, we’ve enriched the training set by incorporating synthetically generated data. Our approach involves extracting metadata from the catalog and blending diverse entities like content titles, albums, artists, etc., to form synthetic queries resembling those a user might type. We introduce errors — additions, deletions, and substitutions — considering the QWERTY layout. For instance, a synthetically created error could be:

Table illustrating synthetic data that was created for training the model
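The sketch below illustrates this kind of noise injection: random deletions, QWERTY-neighbour substitutions, and character additions applied to a catalog-derived string. The adjacency map is deliberately partial and the error rates are arbitrary; they are not the values used in production.

```python
import random

# Partial QWERTY adjacency map (illustrative; extend for the full keyboard).
QWERTY_NEIGHBOURS = {
    "a": "qwsz", "s": "awedxz", "d": "serfcx", "e": "wsdr",
    "i": "ujko", "o": "iklp", "n": "bhjm", "m": "njk",
}

def corrupt(text: str, p: float = 0.15, seed=None) -> str:
    """Introduce random deletions, QWERTY-based substitutions, and additions."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        r = rng.random()
        if r < p / 3:
            continue                                            # deletion
        elif r < 2 * p / 3 and ch in QWERTY_NEIGHBOURS:
            out.append(rng.choice(QWERTY_NEIGHBOURS[ch]))        # substitution
        elif r < p:
            out.append(ch)
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))  # addition
        else:
            out.append(ch)                                       # unchanged
    return "".join(out)

# Blend catalog metadata (title + artist) into a synthetic query, then corrupt it.
clean = "tum hi ho arijit singh"
print(clean, "->", corrupt(clean, seed=7))
```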

Privacy and Ethical concerns

We adhere to stringent privacy standards by refraining from utilising personally identifying information, including demographic details and user IDs.

Training tech stack

In our training tech stack, we leverage EMR for the initial ETL process, which transforms user logs into structured training data. The data is then transferred to a cloud multi-GPU instance for model training. The trained model is subsequently optimised to ensure efficient real-time processing. Once the model is fine-tuned, it is deployed to production, ensuring a streamlined and effective transition from data preparation to the live environment.
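Purely as an illustrative sketch of the ETL step, the snippet below groups keystroke events into search sessions with PySpark before the labeling logic is applied. The log schema, column names, and S3 paths are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

# Assumed log fields: session_id, ts, query, streamed_title (illustrative schema).
spark = SparkSession.builder.appName("spell-train-etl").getOrCreate()

logs = spark.read.json("s3://example-bucket/search-logs/dt=2024-03-01/")

sessions = (
    logs.groupBy("session_id")
        .agg(
            # Keystroke queries ordered by timestamp within each search session.
            F.sort_array(F.collect_list(F.struct("ts", "query"))).alias("keystrokes"),
            F.first("streamed_title", ignorenulls=True).alias("streamed_title"),
        )
        .where(F.col("streamed_title").isNotNull())
)

# Downstream, a labeling step (like make_training_pair above) turns each session
# into an (incorrect_query, corrected_label) row before model training.
sessions.write.mode("overwrite").parquet("s3://example-bucket/spell-training-data/")
```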

Qualitative Results

Below are illustrative examples highlighting the effectiveness of a contextually trained spell correction model tailored for music data, juxtaposed with the conventional non-contextual spell correction, specifically within the realm of English songs.

Differences between traditional spell correction and contextual spell correction with knowledge of music

Now, let’s examine several instances that demonstrate how this approach effectively addresses challenges associated with multilingual transliterated queries containing Hindi language words written in English.

Some examples of common phonetic errors made by users and their corrections, for reference

Let’s explore several examples illustrating how spell correction significantly enhances the overall ranking of accurate content, even when users input queries with errors. These instances are visually depicted through screenshots of the app.

The results on the right showcase the spell corrector’s performance on a query written as “badshah,” where no correction is made.

On the left, the user searched for “badshah movie” but intended to access content related to the movie “Baadshah.” There is only a subtle phonetic distinction between the two words, and the query is corrected to “baadshah movie.” This correction elevates the song from the movie “Baadshah” to the top spot, aligning with the user’s true intent and demonstrating the spell corrector’s effectiveness in understanding and rectifying phonetic variations.

Additional instances of spell correction showcasing scenarios where users made typos but still achieved the correct results.

Now, let’s delve into specific examples that demonstrate the model’s proficiency in handling phonetic errors, thereby enhancing the quality of the search results.

Examples demonstrating the prowess of the model to correct for phonetic errors

Evaluation

Offline Evaluation

For our offline assessment, we employed the following metrics to gauge the performance of our system:

Query Level Accuracy: This metric assesses whether the corrected query aligns with the provided label.

Word Error Rate: This metric quantifies the number of incorrect words in the generated output.
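For concreteness, here is a small sketch of how these two metrics could be computed over (prediction, label) pairs. The word error rate below uses a standard word-level edit distance normalised by the label length, which may differ slightly from the exact internal definition.

```python
def word_error_rate(prediction: str, label: str) -> float:
    """Word-level edit distance (substitutions + insertions + deletions) / label length."""
    hyp, ref = prediction.split(), label.split()
    # Standard dynamic-programming edit distance over words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[-1][-1] / max(len(ref), 1)

def query_level_accuracy(predictions, labels) -> float:
    """Fraction of queries whose correction exactly matches the label."""
    matches = sum(p.strip().lower() == l.strip().lower() for p, l in zip(predictions, labels))
    return matches / max(len(labels), 1)

preds = ["veer zaara", "tum hi ho"]
labels = ["veer zaara", "tum hi ho arijit"]
print(query_level_accuracy(preds, labels), word_error_rate(preds[1], labels[1]))
```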

Online Evaluation

In our online evaluation, we focused on the following key metric, which serves as our North Star for search quality:

Search CTR: This metric provides a crucial measure of user engagement and satisfaction, serving as a paramount indicator of the effectiveness of our search system.

Conclusion

The first part of our exploration into spell check for a streaming giant like Wynk delves into the intricate challenges posed by multilingual transliterated queries in the realm of music. We introduced a robust model designed to navigate these complexities, offering insights into its architecture and the meticulous crafting of a diverse training dataset. The overview showcased the model’s ability to address linguistic nuances and phonetic variations, ensuring accurate corrections for a seamless user experience.

As we progress to the next part of our blog series, the focus will shift to the inference side, where we unravel the fine-tuning process that enables the model to operate at scale in real-time. Discussions will unfold on optimisation strategies, shedding light on how we elevate performance metrics like P99s to enhance the overall efficiency of our spell-checking solution at Wynk. Stay tuned for an in-depth exploration of the inferential aspects and further optimizations that make our spell check system not only accurate but also swift and scalable.

Get in the groove with India’s top-rated music app, available for download on the App Store and Play Store.

Feel free to reach out to Vipul Gaba and Rohit for any feedback on Wynk’s Spell Check.
