Building a Search Engine using a Trie Data Structure

Maximiliano Kalaydjian
7 min readDec 25, 2023

The purpose of the article is to explore and connect certain concepts in software development. It aims to create a simple search engine using the Trie data structure, which is less widely recognized in larger companies than in university courses.

The article will cover the development of a user interface that allows users to search for terms in a dictionary or any abstraction. The backend will be created in Kotlin, and the frontend in React. It’s worth noting that while developing a search engine using the Trie data structure is a good starting point, it’s important to consider that libraries specifically designed for text searching, such as Apache Lucene, employ Inverted Indexing.

The index stores statistics about terms in order to make term-based search more efficient. Lucene’s index falls into the family of indexes known as an inverted index. This is because it can list, for a term, the documents that contain it. This is the inverse of the natural relationship, in which documents list terms.

Trie Data Structure

A Trie, short for retrieval tree, is a tree-like data structure used for storing a dynamic set of strings. Each node in the Trie represents a character, and the path from the root to a particular node spells out a word. Tries are particularly efficient for text-based search operations due to their hierarchical nature.

https://en.wikipedia.org/wiki/Trie

Trie Structure

Root Node:

  • The topmost node in a Trie.
  • Represents an empty string or the beginning of all stored words.

Nodes:

  • Each node represents a character or a portion of a key.
  • Nodes contain references to their child nodes and may store additional information.

Edges:

  • Edges define the relationships and transitions between characters along the path from one node to another.
  • Each edge corresponds to a character or a part of a key.

Leaf Nodes:

  • At the end of a path in the trie, nodes known as leaves represent the completion of a key or word, and they typically do not have any children.
  • Each path from the root to a leaf corresponds to a unique key or word in the trie.

Big-O Analysis

Time Complexity

Insertion:

  • To insert a word into a Trie, traverse the Trie from the root, creating nodes as needed for each character in the word.
  • Mark the last node as a leaf node to indicate the end of the word.
  • The time complexity for inserting a word into a Trie is O(m), where m is the length of the word.
  • Each character in the word requires traversing a level in the Trie.

Search:

  • To search for a word or prefix, traverse the Trie following the links corresponding to the characters in the word or prefix.
  • If the traversal reaches a leaf node, the word is present.
  • The time complexity for searching for a word or prefix in a Trie is O(m), where m is the length of the word or prefix.

Deletion:

  • Deleting a word involves removing the nodes corresponding to each character in the word.
  • If the deletion process removes a node with other child nodes, the word represented by that branch remains in the Trie.
  • The time complexity for deleting a word from a Trie is also O(m), where m is the length of the word.

Space Complexity

  • The space complexity of a Trie is influenced by the number of nodes and the branching factor.
  • It can be expressed as O(N * L), where N is the total number of keys, and L is the average length of the keys. In the worst case, where there are no common prefixes among the keys, the space complexity might be O(N * M), where M is the length of the longest key.
  • In practice, Tries can be more space-efficient than this worst-case estimate due to the sharing of common prefixes.

Trie Advantages

Prefix Searches:

  • Trie structures excel at prefix-based searches. Finding all words with a given prefix involves traversing the Trie from the root to the node representing the prefix.

Space Efficiency:

  • Tries can be more space-efficient compared to other data structures for certain use cases, especially when many words share common prefixes.

Ordered Iteration:

  • The structure of a Trie allows for efficient iteration over all words in lexicographical order.

Auto-Completion:

  • Tries are commonly used in auto-completion systems. As users type, the Trie can quickly suggest possible completions based on the entered prefix.

Backend development using Kotlin

Let’s set up a basic Trie structure for indexing, and some endpoints for inserting and searching words.

package com.example.engine

class TrieNode {
val children = mutableMapOf<Char, TrieNode>()
var isEndOfWord = false
}

class Trie {
private val root = TrieNode()

fun insert(word: String) {
var current = root
for (char in word) {
current = current.children.computeIfAbsent(char) { TrieNode() }
}
current.isEndOfWord = true
}

fun search(word: String): Boolean {
var current = root
for (char in word) {
current = current.children[char] ?: return false
}
return current.isEndOfWord
}
}
package com.example

import com.example.engine.Trie
import io.ktor.http.*
import io.ktor.serialization.jackson.*
import io.ktor.server.application.*
import io.ktor.server.engine.*
import io.ktor.server.netty.*
import io.ktor.server.plugins.contentnegotiation.*
import io.ktor.server.plugins.statuspages.*
import io.ktor.server.request.*
import io.ktor.server.response.*
import io.ktor.server.routing.*
import io.ktor.server.plugins.cors.routing.CORS
import org.slf4j.Logger
import org.slf4j.LoggerFactory

fun Application.module() {
val log: Logger = LoggerFactory.getLogger(Application::class.java)

install(ContentNegotiation) {
jackson { }
}

install(StatusPages) {
exception<NotFoundException> { call, _ ->
call.respond(HttpStatusCode.NotFound)
log.error("NotFoundException: ${call.request.path()}" + " " + call.request.queryParameters["query"])
}
}

install(CORS) {
allowMethod(HttpMethod.Options)
allowMethod(HttpMethod.Post)
allowMethod(HttpMethod.Get)
allowHeader(HttpHeaders.AccessControlAllowOrigin)
allowHeader(HttpHeaders.ContentType)
anyHost()
}

routing {
val trie = Trie()

route("/add-word") {
post {
val wordRequest = call.receive<WordRequest>()
val word = wordRequest.word

trie.insert(word)
call.respond(HttpStatusCode.Created)
log.info("Word '$word' added successfully.")
}
}

route("/search") {
get {
val query = call.request.queryParameters["query"]
if (query != null && trie.search(query)) {
log.info("Word '$query' found.")
call.respond(mapOf("result" to "Found"))
} else {
throw NotFoundException()
}
}
}
}
}

class NotFoundException : RuntimeException()
data class WordRequest(val word: String)

fun main() {
embeddedServer(Netty, port = 8080, module = Application::module).start(wait = true)
}

Simple Unit Testing

Testing endpoints

curl -X POST -H "Content-Type: application/json" -d '{"word": “test”}’ http://localhost:8080/add-word

curl -X GET "http://localhost:8080/search?query=test”

Frontend Development with React

React’s component-based architecture allows for the creation of dynamic and responsive user interfaces. Here’s a simplified example of a React component for the search bar:

import React, { useState } from 'react';
import ApiService from './ApiService';

const SearchBar = () => {
const [query, setQuery] = useState('');
const [successMessage, setSuccessMessage] = useState('');
const [searchResult, setSearchResult] = useState('');

const handleSearch = async () => {
try {
const result = await ApiService.get(`search?query=${query}`);
setSearchResult(result.word);
setSuccessMessage('Word found!');
} catch (error) {
console.error('Search failed:', error.message);
setSearchResult('');
setSuccessMessage(`Word "${query}" not found.`);
}
};

const handleInsert = async () => {
try {
await ApiService.post('add-word', { word: query });
setSuccessMessage('Word inserted successfully');
setSearchResult('');
} catch (error) {
console.error('Insert failed:', error.message);
setSuccessMessage('');
setSearchResult('');
}
};

return (
<div>
<input
type="text"
placeholder="Type here..."
value={query}
onChange={(e) => setQuery(e.target.value)}
/>
<button onClick={handleSearch}>Search</button>
<button onClick={handleInsert}>Insert</button>
{successMessage && <p>{successMessage}</p>}
{searchResult && <p>Search Result: {searchResult}</p>}
</div>
);
};

export default SearchBar;

Packing and Running the Trie Search Engine

Build and Run the Kotlin Backend:

  • Use Gradle or another build tool to compile your Kotlin backend.
  • Run the backend server using a command similar to ./gradlew run.

Install Dependencies for React Frontend:

  • Navigate to your React frontend directory and run npm install to install dependencies.

Run the React Development Server:

  • Start the React development server with npm start.

Running locally

Visit http://localhost:3000 in your web browser to access the app.

To test Trie's efficient search capabilities, type in your search queries into the search bar and observe the results.

You can access the repository containing all the code by clicking on the following link: https://github.com/maxi-gkd/trie-search-engine/

--

--