A Primer on Elasticsearch Query String Search

Published in Driven by Code
6 min read · May 23, 2023

By: Grace Tsai

TrueCar’s vehicle inventory data flow is powered by an extensive technology stack. The Apache Hadoop ecosystem, along with Spark and HBase, is used for processing vehicles from the feeds of more than ten thousand dealer partners on the TrueCar network. Output data then moves through an Amazon Kinesis stream and finally gets stored in Elasticsearch and Redshift.

Our digital marketplace utilizes Elasticsearch as it provides a real-time distributed search and analytics engine for storing and searching large volumes of the dealership inventory data. Put simply, search parameters and filters from TrueCar’s new or used vehicle search engine translate to an Elasticsearch query on TrueCar’s index. Millions of queries are made each month for new and used cars on our search results and vehicle details pages.

Elasticsearch provides extensive search functionality with the Query DSL (Domain Specific Language), which is expressed in JSON. However, this blog post will instead highlight Elasticsearch’s lightweight version of search, the query string search, which is utilized daily for development and analysis purposes here at TrueCar. This primer on Elasticsearch will allow you to perform common analytical queries on a sample vehicle inventory dataset on your computer.

We have provided sample queries that you can run yourself if you would like to learn more about Elasticsearch. If you wish to run the sample queries while going through this guide, please skip ahead to the Working with Elasticsearch Locally section for instructions on local setup.

Query string search uses Elasticsearch’s Search REST API, which is exposed over HTTP and accepts GET requests. The request URL includes the host name and the index name. With no query, the /_search endpoint matches every document in the index, although by default the response includes only the first 10 results.

http://localhost:9200/sample_listings_index/_search

To use query string search, we call the /_search endpoint and append a q parameter to the URL. The query below returns all Toyota vehicles.

/_search?q=vehicle.make:Toyota
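When these URLs are built in a script rather than typed into a browser, the query needs to be URL-encoded. The Python sketch below is one way to do that with the standard library. The helper name search_url is an illustrative assumption, and the host and index are the ones from the local setup described later in this post.

```python
from urllib.parse import urlencode

# Assumed values: the local instance and sample index used in this post.
BASE_URL = "http://localhost:9200"
INDEX = "sample_listings_index"


def search_url(query: str) -> str:
    """Build a /_search URL that carries `query` in the q parameter."""
    # urlencode turns spaces into "+"; ":" is left readable via `safe`.
    encoded = urlencode({"q": query}, safe=":")
    return f"{BASE_URL}/{INDEX}/_search?{encoded}"


print(search_url("vehicle.make:Toyota"))
# http://localhost:9200/sample_listings_index/_search?q=vehicle.make:Toyota
```

Because spaces are encoded as +, a query written with spaces in code ends up in the same +AND+ and +OR+ form used in the URLs throughout this post.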

Tracking total hits

When more than 10,000 documents match, as with the query above on our index, the response reports a total hits value of 10,000 with a “greater than or equal to” relation. As a performance optimization, Elasticsearch counts total hits accurately only up to 10,000 by default. For analysis purposes, we often need exact counts for our queries. In these cases, the track_total_hits parameter can be set to true to return a search response with a precise count of the total number of hits.

Separate multiple URL parameters with an ampersand (&), as in the query below.

/_search?track_total_hits=true&q=vehicle.make:Toyota
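To make the difference between the two responses concrete, here is a small Python sketch that reads the total from a parsed search response. It assumes the Elasticsearch 7.x response shape, where hits.total is an object with value and relation fields. The function name and the sample fragments are hypothetical, and the 48213 count is made up for illustration.

```python
def describe_total(response: dict) -> str:
    """Summarize hits.total from a parsed search response (7.x shape)."""
    total = response["hits"]["total"]
    if total["relation"] == "eq":
        return f"exactly {total['value']} hits"
    # "gte" means the real count is at least `value`.
    return f"at least {total['value']} hits"


# Hypothetical response fragments, not real output from our index.
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}}}
precise = {"hits": {"total": {"value": 48213, "relation": "eq"}}}

print(describe_total(capped))   # at least 10000 hits
print(describe_total(precise))  # exactly 48213 hits
```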

Another way to get the total count for a query is the Count API. This API uses the /_count endpoint and returns only the number of matches for the query, without the documents themselves.

/_count?q=vehicle.newOrUsed:u

Modify default size

As mentioned earlier, Elasticsearch returns 10 documents by default. To return more inventory records, we can specify the desired size as a parameter. The query below returns up to 50 Toyota documents instead of 10.

/_search?size=50&q=vehicle.make:Toyota

Exists search

We can do an exists search to show all vehicles that have non-null values in the imageUrls field. This query returns all listings that contain dealership-provided images.

/_search?q=_exists_:vehicle.imageUrls

We can also negate the exists search with ! (shorthand for NOT) to find all vehicles that have a null value in the imageUrls field, meaning these listings do not have dealership-provided images.

/_search?q=!_exists_:vehicle.imageUrls
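The sketch below mimics in plain Python which documents the two exists queries above would match. It is an approximation under the assumption that a field counts as missing when it is absent, null, or an empty array. The documents are hypothetical and not taken from the sample dataset.

```python
def field_exists(doc: dict, path: str) -> bool:
    """Approximate _exists_: True when the dotted path holds a non-null,
    non-empty-list value."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return False
        value = value[key]
    return value is not None and value != []


# Hypothetical documents for illustration only.
docs = [
    {"id": 1, "vehicle": {"imageUrls": ["https://example.com/a.jpg"]}},
    {"id": 2, "vehicle": {"imageUrls": None}},
    {"id": 3, "vehicle": {"imageUrls": []}},
    {"id": 4, "vehicle": {}},
]

with_images = [d["id"] for d in docs if field_exists(d, "vehicle.imageUrls")]
without_images = [d["id"] for d in docs if not field_exists(d, "vehicle.imageUrls")]
print(with_images)     # [1]
print(without_images)  # [2, 3, 4]
```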

Term search

Term search can be used when the values of a field are known and we want to match an exact value. The first query below finds all vehicles that are eligible to be sold in our TrueCar+ experience. The two queries after it refine the search to tell us how many TrueCar+ eligible vehicles are new and how many are used. Queries that filter on more than one field like this are known as multi-field searches, which will be covered in the next section.

/_search?q=vehicle.truecarPlusEligible:true

/_search?q=vehicle.truecarPlusEligible:true+AND+vehicle.newOrUsed:n

/_search?q=vehicle.truecarPlusEligible:true+AND+vehicle.newOrUsed:u

Additionally, a term search can match any of several values by wrapping them in parentheses and inserting +OR+ between each value. The query below returns all listings with a makeId of 35 or 42, which are the IDs of the Kia and Dodge makes.

/_search?q=vehicle.makeId:(35+OR+42)
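When the list of values comes from code, the grouped OR expression can be generated rather than typed by hand. The following Python sketch shows one way to do it. The helper name any_of is an assumption for illustration, not part of any Elasticsearch client.

```python
def any_of(field: str, values) -> str:
    """Return field:(v1 OR v2 ...) to match any of the given values."""
    joined = " OR ".join(str(v) for v in values)
    return f"{field}:({joined})"


query = any_of("vehicle.makeId", [35, 42])
print(query)                    # vehicle.makeId:(35 OR 42)
print(query.replace(" ", "+"))  # vehicle.makeId:(35+OR+42)
```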

Multi-field search

We can write a multi-field search to filter on multiple fields. Append each additional condition with +AND+. The query below returns all 2023 Ford F-150 trucks for sale on TrueCar.

/_search?q=vehicle.modelYear:2023+AND+vehicle.make:Ford+AND+vehicle.model:F-150

The query below returns all used BMWs where totalMsrp is less than 50K (totalMsrp stores the amount in cents).

/_search?q=vehicle.newOrUsed:u+AND+vehicle.make:BMW+AND+pricing.totalMsrp:<5000000
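Since totalMsrp is stored in cents, it is easy to be off by a factor of 100 when writing a price filter by hand. The Python sketch below rebuilds the used BMW query from a dollar amount. The helper names dollars_to_cents and all_of are hypothetical and exist only for this example.

```python
def dollars_to_cents(dollars: int) -> int:
    """Convert whole dollars to the cent amounts stored in totalMsrp."""
    return dollars * 100


def all_of(*conditions: str) -> str:
    """Join conditions so that every one of them must match."""
    return " AND ".join(conditions)


max_msrp = dollars_to_cents(50_000)
query = all_of(
    "vehicle.newOrUsed:u",
    "vehicle.make:BMW",
    f"pricing.totalMsrp:<{max_msrp}",
)
print(query.replace(" ", "+"))
# vehicle.newOrUsed:u+AND+vehicle.make:BMW+AND+pricing.totalMsrp:<5000000
```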

Range search

Query string search allows us to perform range searches in a number of ways.

First, we can use square brackets for inclusive ranges. The query below will display vehicles from model year 2017 to 2019.

/_search?q=vehicle.modelYear:[2017+TO+2019]

Curly brackets are used for an exclusive range search. The query below will display vehicles from model year 2018 only.

/_search?q=vehicle.modelYear:{2017+TO+2019}

Square brackets combined with the * wildcard express open-ended ranges: greater than or equal to (>=) and less than or equal to (<=). The two queries below return vehicles with model year less than or equal to 2015 and vehicles with model year greater than or equal to 2020, respectively.

/_search?q=vehicle.modelYear:[*+TO+2015]

/_search?q=vehicle.modelYear:[2020+TO+*]

If we want a one-sided range that excludes the boundary value, we can simply use the greater than (>) or less than (<) operators. The two queries below return vehicles with totalMsrp greater than 80K and vehicles with totalMsrp less than 40K, respectively.

/_search?q=pricing.totalMsrp:>8000000

/_search?q=pricing.totalMsrp:<4000000
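The difference between square and curly brackets is easiest to see on a handful of values. This Python sketch applies the same inclusive and exclusive rules to a hypothetical list of model years. It only mirrors the range semantics in memory and does not call Elasticsearch.

```python
def matches_range(value: int, low: int, high: int, inclusive: bool) -> bool:
    """Mirror [low TO high] (inclusive) or {low TO high} (exclusive)."""
    if inclusive:
        return low <= value <= high
    return low < value < high


# Hypothetical model years for illustration.
years = [2016, 2017, 2018, 2019, 2020]

square = [y for y in years if matches_range(y, 2017, 2019, inclusive=True)]
curly = [y for y in years if matches_range(y, 2017, 2019, inclusive=False)]
print(square)  # [2017, 2018, 2019]
print(curly)   # [2018]
```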

Working with Elasticsearch locally

To get hands-on experience with Elasticsearch, follow the instructions below to set up your own test environment. (Adapted from the Elasticsearch documentation.)

First, install Docker Desktop and make sure Docker is running on your machine. After installation, run the following commands in your terminal.

# 1. Create a new network
docker network create elastic

# 2. Download the Elasticsearch image from the registry
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.17.9

# 3. Create and run a new container from the downloaded image
docker run --name es-test --net elastic -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.17.9

After running these docker commands, you should be able to see your local instance of ES7 running on http://localhost:9200/

Next, use the Elasticsearch REST APIs to set up a test index. Download the sample index mapping and sample data files to use for testing.

# 1. Create a test index, which I have named sample_listings_index
curl -XPUT 'http://localhost:9200/sample_listings_index/' -H 'Content-Type: application/json'

# 2. Navigate to the folder path where the downloaded sample index mapping json is saved and update the index’s mapping
curl -XPUT 'http://localhost:9200/sample_listings_index/_mapping' -H 'Content-Type: application/json' -d '@./sample_index_mapping.json'

# 3. Navigate to the folder path where the downloaded sample data is saved and bulk load the data into the index
curl -H 'Content-Type: application/x-ndjson' -XPOST 'http://localhost:9200/_bulk' --data-binary '@./sample_listings_data.json'
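The bulk endpoint expects newline-delimited JSON: an action line followed by the document itself, with a trailing newline at the end of the payload. The Python sketch below generates a payload of that shape for two hypothetical documents. The field values and the sequential IDs are assumptions for illustration, and the downloadable sample data file may be laid out differently.

```python
import json


def to_bulk_ndjson(index: str, docs: list) -> str:
    """Render documents as an NDJSON payload of index actions for _bulk."""
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        # Action line: tells Elasticsearch where to index the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk payload must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents, not taken from the sample data file.
sample_docs = [
    {"vehicle": {"make": "Toyota", "modelYear": 2020, "newOrUsed": "u"}},
    {"vehicle": {"make": "Ford", "modelYear": 2023, "newOrUsed": "n"}},
]

payload = to_bulk_ndjson("sample_listings_index", sample_docs)
print(payload, end="")
```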

Now you should be able to open http://localhost:9200/sample_listings_index in your browser and run the queries described in this blog post. For example, try:
http://localhost:9200/sample_listings_index/_search?q=vehicle.make:Toyota

Once the sample data is loaded, try the queries detailed above to master query string searches.

To properly clean up the test environment:

# 1. Delete the index
curl -XDELETE 'http://localhost:9200/sample_listings_index'

# 2. Stop the Elasticsearch container
docker stop es-test

# 3. Remove the container, then the network it was attached to
docker rm es-test
docker network rm elastic

In conclusion, we have detailed Elasticsearch’s lightweight query string search, which is used daily by engineers at TrueCar. We have outlined how to adjust request parameters, such as tracking total hits and changing the number of documents returned. For query syntax, we covered the exists, term, multi-field, and range searches. Hopefully this blog post helps you get to the data points you need faster.

If you love solving problems, please check out our Careers page. We would love to have you join us!
