Customizing scores in Elasticsearch for product recommendations

By luqmaan
Published in Horrible Hacks, Jan 9, 2018

Elasticsearch has a really nifty feature called function_score that allows you to modify the scores of documents. It took me a while to figure out the exact syntax of function_score, so I’m sharing it here.

I’m building a WordPress product recommendation plugin that finds related products. It basically does something like this:

size: 4,
query: {
  match: { title: article.title },
}
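For context, here is a minimal sketch of how a fragment like this might be wrapped into a request body and sent to Elasticsearch. The host, the products index name, and both function names are assumptions for illustration, not taken from the plugin.

```javascript
// Build the search request body for related products.
// `article` is assumed to be an object with a `title` string.
function buildRelatedProductsBody(article) {
  return {
    size: 4, // return the four best matches
    query: {
      match: { title: article.title },
    },
  };
}

// Hypothetical host and index name; adjust both for a real cluster.
// Relies on the global fetch available in browsers and Node 18+.
async function searchRelatedProducts(article, baseUrl = 'http://localhost:9200') {
  const response = await fetch(`${baseUrl}/products/_search`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildRelatedProductsBody(article)),
  });
  const result = await response.json();
  return result.hits.hits; // array of matching documents
}
```

Keeping the body builder separate from the network call makes it easy to swap in the function_score variants shown in the rest of this post.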

random_score

This basic query got the job done. But I wanted slightly different results every time the user refreshed the page. To do this, I added a function_score query with a random_score function:

size: 4,
query: {
  function_score: {
    functions: [
      {
        random_score: {},
      },
    ],
    score_mode: 'avg',
    query: {
      match: { title: article.title },
    }
  }
}

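The random_score function generates a random value between 0 and 1. With score_mode set to avg, the values returned by all the functions are averaged into a single function score. There are other score_modes like multiply (the default), sum, first, max, and min. A separate option, boost_mode, controls how that function score is combined with the score of the match query. It defaults to multiply, so here the match score is multiplied by the function score.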
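The two-step combination can be sketched in plain JavaScript. This is a simplified model, not Elasticsearch's implementation: it ignores weights, filters, max_boost and min_score, and the name combineScores is made up for this example.

```javascript
// Simplified model of how function_score combines scores.
const combiners = {
  multiply: (values) => values.reduce((a, b) => a * b, 1),
  sum: (values) => values.reduce((a, b) => a + b, 0),
  avg: (values) => values.reduce((a, b) => a + b, 0) / values.length,
  first: (values) => values[0],
  max: (values) => Math.max(...values),
  min: (values) => Math.min(...values),
};

function combineScores(queryScore, functionScores, scoreMode = 'multiply', boostMode = 'multiply') {
  // Step 1: score_mode merges the function results into one value.
  const functionScore = combiners[scoreMode](functionScores);
  // Step 2: boost_mode merges that value with the query score.
  if (boostMode === 'replace') return functionScore;
  return combiners[boostMode]([queryScore, functionScore]);
}

// A match score of 2.0 and two function results, 0.5 and 1.0:
console.log(combineScores(2.0, [0.5, 1.0], 'avg')); // 2.0 * avg(0.5, 1.0) = 1.5
```

With a single random_score function, avg just passes the random value through, so the final score is the match score multiplied by a random number.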

field_value_factor

Later, I decided that users would be more likely to click on products that are on sale, so I used a field_value_factor function to boost products by their discount.

size: 4,
query: {
  function_score: {
    functions: [
      {
        field_value_factor: {
          field: 'price_difference_percent',
          missing: 0
        },
      },
      {
        random_score: {},
      },
    ],
    score_mode: 'avg',
    query: {
      match: { title: article.title },
    }
  }
}

The field_value_factor function takes the price_difference_percent value of a document and uses it as a score. Documents without the field fall back to the missing value of 0. A price_difference_percent of 0 means 0% off the original price, whereas 0.9 means 90% off the original price. This worked out pretty nicely, since random_score also generates values between 0 and 1, so the two functions are on the same scale when averaged.

I tried using likes_count to boost more popular documents:

size: 4,
query: {
  function_score: {
    functions: [
      {
        field_value_factor: {
          field: 'likes_count',
          factor: 0.01,
          modifier: 'log1p',
          missing: 0
        },
      },
      {
        field_value_factor: {
          field: 'price_difference_percent',
          missing: 0
        },
      },
      {
        random_score: {},
      },
    ],
    score_mode: 'avg',
    query: {
      match: { title: article.title },
    }
  }
}

As with price_difference_percent, I used a field_value_factor function. The values for likes_count vary between 0 and 1000. I tweaked the factor and modifier options to get likes_count to return a value roughly between 0 and 1: with a factor of 0.01 and the log1p modifier, the function computes log10(1 + 0.01 * likes_count), which is 0 for 0 likes and about 1.04 for 1000 likes. Unfortunately, I wasn’t happy with the influence that likes_count had on the product recommendations, so I removed it.
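To see what a given factor and modifier do before putting them in a query, the calculation can be mirrored locally. This is a rough sketch that covers only three of the available modifiers; the function name fieldValueFactor and the sample documents are made up for this example.

```javascript
// Local approximation of field_value_factor: modifier(factor * value).
const modifiers = {
  none: (x) => x,
  log1p: (x) => Math.log10(x + 1), // add 1, then take the common logarithm
  sqrt: (x) => Math.sqrt(x),
};

function fieldValueFactor(doc, { field, factor = 1, modifier = 'none', missing }) {
  // Fall back to `missing` when the document has no such field.
  const value = field in doc ? doc[field] : missing;
  return modifiers[modifier](factor * value);
}

const likesOptions = { field: 'likes_count', factor: 0.01, modifier: 'log1p', missing: 0 };

console.log(fieldValueFactor({ likes_count: 0 }, likesOptions));    // 0
console.log(fieldValueFactor({ likes_count: 1000 }, likesOptions)); // about 1.04
console.log(fieldValueFactor({}, likesOptions));                    // 0, from the missing value
```

The logarithm compresses the top of the range: going from 100 to 1000 likes moves the value from about 0.3 to about 1.04.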

gauss

I also wanted to boost products that were recently created. A product that hasn’t been sold in two years is less likely to be bought today.

size: 4,
query: {
  function_score: {
    functions: [
      {
        gauss: {
          created_at: {
            scale: '10d'
          }
        },
      },
      {
        field_value_factor: {
          field: 'price_difference_percent',
          missing: 0
        },
      },
      {
        random_score: {},
      },
    ],
    score_mode: 'avg',
    query: {
      match: { title: article.title },
    }
  }
}

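When using the gauss function with a date field, the origin defaults to now and the score decays as created_at gets further from it. The scale sets how fast the score decays: at a distance of one scale from the origin, the score drops to the decay value, which defaults to 0.5. A scale given as a bare number is read as milliseconds, but I don’t need that sort of precision, so I use a scale of 10 days.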
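The shape of the decay is easier to judge with a few numbers. The sketch below follows the Gaussian decay formula from the function_score documentation, with dates expressed as millisecond timestamps; the function name gaussDecay is made up for this example.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Gaussian decay: 1 at the origin, `decay` at a distance of one `scale`.
function gaussDecay(value, { origin, scale, offset = 0, decay = 0.5 }) {
  // Distances inside `offset` are treated as zero.
  const distance = Math.max(0, Math.abs(value - origin) - offset);
  // Variance chosen so the curve passes through `decay` at `scale`.
  const sigmaSquared = -(scale * scale) / (2 * Math.log(decay));
  return Math.exp(-(distance * distance) / (2 * sigmaSquared));
}

const now = Date.now();
const options = { origin: now, scale: 10 * DAY_MS };

console.log(gaussDecay(now, options));               // 1 for a product created right now
console.log(gaussDecay(now - 10 * DAY_MS, options)); // about 0.5 after 10 days
console.log(gaussDecay(now - 30 * DAY_MS, options)); // about 0.002 after 30 days
```

With a 10 day scale, a product that is a month old contributes almost nothing from this function, so a larger scale may suit catalogs that change slowly.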

Before I discovered the gauss function, I used a horrible hack to boost recent products:

size: 4,
query: {
  bool: {
    must: [
      { match: { title: article.title } },
    ],
    should: [
      { range: { created_at: { boost: 10, gte: 'now-1d/d' } } },
      { range: { created_at: { boost: 8, gte: 'now-7d/d' } } },
      { range: { created_at: { boost: 6, gte: 'now-14d/d' } } },
      { range: { created_at: { boost: 4, gte: 'now-1M/M' } } },
      { range: { created_at: { boost: 2, gte: 'now-3M/M' } } },
    ]
  }
}

Hopefully this helps you score Elasticsearch documents the way you want.

https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl-function-score-query.html
