Customizing scores in Elasticsearch for product recommendations
Elasticsearch has a really nifty feature called function_score that lets you modify the scores of the documents a query returns. It took me a while to figure out the exact syntax of function_score, so I'm sharing it here.
I’m building a WordPress product recommendation plugin that finds related products. It basically does something like this:
size: 4,
query: {
  match: { title: article.title },
}
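The fragment above is the body of a search request. Here is a minimal sketch of building that body and serializing it as the JSON you would send to an index's `_search` endpoint. The helper name `relatedProductsBody` and the shape of the `article` object (just a `title` string) are hypothetical.

```javascript
// Hypothetical helper: builds the search body for related products.
// `article` is assumed to look like { title: "..." }.
function relatedProductsBody(article, size = 4) {
  return {
    size, // number of recommendations to return
    query: {
      match: { title: article.title }, // full-text match on the title field
    },
  };
}

// Usage with a made-up article.
const body = relatedProductsBody({ title: "Red leather wallet" });
console.log(JSON.stringify(body));
```

Keeping the body in a function makes it easy to swap the plain `match` query for the `function_score` variants that follow.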
random_score
This basic query got the job done, but I wanted slightly different results every time the user refreshed the page. To do this, I added a function_score query with a random_score function:
size: 4,
query: {
  function_score: {
    functions: [
      {
        random_score: {},
      },
    ],
    score_mode: 'avg',
    query: {
      match: { title: article.title },
    }
  }
}
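The random_score function generates a random value between 0 and 1. Note that score_mode only controls how the values produced by the functions are combined with each other; other score_modes include multiply, sum, min, max, and first. How that combined value is then merged with the document's query score is controlled by a separate option, boost_mode, which defaults to multiply. So as written, each document's score is multiplied by a random value; to actually average the two, add boost_mode: 'avg'.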
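To make the two steps concrete, here is a rough JavaScript model of how function_score combines scores. It is a simplified sketch written for illustration, not Elasticsearch code: it ignores per-function filters, `weight`, `max_boost`, and `min_score`, and the function names are hypothetical.

```javascript
// Step 1: score_mode merges the values produced by each function.
// (Unweighted; with `weight` set, Elasticsearch's avg becomes a weighted average.)
function combineFunctions(values, scoreMode = "multiply") {
  switch (scoreMode) {
    case "multiply": return values.reduce((a, b) => a * b, 1);
    case "sum": return values.reduce((a, b) => a + b, 0);
    case "avg": return values.reduce((a, b) => a + b, 0) / values.length;
    case "first": return values[0]; // first function (no filters modeled here)
    case "max": return Math.max(...values);
    case "min": return Math.min(...values);
    default: throw new Error(`unknown score_mode: ${scoreMode}`);
  }
}

// Step 2: boost_mode merges the query score with the combined function score.
function applyBoostMode(queryScore, funcScore, boostMode = "multiply") {
  switch (boostMode) {
    case "multiply": return queryScore * funcScore;
    case "replace": return funcScore;
    case "sum": return queryScore + funcScore;
    case "avg": return (queryScore + funcScore) / 2;
    case "max": return Math.max(queryScore, funcScore);
    case "min": return Math.min(queryScore, funcScore);
    default: throw new Error(`unknown boost_mode: ${boostMode}`);
  }
}

// One random_score value of 0.4 on a document whose match score is 2.
const funcScore = combineFunctions([0.4], "avg");
console.log(applyBoostMode(2, funcScore));        // default boost_mode (multiply)
console.log(applyBoostMode(2, funcScore, "avg")); // boost_mode: 'avg'
```

With a single function, score_mode 'avg' leaves the random value unchanged, so the document ends up with 2 × 0.4 = 0.8 under the default boost_mode, or (2 + 0.4) / 2 = 1.2 with boost_mode 'avg'.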
field_value_factor
Later, I decided that users would be more likely to click on products that are on sale, so I used a field_value_factor function to boost products by their discount.
size: 4,
query: {
  function_score: {
    functions: [
      {
        field_value_factor: {
          field: 'price_difference_percent',
          missing: 0
        },
      },
      {
        random_score: {},
      },
    ],
    score_mode: 'avg',
    query: {
      match: { title: article.title },
    }
  }
}
The field_value_factor function takes the price_difference_percent value of a document and uses it as part of the score; documents without that field fall back to missing: 0. A price_difference_percent of 0 means 0% off the original price, whereas 0.9 means 90% off. This worked out pretty nicely: random_score also generates values between 0 and 1, so averaging the two gives the discount and the randomness comparable weight.
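A small sketch of what score_mode 'avg' produces for these two functions. The function name is hypothetical, and `randomValue` stands in for whatever random_score returns for a document.

```javascript
// Combined function score for field_value_factor (discount) plus random_score
// under score_mode "avg".
// `discount` mirrors price_difference_percent (0 = full price, 0.9 = 90% off).
function avgFunctionScore(discount, randomValue) {
  const priceFactor = discount ?? 0; // mirrors `missing: 0` for absent fields
  return (priceFactor + randomValue) / 2;
}

// Lowest and highest possible function scores for a 90%-off product
// and for a full-price product.
console.log(avgFunctionScore(0.9, 0), avgFunctionScore(0.9, 1));
console.log(avgFunctionScore(0, 0), avgFunctionScore(0, 1));
```

A 90%-off product always lands between 0.45 and 0.95, and a full-price one between 0 and 0.5, so discounted products tend to rank higher while the random half still reshuffles results on each refresh.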
I tried using likes_count to boost more popular documents:
size: 4,
query: {
  function_score: {
    functions: [
      {
        field_value_factor: {
          field: 'likes_count',
          factor: 0.01,
          modifier: 'log1p', // modifier names are strings
          missing: 0
        },
      },
      {
        field_value_factor: {
          field: 'price_difference_percent',
          missing: 0
        },
      },
      {
        random_score: {},
      },
    ],
    score_mode: 'avg',
    query: {
      match: { title: article.title },
    }
  }
}
As with price_difference_percent, I used a field_value_factor function. The values of likes_count vary between 0 and 1000, so I tweaked the factor and modifier options to bring likes_count into roughly the same 0 to 1 range as the other functions. Unfortunately, I wasn't happy with the influence that likes_count had on the product recommendations, so I removed it.
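field_value_factor computes `modifier(factor * value)`, and Elasticsearch's `log1p` modifier is the common (base-10) logarithm of 1 + x. This sketch (the function name is hypothetical) computes the likes_count score that way, which shows the range is only approximately 0 to 1.

```javascript
// field_value_factor for likes_count with factor 0.01 and modifier "log1p":
//   score = log10(1 + factor * value)
function likesScore(likesCount, factor = 0.01, missing = 0) {
  const value = likesCount ?? missing; // documents without the field use `missing`
  return Math.log10(1 + factor * value);
}

for (const likes of [0, 10, 100, 1000]) {
  console.log(likes, likesScore(likes).toFixed(3));
}
```

A product needs 100 likes to reach log10(2) ≈ 0.30, and the maximum of 1000 likes gives log10(11) ≈ 1.04, slightly above 1.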
gauss
I also wanted to boost products that were recently created. A product that hasn’t been sold in two years is less likely to be bought today.
size: 4,
query: {
  function_score: {
    functions: [
      {
        gauss: {
          created_at: {
            scale: "10d"
          }
        },
      },
      {
        field_value_factor: {
          field: 'price_difference_percent',
          missing: 0
        },
      },
      {
        random_score: {},
      },
    ],
    score_mode: 'avg',
    query: {
      match: { title: article.title },
    }
  }
}
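When the gauss function is used on a date field, origin defaults to now, so the score decays with the product's age. The required scale option sets how quickly it decays: with scale: "10d" and the default decay of 0.5, a product created 10 days ago gets a gauss score of 0.5, and older products score progressively lower along a smooth bell curve. Note that scale is a distance, not a rounding interval, so the score still changes continuously between multiples of 10 days.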
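The gauss curve can be reproduced with the formula from the function_score documentation: σ² = −scale² / (2 · ln(decay)) and score = exp(−max(0, |value − origin| − offset)² / (2σ²)). This sketch measures distances in days instead of milliseconds for readability; the function name and parameter defaults are illustrative.

```javascript
// Gauss decay as described in the function_score docs, in days.
// ageDays = |created_at - origin|, where origin defaults to now for dates.
function gaussDecay(ageDays, scaleDays = 10, offsetDays = 0, decay = 0.5) {
  // Chosen so that the score equals `decay` exactly at distance `scale`.
  const sigmaSquared = -(scaleDays ** 2) / (2 * Math.log(decay));
  const distance = Math.max(0, Math.abs(ageDays) - offsetDays);
  return Math.exp(-(distance ** 2) / (2 * sigmaSquared));
}

for (const age of [0, 5, 10, 20, 30]) {
  console.log(`${age} days old -> ${gaussDecay(age).toFixed(4)}`);
}
```

At 10 days the score is exactly 0.5, at 20 days it is 0.5⁴ = 0.0625, and at 30 days it is 0.5⁹ ≈ 0.002, so with scale: "10d" anything older than about a month contributes almost nothing to the average.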
Before I discovered the gauss function, I used a horrible hack to boost recent products:
size: 4,
query: {
  bool: {
    must: [
      {match: { title: article.title }},
    ],
    should: [
      {range: {created_at: {boost: 10, gte: 'now-1d/d'}}},
      {range: {created_at: {boost: 8, gte: 'now-7d/d'}}},
      {range: {created_at: {boost: 6, gte: 'now-14d/d'}}},
      {range: {created_at: {boost: 4, gte: 'now-1M/M'}}},
      {range: {created_at: {boost: 2, gte: 'now-3M/M'}}},
    ]
  }
}
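To see why this is a hack, here is a rough sketch of the recency boost those overlapping range clauses add. It assumes each matching should clause contributes exactly its boost, and it ignores query normalization and coordination factors. It also approximates the date math: 1M and 3M are treated as 30 and 90 days, even though the /d and /M rounding actually widens each window to the start of the day or month. The names are hypothetical.

```javascript
// Approximate age cutoffs for each `should` range clause.
const rangeClauses = [
  { maxAgeDays: 1, boost: 10 }, // now-1d/d
  { maxAgeDays: 7, boost: 8 },  // now-7d/d
  { maxAgeDays: 14, boost: 6 }, // now-14d/d
  { maxAgeDays: 30, boost: 4 }, // now-1M/M (approximated as 30 days)
  { maxAgeDays: 90, boost: 2 }, // now-3M/M (approximated as 90 days)
];

function rangeRecencyBoost(ageDays) {
  // The ranges overlap, so a new product matches several clauses at once
  // and their boosts add up.
  return rangeClauses
    .filter((clause) => ageDays <= clause.maxAgeDays)
    .reduce((total, clause) => total + clause.boost, 0);
}

for (const age of [0, 3, 10, 20, 60, 120]) {
  console.log(`${age} days old -> ${rangeRecencyBoost(age)}`);
}
```

Under these assumptions the boost is a staircase (30, 20, 12, 6, 2, then 0) that jumps at each cutoff, whereas the gauss function produces a smooth curve from a single scale parameter.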
Hopefully this helps you score Elasticsearch documents the way you want.
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl-function-score-query.html