Pseudo Random Sorting with Azure Search
Some hacks were meant to be documented
Using a random sort order might not seem like a useful feature of a search service. Still, there are legitimate use cases that call for such a requirement.
Why Random Sorting?
One use case is making sure a list that changes relatively infrequently seems fresh every time a user sees it. Elasticsearch, for example, provides a random_score function that sorts randomly in a consistent way (more about that consistency later).
Another use case would be to return a random sample of documents. To do that, you could randomly sort, and limit the number of returned results. Or, use something like MongoDB’s $sample aggregation pipeline stage.
At Wescover, we reveal real-life creations in public places. There are just too many to capture in a static sort list and recency is not related to relevancy. Random sorting to the rescue.
We would like this random sort to be consistent. Basically, given the same seed, the produced sort order should be the same.
For paged list display, we want multiple requests to the server to operate on the same logical list. That is, getting page 2 of the list should not repeat entries from page 1, and getting all pages in sequence should show all documents.
For long user sessions, it makes sense to be able to show the same sorted list. That is, when a user navigates away from the list and back, they expect to see the same results (assuming not much time has passed).
The service’s responsibility is to maintain the sort order given a seed while the client’s responsibility is to hand over the same seed as long as the sort order is to be maintained.
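To make that contract concrete, here is a minimal in-memory sketch of what "consistent" means. It does not use Azure Search at all; `seeded_order` and `get_page` are hypothetical helpers, and the seed value is arbitrary.

```python
import random


def seeded_order(doc_ids, seed):
    """Return doc_ids in a pseudo-random order fully determined by seed."""
    ordered = sorted(doc_ids)  # normalise the input order first
    random.Random(seed).shuffle(ordered)
    return ordered


def get_page(doc_ids, seed, page, page_size):
    """Return one zero-based page of the seeded order."""
    ordered = seeded_order(doc_ids, seed)
    start = page * page_size
    return ordered[start:start + page_size]


docs = [f"doc-{i}" for i in range(10)]

# The client keeps sending the same seed, so every request sees the same logical list.
page_1 = get_page(docs, seed=1234, page=0, page_size=4)
page_2 = get_page(docs, seed=1234, page=1, page_size=4)
page_3 = get_page(docs, seed=1234, page=2, page_size=4)
```

Pages requested with the same seed never overlap and together cover every document; a different seed yields a different list.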
Azure Search — What’s Available?
With Azure Search, you have control over the sorting order using the orderby clause.
On top of that, each document gets a search score, which can be fine-tuned using Scoring Profiles. Depending on your needs, the final sort order can take this score into account or ignore it.
Since we know the sorting has to be random, we cannot simply use the orderby clause over any fixed value within the document.
Furthermore, we know we need to provide a seed, so the sorting order must be parameterized in some way.
Azure Search allows the use of some functions within the orderby clause, which makes them good candidates for accepting a seed parameter. For example:
$orderby=geo.distance(location, geography'POINT(-122.13 47.67)') asc
This clause accepts an origin point (longitude -122.13, latitude 47.67) and sorts the documents by the distance of their "location" field from that point.
Hence, The Hack
So, if we assigned each document a random point over some geographic area in advance, and provided another point as the seed for the sorting function, we would get a consistent random sort. Kinda.
This is not perfect randomness. For instance, seeds that are very close to each other will probably produce a very similar (if not identical) sort order. I can live with that given the dataset I'm working on, but take it into consideration.
Tag documents with random geo locations
To make the sorting more volatile relative to changes in the seed value, and to allow some other improvements I won't go into, I decided to arrange the documents on the perimeter of an imaginary circle.
Since the geo distance functions accept longitude/latitude points on a spherical surface (Earth), I wanted to use the correct math to calculate them.
This snippet shows how to generate the random point for a document: we randomize the angle, and the rest is constant.
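The original snippet is not reproduced here, so what follows is a minimal sketch of the idea rather than the exact code used. It applies the standard "destination point" formula on a sphere: start at the center, travel a fixed distance along a random bearing. The 100 km radius, the function names and the GeoJSON helper are assumptions for illustration; `randCoord` is the field name used in the query later in this post.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def destination_point(lat_deg, lon_deg, distance_km, bearing_deg):
    """Point reached from (lat, lon) after travelling distance_km along bearing_deg."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(bearing))
    lon2 = lon1 + math.atan2(math.sin(bearing) * math.sin(delta) * math.cos(lat1),
                             math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)


def random_point_on_circle(radius_km=100.0, rng=random):
    """Random (lat, lon) on a circle around (0, 0); only the angle is random."""
    bearing = rng.uniform(0.0, 360.0)
    return destination_point(0.0, 0.0, radius_km, bearing)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, handy for checking the generated points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def to_geojson_point(lat, lon):
    """GeoJSON point (longitude first), the shape a geography point field expects."""
    return {"type": "Point", "coordinates": [lon, lat]}


lat, lon = random_point_on_circle()
document_patch = {"randCoord": to_geojson_point(lat, lon)}
```

Every generated point should sit at the chosen radius from the center, which `haversine_km(0, 0, lat, lon)` lets you verify.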
The center I chose is (0,0) which is not a place you would confuse with real data coordinates.
I found this site useful for debugging my random point results and making sure I was indeed generating a circle.
If we wanted a group of documents to be chosen before the others, we could give them a shorter distance from the center. Another approach would be to use scoring profiles (see below).
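The prioritization idea can be checked locally with a small simulation. This is only a sketch under assumed numbers: a 90 km ring for the prioritized documents, a 100 km ring for the rest, and a seed within 50 m of the center. Sorting is done in Python with the haversine distance, standing in for what geo.distance would do on the service.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def point_from_origin(distance_km, bearing_deg):
    """(lat, lon) reached from (0, 0) after distance_km along bearing_deg."""
    delta = distance_km / EARTH_RADIUS_KM
    theta = math.radians(bearing_deg)
    lat = math.asin(math.sin(delta) * math.cos(theta))
    lon = math.atan2(math.sin(theta) * math.sin(delta), math.cos(delta))
    return math.degrees(lat), math.degrees(lon)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


rng = random.Random(7)

# doc-0..doc-4 are the prioritized group on the inner ring (assumed radii).
docs = {}
for i in range(20):
    radius_km = 90.0 if i < 5 else 100.0
    docs[f"doc-{i}"] = point_from_origin(radius_km, rng.uniform(0.0, 360.0))

# A seed somewhere within 50 m of the center.
seed_point = point_from_origin(0.05 * math.sqrt(rng.random()), rng.uniform(0.0, 360.0))

ranked = sorted(docs, key=lambda doc_id: haversine_km(docs[doc_id], seed_point))
```

Because the gap between the rings is far larger than the seed's offset from the center, the prioritized documents always come first, while the order within each group still depends on the seed.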
Generate The Seed
Depending on your needs, you might want to generate the seed for every query or once per user session. Either way, seed generation is very similar to generating a document's point, with the difference being that the seed can be any point within a small inner circle rather than on its perimeter.
The math here is nothing new.
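As a sketch of the seed step (again, not the original code): pick a random bearing and a random distance up to a small inner radius around (0, 0). The 50 m inner radius and the function name are assumptions; taking the square root of the uniform sample spreads seeds evenly over the disc instead of clustering them near the center.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def random_seed_point(inner_radius_km=0.05, rng=random):
    """Random (lat, lon) inside a small circle around (0, 0)."""
    bearing = math.radians(rng.uniform(0.0, 360.0))
    distance_km = inner_radius_km * math.sqrt(rng.random())
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    # Destination-point formula, simplified for a start point of (0, 0).
    lat = math.asin(math.sin(delta) * math.cos(bearing))
    lon = math.atan2(math.sin(bearing) * math.sin(delta), math.cos(delta))
    return math.degrees(lat), math.degrees(lon)


# One seed per query:
query_seed = random_seed_point()

# One seed per session: derive the generator from a (hypothetical) session id,
# so the same session always gets the same seed point.
session_seed = random_seed_point(rng=random.Random("session-abc"))
```

Whether you store the seed on the client or re-derive it from a session identifier is a design choice; the only requirement is that the same seed comes back for as long as the order should hold.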
Execute The Query
So, similar to what we have seen above, this is how we use the geo function to execute the random sorting:
$orderby=geo.distance(randCoord, geography'POINT(0.00021123 0.00034732)') asc
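Building that clause from a seed is mostly string formatting. The sketch below also assembles a paged request body using the `orderby`, `top` and `skip` search parameters; the helper names and page size are illustrative assumptions. Note that POINT takes longitude first, then latitude.

```python
def build_random_orderby(seed_lat, seed_lon, field="randCoord"):
    """OData orderby clause that sorts by distance from the seed point."""
    return f"geo.distance({field}, geography'POINT({seed_lon:.8f} {seed_lat:.8f})') asc"


def build_search_body(seed_lat, seed_lon, page, page_size=20):
    """Request body for one zero-based page of the randomly sorted list."""
    return {
        "search": "*",
        "orderby": build_random_orderby(seed_lat, seed_lon),
        "top": page_size,
        "skip": page * page_size,
    }


orderby = build_random_orderby(seed_lat=0.00034732, seed_lon=0.00021123)
first_page = build_search_body(0.00034732, 0.00021123, page=0)
second_page = build_search_body(0.00034732, 0.00021123, page=1)
```

Sending the same seed with a different `skip` value is what keeps pages 1 and 2 on the same logical list.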
Extra - Using Scoring Profiles
For more complex scenarios, where the randomness needs to be incorporated into some other sorting factors, using scoring profiles may help.
For example, assume you want a random list, but also want the search relevance of the keywords to affect a document's position within that list (that is, its probability of appearing higher).
You can create a scoring profile that would include a function to calculate the distance from the provided seed point to the random location we tagged the documents with. This would provide the same results as the orderby usage above.
Now you can combine the "distance" function with other functions and weights to create a search score that is both random and takes other document information into account. For example:
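The original profile definition is not shown here, so this is only a sketch of what such a profile might look like, written as a Python dictionary that serializes to the index definition JSON. The profile name, parameter name, boost and boosting distance are placeholder assumptions that would need tuning against your own circle geometry; `boostingDistance` is expressed in kilometers.

```python
import json


def build_random_scoring_profile(name="randomByDistance", field="randCoord",
                                 parameter="seed", boost=2.0,
                                 boosting_distance_km=150.0):
    """Scoring profile with a distance function driven by a query-time parameter."""
    return {
        "name": name,
        "functions": [
            {
                "type": "distance",
                "fieldName": field,
                "boost": boost,
                "interpolation": "linear",
                "distance": {
                    "referencePointParameter": parameter,
                    "boostingDistance": boosting_distance_km,
                },
            }
            # Other functions (freshness, magnitude, ...) and text weights
            # could be added here to mix randomness with other factors.
        ],
        "functionAggregation": "sum",
    }


def build_scoring_parameter(seed_lat, seed_lon, parameter="seed"):
    """scoringParameter value: name, a dash separator, then longitude,latitude."""
    return f"{parameter}-{seed_lon:.8f},{seed_lat:.8f}"


profile_json = json.dumps(build_random_scoring_profile(), indent=2)
scoring_parameter = build_scoring_parameter(0.00034732, 0.00021123)
```

At query time you would pass the profile name as `scoringProfile` and the seed as `scoringParameter`, and leave out the orderby clause so that results are ranked by score.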
Hopefully random sorting will make it onto the Azure Search feature list someday. Until then, with some index-time calculations, a reasonable hackish alternative is possible.
Let me know what you think!