WEEK -2 Updates [7th June — 14th June]
Two weeks into the coding phase, I had the task of improving the search algorithm being used for OpenMRS AddOns.
How did I go about it?
My first step was to create a draft of the required changes.
I then read up on Elasticsearch, because just a little knowledge of it wasn’t going to help me create an optimised solution. I took a few days to study how Elasticsearch works, focusing especially on the functions I was concerned with. This went a long way in helping me out. In the meantime, I had also asked the community for feedback on possible improvements to this feature.
The required changes were:
The following points describe each circumstance and the action taken in that scenario:
- If the query matches the UID exactly, that search result is given the highest weight.
- A query which matches the title of a module perfectly is given the highest weight.
- A query which matches a tag exactly is given an equally high weight.
  - Ex: if Query=“Form-Entry” and tag=“Form-Entry”, then that module gets the top rank.
- A query which is a substring of the title is given a medium weight.
  - Ex: “ref” is a substring of “Reference Application”. (The current algorithm is not implemented like this, and hence that module gets pushed down.)
- A query matching the title using fuzziness=1 (allows one spelling mistake) is given a low weight.
- A query which matches the description as a substring is given a low weight.
  - The reason is that many modules might contain the query as part of their description, but only one will have it in its name, and that module is given the highest weight. Example: the term “Reference Application” is in the description of most ref app modules, but it matches exactly only with the Reference Application module. Moreover, when query=“ref”, modules with “ref” in their title should rank higher than the ones with the term in their description.
- A query matching the description using fuzziness is given a very low weight.
Apart from the above:
- Modules which are deprecated or inactive are given the least weight.
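To make the tiers above concrete, here is a minimal sketch of how they could be layered in a single Elasticsearch query, written as a Python dict. It is not the query that ended up in AddOns. The field names (`uid`, `name`, `name.raw`, `tags`, `description`, `status`), the status values and every boost number are assumptions chosen only for illustration. Note also that `match_phrase_prefix` matches prefixes of words rather than arbitrary substrings, so it only approximates the "substring" rules.

```python
def build_addon_query(text: str) -> dict:
    """Build a tiered search query body (illustrative boosts only)."""
    q = text.strip()
    should = [
        # Exact UID match on a non-analysed field: highest weight.
        {"term": {"uid": {"value": q, "boost": 100.0}}},
        # Exact title match, assuming a non-analysed "name.raw" sub-field.
        {"term": {"name.raw": {"value": q, "boost": 80.0}}},
        # Exact tag match: equally high weight.
        {"term": {"tags": {"value": q, "boost": 80.0}}},
        # Query is the start of a word in the title: medium weight.
        {"match_phrase_prefix": {"name": {"query": q, "boost": 20.0}}},
        # Title match allowing one spelling mistake: low weight.
        {"match": {"name": {"query": q, "fuzziness": 1, "boost": 5.0}}},
        # Query appears in the description: low weight.
        {"match_phrase_prefix": {"description": {"query": q, "boost": 2.0}}},
        # Fuzzy description match: very low weight.
        {"match": {"description": {"query": q, "fuzziness": 1, "boost": 0.5}}},
    ]
    return {
        "query": {
            "boosting": {
                # Modules must satisfy at least one of the tiers above.
                "positive": {"bool": {"should": should, "minimum_should_match": 1}},
                # Deprecated or inactive modules still match but are pushed down.
                "negative": {"terms": {"status": ["DEPRECATED", "INACTIVE"]}},
                "negative_boost": 0.1,
            }
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_addon_query("ref"), indent=2))
```

The `boosting` query keeps deprecated and inactive modules in the results but multiplies their score by `negative_boost`, which is one way to express "least weight" without hiding them entirely.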
So, we have been working on this for a while now. The main issue was ensuring that all the features work well together.
For example: if we give a high rank to a name match, then sometimes certain tag matches would be mistaken for a name match, and hence the module with those tags would not show up first.
Trial and error! That, together with a complete understanding of what each function does, is the only solution. The same is also mentioned in the Elasticsearch documentation!
So after a few trials, we have come up with a proper algorithm which does the job.
Here are the differences:
1. Perfect match of tags
All the modules with the tag are ranked before any other module.
2. Perfect match of UID
Added much more weight to the UID and also made it non-analysed.
3. Allow minor spelling mistakes
4. Perfect title match comes first
5. Rest of the results also make more sense
6. Partial matching improved
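For readers wondering what "non-analysed" means in practice, here is a rough sketch of a mapping, again as a Python dict. The field names are hypothetical and this is not the actual AddOns mapping. It uses the `keyword` type from Elasticsearch 5 and later. Older versions express the same idea with `"type": "string"` plus `"index": "not_analyzed"`, and the wrapper around `properties` also differs between versions.

```python
def build_addon_mapping() -> dict:
    """Sketch of a mapping where the UID and tags are stored as exact values."""
    return {
        "properties": {
            # Not analysed: only the exact, whole UID matches a term query.
            "uid": {"type": "keyword"},
            # Tags are matched as whole values too, e.g. "Form-Entry".
            "tags": {"type": "keyword"},
            # The title is analysed for partial and fuzzy matching, with a
            # hypothetical "raw" sub-field kept for exact title matches.
            "name": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            },
            "description": {"type": "text"},
            "status": {"type": "keyword"},
        }
    }
```

Because a non-analysed field is not split into tokens or lowercased, an exact match on it is case-sensitive, and a value such as "Form-Entry" is not broken apart at the hyphen.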
Had yet another successful week with GSoC @OpenMRS! On to week 3!