Earlier in this article I mentioned personalised, context-based queries.
Let's take a query, work out which personalisation factors apply to it, and rank the relevant results.
For example, say we have this set of documents:
- Albert Einstein was a German-born theoretical physicist who developed the theory of relativity. He is best known for his mass–energy equivalence formula E = mc². He received the Nobel Prize in Physics.
- Sir Isaac Newton was an English mathematician, astronomer, theologian, author and physicist who is widely recognised as one of the most influential scientists of all time, and a key figure in the scientific revolution.
- Thomas Alva Edison was an American inventor and businessman, who has been described as America’s greatest inventor. He developed many devices that greatly influenced life around the world, including the phonograph, the motion picture camera, and the long-lasting, practical electric light bulb.
Now, let's say we have them tokenized and stored in an inverted index.
We also have a list of Relation Terms for each token, for example:
- Albert Einstein : Physicist, Person, Nobel Winner, …
- Sir Isaac Newton : Physicist, Person, Mathematician, …
- Thomas Alva Edison : Physicist, Person, Inventor, …
- Physicist : Einstein, Isaac Newton, Thomas Edison, …
- German Born : Einstein
- American : Thomas Edison
And the token list goes on.
Okay, so we have all the documents stored in an inverted index, with their tf-idf scores and Relation Terms marked.
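To make the setup concrete, here is a minimal sketch of such an index in Python. The document ids, the shortened document texts, and the helper names (`tokenize`, `build_inverted_index`, `build_relation_index`) are my own assumptions for illustration; the Relation Terms are the ones listed above.

```python
from collections import defaultdict

# Shortened versions of the three example documents (hypothetical doc ids).
DOCS = {
    "einstein": "Albert Einstein was a German-born theoretical physicist.",
    "newton": "Sir Isaac Newton was an English mathematician and physicist.",
    "edison": "Thomas Alva Edison was an American inventor and businessman.",
}

# Hand-written Relation Terms; a real system would load these from a knowledge base.
RELATION_TERMS = {
    "einstein": {"physicist", "person", "nobel winner"},
    "newton": {"physicist", "person", "mathematician"},
    "edison": {"physicist", "person", "inventor"},
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return [w.strip(".,").lower() for w in text.split() if w.strip(".,")]


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def build_relation_index(relation_terms):
    """Map each Relation Term to the set of document ids tagged with it."""
    index = defaultdict(set)
    for doc_id, terms in relation_terms.items():
        for term in terms:
            index[term].add(doc_id)
    return index


inverted_index = build_inverted_index(DOCS)
relation_index = build_relation_index(RELATION_TERMS)
```

Note that the word "physicist" never appears in the Edison text, so only the Relation Term index connects Edison to a physicist query.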
Now let's say there is a query: ‘Some Physicist Name?’
The computer has the 3 documents, all marked as Physicist.
A traditional ranking algorithm will just check the tf-idf score of each query token in each document (for example, how many times the word "physicist" occurs in it), which is fine.
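As a rough illustration of that traditional ranking, here is a small tf-idf sketch. The shortened token lists and the function name `tf_idf` are assumptions of mine, and the formula is only one common variant (term frequency times `ln(N / df) + 1`); real search engines use more refined weighting.

```python
import math
from collections import Counter

# Shortened, pre-tokenized versions of the three documents (illustrative only).
DOC_TOKENS = {
    "einstein": "albert einstein was a german-born theoretical physicist".split(),
    "newton": "isaac newton was an english mathematician astronomer and physicist".split(),
    "edison": "thomas edison was an american inventor and businessman".split(),
}


def tf_idf(term, doc_id, docs):
    """Score one term for one document: relative term frequency times idf."""
    tokens = docs[doc_id]
    tf = Counter(tokens)[term] / len(tokens)
    df = sum(1 for doc_tokens in docs.values() if term in doc_tokens)
    if df == 0:
        return 0.0  # the term occurs nowhere in the collection
    idf = math.log(len(docs) / df) + 1.0
    return tf * idf


scores = {doc_id: tf_idf("physicist", doc_id, DOC_TOKENS) for doc_id in DOC_TOKENS}
ranking = sorted(scores, key=scores.get, reverse=True)
```

The ranking here depends only on word counts and document length. Nothing about the user enters the score.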
But our core goal is to deliver the most personalised answer to the user, and that is where plain tf-idf checking will fail to give the most desired results.
So what do we do?
Let's see what data we can get from the user:
- IP address & country name (say xxx.xxx.xx.x & country India)
- User browser agent (say Chrome on iOS/mobile)
- Device type (say model iPhone X)
- If it is a returning user, the cookie id, through which we can fetch all their previous searches, etc.
Now, considering only the first 3 points, we can say that the user's country is India.
Okay, now let's check the database for all the persons that have at least a country attached.
Einstein is from Germany & Edison is from America.
I am not going to make this very complex, but we can say that, of the two, Germany is the nearer country to India.
So we can pull Einstein's article first, followed by Edison's.
Again, note that these are just factors I am using to explain deep context personalization.
Let's go deeper and add a solid point for pulling Einstein's article first.
Einstein met Rabindranath Tagore (an Indian), and Bose-Einstein statistics came out of his work with Satyendra Nath Bose (an Indian physicist).
So this point is solid support for putting the article first.
Next, let's say the user asked the query on 14 March.
Einstein was born on 14 March 1879. This adds another solid point for showing Einstein's article first.
See how it goes…
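The three factors so far (country proximity, a connection to the user's country, and a date match) can be sketched as a simple additive score. Everything named here is a hypothetical illustration: the `Person` fields, the weights, and especially the `PROXIMITY` values, which are made-up numbers rather than measured distances.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set, Tuple


@dataclass
class Person:
    name: str
    country: Optional[str] = None              # None when no country is stored
    birthday: Optional[Tuple[int, int]] = None  # (month, day), None if unknown
    connected_countries: Set[str] = field(default_factory=set)


# Illustrative closeness values in [0, 1]; not real geographic measurements.
PROXIMITY = {("India", "Germany"): 0.6, ("India", "America"): 0.3}

# Hypothetical weights; in practice these would be tuned or learned.
W_PROXIMITY, W_CONNECTION, W_DATE = 1.0, 1.0, 1.0

PEOPLE = [
    Person("Einstein", "Germany", (3, 14), {"India"}),  # Tagore, S. N. Bose
    Person("Newton"),   # no country or birthday stored in this example
    Person("Edison", "America"),
]


def context_score(person, user_country, query_date):
    """Add up the context factors that apply to this person."""
    score = 0.0
    if person.country is not None:
        score += W_PROXIMITY * PROXIMITY.get((user_country, person.country), 0.0)
    if user_country in person.connected_countries:
        score += W_CONNECTION
    if person.birthday == (query_date.month, query_date.day):
        score += W_DATE
    return score


query_day = date(2024, 3, 14)  # any year works; only month and day are compared
ranked = sorted(PEOPLE, key=lambda p: context_score(p, "India", query_day), reverse=True)
ranked_names = [p.name for p in ranked]
```

With a user in India searching on 14 March, Einstein collects all three factors, Edison only the proximity factor, and Newton none.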
We can add more points, like: why is the user searching? Is the user a teenager (maybe researching these formulas)? How many people are searching for the same thing? Is Einstein a trending topic (in the news recently)?
There are a lot of factors that can influence the overall ranking problem.
Note that even after applying all this, there is still a possibility of not delivering the desired content to the user, but by my rough estimate these factors can make ranking and delivery up to 70% better.
So you need to have a big list of factors to add (try DBpedia or Wikidata, which already expose Wikipedia's knowledge as structured RDF data).
Similarly, let's say the user has already searched for these before (considering the user is a repeat searcher):
- LED bulbs
- AC electricity
- Tesla
The Relation Terms of these previous searches have a lot in common:
- LED Bulbs : Electric Bulb
- AC Electricity : Tesla the Inventor
- Tesla : Electric Car, Tesla the Inventor
But the most suitable match here is the LED bulb search, since the practical electric light bulb was developed by Edison (considering only the above 3 articles are indexed).
So showing Edison first can also be relevant.
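A minimal sketch of this history factor: count how many previous searches share a Relation Term with each document. The dictionaries and the function name `history_score` are hypothetical, and tagging Edison's document with "electric bulb" is my assumption, based on the light bulb mentioned in his article.

```python
# Relation Terms of the user's previous searches, as listed above.
HISTORY_RELATIONS = {
    "led bulbs": {"electric bulb"},
    "ac electricity": {"tesla the inventor"},
    "tesla": {"electric car", "tesla the inventor"},
}

# Relation Terms per document; "electric bulb" on Edison is an assumed tag.
DOC_RELATIONS = {
    "einstein": {"physicist", "person", "nobel winner"},
    "newton": {"physicist", "person", "mathematician"},
    "edison": {"physicist", "person", "inventor", "electric bulb"},
}


def history_score(doc_terms, history):
    """Count the previous searches that share at least one Relation Term with the document."""
    return sum(1 for terms in history.values() if terms & doc_terms)


history_scores = {
    doc_id: history_score(terms, HISTORY_RELATIONS)
    for doc_id, terms in DOC_RELATIONS.items()
}
best_by_history = max(history_scores, key=history_scores.get)
```

With only these three articles indexed, the Tesla-related searches match nothing, and the LED bulb search is the single link that lifts Edison.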
So you see, there are N possibilities for X documents and Y factors.
Therefore, we need to find the document with the maximum probability in order to suggest better results (again, a machine learning probability problem).
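One simple way to turn several factor scores into a probability per document is a weighted sum followed by a softmax. This is a technique I am swapping in for illustration, not a method settled on in this article, and the factor values and weights below are made-up numbers in the spirit of the examples above.

```python
import math

# Illustrative factor values per document (not measured data).
FACTOR_SCORES = {
    "einstein": {"tfidf": 0.20, "context": 2.6, "history": 0.0},
    "newton": {"tfidf": 0.16, "context": 0.0, "history": 0.0},
    "edison": {"tfidf": 0.00, "context": 0.3, "history": 1.0},
}

# Hypothetical weights; a learning-to-rank model would fit these from data.
WEIGHTS = {"tfidf": 1.0, "context": 1.0, "history": 1.0}


def combined_score(factors, weights):
    """Weighted sum of all factor values for one document."""
    return sum(weights[name] * value for name, value in factors.items())


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores.values())  # subtract the max for numerical stability
    exps = {key: math.exp(value - highest) for key, value in scores.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}


raw = {doc_id: combined_score(f, WEIGHTS) for doc_id, f in FACTOR_SCORES.items()}
probabilities = softmax(raw)
best = max(probabilities, key=probabilities.get)
```

Changing the weights changes the winner: with a much larger weight on `history`, Edison would come out on top instead of Einstein.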
Hope you like this article.
I will add more context soon, with some equations to solve the problem and raw data to test it out in action.