SIGIR 2019: Where did all the Information Retrieval go?

Maria Khvalchik
Semantic Tech Hotspot
5 min read · Aug 7, 2019

The 42nd ACM SIGIR Conference took place in Paris. The direction is clear: more Machine Learning and Deep Learning

My interest in IR exploded when I took a Search Engines class in grad school, and the next semester I was a Teaching Assistant for it. At that time, my professor, a CMU Ph.D. with a thesis on large-scale document collection search, thought very highly of SIGIR. Hence, I was very eager to attend this conference.

To my surprise, indexing, retrieval, and performance now take up a tiny fraction of the conference talks. The bulk of the papers focus on ranking, recommendations, and applying Machine Learning and Deep Learning techniques to them.

Best paper award…

…went to Variance Reduction in Gradient Exploration for Online Learning to Rank. The proposed Online Learning to Rank (OL2R) approach reduces the variance of gradient estimation by projecting the selected update direction onto the space spanned by the feature vectors of documents examined under the current query. Did you make it through that in one read? I hate such long sentences; let’s go step by step:

  • OL2R algorithms learn on the fly from “implicit user feedback,” inferred from user behavior such as clicks, as in this paper.
  • The key to such algorithms is an unbiased estimate of the gradient, which is often achieved by uniformly sampling exploration directions from the entire parameter space.
  • Unfortunately, this leads to high variance in gradient estimation, resulting in high regret during model updates, especially when the dimension of the parameter space is enormous.
  • In this work, the authors reduce the variance of gradient estimation in OL2R by confining exploration to the subspace spanned by the examined documents’ features. They prove that this still yields an unbiased estimate of the gradient and demonstrate significant improvements over several state-of-the-art models. A minimal sketch of the projection idea follows the figure caption below.
Illustration of model update in a three-dimensional space. Dashed lines represent the trajectory of Dueling Bandit Gradient Descent (DBGD) following different update directions.
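
The projection trick is easy to picture in code. Here is a minimal sketch of the idea under my own simplifying assumptions: the function names are mine, and the paper’s interleaved click-based comparison is abstracted into a plain reward comparison.

```python
import numpy as np

def propose_direction(dim, examined_docs, rng):
    """Propose an exploration direction restricted to the subspace spanned by
    the feature vectors of documents examined under the current query.
    examined_docs: (n_docs, dim) matrix of document feature vectors."""
    u = rng.normal(size=dim)
    u /= np.linalg.norm(u)                   # uniform direction on the sphere
    q, _ = np.linalg.qr(examined_docs.T)     # orthonormal basis of the document subspace
    u_proj = q @ (q.T @ u)                   # project the direction onto that subspace
    norm = np.linalg.norm(u_proj)
    return u_proj / norm if norm > 0 else u  # fall back to the unprojected direction

def dbgd_update(w, direction, reward_current, reward_explored, step=0.1):
    """DBGD-style update: move towards the explored ranker only if it wins."""
    return w + step * direction if reward_explored > reward_current else w

rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 50))              # 5 examined documents, 50 features
w = np.zeros(50)
d = propose_direction(50, docs, rng)
w = dbgd_update(w, d, reward_current=0.3, reward_explored=0.5)
```

The point of the projection is that exploration never wastes steps on feature directions no examined document can distinguish, which is where the variance (and regret) reduction comes from.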

Opening keynote by Bruce Croft

The founder of the Center for Intelligent Information Retrieval and the 1995–2002 editor-in-chief of ACM Transactions on Information Systems outlined future directions for the field.

Selected papers!

Well, selected by me, with a bias towards Reading Comprehension tasks.

1. Duplicate questions

There is a Kaggle competition on detecting duplicate questions on Quora. The problem is to identify whether two questions are duplicates or not. For example, “How old are you?” and “What is your age?” do not have a word in common but have the same intent. Interesting stuff. I used to tackle duplication issues with rule-based methods and fuzzy search. Now neural networks have taken over, and they seem to work well on Quora, but what about websites such as stackoverflow.com? Those bring loads of new challenges, where a single-character difference such as “v 5.0” vs. “v 6.0” changes the meaning entirely (see the toy sketch below). One of the approaches proposed at SIGIR can be found here: Adaptive Multi-Attention Network Incorporating Answer Information for Duplicate Question Detection.
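
To see why surface-level fuzzy matching struggles here, consider this toy comparison; it is a sketch of my own, not from the paper.

```python
from difflib import SequenceMatcher

def char_similarity(a: str, b: str) -> float:
    """Character-level fuzzy similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Same intent, barely any shared characters -> low fuzzy score:
print(char_similarity("How old are you?", "What is your age?"))
# Different intent, almost identical characters -> very high fuzzy score:
print(char_similarity("Does v 5.0 fix this bug?", "Does v 6.0 fix this bug?"))
```

Character overlap and intent can disagree in both directions, which is exactly the gap that learned semantic matching models try to close.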

2. How about complex questions?

Another interesting QA paper is Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs, which tackles a challenging multi-hop task. It requires the system to reason over, gather, and synthesize disjoint pieces of evidence across documents to produce an answer. The paper shows an example of transforming text from several documents into triples, which are then assembled into a graph.

So, the system builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments. Finally, it computes the best answers using a Group Steiner Tree algorithm.
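
As a rough illustration of the pipeline, here is a hypothetical sketch with made-up triples and confidences; networkx’s approximate Steiner tree stands in for the paper’s Group Steiner Tree solver.

```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

# Toy triples (entities, relations, and confidences are invented for illustration;
# the real system extracts them dynamically from documents retrieved for the question).
triples = [
    ("film_x", "directed by", "director_a", 0.9),
    ("film_y", "directed by", "director_a", 0.8),
    ("film_x", "written by", "writer_b", 0.7),
    ("writer_b", "born in", "city_c", 0.6),
]

G = nx.Graph()
for subj, rel, obj, confidence in triples:
    # Lower edge weight = stronger evidence, so the tree prefers confident edges.
    G.add_edge(subj, obj, relation=rel, weight=1.0 - confidence)

# Nodes matched against the question serve as terminals; answer candidates are
# the non-terminal nodes of the tree that connects them.
terminals = ["film_x", "film_y"]
tree = steiner_tree(G, terminals, weight="weight")
print([n for n in tree.nodes if n not in terminals])  # -> ['director_a']
```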

It would be interesting to see the performance of that system on the HotpotQA dataset, which was designed for such tasks.

Demo, code, and data of this work: https://quest-sys.mpi-inf.mpg.de/getanswer.

3. Knowledge Graph Recommendations

Reinforcement Knowledge Graph Reasoning for Explainable Recommendation couples recommendation and interpretability by providing actual paths in a knowledge graph. Specifically, they propose a reinforcement learning approach featuring an innovative soft reward strategy, user-conditional action pruning, and a multi-hop scoring function.

Explainability is an important property that is typically lost amid the current prevalence of Deep Learning techniques. The authors managed to build a neural-network-free system that outperforms other methods, including the Deep Learning ones.

It is also notable that the training and test datasets are entirely e-commerce data.
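
Conceptually, the recommender walks a few hops from the user node through the KG and returns both the item it reaches and the path it took as the explanation. Below is a minimal, hypothetical sketch of that idea with a toy graph of my own; PGPR replaces the random choice with a learned RL policy (soft rewards, user-conditional action pruning, multi-hop scoring).

```python
import random

# Toy KG: node -> list of (relation, neighbor). PGPR's graph is built from
# e-commerce data (users, items, brands, categories, features).
KG = {
    "user_1": [("purchased", "item_a"), ("mentions", "feature_x")],
    "item_a": [("described_by", "feature_x"), ("also_bought", "item_b")],
    "feature_x": [("describes", "item_c")],
}

def sample_paths(start, hops=2, n_paths=3, seed=0):
    """Random walks standing in for the learned policy; each path doubles as
    the explanation for recommending the node it ends on."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        node, path = start, [start]
        for _ in range(hops):
            actions = KG.get(node, [])
            if not actions:
                break
            rel, node = rng.choice(actions)  # PGPR: a policy network picks this action
            path += [rel, node]
        paths.append(path)
    return paths

for p in sample_paths("user_1"):
    print(" -> ".join(p))  # e.g. user_1 -> purchased -> item_a -> also_bought -> item_b
```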

Learning a policy that navigates from a user to potential items of interest by interacting with the KG. The trained policy is then adopted for the path reasoning phase to make recommendations to the user.
Real cases of recommendation reasoning paths.

Code and data: https://github.com/orcax/PGPR

4. Google presented mostly on email search

If interested, have a look at these:

Overall, it looks as if Machine Learning is the center of the universe at SIGIR now

  • A striking lack of work on indexing and retrieval, but plenty on ranking; e.g., in the Question Answering track, not a single presentation addressed indexing
  • At least one ML track among the only three running in parallel, with a primary focus on tweaking Neural Networks
  • The e-commerce recommendations topic pops up from every angle

About the venue

The conference was held at the Cité des Sciences, the biggest science museum in Europe. They love animals there and keep chickens and goats next to the building.

Other very important news

  • There was a Women in IR event with an invited talk “My life as a researcher: what I loved, what I learned and some humor always” by Mounia Lalmas, Director of Research at Spotify.
  • The banquet took place at the main venue in the Science Museum, offering sports, robotics, and microbiota exhibitions.
  • The all-time hottest temperature in Paris, 42.6°C, was not fun.
  • Notre-Dame still stands and is in good hands.

Hope you found this post helpful and engaging to a certain degree 😊 Thanks for reading!
