Two minutes NLP — 20 Learning Resources for Information Retrieval

Articles, tutorials, and popular libraries

Fabio Chiusano
NLPlanet
4 min read · Apr 22, 2022

Hello fellow NLP enthusiasts! Since there will soon be an NLPlanet Discord server for networking among NLP practitioners, I’m working on a first layout of its channels. I’m planning to add learning resources for many NLP areas, so this article is a step towards preparing that content. If you’re interested in the Discord server, follow NLPlanet on Medium, LinkedIn, or Twitter to stay updated on its release. Enjoy! 😄

Here is the first draft, curated by me, of NLPlanet’s Information Retrieval learning resources. Since it is a draft, this list will be improved using feedback from the community.

This article is part 5 of a series of articles about learning resources:

  1. Awesome NLP — 18 High-Quality Resources for studying NLP
  2. Two minutes NLP — 21 Learning Resources for Text Classification
  3. Two minutes NLP — 20 Learning Resources for Word Embeddings
  4. Two minutes NLP — 20 Learning Resources for Transformers

What is Information Retrieval

Information Retrieval (IR) is the process that responds to a user query by examining a collection of documents and returning an ordered document list, where each document should be relevant to the user query. It’s the activity of obtaining information resources relevant to an information need.
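
To make the "ordered document list" idea concrete, here is a minimal sketch of keyword-based ranked retrieval using a simple TF-IDF score. Everything in it (the `tokenize` and `rank` helpers, the toy documents, the smoothed IDF variant) is an assumption chosen for illustration, not the method of any resource listed in this article. Real systems typically use more refined scoring and proper tokenization.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank(query, documents):
    """Return (document index, score) pairs for matching documents, best first."""
    doc_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in doc_tokens for term in set(tokens))

    scores = []
    for index, tokens in enumerate(doc_tokens):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Normalized term frequency times a smoothed inverse document frequency.
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1
                score += (tf[term] / len(tokens)) * idf
        if score > 0:
            scores.append((index, score))

    # Highest score first: this is the ordered list returned to the user.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy collection, made up for this example.
documents = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "the stock market fell sharply today",
]
ranking = rank("cat on the mat", documents)
for index, score in ranking:
    print(f"{score:.3f}  {documents[index]}")
```

Note that the second document is not returned even though it is about cats: exact keyword matching treats "cat" and "cats" as different terms.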

A popular type of Information Retrieval is Semantic Search. Semantic Search is a search technique that aims not only to match keywords but also to determine the intent and contextual meaning of the words a person uses in a query.

Information Retrieval applications and use cases

  • Search engines, searching for text documents, images, videos, and so on.
  • Question answering over a set of documents (e.g. with a chatbot or a smart speaker).
  • Recommender systems.
  • Summarization of a set of documents.

Articles and tutorials

Popular libraries

  • Elasticsearch: Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene.
  • Jina: Jina is a neural search framework that empowers anyone to build SOTA and scalable neural search applications.
  • Milvus: Milvus is an open-source vector database built to power embedding similarity search and AI applications.
  • Haystack: Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want to perform Question Answering or semantic document search, you can use the state-of-the-art NLP models in Haystack to provide unique search experiences and allow your users to query in natural language.
  • Faiss: Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM.
  • Weaviate: Weaviate is a vector search engine and vector database. Weaviate uses machine learning to vectorize and store data and find answers to natural language queries.
  • Vector Hub: Vector Hub is a library for publication, discovery, and consumption of state-of-the-art models to turn data into vectors, such as Text2Vec, Image2Vec, Video2Vec, Face2Vec, Bert2Vec, Inception2Vec, Code2Vec, LegalBert2Vec, etc.
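
Several of the libraries above (for example Faiss, Milvus, and Weaviate) are built around similarity search over dense vectors. As a rough illustration of the underlying operation, here is a brute-force sketch in plain Python that returns the top-k most similar vectors by cosine similarity. It does not use the API of any of these libraries. The `top_k` helper, the document ids, and the 3-dimensional vectors are all hypothetical and hand-made, whereas in practice the vectors would come from an embedding model and the search would rely on optimized or approximate indexes.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings, invented for this example (not model outputs).
index = {
    "doc_pets": [0.9, 0.1, 0.0],
    "doc_finance": [0.0, 0.2, 0.9],
    "doc_animals": [0.8, 0.3, 0.1],
}
query_vector = [1.0, 0.2, 0.0]

results = top_k(query_vector, index, k=2)
for doc_id, score in results:
    print(f"{score:.3f}  {doc_id}")
```

This exhaustive scan compares the query against every stored vector, so its cost grows linearly with the size of the collection. That cost is the main reason dedicated vector search libraries exist.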

Conclusion

If you know any other good resources for learning about Information Retrieval, please let me know so that I can share them with the community.

Other NLP areas that will need a learning resources area of their own are chatbots, language models, question answering, and speech.

Thank you for reading! If you are interested in learning more about NLP, remember to follow NLPlanet on Medium, LinkedIn, and Twitter!
