ML Resources — July 20

Suhas Pai
Aggregate Intellect
2 min read · Jul 20, 2021

SIGIR, the premier information retrieval conference, was held last week. Ivan Vendrov and Chris Painter have an interesting Substack post summarizing the SIGIR papers related to recommender systems alignment and ethics.

One of the more interesting tutorials at this year’s ICML is about Random Matrix Theory. The tutorial page contains slides and links to several resources including books.

Lee et al. show that removing duplicates from language model pre-training data is helpful. They show that deduplication reduces the likelihood of generating memorized text, reduces the number of training steps required, and reduces train-test overlap.
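To make the idea concrete, here is a minimal sketch of exact-duplicate removal by hashing normalized text. It is not the method from the paper, which goes beyond whole-document exact matches; the function names `normalize`, `deduplicate`, and `train_test_overlap` are hypothetical and chosen only for illustration.

```python
import hashlib
import re


def normalize(text):
    # Lowercase and collapse whitespace so trivially different copies match.
    return re.sub(r"\s+", " ", text.strip().lower())


def fingerprint(text):
    # Hash the normalized text so we only store a short digest per document.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def deduplicate(docs):
    # Keep the first occurrence of each document, preserving input order.
    seen = set()
    unique = []
    for doc in docs:
        digest = fingerprint(doc)
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def train_test_overlap(train_docs, test_docs):
    # Count test documents that also appear (after normalization) in train.
    train_digests = {fingerprint(doc) for doc in train_docs}
    return sum(1 for doc in test_docs if fingerprint(doc) in train_digests)


corpus = ["The cat sat.", "the  cat sat.", "A different sentence."]
deduped = deduplicate(corpus)
overlap = train_test_overlap(corpus, ["THE CAT SAT.", "Unseen text."])
```

Whole-document hashing only catches copies that are identical after normalization. Catching repeated passages inside otherwise different documents, or near-duplicates, needs substring or similarity-based matching.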

The papers submitted to the AKBC (Automated Knowledge Base Construction) conference are available here. Interesting papers include contextualized passage embeddings for long document representation and knowledge extraction from long fictional texts.

Aghajanyan et al. introduce HTLM, a language model trained on HTML text with a BART-style denoising objective.
