Datasets on arXiv

Robert Stojnic
PapersWithCode
May 13, 2021

We’re excited to announce our partnership with arXiv to support links to datasets on arXiv!

Machine learning articles on arXiv now have a Code & Data tab to link to datasets that are used or introduced in a paper:

Example for the paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

This makes it much easier to track dataset usage across the community and to quickly find other papers using the same dataset. On Papers with Code, you can also track a dataset's usage over time, compare the models benchmarked on it, and find similar datasets.

Example dataset page on Papers with Code: tracking dataset usage and benchmarked models for a dataset

Authors can add datasets to their arXiv papers by going to arxiv.org/user and clicking on the “Link to code & data” Papers with Code icon (see below). From there they will be directed to Papers with Code where they can add their datasets. Once added, these will show on the arXiv article page.

All data on Papers with Code is freely available and is licensed under CC-BY-SA (same as Wikipedia).

Accelerating Progress with Datasets

Our goal at Papers with Code is to accelerate scientific progress by making research easier to discover, reproduce and extend. Datasets are a critical component for progress in machine learning, alongside models and compute.

An indexed map of datasets accelerates progress by bringing transparency to results and usage. These insights shape future dataset development, for example by showing when more challenging datasets are needed to evaluate models or when existing datasets have become saturated.

We are happy we could work with the arXiv team to make this change happen for the machine learning community! This is the second stage of our partnership, following the introduction of code on arXiv last October.

Looking ahead, we’ll be introducing more tools and initiatives for tackling reproducibility and information overload in science. Follow @paperswithcode and @arxiv on Twitter for updates!
