Datasets on arXiv
We’re excited to announce our partnership with arXiv to support links to datasets on arXiv!
Machine learning articles on arXiv now have a Code & Data tab to link to datasets that are used or introduced in a paper.
This makes it much easier to track dataset usage across the community and quickly find other papers using the same dataset. On Papers with Code you can also track a dataset's usage over time, compare models and find similar datasets.
Authors can add datasets to their arXiv papers by going to arxiv.org/user and clicking on the “Link to code & data” Papers with Code icon (see below). From there they will be directed to Papers with Code where they can add their datasets. Once added, these will show on the arXiv article page.
All data on Papers with Code is freely available and is licensed under CC-BY-SA (same as Wikipedia).
Accelerating Progress with Datasets
Our goal at Papers with Code is to accelerate scientific progress by making research easier to discover, reproduce and extend. Datasets are a critical component for progress in machine learning, alongside models and compute.
An indexed map of datasets accelerates progress by bringing transparency to results and usage. These insights shape future dataset development, for example by showing when more challenging datasets are needed to evaluate models, or when existing datasets have become saturated.
We are happy we could work with the arXiv team to make this change happen for the machine learning community! This is the second stage of our partnership, following the introduction of code on arXiv last October.
Looking ahead, we’ll be introducing more tools and initiatives for tackling reproducibility and information overload in science. Follow @paperswithcode and @arxiv on Twitter for updates!