The ICML 2019 Code-at-Submit-Time Experiment
Kamalika Chaudhuri and Ruslan Salakhutdinov
Reproducibility is the cornerstone of scientific endeavor, yet reproducibility of empirical results has been a challenge of late in AI research. A major barrier to reproducibility in AI is the unavailability of code. Modern machine learning algorithms are large, complex systems with many moving parts. It is difficult to describe them in enough detail in research publications, and inspection of the underlying software is often necessary to figure out what the methods do and to verify empirical claims. The current culture in the AI community is that most researchers do not release code, or release code long after publication, which results in a lack of reproducibility and slows the development and sharing of new ideas.
For the 36th International Conference on Machine Learning (ICML 2019), we decided to explore a new measure to incentivize code release: encouraging voluntary code submission at the time of manuscript submission. Our goal was to promote a culture change by encouraging the community to submit code. We thought this was best done at submission time, as opposed to publication time, because it gives the program committee a chance to inspect and evaluate the code during the review process at their discretion. This is much like the option given to theoretical papers to submit an appendix with full mathematical proofs that can be checked by the program committee if needed.
Thus, for the first time in a major machine learning conference, ICML 2019 implemented the code-at-submit-time measure. Authors were allowed to submit code with their manuscripts and reviewers were encouraged to look at it. Reviewers were told that high-quality submissions with credible results should be accepted as usual. If, however, there were doubts about credibility, then code, if submitted, could be inspected for clarification. Much like manuscripts, submitted code was to be anonymized for double-blind review. Submitted code did not need to be runnable (detailed pseudocode counted), and data, particularly sensitive and private data, did not need to be submitted (toy data was sufficient). These last two guidelines ensured that the process remained inclusive and not overly burdensome for authors who work with specialized libraries or sensitive data.
How did the experiment work? We are delighted to report that a great deal of code was submitted. By our calculation, about 36% of more than 3000 submitted manuscripts included code with their paper submission. Additionally, 67% of the 774 accepted papers provided code at camera-ready time. Contrast this with NeurIPS 2018, where just below half of the accepted papers had code available with the camera-ready.
Who submitted code? The short answer is authors from all over the world, both academics and industry researchers. Of the papers that included code in their submission, 27.4% had an author from industry and 90.3% had an author from academia. Compare this with all submissions, of which 83.8% had an author from academia and 27.4% had an author from industry.
What did the reviewers and area chairs think about papers with code? To find out, we ran separate surveys for reviewers and area chairs at the end of the review period. About 31% of the reviewers who responded to the survey said that they looked at the code in at least one of their assigned papers, and about 59% of them found looking at the code helpful. Among the area chairs who responded to the survey, 40% used code submission as a factor in their decision process, and in 73% of those cases it improved their opinion of the paper.
How did papers with code fare in the review process against those without? About 43% of papers accepted to ICML had code at submission time, in contrast with 36% of all submitted papers. Thus the acceptance rate for papers with code was slightly higher. Exactly how to interpret this figure is unclear; it is likely that papers that included code were more polished and ready at submission time, which might have led to a slightly higher acceptance rate.
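As a rough illustration of the arithmetic behind this comparison, the sketch below turns the two reported shares (43% of accepted papers and 36% of submitted papers had code) into relative acceptance rates. It relies only on Bayes' rule, so the exact number of submissions is not needed. The function name and its parameters are hypothetical, chosen for this example, and the result is only as precise as the rounded percentages quoted above.

```python
from typing import Tuple


def acceptance_ratios(accepted_with_code: float,
                      submitted_with_code: float) -> Tuple[float, float]:
    """Relative acceptance rates implied by two shares (hypothetical helper).

    accepted_with_code:  fraction of accepted papers that had code at submission
    submitted_with_code: fraction of all submissions that had code

    Returns (rate_with_code / overall_rate, rate_with_code / rate_without_code).
    """
    if not (0.0 < accepted_with_code < 1.0 and 0.0 < submitted_with_code < 1.0):
        raise ValueError("both shares must lie strictly between 0 and 1")

    # Bayes' rule: P(accept | code) / P(accept) = P(code | accept) / P(code)
    with_vs_overall = accepted_with_code / submitted_with_code

    # The same identity applied to papers without code.
    without_vs_overall = (1.0 - accepted_with_code) / (1.0 - submitted_with_code)

    return with_vs_overall, with_vs_overall / without_vs_overall


# Shares reported in the text above (rounded percentages).
vs_overall, vs_without = acceptance_ratios(0.43, 0.36)
print(f"with code vs. overall acceptance rate: {vs_overall:.2f}x")
print(f"with code vs. without code:            {vs_without:.2f}x")
```

Under these rounded inputs, papers with code were accepted at roughly 1.2 times the overall rate and roughly 1.3 times the rate of papers without code. As noted above, this is a correlation and says nothing about why the rates differ.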
Is there a case against code submission? One can argue that code submission is burdensome for authors who work with proprietary code or sensitive data, or in industrial organizations with internal restrictions on code release. We believe that a well-designed code submission process can overcome these limitations. For example, providing confidentiality during the review process and allowing submission of non-executable code can alleviate some of these concerns. We do not expect authors to submit sensitive data; toy data is often sufficient to verify claims. In cases where part of the code base is proprietary, detailed pseudocode is enough. Finally, making code submission optional, as we did for ICML 2019, covers contingencies where code cannot be submitted, such as proprietary systems and code bases.
In conclusion, we believe that the code-at-submit-time experiment was successful, and we are delighted by the strong community response. There is room to make the process more accessible. One possibility is to give authors an extra week after the paper submission deadline to submit code; this would help those who need extra paperwork for code submission. There should also be a multi-year effort to put together a repository of code submitted after publication, since private GitHub accounts sometimes disappear with time; this year we did not invest in common archiving of the code submitted with accepted papers. We hope future program chairs will continue and improve on the process, and that the community will move towards a culture of timely code release and improved reproducibility.
Finally, code submission is just one element of scientific reproducibility and sound science. We are encouraged to see several other initiatives taking flight, including NeurIPS 2019’s appointment of a Reproducibility Chair, and their use of the ML Reproducibility Checklist as part of their submission process.
Acknowledgements: We are indebted to many people without whose help and support the experiment would not have been possible. We thank Joelle Pineau, Hugo Larochelle, John Langford and Arthur Gretton for their help and advice in setting up the process, and Sham Kakade, Joelle Pineau, John Langford and Sanjoy Dasgupta for feedback. Finally, we thank our workflow chairs Lisa Lee, Devendra Chaplot and Paul Pu Liang, and our many authors, reviewers, area chairs and senior area chairs; they are the ones who really made ICML 2019 happen.