The Case for Open Data and Code in Scientific Research

Ali Ehsan
4 min read · May 27, 2024

As a postdoctoral researcher in the field of future electricity systems, I’ve often debated the availability of open data and code in journal articles. These discussions, which occur during conferences, meetings, and online forums, are crucial in our rapidly advancing field where transparency and collaboration are essential.

Proponents of open data argue that it accelerates scientific discovery by allowing others to replicate, validate, and build upon existing work efficiently. This openness fosters a dynamic research environment with shared knowledge. However, some researchers are concerned about potential misuse, loss of competitive advantage, and lack of proper attribution, especially given the academic reliance on novel contributions for career advancement. In this post, I’ll explore the benefits of open data, the concerns of researchers, and the delicate balance between personal gain and the common good of the community.

As the volume of published research grows, so do the complexity and depth of the studies being conducted. This increasing complexity calls for a more collaborative and transparent approach to scientific inquiry. A core tenet of open science is that authors of journal articles should make their data, code, and models publicly accessible. This practice, while beneficial for the broader scientific community, is not without its challenges.

Open data and code have the potential to revolutionize the pace of scientific discovery. When researchers share their datasets, algorithms, and models, they enable others to reuse and extend their work more efficiently. According to Piwowar and Vision, the availability of open data can lead to an increase in citation rates, indicating that shared data contributes to the visibility and impact of the original research. By making data and code publicly available, researchers provide the tools necessary for others to validate findings, explore new hypotheses, and build upon existing work without starting from scratch.

For example, in the field of electricity systems, the Open Power System Data (OPSD) project’s open data policy has been instrumental in accelerating advancements. Researchers worldwide have used this data to make significant strides in understanding the impact of policy changes on electricity markets and in developing smart grid technologies. This collaborative spirit exemplifies how open data can catalyse scientific progress.

The availability of data and code not only facilitates reproducibility but also serves as a valuable educational resource. New researchers and students can learn from the methodologies and techniques employed in previous studies. This transparency helps demystify the research process and provides a concrete foundation upon which new investigations can be built. Open datasets and code repositories, such as those hosted on GitHub or the Open Science Framework (OSF), offer a wealth of material for teaching and training.

Moreover, access to these resources can democratize research by providing opportunities for scientists from underfunded institutions or countries to participate in cutting-edge research. This inclusivity can lead to a more diverse and innovative scientific community.

Despite the clear benefits of open data, many researchers remain hesitant to share their work freely. The fear of intellectual property theft and the desire to maintain a competitive edge can lead to a reluctance to disclose data and code. Researchers often invest significant time and resources into their studies and may feel that keeping their data confidential allows them to maximize the return on their investment by publishing multiple papers or securing patents based on their initial findings.

This protective stance, however, can impede the collective advancement of science. Keeping data and code behind closed doors slows down validation and discovery. Other researchers must duplicate effort to generate similar datasets or develop comparable models, leading to inefficiencies and potential delays in scientific progress.

The tension between personal gain and the common good of the scientific community is a central issue in the debate over open data. On one hand, individual researchers seek recognition, career advancement, and financial rewards for their contributions. On the other hand, the broader community benefits most when research is transparent and accessible.

One potential solution is to develop systems that reward openness. For instance, funding agencies and academic institutions can incentivize data sharing by considering open data practices in grant evaluations and tenure decisions. Initiatives like the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles advocate for the implementation of standards that make data more accessible and useful.
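To make the FAIR idea a little more concrete, here is a minimal sketch of a machine-readable metadata record that could accompany a shared dataset, together with a simple completeness check. The field names, the `REQUIRED_FIELDS` list, and the placeholder values are assumptions chosen for illustration; they are not an official FAIR checklist or any repository's schema.

```python
import json

# Illustrative fields only (an assumption, not an official FAIR checklist).
REQUIRED_FIELDS = ("identifier", "title", "description", "license", "format")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical metadata for a shared dataset; every value is a placeholder.
record = {
    "identifier": "doi:PLACEHOLDER",           # persistent identifier (findable)
    "title": "Example hourly load time series",
    "description": "Hourly electricity load for an example region.",
    "license": "CC-BY-4.0",                    # explicit reuse terms (reusable)
    "format": "text/csv",                      # open format (interoperable)
}

if __name__ == "__main__":
    gaps = missing_fields(record)
    if gaps:
        print("Metadata incomplete, missing:", ", ".join(gaps))
    else:
        # Serialise next to the data so both people and tools can read it.
        print(json.dumps(record, indent=2))
```

A check like this could run before a dataset is deposited, so that missing licence or identifier information is caught early rather than discovered by the next researcher.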

Moreover, journals can play a crucial role by requiring authors to submit data and code as a condition for publication. This approach not only promotes transparency but also enhances the reproducibility of published research. Some journals, like Nature and Science, already have policies in place that encourage or mandate data sharing.

While the arguments for open data are compelling, it is essential to address ethical and practical concerns. Protecting sensitive information, such as personal electricity usage data, is paramount. Researchers must ensure that data is anonymized and that sharing practices comply with ethical standards and regulations like the General Data Protection Regulation (GDPR) in the European Union.
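As a rough illustration of what preparing electricity usage data for release might involve, the sketch below replaces household identifiers with keyed hashes and aggregates readings to daily totals. The function names, the record layout, and the secret key are hypothetical. Note that this is pseudonymization plus coarsening, which on its own may not amount to anonymization in the legal sense, so a real release would still need a proper privacy review.

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonymize(household_id: str, secret_key: bytes) -> str:
    """Map an identifier to a keyed hash; the key must never be published."""
    digest = hmac.new(secret_key, household_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def aggregate_daily(readings):
    """Sum kWh per (household, day) to hide fine-grained usage patterns.

    `readings` is an iterable of (household_id, iso_timestamp, kwh) tuples,
    where the timestamp starts with the date as YYYY-MM-DD.
    """
    totals = defaultdict(float)
    for household_id, timestamp, kwh in readings:
        day = timestamp[:10]  # keep only the date part
        totals[(household_id, day)] += kwh
    return dict(totals)


def prepare_for_release(readings, secret_key: bytes):
    """Return sorted (pseudonym, day, kwh) rows suitable for sharing."""
    daily = aggregate_daily(readings)
    return sorted(
        (pseudonymize(household_id, secret_key), day, round(kwh, 3))
        for (household_id, day), kwh in daily.items()
    )


if __name__ == "__main__":
    # Hypothetical half-hourly smart meter readings.
    sample = [
        ("house-A", "2024-05-01T00:00", 0.5),
        ("house-A", "2024-05-01T00:30", 0.25),
        ("house-B", "2024-05-01T00:00", 1.0),
    ]
    for row in prepare_for_release(sample, secret_key=b"replace-with-a-real-secret"):
        print(row)
```

Using a keyed hash rather than a plain one means that someone who knows a household identifier cannot recompute its pseudonym without the key, while aggregation trades temporal detail for privacy.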

Furthermore, there are practical challenges related to data curation and storage. Making data publicly available requires infrastructure and resources to maintain data repositories. Institutions and funding agencies must support these efforts to ensure that data remains accessible and usable over time.

The movement towards open data and code in scientific research represents a paradigm shift towards greater transparency, collaboration, and efficiency. By making data and code publicly available, researchers can accelerate the pace of discovery, facilitate learning, and contribute to the common good of the scientific community. However, achieving this vision requires addressing the legitimate concerns of researchers regarding intellectual property and personal gain. Through a combination of incentives, policies, and ethical practices, the scientific community can strike a balance that promotes both individual success and collective progress. As researchers navigate this evolving landscape, the principles of openness and transparency will be crucial in fostering a more innovative and inclusive future for scientific research.
