Open Science — can we be too open?

Open Knowledge in higher education is a noble objective. It is widely viewed as a catalyst for better and faster collaboration and knowledge exchange. As an academic scientist I am excited by the potential of Open Science (open knowledge as it relates to science). The possible benefits for our global society are indeed considerable. However, the Royal Society’s 2012 report, Science as an Open Enterprise, acknowledged that “Open access to scientific information is not in practice an unqualified good”. Here I review the major benefits and discuss some of the potential pitfalls of Open Science, with a particular focus on the practice of Open Notebook Science.

Open Science is essentially open access for all to scientific data, publications, research, debate, software resources and participation. This is achieved largely through internet-based technologies using databases, open access journals, web sites, video repositories and blogs. My own research and teaching practice benefits from Open Science. I frequently use public databases and freely available teaching resources. Some of the teaching resources I use are mind-blowing: 3D molecular animations of DNA being decoded, or of the structure of cells, are amazing. These resources are generated by specialist molecular animators and I am able to freely use these clips to bring my own lectures to life. In my research I make regular use of publicly available data. cBioPortal for Cancer Genomics is one such database, which provides access to data from large scale genomics projects. I have used these data extensively to gain insight into the role of genes in breast cancer. Without open access this research would not have been possible. My research is also widely disseminated through open access publishing. One paper has been accessed over 91,000 times. It was published under the terms of the Creative Commons Attribution License, which permits unrestricted access and use of the work on the one condition that the source is cited. It’s hard to imagine 91,000 people reading my research if they had to pay to do so under the terms of traditional author copyright, which would restrict the use and dissemination of the work. So as a practising teacher and research scientist I am convinced that Open Science is the way forward. There is no question that individual scientists benefit from Open Science and that teaching and research in higher education are better because of it. There are also wider community and society benefits. Open Science improves efficiency by reducing duplication of data and enables the maximum use of data.
The speed of discovery also increases when data and research findings are openly accessible. Open Science also enables large collaborative projects: data from all over the world on all manner of things can be brought together and shared. A notable example is the Nextstrain project, which was awarded the 2017 Open Science Prize of US$230,000. Nextstrain is a website that provides continually updated analyses of publicly available virus genome data from around the world, resulting in improved responses to outbreaks of deadly diseases.

Open Science also stimulates increased public understanding and engagement. This might be regarded as a minor benefit, but it is not; public interest in science is extremely high. For example, usage data from PubMed Central (an online public data catalogue of scientific publications) show that 40% of users are private individuals, 25% are from universities and 17% are from companies (UNESCO 2012). This is an impressive level of public engagement with the scientific literature, and it extends beyond this to direct participation through citizen science projects. Numerous organisations now engage with citizens to collect data in ways and on a scale not possible before the advent of widespread internet access. Recording the local weather, monitoring the quality of local river water or reporting local flora and fauna are fun and highly effective ways of taking part in large research projects.

So openness allows science to become more efficient, cheaper, faster and more effective, which benefits society as it enhances the development of new medicines, technologies, services and working practices. It is not surprising, then, that national and international Open Science policies are being developed. Member states of the Organisation for Economic Co-operation and Development (OECD) are implementing legal frameworks and policies to encourage Open Science (OECD, 2015). In the UK the Royal Society’s Science Policy Centre produced a significant report entitled Science as an Open Enterprise, which made the following recommendations:

· Scientists need to be more open among themselves and with the public and media

· Greater recognition needs to be given to the value of data gathering, analysis and communication

· Common standards for sharing information are required to make it widely usable

· Publishing data in a reusable form to support findings must be mandatory

· More experts in managing and supporting the use of digital data are required

· New software tools need to be developed to analyse the growing amount of data being gathered

Progress is being made on these recommendations through government and charitable funding bodies, which insist on open access publication, on the deposition of data in open databases and on scientists developing strategies to engage the public and make data available to all potential users. Funding is also provided to encourage methodologies in digital data management and to develop software that can be universally accessed. These developments are to be embraced and Open Science will no doubt continue to evolve. However, as Carly P points out in her OKHE1 post, openness inevitably brings with it challenges. It is therefore important to consider the potential downsides of pushing too far. In this regard the Royal Society report recognised potential pitfalls and suggested that open science should by necessity be “qualified” openness: a commitment to open science does not imply openness to everything, to anyone or for any purpose. Open science should be bounded by considerations of quality, legitimate commercial interests, privacy and security. These issues are expanded upon in the report and I won’t discuss them in detail here. Suffice to say that there are legitimate concerns, and boundaries need to be carefully considered when deciding to share data freely. In the remainder of this discussion I will highlight one emerging element of Open Science that could breach several of these boundaries: the dissemination of data through Open Notebook Science (ONS). ONS is the practice of maintaining a lab book online in real time and allowing open access to it. In its simplest form the open notebook can be a blog or website containing raw experimental data.

So what exactly is a lab notebook and why is it so important? The lab notebook is central to an experimental scientist’s activities. It is the most important primary record of experiments. It contains the details of all experiments, successes and failures. Almost every aspect of a project is recorded in it, including hypotheses, detailed experimental design, experimental protocols, results of experiments and interpretation of data. In my own field of molecular and cell biology, experiments are technically difficult to perform and numerous pilot experiments and empirical testing of different reagents and conditions are required. Many results are ambiguous and difficult to interpret. Hypotheses are more often disproved than proved and new findings are often totally unexpected. In some cases the data only become meaningful in the future when new findings provide a new context for interpretation. Traditionally, lab books are handwritten paper books that are completed at the bench or desk. Alternatively, records can be kept in an electronic lab notebook (ELN) and a growing number of researchers are migrating their record keeping to electronic formats. Indeed, there has been an explosion in the number of ELN platforms available. They add new functionality to lab notes. They can be archived on cloud servers and are accessible on multiple devices, thus preventing loss of valuable data. They can be time-stamped to evidence when a discovery was made, and data from multiple sources can be uploaded. But perhaps the greatest benefit is the ability to share the content with others involved in a project. Scientists can easily share their ELN with supervisors and collaborators, whether they are in the same lab or on the other side of the world. One popular example of an ELN platform is the open source software sciNote. In sciNote, digital data can be uploaded directly, notes easily added and reports generated, and it has the all-important ability to share access with others.
The careful, selective sharing of ELNs with colleagues and stakeholders is a major benefit to collaborations; partial sharing of data with wider audiences is also possible. However, the move to ELNs has been slow. This is probably a result of several factors: there is always inertia and resistance to change, paper notebooks are practical and easy to use and, most importantly, scientists have concerns over the security of their data. It is therefore somewhat surprising that some scientists choose to share their ELNs publicly. Proponents of ONS point to the potential benefits: it is transparent and all data are available for scrutiny. Ordinarily, negative results rarely see the light of day, and others could save time and money by not repeating such experiments if negative data are made public. Others might be able to re-purpose negative data to provide a useful interpretation. However, when the downsides are considered, the benefits hardly seem to outweigh the potential losses. There are of course good reasons to freely share unfiltered data and I’m sure that practitioners of ONS have laudable intentions. However, the list of potential problems is significant. ONS could potentially swamp the internet with poor quality data, ruin careers through being “scooped”, inhibit commercial investment by preventing patents, sour collaborations by releasing jointly owned data or even put lives at risk by inadvertently providing data that could be used to harm others.

Hidden dangers of Open Notebook Science

In my OKHE1 post I discussed the importance of academic prestige in the development of a scientific career. This is primarily obtained through publication in high impact journals, so it’s not surprising that scientists guard their data from competitors. It would be rather naive to assume that releasing data openly would not hinder career progression. For commercial purposes the need to protect data arises from the legal requirement to establish ownership of intellectual property (IP). Indeed, IP protection is now expected by many academic institutions and funders: I recently waited three months while legal discussions occurred over the wording of a materials transfer agreement to obtain a relatively minor chemical reagent. To apply for a patent in the UK the invention must never have been made public; only confidential disclosures are permitted. This means that if potentially commercial findings have been promulgated throughout the internet, as with ONS, then it is not possible to apply for a patent. Interestingly, for most academic scientists this concern is probably unfounded, as IP actually provides a relatively low financial return for universities. The Royal Society report found that from 2003/04 to 2009/10, only 2.5% of a 35% rise in UK university income was attributable to IP, suggesting that strict control over IP in universities is not necessary.

So what’s the future for ONS? ONS is a strategy dependent on the use of ELNs and new internet technologies. It can therefore be considered to follow the Gartner Hype Cycle. The Gartner Hype Cycle is a visual depiction of the trend followed by new technologies and the strategies that they enable, from their initial introduction to their stable acceptance. This is obviously not an exact science and can only be accurate retrospectively, but we should be aware of the existence of the peak of inflated expectations and the trough of disillusionment that follows. In the case of ONS, the explosion in notebook platforms, and possibly a lack of awareness of the importance of data confidentiality at the early stages of a project, are likely to be driving inflated expectations of the benefits of ONS. The trough of disillusionment will come when ONS practitioners are scooped to publication, lose out on patents, sour a collaborative relationship or receive harsh feedback on their work.

The Gartner Hype Cycle

How do we avoid the trough of disillusionment and get to the plateau of productivity? Clearly a blanket ban on releasing data in ELNs is not desirable, as sharing data with various communities aids collaboration and progress, and is indeed one of the major benefits of ELNs. Scientists are best placed to decide the relative merits of releasing lab book data, but perhaps not all are fully aware of the benefits and drawbacks. As an advocate for ELNs I would prefer to see universities introducing more education for scientists on ONS and developing coherent policies covering all aspects of data release. ONS is potentially the Open Science strategy most likely to have negative outcomes, and there is an urgent need to develop institutional policies and guidance around the release of data by students, postdocs, technicians and academic staff. Government funding bodies and universities do provide scientists with guidance on issues relating to compliance with regulatory policies on data management and dissemination, but this tends to apply to processed data and its managed release. There remains little coherent guidance around potential problems that can arise with intellectual property, personal privacy, confidentiality and career development. ELNs and ONS should be part of undergraduate and postgraduate education so that future scientists are conversant with the issues. Perhaps Open Science workshops, and particularly ONS workshops, could be part of staff development training. Specifically with regard to concerns around IP, universities should clearly and frequently communicate policies on data sharing that might impact on patent applications. The Royal Society also recommended that a more discriminating approach might be needed to identify and support technologies that might have significant financial benefit. In this context scientists will be better able to make judgments about when and what data to make freely available.
Finally, there is the issue of being “scooped”, which is likely to be the most difficult challenge to ONS. Scientists thrive on recognition and for many the most significant reward is being credited with a discovery. Other benefits include grant awards and career progression. It is only possible to be credited with a discovery when data are coherently presented in the form of a published peer reviewed paper. Releasing data in an ELN before this stage gives competitors an advantage. Universities and funding bodies are adjusting how they recognise and reward open science activities; however, it’s very difficult to see how being scooped can be avoided if data are released prematurely. In conclusion, then, the plateau of productivity for ONS will be reached when scientists develop greater confidence and understanding of the issues around data security. Given the benefits of Open Science it is important for universities to give guidance and develop policies to allow scientists to be as open as possible without inflicting harm on themselves or the universities.