Data sharing in Africa faces stiffer challenges because of a lack of resources and capacity to derive the most value from the data before sharing. A better approach for African researchers, and one that advances equity in research, could be to push for collaborations as opposed to immediate open access data sharing. The premise is that, for data generators whose reward system is publications, the fundamental value of the data is publications. The academic incentive system is a longstanding problem that will take much longer to change, even in the global North. Therefore, we need to explore a pathway towards open data sharing that still protects the data generators’ needs.

I propose the sharing of metadata, that is, letting the world know that such data exists, as a pathway for collaboration and sharing. This exposure will increase partnerships and, ultimately, publications for the researchers. Therefore, of all the components of research data management (RDM), data management planning and metadata management should be emphasised. Proper research data management benefits the researcher: it eases research, promotes internal sharing, and ultimately facilitates open data access.

Metadata is data about the data.

“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information” (National Information Standards Organization, 2004).

Promoting and facilitating proper metadata management, and advocating for immediate exposure of metadata, creates a clear pathway for data sharing: a path that incentivises data sharing using the existing reward system, publications.

How do we achieve metadata sharing? How can we get researchers to invest in metadata management? These are fundamental questions, and the answer starts with an understanding of the metadata standards in different fields and a tool that can guide researchers in deciding which metadata schema to adopt for their work. The next step is a platform that facilitates metadata sharing and eventually exposes the underlying data, or provides a link for public access.
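As a rough illustration of what such a guiding tool might do, the sketch below suggests a candidate metadata standard for a research field and reports which basic fields a record is missing. The mapping, the list of required fields, and the function names are all assumptions made for illustration, not an existing tool. The standards named (Dublin Core, Darwin Core, MIxS, DDI) are real, but the required fields shown are a simplified, hypothetical subset.

```python
from __future__ import annotations

# Hypothetical mapping from research field to a commonly used metadata standard.
CANDIDATE_STANDARDS = {
    "genomics": "MIxS",
    "biodiversity": "Darwin Core",
    "social science": "DDI",
}
DEFAULT_STANDARD = "Dublin Core"  # general-purpose fallback (assumption)

# Simplified, illustrative minimum fields; real standards define many more.
REQUIRED_FIELDS = ("title", "creator", "description", "date", "contact")


def suggest_standard(field: str) -> str:
    """Return a candidate standard for a research field (case-insensitive)."""
    return CANDIDATE_STANDARDS.get(field.strip().lower(), DEFAULT_STANDARD)


def missing_fields(record: dict) -> list[str]:
    """List required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]
```

A researcher could run `missing_fields` on a draft record before depositing it, so that gaps are caught while the details are still fresh.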

Is such a system available currently?

Large data-generating consortia usually put in place an embargo period, during which the data is accessible but others are restricted from publishing outputs based on its reuse until the period elapses.

Others, like the European Genome-phenome Archive (EGA), use controlled access for sensitive data: “personally identifiable genetic and phenotypic data, whose consent agreements are for specific research use”. Such systems expose metadata to the public but restrict the underlying data.
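The pattern of public metadata with restricted data can be sketched in a few lines. This is a minimal illustration only: the `DatasetRecord` class, its field names, and the `public_view` function are hypothetical and do not reflect how EGA or any other archive is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical record: descriptive metadata plus where the data lives."""
    title: str
    creator: str
    description: str
    access: str  # "open" or "controlled" (illustrative labels)
    data_location: str
    access_contact: Optional[str] = None  # who to ask when access is controlled


def public_view(record: DatasetRecord) -> dict:
    """Return what an anonymous visitor sees.

    Descriptive metadata is always exposed; the data location is
    included only when the record is marked as open.
    """
    view = {
        "title": record.title,
        "creator": record.creator,
        "description": record.description,
        "access": record.access,
    }
    if record.access == "open":
        view["data_location"] = record.data_location
    else:
        # Controlled data: advertise a contact instead of the data itself.
        view["request_access_from"] = record.access_contact
    return view
```

The design choice worth noting is that the dataset stays discoverable in both cases; only the route to the underlying data differs.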

Other users can request access from the Data Access Committee (DAC). The EGA, like other public access databases, is well suited to data sharing (both controlled and public) but not to managing data internally within institutions, from samples to the sequence data received from the sequencing platform. For internal use, data management systems like iRODS and MISO, which also allow for metadata management, are a better fit.

Are systems like iRODS and MISO easy to set up, configure and manage?

The benefit of an EGA-like system, but for institutional use, is two-fold: to expose metadata of work undertaken in the institution and, ultimately, to preserve and curate institutional memory and research output. Furthermore, such an approach would help enforce acceptable metadata practices among researchers, and hence facilitate data reuse.

Establishing and managing a centralised research data infrastructure is expensive, and most institutions may not be able to afford it. However, promoting efficient metadata management is not.

Metadata management is a critical component of a research data management framework, and a pathway towards data sharing and reuse. We should think about it more.



