A large-scale national data approach is key to unlocking the power of genomics in Canada

Naveed Aziz
Apr 22, 2022

Genomic data is increasingly supporting new research across many sectors. To maintain its scientific excellence in the new era of large-scale genomics, Canada needs a plan to translate the power of “big” genomic data into positive impacts for Canadians and the Canadian economy.

Genomic data is the information coded in DNA: the chemical sequence of As, Cs, Ts, and Gs that underlies all life on Earth. The last decade has seen the creation of data resources that merge and “bank” DNA sequences and individual-level data representing a population (of any species: human, animal, plant, etc.) and that give researchers the power to make new discoveries through data-driven research. When it comes to genomic data, the refrain “Data is the new oil” is fitting, but with one key difference: while oil is a diminishing resource, data can multiply and expand with use.

Data-driven genomics research promises long-term benefits for Canadians and our health system: enhanced disease prevention, better predictions of future illness, and more accurate and personalized treatment options. While there is little argument about these impacts, the frameworks surrounding the use of big data are complex. The ability to create large enough data resources hinges on the capacity to generate representative genomic data at scale, as well as strategies to harmonize, collate and share this data, along with demographic, lifestyle and clinical information, across geographic and institutional boundaries. Addressing these challenges, and capitalizing on the strong partnerships that exist across the health research landscape, enables the creation of a dynamic and thriving genomic data ecosystem that will lead us into a new age of genomic-based discovery.

Big data plays a key role in supporting genomics-enabled research.

Federal and provincial governments, research institutions and other stakeholders are supporting this vision through investments in national infrastructure. For example, since its inception in 2017 as a Major Science Initiative, CGEn, Canada’s national platform for genome sequencing and analysis, has received support from the Canada Foundation for Innovation and other partners to provide capacity for large-scale genomic sequencing data generation. Canada also boasts world-class genomics expertise: Canadian scientists play leadership roles in a number of successful disease-focused genomic research projects, including The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), the Terry Fox Marathon of Hope Cancer Centres Network and the MSSNG Autism Project. Historically, however, Canada has lacked a national framework to access participant samples, manage data storage and stewardship, and navigate the ethics surrounding the sharing of sensitive human genomic data. In addition, Canada has yet to develop a plan for a truly national genomic resource (akin to the UK’s 100,000 Genomes Project/UK Biobank sequencing or the US All of Us Research Program) that fully represents Canada’s rich diversity and will ensure the delivery of science-based health care decision-making in the long term.

While the COVID-19 pandemic has been a devastating experience, it has also provided important opportunities. First, the power and potential of genomics research surrounding COVID-19 is now evident to the general public and policymakers alike, so we have a chance to build momentum and support for genomics research in a broader sense. Second, the need to quickly understand why individuals respond differently to COVID-19 infection in terms of disease severity spurred a collaborative effort from institutions, funders, diverse clinical studies and participants from across the country, enabling CGEn and its partners to deliver the COVID-19 Host Genome Sequencing project, ‘HostSeq’. The resulting HostSeq databank will be Canada’s largest national genomic databank to date, containing genome sequences from 10,000 individuals affected by COVID-19, linked with medical and clinical data that is easily accessible to investigators.

And while we were sure not to cut corners, we sure did cut some red tape: the HostSeq project ramped up quickly, broke down previously insurmountable obstacles to interprovincial data governance, and found effective solutions to support Canadian science with Canadian data. As a result, numerous research projects are already benefiting from controlled access to the HostSeq databank. In addition, all HostSeq participants consented to broad sharing and use of their data, meaning the databank can continue to fuel other health research beyond COVID-19. In short, HostSeq has created both a national infrastructure and a resource that will support genomics research for years to come.

CGEn’s working philosophy is to operate in a ready state that can support projects at scale with fast turnaround times. The delivery of the HostSeq project has proven that large-scale genomics is a present reality in Canada. The time is ripe for all partners and stakeholders to come together and build on the years of investment and effort towards a Canadian ecosystem where research is enabled by large genomic datasets and researchers are able to use them as a resource. The HostSeq blueprint, along with Canada’s many strengths and the realization of the federal government’s Pan-Canadian Genomics Strategy, could help us capitalize on large-scale opportunities, such as a Canadian population health genomic databank, that will continually feed quality research programs and foster the development of the talent and tools needed to succeed in the knowledge economy.


