Finding Diagnosis in a Sea of Transcripts: The Case of a Hypertrophic Cardiomyopathy Patient

Liz T
PacBio
Sep 24, 2019
Echocardiogram of a healthy heart. In fact, this heart belongs to the first author Alex Dainis, the subject of this interview!

Most of the time, genetic diagnosis is about the genome, not the transcriptome. Since many of the genetic causes of disease are already known, sequencing the DNA alone is sufficient. Even for novel mutations, DNA sequencing usually makes more sense: mutations are often in non-coding regions, and their effect on transcription can be the complete absence of gene expression (e.g., Fragile X). Then there are practical reasons: the diseased tissue may not always be available.

Long-read sequencing is showing promise for clinical applications, particularly for identifying long structural variations that are not possible to detect with short reads (see Mantere et al. and Eichler et al.). Here, RNA sequencing has played a confirmatory role. Merker et al. presented a case of a patient with Carney complex where a novel, heterozygous 2.1 kb deletion was found in the PRKAR1A gene. RNA sequencing of the patient against control samples confirmed reduced PRKAR1A gene expression as well as potential novel isoforms that skipped exon 2. I say potential, because while the 2.1 kb deletion was detected using long reads that spanned the whole structural variation, splicing analysis was carried out using short reads that could only span exon-exon junctions.

Figure S1 from Merker et al. Long read WGS using PacBio identified a novel 2.1 kb deletion overlapping the first coding exon (exon 2) of PRKAR1A in a patient with Carney complex. Short read RNA-seq showed reduced gene expression as well as novel splicing that skipped exon 2 (red arc with a read count of 5 in the patient, compared to the blue arc with a read count of 0 in controls).

In another case, Aneichyk et al. reported a 2.6 kb SINE-VNTR-Alu retrotransposon insertion in intron 32 of the TAF1 gene for a disease called X-linked Dystonia Parkinsonism (XDP) that is endemic to the island of Panay, Philippines. The novel structural variation was identified using a variety of technologies including linked reads, jumping libraries, and long reads. Novel isoforms were identified using probe-based targeted capture of the TAF1 gene and sequenced using the Iso-Seq method (PacBio RNA-seq). The kicker, however, came when short read data showed intron retention (IR) leading up to the SVA, and when the SVA is excised, the IR goes away. The XDP study was a beautiful demonstration of the interplay between retrotransposition, defective splicing, and disease.

Figure 6A from Aneichyk et al. showing a novel 2.6 kb SINE-VNTR-Alu (SVA) retrotransposon insertion in intron 32 of TAF1 gene that leads to intron retention for patients with X-linked Dystonia Parkinsonism (XDP). Control samples (blue) and a CRISPR/Cas9 removal of the SVA variation (purple) eliminated the intron retention.

One may be tempted to say that more and better data always leads to the right answers, but that’s not necessarily true. Genetic diagnosis, like many scientific studies, is a fishing expedition. Previous work on XDP had identified at least seven variants with no clear answers as to which variants mattered. To solve the XDP mystery, the authors threw the technology kitchen sink at it, and were lucky to have found the link between genetic variation and defective transcription. Then there’s my experience in dabbling with the FMR1 gene in premutation carriers for Fragile X and the SNCA gene for Parkinson’s Disease. In both cases, we observed a lot of novel isoforms, but we don’t know what they mean.

It is, perhaps, a comforting change, when Alex Dainis’ work “Targeted Long-Read RNA Sequencing Demonstrates Transcriptional Diversity Driven by Splice-Site Variation in MYBPC3” was published in Circulation: Genomic and Precision Medicine this year. I had helped Alex analyze the data a few years ago but didn’t know the outcome of the project. It was only after interviewing Alex for the paper that I realized how much of genomics is just a blind search in the dark. Sometimes, the data is right there; we just weren’t looking at the right gene.

In 2016, Alex was a PhD student in Euan Ashley’s lab at Stanford. The lab was interested in hypertrophic cardiomyopathy (HCM) and wanted to see if there were differentially spliced isoforms in HCM vs control samples. They knew that MYH7 and MYBPC3 were clinically implicated and did amplicon-based sequencing using PacBio (Iso-Seq method).

I remember helping Alex prepare for her poster and a subsequent PacBio User Group Meeting talk. Here’s where we went looking for the wrong fish — back then, our focus was entirely on the MYH7 gene. This can be seen in her 2016 lab poster and UGM talk. We showed that we could faithfully phase through the entire MYH7 gene using long reads and call full-length isoforms. But much like the FMR1 and SNCA story, we didn’t find any isoform that stood out between the disease and control samples.

It was only later, when Alex went back to look at the samples one by one, instead of by disease group, that she noticed one of the novel MYBPC3 isoforms (AS1.1) existed in only one of the HCM samples. The sample came from a female HCM patient with no family genetic information. Prior DNA sequencing had found a potential splice-site variant in this patient, but with no additional family information and only one other mention of the mutation in the literature, it was unclear if this single mutation was causative for disease in this patient.

Figure 1D from Dainis et al. showing isoforms found in a 21-year-old female HCM patient with a single base mutation in MYBPC3 (c.1898-1G>A). Probe-based targeted sequencing of the MYBPC3 gene confirmed a novel isoform (AS 1.1) that skips exon 20, as well as other isoforms (AS 1.2 through AS 1.11) that exhibit ORF-disrupting splice patterns.

Alex followed up with more sequencing of the MYBPC3 gene, this time using a probe-based enrichment approach. Sequencing was done on the female HCM patient for both gDNA and cDNA, and with both long read technologies (PacBio and ONT). The alternative isoform AS 1.1 turned out to be unique to the female HCM patient (see Figure 1A in paper) — something that would have been missed if the analysis remained focused on cohorts instead of individuals.
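To make the cohort-versus-individual point concrete, here is a minimal sketch in Python. It is not the analysis from the paper: the sample names, read counts, and the `min_reads` threshold are all made up for illustration. It only shows how an isoform carried by a single sample gets diluted in a group average, yet stands out when each sample is inspected on its own.

```python
# Hypothetical full-length read counts per isoform and sample (made-up numbers,
# not data from Dainis et al.).
counts = {
    "HCM_1": {"canonical": 950, "AS1.1": 0},
    "HCM_2": {"canonical": 900, "AS1.1": 40},
    "HCM_3": {"canonical": 980, "AS1.1": 0},
    "CTRL_1": {"canonical": 960, "AS1.1": 0},
    "CTRL_2": {"canonical": 940, "AS1.1": 0},
}
groups = {"HCM": ["HCM_1", "HCM_2", "HCM_3"], "CTRL": ["CTRL_1", "CTRL_2"]}


def isoform_fraction(sample, isoform):
    """Fraction of a sample's reads that belong to the given isoform."""
    total = sum(counts[sample].values())
    return counts[sample].get(isoform, 0) / total if total else 0.0


def group_mean_fraction(group, isoform):
    """Cohort-level view: average the per-sample fractions within a group."""
    samples = groups[group]
    return sum(isoform_fraction(s, isoform) for s in samples) / len(samples)


def private_isoforms(min_reads=5):
    """Per-sample view: isoforms seen with >= min_reads in exactly one sample."""
    carriers = {}
    for sample, iso_counts in counts.items():
        for isoform, n in iso_counts.items():
            if n >= min_reads:
                carriers.setdefault(isoform, []).append(sample)
    return {iso: s[0] for iso, s in carriers.items() if len(s) == 1}


if __name__ == "__main__":
    print("HCM mean AS1.1 fraction:", group_mean_fraction("HCM", "AS1.1"))
    print("CTRL mean AS1.1 fraction:", group_mean_fraction("CTRL", "AS1.1"))
    print("Isoforms private to one sample:", private_isoforms())
```

In this toy data the group means differ only slightly, while the per-sample scan flags the one sample that carries the isoform. A real analysis would also need to account for sequencing depth and replicate noise.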

By combining targeted gDNA and cDNA data, Alex was able to assign the novel isoforms (AS 1.1 through AS 1.11) to the wild type or disease-associated allele. Amazingly, this showed that all that aberrant splicing in the female HCM patient was only on the disease-associated allele! This also indicated that a single splice-site variant was creating multiple novel isoforms in this patient.
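One way such an allele assignment could work is sketched below in Python. This is an assumption on my part and not necessarily the method used in the paper. The idea is that the splice-site variant is intronic, so spliced cDNA reads usually will not contain it. Long gDNA reads can instead link the variant to a nearby heterozygous exonic SNP (a hypothetical "tag SNP" here), and cDNA reads are then assigned to an allele by the base they carry at that tag SNP. The read records and field names are invented for illustration.

```python
from collections import Counter


def phase_tag_snp(gdna_reads, variant_allele="A"):
    """Return the tag-SNP base most often seen on the same gDNA read as the variant."""
    cis = Counter(
        r["tag_base"] for r in gdna_reads if r["variant_base"] == variant_allele
    )
    return cis.most_common(1)[0][0]


def isoforms_by_allele(cdna_reads, disease_tag_base):
    """Tally isoform labels per allele, using the tag-SNP base carried by each cDNA read."""
    tally = {"disease": Counter(), "wild_type": Counter()}
    for r in cdna_reads:
        allele = "disease" if r["tag_base"] == disease_tag_base else "wild_type"
        tally[allele][r["isoform"]] += 1
    return tally


# Hypothetical gDNA long reads spanning both the splice-site position
# (G = reference, A = variant) and the exonic tag SNP.
gdna_reads = [
    {"variant_base": "A", "tag_base": "T"},
    {"variant_base": "A", "tag_base": "T"},
    {"variant_base": "G", "tag_base": "C"},
    {"variant_base": "G", "tag_base": "C"},
    {"variant_base": "A", "tag_base": "C"},  # minority read, e.g. an error or chimera
]

# Hypothetical full-length cDNA reads, already labeled with an isoform.
cdna_reads = [
    {"isoform": "canonical", "tag_base": "C"},
    {"isoform": "canonical", "tag_base": "C"},
    {"isoform": "canonical", "tag_base": "C"},
    {"isoform": "AS1.1", "tag_base": "T"},
    {"isoform": "AS1.1", "tag_base": "T"},
    {"isoform": "AS1.2", "tag_base": "T"},
]

if __name__ == "__main__":
    disease_base = phase_tag_snp(gdna_reads)
    print("Tag-SNP base in cis with the variant:", disease_base)
    print(isoforms_by_allele(cdna_reads, disease_base))
```

The majority vote in `phase_tag_snp` is a deliberately simple choice. It tolerates a few discordant reads, but a real pipeline would want a proper phasing tool and some check on how strongly the reads agree.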

Figure 1E from Dainis et al. showing aberrant splicing (AS 1.1 through AS 1.11) associated with the disease-associated allele, whereas the canonical MYBPC3 isoform is expressed predominantly from the wild-type allele.

When I got to this last figure (Fig 1E), I paused and exclaimed to Alex: “…So this is it?!” The data…explained it? That the mutation in the MYBPC3 gene was wreaking havoc through aberrant splicing in this patient, and only on the disease-associated allele?

Since the patient had no family data and there was almost no previous literature on this mutation (it had been seen only once, in a large HCM cohort sequencing paper), this is the only genetic diagnosis for her. At the time, the patient was 21 years old with severe HCM. If she decides to have children, each child has a 50% chance of inheriting the variant.

Alex shared my glee, “Even though this was a small side project in my PhD, it felt huge, because I felt I had helped someone.”

I had only played a tiny part in the MYBPC3 study, but it also felt huge to me. I spend most of my days staring at the UCSC genome browser with little thought beyond nucleotide sequences and exonic structures. Every once in a while, it feels nice to connect lifeless code to a living human being.

I discussed with Alex how practical the approach she took would be to future projects. As I wrote earlier, RNA samples are difficult to acquire. The HCM patient had a myectomy so a heart sample was available — many other patients would not be able to provide that. Alex thinks it would be revealing to see if the study could be replicated with blood samples. She also agrees that a probe-based enrichment strategy is better than amplicon (PCR)-based. Not only does the former detect alternative start/end sites, there’s also less potential for PCR cross-over since less amplification is required.

I also learned that the journal requested Alex cut the paper down to 800 words. This meant all the methods and additional details were removed from the final publication and can only be found in the bioRxiv preprint. As a developer, I am disappointed by the journal’s decision. Well, I guess there’s always the preprint.

Since graduating, Alex has moved on — to become a scientific documentary filmmaker! Of all things I’d expect out of a Genetics PhD… this was not one of them ☺. And Alex is clearly rocking it! She’s got an amazing YouTube channel that includes short bite-sized videos explaining DNA and RNA, GFP, and how the California wildfires stimulated certain wildflowers known as “fire followers”. I am excited to see where Alex’s new journey will take her. Reach for the stars, my friend!

You can find Alex’s work at www.helicasemedia.com. She’s also on Twitter (@AlexDainis) and YouTube.

