‘Reprogram’ — Modelling iPSC Technology in Human T-cells
“Embryonic stem cells … are in effect, a human self-repair kit.” — Christopher Reeve
‘Reprogram’ was the culmination of my work in the field of stem cells, and the nascent technology the biology world refers to as iPSCs — induced pluripotent stem cells.
The project was rooted in stem cells, specifically their unspecialised nature and their ability to transform into any other specialised cell. These specialised cells make up our body, from the blood that transports critical elements, to the muscles and skeleton that help us move, to the nervous system that is famous for setting us apart from other animals.
iPSC technology looks to tackle the shortage in the supply of stem cells. Prior to this discovery, the primary source of stem cells was human embryos (hence the term ‘embryonic stem cells’). However, this came with multiple problems, from high cost and ethical concerns to the difficulty of storing that many embryos. This pushed scientists to find new ways of securing a ready source of stem cells at the scale required for use in therapeutics and the medical world.
One such scientist, Shinya Yamanaka, did just this, and in 2012 he shared the Nobel Prize in Physiology or Medicine for it. Why? Because of his discovery of four reprogramming factors (Oct4, Sox2, Klf4 and c-Myc, abbreviated OSKM) that transform already-specialised adult cells in our body back into a stem cell-like state. This led to a rapidly growing field which, more than a decade later, still leaves many questions unanswered for both the scientific community and humanity at large.
The aim of ‘Reprogram’ was to chart out, comprehensively, the components and processes involved in the reprogramming of human T cells into embryonic stem cell-like iPSCs.
I chose human T cells because of their critical role in our immune system, meaning their reprogrammed state could have revolutionary implications for the creation of treatments for deadly diseases. My project began with a study of the processes followed by existing research studies that reprogrammed animal cells.
The end goal for me, however, considering limitations in equipment and materials, was to create online frameworks that addressed the multi-layered, multi-step approach to iPSC reprogramming. The final product was a set of visualisations taking different views of the process, from a genetic circuit covering the overall process to individual factors showcased in their genetic and structural forms.
With clarity on my objectives, I was able to create a framework for my project by breaking it down into the following constituent parts.
- Selecting a reprogramming method
- Modelling exogenous and endogenous genetic codes
- Designing a genetic circuit for reprogramming
- Visualising the structural forms of proteins
I reviewed research studies and papers evaluating different reprogramming methods, leading to my choice of mRNA for delivering the OSKM factors described earlier.
Arranged as a matrix, existing reprogramming methods can be divided along two axes: whether they integrate into the genome, and whether they are viral. This gives four categories, each containing several methods: integrative viral, non-integrative viral, integrative non-viral, and non-integrative non-viral. Here, integration refers to the incorporation of the ‘delivery genes’ into the given cell’s genetic material, while viral and non-viral describe whether the delivery vehicle is a virus.
I reviewed key methods across these four categories, taking into consideration factors like reprogramming efficiency, time taken, availability and accessibility, and the risk that integration into the genetic material could harm the cell in certain cases. While each method had its merits and demerits, keeping the larger goal of my project in mind, I decided to work with mRNA as the reprogramming method.
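The two-axis classification can be sketched in a few lines of Python. This is an illustration only, not part of the project itself: mRNA is the only method named in this write-up, and the other entries are commonly cited examples that I am adding as assumptions.

```python
from itertools import product

# Each method maps to (integrative, viral).
# Only mRNA comes from this write-up; the other entries are commonly
# cited examples, included here as assumptions for illustration.
METHODS = {
    "retrovirus": (True, True),
    "lentivirus": (True, True),
    "Sendai virus": (False, True),
    "piggyBac transposon": (True, False),
    "episomal plasmid": (False, False),
    "mRNA": (False, False),
}


def category(integrative: bool, viral: bool) -> str:
    """Name one of the four cells of the matrix."""
    left = "integrative" if integrative else "non-integrative"
    right = "viral" if viral else "non-viral"
    return f"{left} {right}"


def build_matrix(methods):
    """Group method names under each of the four categories."""
    matrix = {category(i, v): [] for i, v in product((True, False), repeat=2)}
    for name, (integrative, viral) in methods.items():
        matrix[category(integrative, viral)].append(name)
    return matrix


if __name__ == "__main__":
    for cat, names in build_matrix(METHODS).items():
        print(f"{cat}: {', '.join(names)}")
```

Keeping the two properties as separate booleans, rather than one label per method, makes it easy to filter on a single axis, for example listing every non-integrative method regardless of whether it is viral.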
Considering the chosen method, the delivery of factors would require genetic codes, for which I set up a Benchling workspace that stored genetic data from public datasets.
My aim was to obtain genetic codes from reliable sources, for which I used NCBI (National Center for Biotechnology Information). After obtaining the genetic codes, I used information on the sequences from the UCSC Genome Browser to annotate the Benchling files, indicating key regions. I also added further elements to the codes, such as 5’ and 3’ UTRs and poly-A tails, improving the stability and effectiveness of the factor codes in reprogramming the target cell. Along with the four factors (Oct4, Sox2, Klf4 and c-Myc), I took two other genes (Lin28 and Nanog). These additional genes have been shown to enhance reprogramming efficiency and, in doing so, promote pluripotency (the ability of stem cells to differentiate into any other specialised cell). Together, these six factors became the exogenous sequences, coming from outside the given cell (the human T cell).
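To make the layout of such a construct concrete, here is a minimal Python sketch of joining a 5’ UTR, coding sequence, 3’ UTR and poly-A tail, and recording where each region sits. The class name, the helper, the tail length and the toy sequences are all hypothetical; none of them are the sequences or settings from my Benchling workspace.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class MRNAConstruct:
    """One factor laid out as 5' UTR + CDS + 3' UTR + poly-A tail.

    Sequences use the DNA alphabet (T rather than U), as a template would.
    """
    name: str
    utr5: str
    cds: str
    utr3: str
    poly_a_length: int = 120  # hypothetical default, not a project value

    def parts(self):
        return [
            ("5' UTR", self.utr5),
            ("CDS", self.cds),
            ("3' UTR", self.utr3),
            ("poly-A tail", "A" * self.poly_a_length),
        ]

    def sequence(self) -> str:
        return "".join(seq for _, seq in self.parts())

    def annotations(self):
        """Return (label, start, end) with 0-based, end-exclusive positions."""
        out, pos = [], 0
        for label, seq in self.parts():
            out.append((label, pos, pos + len(seq)))
            pos += len(seq)
        return out


def cds_problems(cds: str):
    """List basic problems with a coding sequence (empty list if none)."""
    problems = []
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]):
        problems.append("contains a premature in-frame stop codon")
    return problems


if __name__ == "__main__":
    # Toy placeholder sequences, NOT real Oct4 or UTR sequences.
    toy = MRNAConstruct("toy-factor", "GGGAGACCACC", "ATGGCTGGACACCTGTAA",
                        "GCTCGCTTTC", poly_a_length=10)
    print(cds_problems(toy.cds))
    for label, start, end in toy.annotations():
        print(f"{label}: {start}-{end}")
```

The annotation tuples mirror what a sequence editor stores for each labelled region, so the same structure could in principle be checked against an exported file.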
Apart from external reprogramming factors, changes in the epigenetics of the host cell are critical for prolonged states of pluripotency and the complete transformation to a stem cell state.
I researched key genes that promoted either pluripotency or specialisation in human T cells. By identifying these, I was able to find key segments in the T cell’s genetic code that had to be either activated (if they supported pluripotency) or silenced (if they supported specialisation). This modulation of the cell’s epigenetic state is critical for the full transformation of the specialised cell into a stem cell, ensuring the reprogramming process successfully produces iPSC colonies.
After identifying the activation and silencing genes, I repeated the process used for the exogenous sequences, using Benchling and publicly available datasets to find and annotate sequences. These endogenous codes, already present in the given human cell, are critical for sustaining the transition towards pluripotency after the effects of the exogenous factors wear off. Finally, with the sequence codes ready, I designed guide RNAs for each target, choosing between candidates based on the on- and off-target scores that indicate a guide’s effectiveness.
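The guide selection step can be sketched as a simple filter-and-rank routine. Everything here is an assumption for illustration: the gene names, spacer sequences and scores are made up, both scores are assumed to run from 0 to 100 with higher being better, and the thresholds and the "sum of both scores" ranking are arbitrary choices rather than the rule I actually applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    target: str        # gene the guide is aimed at
    spacer: str        # 20-nt guide sequence
    on_target: float   # assumed 0-100, higher = more predicted activity
    off_target: float  # assumed 0-100, higher = more specific


def pick_guides(guides, min_on=50.0, min_off=50.0):
    """Return the best guide per target among those passing both thresholds.

    "Best" here means the highest combined score, an arbitrary choice.
    Targets with no passing guide are left out of the result.
    """
    best = {}
    for g in guides:
        if g.on_target < min_on or g.off_target < min_off:
            continue
        current = best.get(g.target)
        if current is None or (g.on_target + g.off_target) > (
            current.on_target + current.off_target
        ):
            best[g.target] = g
    return best


if __name__ == "__main__":
    # Made-up candidates for two hypothetical targets.
    candidates = [
        Guide("GENE_A", "ACGTACGTACGTACGTACGT", 72, 88),
        Guide("GENE_A", "TTGACCGGAATCGATCCGTA", 90, 35),  # fails specificity
        Guide("GENE_A", "GGCATTCAGGCTAACCTGAT", 65, 80),
        Guide("GENE_B", "CCATGGTTAACGGATCCTAG", 40, 95),  # fails activity
    ]
    for target, guide in pick_guides(candidates).items():
        print(target, guide.spacer, guide.on_target, guide.off_target)
```

A target that ends up with no guide, like GENE_B in this toy data, is a signal to go back and look for candidates in a different region of that sequence.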
With the genetic sequences ready, I visualised the reprogramming using a genetic circuit diagram that clearly defined input and output layers in the process.
I discovered SBOL (Synthetic Biology Open Language), a data standard that helps standardise designs within the domain of SynBio, enhancing collaboration and communication. I used SBOL Canvas, a web-based tool that helped me lay out the input factors (the initial four, OSKM, along with the enhancers Lin28 and Nanog). Each factor was divided into its constituent parts, including the promoter region, 5’ and 3’ UTRs, the main coding sequence, the terminator, and the closing poly-A tail.
It also became critical to have indicators for the outcome of the process (whether the reprogramming was successful or not). I chose two reporters: GFP (Green Fluorescent Protein) and mCherry, an RFP (Red Fluorescent Protein). Both reporters would provide visual readouts (colours) indicating whether the cell had successfully transformed into a stem cell. Gene circuits use logic gates to define the behaviour of the outputs (here, the reporters) across all possible scenarios. I created a logic sequence in which GFP responds positively (turns fluorescent green) only if all four OSKM factors are expressed (the first AND function). mCherry responds positively (turns fluorescent red) only if there is a positive response from GFP (all OSKM factors were successful) and Lin28 and Nanog are also activated (the second AND function).
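The two AND gates can be written out and checked against every input combination. The following is a minimal Python sketch of the logic only, with function names of my own choosing; it models the factors as simple on/off inputs and says nothing about expression levels or timing.

```python
from itertools import product

FACTORS = ("Oct4", "Sox2", "Klf4", "c-Myc", "Lin28", "Nanog")
OSKM = FACTORS[:4]


def gfp_on(state) -> bool:
    """First AND gate: GFP lights up only if all four OSKM factors are on."""
    return all(state[f] for f in OSKM)


def mcherry_on(state) -> bool:
    """Second AND gate: mCherry needs GFP on plus Lin28 and Nanog."""
    return gfp_on(state) and state["Lin28"] and state["Nanog"]


def truth_table():
    """Evaluate both reporters for every on/off combination of the inputs."""
    rows = []
    for values in product((False, True), repeat=len(FACTORS)):
        state = dict(zip(FACTORS, values))
        rows.append((state, gfp_on(state), mcherry_on(state)))
    return rows


if __name__ == "__main__":
    rows = truth_table()
    print("combinations:", len(rows))
    print("GFP on:", sum(gfp for _, gfp, _ in rows))
    print("mCherry on:", sum(mch for _, _, mch in rows))
```

Because mCherry depends on GFP, the circuit can never show red without green, so a red-only cell would point to a fault in the readout rather than a valid state.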
Finally, I took protein visualisations and annotated key features in them to visualise their structures.
This helped me create more engaging, potent imaging for viewers and also aided my understanding of proteins as the products of the genetic codes I had previously worked with. I first took PDB (Protein Data Bank) format files from the AlphaFold Protein Structure Database. Using the ChimeraX application, I then annotated and labelled the structures based on key features listed in the UniProt database.
I was able to create models for the six factors (OSKM and the two additional enhancing factors), along with the reporters (RFP and GFP).
As a newcomer to the field of Synthetic Biology, working with industry-level software on a technology at an inflection point for the medical world, I found this project had no shortage of challenges and roadblocks.
From the outset, my ability to quickly gain a basic understanding of a topic was critical as I navigated these uncharted waters. While relevant areas required diving deeper, a lot of the initial work went into clearing the clutter from the ‘information dump’ that hit me when I began researching online. This was quite overwhelming at first: given how recent the field is, I was flooded with information and had no idea what to take and what to leave.
This helped me understand how, in the age of the internet, with the onset of AI adding to the existing personalisation of information, the most critical knowledge you need is knowing your requirements. There were so many rabbit holes I went down, only to come out an hour later realising they had no relevance to my project. Over time, I was able to hone my ability to switch frequently between a micro view of the specific segment I was researching and the macro picture of the end goal. This also pushed me to ask deeper questions about my intentions with the project, and to gain clarity right from its genesis on what I wanted out of it, a habit I will continue for every future project.
While each stage came with its own challenges, the hardest stage, hands down, was building the genetic circuit. I struggled for almost two weeks, testing out different biology software to fit my purpose, only to run into a technical incompatibility, a membership or institutional access requirement, or simply features that didn’t fit my needs. It took a great deal of perseverance to finally reach SBOL Canvas, the digital tool I used for the final genetic circuit.
Today, endurance is idolised in online content, yet it remains a loosely used term that is extremely difficult to see manifested in our daily lives. This project, among countless other things, showed me what resilience and perseverance look like on a practical level. My biggest takeaway? Have a ‘why’ so strong you can’t leave without reaching the ‘how’. In other words, tying back to my previous takeaway, get really clear on your purpose right from the start, so that you have a north star guiding you during any difficult phase. My clarity on the importance of this project helped me push through difficult phases that saw very little progress, knowing that each failure was slowly building towards the end success that made it all seem worth it.
In the future, I aim to expand the project’s applicability to other specialised cells, and create enhanced animations using the existing protein visualisations to depict the process.
I look to take the project to the next level by working with multiple other human cell types (apart from T cells) and identifying the key endogenous sequences to activate and silence in each. In terms of the protein structure visualisations, I aim to take the annotated features, specifically binding sites, and look at how they relate to the mRNA construct that is introduced into the cell. Finally, I aim to create animations that depict the entire process: the entry of exogenous mRNA constructs into the cell, the changes in the epigenetic state, and the resulting transformation of specialised adult cells into stem cell-like iPSC colonies. These would serve as a fitting culmination to each stage of the project, bringing all the pieces together into a beautiful picture of this revolutionary technology.