Are you curious about the projects delivered in iGEM, the world’s biggest synthetic biology competition? I did some analyses, and I can tell you that…

iGEMers love E. coli & System and CRISPR has been on the rise

Data exploration journey on how iGEMers named and described their projects and how iGEM has changed since 2004

Natthawut Max Adulyanukosol
iGEM Copenhagen

--

iGEM HQ has put all team information from 2004–2016 on its website as part of its iGEMeta initiative, which aims to bring the iGEM community together through data.

Let’s find out what we can discover in this dataset and, hopefully, we can contribute to the iGEMeta initiative.

The iGEM competition started in 2004 with just five US teams (Boston University, California Institute of Technology, Massachusetts Institute of Technology, Princeton University & University of Texas Austin). The number of participating teams grew tremendously, reaching 272 teams from 54 countries around the globe in 2016.

Each iGEM team must submit a synthetic biology project with a title and an abstract, along with other deliverables, to enter the competition. As we have over a thousand teams in the dataset, these titles and abstracts give us plenty of material to explore.

A cloud of words from titles of all iGEM projects!

Word cloud from iGEM Project Titles 2004–2016. Can you find anything interesting? (made with word_cloud from https://github.com/amueller/word_cloud)
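A word cloud like this one sizes each word by how often it occurs, so the first step is simply counting words. Here is a minimal sketch of that counting step. It is not the code behind the figure: the sample titles, the stop-word list and the `count_title_words` helper are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical titles, invented for illustration; not taken from the iGEM dataset.
titles = [
    "An E. coli biosensor system",
    "Engineering a light-controlled system in E. coli",
    "Yeast detection system",
]

# A tiny, assumed stop-word list; a real analysis would need a fuller one.
STOPWORDS = {"a", "an", "the", "in", "of", "for", "and", "e"}


def count_title_words(titles):
    """Count how often each non-stop-word appears across all titles."""
    counts = Counter()
    for title in titles:
        # Lowercase and keep runs of letters/digits as words.
        words = re.findall(r"[a-z0-9]+", title.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


word_counts = count_title_words(titles)
print(word_counts.most_common(3))
```

If word_cloud is installed, a dictionary of counts like this can be handed to `WordCloud().generate_from_frequencies(word_counts)` to draw the picture.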

Who else could top the list in a synthetic biology competition, if not Escherichia coli, the organism most beloved by biologists? E. coli is the prime organism for testing various kinds of synthetic biology experiments. We usually attempt to introduce foreign DNA into E. coli, regardless of whether it originates from other bacteria (prokaryotic organisms) or from eukaryotic sources. If E. coli is not ideal, we then often turn to Saccharomyces cerevisiae, also known as budding yeast or brewer’s yeast.

In the word cloud above, the size of each word corresponds to its number of occurrences in all past iGEM project titles. System and Engineering are among the popular words in the titles. What kind of system are we talking about, you might ask? Gazillions of kinds! Here they are.

Old vs New iGEM Project Titles

Spot the differences!

As iGEM has been running for 14 years, the scientific community has advanced considerably during the period. New discoveries provide new tools for scientists to explore new frontiers of research. It is fascinating to see how iGEM has changed over time.

There were not many teams in the early days of iGEM. Biosensor and Detection were major players back then. They were still present to some extent in 2016, but their popularity was surpassed by Cancer.

What is more interesting is the presence of CRISPR & Cas9 in the 2016 word cloud. CRISPR & Cas9 have been studied extensively and used widely in biology research over the past few years. Did the iGEM community follow the trend? Let’s dive into it.

Growing interests in CRISPR & Cas9 shown in iGEM Project Titles

CRISPR and Cas9 appeared in project titles for the first time in 2011, and their relative frequencies have been growing since then.

CRISPR and Cas9 were introduced to iGEM for the first time in 2011, and more and more teams have taken an interest in them over the past few years. Although CRISPR did not appear in many iGEM project titles, the number of appearances followed the number of published articles on CRISPR.

The number of times CRISPR was mentioned in iGEM project titles and the number of published articles from Scopus

I have gathered all projects mentioning CRISPR or Cas9 in their titles or abstracts here. There are 81 of them!
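A filter of this kind could be sketched as below. This is only an illustration, not the code used to build the list: the `title` and `abstract` field names, the `mentions_crispr` helper and the sample records are assumptions, and the records are invented.

```python
import re

# Case-insensitive match for either keyword.
PATTERN = re.compile(r"crispr|cas9", re.IGNORECASE)


def mentions_crispr(project):
    """Return True if the title or abstract mentions CRISPR or Cas9."""
    text = f"{project.get('title', '')} {project.get('abstract', '')}"
    return bool(PATTERN.search(text))


# Hypothetical records, invented for illustration; field names are assumed.
projects = [
    {"title": "A CRISPR toolkit", "abstract": "We build guide RNA libraries."},
    {"title": "Yeast biosensor", "abstract": "Detection of heavy metals."},
    {"title": "Gene editing", "abstract": "We use dCas9 to repress genes."},
]

crispr_projects = [p for p in projects if mentions_crispr(p)]
print(len(crispr_projects))
```

Note that a plain substring match also catches variants such as dCas9, which may or may not be what you want.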

Teams from North America (the US, led by Arizona State University, Georgia Institute of Technology and University of Southern California in 2011, and Canada) and Asia (Hong Kong, Japan, China) pioneered the use of CRISPR/Cas9. European teams jumped on the bandwagon later, in 2013 (only Freiburg University, Germany) and from 2014 onwards.

It would be interesting to find out whether these trends of early/late adopters existed in other topics or not.

You might wonder what the normalised term frequency on the y-axis of the earlier plot is. It is the relative frequency of a word, corrected for the total length of the text so that it can be compared with other texts. Confusing? Well, the values might not make much sense right now, but they will be useful for the machine learning classification & predictive models I plan to build. It might take a while, though, as I have to clean the data first.

For now, you may view the trends of other keywords (by clicking their names in the legend) with the interactive version of the plot below. I’d really love to hear if you can find anything interesting.

I’m afraid that it’s not optimised for mobile screens. :/

(I will publish the code I wrote for these visualisations soon. If you’d like to get early access to my (currently messy) code, please contact me at igemku2018 [at] gmail.com)
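To make the idea a bit more concrete, here is a minimal sketch of one simple reading of a length-normalised term frequency: the number of times a word occurs in a year's titles, divided by the total number of words in that year's titles. The exact normalisation behind the plot is not spelled out here, so treat this as an assumption; the `normalised_term_frequency` helper and the sample titles are made up.

```python
from collections import Counter


def normalised_term_frequency(texts_by_year, term):
    """For each year, return occurrences of `term` divided by total word count."""
    result = {}
    for year, texts in texts_by_year.items():
        # Naive whitespace tokenisation on lowercased text.
        words = [w for text in texts for w in text.lower().split()]
        result[year] = Counter(words)[term.lower()] / len(words) if words else 0.0
    return result


# Hypothetical titles, invented for illustration; not taken from the iGEM dataset.
titles_by_year = {
    2011: ["crispr immunity in bacteria", "a yeast biosensor"],
    2016: ["crispr cas9 toolkit", "crispr diagnostics"],
}

tf = normalised_term_frequency(titles_by_year, "CRISPR")
print(tf)
```

Dividing by the total word count is what lets a year with 272 teams be compared with a year with only a handful of teams.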

Next steps

I would like to know whether, given only project titles & abstracts, we can find out which category a project belonged to, and which medal & prize(s) a project received. If the model is good enough, can we predict future projects? That would be something interesting to look into. Till next time, then. 👋

To receive updates from the iGEM Copenhagen team, you may subscribe to our publication iGEM Copenhagen on Medium, our Facebook page, Twitter, and Instagram.

As of June 2018, we have received support from the Department of Plant and Environmental Sciences, University of Copenhagen, SnapGene, and IDT.

If you’re interested in supporting us, please contact us at igemku2018 [at] gmail.com. We greatly appreciate all kinds of support in the making of our PharMARSy.

--

Natthawut Max Adulyanukosol
iGEM Copenhagen

Data Enthusiast | Bioinformatician-in-training | Back-end Developer | Cambridge '16 | IBO 🏅| @MaxNA399 on Twitter