The controversy of genetic engineering and biohacking

Playing God with emerging biotechnology

Lars Hyrsting Larsen
Feb 16, 2019

Lars Hyrsting Larsen, Selin Suzan Topcu, Oliver Sjur, Julie Højris Petersen, Sarah Dina Blomquist

Because of the rapid development in the field of genetic engineering, we decided to follow and uncover controversies within this topic. Genetic engineering raises questions about where and how scientists and/or lay people are able to play God with living organisms, and about the ethical aspects involved. We therefore set out to uncover, first, the ongoing controversies of genetic engineering on Wikipedia, then how the specific controversies of biohacking have evolved compared to genetic engineering as a whole, and finally how these findings compare with Scopus. We aimed to follow the controversies from their inception to the point of closure, which allows researchers to better understand “science in the making” (Latour 1987). We chose a comparative approach because we wanted to investigate whether the same controversies and issues occur in texts from a scientific database such as Scopus as in the debate on Wikipedia. Comparing Scopus with Wikipedia lets us examine the differences between a scientific database and an openly edited encyclopedia. This helps us understand how a controversy can arise when non-scientists and scientists understand a topic differently.

Mapping selected keywords from Wikipedia, e.g. skepticism and ethics, lets us visualize connections in the network and how specific keywords link to each other. We found a connection in the timeline between genetic engineering and biohacking. We also created a map of genetic engineering to investigate the different categories that link to the topic.

We created a timeline from Scopus using the same search terms as for Wikipedia, “genetic engineering” and “biohacking”. The aim was to investigate the number of publications, and therefore perhaps also the popularity of the topic, in comparison to Wikipedia. We then analysed the keywords “ethics” and “law” to uncover how these two keywords are used in relation to “genetic engineering”. The analysis showed that there is an interest in “ethics” and “law” within “genetic engineering”. A semantic analysis of the abstracts of articles related to genetic engineering gives us a meaning-based insight into the main topics within the scope. Finally, we mapped the different sources to understand which kinds of material they publish.

Step-by-step protocol for the use of Wikipedia

We started out by choosing the Wikipedia category genetic engineering ( https://en.wikipedia.org/wiki/Genetic_engineering). 99 pages were affiliated with this category. The category also had 12 subcategories, which together contained 329 pages. We thought it would be interesting to create a network where the pages in these categories were connected by the hyperlinks in their text. In other words, the nodes in our network are the pages in these categories, and they are connected by in-text hyperlinks. We therefore called the Wikipedia API through a Python script to scrape these pages for the links in their texts, and built a network from the result. We also called the API for some keywords used in the text of the various Wikipedia pages in these categories, because we wanted to investigate these keywords and their influence in our network as well.
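The scraping step can be sketched in Python. This is a minimal sketch, not the original script. It only builds request URLs for the MediaWiki Action API, using the standard `list=categorymembers` and `prop=links` query modules. It then turns an already-fetched mapping of page to outgoing links into a directed edge list, keeping only links between pages that belong to the collected category set. The function names `category_members_url`, `page_links_url` and `build_edges` are hypothetical. The actual HTTP calls and the API's `continue` pagination are left out.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def category_members_url(category: str) -> str:
    """URL listing the pages and subcategories of one category (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "page|subcat",
        "cmlimit": "max",
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"

def page_links_url(title: str) -> str:
    """URL listing the article-namespace (namespace 0) links on one page (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "links",
        "titles": title,
        "plnamespace": 0,
        "pllimit": "max",
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"

def build_edges(links_by_page: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Turn {page: [linked titles]} into directed (source, target) edges,
    keeping only targets that are themselves collected pages and dropping self-links."""
    pages = set(links_by_page)
    edges = []
    for source, targets in links_by_page.items():
        for target in targets:
            if target in pages and target != source:
                edges.append((source, target))
    return edges
```

Restricting targets to the collected pages is what keeps the network about the category, rather than about all of Wikipedia.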

Here is the protocol for the process of making our networks and metadata

Our network — with the category genetic engineering as the seed

Our network with annotations for the different clusters

In the picture above, we show our network colored and sized. We colored it using the modularity algorithm and sized the nodes by in-degree. This means that the bigger nodes are Wikipedia pages that many other pages link to. By doing this, we wanted to show how some nodes are central to the Wikipedia conversation about genetic engineering, because many smaller pages link directly to the more central pages in the Wikipedia debate about this theme.

Here is an example. In the purple cluster, we can see that a lot of science fiction is linked to genetic engineering. This is interesting because the links only go from the science fiction pages to genetic engineering and not back. The problem is that all of the science fiction could be seen as noise if we only want to investigate real genetic engineering and not fiction. On the other hand, science fiction authors tell stories about the future and about how emerging technology can potentially be used or misused. You could argue that science fiction also helps lay people understand new technology that will surely be developed in the future.
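Gephi computes in-degree and modularity itself. The sketch below reproduces only two things in plain Python: the in-degree used for node size, and a check for one-way links such as those from the science fiction pages. It does not reimplement modularity. The function names and the sample page titles ("Novel X", "Film Y", "Game Z") are hypothetical.

```python
from collections import Counter

def in_degree(edges):
    """Count incoming links per page; pages that only link out get 0."""
    counts = Counter(target for _, target in edges)
    for source, _ in edges:
        counts.setdefault(source, 0)
    return counts

def one_way_links(edges):
    """Links whose reverse link (target -> source) does not exist."""
    edge_set = set(edges)
    return [(s, t) for s, t in edges if (t, s) not in edge_set]

# Hypothetical sample: three fiction pages link to the seed page, which links back to none of them.
edges = [
    ("Novel X", "Genetic engineering"),
    ("Film Y", "Genetic engineering"),
    ("Game Z", "Genetic engineering"),
    ("Genetic engineering", "CRISPR"),
    ("CRISPR", "Genetic engineering"),
]
sizes = in_degree(edges)
```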

The issues of the Wikipedia network

The biggest node in the acceptance network is ‘genetically modified organisms’, which is clearly weighted higher than the other nodes. In the skepticism network, the weight is spread more widely across the nodes. That could mean that this keyword is used more consistently throughout the Wikipedia pages, at least more consistently than acceptance.
The ethics keyword shows four pages with a strong connection to ethics in the network. One of them is ‘designer babies’. Many pages address the law keyword, but the biggest one is clearly ‘Organic seed growers & trade ass’n v. Monsanto Co.’ This node is a lawsuit involving Monsanto, a company that uses genetically modified organisms in food and crops. Its size implies that there might be some potential problem areas in the law surrounding GMO crops and their genetic modification.
With the sickness keyword, we see three evenly weighted pages: a movie, a novel and a video game. No other page shows this keyword in a genetic engineering context. With the regulation keyword, we see the obvious ones, ‘biosafety’ and ‘Regulation of genetic engineering’. But the third largest node is ‘genetically modified food in Europe’. This is interesting because it might suggest an ongoing debate about whether Europe needs more or less regulation of genetic engineering.
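The post does not say exactly how each page's weight for a keyword was computed. One simple assumption is the number of case-insensitive, whole-word occurrences of the keyword in the page text. The sketch below implements that assumption. The function name and sample texts are hypothetical. Whole-word matching means, for example, that “ethical” is not counted as “ethics”.

```python
import re

def keyword_weights(pages: dict[str, str], keyword: str) -> dict[str, int]:
    """Whole-word, case-insensitive occurrence count of `keyword` per page,
    keeping only pages where it occurs at least once (an assumed weighting)."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    weights = {title: len(pattern.findall(text)) for title, text in pages.items()}
    return {title: w for title, w in weights.items() if w > 0}

# Hypothetical page texts
pages = {
    "Designer baby": "The ethics of designer babies. Ethics boards disagree.",
    "Biosafety": "Biosafety regulation is ethical.",
}
```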

A network of co-occurring noun phrases from Wikipedia, extracted through semantic analysis

Here is the co-occurrence network we made through semantic analysis

The co-occurrence network is built from noun phrases extracted through semantic analysis and is visualized in Gephi. The clusters link to each other through shared terms, and the edges represent these connections. For example, science fiction forms a cluster within the topic of genetic engineering, and it is connected to the rest of the network only through the term United States, which is in the green cluster.
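A co-occurrence network like this one links two terms whenever they appear in the same document, and weights each link by the number of such documents. The sketch below shows that counting step under this standard definition. It is not the tool's own code, and the function name and sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs: list[set[str]]) -> Counter:
    """Weight each unordered pair of terms by the number of documents containing both."""
    edges = Counter()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical documents, each reduced to its set of extracted terms
docs = [
    {"science fiction", "united states"},
    {"united states", "gene"},
    {"gene", "united states", "protein"},
]
edges = cooccurrence_edges(docs)
```

In this toy data, "science fiction" shares a document only with "united states". This mirrors how a whole cluster can hang on a single bridging term.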

The editing story of the two Wikipedia pages

In the next section, we show the editing history of the two Wikipedia pages: genetic engineering and biohacking. Our graphs show how many unique users were involved and the total number of revisions for each page. This helps us see where ‘something happened’ in our controversy.
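Both measures can be derived from a page's revision list. The MediaWiki API returns this list via `prop=revisions` with `rvprop=timestamp|user`, paginated with `continue`. The sketch below assumes the revisions have already been fetched into dictionaries shaped like that output, and groups them by year. The function name and sample records are hypothetical. Revisions whose user is hidden have no `user` field, and here they are all counted as a single placeholder user.

```python
from collections import defaultdict

def yearly_activity(revisions):
    """Return {year: (revision_count, unique_user_count)} from revision records
    with an ISO 'timestamp' and (usually) a 'user' field."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for rev in revisions:
        year = int(rev["timestamp"][:4])
        counts[year] += 1
        # Hidden usernames are lumped together under one placeholder.
        users[year].add(rev.get("user", "<hidden>"))
    return {y: (counts[y], len(users[y])) for y in sorted(counts)}

# Hypothetical revision records
revisions = [
    {"timestamp": "2007-01-10T09:00:00Z", "user": "UserA"},
    {"timestamp": "2007-05-02T14:30:00Z", "user": "UserB"},
    {"timestamp": "2007-11-20T08:15:00Z", "user": "UserA"},
    {"timestamp": "2012-03-03T10:00:00Z", "user": "UserC"},
]
```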

Genetic engineering

The RAW graph above shows the timeline from January 2000 to January 2019. It illustrates the Wikipedia editing history for genetic engineering in this period. The graph grows slowly from 2000 to 2005, then grows fast and peaks in 2007, after which it drops quickly again. Since then it has been almost stable, with a larger increase from 2012 until now.

This Python graph shows the same as the RAW graph, now made in a Python Jupyter notebook. The only difference is that we can now see how many unique users edited the Wikipedia page on genetic engineering in the period 2000–2019. As shown before, the graph peaks in 2007.

This Python graph is made on the same principle as the graph showing the number of unique users making revisions. It gives us insight into the total editing activity in the period 2000–2019. The graph follows the two other graphs, with a peak in 2007 of around 1400 edits in total.

Graph analysis of genetic engineering

By searching through Wikipedia, we wanted to explain the peaks in our graphs so that we could understand discoveries and controversies within the topic of genetic engineering. All three graphs peak in 2007. We link this to a major discovery: the experimental demonstration of CRISPR-based adaptive immunity in the dairy industry. This shows how the editing history of genetic engineering was affected by this CRISPR discovery. All three graphs also show a small increase in 2012, the year of another major CRISPR discovery, when experiments with the Cas enzyme succeeded.

Biohacking

The RAW graph shows the Wikipedia editing history connected to biohacking, with activity from 2014 to 2019. The graph shows no sign of editing from 2000 to 2014. From 2014 until 2016 there is a large peak, with the highest point in 2015. From 2016 until now there is a large double peak, and activity is now at its lowest.

The Python graph shows how many unique users have edited the Wikipedia page on biohacking. As the graph shows, there is no editing from 2000 to 2014, but from 2014 to 2019 editing increases considerably. There is one peak between 2015 and 2016 and another between 2017 and the end of 2018, after which editing drops in 2019.

The Python graph shows the number of revisions in the period 2000–2019. What is significant about this graph is the double peak from 2014 to 2019, which is quite similar to the graph showing the number of unique users. Before 2014 there are no revisions at all.

Graph analysis of biohacking

By searching through Wikipedia, we wanted to explain the peaks in our graphs so that we could understand the controversies within biohacking. All three graphs show a peak in 2014 and again in 2016, when biohacking became a topic within the consequences of genetic engineering. Since the major discovery of CRISPR-Cas9, genetic engineering has become easier, cheaper and more accessible to lay people, because it is now possible to ‘do it yourself’ without a lot of scientific knowledge. This opens up ethical discussions about playing God.

Looking at the two timelines together, it is interesting that the discoveries made in 2007 led to people being able to do scientific experiments themselves. The large increase in the development of bioscience raises questions about the ethical values involved in handling new technologies. This leads us to reflect on how science is used in public, and on how the public understands the complexity of being able to do scientific experiments on themselves.

Quotation behind the peak in 2013

“DIYbio is a specific network, DIY biology describes a general movement or idea. Two different things and therefore should not merge. Hzh (talk) 21:21, 12 February 2013 (UTC)”. This comment by the Wikipedia user Hzh in 2013 started a debate on Wikipedia. In our timeline of the editing history of biohacking, we can see this peak around 2013–2014. The comment argues that the two Wikipedia pages ‘DIYbio’ and ‘DIY biology’ are different and should not be merged into one. It started a debate in which people discuss biohacking as an ambiguous word. The word can refer to DIY biology but also to the grinder body-modification community, which are two different things. The debate about this ambiguous term started in 2013 but only intensified in 2017. This shows that there are different ways of understanding the misuse of genetic engineering. Non-scientists can easily start a debate about ambiguous topics that have not yet affected the scientific community.

Protocol for the use of Scopus

Our protocol for how we used data from Scopus articles to make different networks

The protocol shows, step by step, how we developed the different analyses of articles from Scopus. The green flow shows the procedure for a genetic engineering network based on sources from Scopus. The pink flow shows a semantic analysis based on the abstracts of articles. The red flow shows a keyword analysis of “ethics” and “law” within the field of genetic engineering. Finally, the blue flow shows how we made the timelines.

Green flow — Two networks of biohacking and genetic engineering keywords based on articles from Scopus

Biohacking

A bipartite network of sources (journals) as red nodes and author keywords as blue nodes — all under articles within biohacking

This network shows source titles and author keywords as nodes. It was created by searching for biohacking in Scopus, limited to English-language articles. From the 28 resulting articles, we exported the source titles and author keywords to a file and built a network from it in ScienceScape. The network is bipartite: the red nodes are journals (sources), the blue nodes are keywords chosen by the authors, and the edges link each keyword to the sources that used it. In our previous hand-in, we identified biohacking as a controversy within genetic engineering. To uncover the actors behind biohacking, we investigated the different sources that have published articles about it. The findings from this network will be compared with a similar network built from the search term genetic engineering in Scopus.
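Behind a bipartite network like this is an edge list of (source, keyword) pairs. The sketch below builds one from a CSV export. It assumes the export has columns named `Source title` and `Author Keywords`, with keywords separated by semicolons, which is how a typical Scopus CSV export looks; check the header of your own file. The function names and the sample rows are hypothetical, and this is not how ScienceScape works internally. The number of distinct keywords per source is the kind of value that drives the node sizes discussed in the comparison of the networks.

```python
import csv
import io
from collections import Counter

def bipartite_edges(csv_text: str) -> Counter:
    """Count (source, keyword) pairs across articles in a Scopus-style CSV export.
    Assumed columns: 'Source title' and 'Author Keywords' (semicolon-separated)."""
    edges = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = row["Source title"].strip()
        for kw in (row["Author Keywords"] or "").split(";"):
            kw = kw.strip().lower()  # normalise case so "Biohacking" == "biohacking"
            if kw:
                edges[(source, kw)] += 1
    return edges

def source_degree(edges) -> Counter:
    """Number of distinct keywords linked to each source."""
    return Counter(source for source, _ in edges)

# Hypothetical export with three articles
sample = (
    "Source title,Author Keywords\n"
    "Journal A,Biohacking; DIY biology\n"
    "Journal B,biohacking; cyborg\n"
    "Journal A,biohacking\n"
)
edges = bipartite_edges(sample)
```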

Genetic engineering

A bipartite network of sources (journals) as red nodes and author keywords as blue nodes — all under articles within genetic engineering

This network shows sources linked by author keywords. It was created by searching for genetic engineering in Scopus, limited to English-language articles and to the most cited articles. The search returned 2000 articles, whose source titles and author keywords we exported to a file and built into a network in ScienceScape. The network is bipartite: the red nodes are journals (sources), the blue nodes are keywords chosen by the authors, and the edges link each keyword to the sources that used it. Since genetic engineering is our overall topic, it was interesting to investigate what the main sources write about compared to biohacking and how the two differ.

Comparison of networks

In the genetic engineering network, node sizes vary considerably. The size corresponds to how many keywords are linked to each node, so the node with the most keywords is the biggest. The biggest nodes in the network are the sources Gene, Nature Biotechnology, Proceedings of the National Academy of Sciences of the United States of America, The EMBO Journal and The Plant Journal. Based on the amount of data, it could be assumed that the sources with the most keywords publish the most cited articles.

The sources linked to biohacking are almost equal in size, meaning that about the same number of keywords is linked to each source. The main sources in the biohacking network are The Senses and Society, Frontiers in Human Neuroscience, Trends in Biotechnology, New Media & Society and BioSocieties. Even though biohacking is a field within genetic engineering, its articles are published in very different sources. One assumption could be that journals accepted by the scientific community do not publish data about biohacking. This suggests that the controversy about biohacking is not affiliated with the sources linked to genetic engineering. The genetic engineering network has many connections across its nodes, whereas the biohacking network is only loosely connected around the keyword biohacking.

A wide range of keywords is used in both networks, without any specific similarities or differences. However, the genetic engineering network is far more linked across keywords, which creates strong connections between the different sources. Each source in the biohacking network links to its own specific keywords. This suggests that biohacking is narrowed to specific topics and is not written about as widely across different journals. It could be assumed that biohacking is either a niche topic, not a scientific topic, or a scientific topic that is less developed than the general topic of genetic engineering. As in the Wikipedia timeline for biohacking, we also found an ambiguous understanding of the word biohacking and its definition.
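One way to make “far more linked across keywords” concrete is to project the bipartite network onto the sources. In the projection, two journals are linked when they share at least one author keyword, and you can then measure how dense the resulting source network is. The sketch below does that. It is a hypothetical illustration, not an analysis the post ran. The function names and the sample journals and keywords are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations

def project_sources(edges):
    """Link two sources whenever they share an author keyword;
    the weight is the number of shared keywords."""
    sources_by_kw = defaultdict(set)
    for source, kw in edges:
        sources_by_kw[kw].add(source)
    projected = Counter()
    for sources in sources_by_kw.values():
        for a, b in combinations(sorted(sources), 2):
            projected[(a, b)] += 1
    return projected

def density(n_nodes, n_edges):
    """Share of all possible undirected links that are present."""
    return 0.0 if n_nodes < 2 else 2 * n_edges / (n_nodes * (n_nodes - 1))

# Hypothetical: journals sharing keywords vs. journals with their own keywords only
shared = [("J1", "cloning"), ("J1", "plasmid"), ("J2", "cloning"),
          ("J3", "plasmid"), ("J3", "cloning")]
separate = [("J4", "cyborg"), ("J5", "diy biology")]
```

In the `shared` case every pair of journals is connected, so the density is 1.0. In the `separate` case the projection has no links at all, which is the pattern described for biohacking.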

Pink flow — A network of co-occurring noun phrases from Scopus articles, extracted through semantic analysis

Earlier we showed how we made a semantic analysis of the Wikipedia dataset, where we saw a lot of fiction. We also wanted to make a semantic analysis of the Scopus dataset. The aim was to see whether a more scientifically grounded dataset would show us other phrases and terms than Wikipedia, or perhaps some of the same.

First, we exported a file from Scopus for use in the semantic analysis. It contained 2000 scientific articles from the search term genetic engineering. We filtered it by several criteria: we limited it to the most cited articles and made sure that the abstracts were included.

In Cortext, we extracted noun phrases from the text fields, namely the abstracts of the articles. We had to apply some settings to this script. First, we set the minimum frequency to 3, which is the number of occurrences a term needs to be included in the network. We set the list length to 100, which means that the algorithm extracts the 100 terms with the highest specificity score. Lastly, we set the maximum length to 3, which means that a term can consist of at most 3 words. These settings resulted in the following network:
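The three settings work as successive filters, and the sketch below shows their interplay. It rests on two simplifications that differ from Cortext. First, candidate terms are plain word sequences of up to `max_len` words that do not start or end with a stopword, instead of part-of-speech-tagged noun phrases. Second, terms are ranked by how many abstracts contain them, as a stand-in for Cortext's specificity score. The stopword list, function names and sample abstracts are hypothetical; only the parameter values (3, 100, 3) come from the text.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "with", "by", "on", "we", "this"}

def candidate_terms(text: str, max_len: int = 3) -> set[str]:
    """All 1..max_len-word sequences that neither start nor end with a stopword.
    A crude stand-in for noun-phrase chunking (no part-of-speech tagging)."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    terms = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            terms.add(" ".join(gram))
    return terms

def select_terms(abstracts, min_freq=3, list_length=100, max_len=3):
    """Keep terms found in at least `min_freq` abstracts, then the `list_length`
    most frequent (document frequency stands in for the specificity score)."""
    df = Counter()
    for text in abstracts:
        df.update(candidate_terms(text, max_len))
    kept = [(t, f) for t, f in df.items() if f >= min_freq]
    kept.sort(key=lambda tf: (-tf[1], tf[0]))  # most frequent first, ties alphabetical
    return [t for t, _ in kept[:list_length]]

abstracts = [
    "Gene expression in transgenic plants.",
    "Transgenic plants show gene expression.",
    "Gene expression was measured in transgenic plants.",
]
```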

A network based on phrases from text in the abstracts of articles within genetic engineering on Scopus

In this network, we see different clusters of nodes, where each node is a phrase. The clusters are grouped around different terms; for example, the biggest cluster (turquoise) has many nodes related to genes. To get a clearer view of which scientific genre each cluster belongs to, we ran another Cortext script. It projects the top 3 journals onto each cluster, which gave us a better understanding of each cluster's genre. The network with the top 3 journals for each cluster is shown here:
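Projecting the top journals onto clusters amounts to counting, for each cluster, which sources its articles come from and keeping the most frequent ones. The sketch below assumes each article has already been assigned to one cluster. How Cortext assigns articles to clusters is not described in the post, so that step is left out. The function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def top_sources_per_cluster(docs, k=3):
    """docs: iterable of (cluster_label, source_title) pairs, one per article.
    Returns the k most frequent sources in each cluster."""
    by_cluster = defaultdict(Counter)
    for cluster, source in docs:
        by_cluster[cluster][source] += 1
    return {c: [s for s, _ in counts.most_common(k)] for c, counts in by_cluster.items()}

# Hypothetical article-to-cluster assignments
docs = ([("turquoise", "J1")] * 3 + [("turquoise", "J2")] * 2
        + [("turquoise", "J3")] + [("dark blue", "J4")])
```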

A network based on phrases from text in the abstracts of articles within genetic engineering on Scopus, with the top 3 sources added to each cluster

The network shows that the turquoise cluster is connected to journals with an interest in plants and nature in general, whereas the dark blue cluster is connected to medicine and immunology. This shows that we can easily uncover which terms are connected to different scientific journals and genres within genetic engineering. It also pinpoints more precisely where the controversy, compared to biohacking, is most likely to be present: in a combination of different clusters. Here, the turquoise cluster is not the obvious choice, since it is associated with nature and plants.

The comparison to the Wikipedia semantic analysis

When we compare the two semantic analyses from Wikipedia and Scopus, we see that the debate about genetic engineering has different starting points and results. On Wikipedia, we found various scientific clusters. We also identified a cluster heavily influenced by fiction in general, which could indicate the presence of non-scientist actors. In the Scopus analysis, we are dealing only with scientific articles and journals. We could not find fiction at all in these articles. Instead, we listed the top 3 journals for each cluster to see which genre of science each cluster belonged to. This made it easier to tell which phrases were most used in each genre of science in the network. Lastly, it is interesting that science fiction is connected to genetic engineering on Wikipedia, because it shows how ‘lay people’ can get involved in a difficult scientific topic. This could get ‘lay people’ to reflect on developments in genetic engineering and on the potential use and misuse of such technologies and methods.

Red flow — Keyword analysis of “ethics” and “law” based on articles from Scopus

Keyword — Ethics

A network based on the keywords genetic engineering and ethics

The clusters represent the most used author keywords in articles on the topic of “genetic engineering”. The nodes represent the keywords, and the edges link keywords that co-appear in the same papers. Node size shows how many authors use the specific keyword across the whole network. Notably, there are two separate clusters: the red cluster contains every keyword centered on “ethics”, and the green cluster contains keywords centered on “genetic engineering”. The red cluster covers topics ranging from science to autonomy. The green cluster focuses mostly on biology and on keywords related to e.g. medicine and stem cell research. The two clusters are only joined through keywords such as “genetically modified organism (GMO)” or “food”. In the Wikipedia keyword search, “ethics” was mainly related to designer babies. As a keyword in Scopus articles, by contrast, “ethics” is more connected to the actual topic “genetic engineering” and its sub-topics.
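The observation that the two clusters are joined only through a few keywords can be checked directly. Given the co-occurrence edges and a cluster label for each keyword, such as the modularity classes exported from Gephi, you list the keywords that have at least one neighbour in another cluster. The sketch below does this. The function name, the cluster labels and the sample edges are hypothetical.

```python
from collections import defaultdict

def bridge_keywords(edges, cluster_of):
    """Keywords with at least one co-occurrence neighbour in a different cluster.

    edges: iterable of (keyword_a, keyword_b) co-occurrence pairs.
    cluster_of: dict mapping keyword -> cluster label."""
    neighbour_clusters = defaultdict(set)
    for a, b in edges:
        neighbour_clusters[a].add(cluster_of[b])
        neighbour_clusters[b].add(cluster_of[a])
    return sorted(k for k, cs in neighbour_clusters.items()
                  if any(c != cluster_of[k] for c in cs))

# Hypothetical keywords and cluster labels
cluster_of = {"ethics": "red", "autonomy": "red",
              "genetic engineering": "green", "stem cells": "green", "gmo": "green"}
edges = [("ethics", "autonomy"), ("genetic engineering", "stem cells"),
         ("ethics", "gmo"), ("gmo", "genetic engineering")]
```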

Keyword — Law

A network based on the keywords genetic engineering and law

The clusters represent the most used author keywords in articles on the topic of “genetic engineering”. The nodes represent the keywords, and the edges link keywords that co-appear in the same papers. Node size shows how many authors use the specific keyword in the network. The node “law” is connected to “genetic engineering” and “social control over science”. All the nodes in the green cluster are linked to “law”, and the keywords in the green cluster are mainly about science. The network shows that articles written about “genetic engineering” use “law” as a keyword. The most represented node in the other, purple cluster is “transgenic crop”. This cluster is connected to the keyword “law” through the node “ethics”.

Summing up the keywords from Scopus, compared with the keywords from Wikipedia

To sum up the keyword searches from Scopus, it is important to remember how the two networks were built. One network is based on the topics “genetic engineering” and “ethics”, and the other on “genetic engineering” and “law”. Because the articles were retrieved with these terms, the nodes “ethics”, “genetic engineering” and “law” are larger than the others: the authors refer to these keywords more often. Both networks contain the same sub-keywords, e.g. “food”, “ethics”, “genetic engineering” and “genetically modified products”. This shows that the same keywords appear in many of the same Scopus articles. Compared with the Wikipedia keyword analysis of “genetic engineering”, “ethics” and “law” appear on both Wikipedia and Scopus. This suggests that there is a debate about the ethics of genetic engineering among both scientists and non-scientists.

Blue flow — Various timelines connected to the controversy based on articles from Scopus

Timeline — Biohacking

Timeline over biohacking with 2 peaks shown

The timeline shows the 31 most cited articles about biohacking from 2009 to 2019. It shows a large decrease from the end of 2009 to 2012, followed by an increase from the beginning of 2012. In 2013, three articles were published about do-it-yourself biology and innovation. The timeline then decreases until the end of 2014, after which it increases until the end of 2018. Publishing about biohacking increases most in 2017–2018, when 17 articles were published about cyborgs, cybersecurity and scientific knowledge in the digital era. We can therefore see a shift from do-it-yourself biology to cyborgs. We assume this reflects more focus on gender, food, safety and security in the use of genome editing tools. We assume that the small number of articles on Scopus could be explained by the topic's focus on ‘non-scientific’ biology.

Timeline — Genetic engineering

Timeline over genetic engineering with 3 peaks shown

The timeline shows the 6000 most cited articles about genetic engineering from Scopus in the period 2000–2019. We can see a decrease from the end of 2007 to 2008, after which the number increases again. In 2006, 365 papers were published about genetic engineering. This coincides with the proposal of a hypothetical scheme of adaptive immunity (“CRISPR Timeline | Broad Institute” 2019), which is also visible in the RAW graph of the Wikipedia page on genetic engineering. The timeline shows the main discovery in 2009, with 492 published articles, concerning DNA targeting with CRISPR (Lander 2016). The highest peak is in 2014, when 719 articles were published. This was the year of new techniques that could potentially be used to produce designer babies.
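A publication timeline like this is a count of articles per year, and its peaks are the years that stand out. The sketch below counts publication years, for example from the `Year` column of a CSV export, filling missing years with zero. It then marks years that are strictly higher than both neighbouring years. This simple local-maximum rule is an assumption; the peaks in the figure were identified visually. Note that the first and last years can never qualify under this rule. The function names and sample years are hypothetical.

```python
from collections import Counter

def publications_per_year(years):
    """Articles per year, with missing years in the range filled with 0."""
    counts = Counter(years)
    lo, hi = min(counts), max(counts)
    return {y: counts.get(y, 0) for y in range(lo, hi + 1)}

def local_peaks(series):
    """Years whose count is strictly higher than both neighbouring years."""
    years = sorted(series)
    return [y for prev, y, nxt in zip(years, years[1:], years[2:])
            if series[y] > series[prev] and series[y] > series[nxt]]

# Hypothetical publication years
years = [2005] * 2 + [2006] * 5 + [2007] * 3 + [2008] * 4 + [2009] * 1
series = publications_per_year(years)
```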

Summing up the two timelines from Scopus, compared with the two timelines from Wikipedia

We observed no clear difference between the Scopus timelines and the Wikipedia timelines. However, it is important to notice that there are 6000 articles about genetic engineering and only 31 about biohacking. Perhaps research and debate about biohacking will increase in the future because of the fast development of genetic engineering, which could make biohacking more relevant to investigate. The similarity between the Scopus and Wikipedia timelines is notable, given that one database has a scientific starting point and the other a non-scientific one.
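Because the two corpora differ so much in size (6000 versus 31 articles), their timelines are easier to compare by shape than by raw counts. The sketch below offers two options. The first divides each series by its total so both fit on one axis. The second computes a Pearson correlation over the years the series share, which is already insensitive to scale. This is a hypothetical way to quantify the comparison, not something done in the analysis above. The function names and sample series are invented.

```python
from math import sqrt

def normalise(series):
    """Divide each year's count by the series total (empty dict if total is 0)."""
    total = sum(series.values())
    return {y: c / total for y, c in series.items()} if total else {}

def shape_correlation(a, b):
    """Pearson correlation of two yearly series over their shared years;
    NaN if fewer than two shared years or either series is constant."""
    years = sorted(set(a) & set(b))
    if len(years) < 2:
        return float("nan")
    xs = [a[y] for y in years]
    ys = [b[y] for y in years]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

# Hypothetical series of very different size but the same shape
small = {2016: 1, 2017: 3, 2018: 2}
large = {2016: 100, 2017: 300, 2018: 200}
```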

Postscript

By choosing genetic engineering, we wanted to uncover controversies within this topic. We also aimed to uncover biohacking as a subcategory within this field. Our analysis was based on Wikipedia and Scopus. At first, we assumed that biohacking was a widely discussed and innovative topic, but our findings turned out differently. We harvested little data on biohacking from both Wikipedia and Scopus. This gave us only a narrow analysis of the “large” controversy we thought we were about to investigate. The limited amount of data might be because the topic is non-scientific, more socially oriented, and a newly emerged field that has not been investigated much yet. It may also be that our understanding of “biohacking”, and of what we could achieve with the topic, differs from how the term is used scientifically. We searched for biohacking as a single word; another search term might have been more appropriate for uncovering the controversies.

One of our main assumptions during the project was the difference between the two databases. Wikipedia is a more opinion-based environment, where non-scientists' assumptions and opinions are likely to appear, driven by their interests. Users with the right access can comment, add and delete content. Scopus, on the other hand, is a scientific database that includes only peer-reviewed articles backed by scientific evidence. This makes an analysis based on Scopus more trustworthy, but also more limited. Because scientific evidence is required for publication, the published material might not be the most relevant for uncovering a controversy. Furthermore, especially for genetic engineering, we collected too much data to analyse it properly and get an overview of the different pages and articles in the time we had available.
When following the controversy, we followed the actors; the powerful actors in our case are scientists and non-scientists. Under the surface, an enormous amount of data is distributed across many articles and pages. These are shaped by the actors who form the controversy through what they write or publish.

References:

“CRISPR Timeline | Broad Institute.” 2019. Broad Institute. https://www.broadinstitute.org/what-broad/areas-focus/project-spotlight/crispr-timeline.

Lander, Eric S. 2016. “The Heroes of CRISPR.” Cell 164 (1–2): 18–28. https://doi.org/10.1016/j.cell.2015.12.041.
