Mapping Controversy: Vaccine controversies

By Marie Hasselbalch and Trine Christensen Mayntzhusen


Vaccines are controversial because the debate rests on embedded value-based, ethical and cultural arguments (Law, J. & Singleton, V., 2014). The key issues include scientific discussions about whether research results are valid, for example the dispute over the Mumps, Measles and Rubella (MMR) vaccine and its alleged link to autism in children, as well as the broader dissemination of arguments for or against vaccines.

Vaccine controversies cannot be reduced to a bilateral opposition between pro- and anti-vaccination positions, because the controversy consists of many sub-controversies and sub-discussions. This makes it highly complex and reduction-resistant (Venturini, T., 2010a).

(Vaccine hesitancy, 2018).

This controversy is mapped through an actor-network theory (ANT) approach, in which an actor is whatever makes a difference through action in a situation, human or non-human (Venturini, T., 2010a). A significant actor in this controversy is Andrew Wakefield, an anti-vaccine activist and former British doctor, who has had a great impact on vaccine hesitancy and on the claimed connection between MMR and autism.

The first part of this article covers data harvested from a Wikipedia category and its member pages, and presents annotated networks and visualisations of that data. The second part focuses on how vaccine controversies are debated on a social media platform, specifically Reddit.

We want to map the networks that emerge in the vaccine controversy debate, and to explore the key issues and actors in that debate on both Wikipedia’s category pages (under ‘Vaccine controversies’) and Reddit as a social media platform.

Data protocol

After selecting the Wikipedia category ‘Vaccine controversies’, we used scripts to call Wikipedia’s API and harvest the following:
1) Members of the category, including crawled and scraped subcategories
2) Links between category member pages
3) In-text links between category member pages
4) External references from category member pages
5) Full text to query category member pages for keyword mentions, namely ‘MMR’ and ‘autism’
6) Text from category member pages
7) Revision histories from category member pages
8) Revision links between users and category member pages
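
As an illustration of the first step in the list above, the following is a minimal sketch of how category members could be requested from the MediaWiki API. It is not the script used for this article. The endpoint, the User-Agent string and the function names are assumptions made for the example; the query parameters (`list=categorymembers`, `cmtitle`, `cmtype`, `cmlimit`, `cmcontinue`) are standard MediaWiki API parameters.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the English-language Wikipedia endpoint was the one queried.
API_URL = "https://en.wikipedia.org/w/api.php"


def category_members_params(category, continue_token=None):
    """Build the query parameters for one page of category members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "page|subcat",  # article pages and subcategories
        "cmlimit": "500",
        "format": "json",
    }
    if continue_token:
        params["cmcontinue"] = continue_token
    return params


def parse_members(response):
    """Split one API response into page titles, subcategory titles and a continue token."""
    members = response.get("query", {}).get("categorymembers", [])
    pages = [m["title"] for m in members if m["ns"] == 0]      # namespace 0: articles
    subcats = [m["title"] for m in members if m["ns"] == 14]   # namespace 14: categories
    token = response.get("continue", {}).get("cmcontinue")
    return pages, subcats, token


def fetch_category(category):
    """Fetch all members of one category. This function performs network requests."""
    pages, subcats, token = [], [], None
    while True:
        query = urllib.parse.urlencode(category_members_params(category, token))
        request = urllib.request.Request(
            f"{API_URL}?{query}",
            headers={"User-Agent": "controversy-mapping-sketch/0.1"},  # hypothetical name
        )
        with urllib.request.urlopen(request) as reply:
            new_pages, new_subcats, token = parse_members(json.load(reply))
        pages += new_pages
        subcats += new_subcats
        if token is None:
            return pages, subcats
```

Calling `fetch_category("Vaccine controversies")` would return the page titles and subcategory titles of the category; crawling the subcategories would mean repeating the call for each returned subcategory.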

Visualisation of the data protocol based on the Wikipedia category ‘Vaccine controversies’. The category contains 117 pages, which we used as input for a series of scripts that call Wikipedia’s API, crawl and scrape to harvest different kinds of relationships between the pages, and from which we constructed networks to represent those relationships.

All of this was done with Python scripts (Python is a programming language). We visualised the networks in the network visualisation programme Gephi (Bastian, M. et al., 2009), except for the revisions, which we visualised as two timelines using a Python script. Only some of the visualisations and networks are presented in this article.

Vaccine controversies and pseudosciences

Our chosen network shows how the pages of the category connect to each other and to all other pages they cite. It shows an uneven distribution, visualised as five clusters of different sizes (Venturini, T. et al., 2015).

The red cluster, ‘outbreak cases’, is the smallest, which could be interpreted as a sign of how small a role actual disease outbreaks play in the controversy.
The green cluster, ‘pseudo-science’, and the turquoise cluster, ‘autism perspectives + controversies in autism’, are both less dense than the other clusters. The two clusters are connected through the node ‘autistic enterocolitis’, a pathology that Andrew Wakefield claimed was linked to autism (Deer, B., 2010). Wakefield’s research relating the MMR vaccine to autism has been highly questioned, and he was discredited in 2010. Nevertheless, the node ‘autistic enterocolitis’ functions as a bridge between the green and the turquoise clusters, so it can be interpreted as a current issue in the controversy on autism (Venturini, T. et al., 2015).
The orange cluster, ‘vaccine controversies + vaccine’, is the second largest and very dense, with edges to e.g. the green node ‘vaccine hesitancy’. This connection is not surprising, as the debate about vaccines could create doubt about whether or not to vaccinate.
The largest cluster in the network is the purple one, ‘powerful voices’, which includes major publications such as the node ‘The New York Times’. A big node named ‘Andrew Wakefield’ is found here, as well as nodes for famous actors who play a large role in the debate on vaccines and autism. This cluster also includes a well-known American lawsuit over the claim that the MMR vaccine causes autism. We interpret the size of this cluster as showing that these powerful voices play a significant role in the vaccine controversy debate.

Visualisation of the relationships between pages in the category, showing how they are linked by Wikipedia links to each other and to all other pages they cite. We processed the network in Gephi, using a layout algorithm to organise the network and the statistical tool Modularity to make the clusters clearer. See in high resolution (without added annotations) here.
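
To make the construction of such a link network more concrete, here is a small sketch of how harvested page links could be turned into an edge list that Gephi can import. It is an illustration under assumptions, not the script behind the figure: the function names are hypothetical, the sample links are invented for the example, and the layout and Modularity steps are left to Gephi.

```python
import csv
import io
from collections import Counter


def build_edges(page_links):
    """Turn {page: [linked pages]} into a sorted list of unique directed edges."""
    edges = set()
    for source, targets in page_links.items():
        for target in targets:
            if target != source:  # ignore self-links
                edges.add((source, target))
    return sorted(edges)


def in_degree(edges):
    """Count how many distinct pages link to each target page."""
    return Counter(target for _, target in edges)


def edges_to_csv(edges):
    """Serialise edges as CSV text with Source/Target headers for import into Gephi."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical sample: three category pages and the pages they link to.
sample_links = {
    "Andrew Wakefield": ["Autistic enterocolitis", "MMR vaccine", "The Lancet"],
    "Autistic enterocolitis": ["Andrew Wakefield", "MMR vaccine"],
    "Vaccine hesitancy": ["MMR vaccine", "Andrew Wakefield"],
}
edges = build_edges(sample_links)
print(in_degree(edges).most_common(2))
print(edges_to_csv(edges))
```

Pages with a high in-degree are cited by many category pages, which is one simple way to see why some nodes appear large in the visualisation.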

Semantic analysis

A network of co-occurring noun phrases extracted through semantic analysis

With the semantic analysis we aim to explore the diversity of anti-vaccination arguments. The analysis is carried out in Cortext, an algorithmic tool for discovering sub-discussions in a data corpus (Elgaard Jensen et al., 2018).

Semantic analysis of the top 100 terms according to Cortext. Seven clusters occur, which we have annotated based on the terms in each cluster. The “Africa map”, formed by three clusters drawn together, shows the effect of questioning the MMR vaccine in co-occurring terms. See in high resolution (without added annotations) here.

The output from Cortext and our visualisation of the co-occurring terms show a clear tendency to link vaccines (more precisely the MMR vaccine) to autism as a cause.

The tight clustering around the “Africa map” in the middle (the yellow, purple and turquoise clusters) reflects high edge weights between the terms used in those clusters, which is consistent with the same terms being used across different clusters.
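
The idea of an edge weight between co-occurring terms can be sketched in a few lines. The example below simply counts in how many texts two terms appear together. This is a deliberately simplified stand-in and not Cortext’s method, which extracts noun phrases and uses its own proximity measures; the function name, terms and sample sentences are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(documents, terms):
    """Count, for each pair of terms, the number of documents that mention both."""
    weights = Counter()
    for text in documents:
        lowered = text.lower()
        # Terms present in this document, sorted so each pair has one canonical order.
        present = sorted(term for term in terms if term.lower() in lowered)
        weights.update(combinations(present, 2))
    return weights


# Hypothetical sample texts and terms.
sample_terms = ["mmr vaccine", "autism", "measles outbreak"]
sample_documents = [
    "The MMR vaccine was wrongly linked to autism.",
    "Claims about autism and the MMR vaccine were retracted.",
    "A measles outbreak followed falling MMR vaccine uptake.",
]
for pair, weight in cooccurrence_weights(sample_documents, sample_terms).items():
    print(pair, weight)
```

Pairs with a higher count would be drawn closer together or with a heavier edge in a co-occurrence network.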

The semantic analysis shows both pro- and anti-vaccination discourses, with a sub-discussion on trust, e.g. the authorities and scientific community versus personal narratives and risk of harm.

Top sources of the top 100 terms clusters, see in high resolution here.

Comparing the semantic analysis visualised above with the visualisation presented here, of the top 100 sources of the top 100 terms, shows agreement between the two. For example, the turquoise cluster has a significant source in the Wikipedia page ‘Folk epidemiology of autism’, which explains the rhetoric used in this cluster, e.g. ‘damaging medical hoax’, since it shows a great deal of mistrust of the scientific and medical authorities and a firm belief in anti-vaccination (Venturini, T., 2010b).

The purple cluster in the top 100 sources visualisation has a significant source in the ‘National Vaccine Information Center’, which aligns well with the rhetoric on ‘vaccination safety’; we assume, based on the organisation’s name, that this is one of its main concerns.

Simultaneous rises in revision activity and unique users making revisions

Using a Python script we created two timelines: one of user revisions per month and one of unique users making revisions per month. The first shows the total number of revisions per month, whereas the second shows how many individual users made revisions in each month.
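
The difference between the two timelines can be illustrated with a short sketch. It assumes that each harvested revision is a record with a `user` name and an ISO 8601 `timestamp`, which is how the MediaWiki API reports revisions; the function name and the sample records are hypothetical and not taken from our data.

```python
from collections import defaultdict


def monthly_timelines(revisions):
    """Return (revisions per month, unique users per month), keyed by 'YYYY-MM'."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for revision in revisions:
        month = revision["timestamp"][:7]  # ISO 8601 timestamps start with YYYY-MM
        counts[month] += 1
        users[month].add(revision["user"])
    unique = {month: len(names) for month, names in users.items()}
    return dict(counts), unique


# Hypothetical sample: one user making several edits counts once as a unique user.
sample_revisions = [
    {"user": "EditorA", "timestamp": "2007-08-03T10:00:00Z"},
    {"user": "EditorA", "timestamp": "2007-08-04T11:30:00Z"},
    {"user": "EditorB", "timestamp": "2007-08-20T08:15:00Z"},
    {"user": "EditorA", "timestamp": "2007-09-01T12:00:00Z"},
]
revisions_per_month, unique_users_per_month = monthly_timelines(sample_revisions)
print(revisions_per_month)
print(unique_users_per_month)
```

In the sample, August 2007 has three revisions but only two unique users, which is exactly the distinction the two timelines are meant to capture.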

The number of revisions per month, see the file in higher resolution here.

Unique users and revision activity rise together in two periods: August-September 2007 and October-December 2009. This indicates that previously established knowledge claims within the editing community were being contested by users outside the community, which we investigated qualitatively by looking into the .csv file containing all user revisions incl. metadata.
In August 2007, we observe a relatively large portion of comments and revisions relating to religious arguments about vaccine controversies; these are marked in yellow in the .csv file mentioned above.

Another simultaneous rise in unique users and revision activity is seen in October-December 2009, where the primary topic of dispute is the lack of a neutral point of view (NPOV) and the lack of sources for the claims made in the revisions. NPOV is a rule for all encyclopedic content on Wikipedia, meaning that all revisions and text on Wikipedia pages must be written from a neutral point of view.

As an open source encyclopedia, Wikipedia has a very flat organisational structure, and a key principle of interacting there is to be neutral and unbiased (Marres, N. & Moats, D., 2015). We harvested the Talk sections for two periods, August-September 2007 and October-December 2009. The Talk section on Wikipedia is a platform for discussing suggested revisions. Much of the activity in the Talk sections revolves around how to behave in relation to the NPOV guideline.

Additional data protocol

To look into social media communication on vaccines and vaccine controversies, we chose the platform Reddit. Reddit is a self-regulating message board of submissions, where users are anonymous and can vote submissions up or down based on their perceived value. Reddit contains niche communities called subreddits, which include submissions and comments (Marres, N. & Moats, D., 2015).

Data protocol for harvesting Reddit, specifically the subreddit ‘VACCINES’, using a Python script to call Reddit’s API.

After selecting the subreddit ‘VACCINES’, we used a Python script to call Reddit’s API and scraped the following:

1) Submissions of the subreddit including comments
2) Full text from submissions and comments
3) Activity history per day of the subreddit
4) Up-vote activity history per day of the subreddit

In addition, we used Cortext to construct the following:

5) A semantic network of top 300 co-occurring terms
6) Heat map with top 300 co-occurring terms in a specific period

Activity level and upvoting

Timelines of the activities and the up-voting of submissions on the subreddit ‘VACCINES’ per day

The activity level based on the submissions and comments derived from the data corpus (from the subreddit ‘VACCINES’) is interesting to investigate because it mirrors the topicality of the controversy in a public debate on an open, yet anonymous platform.

Timeline showing the activities in the subreddit from 2014 to February 2019, created using the visualisation programme RAWGraphs. See timeline in high resolution here.

The timeline shows three periods with high spikes: early 2014, early 2017 and early 2019. These could reflect major incidents related to the debate on vaccine controversies (Venturini, T., 2010b).

When we investigated possible reasons for the increased activity in these periods, we found that the 2014 spike could be explained by one of the largest measles outbreaks in the U.S. in many years. According to the U.S. Centers for Disease Control and Prevention (CDC), one reason for this outbreak is a greater number of unvaccinated people and/or travellers bringing the disease into the U.S. (Measles | Cases and Outbreaks | CDC, 2019).

The number of social media groups spreading anti-vaccination messages and warnings about the dangers of vaccination was reported to be very high in early 2017. The dangers posed by the presence of anti-vaccination groups were the topic of a scientific paper, and the debate also appeared in major publications such as The Washington Post (Evrony, A. and Caplan, A., 2017; Kaplan, S., 2017).

The current debate (February 2019) on vaccines again revolves around an increased number of measles outbreaks. Within the first two months of 2019, a high number of people infected with measles has been reported in the U.S., and there is growing worry, e.g. at the World Health Organization (WHO), about the status of Europe in relation to measles outbreaks (Measles | Cases and Outbreaks | CDC, 2019; Measles in Europe: record number of both sick and immunized, 2019).

To go deeper into the activity per day and into how Reddit as a mediating technology enacts the controversy (Marres, N. & Moats, D., 2015), we chose to explore the number of upvotes per day, as seen in the timeline below.

Timeline showing upvotes per day from 2014 to February 2019. See in high resolution here.
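
As a rough illustration of how two such daily timelines could be derived, the sketch below groups harvested items by calendar day. It assumes each submission or comment is a record with a `created_utc` Unix timestamp and a `score`, which are the field names Reddit’s API uses. Note that `score` is the net vote count, so it is only an approximation of upvotes, and that this sketch counts written items only. The function names and sample items are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_timelines(items):
    """Return (items per day, summed score per day), keyed by 'YYYY-MM-DD' in UTC."""
    activity = Counter()
    upvotes = Counter()
    for item in items:
        moment = datetime.fromtimestamp(item["created_utc"], tz=timezone.utc)
        day = moment.strftime("%Y-%m-%d")
        activity[day] += 1             # one submission or comment
        upvotes[day] += item["score"]  # net votes that item received
    return dict(activity), dict(upvotes)


def utc_timestamp(year, month, day, hour):
    """Helper for the sample data: Unix timestamp for a UTC date and hour."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()


# Hypothetical sample: two items on 1 February 2019 and one on 2 February 2019.
sample_items = [
    {"created_utc": utc_timestamp(2019, 2, 1, 9), "score": 12},
    {"created_utc": utc_timestamp(2019, 2, 1, 23), "score": 3},
    {"created_utc": utc_timestamp(2019, 2, 2, 0), "score": 7},
]
activity_per_day, upvotes_per_day = daily_timelines(sample_items)
print(activity_per_day)
print(upvotes_per_day)
```

Each resulting dictionary can be exported as a two-column table (day, value) and plotted as a timeline.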

The two timelines, showing activities per day and upvotes per day respectively, have very similar spikes in certain periods. However, the timeline for activities per day represents every activity, i.e. new submissions and comments as well as reactions shown as upvotes, whereas the timeline for upvotes shows only the concrete action of upvoting, in other words reactions only, not written activities. The similar spikes in the two timelines thus illustrate that behaviour on this social medium is very ‘in the moment’.
Venturini writes that “controversies display the social in its most dynamic form” (Venturini, T., 2010a). With their great diversity in the appearance and growth of the spikes, the two timelines point towards highly dynamic behaviour among the users of Reddit as a social media platform. Besides showing that behaviour on Reddit is very ‘in the moment’, the timelines also show another dimension of the typical reaction on the medium: when there is a written activity, an immediate reaction, e.g. an upvote, typically follows. This behaviour could relate to Reddit being a social media platform.

The situation illustrated through the timelines could frame the vaccine debate as a matter of concern, an expression coined by Bruno Latour (Latour, 2004). In short, matters of concern are characterised by being the lived experiences and consequences of a subject (DiSalvo, C. et al., 2014). In relation to vaccines, the activity on the two timelines could be interpreted as the public’s perceived experiences with different aspects of some of the incidents mentioned above, e.g. the increased outbreaks of measles in February 2019.

The current debate on vaccine controversies on Reddit

In our timeline showing activity per day, the biggest spike occurs in February 2019, which encourages us to investigate the current debate in the chosen subreddit further. The timeline illustrating the upvote activity also has a big spike in early 2019. Therefore, we have chosen to explore the activity based on the most commonly found terms in submissions and comments on ‘r/VACCINES’ in February 2019, illustrated below in a heat map created using Cortext.

Heat map showing top 300 terms from February 2019, where the red areas illustrate the most commonly found terms on subreddit ‘r/VACCINES’ in February 2019. See in high resolution here.

The heat map is another way of visually mapping a network, including metadata. Here the red areas illustrate the most commonly found terms on ‘VACCINES’ in February 2019, and the intensity of the red indicates which terms are used more frequently.

The reddest area includes the terms ‘real news’, ‘Andrew Wakefield’, ‘fabricated research’, ‘medical qualification’ and ‘problem for profit’. This is consistent with what we found in the semantic analysis of the Wikipedia category ‘Vaccine controversies’: a general tendency of mistrust. It shows not only a polarisation between pro- and anti-vaccination; it goes beyond this bilateral discussion and shows a broader controversy over vaccines regarding both economics and scientific validity. This underpins the current debate on the controversy, as Venturini defines a controversy as, among other things, reduction-resistant, meaning that the issue in question is “impossible to reduce to a single resuming question” (Venturini, T., 2010a).

Terms like ‘public health risk’ and ‘measles outbreak’ lie in rather pale red areas, which indicates that a situation like a measles outbreak did not have high relevance in the debate on vaccines on this subreddit in February 2019. We presume this would be very different in a scientific or governmental arena, where a public health risk such as a measles outbreak would cause the vaccine controversy debate to increase significantly.


Bastian, M., et al. (2009). Gephi: an open source software for exploring and manipulating networks. ICWSM, 8, 361–362.

Measles | Cases and Outbreaks | CDC. (2019). [online] Available at: [Accessed 25 Feb. 2019].

Deer, B. (2010). Wakefield’s “autistic enterocolitis” under the microscope. BMJ, [online] 340(apr15 2). Available at: [Accessed 25 Feb. 2019].

DiSalvo, C. et al., (2014, April). Making public things: how HCI design can express matters of concern. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Elgaard Jensen et al., (2018). Identifying notions of environment in obesity research using a mixed-methods approach. Obesity Reviews.

Measles in Europe: record number of both sick and immunized. (2019). [online] Available at: [Accessed 25 Feb. 2019].

Evrony, A. and Caplan, A. (2017). The overlooked dangers of anti-vaccination groups’ social media presence. Human Vaccines & Immunotherapeutics, [online] 13(6), pp.1475–1476. Available at: [Accessed 25 Feb. 2019].

Kaplan, S. (2017). The truth about vaccines, autism and Robert F. Kennedy Jr.’s conspiracy theory. [online] The Washington Post. Available at: [Accessed 25 Feb. 2019].

Latour, B. (2004). Why has critique run out of steam? From matters of facts to matters of concern. Critical Inquiry, 30(2).

Law, J., & Singleton, V. (2014). ANT, multiplicity and policy. Critical policy studies, 8(4)

Marres, N. & Moats, D. (2015). Mapping controversies with social Media: The case for symmetry, Social Media + Society, 1(2)

Vaccine hesitancy. (2018). [image] Available at: [Accessed 26 Feb. 2019].

Venturini, T. (2010a). Diving in magma: how to explore controversies with actor-network theory. Public understanding of science, 19(3)

Venturini, T. (2010b). Building on faults: How to represent controversies with digital methods. Public understanding of science, 21(7)

Venturini, T. et al. (2015). Visual Network Analysis.
