The War on Drugs — A techno-anthropological controversy mapping

By Anders, Benjamin, Kriss, Peter & Svend

1 Introduction

Venturini describes controversies as “situations where actors disagree (or better, agree on their disagreement)” (Venturini, 2009). This speaks to the notion of complex conflicts in which the relevant actors dispute the correct interpretation or solution. Controversy mapping is a method that exploits the vast amount of information generated online: by using digital tools for data gathering, manipulation and analysis, researchers can map different networks that support findings about a given controversy (Marres, 2015). Extracting information from the internet is often considered troublesome, since it can be influenced by hidden agendas or digital bias (Marres, 2015), which requires researchers to be able to interpret and organize the empirical data they harvest. As Venturini notes, building the cartography of controversies, that is, crafting devices that can describe social or technical controversies, is not a simple task but a complex one (Venturini, 2009).

This Medium post explains and describes how digital methods can support the mapping of controversies, creating a foundation for later exploration and analysis. We have chosen to explore datasets from three different sources: Wikipedia, Hyphe and Scopus. Each source is meant to highlight a different approach to controversy mapping while complementing the others. Wikipedia has been the primary source; we selected our topic there, which led to a deeper exploration of the categories related to the chosen controversy. We also applied a Hyphe crawl to two Wikipedia articles to illustrate how the articles link to different or similar web pages. Scopus was our last source of data, and our intention was to see how the scientific community perceives arguments for and against drug prohibition. By cross-checking the reference list of the Wikipedia article against searches for scientific articles on Scopus, we ended up with Scopus articles that were either for or against drug prohibition.

This approach of analyzing different datasets from different sources is structured with the intent of continuously going deeper into the controversy, using different platforms. This allows for both a deep and broad understanding.

All the data for the work presented here was harvested between the 18th and the 26th of February 2019.

2 Wikipedia

Data source

Data Harvest

To get the data from the Wikipedia category Drug policy of the United States, we used a Python script that calls the Wikipedia API to crawl all pages that are members of its subcategories and all pages that are members of the subcategories of those subcategories, i.e. we crawled at depth 2.

This gave us 319 articles as part of our category members.
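The script itself is not reproduced in this post. As a minimal sketch of how such a depth-limited category crawl could look, assuming the public MediaWiki API endpoint and its `categorymembers` list, something like the following would work. The function names, the user agent string and the row layout are our own illustrative choices, not the original script.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the English Wikipedia endpoint of the MediaWiki API.
API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_category_members(category):
    """Return (pages, subcategories) for one category via the MediaWiki API."""
    pages, subcats = [], []
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(params)
        request = urllib.request.Request(
            url, headers={"User-Agent": "controversy-mapping-sketch/0.1"}
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        for member in data["query"]["categorymembers"]:
            # Namespace 14 is the Category namespace; the rest are treated as pages.
            if member["ns"] == 14:
                subcats.append(member["title"])
            else:
                pages.append(member["title"])
        if "continue" not in data:
            break
        # The API hands back the parameters needed to request the next batch.
        params.update(data["continue"])
    return pages, subcats


def crawl_category(root, max_depth, fetch=fetch_category_members):
    """Breadth-first crawl; returns (page, category, depth) rows."""
    rows = []
    seen = {root}
    frontier = [root]
    for depth in range(max_depth + 1):
        next_frontier = []
        for category in frontier:
            pages, subcats = fetch(category)
            rows.extend((page, category, depth) for page in pages)
            for sub in subcats:
                if sub not in seen:  # avoid loops between categories
                    seen.add(sub)
                    next_frontier.append(sub)
        frontier = next_frontier
    return rows
```

Calling `crawl_category("Category:Drug policy of the United States", 2)` would return one row per page and category pair, which can be written to a .csv file and read as a bipartite page and category edge list. The `fetch` parameter exists only so the crawl logic can be tried without network access.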

2. All Links

We used a Python script that calls the Wikipedia API to retrieve all links to other Wikipedia pages from each page in the category Drug policy of the United States, i.e. from each of the category members we got from data harvest 1.

This got us all links from all pages in the 319 Wikipedia category members.

3. In-Text Reference

We used a Python script that calls the Wikipedia API to gather the HTML of each of the pages in our category members and then scrapes all links found in the text of each page.

This got us all in-text references from all pages in the 319 Wikipedia category members.
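To illustrate the difference between all links and in-text references, the sketch below keeps only internal Wikipedia links that occur inside paragraph elements of a page's HTML. It is an assumption-laden example, not the script we ran: the class and function names are hypothetical, the HTML is assumed to have been fetched already (for instance with the API's `action=parse`), and for simplicity it skips every link containing a colon, which also drops the few article titles that contain one.

```python
from html.parser import HTMLParser


class InTextLinkParser(HTMLParser):
    """Collect Wikipedia article links that appear inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.paragraph_depth = 0
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraph_depth += 1
        elif tag == "a" and self.paragraph_depth > 0:
            href = dict(attrs).get("href") or ""
            # Keep internal article links; skip namespaced pages such as File: or Help:.
            if href.startswith("/wiki/") and ":" not in href:
                title = href[len("/wiki/"):].split("#")[0]
                if title and title not in self.links:
                    self.links.append(title)

    def handle_endtag(self, tag):
        if tag == "p" and self.paragraph_depth > 0:
            self.paragraph_depth -= 1


def in_text_links(html):
    """Return the de-duplicated in-text link targets of one page, in order."""
    parser = InTextLinkParser()
    parser.feed(html)
    return parser.links
```

Links in navigation boxes, sidebars and reference lists fall outside paragraph elements and are ignored, which is one plausible reason an in-text network ends up smaller than an all-links network.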



Using the file we harvested containing our 319 category members at depth 2 from the category: Drug policy of the United States, we got a bipartite network containing 293 nodes and 310 edges. We did the following in Gephi:

  • Force Atlas 2
  • Scaling 10
  • Prevent overlap.
  • Modularity run, with nodes colored by modularity class

It is interesting to investigate which pages are located between the three clusters, as these pages are recurring under different categories. War on Drugs is located between the orange cluster (depth 0) and the green cluster (depth 1). This could mean it is a very central actor that could be useful in the further investigation of the controversy as it is linked to both depth levels.

The four pages with edges to all three clusters are Medical Cannabis in the United States, Decriminalization of non-medical cannabis in the United States, Legal history of cannabis in the United States and Rohrabacher-Farr Amendment. These category members all relate to cannabis, medical or otherwise; the Rohrabacher-Farr Amendment is the only one that is a specific piece of legislation, and it concerns federal interference with state medical cannabis laws. This indicates that cannabis use, for both medical and recreational purposes, is very central in this controversy, since all levels of depth relate to it. It could therefore be a central topic in the ongoing debate.

All links

The file we gathered from the data harvest was opened in Gephi, which gave us a network consisting of 15183 nodes and 44950 edges. We then applied the following settings to create the network shown below:

  • Force Atlas 2
  • Scaling: 15
  • Gravity 2
  • Coloring by Modularity
  • Node size set to In-Degree min. 10, max. 100

The picture above illustrates an abundance of clusters and smaller cloud formations. In the red cluster we see articles relating to different drug kingpins. In the dark purple cluster we see court cases and different types of elections in different states of the USA. In the orange cluster we see many different medicinal drugs and chemicals, and in the lighter shade of purple we see cannabis-related articles. However, the large number of nodes made it difficult to navigate and identify connected nodes in Gephi. Therefore, we ran the previously used category members script again with the depth level decreased to 1. This provided us with a more manageable network.

By setting the depth level to 1, the network was reduced to 10899 nodes and 30011 edges. The settings applied in Gephi were as follows:

  • Force Atlas 2
  • Scaling: 50
  • Modularity for the coloring of clusters
  • Node size regulated by degree, which means that a node grows when other articles link to it or when the article itself links to other articles.
  • Prevent overlap setting on

After creating the visualization with these settings, we created an overview depicting the main themes in each cluster, which is illustrated below.

In the picture above we see many different clusters. They are divided by color, and the biggest node in each cluster has been highlighted, with the exception of the orange cluster, which is made up of articles on elections in the USA. The node 2000 California Proposition 36 refers to a law that reduces the sentence for drug possession. The gray cluster at the top right holds the node Rafael Márquez, a Mexican football player who was blacklisted in the U.S., accused of having ties with a Mexican drug cartel.

However, we still regard this network, and the number of nodes and edges it contains, as too massive for further analysis directly related to our controversy. The next network is therefore notably smaller, with the aim of letting us draw further conclusions from our visualizations.

In-Text References:

  • Filter out all nodes with a degree below 10
  • Force Atlas 2 run with scaling of 15
  • Color set from using the modularity tool in Gephi
  • Size of nodes set to degree range, size 5–50

This gave us the network shown in the picture below:

In this new network we see five different clusters. However, a qualitative look at the different types of nodes shows that the right side is dominated by various Latin American drug bosses, such as Nazario Moreno González and Abigael González Valencia. The big blue and black nodes on that side are, respectively, Foreign Narcotics Kingpin Designation Act and United States Department of the Treasury, both of which deal with these types of figures in their work.

On the left side, in the green cluster, Drug Enforcement Administration is the biggest node. The green area in general holds more generic information, such as articles on specific types of drugs, the article War on drugs, laws, and agencies. This distinction shows a division in the articles, where some reference actual persons and events and others reference specific organizations, laws and policies.

This divide indicates the different sides in this controversy but does not show the real arguments behind whether the war on drugs is justified or the correct solution to drugs. In order to further explore this controversy, we will in the following go down other data routes. During our work, we uncovered two major Wikipedia articles we will work on: the initial article War on drugs and Arguments for and against drug prohibition.

3 Wikipedia Timelines

War on Drugs

Data Source

Data Harvest

For Google Trends the harvest was done manually. “War on drugs” was used as a search term, settings were global search from 2004 (earliest possible date) until February 2019. Screenshots were then taken of the graph, showing search popularity, the top five countries doing the search and lastly a screenshot of the same search, but limited from global to USA only. The data from the global search was also downloaded as a .csv file for manual inspection.


This graph shows how the article originated just before 2006. Revisions were made extensively in the early years, slowing down to a point where today in 2019, hardly any revisions are made, and the article seems to have stabilized in regards to revisions. For comparison, we searched for “War on drugs” on Google Trends, using the settings Global, resulting in the following graph:

Judging from the peaks in the two graphs, they do not show any distinctive similarities in activity, but the Google Trends graph shows two big spikes around 2014/15 and 2017.

We made the same search, this time showing the countries where the search term was most prevalent:

Here we were surprised, since we expected the USA and the Central and South American countries to rank higher; from our research, this is where the war on drugs is happening and has the biggest impact. The Philippines is clearly at the top, which we discovered is a result of Rodrigo Duterte being elected president in 2016. He started the country's own, very aggressive, war on drugs.

Putting the data from the Wikipedia article on that topic through RAWGraphs resulted in the following graph:

The graph tells much the same story as the graph for the War on drugs article: the article was created in the middle of 2016, when Duterte was elected, activity spiked early before stabilizing, and today in 2019 there is limited activity. However, neither of the RAWGraphs timelines explains the three spikes in searches on Google Trends.

Using the .csv file from Google Trends we see the specific time period these spikes happened:




Limiting our search on Google Trends for War on drugs to the USA only, we see a similar pattern of spikes, indicating that these spikes were not so much a global trend as one situated in the USA.

As a final task, we wanted to carry out a qualitative investigation of why searches peaked in the summer of 2017, to see if the peak could be related to previous peaks or to empirical information extracted throughout the work on Wikipedia or Scopus. This was done with simple Google searches (term: war on drugs 2017), which provided us with the most relevant results. 2017 saw the release of both a new album from the band War on drugs, during the summer, and a movie named War on drugs. We assume that the peak in 2017 was due to the new album and movie, as both show up when doing the search on Google.



Data Harvest

We also ran a Python script that retrieves data from the talk history of an article; however, the time period we were interested in yielded 0 results. We used another script, which calls the Wikipedia API, to provide an overview of how many revisions are made by different unique users. This also made it possible for us to identify the 50 most active users on the page.
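The aggregation behind such an overview can be sketched in a few lines. The example below is a hedged illustration rather than the script we used: it assumes the revisions have already been fetched as (timestamp, user) pairs with ISO 8601 timestamps, as the API's revision listing provides them, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_revisions(revisions):
    """Aggregate (timestamp, user) pairs into monthly counts and per-user totals.

    Returns a list of (month, revision_count, unique_users) tuples sorted by
    month, plus a Counter of revisions per user.
    """
    counts = Counter()
    users_by_month = defaultdict(set)
    edits_by_user = Counter()
    for timestamp, user in revisions:
        month = timestamp[:7]  # "2015-12-03T10:00:00Z" -> "2015-12"
        counts[month] += 1
        users_by_month[month].add(user)
        edits_by_user[user] += 1
    monthly = [
        (month, counts[month], len(users_by_month[month]))
        for month in sorted(counts)
    ]
    return monthly, edits_by_user


def top_users(edits_by_user, n=50):
    """Return the n most active users as (user, revision_count) pairs."""
    return edits_by_user.most_common(n)
```

The monthly tuples map directly onto a three-column .csv file (month, revision count, unique users). Comparing the last two columns is a quick way to spot months where many revisions come from very few users.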


Initially, the impression we got was the same as the two earlier bump graphs; the article is created, heavy revision is being done until it stabilizes, and today, in 2019, barely any revisions are being made. However, during December 2015, a heavy set of revisions was made, prompting further investigation. In the .csv file showing revisions, with months in column A, revisions count in column B and unique users in column C, we see the spike:

Using the Python script described in the Harvest section, we were unable to get any results. We then manually went to the history of the article's Talk page, confirming that there was no discussion of revisions during the time period.

The answer presented itself when inspecting the revision history of the article:

The 41 revisions made during December were not the result of a sudden change in the debate, but of one user, The Anome, making many small changes, most only 4 bytes in size. This is an example of how the medium, and the way this specific user makes revisions, affects the data we have gathered. Following this, we decided to map the top 50 users of the Wikipedia article, as shown in the circle graph below.

For this graph, the size of the circles is determined by the number of revisions made by the individual user in the specific Wikipedia article, and their color indicates the total number of revisions made on Wikipedia by the user. The color starts at a light blue for the smallest grouping and becomes gradually darker the more total revisions the user has made (also displayed in the table below). The group boundaries were set by us, based on the ranges we judged to make sense.

The choice to color code the graph was made as an indicator of the validity of the users, implying that a higher total revision count heightens the validity of the user and their revisions. This approach can be flawed though, as Wikipedia itself acknowledges on its page on Edit Count: “Edit counts do not necessarily reflect the value of a user’s contributions to the Wikipedia project.” On this page, they list 7 reasons why quantity does not always equal quality. This is something we experienced ourselves with the user The Anome, whose 41 very small revisions made a spike on the revision timeline without any significant change in the article or the debate surrounding it. We nevertheless choose to display the colors, as they help show the full story and allow readers to judge for themselves, with the knowledge presented here that quantity does not always equal quality.

4 Hyphe

Data source

Data Harvest

Using Hyphe, we set the source to be the Wikipedia article and did a web crawl at depth 1. For War on drugs this resulted in 107 new pages discovered, and the same for the Arguments article: 107 pages discovered.

Using the prospecting tool in Hyphe, we manually removed some web entities before doing a second crawl a depth deeper. Reasons for removing web entities were:

  • Dead links
  • Too generic
  • Blocked access in the EU, often due to GDPR

After this process, 67 pages remained for War on drugs, 55 for Arguments.

Before the next crawl, each web entity was revisited in the crawl menu, making sure to crawl only the direct links and not the generic websites, as shown in the picture below.

In this example, we only wanted to further crawl the specific BBC article, and not the site itself as this would yield a far too generic result not related to our scope. Therefore we removed the top two links shown in the picture.

After this qualitative edit of our corpus, we made a new crawl to depth 1, including the original Wikipedia article, making the entire process a modified depth 2 crawl. This resulted in 2832 new pages discovered for War on drugs and 2227 for the Arguments article.


The merged file was input to Gephi as a directed, monopartite network. The following settings and actions were performed to create a base map:

  • Some web entities were deemed too generic and occurred often, so they were removed as nodes.
  • Force Atlas 2
  • Scaling: 15
  • Prevent Overlap
  • The color of nodes was set to their attribute From, meaning they were colored to what network they originated from. Purple is War on drugs, orange is Arguments and green is both.
  • Average Degree, size ranging from 11.5 to 100.

This procedure resulted in our base map as shown in the picture below:

The biggest nodes in this network, sized by degree, are blue, which marks them as web pages that both datasets have relations to. The three biggest nodes have 734, 427 and 400 connections respectively. The biggest is an anti-imperialistic journalistic website, a voice of Latin American countries in the spirit of Simón Bolívar. The interesting part is that of its 734 connections to other web pages only three are in-degree, which means that 731 are out-degree links towards other web pages. Its main connections can be seen as orange nodes, which indicates that it is mainly used in the War on drugs web page. The opposite can be seen with the second biggest node in the network, whose connections mostly refer to the web page Arguments for and against drug prohibition. These two web pages can therefore almost be seen as counterparts in the network, each mainly tied to one of the two Wikipedia articles.

From this base network, two further iterations of the network were created. The first was made to represent a merged crawl with a depth of 1. This was done by removing all nodes in Gephi’s Data Laboratory that had the attribute “Discovered, Discovered”, meaning they were not part of the original 67 sources in War on drugs or 55 in Arguments.

The following was done for the new map:

  • Average degree, size 10–50
  • Force Atlas 2
  • Scaling: 15
  • Prevent Overlap
  • Recolored according to their From attribute

This resulted in the following map:

On this map, we see a much smaller network. The number of nodes remaining is 106 and the number of edges 407. There are two very distinct nodes, these being the original Wikipedia articles. We view them as disturbing the network and what we can analyze from it, so we removed them and did the following work on the network again:

  • Average degree, size 5–30
  • Force Atlas 2
  • Scaling: 15
  • Prevent Overlap

This resulted in the following network:

We found it hard to draw any concrete analysis from this new network, as the remaining nodes were simply too few. Therefore the last version of a network from this dataset was created, with the aim of reducing the original dataset to a more comprehensible network while still having enough nodes remaining for it to be valid.

In the last iteration we created, we used the degree range filter in Gephi and removed all nodes that had only 1 edge. After removing these nodes, the network was left with 563 (12.9%) of its nodes and 1771 (31.8%) of its edges. After this we used the same process:

  • Average degree, size 10–50
  • Force Atlas 2
  • Scaling: 15
  • Prevent Overlap
  • Recolored according to their “from” attribute

This resulted in the following map:

The first observation made from this map is that the green nodes, those found only in the Arguments article, are very limited in number compared to the War on drugs nodes and the shared nodes, making up only just over 10%. Another observation is that we have three major nodes, all purple.

The position of these major nodes is relevant when inspecting their surroundings. One of them is situated on the right side of the network and connected to the War on drugs cluster. It is located near other sizeable nodes, such as the Council on Hemispheric Affairs, a news website with a focus on USA politics, and another USA news website with a focus on foreign policy.

The other two of the biggest nodes focus more on the information and reform side of drug issues. It therefore makes sense that they are located near the top left of the network and connected to the cluster of Arguments-only nodes.

As a rough estimate, it is possible to draw a diagonal line across this network, separating the two clusters, with War on drugs at the bottom right and Arguments at the top left, although some nodes are still located in the “wrong” cluster.

The last observation we will highlight on this map is the disproportionate distribution of nodes pertaining to only one of the original Wikipedia articles, with War on drugs having 25.22% of total nodes and Arguments 10.3%. We speculate that this is because the reasoning and sources used in the Arguments article can in many ways be used to argue for or against the war on drugs, and can therefore be present in the War on drugs article or its depth 1 nodes, whereas it does not work the same way the other way around. The war on drugs, and whether it is seen as a positive or negative event, is but one of many lines of reasoning used in the debate surrounding the should-be legality of drugs.

5 Scopus & Cortext

Data source

Data Harvest

A possible source of error in the harvesting process is mistakes made in the initial process, where a few articles might have been missed. Another source of error could be Scopus itself, as some articles might not have shown up on this search portal where they might have on other portals.

A last note regarding the initial harvest is that it only targeted scientific articles. Sources such as governmental publications, news articles, NGO reports, and public speeches could still be relevant and valid to the debate, but they were not included, as Scopus does not include these. This harvest yielded 19 articles on Scopus, of which 5 were categorized as for prohibition and 14 as against.

Two different datasets were compiled from these categories, where we included references used in the articles. This yielded 265 in the for category and 505 in the against category. These were downloaded as .csv files including article information, keywords and abstract, and thereafter merged.
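As an illustration of the merge step, the sketch below concatenates two exported .csv files and tags every row with the dataset it came from, so that the origin can later be used as a node attribute. This is a minimal sketch under our own assumptions: the function name, the file names in the usage note and the label column name `From` are illustrative, and columns missing from one export are simply left empty.

```python
import csv


def merge_with_label(paths_and_labels, out_path, label_column="From"):
    """Concatenate CSV exports, tagging each row with the dataset it came from.

    paths_and_labels is a list of (csv_path, label) pairs. Returns the number
    of data rows written to out_path.
    """
    rows, fieldnames = [], []
    for path, label in paths_and_labels:
        # utf-8-sig transparently drops a byte order mark if the export has one.
        with open(path, newline="", encoding="utf-8-sig") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in fieldnames and name != label_column:
                    fieldnames.append(name)
            for row in reader:
                row[label_column] = label
                rows.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=fieldnames + [label_column],
            restval="",              # fill columns a file did not have
            extrasaction="ignore",   # drop stray values beyond the header
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as `merge_with_label([("for.csv", "for"), ("against.csv", "against")], "merged.csv")` would produce one file in which every record carries its allegiance. Records that occur in both exports are kept twice here; deciding how to label such overlaps is left open.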


  • Bipartite network
  • First node — Author ID with attribute from
  • Second node — Author keyword

We then got a .gexf file and opened it in Gephi:

  • We opened it mixed so we could get indegree and show which keywords were frequently used
  • Force Atlas 2
  • Scaling: 10
  • Prevent Overlap
  • The color of the nodes was set to the attribute from, with the following colors showing their allegiance in the table below

Characteristics of the network:

These settings resulted in the following network:

In the network above we see a clear distinction and clustering between the red nodes and the green nodes. This could point to a difference in focus between the two clusters and the arguments they use to advocate for their cause. If we take a closer look at the clusters we can see, for example, that the red cluster, i.e. the against articles, focuses on genetics, gene vs. environment and depression. This could mean that the arguments formed in this cluster lean towards drug use being a genetic trait.

Taking a look at the green cluster, we see that the most frequently used keywords are adolescence, marijuana, and dependence. This could point to the argument that cannabinoids are addictive and pose a threat to the development of adolescents, as the keyword development is also linked to these others. It supports the argument that drug prohibition is needed to ensure the proper brain development of youths. This is in clear opposition to the red cluster, where the discourse is around genetics vs. environment and not so much brain development in relation to drug use. We can also see that marijuana is located between the two clusters. This could be because it is a frequently used term on both sides. The same goes for adolescents, which could point to a debate on both sides about what marijuana or other drugs do to adolescents.

Taking this dataset further into another route of analysis, we used Cortext. We used the .csv file we gathered from Scopus, which included all the abstracts of the articles, and did an analysis in Cortext via the following steps:

  • The .csv file was converted into a .tsv file, required for Cortext
  • This file was uploaded to Cortext using standard settings, with “robust .csv” as file option. Cortext then indexes the file as a .DB file
  • Using this .DB file we ran the script “Term Extraction” using standard settings, except that we set “Lexical extraction advanced settings” to yes and changed the setting from “sentence level” to “document level”
  • Finally, we ran the “Map Heterogeneous Networks” script to transform the previous Term Extraction file into a network. The settings we specified were in node selection: we set the first and second field to Term, and set 100 as the number of nodes, since the previous script extracted the top 100 terms used
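The first step in the list above, converting the .csv file into a .tsv file, can be done with a spreadsheet program or a few lines of Python. The sketch below shows one possible way; the function name is hypothetical, and collapsing whitespace inside fields is our own precaution, since tabs or line breaks inside an abstract would otherwise break the tab-separated layout.

```python
import csv


def csv_to_tsv(csv_path, tsv_path):
    """Rewrite a comma-separated file as tab-separated, one record per line.

    Returns the number of records written, header included.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8-sig") as source, \
         open(tsv_path, "w", newline="", encoding="utf-8") as target:
        writer = csv.writer(target, delimiter="\t", lineterminator="\n")
        for record in csv.reader(source):
            # Collapse tabs, line breaks and repeated spaces inside each field.
            writer.writerow([" ".join(field.split()) for field in record])
            count += 1
    return count
```

Because the `csv` module parses quoted fields, commas inside titles or abstracts survive the conversion and are not mistaken for column separators.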

These steps provided us with a picture mapped by Cortext and a .gexf file for both networks: articles in the for network and articles in the against network.

In the above picture, we can see the orange, green and purple clusters, situated in the lower and right part of the network, which relate to the same topics shown in our Scopus network of author keywords. In the orange/red cluster the topics concern different mental disorders, while the purple cluster is about genetics and epigenetics, which involves the genetics vs. environment debate. In the green cluster, the topic of drugs and drug use is related to genetic influences and is connected to the red cluster. This is consistent with the Scopus network and further supports the claim that scholars in the against category are discussing whether or not drug use and its effects are genetic traits. However, as the dataset is the same for both networks, it may simply be that the authors chose the same keywords that Cortext extracted via co-occurrence in the abstracts of the articles.

Certain co-occurring keywords are likewise represented in both the Cortext network and the Scopus network for the for articles. However, what the Cortext network shows is an abundance of positivistic research related to drugs and their effects; the purple cluster, interestingly, relates to testing on rats and the development of adolescent rats. The for articles also relate the use of marijuana to cocaine, as a gateway drug, which is otherwise left out in the against articles. This could show a difference between the two factions, where the for prohibition articles put a negative spin on marijuana use as a gateway to harder drugs like cocaine, whereas the against articles either neglect to discuss this or choose not to partake in a debate about the gateway potential of marijuana.

6 Conclusion

Our three different networks on Wikipedia gave us insight into the controversy and a basic understanding of the actors, but our understanding of the controversy was not complete. Seeking to further it, we made timelines for two different articles we found highly relevant; however, these did not enlighten us further or show critical information regarding the controversy. This prompted us to go broader in our data harvesting with our second source.

Our second source of data came from Hyphe, which proved helpful in crawling the internet for related web pages that would be beneficial in the quest to understand the controversy. Our work with Hyphe showed us that, when merging the two networks we crawled, several overlaps occurred, which helped us understand the network of the controversy and gave us even deeper insight into the actors.

Our final source of data came from scraping the Scopus database for relevant scientific articles, according to specific references from the Wikipedia Arguments article, which served the purpose of illustrating specific keywords. This also provided us with a visualization depicting how certain keywords were used by either for or against or used by both sides. Furthermore, our Cortext visualization confirmed the alignment of nodes which the Scopus illustration provided.

Through our work presented here we have achieved two things:

  • We have ourselves gained deep insight into the controversy regarding the war on drugs, especially its issues and actors
  • The visualizations we have made and the arguments made from them could be useful tools for involved actors in the furthering of their arguments within the controversy


Marres, N. (2015). Why Map Issues? On Controversy Analysis as a Digital Method. Science, Technology, & Human Values, 40(5), 655–686.

Medialab (2019). Hyphe [Online].
Accessed 25/02/2019

Radioproject (2019). Making Contact Radio [Online].
Accessed 26/02/2019

Venturini, T. (2009). Diving in magma: how to explore controversies with actor-network theory. Public Understanding of Science, 19(3), 258–273.

War on Drugs (2019). War on drugs [Online].
Accessed 26/02/2019


