A Wikipedia controversy mapping

Kirstine R. Bork

Katrine Pedersen, Kirstine Bork & Christine Halling

How would you define ‘Parenting’? Is it something you settle from a biological point of view, in a courtroom, through storytelling, in public debates or from published scientific articles? Which actors should be given a voice, and on what terms? And who decides this? The answers to these questions seem neither simple nor single-sited. One can only imagine the debate sparked by asking “What is good parenting?”, and the actors would probably agree only on disagreeing about the answer. Altogether, it paints a picture of a controversy. As Tommaso Venturini puts it, “…controversies begin when actors discover that they cannot ignore each other and controversies end when actors manage to work out a solid compromise to live together.” (Venturini:2010:261)

We have chosen to explore this controversy through the medium of Wikipedia. To do so, we present several network visualizations that help map the controversy in different ways, and we try to understand how Wikipedia as a medium affects what you see.
These tell stories about how some topics within the controversy seem closely related, although this might simply be a result of the way Wikipedia is structured. Mapping terms that often occur together in the controversy displays discourses relating to both ‘legislation’ and ‘medicine’. Furthermore, keyword findings show that ‘mother’ appears in more articles than ‘father’, while ‘father’ is more visible in articles surrounding ‘child custody’. Cases from the articles within the controversy raise new questions about its issues and actors. Timelines of revision histories show that the editors of the articles are important actors when it comes to evaluating content and its importance, which influences how the controversy is staged. They also reveal discussions and negotiations between editors who do not always agree.

Gathering data from Wikipedia in the category ‘Parenting’

The data we have gathered is from the online encyclopedia Wikipedia. On Wikipedia there are pages, which are encyclopedia articles about a given subject. These pages are sorted into Categories by the editors who contribute to the pages. Furthermore, we have gathered data on the revisions made on some of the pages, and the users who make these edits voluntarily. This includes the revision history which is a log of all revisions, and discussions on the Talk page, which is where the editors can talk to each other about the revisions on a given page.

Fig. 1 Diagram of how data was gathered, resulting in seven visualizations. We have gathered data from all the pages in the category ‘Parenting’, as well as the pages from the subcategories of that category, and from the subcategories of those categories. Those pages all together will be referred to as ‘member pages/members only’.

We have gathered data by scraping and by interacting with an API. A scraper is a tool that works off the HTML structure of webpages and grabs specific information from them. The HTML structure contains instructions about what the content is (a heading, body text, a link and so on), so the scraper can identify the information it is looking for. A crawler is a tool that navigates through webpages and finds the ones that are needed based on predefined rules. Instead of scraping everything, the crawler can identify where the scraper should scrape. An API, short for Application Programming Interface, is an interface through which a media platform makes selected sets of information available, so that other programs can request them.
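To make the crawling idea concrete, here is a minimal sketch of how the category tree could be walked two levels deep. It is not the pre-made scripts we used. The parameter names follow the public MediaWiki `categorymembers` query, while `crawl_category`, `fetch_members` and the `FAKE_WIKI` data are hypothetical names and values invented for illustration, so the sketch runs without a network connection.

```python
from collections import deque

API_URL = "https://en.wikipedia.org/w/api.php"  # public MediaWiki endpoint


def category_params(category, member_type="page|subcat"):
    """Build query parameters for the MediaWiki 'categorymembers' list."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": member_type,
        "cmlimit": "500",
        "format": "json",
    }


def crawl_category(root, fetch_members, max_depth=2):
    """Collect pages from root, its subcategories and their subcategories.

    fetch_members(category) must return (pages, subcategories). In real use
    it would send category_params(category) to API_URL and parse the JSON.
    """
    pages, seen = set(), {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        found_pages, subcats = fetch_members(category)
        pages.update(found_pages)
        if depth < max_depth:  # stop descending below the second sublevel
            for sub in subcats:
                if sub not in seen:
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return pages


# Hypothetical stand-in for the live API, so the sketch runs offline.
FAKE_WIKI = {
    "Parenting": (["Parenting", "Mother"], ["Fatherhood"]),
    "Fatherhood": (["Father"], ["Child custody"]),
    "Child custody": (["Parenting coordinator"], ["Too deep"]),
    "Too deep": (["Ignored page"], []),
}

member_pages = crawl_category("Parenting", lambda c: FAKE_WIKI[c])
print(sorted(member_pages))
```

Passing the fetch function in as an argument keeps the depth rule separate from the network call. A real fetch function would also have to follow the API's continuation tokens for large categories, which this sketch leaves out.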

To gather specific data from the Wikipedia pages, we used several pre-made Python scripts, which made it possible to construct the different visual networks.

Using the gathered data to visualize networks in order to map relations and tell a story

We used the collected data to construct different visual networks of relationships between the member pages in Gephi. Gephi is a visualization program that builds networks out of nodes (dots) representing e.g. pages or users, and edges (lines) that show whether and how the nodes connect to each other on a given parameter. In Gephi we applied the Giant Component filter to all networks, which means that nodes without a connection to the largest connected part of the network have been removed. We also used tools like Modularity, which sorts nodes into communities depending on how densely they connect to each other on a given parameter. If nodes end up in the same community after running the modularity tool, it means that they are more closely connected to each other than to the rest of the network.
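As an illustration of what a giant component filter does, the sketch below finds the largest connected group of nodes in a small edge list. It is our own plain-Python approximation, not Gephi's implementation, and it assumes that links are treated as undirected. The function name and the example edges are made up.

```python
from collections import defaultdict


def giant_component(edges):
    """Return the node set of the largest connected component."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    best, visited = set(), set()
    for start in neighbours:
        if start in visited:
            continue
        # Depth-first walk that collects everything reachable from start.
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        visited |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical links: three connected pages and one separate island.
example_edges = [
    ("Mother", "Childbirth"),
    ("Childbirth", "Pregnancy"),
    ("Pregnancy", "Mother"),
    ("Island A", "Island B"),
]
kept = giant_component(example_edges)
print(sorted(kept))
```

Nodes outside the returned set correspond to the small islands that the filter removes from the map.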

Fig. 2 shows the three chosen networks and explains what makes the connections between the pages in each individual network. ‘Bipartite’ in the far-right network means that both pages and users are shown as nodes. Each map was made by us and will be explored in its corresponding section.

We chose the networks above as our visualizations in order to compare findings from the different relations between the member pages. The in-text links network is our primary network since it is more topic-based than the others. The co-reference network and the user revision network are then presented and compared with each other and with the in-text links network. All of this serves to map the controversy and to look for a potential media effect.

The thematic landscape of ‘Parenting’ visualized by an in-text links network

The visualization beneath can help to explore the Wikipedian landscape of the topic ‘Parenting’.

Fig. 3 An annotated directed network of pages in the ‘Parenting’ category connected by in-text links. Nodes are Wikipedia pages, and each edge points from a page to the page it cites, which makes it a directed network. Nodes are colored by the modularity tool to show communities and sized by how often they are linked to. Layout ForceAtlas2, degree range 6–115.

This network shows clusters of various sizes made up of Wikipedia pages that cite each other. The thematic clusters are annotated on the network visualization. The network shows how the editors of an article link to other pages they find relevant to the article’s topic.

Something to notice is the page ‘Mother’, which sits almost in the middle of the map with connections to most of the clusters (see the illustration below).

Fig. 3 with a line

This indicates that the editors of the Mother page found it relevant to cite several topics in the landscape of ‘Parenting’, and the other way around. By contrast, editors of e.g. the articles in the cluster ‘Childbirth’ did not find it natural to cite articles in ‘Parenting styles’, and vice versa. This might affect how the topic is presented to you as a reader, and it also tells us something about how Wikipedia works as a medium.
It also led us to explore more of the relations between the pages in order to discover discourses and issues that exist in the controversy ‘Parenting’.

Mapping relations based on external references

Another way we mapped the thematic relationships between the Parenting Wikipedia pages was through a co-reference network, based on the external sources to which each page refers.

Fig. 4 shows some of the references at the bottom of the Wikipedia page Parenting, which the editors of this page refer to as external sources.
Fig. 5 shows the network based on the pages’ shared external references. Making the network involved adjusting the layout settings, e.g. to spread it out and prevent overlap so that the connections became more visible. Then the Giant Component filter was used to remove smaller islands of nodes not connected to the bigger network.

The modularity tool was used to identify 16 communities, and each community was given a name summarizing its theme. The connections within the communities consist of pages having external references in common. The more external references two nodes have in common, the thicker the edge between them.
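The edge weights in a co-reference network can be sketched in a few lines: two pages are connected if they share at least one external reference, and the weight is the number of references they share. The function name, the page-to-reference mapping and the placeholder URLs below are invented for illustration and are not our actual data.

```python
from itertools import combinations


def co_reference_edges(references_by_page):
    """Map each pair of pages to the number of external references they share."""
    edges = {}
    for page_a, page_b in combinations(sorted(references_by_page), 2):
        shared = references_by_page[page_a] & references_by_page[page_b]
        if shared:  # pages with nothing in common get no edge
            edges[(page_a, page_b)] = len(shared)
    return edges


# Hypothetical reference sets with placeholder URLs.
refs = {
    "Breastfeeding": {"example.org/a", "example.org/b", "example.org/c"},
    "Infant formula": {"example.org/a", "example.org/b"},
    "Child custody": {"example.org/d"},
}
edges = co_reference_edges(refs)
print(edges)
```

In this toy data only ‘Breastfeeding’ and ‘Infant formula’ are connected, with a weight of 2, while ‘Child custody’ would end up as an isolated node that a giant component filter removes.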

Some of the smaller communities share a larger number of the same external references. We could speculate whether this is because (1) the literature on a specific issue is limited, (2) the editors are not knowledgeable enough to know of additional literature, (3) there are fewer editors than on other pages, and therefore less known literature, or (4) something completely different. We’ll take a closer look at the network later.

The user revision network with only a simple caption

Fig. 6 shows the communities within the user revision network. We set the Degree range to 2–263 and the Edge weight to 3–229, and then applied the Giant Component filter so the map would only include the users editing 2 pages or more, 3 times or more.
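The effect of these two thresholds can be approximated outside Gephi. The sketch below is an assumption about how the filters combine, not a copy of Gephi's behaviour: it first keeps user-page links with at least 3 edits and then keeps users who still connect to at least 2 pages. Unlike Gephi's degree filter, it only applies the degree rule to users, not to pages. All names and revision data are hypothetical.

```python
from collections import Counter


def filter_user_page_edges(revisions, min_edits=3, min_pages=2):
    """Keep user-page links with enough edits, then users with enough pages.

    revisions is an iterable of (user, page) pairs, one per revision.
    """
    weights = Counter(revisions)  # (user, page) -> number of edits
    strong = {edge: w for edge, w in weights.items() if w >= min_edits}
    pages_per_user = Counter(user for user, _ in strong)
    return {
        (user, page): w
        for (user, page), w in strong.items()
        if pages_per_user[user] >= min_pages
    }


# Hypothetical revision log.
revisions = (
    [("UserA", "Mother")] * 3
    + [("UserA", "Father")] * 4
    + [("UserB", "Mother")] * 5
    + [("UserC", "Mother"), ("UserC", "Father")]
)
kept_edges = filter_user_page_edges(revisions)
print(kept_edges)
```

Here only UserA survives: UserB edits a single page, and UserC edits two pages but each only once.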

How comparing differences and similarities within the three networks tells a story of a media effect

While working on the different networks, a hypothesis started to form: would we see a media effect if we compared all three networks? By a media effect we mean noticeable patterns in the networks that occur because of the way Wikipedia is structured and not because of the controversy itself. To test our hypothesis, we took the in-text links network as our starting point by exporting its modularity classes and importing them into the other two networks.

Fig. 7 shows the three networks with the modularity of the in-text links network imported. Each color on the maps corresponds to the original color of the clusters from the in-text links network.

From this we see how some clusters from the in-text links network also appear as clusters in the two other networks. An example of this is shown in figure 7. Other clusters are a lot more spread out across the networks, and some are only present in two of the networks.

Fig. 8 shows the purple, green and yellow clusters in each network. The clusters are a bit more spread out across the user revision network, yet the majority of the nodes are near each other in defined communities.

Our finding is that the three selected communities from the in-text links network are connected in all three networks: the purple breastfeeding and nursing cluster, the green childbirth and after labour cluster and the yellow pregnancy and fertilization cluster. For the media effect this means that a chunk of the member pages link to each other within their main texts, refer to the same external sources and are edited by the same users. This could say something about e.g. breastfeeding, childbirth and pregnancy being related topics, but it could also say something about the way Wikipedia is structured. Anyone with access to an internet connection can edit any number of Wikipedia pages, link them to each other and use the same external sources as references. If the other editors of those pages do not oppose this, then no other sources or knowledge will be cited.

Fig. 9 is a screenshot from the Wikipedia:about page.

One could speculate that exploring ‘Parenting’ through another medium would give other results and possibly stage the issues differently. If we were to explore this controversy further, this would be worth tracing. If we were to dive even further into Wikipedia and our networks, it would also be interesting to look at individual users from the user revision network, to explore whether the same user refers to the same external sources and the like. Furthermore, we could ask who has authored the external sources in order to map them as actors in the controversy.

Fig. 10 shows how the orange and the pink communities from the in-text network appear differently in the Co-reference network and the User revision network, than the clusters in fig. 8 did.

We found a different media effect concerning these communities. Looking at the orange ‘Parenting styles’ cluster from the in-text links network, we can see that it does not appear as a cluster in the two other networks, where the orange nodes are spread throughout the network. This could possibly tell a story of users referring to the same couple of external sources but in contexts other than ‘Parenting styles’.

Looking at the pink ‘Goddesses’ cluster, we see that it is present in both the in-text links network and the user revision network but not in the co-reference network. This could possibly tell a story of users being engaged in writing about ‘Goddesses’ but referring to a variety of external sources, which according to the in-text links network do not belong in the same cluster.

These possibilities would be interesting to look further into.

A network of co-occurring noun phrases extracted through semantic analysis

Fig. 11 A network of clusters made from 500 nodes of terms extracted from the semantic analysis. When two nodes are connected by an edge, it means that the two noun phrases co-occur. We have annotated some nodes to give an idea of what terms some of the clusters represent.

The semantic analysis comes from a tool called ‘Cortext’, which runs algorithms on large amounts of text in order to decide which terms seem to be important. To explore more of the terrain of ‘Parenting’, a semantic analysis can be useful because it presents phrases that have been found to be meaningful and significant in the text of the articles. The algorithm is built in a way that might help you see differences rather than commonalities, which can be useful when exploring the actors and issues more or less hidden in the landscape of the controversy ‘Parenting’.
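Cortext's own term extraction and weighting are more sophisticated than we can reproduce here. Still, the basic idea of co-occurrence can be sketched under a simple assumption: two terms co-occur when they appear on the same page, and the edge weight is the number of pages where that happens. The function name and the term lists below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(terms_by_page):
    """Count, for each pair of terms, the number of pages containing both."""
    counts = Counter()
    for terms in terms_by_page.values():
        # Deduplicate and sort so each pair has one canonical order.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


# Hypothetical extracted noun phrases per page.
pages = {
    "Breastfeeding": ["infant formula", "developing countries", "breast milk"],
    "Infant formula": ["infant formula", "developing countries"],
    "Child custody": ["custody rights", "family court"],
}
pair_counts = co_occurrence(pages)
print(pair_counts.most_common(3))
```

In this toy example ‘infant formula’ and ‘developing countries’ are linked twice, while no page links ‘infant formula’ with ‘custody rights’, which mirrors the kind of separation between discourses that a co-occurrence map can make visible.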

Fig 12. This is a screenshot of some of the extracted terms from the semantic analysis, organized by how many times each term occurs. In the right column, you see how often they co-occur with other terms.

One thing to notice from this map is how some clusters of terms are closely connected, such as the red and orange to the left, but disconnected from e.g. the green clusters to the right. This tells us that the controversies within ‘Parenting’ have different discourses. For example, when breastfeeding is the subject, you might not be interested in speaking of custody rights, while a discussion of infant formula might tend to involve an interest in developing countries as well.

On another level, one might see how a legal discourse seems to cluster to the left and a medical discourse to the right, surrounded by smaller discourses concerning mythology and media respectively. A thing to remember is that this is the ‘Wikipedian landscape’ of the controversy. One might find different discourses when mapping the controversy through a different medium, with terms occurring together in other ways.

How 12 keywords can be found across the in-text links map and tell stories of issues, actors and controversies

Fig. 13 These in-text links networks show how 12 selected keywords occur in the Wikipedian landscape of the member pages of ‘Parenting’. The bigger the node, the more often the specific keyword is mentioned on that page. Some of the articles with the most mentions of a keyword have been annotated.

We selected keywords using two strategies. The first was inspired by our semantic analysis, which gave us the opportunity to find important terms from the member pages.

The second strategy was to use keywords related to words like ‘issues’ and ‘controversies’, such as ‘conflict’ and ‘differences’, in order to trace possible cases of controversies in the different articles.
In the end we chose the 12 keywords you see in the visualization.

This is a snapshot of some of the outcomes of the keyword search across articles, showing how many times each keyword appeared in total.
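A keyword search of this kind can be sketched as a count of whole-word, case-insensitive matches per page. This is a simplified assumption about the matching rule (for instance, it does not count plural forms), and the function name and sample texts are invented for illustration.

```python
import re


def keyword_counts(text_by_page, keywords):
    """Count whole-word, case-insensitive keyword hits on each page."""
    patterns = {
        kw: re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE) for kw in keywords
    }
    return {
        page: {kw: len(p.findall(text)) for kw, p in patterns.items()}
        for page, text in text_by_page.items()
    }


# Hypothetical page texts, not quotes from Wikipedia.
texts = {
    "Mother's Day": "The tradition began locally. Traditions vary, but the tradition spread.",
    "Parenting coordinator": "A conflict between parents. Conflict resolution.",
}
counts = keyword_counts(texts, ["tradition", "conflict"])
print(counts)
```

The per-page counts could then be used to size the nodes, and summing them across pages gives a total per keyword.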

Looking at the distribution of ‘mother’ and ‘father’ in the networks, you see that ‘mother’ occurs in many clusters and therefore might be present in several controversies and related to many issues concerning the broad topic ‘Parenting’, while ‘father’ seems to be more isolated in the blue cluster of articles. One might notice that ‘custody’ occurs in the same blue cluster as ‘father’. When exploring ‘Parenting’ through Wikipedia, it seems inevitable to connect ‘father’ with the topic ‘child custody’, and one might find a field of issues within this network.

The keyword ‘conflict’ was, among other places, mentioned several times in the article Parenting coordinator. Zooming in on the qualitative data in the article, we noticed a controversy concerning this part of ‘Parenting’, with actors such as lawyers, courts, politicians, the Fourth Amendment and parents.
Mother’s Day was an article where the keyword ‘tradition’ was mentioned several times. Diving into the article, we came to understand how Mother’s Day as a tradition has been negotiated many times across countries and through time. In Germany, for example, many wanted to honor mothers but disagreed on how to do so. Motives for honoring mothers included motivating women to have more children and uniting the nation. The Nazi Party later put the holiday into a totally different context and met opposition from both churches and women’s organizations.
These have been just a few brief examples of how the 12 keywords can unfold the controversy in ways that raise new questions about the issues and actors involved.

Qualitative inquiry into the workers/workings of Wikipedia

Fig. 14 Timelines showing the revision history, i.e. the number of edits made on the page, for ten different pages from January 2001 to January 2019. Most pages were not created or revised much before 2005, and Pregnancy, LGBT Parenting, Child custody, Parental leave and Parenting styles were started later.
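A timeline like these boils down to counting revisions per time period. As a minimal sketch, assuming the revision timestamps are available as ISO 8601 strings in the form the MediaWiki API returns them, edits per year could be counted as follows. The function name and the timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def edits_per_year(timestamps):
    """Count revisions per calendar year from ISO 8601 timestamps."""
    years = Counter()
    for stamp in timestamps:
        # MediaWiki timestamps look like '2006-03-14T09:26:53Z'.
        moment = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        years[moment.year] += 1
    return dict(sorted(years.items()))


# Hypothetical revision timestamps for one page.
stamps = [
    "2005-01-02T10:00:00Z",
    "2005-06-30T12:30:00Z",
    "2008-11-11T08:15:00Z",
]
timeline = edits_per_year(stamps)
print(timeline)
```

Years without any revisions simply do not appear in the result, so a plotting step would need to fill those gaps with zeros.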

First, we will discuss immediate observations about the ten timelines before going into depth with two of them: Mother and Father.

That some pages have little or no revision activity in the beginning compared to the others may be because the subjects were described on other pages and there wasn’t enough content to merit a page of their own. Some of these subjects, like LGBT Parenting, might since have gained more attention and acceptance in society, which would merit their own page in a popular online encyclopedia. There might have been a smaller remark on another page, if there was one at all, that same-sex couples could be parents, but it was not until 2006 that the subject got a page of its own.

That some of the pages do not have a lot of revisions could be explained by the fact that they may once have been a section of another page but were given their own page because of a lack of space on the original page. Editors discuss when something needs a section of its own, and sometimes a section takes up so much space that it is made into its own page. Editors also criticize the fact that some subjects redirect to another page and do not have a page of their own. In this sense it is the editors’ opinion that determines what is important enough to have its own page.

‘Child custody’ became interesting to us when the semantic analysis showed a high activity in relation to this term, but when examining the revision history on the page ‘Child custody’ there was barely any activity. Though the term ‘child custody’ is mentioned many times according to the semantic network, it could be because it is mentioned and talked about on other pages than its own page, and thus it might not be necessary for the subject to have a page of its own.

In the following we will elaborate on the Father and Mother timelines.

Fig. 15 The pages Mother and Father began at the same time and had roughly the same amount of editing activity in the beginning. Both pages were edited a lot more from 2005–2006. The Mother page has a drop in activity in 2008, while the Father page has a somewhat steady editing history until 2018.

These are the two pages we chose to examine further. An interesting difference between them is that, in spite of the steady and active editing history on the Father page, its editors do not discuss the revisions and content of the page very much. On the Mother page, on the other hand, there is a lot more discussion about the content of the page, while fewer edits are made than on the Father page after 2008. The Father page is almost entirely organized in lists with a bit of text, while the Mother page has more text, about 500 characters more than the Father page, and only one or two lists.

Fig. 16 Snapshots of some of the lists on the Father page.

The most interesting discussions on the two pages revolved around motherhood and fatherhood respectively. Both pages discussed it from the perspective of biology. On the Mother page, editors discussed at what time motherhood begins: at the point of (assumed) conception, during pregnancy or after childbirth.

On the Father page, editors discussed fatherhood in relation to sperm and the donation thereof. There were disagreements about the word donation in relation to IVF (in vitro fertilization, an assisted reproductive technology). If a couple is using IVF, is the man then donating sperm to his partner? And what does donating mean? Is donating simply the removal of biological substance from one’s body for whatever purpose, or does it only entail giving away biological substance without being involved in the end result, like a kidney donation? In light of this, it was discussed when fatherhood begins: when expecting a child or when the child is born?

While these discussions can seem simple and a matter of opinion, they feed into a bigger controversy about what (responsibilities) are and should be expected of parents at each stage on their path to and in parenthood.
