Health Informatics: A two-part landscape exploration on the topic of Health Informatics, examining potential media effects and possible controversies on Wikipedia.

Feb 15

By: Cecilie K.R. Bertelsen, Helena A. Haxvig, Joachim Daus-Petersen & Sofia I. Stancheva


Part one: Introducing the Health Informatics category

What is Health Informatics and why is it interesting? Health Informatics is becoming an ever bigger part of our modern healthcare sector, but existing communities perceive and use the concept differently, which may create disagreement about what constitutes the term and whether it is explicitly discussed on internet forums.

We, therefore, aim to explore the topic of Health Informatics on Wikipedia and obtain an overview of its network as well as knowledge of how it is understood by different actors. We identify several thematic clusters; however, it must be noted that the obtained data is biased towards English articles solely from the Wikipedia platform. Furthermore, the algorithms used for interpretation and visualisation of the data are opinionated and carry a certain bias.

Protocol

We explored the seed category of Health Informatics on Wikipedia and all its member pages by using a script that connects to the Wikipedia API (Application Programming Interface). The depth is set to 1, meaning that the script also looks for (crawls) the member pages of the sub-categories associated with our seed category. Calling the API provides us with access to the HTML code for all 931 pages in our category and sub-categories.
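Our harvesting script is not reproduced here, but a minimal sketch of such a depth-limited crawl might look as follows. The `fetch` function stands in for the actual MediaWiki API call (e.g. a `list=categorymembers` query) and is stubbed with a toy dictionary, so the traversal logic can be shown without network access:

```python
# Sketch of a depth-limited category crawl (depth=1: the seed category
# plus its immediate sub-categories). `fetch(category)` is assumed to
# return the members of one category as (title, namespace) pairs;
# namespace 14 marks a sub-category on Wikipedia.
def crawl_category(fetch, seed, depth=1):
    pages, todo = set(), [(seed, depth)]
    while todo:
        category, d = todo.pop()
        for title, ns in fetch(category):
            if ns == 14:          # a sub-category
                if d > 0:         # only descend while depth remains
                    todo.append((title, d - 1))
            else:
                pages.add(title)
    return pages

# Example with a stubbed fetch standing in for the API call:
toy = {
    "Category:Health informatics": [("Health informatics", 0),
                                    ("Category:Telemedicine", 14)],
    "Category:Telemedicine": [("Telemedicine", 0),
                              ("Category:Deeper", 14)],
    "Category:Deeper": [("Remote surgery", 0)],
}
members = crawl_category(toy.__getitem__, "Category:Health informatics")
# depth=1 reaches Telemedicine's pages but not the deeper sub-category's
```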

In this case, we have chosen to run two different scripts: one that obtains (scrapes) all hyperlinks in the body text, and one that scrapes all hyperlinks on a page that refer to other Wikipedia categories or pages.
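The distinction between the two link-harvesting approaches can be illustrated with a stdlib-only sketch (not our actual scripts), assuming that in-text links are roughly those inside `<p>` elements of the page HTML:

```python
from html.parser import HTMLParser

class WikiLinkScraper(HTMLParser):
    """Collects internal /wiki/ links. With in_text_only set, only links
    inside <p> elements (the body text) are kept, approximating the
    InText script; otherwise every link is kept (All_Links)."""
    def __init__(self, in_text_only=False):
        super().__init__()
        self.in_text_only = in_text_only
        self.depth_p = 0
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth_p += 1
        elif tag == "a":
            href = dict(attrs).get("href", "")
            if href.startswith("/wiki/") and (not self.in_text_only or self.depth_p):
                self.links.append(href[len("/wiki/"):])

    def handle_endtag(self, tag):
        if tag == "p":
            self.depth_p -= 1

page = ('<p>See <a href="/wiki/EHealth">eHealth</a>.</p>'
        '<div class="navbox"><a href="/wiki/Telehealth">Telehealth</a></div>')
in_text = WikiLinkScraper(in_text_only=True); in_text.feed(page)
all_links = WikiLinkScraper(); all_links.feed(page)
# in_text.links keeps only the body-text link; all_links keeps both,
# including the navigation-template link.
```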

Both scripts produced two network files each, one with pages of the category connected to each other, and one where the pages are further connected to all other Wikipedia pages they cite (outside the category and sub-categories). We chose to proceed with the second type of files because we didn’t want to exclude the pages outside our category and sub-categories before we could see how they were connected.

Visualisation 1: Diagram showing the sequence/order in which we applied the scripts to harvest data from the 931 pages in our category and sub-categories.

The Health Informatics Network

To get an overview of the category, we chose to visualise the network of the category and subcategories’ pages (i.e. the corpus) in two different versions: one with connections by hyperlinks in the text (InText) and one with connections by all hyperlinks on a page (All_Links).

On these, we applied the layout ForceAtlas2 to explore the networks’ structure. Both contained multiple disconnected nodes, and by applying the force-directed layout, these were pushed further away. The Giant Component filter solved this by removing the disconnected nodes.
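In Gephi, the Giant Component filter keeps only the largest connected component; the equivalent operation can be sketched in plain Python with a breadth-first search over an adjacency dictionary:

```python
from collections import deque

def giant_component(adj):
    """Return the node set of the largest connected component of an
    undirected graph given as {node: set(neighbours)}."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:                 # breadth-first search from `start`
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        best = max(best, comp, key=len)
    return best

adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"},
       "X": {"Y"}, "Y": {"X"}, "Z": set()}
core = giant_component(adj)
# keeps the A-B-C component; the disconnected X-Y pair and isolate Z drop out
```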

To reduce noise, we added the partition filter to remove all the “not a member” nodes from the network and changed the edge colour to grey to enhance the visibility of the nodes.

Visualisation 2 (Left): The InText network, with members and non-members included in the network. Visualisation 3 (Right): The All_Links network, with member and non-member pages included. By applying only ForceAtlas2, we can see that the All_Links network has signs of clusters, while this is not clear in the InText network.

A comparison between the two visualisations (visualisations 2 and 3) indicated a difference in the formation of clusters, appearing in the All_Links network but not in the InText one, which might be caused by the Wikipedia templates. These allow editors to collaborate on covering a topic comprised of multiple related Wikipedia pages.

Snapshot 1: An example of Wikipedia templates from the Wikipedia page Health Informatics. Wikipedia templates are found at the bottom of most pages to provide repetition of information and are “[…] commonly used for boilerplate messages, standard warnings or notices, infoboxes, navigational boxes, and similar purposes.” (Wikipedia, 2018)

For further insight, we continued working with the All_Links network, filtering the degree range to 150–672 in order to remove the nodes that are not well connected, followed by rerunning ForceAtlas2 to prevent overlap between nodes. Thereafter, the node size range was set to 7–50 according to degree, a measure of the number of edges to other nodes, visualised by increasing/decreasing the size of the node. We then calculated modularity, applied colours accordingly, and increased the upper margin of the node sizes to 120.
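The degree-range filter amounts to keeping only nodes whose degree falls inside a window; a minimal sketch over an edge list (with toy thresholds instead of the 150–672 used on the real network):

```python
from collections import Counter

def filter_degree_range(edges, lo, hi):
    """Keep only edges whose endpoints both have a degree within
    [lo, hi], computed on the original, unfiltered graph."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    keep = {n for n, d in degree.items() if lo <= d <= hi}
    return [(a, b) for a, b in edges if a in keep and b in keep]

edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b"), ("c", "d")]
kept = filter_degree_range(edges, 2, 3)
# ("c", "d") is dropped because node "d" only has degree 1
```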

Visualisation 4: Visualisation of the network for the 931 Wikipedia pages in the category Health Informatics and its sub-categories as well as pages outside the category, connected by all hyperlinks on a page, with annotations related to the different visible clusters.

In order to juxtapose the two networks, the same procedure was applied to the InText network, which produced faint clusters compared to the All_Links network. This led us to suspect that the difference lies in the way the two scripts harvest hyperlinks, because All_Links includes references in the Wikipedia templates, which creates a media effect. By looking at some of the articles, we noticed a shared use of templates and hyperlinks connecting the pages.

The four big clusters might indicate high-degree nodes attracting specific crowds in the network.

Six derivatives of the All_Links network

We tried to get an overview of what types of pages the clusters in the All_Links network contained and discussed what might be interesting to gain insight into, which resulted in a list of 15 keywords we wanted to harvest data on in relation to our corpus.

We ran the keyword-search script using the “category members.json-file” as our input, which contains all 931 page titles. The script was run on all pages in our data corpus while using a wildcard, thus accepting different endings to the keywords (e.g. policy, policies).
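Our keyword-search script is not reproduced here, but the wildcard matching it performs might be sketched with regular expressions, where a trailing * accepts any word ending:

```python
import re

def keyword_counts(text, keywords):
    """Count occurrences of each wildcard keyword in a page's text.
    A trailing * accepts any word ending (polic* -> policy, policies);
    keywords without * are matched as whole words."""
    counts = {}
    for kw in keywords:
        if kw.endswith("*"):
            pattern = r"\b" + re.escape(kw[:-1]) + r"\w*"
        else:
            pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

text = "Health policy shapes policies; patient safety and data security matter."
counts = keyword_counts(text, ["polic*", "patient", "safe*", "secur*"])
# "polic*" matches both "policy" and "policies"
```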

Table 1: A snapshot of the data from our keyword search imported into Excel, showing the results of 21 pages in relation to our 15 keywords, where the rows are sorted by the occurrences of the keyword “patient” from high to low.

Afterwards, we imported the results into our All_Links network and visualised each keyword to see which were the most interesting, and ended up choosing the following for further investigation: efficien*, legislat*, law, polic*, safe*, secur*.

Visualisation 5: Six representations of the All_Links network (with the aforementioned layout applied) where the node size is based on the number of times a specific keyword is mentioned in a page in the network. The nodes are sized by the degree 7–60 and coloured according to modularity.

To better understand how the keywords were used in their respective articles, we manually looked through them. We see law, legislat* and secur* mostly in the green cluster, in the context of data security and privacy. Meanwhile, polic* and safe* are mostly used in the purple cluster in the context of medical trials, and efficien* is prominent in both purple and green clusters, in the context of technology and data efficiency, with a slight mention of healthcare quality.

In summary, there are indications of different focus areas in the green and purple clusters: the former mostly deals with concerns about the privacy of data used in health information systems, whereas the latter is more concerned with medicine and clinical trials and only scarcely addresses health information systems.

Timelines for edit history of two selected Wikipedia pages

We created two timelines of the pages Medical Record (MR) and Evidence-Based Medicine (EBM), respectively, in relation to revision count and unique members, based on the findings from our keyword search networks. These showed that the words legislat* and law are clearly related to MR, while polic* is related to EBM, which is located in another cluster. The words are somehow related to each other meaning-wise, but they seem to be context- or area-specific.

Visualisation 6: Revision timelines of the Wikipedia pages Medical Record and Evidence-Based Medicine, visualising both the number of revisions (the blue line) and the number of unique users doing the revisions (the orange bars) since 2004 and 2001 respectively, when the articles were first written. Dates are written in dd-mm-yyyy format.

In both diagrams we observe the occasional high peaks in the number of revisions within a short amount of time by a relatively low number of users, potentially indicating a dispute within the community. This can be further examined in the revision histories of the pages, where we find the specific revisions and their comments as well as the talk-pages where users discuss revisions.
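The two series plotted in each timeline (revisions and unique users per month) can be derived from revision records as sketched below, assuming each record is an (ISO timestamp, username) pair of the kind the MediaWiki API returns for a page's revision history:

```python
from collections import defaultdict

def monthly_activity(revisions):
    """Aggregate (timestamp, user) revision records into per-month
    (revision count, unique user count) pairs. Timestamps are assumed
    to be ISO 8601 strings, e.g. '2009-08-14T10:02:00Z'."""
    users_by_month = defaultdict(set)
    revs_by_month = defaultdict(int)
    for ts, user in revisions:
        month = ts[:7]                  # 'YYYY-MM'
        revs_by_month[month] += 1
        users_by_month[month].add(user)
    return {m: (revs_by_month[m], len(users_by_month[m]))
            for m in sorted(revs_by_month)}

revs = [("2009-08-01T12:00:00Z", "A"), ("2009-08-02T09:30:00Z", "A"),
        ("2009-08-05T16:45:00Z", "B"), ("2009-09-01T08:00:00Z", "C")]
timeline = monthly_activity(revs)
# August 2009: 3 revisions by 2 unique users; September: 1 by 1
```

A month with many revisions but few unique users then shows up directly as a high first value paired with a low second one, the pattern we read as a possible dispute.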

Snapshot 2: Snapshot from the revision history of the Wikipedia page “Medical Record”, showing the four most recent revisions and their comments highlighted with a red line.
Snapshot 3: Snapshot from the talk-page of the Wikipedia page “Medical Record”, showing two different users discussing changes in the “Introductory paragraph” of the page.

We decided to look at revisions between 01–12–2007 and 01–01–2010 (dd-mm-yyyy) in MR. We later discovered that the page was merged in 2005, which might explain why the number of revisions is so low. However, we did not find anything of interest here, only a short remark regarding the content of the page in relation to ethics, showing bias against certain perspectives on the topic. Other than that, the talk threads are relatively short, with few and polite replies.

While examining the revisions between 01–07–2009 and 01–01–2010 in EBM, we can see from the Contents table that the “Criticism” section has been the topic of a long debate. This can explain the rise in revisions. Among the users, there is one prominent reviewer, who has left the majority of the comments on the talk-page, often at the end of the threads. This user’s activity is probably the reason for the low unique user count for this period. Additionally, in section 2 we notice an argument where a user boldly expresses his displeasure at how his edits were handled. He has a similarly high count of replies as the aforementioned user, though heavily concentrated in his own thread.

Snapshot 4: Part of the contents table of the EBM Talk page.

A network of co-occurring noun phrases extracted through semantic analysis

Visualisation 7: A visualisation of a semantic analysis network of the full-text pages from the Wikipedia category of Health Informatics, made in CorTexT (a platform for textual corpus analysis). Each cluster is numbered in relation to the description below.

It seems that the semantic analysis reshapes the clusters into smaller but more specific ones, which we will briefly describe.

Firstly, number 1 mainly consists of health information and technology related pages. It has edges to number 2, made up of information exchange and different health record systems, as well as edges to number 3, which comprises informatics and different associations engaged within the field of informatics and health sciences. Secondly, number 4 is concerned with health information and management systems. This cluster is connected to the network through only two nodes: the health systems node in number 5, and the health informatics node within number 3. The same is the case with number 6, concerning health care and care providers, which is connected to the rest of the network only by the test result node in the orange cluster. Number 7 is more mixed, with words mentioning different diseases along with data protection, patient safety, medicine and research, while number 8 is centred on the word management.

This reshaping of the clusters might indicate that even though some topics are well connected (and showed as one cluster in the All_Links network), the way they are discussed might be different, resulting in smaller and more specific groupings/clusters visualised in the network of the semantic analysis.

What might be interesting here is that the node health informatics only plays a minor role in the network, even though this is the Wikipedia category and thus also the title of the main article of this category. Even though the algorithms of CorTexT force the clusters apart from each other, we still expected that health informatics would play a bigger role and have a higher node degree.
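CorTexT’s internal pipeline is not reproduced here, but once noun phrases have been extracted per page, the co-occurrence network underlying such a semantic map might be built as in this sketch, where two phrases are linked each time they appear in the same page:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(documents):
    """Build a weighted co-occurrence edge list from per-document lists
    of noun phrases: two phrases are linked once per document in which
    they both appear (here, a document is one page's full text)."""
    edges = Counter()
    for phrases in documents:
        # sort so each pair has one canonical (a, b) orientation
        for a, b in combinations(sorted(set(phrases)), 2):
            edges[(a, b)] += 1
    return edges

docs = [["health information", "electronic health record", "patient safety"],
        ["health information", "patient safety"]]
net = cooccurrence_network(docs)
# "health information" and "patient safety" co-occur in both documents
```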

Part two: Introducing user-revision networks

From the initial examination, we can say that Health Informatics as an overall topic does not leave clear signs of controversial issues on Wikipedia. We base this on a stable network of the Health Informatics category with no clear clustering. Furthermore, a qualitative analysis of a selected sample from the data corpus showed a relatively stable discussion about page content on the Wikipedia talk-pages. This sparks curiosity, as we know from the world “outside” of Wikipedia that certain topics, issues and cases within the framework of Health Informatics are controversial. In a Danish context, Sundhedsplatformen [Health Care Platform], Telemedicine and the implementation of clinical logistics systems could be mentioned as examples.

An explanation for the stable conditions of the category might be found in the way “guardian” users convene around protecting the topic from “outsiders’” revisions. We see signs of the same unique users maintaining the pages included in our data corpus, and of Wikipedia templates shaping page content. In order to conclude something about how certain elements create a media effect that influences the stability of our corpus, we need to examine this further.

The idea is to gather connected items and encircle them in what Latour et al. (2012) have called a monad. Thus we deal with a collecting activity, where the monad “gathers, assembles, specifies, grasps, encapsulates, envelops those attributes in a unique way” (Latour et al., 2012: 608), and these are what we wish to explore.

Looking at the media effects within the category makes it relevant to turn towards the Affirmative mode proposed by Marres & Moats (2015), which suggests considering “how platform-specific dynamics may be an indicator of issue activity” (Marres & Moats, 2015: 9). As we have already seen how the Wikipedia templates might influence the structure of a network, we wonder: what else might platform-specific dynamics do?

Protocol: Exploring user-revision networks

User-revision networks:

  • Input: “category members.json-file”
  • Harvesting user and revision data on corpus pages
  • Time period: 2004–2019
  • Output: two networks with bots and two networks with bots excluded
  • Monopartite network: nodes are pages and edges are users
  • Bipartite network: nodes are both pages and users, where edges are the number of revisions
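Assuming the harvested revision data reduces to (user, page) pairs, the two network types listed above might be built as in this sketch (not our actual script):

```python
from collections import Counter
from itertools import combinations

def build_networks(revisions):
    """From (user, page) revision records, build:
    - a bipartite edge list: (user, page) -> number of revisions
    - a monopartite projection: (page, page) -> number of users who
      have revised both pages (the pages are the nodes, users the ties)."""
    bipartite = Counter(revisions)
    pages_by_user = {}
    for user, page in revisions:
        pages_by_user.setdefault(user, set()).add(page)
    monopartite = Counter()
    for pages in pages_by_user.values():
        for a, b in combinations(sorted(pages), 2):
            monopartite[(a, b)] += 1
    return bipartite, monopartite

revs = [("U1", "EHealth"), ("U1", "Telemedicine"), ("U1", "EHealth"),
        ("U2", "Telemedicine"), ("U2", "Medical record")]
bi, mono = build_networks(revs)
# U1 revised EHealth twice; only U1 links EHealth and Telemedicine
```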

Revision timelines:

  • Timelines of 10 selected pages which were harvested and created as explained previously, c.f. visualisation 1

Keyword search:

  • Input: “category members.json-file”
  • Harvesting revision history of data corpus pages from 2004–2019
  • Search for keywords in the harvested data
  • Output: .csv file of pages and keywords in relation to their occurrence in each page’s revision history

Extended user-revision networks:

Script 1: Creating user list based on revisions on pages in the corpus

  • Input: “category members.json-file”
  • Harvesting a list of users based on revisions on pages in the corpus, with bots blacklisted
  • Time period: 2001–2019
  • Output: .json file with list of users

Script 2: Harvest revisions from users on member and non-member pages of data corpus

  • Input: .json file with list of users
  • Blacklisting bots and filtering users and revisions to only harvest the top 10 most active users, with a maximum of 100,000 revisions per user
  • Output: .json file of users and their revisions
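The filtering step in Script 2 might be sketched as follows; the bot blacklist is passed in explicitly, since our actual blacklist is not reproduced here:

```python
from collections import Counter

def top_active_users(revisions, bot_blacklist, n=10, cap=100_000):
    """Return the n most active non-bot users from (user, page) revision
    records, with each user's revision count capped at `cap`."""
    counts = Counter(user for user, _ in revisions
                     if user not in bot_blacklist)
    capped = {u: min(c, cap) for u, c in counts.items()}
    return sorted(capped, key=capped.get, reverse=True)[:n]

revs = [("ClueBot", "p1")] * 5 + [("Alice", "p1")] * 3 + [("Bob", "p2")] * 2
top = top_active_users(revs, bot_blacklist={"ClueBot"}, n=2)
# the bot is excluded despite having the most revisions
```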

Script 3: Build an extended bipartite network

  • Input: .json file of users and their revisions
  • Filtering pages outside the corpus with under 10 revisions
  • Filtering out “Wikipedia about” pages
  • Output: Bipartite network file of users and revisions beyond our corpus

Script 4: Timeline of top active users’ revisions

  • Input: .json file of users and their revisions
  • Output: .csv file with top most active users and their revisions from 2001–2019

Revision history: the relationship between pages and number of users revising them

To gain insight into the relationship between the users and the category pages, we harvested data on all revisions made in the category, which produced a mono- and a bipartite network. The former visualises the pages as nodes, with the edges being users doing revisions (c.f. visualisation 8), whereas the latter visualises both users and pages as nodes, with the edges representing revisions (c.f. visualisation 9).

Visualisation 8: Monopartite network of pages and user revisions, where we applied the algorithm ForceAtlas2 with Prevent Overlap on and scaling set to 10. The attribute “degree range” was set to 131–810 and in order to rank the nodes by degree, the size range was set to min=2 and max=500. The modularity for the entire network was calculated and the nodes coloured accordingly.

The monopartite network comprises an extremely interconnected network of 834 nodes and 232,983 edges, which illustrates a very closely connected community. However, we can only see that the users tie the network together, not how they are related to the pages, and thus another type of network is needed.

Visualisation 9: Bipartite network of users and pages as nodes, with edges being revisions. We applied ForceAtlas2 with Prevent Overlap on, along with the partition filter in order to colour the nodes by type (users: green, pages: red). We deleted the node representing Wikipedia’s “mainframe” and applied the layout Expansion to gain a better overview, followed by a ranking of the nodes by degree 50–300 in order to view the most revised pages and most active users.
Visualisation 10: Bipartite network of users and pages as nodes, with edges being revisions. The protocol for visualising the network is the same as the one before, but in this, all bot-users have been removed, in order to determine whether or not these might have a media effect on the network visualisation.

According to the two bipartite networks (visualisations 9 and 10) and the four tables below, we can see how the presence of bots influences the list of top 10 users, seeing as they occupy four spots in the list, whereas their presence does not affect the top 10 pages in the category (other than the weighted degree). This might be explained by the fact that some bots are created by users to enforce rulesets on pages, and thus just extend the user’s reach on specific pages; however, these bots must fulfil the Wikipedia bot policy and be approved by the Bot Approvals Group.

Table 2: Top tables show the top 10 users with and without bots, as well as how many pages they have edited (Degree) and how many edits they have done (Weighted degree) within our category. Bottom tables show the top 10 pages with and without bot edits, as well as how many users have edited it (Degree) and how many times it was edited in total (Weighted degree).

To understand the role of the bots, we adopted a precautionary mode instead of excluding them before starting the data analysis (Marres & Moats, 2015). The idea is to see whether the bots distort the network or whether they play an important role, before eventually excluding them. However, it can be difficult to evaluate the influence of bots within the user-revision network, seeing as they take on different tasks: some fix grammar, misspellings and references, while others revert vandalism and blanking of pages. The latter are, therefore, more likely to be sensitive to issues and threats that can destabilise Wikipedia pages, and thus play an important role in the network (Marres & Moats, 2015). However, the exclusion of bots shows only minor differences in the interconnection of the network, so we expect that the network is mostly maintained and kept stable through revisions conducted by the users.

Revision Timelines

The bipartite network provided an overview of the top 10 pages with the most users (not including bots) doing revisions, which we decided to look further into by creating a timeline for each page, showing how the pages evolved over time. Here we observe a mixture of peaks and slow periods, as well as smaller peaks with a lot of users. From these we selected a sample of the, in our opinion, most interesting ones, which could point us towards what has caused them to attract a high number of unique users.

Visualisation 11: During the peak between 06–2007 and 07–2008 there is discussion regarding the merge of the pages Electronic Health Record, Electronic Medical Record and Personal Health Record. The second peak, around 02–2010, consists of small revisions, while in the peak around 02–2012 we find “edit wars” (InformaticsMD).
Visualisation 12: Between 12–2006 and 05–2007 users are deleting and adding large amounts of text. In the second identified group of spikes around 08–2010, there is one user doing multiple minor edits in one day, possibly explaining the peaks.
Visualisation 13: During 08–2009 a user argues that the Criticism section is biased. During 09–2012 and 11–2012 we notice the presence of user MistyMorn in the revisions, also involved in a dispute. In 10–2014 there are minor changes, as well as a deleted section detected as possible vandalism.
Visualisation 14: The 08–2008 peak is caused by a user continuously adding promotional links, and another reverting them. During 04–2013 and 07–2013 a Chinese Health Information Management student makes multiple edits. Furthermore, there is a discussion on whether to include Hong Kong in one of the sections, indicating a possible bias.
Visualisation 15: In 10–2009 and 02–2013 the spike is due to a lot of vandalism by the same user. Additionally, in 02–2013 there is a dispute over the usage of external links, leading to users making and reverting edits.
Visualisation 16: Between 06–2006 and 08–2006 there are many small edits by both anonymous and public users. Between 04–2009 and 09–2009 we see vandalism as well as two users reverting each other’s edits four times. Between 09–2009 and 07–2010 there is text deletion by user Toyokuni3 saying that the text was “verbatim from newspaper” and there are several negative posts on the talk-page regarding the article’s poor quality and commercial interest.
Visualisation 17: In 12–2008 to 02–2009 the page was blanked twice, possibly being a contributing factor to this spike. In 04–2016 a user adds a paragraph that copies and replaces the rest of the Wikipedia page. This is detected as vandalism by a bot.
Visualisation 18: All activity for this page seems to be focused between 04–2013 and 07–2013. The many revisions are also reflected on the talk-page, where Penbat, a self-proclaimed page-owner, deletes suggestions, insisting on actual changes being made to the page instead. Markworthen is also highly active and claims that the text is not neutral enough and prompts for discussion in the talk-page, posting Wikipedia guidelines, but is unsuccessful in sparking a debate.
Visualisation 19: The large peak between 03–2009 and 06–2009 occurs because of multiple blankings and vandalism (mostly detected by ClueBot), in addition to general edits and disagreements about external links. In 02–2013 an anonymous user blanked the page five times, causing multiple revisions. In 01–2017 bots do most of the edits, while later, in 06–2017, unknown users start deleting the page with bots reverting it.
Visualisation 20: During 12–2012 to 04–2013 Myfitnesscompanion made edits which were reverted due to self-promotion and spam (user is now blocked). Between 06–2015 and 11–2015 Bluerasberry adds a link twice which is also reverted. This is brought up for discussion on the talk-page resulting in a long and polite thread about the use of links according to Wikipedia’s policy. In 10–2015 there is a dispute over a large revision and its reversion, where user Dreamyshade accuses others of COI and sockpuppetry and Bluerasberry tries to resolve it.

In summary, we see that some of the pages, especially ELIZA, Telemedicine and Robot-assisted surgery, have been frequent targets of vandalism, including “blanking”, a term for users deleting sections or entire pages within Wikipedia. This tends to be detected primarily by bots. Those aside, we also see some users maintaining the pages. But among the range of different users making revisions, we see only a few re-occurrences within the 10 pages, showing only a few initial signs of our supposed guardians.

On the main article, Health Informatics, an interesting peak is seen in 01–2009, where one user does the majority of edits, e.g. replacing external links with internal links, indicating an effort to keep users within Wikipedia’s sphere.

Furthermore, we found disputes over external links, i.e. what kind of links should be added to the pages. We tried to see if it was possible to find a pattern of a “reference codex” within the chosen journals and sources of the category by building a co-reference network, but this data proved difficult to harvest without errors.

Extended keyword search

During our qualitative analysis of the spikes on the 10 timelines, we looked for specific words in the revision histories and talk-pages that were common indicators or triggers of disputes over the content of the pages. This resulted in a list of keywords, which we wanted to examine further to see if they were present on other pages or unique to one. The keywords are the following: advertis*, COI, conflict of interest, edit war*, edit-war*, editwar*, meatpuppet*, sockpuppet*, revert*, self-promotion, self promotion, spam, vandalis*, blank*, disrupt*, repeating characters, undid.

Seeing as these keywords were mostly used in the comments for revisions (e.g. snapshot 5), we ran a script to search for the keywords on the revision history pages of our category pages.
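A sketch of such a keyword search over revision comments, folding spelling variants like “edit war*”, “edit-war*” and “editwar*” into a single pattern (the patterns shown are only a subset of our keyword list):

```python
import re

# Variants such as "edit war*", "edit-war*" and "editwar*" can be folded
# into one pattern with an optional separator between the two words.
PATTERNS = {
    "edit war*": re.compile(r"\bedit[-\s]?war\w*", re.IGNORECASE),
    "vandalis*": re.compile(r"\bvandalis\w*", re.IGNORECASE),
    "revert*":   re.compile(r"\brevert\w*", re.IGNORECASE),
    "undid":     re.compile(r"\bundid\b", re.IGNORECASE),
}

def dispute_signals(comments):
    """Count how often each dispute keyword appears in a page's
    revision comments."""
    return {kw: sum(len(p.findall(c)) for c in comments)
            for kw, p in PATTERNS.items()}

comments = ["Undid revision 123 by Example (vandalism)",
            "Reverted blanking; possible edit-warring"]
signals = dispute_signals(comments)
# each of the four keywords is hit exactly once in these two comments
```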

Snapshot 5: An example of a comment in the revision of DSM-5. After a user blanked the page, another user reverts to the previous version of the page. Here we see some of our keywords: undid, reverted, vandalism, blanking.

Applying the results from the keyword search to our All_Links network allows us to visualise how many times the keywords are present on the revision history pages in the category. Due to the nature of our chosen keywords, we might be able to visualise the extent of issues such as vandalism or edit wars within our category.

Visualisation 21: Ten representations of the All_Links network (with the aforementioned layout applied) where the node size is based on the number of times a specific keyword is mentioned in a page’s revision history. The nodes are sized by the degree 7–60 and coloured according to modularity classes.

We might say that if the occurrence of a selected keyword is low in the category or on a page, then the category is relatively stable, while a high number would probably mean that the category (or the content) is somehow controversial or that someone is trying to destabilise it. A few examples of this:

  • “Spam” occurs several times in all clusters and categories, which indicates that the pages might be “under attack” by users who are not serious about the content. The same can be said about the occurrence of “vandalism”, even though it’s not equally frequent.
  • “Undid” and “Revert” are synonyms, and both occur on several pages within the category, which might suggest that edit warring is going on or that some users are very pernickety about other users’ changes.
  • “Self promotion” and “edit war” are only evident in one page each, which might indicate that our category is not very susceptible to users self-promoting or using sockpuppet accounts.

However, when looking at the visualisation we should keep some factors in mind, e.g. that a high vandalism count might not necessarily represent the existence of controversies within a field or page but might just show that some users are very fond of deleting or inserting random things, being so-called “internet-trolls” provoking reactions for the fun of it.

Timeline of unique users doing revisions

The timeline illustrates the top 10 users and the number of revisions done within the category. We can see that there are few to no revisions in the beginning, with only a few users being active: the user Rjwilmsi became slightly active within the period 2006–2008, joined by Physadvoc, who was responsible for the peak in 2007. Obiwankenobi and Mikael Häggström started making a few revisions from late 2008–2009, but not a lot happened between late 2008 and 2010, until Häggström created a few peaks of 50+ revisions in 2011. The rest of the users sporadically started making revisions in the period 2012–2015.

The timeline shows three top spikes of 300+ revisions connected to different users: Physadvoc, Obiwankenobi and Truebreath. The latter was only active within the peak period of 05–2014 to 08–2014 but was banned shortly after for plagiarising. From late 2013 onwards, the timeline shows a more consistent flow of revisions.

The users’ most active periods differ in a way that almost allows them to fill in each other’s gaps where a low number of revisions are made, nearly creating a revision baseline for the timeline and indicating a relatively stable number of revisions throughout the years.

Visualisation 22: Timeline showing the top 10 unique users, indicated by the use of different colours shown to the right, and the number of revisions they have done to the pages within the category of Health Informatics.

Network of top 10 users and pages

To further examine our 10 supposed guardians’ relation to our category, we created a visualisation of a bipartite network containing the top 10 users and the pages, both within and outside our category, that they have revised during the period 01–01–2001 to 01–03–2019. In the script scraping the users and pages, an edge filter of 10 has been applied to the pages outside our category, thus excluding pages with fewer than 10 revisions in total made by the top 10 users. Visualisation 23 illustrates the base network, and the following visualisations are derivatives of this network, visualising different interesting aspects.
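That edge filter might be sketched as follows, again assuming revision records as (user, page) pairs:

```python
from collections import Counter

def filter_outside_pages(revisions, corpus, min_revisions=10):
    """Keep (user, page) revision records for pages in the corpus, and
    for outside pages only if they received at least `min_revisions`
    revisions in total from the harvested users."""
    totals = Counter(page for _, page in revisions)
    return [(u, p) for u, p in revisions
            if p in corpus or totals[p] >= min_revisions]

revs = ([("U1", "EHealth")] + [("U1", "User:U1")] * 12
        + [("U2", "Obscure page")] * 3)
kept = filter_outside_pages(revs, corpus={"EHealth"}, min_revisions=10)
# EHealth is kept (member), User:U1 is kept (12 >= 10 revisions),
# and the outside page with only 3 revisions is dropped
```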

Visualisation 23: Bipartite network of the top 10 users in the category (green annotated nodes) and the Wikipedia pages they have revised (red nodes). We applied the ForceAtlas2 layout and an Expansion of 4 to improve the visualisation. The size of the user nodes is set to 100, whereas the page nodes are set to 40, to give a better overview of the users.

From visualisation 23 we can see that some users (e.g. Rjwilmsi and Jytdog) are connected to a vast number of pages relative to other users, which means that they revise many different pages; at this point we estimate that the number of pages these two users have revised is higher than the number of pages within the category. To figure out whether the users mainly focus on the pages within or outside our corpus, we applied a filter that makes it possible to distinguish between member and non-member pages, c.f. visualisation 24.

Visualisation 24: This gif visualises the difference between two derivatives of the bipartite network of top 10 users and revised pages. The green annotated nodes are users, the red nodes are pages within our corpus and the light blue nodes represent pages outside our corpus. The red nodes are sized by weighted in-degree and the blue nodes are sized by the value 15.

Visualisation 24 shows that the users are connected by only a handful of member pages, and in terms of revisions on pages outside the corpus they are somewhat divided. The top 10 users thus seem to have diverging foci outside the category, since most of the non-member pages cluster tightly around a single user. To gain insight into the vague division amongst the pages in our corpus and the sharp division amongst the pages outside it, we examine the two networks one by one.

Visualisation 25: Bipartite network visualising the top 10 users and the pages in the corpus, with the page nodes sized by degree, to determine whether there is common ground between the users in relation to the corpus.

Visualisation 25 illustrates that the corpus pages connecting the users are mostly broad topics, e.g. “Electronic Health Record” and “EHealth”. However, the pages bridging the users represent only a small share of the corpus pages, indicating that the users’ main foci are not necessarily the same.
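Such bridging pages can be identified by computing, for each corpus page, how many distinct top users revised it, i.e. its degree in the bipartite network. A sketch under the same assumed data shape as above:

```python
from collections import defaultdict

def bridging_pages(edges, member_pages, min_users=2):
    """Corpus pages ranked by how many distinct users revised them.

    edges: iterable of (user, page) pairs.
    Returns (page, user_count) tuples, highest counts first.
    """
    users_per_page = defaultdict(set)
    for user, page in edges:
        if page in member_pages:
            users_per_page[page].add(user)
    ranked = [
        (page, len(users))
        for page, users in users_per_page.items()
        if len(users) >= min_users
    ]
    # Sort by descending user count, then alphabetically for stability.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

Pages revised by many of the top 10 users (a high count here) are the broad bridging topics visible in the centre of Visualisation 25.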

Visualisation 26: Bipartite network with the blue (non-member) nodes sized by weighted in-degree (11–500) and the biggest ones annotated, to gain insight into which pages the users work on outside the corpus of Health Informatics.

From the network in visualisation 26 we can see that the top 10 users primarily revise their own user pages, which in many cases contain extensive information about how to conduct yourself on Wikipedia. The users are also concerned with Wikipedia-specific pages, such as templates and talk pages about e.g. AutoWikiBrowser. Overall, however, a large share of the pages revised by our top 10 users are medically related.
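The weighted in-degree used to size these nodes is simply the total number of revisions each non-member page received from the top users. A sketch, again with our own names, taking the weighted edge dictionary assumed earlier:

```python
from collections import Counter

def outside_weighted_in_degree(weighted_edges, member_pages):
    """Total revisions per non-member page, highest first.

    weighted_edges: dict mapping (user, page) -> revision count.
    """
    totals = Counter()
    for (user, page), weight in weighted_edges.items():
        if page not in member_pages:
            totals[page] += weight
    return totals.most_common()
```

Ranking made-up edges this way would surface exactly the kind of pages annotated in Visualisation 26, e.g. a user's own user page accumulating revisions from several accounts.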

A qualitative examination of the top 10 users

By looking at the user pages of the top 10 users, we can see that they are all engaged in editing health- and medicine-related pages, which is further supported by the fact that six of them have received “The Cure Award” for bringing “free, complete, accurate, up-to-date medical information to the public” (e.g. Rjwilmsi and Mikael Häggström). Five of the users explicitly state that they revise pages within the topics of health and medicine, while five also keep pages clean and concise: fixing grammar, references and links, and moving stub pages to Wiktionary, a free web-based dictionary (e.g. ShelleyAdams and Bluerasberry).

Two of the most active users, Jytdog and Obiwankenobi, have decided to retire from Wikipedia. Jytdog did so in December 2018 after making “a bad error in judgement” and being accused of engaging in several edit wars, the term used when users repeatedly revert or override each other’s contributions. However, the decision led to several comments from other users expressing sadness, and it is obvious that Jytdog was committed to Wikipedia, showing others the guidelines and the common code of being a Wikipedian. Thus, it seems Jytdog was highly appreciated amongst fellow Wikipedians.

Snapshot 6: Cutouts from the talk-page of user:Jytdog. Red underlines showing Jytdog’s attachment to Wikipedia, calling it “our beautiful project”, and another Wikipedian stating that Wikipedia will suffer from the decision.

But if you look beyond the user page, you see a different side of Jytdog. On Wikipediocracy, for example, there is a long thread of people criticizing Jytdog.

Snapshot 7: Cutout from a page on Wikipediocracy.com regarding the user Jytdog, showing different users’ skepticism towards Jytdog. (About Jytdog)

The user Truebreath was blocked after several instances of copying and pasting from other sites, but was highly active on pages in our data corpus within a short period of time, as seen in visualisation 22. Additionally, some of our top users’ user and talk pages are empty, making it difficult to analyse and understand their edits (e.g. Physadvoc and Obiwankenobi).

Conclusion

Controversies might not have the best conditions on Wikipedia, because the encyclopedia seems to have a stable group of users who take care of the content, revert vandalism, handle conflicts of interest (COI), etc. These users are supported by bots that keep Wikipedia clean of the issues that might trigger disputes, especially since most bots are created by prominent users. Furthermore, the users perform a kind of peer review on the pages, which seems to prevent controversies. Of course, some topics are controversial in themselves, and that controversy will be present on Wikipedia as well, since it is what shapes the topic in general; but this is not the case with Health Informatics.

What seems to be a lack of controversies in the category of Health Informatics on Wikipedia might be down to users not having a specific interest in one particular category, and thus no clear incentive to protect it from other users; they also appear to have the option of moving on to other pages if a challenge arises (unless the topic is controversial in itself). So, even though we see some indications of guardians for the category, it is clearly not their only concern.

Wikipedia is (still) controversial in itself. Users do not only make revisions within a specific category but across many different pages and categories, to an extent where it is doubtful that they have knowledge within all those areas. We have seen examples of users discussing what constitutes proper page content, but it is also clear that there are websites sceptical of how Wikipedia is maintained by its users. Furthermore, we see cases of very active communities or specific users being self-affirming, which fuels a common anti-claim about Wikipedia: that it cannot be trusted, because everyone can edit it.

This is further supported by the fact that even though bots need to be approved by Wikipedia, they are still created to maintain the pages according to the users’ image of what Wikipedia should be, of which there are many opinions. Also, users limiting the number of external links on pages by replacing them with internal links poses a controversial bias in relation to potential conflicts of interest within the Wikipedia sphere.

References

Latour, B., Jensen, P., Venturini, T., Grauwin, S., & Boullier, D. (2012). ‘The whole is always smaller than its parts’ — a digital test of Gabriel Tarde’s monads. The British Journal of Sociology, 63(4), 590–615. https://doi.org/10.1111/j.1468-4446.2012.01428.x

Marres, N., & Moats, D. (2015). Mapping Controversies with Social Media: The Case for Symmetry. Social Media + Society, 1(2), 205630511560417. https://doi.org/10.1177/2056305115604176
