Wikipedia the Prequel: How Did Wikipedia Become So Dominant, With An Information Architecture Slant

Randall Oconnor
19 min read · Dec 6, 2017

Introduction

To answer this question, we need to understand Wikipedia today but, more importantly, Wikipedia’s origin story. According to Alexa Internet, Inc., in 2017 Wikipedia was the fifth most visited website in the world. Wikipedia was started in 2001 as a feeder for Nupedia. As is common with emerging technologies, a side idea can quickly supplant the core business. Nupedia was an English-language, Web-based encyclopedia whose articles were written by volunteer contributors with subject-matter expertise and then reviewed by expert editors before publication. This structure was the “webafication” of the standard encyclopedia.

The Genesis

To generate more content for Nupedia, wiki software was selected to create feeder content for editorial review. To avoid brand confusion, Nupedia requested that the feeder program have its own name. The name Wikipedia was selected because the feeder program would run on a wiki platform. Wiki platforms run on wiki software, otherwise known as a wiki engine. A wiki engine is a type of content management system, but it differs from most other such systems, including blog software, in that the content is created without any defined owner or leader, and wikis have little implicit structure, allowing structure to emerge according to the needs of the users.

In June 2008, CNET UK listed Nupedia as one of the greatest defunct websites in the still-young history of the internet, noting how its strict control had limited the posting of articles. To add more color to Nupedia’s failure: Nupedia had a seven-step approval process to control the content of articles before they were posted, rather than live wiki-based updating. Nupedia was designed by committee, with experts predefining the rules, and it approved only 21 articles in its first year, compared with Wikipedia posting 200 articles in its first month and 18,000 in its first year. In “Everyone’s Encyclopedia: Wiki Technology Allows Anyone To Write, Edit and Reference,” published in the San Diego Union-Tribune on December 6, 2004, Jonathan Sidener describes the growth of content:

Two years after its launch, the English version of Wikipedia contained 100,000 articles. The pace accelerated from there. In February, Wikipedians surpassed the 200,000-article mark. In July, the project reached 300,000 published topics.

Wikipedia’s growth is a product of both the social trend of not trusting experts but trusting “people like me” and the technology that allowed a community to be a chorus of voices rather than one authoritarian voice deciding the “truth”.

In his May 2005 Wired magazine article, “The Book Stops Here,” Dan Pink discusses the difference between Wikipedia’s goal of being a trusted-enough source and an encyclopedia’s goal of being a definitive source.

In the beginning, encyclopedias relied on the One Smart Guy model… [Wikipedia is] One for All. Instead of one really smart guy, Wikipedia draws on thousands of fairly smart guys and gals. … Encyclopedias aspire to be infallible. But Wikipedia requires that the perfect never be the enemy of the good. Citizen editors don’t need to make an entry flawless. They just need to make it better. As a result, even many Wikipedians believe the site is not as good as traditional encyclopedias.

Despite this approach or maybe because of it, the problems of disruptive editors emerged. The five pillars of Wikipedia (https://en.wikipedia.org/wiki/Wikipedia:Five_pillars) were developed to address these problems.

Trusted Source: Trusted Enough

These five pillars were intended to establish a culture of creating trustworthy, neutral-point-of-view information. Without enforcement, the pillars in and of themselves couldn’t deal with the disruptive editor issue. The five pillars were reinforced and supported by the development in 2007 of the Arbitration System, which determines final and binding outcomes for disputes between contributors. This approach at first seems a little inconsistent with the idea of many voices on a wiki platform, but enforcement of the five pillars was important and allowed Wikipedia to fulfill its mission without being sidetracked by disruptive editors.

Daniel Pink goes on to talk about how Wikipedia is still trusted despite disruptive editors.

It turns out that Wikipedia has an innate capacity to heal itself. As a result, woefully outnumbered vandals often give up and leave. (To paraphrase Linus Torvalds, given enough eyeballs, all thugs are callow.) What’s more, making changes is so simple that who prevails often comes down to who cares more. And hardcore Wikipedians care. A lot.

Wikipedia’s Core Value: Quickly Accessible User Generated Content

How did Wikipedia grow so quickly and successfully? The idea of self-governance and the ability to publish a contribution quickly empowered early authors to contribute. Exposure in leading web magazines helped fuel the fire. And Google’s search engine displaying users’ Wikipedia content fanned the flames into an inferno.

User Generated

Wikipedia’s success is due to the contributions of users. Yet Wikipedia provides very little in the way of rewards for contributing. The presumption must be that the individuals who make up the chorus of many, the editors and contributors, are philanthropists performing a labor of love. Daniel Pink describes how a contributor can become addicted to contributing to Wikipedia, and how such users drove its success:

First comes chance — an unexpected encounter. Chance stirs curiosity. Curiosity leads to experimentation. And experimentation cascades into addiction. … String enough of these addicts together, add a few thousand casual users, and pretty soon you have a new way to do an old thing. Humankind has long sought to tame the jungle of knowledge and display it in a zoo of friendly facts. But while the urge to create encyclopedias has endured, the production model has evolved. Wikipedia is the latest stage.

This fervor, coupled with a new way to develop an encyclopedia, has driven the success of Wikipedia.

My experience as a contributor

Based on my limited experience, I didn’t become a Wiki-addict. The sandbox and the process for adding new content are not user friendly. Editing existing articles is difficult because many information pages are protected. If a page is protected, you have to petition to have an edit made. I discovered this while trying to add a sentence to the fly fishing page. However, if you can find a really boring topic, like Crummey provisions, it is easy (dare I say fun) to add a description of Crummey provisions to the Crummey Trust page. My edit history is below:

Acknowledgement of My Contribution

My contribution to Wikipedia.

Crummey Trust Provision

This simple example illustrates how Wikipedia gathers user content. You search for a topic; if it exists, you can edit the information page. If your topic doesn’t exist, you can create an information page. In my experience, creating an information page is neither easy nor fast. You have to be motivated and passionate to attempt it. An information page starts with a header like Physics, then content is added based on the organizational structure (1.1 overview etc., discussed later). A new information page metaphorically creates a new page in the world’s largest crowdsourced encyclopedia.

Despite my experience suggesting otherwise, Wikipedia has new content and improved content arriving all the time in multiple languages. Wikipedia’s initial success has not slowed down even though Wikipedia’s reward system is horrible. Reward system: You get an email saying you made a change. You can also select the contributions button to see what you have contributed. As shown below, you can see the effusive positive feedback Wikipedia gave me for adding content:

“Reward” Page for Contributing

Compare this praise to TripAdvisor’s reward system, which sends me an email whenever someone finds my reviews helpful and awards me badges based on the number of reviews I submit.

After reading Dan Pink’s article describing the intense Wikipedia contributor, my hypothesis is that the contributors to Wikipedia don’t want “cheesy rewards” because they are passionate individuals who care about the dissemination of neutral point of view information. These good Samaritans do this to improve our community. I think it also allows them to be part of something bigger and important, while feeling that they are all equally critical parts of the chorus of the many.

Content: What Hath The Users Generated?

In the beginning, Wikipedia, probably using Nupedia’s structure as a starting point, was a taxonomy framework and hyperlinks (the majority internal links) looking for user-contributed content. Using HTML5 nomenclature, the content of Wikipedia is fundamentally articles submitted by users within the header framework designed first by Wikipedia and then extended and modified by users. The wiki software allows users to determine, develop, and modify the categorization scheme. To demonstrate both the fundamental similarity of the content and user flows between the early 2000s and 2017 and the differences in information architecture (organization, structure, and navigation flows) over the same period, the early 2000s Wikipedia information page for Physics will be compared to the 2017 page. Physics is used to illustrate my thinking, but the concept applies to any topic.

User Experienced Information Page

2001 Physics Information Page (Top)

The early 2000s Physics information page shows that the first block of information on the page is a short description of physics followed by the category scheme. The wiki structure lends itself to users controlling the ultimate categorization, but in the early stages Wikipedia was more involved in the initial taxonomy development and the extensive use of hyperlinks.

2001 Physics Information Page (towards the bottom)

In the 2001 Physics information page, the content the user actually wants is presented further down the screen, as shown to the left.

This 2000s version doesn’t appear to have the citations or links to sources that become more prevalent over time. Given that the original concept was to create materials for Nupedia to edit and review with a panel of experts, this makes some sense. As Wikipedia became materially more relevant than Nupedia, Wikipedia had to “credentialize” itself so users would trust the materials. The meteoric rise in the number of submissions in the early years of Wikipedia’s history suggests that the joy of collaborating on and creating the first online wiki-based encyclopedia motivated people to contribute content. Despite the user interface being rigid and difficult (by 2017 standards), according to Wikipedia over 18,000 articles were submitted in 2002.

Information Page Organization: Headers and Links

The 2017 Physics information page, as accessed through a Google search, has a very different look and feel from the 2000s page shown just below it.

2017 Physics Information Page
2001 Physics Information Page

The first row of the 2017 Physics information page has a meta navigation. Looking at the 2017 Physics information page, notice that it has a very important navigation feature: Contents. The user can scan the Contents section and click to go right to the part of the information the user wants, rather than having to read through the entire article. This feature materially improves the user experience for gathering information quickly. This intra-article navigation is a major improvement in the user experience compared to the 2000s, and it creates a replicable, consistent format across all information pages.

Also in the Contents, we find key information about sources and further reading as shown below.

Key Items That Build Trust

Notes, references, sources and further reading are critical components of building trust in the information provided. Building trust in the veracity of the content is critical to Wikipedia’s success because the content provided on Wikipedia is user generated and user edited. Trust could have easily been destroyed by the issue of the disruptive editor, potentially ruining Wikipedia. This issue may explain why a platform dedicated to the equality of all users, and to the idea that the chorus of many is better than the decisions of one person, had to create the five pillars enforced by Arbitration.

Let’s look a little deeper at the Content organization within an information page. The user can find content quickly by using the internal hyperlinks to go straight to the desired section of the information page. In the Content table to the left from the 2017 Physics page, the user who wants to read about modern physics can quickly access this content through internal links without reading through pages of information.
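To make the mechanics of this intra-page navigation concrete, here is a minimal sketch in Python of how a numbered Content table with internal links could be derived from a page’s headers. The function names, the sample headings, and the underscore-style anchors are my own assumptions for illustration; this is not Wikipedia’s actual implementation.

```python
import re
from typing import List, Tuple


def slugify(title: str) -> str:
    """Turn a heading into an anchor fragment by replacing whitespace with underscores."""
    return re.sub(r"\s+", "_", title.strip())


def build_contents(headings: List[Tuple[int, str]]) -> List[Tuple[str, str, str]]:
    """Return (number, title, anchor) for each (level, title) heading.

    Level 1 is a top-level section, level 2 a subsection, and so on.
    """
    counters: List[int] = []
    contents = []
    for level, title in headings:
        # Grow or shrink the counter stack to match the heading depth.
        while len(counters) < level:
            counters.append(0)
        del counters[level:]
        counters[level - 1] += 1
        number = ".".join(str(n) for n in counters)
        contents.append((number, title, "#" + slugify(title)))
    return contents


# Hypothetical headings, loosely modeled on a physics article.
sample = [
    (1, "History"),
    (2, "Classical physics"),
    (2, "Modern physics"),
    (1, "See also"),
    (1, "References"),
]

toc = build_contents(sample)
for number, title, anchor in toc:
    print(f"{number} {title} -> {anchor}")
```

The point of the sketch is that the table is derived entirely from the headers contributors write, so the navigation stays in step with the content without any designer stepping in.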

This is sheer brilliance as a design strategy. User experience is maximized with minimal effort, and the user is in control: not the navigation tabs, not the breadcrumbs, and especially not the designer. Wikipedia even calls this feature CONTENT, maybe an homage to the Information Architecture concept of content. If we look at the underlying content link hierarchy, we see that four key content links exist, as shown in this cut and paste of a newly created draft information page from the sandbox, ready for user content:

Content [hide]

1 [Information About the Title]

2 See also

3 References

4 External links

I have taken the liberty of renaming item 1 “Information About the Title” to better demonstrate the scalability and replicability of this structure. This very elegant design allows any user content to fit into this structure. Thus the wiki base, combined with this very simple yet powerful design structure and a multitude of internal links within and between information pages, provides quickly accessible user-generated content that users can understand how to navigate to find what they want.

Let me expand on this concept by referring to Mickey McManus, who in his 2011 TEDxCMU talk about Information Liquidity proposes that efficient data storage is similar to a container on a container ship. The container is predefined and standardized, so it fits into each ship easily and smoothly. Any content can be loaded into the container based on the user’s decisions. By extension, data “containers” would be predefined and standardized, and the person storing data would decide how to put the data into the container. I contend that Wikipedia follows this model. The storage container is the Wikipedia Content structure, in which all data is stored in four sections: Information About the Title, See Also, References, and External Links. The user generating the content determines how to fill this metaphorical data container. The user can select as many headers and sub-headers as they desire in the information section. How the data is stored in the container can vary at the whim of the user, but the container remains standardized. This makes Wikipedia instantly scalable.
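The container idea can be sketched as a small data structure. The Python below is only an illustration under my own assumptions: the class name `InformationPage`, its methods, and the sample entries are hypothetical and are not taken from the wiki software. The outer sections are fixed, while the contributor decides what goes inside.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The three fixed sections named in the text; the first slot holds the topic itself.
STANDARD_SECTIONS = ("See also", "References", "External links")


@dataclass
class InformationPage:
    """A standardized 'container': fixed outer sections, free-form inner headers."""

    title: str
    # Contributor-chosen headers and their text, in insertion order.
    body: Dict[str, str] = field(default_factory=dict)
    # One list of entries per standard section.
    sections: Dict[str, List[str]] = field(
        default_factory=lambda: {name: [] for name in STANDARD_SECTIONS}
    )

    def add_header(self, header: str, text: str) -> None:
        """Contributors may add as many headers as they like."""
        self.body[header] = text

    def add_entry(self, section: str, entry: str) -> None:
        """Entries may only go into one of the standard sections."""
        if section not in self.sections:
            raise KeyError(f"{section!r} is not a standard section")
        self.sections[section].append(entry)

    def contents(self) -> List[str]:
        """The outline every page shares, whatever its topic."""
        return [self.title, *STANDARD_SECTIONS]


# Hypothetical page and entries, for illustration only.
page = InformationPage("Crummey trust")
page.add_header("Overview", "Placeholder text for the overview.")
page.add_entry("See also", "Trust law")
print(page.contents())
```

Because every page exposes the same outline, a reader or a search engine can treat any topic the same way, which is the scalability argument made above.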

Finding a Page: The Internally Linked Category Scheme

Wikipedia uses internal links to user-desired content like wormholes through the universe of Wikipedia’s information. Wikipedia’s entire success is built around providing trusted-enough user-generated content, a taxonomy or category scheme that is search engine optimized, and a call to action to collect additional user-generated content. Based on looking at historic Wikipedia pages, in the 2000s the Wikipedia page appears to be all about the category scheme and internal hyperlinks. In the 2000s, the extensive use of hyperlinks improved Wikipedia’s search engine rankings. I find Wikipedia so search engine friendly that I just go back to Google and search again rather than use Wikipedia’s search feature.

The links allow travel between pieces of user-desired content, and the category scheme provides the wayfinding marks that make the links user friendly. Below is a sample of the category scheme for physics. We can see the nesting and the pattern of the Physics data on this page. Notice how Physics is broken into four types of patterns: Fundamental, Classical, Modern, and Cross-Discipline.

2017 Physics Category Scheme

Below is the 2000s Physics page. Notice that the pattern assumed in this grouping is Central Theories, Proposed Theories, Concepts, Fundamental Forces, etc.

2001 Physics Category Scheme

The 2017 category scheme converts the 2000s Fundamental Forces to Fundamental Concepts and adds more information. The 2017 category scheme is organized by types of physics rather than following the 2000s category scheme, where the data is organized by abstract concept. What is interesting about this reorganization is that the users, not Wikipedia, changed the organizational structure based on what the chorus of many deemed the best pattern or category scheme to improve the user’s experience. Because this is a wiki-based platform, the category scheme can be added to by anyone, as I discovered when attempting to see how I would add an information page. If the Wikipedia community is happy with a category scheme like Content, the scheme will be semi-protected. Otherwise, the users decide on changing or adding to the scheme.

Style Guide

The Wikipedia style guide is fairly limited. In 2001, as shown in early screenshots, it took a spartan approach: titles, some simple headers (bold headers, font-size headers, and the most prominent bold and font-size titles), and blue hyperlinks with text. By 2017, with the addition of the Content section, an organizational structure had been created using bold headers and a nesting system that could be replicated and that allows the user to navigate to desired content more quickly. The blue links and text remain a staple of Wikipedia’s style.

Quickly Accessible: Wikipedia is a Search Engine’s Best Friend

In the 2017 search engine world, the 2000s Wikipedia information page would still be successful. Wikipedia is and was designed so that the “question” searched is immediately answered. As shown below, by typing Physics, the first thing the Google search returns is the link to the Wikipedia page with some data.

Google Search for Physics

Wikipedia still needed to allow users with a different flow to have a successful experience. The 2000s Physics information page starts with the navigation to more topics and then provides the user desired content. This inferior design closely mirrors the journey with my first information architecture design project. Wikipedia over time has moved to a new organizational structure based on delivering three key elements: the information (first and foremost), the category scheme, and the ability to edit and/or contribute.

Still, the core value is quick access, and a great description of this success is in an econsultancy.com blog post dated 14 February 2012, titled “Why Wikipedia is top on Google: the SEO truth no-one wants to hear”, by Kevin Gibbons. Gibbons lists the following as some of the reasons Wikipedia is so successful in Google search.

Targeted webpages for key terms

Each page is written individually around a primary search term, and due to the fact that this is both a strong page and domain, it’s likely to rank both for these core search terms and a long-tail of traffic (and with 12,000+ keywords on a page — that’s a very long tail!)

Very strong domain authority

Very few can rival a domain authority of 100/100 in OpenSiteExplorer, with a total of 6.13m links in MajesticSEO. That’s some link building campaign!

Great internal linking structure

Wikipedia does a great job of contextual linking internally, allowing it to spread the domain strength across the site.

The symbiosis between Wikipedia and search engines creates a critical feedback loop, so users continue to get reinforcement of their feeling that Wikipedia is a trusted-enough source of information.

Digging Deeper Into Organizational Design

Looking at Data

To understand the data that is presented as content on a Wikipedia page, I used import.io to scrape the data from both the 2001 information page and the 2017 information page.

Here is the first paragraph of the content as displayed to the User in Wikipedia from the 2017 Physics information page:

Physics (from Ancient Greek: φυσική (ἐπιστήμη), translit. phusikḗ (epistḗmē), lit. ‘knowledge of nature’, from φύσις phúsis “nature”[1][2][3]) is the natural science that involves the study of matter[4] and its motion and behavior through space and time, along with related concepts such as energy and force.[5] One of the most fundamental scientific disciplines, the main goal of physics is to understand how the universe behaves.[a][6][7][8]

Wikipedia 2017 Physics Import.io Data Row

Above is the first row of the data scraped by import.io for the 2017 Physics page. Notice that column 1 identifies the URL for the information page. Column 2 is the actual text shown to the user. Columns 3 and 4 are the link keys: column 3 identifies the words that are linked, and column 4 contains the actual links for each word identified in column 3.

I then used import.io on the Wayback Machine version of the 2001 Wikipedia Physics information page. The user-desired content was not successfully scraped; however, the Physics category scheme was scraped. A row of import.io data is shown below.

Wikipedia 2001 Physics Import.io Data Row

The 2001 display of this data is below:

Fundamental Forces

Gravity, Electromagnetic interaction, Weak nuclear force, Strong nuclear force

If the import.io screen scraping is accurate, then the basic data structure remains fairly consistent between 2001 and 2017. The information page URL is in column 1. The text displayed is in column 2 (2017) or column 4 (2001). The next column identifies which words in the prior column are links, and the column after that has the actual links. The 2001 import.io data also has a header column, followed in the same way by a column identifying whether the header is a link and a column with the actual link.
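For readers who want to see that row structure without a scraping service, here is a minimal sketch using only Python’s standard library `html.parser`. It does not reproduce import.io’s actual output format; the class name, the `to_row` helper, and the sample fragment are my own assumptions, chosen to mirror the four columns described above (page URL, displayed text, linked words, link targets).

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ParagraphLinks(HTMLParser):
    """Collect the visible text of a fragment plus its linked words and targets."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.link_words: List[str] = []
        self.link_targets: List[str] = []
        self._in_link = False
        self._current: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._in_link = True
                self._current = []
                self.link_targets.append(href)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.link_words.append("".join(self._current))
            self._in_link = False

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._in_link:
            self._current.append(data)


def to_row(page_url: str, html: str) -> Tuple[str, str, List[str], List[str]]:
    """Return (page URL, displayed text, linked words, link targets)."""
    parser = ParagraphLinks()
    parser.feed(html)
    parser.close()
    return (page_url, "".join(parser.text_parts), parser.link_words, parser.link_targets)


# Hypothetical fragment; the real pages are far longer.
fragment = (
    '<p>Physics is the <a href="/wiki/Natural_science">natural science</a> '
    'that studies <a href="/wiki/Matter">matter</a>.</p>'
)
row = to_row("https://en.wikipedia.org/wiki/Physics", fragment)
print(row)
```

Keeping the linked words and their targets in parallel lists mirrors the paired columns in the scraped data: the nth word goes with the nth link.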

Looking at HTML

To look into the organizational design of Wikipedia when it was started, I copied the Wayback Machine’s Wikipedia HTML into JSFiddle.

The picture below shows the HTML for the 2001 navigation of the information page. Notice the navigation bar is across the top with four actions. In addition, we see the use of titles and headers.

2001 Wikipedia Physics Information Page in JSFiddle

“Physics” is a large title that lets the user know they have indeed found physics, and to its immediate left is the Home button. The Home button is in a large font so as to grab the eye, and blue to let the user know it’s an internal link. We have a very basic organizational structure: two big action items at the top showing similarity and continuity. We have a smaller navigation underneath that uses proximity of the words, similarity of font, color, and underline, and the continuity of being in a row.

As we go through the HTML, we see Physics is a header in a bold black font that starts the content part of the page. Each paragraph is marked with a <p> tag in the HTML.

The Physics information page goes from about 100 lines of HTML in 2001 to about 1,500 lines of HTML in 2017. The 2017 page uses HTML5 for the headers and structure, as shown below.

2017 Wikipedia Physics Information Page in JSFiddle

The 2017 page has images, but there is Physics in bold black, just like in 2001. Looking at the HTML, you can see the evolution in how the content is displayed to the user. The inclusion of videos, images, and other more mobile-friendly content may allow Wikipedia to pivot to a mobile-based experience.
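The kind of comparison made in this section, counting lines, headers, and paragraphs, can also be done programmatically. Below is a minimal sketch in Python using the standard library; the `summarize` helper and the sample markup are hypothetical stand-ins I made up for illustration, not the real 2001 or 2017 pages, so the counts it prints say nothing about the actual sites.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each opening tag appears in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def summarize(html: str) -> dict:
    """Return line, heading, paragraph, and link counts for an HTML string."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    headings = sum(counter.tags[f"h{n}"] for n in range(1, 7))
    return {
        "lines": len(html.splitlines()),
        "headings": headings,
        "paragraphs": counter.tags["p"],
        "links": counter.tags["a"],
    }


# Hypothetical stand-in for a saved page; not the real 2001 markup.
sample_2001 = """<h1>Physics</h1>
<p>Physics is the study of <a href="/wiki/Matter">matter</a>.</p>
<h2>Fundamental Forces</h2>
<p><a href="/wiki/Gravity">Gravity</a></p>"""

summary = summarize(sample_2001)
print(summary)
```

Running the same function over saved copies of two versions of a page would give a rough, side-by-side measure of how much structure was added over time.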

Looking at Wireframes

I was going to draw a wireframe for the 2001 Wikipedia. But how do you draw 18,000 pages with some multiple of 18,000 links between them? If you can imagine that, you have a visual image of the brilliance of Wikipedia’s ability to let users reach desired content through a search, either within Wikipedia or through a search engine.
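To give a feel for the scale, here is a small Python sketch under stated assumptions. It computes an upper bound on the number of links if each of 18,000 pages linked once to every other page (real pages link to far fewer), and it uses a breadth-first search over a tiny, made-up link graph to count the clicks between two pages. The page names and links in the sample graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def max_directed_links(pages: int) -> int:
    """Upper bound on links if every page linked to every other page once."""
    return pages * (pages - 1)


def clicks_between(links: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Fewest link clicks from start to goal, or None if unreachable (breadth-first search)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, distance = queue.popleft()
        if page == goal:
            return distance
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, distance + 1))
    return None


# Hypothetical miniature link graph; the real one is vastly larger.
links = {
    "Physics": ["Matter", "Energy"],
    "Matter": ["Atom"],
    "Energy": ["Physics"],
    "Atom": [],
}

print(max_directed_links(18_000))
print(clicks_between(links, "Physics", "Atom"))
```

A graph like this is better explored than drawn, which is why search and internal links, rather than a site map, carry the navigation.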

I think WebpageFX may have summarized my findings from digging deeper into organizational design better than I could in their blog (https://www.webpagefx.com/blog/web-design/information-architecture-101-techniques-and-best-practices/) when they discuss the Information Architecture of Wikipedia.

Considering the user-generated nature of Wikipedia, it’s a great feat that the site manages to have anything resembling good information architecture. But, for the most part, the site is organized in a manner that makes it easy to find content.

The difficulty of Wikipedia’s IA is that it’s organic, and thus, categories are difficult to set for such an open and malleable system. Any website with that much information will need to rely on search so that users can locate content they’re seeking. In addition to search, the interconnectedness of Wikipedia articles makes it easy to move from one article to virtually any other related article. This in itself makes Wikipedia’s information architecture one of the best online; they understand how visitors use the site and make it easier for them to do what they need to do.

Future

Wikipedia’s long-term success relies on a large number of passionate, almost zealous contributors and editors as well as a healthy symbiotic relationship with search engines. Both of these could be at risk in the near term. As users go more mobile, Wikipedia will have to create a friendlier mobile mechanism for delivering and editing content in order to survive. Providing content to an existing page from my computer was a bit burdensome; using my cellphone was possible but even more difficult. Creating new content on my computer seemed difficult, and on my phone it appeared close to impossible for me.

I’m uncertain which is more reliant on the other: Wikipedia or the search engine. Most users expect a Wikipedia option when pulling out their bigger brain (smartphone). So a successful search engine needs to have Wikipedia information available. However, Wikipedia needs the search engines to place its results near the top of the search to remain relevant. They appear to be mutually reliant. Yet when searching for information about movies, movie stars, etc., IMDb tends to consume the top of the Google search. If Wikipedia and search engines need each other so much, why would that happen? Maybe because search engines are attempting to break their dependence on Wikipedia. Wikipedia made the decision to be commercial free. Search engines may be interested in finding content providers that offer a better monetary result. This could create long-term friction between Wikipedia and search engines.

For sustained future success, Wikipedia will need to create a mobile wiki experience that brings in more true believers who want to waste hours creating and editing content for the amazing feedback Wikipedia provides. These loyal Wikipedians will make it impossible for search engines to marginalize Wikipedia. Therefore, Wikipedia’s future success will be dependent on its ability to adapt to a mobile world.

Conclusion

Wikipedia started as a feeder for Nupedia but quickly evolved into a trusted-enough source of encyclopedia-type information. Wikipedia avoided expert review and instead relied on the veracity and integrity of the user community. The user of the content receives appropriate warnings about the user editors’ perception of the accuracy and veracity of the content. The providers and editors of the content are passionate conveyors of neutral point of view knowledge. Wikipedia’s decision to embrace the wiki model created both loyal users and contributors who, using new technology, developed the first free user-generated encyclopedia.

Wikipedia was introduced at a time when trust in experts was waning and trust in “people like me” was ascending. Wikipedia rode this societal sea change to become a top-five website worldwide. But as mobile devices begin to dominate, will Wikipedia have the wherewithal and funds to adapt to the mobile experience and maintain its relevance, or will it go the way of Nupedia by not adapting to the next major societal change?
