How an Open Web Index Could Help Solve the “Search Problem”

In Search of Search (& its Engines)
Apr 13, 2021

By Dirk Lewandowski & Olof Sundin

Image of a printed index

Disputed search rankings

In 2018, then-president Donald Trump complained that Google suppressed conservative news outlets, instead offering “only the viewing/reporting of Fake News Media.” While Google was quick to reply that they “never rank search results to manipulate political sentiment,” the anecdote points to a broader problem with result ranking in search engines: any search engine has to present its results in a way that prefers some results over others.

Search engines rank results based on relevance calculations. In the worst case, only one of many possible viewpoints appears in the top search results. But such cases are rare. More often, a particular viewpoint dominates by taking up the majority of the top positions; even which of two competing views lands in the first position can matter (Epstein & Robertson, 2015), as users predominantly view and select the result listed first. In practice, users often rely on the search engine’s ranking instead of evaluating the results themselves (Sundin & Carlsson, 2016).

It is no wonder, then, that the ranked output of popular search engines — especially Google’s — has been criticized, whether for gender bias (Otterbacher et al., 2017; Noble, 2018), racial discrimination (Noble, 2018), bias towards commercial sources (Lewandowski & Sünkler, 2019) or even Google favoring its own offerings (European Commission, 2017; Wall Street Journal, 2020).

The problem is that one search engine dominates what results we see

“Biases” are inherent in any search engine, as measures of desirability are always embedded into ranking algorithms (Haider & Sundin, 2019; Lewandowski, 2017). There is no such thing as a neutral search engine. There are many possible algorithmic interpretations of web content, leading to very different rankings of search results. As any ranking per se prefers some items over others, the real problem comes when one search engine with a huge market share dominates what is shown in response to user queries.

Google Search clearly dominates the search engine market, serving more than two trillion queries per year with a market share of 93% in Europe and 88% in the US. Search engine optimization companies know that web pages only gain visibility when they achieve top positions. A search engine that has a vast market share can not only make or break products or services but also opinions, viewpoints, and even “facts.”

The unhealthy market power of a single search engine can be broken by establishing an Open Web Index as a public infrastructure. Such an infrastructure could be used by a plethora of small and large services to build competing search engines. The ability to provide different result rankings may be the most important reason to demand an alternative infrastructure and a variety of search engines, and the value of fair competition justifies such an endeavor.

Proposed solutions

The problem is clear, and several solutions have been proposed. The simplest response to the aforementioned issues in the search engine market is to hope that competition will solve the problem. But as the last 15 years have shown, no company has been able to establish a competitor to Google. Microsoft invested billions of dollars to establish Bing (Competition & Markets Authority, 2020, p. 90); despite this considerable effort, Bing has gained a market share of only 3% in Europe and 6% in the US. Search is, for most people, Google Search.

Another proposed solution is to establish an alternative to Google by supporting companies through funds or by establishing a search engine as a public service. These proposals focus on building a single alternative to Google. But would we be better off if there were such an alternative established? From our point of view, this would not solve the general problem. Furthermore, there are many reasons why a search engine could fail — with only one competing service, we risk failure, and a reversion to the situation we’re already in.

It’s the index, stupid!

The question is not how to establish one more search engine, but how to foster the development of many such engines. We argue that this is best achieved by building a public database of the web’s content, an Open Web Index (see Lewandowski, 2014; Lewandowski, 2019). The index constitutes the backbone of any search engine: it is the database of web pages collected through crawling the web, and it is updated continuously. This is what makes it so hard to build a search engine: one has to build a huge index first, before one can even start to think about ranking results and serving users. A recent report from the UK Competition & Markets Authority (2020) shows just how big the problem is: Google’s index consists of between 500 and 600 billion documents; Bing’s, between 100 and 200 billion.

An open index would remove this significant barrier to entry for new actors in the search engine market. An Open Web Index would crawl and index all the web’s content, lifting this technical and financial burden from upstart search engines. They could instead focus on ranking algorithms, the presentation of results, and user-engine interactions. We would soon have access to multiple, diverse algorithmic interpretations of the web’s content. We would break the dominance of a single corporation determining what users get to see and which results they select from.
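To make “diverse algorithmic interpretations” concrete, here is a minimal sketch of three ranking functions applied to the same documents. Everything in it is an illustrative assumption: the three toy documents, the `inlinks` and `age_days` fields, and the function names are invented for this example and do not describe how any real search engine or the proposed index works.

```python
from collections import Counter

# Hypothetical documents, standing in for what a shared index might return.
DOCS = {
    "a": {"text": "open web index public infrastructure for search",
          "inlinks": 50, "age_days": 400},
    "b": {"text": "search search search engine ranking tips",
          "inlinks": 2, "age_days": 30},
    "c": {"text": "public search index proposal index debate",
          "inlinks": 5, "age_days": 3},
}


def term_score(doc, query):
    """Count how often the query terms occur in the document text."""
    counts = Counter(doc["text"].split())
    return sum(counts[term] for term in query.split())


def matching(query):
    """Documents containing at least one query term."""
    return [d for d in DOCS if term_score(DOCS[d], query) > 0]


def rank_by_text(query):
    # Prefers documents that repeat the query terms most often.
    return sorted(matching(query), key=lambda d: (-term_score(DOCS[d], query), d))


def rank_by_popularity(query):
    # Prefers documents with the most incoming links.
    return sorted(matching(query), key=lambda d: (-DOCS[d]["inlinks"], d))


def rank_by_freshness(query):
    # Prefers the most recently published documents.
    return sorted(matching(query), key=lambda d: (DOCS[d]["age_days"], d))


if __name__ == "__main__":
    for ranker in (rank_by_text, rank_by_popularity, rank_by_freshness):
        print(ranker.__name__, ranker("search index"))
```

All three functions see the same documents, yet each puts a different one in first position. None of them is neutral; each embeds its own measure of desirability.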

There could be risks. If an open index of the web is successful, we might get search engines for different ideological positions — one search engine for conservatives, one for socialists, another for liberals. We already see such developments in social media, where sanctions for conservative voices on Twitter have resulted in a stronger interest in alternative platforms. This risk is real, but the benefits of having a number of alternative search engines competing are much greater than these potential disadvantages.

Separate the infrastructure from the services!

An Open Web Index (OWI) is very different from a publicly-owned search engine. The intention of the OWI is not to build a search engine, but to build and maintain the infrastructure necessary to run a multitude of search engines. The main technical idea is to separate the index from the services built on the index. While the index would be public, the services can be proprietary.

Figure: Basic architecture of an Open Web Index (OWI)

The public infrastructure would be responsible for crawling the web, indexing its content, and providing an interface/API to the services built upon the index. As at least some basic form of indexing is required to make the OWI searchable, this is provided through the OWI basic indexer, which preprocesses the data to make it usable for the services built on top of it. A service can then use the index and its preprocessed data, and can amend the index with its own technology. Services that need even easier access to preprocessed data can query ready-to-use data from the OWI advanced indexer.
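The layering described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class names (`BasicIndexer`, `OpenIndexAPI`, `SearchService`), the `lookup` method, the whitespace tokenizer, and the example URLs are all hypothetical, and crawling is replaced by a hard-coded dictionary of pages. No actual OWI interface is specified in this article.

```python
from collections import defaultdict


class BasicIndexer:
    """Stand-in for the OWI basic indexer: builds an inverted index from pages."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of URLs containing it

    def add_page(self, url, text):
        for token in text.lower().split():
            self.postings[token].add(url)


class OpenIndexAPI:
    """Hypothetical public, read-only interface that services would query."""

    def __init__(self, indexer):
        self._indexer = indexer

    def lookup(self, term):
        # Return a copy so services cannot modify the shared index.
        return set(self._indexer.postings.get(term.lower(), set()))


class SearchService:
    """A (possibly proprietary) service that amends the index with its own signals."""

    def __init__(self, api, boosts=None):
        self.api = api
        self.boosts = boosts or {}  # the service's private per-URL adjustments

    def search(self, query):
        terms = query.lower().split()
        hits = {term: self.api.lookup(term) for term in terms}
        candidates = set().union(*hits.values()) if hits else set()

        def score(url):
            matched = sum(1 for term in terms if url in hits[term])
            return matched + self.boosts.get(url, 0.0)

        return sorted(candidates, key=lambda url: (-score(url), url))


# "Crawled" pages are hard-coded here; a real crawler is out of scope.
PAGES = {
    "https://example.org/owi": "Open Web Index as public infrastructure",
    "https://example.org/shop": "Buy index cards online",
    "https://example.org/news": "Public debate on web search",
}

indexer = BasicIndexer()
for page_url, page_text in PAGES.items():
    indexer.add_page(page_url, page_text)
api = OpenIndexAPI(indexer)

plain = SearchService(api)
news_first = SearchService(api, boosts={"https://example.org/news": 2.0})

if __name__ == "__main__":
    print(plain.search("public index"))
    print(news_first.search("public index"))
```

Both services read from the same public index through the same interface; only the private `boosts` differ, and that alone is enough to change which page is ranked first.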

The OWI could aggregate anonymized usage data from all services using the index and make them available to all interested parties. Established search engine providers like Google have a huge advantage from the data they have already collected. Sharing usage data between the services in the OWI infrastructure would minimize the so-called cold-start problem of not having such data when starting a service.
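As a rough illustration of pooling usage data, the sketch below merges click logs from several services into shared counts. The log format, the field names, the service names, and the `MIN_COUNT` threshold are assumptions made for this example. Dropping user identifiers and suppressing rare query-click pairs is a deliberate simplification and should not be read as a sufficient anonymization method.

```python
from collections import Counter

# Assumed threshold: (query, URL) pairs seen fewer times than this are withheld,
# since rare queries are more likely to identify an individual.
MIN_COUNT = 2


def strip_identifiers(event):
    """Keep only the normalized query and the clicked URL; drop the user field."""
    return (event["query"].strip().lower(), event["clicked_url"])


def aggregate(service_logs, min_count=MIN_COUNT):
    """Pool click events from all services into shared (query, URL) counts."""
    counts = Counter()
    for events in service_logs.values():
        counts.update(strip_identifiers(event) for event in events)
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical logs from two services built on the index.
SERVICE_LOGS = {
    "engine_a": [
        {"user": "u1", "query": "Open Web Index",
         "clicked_url": "https://example.org/owi"},
        {"user": "u2", "query": "rare personal query",
         "clicked_url": "https://example.org/x"},
    ],
    "engine_b": [
        {"user": "u9", "query": "open web index ",
         "clicked_url": "https://example.org/owi"},
    ],
}

shared = aggregate(SERVICE_LOGS)

if __name__ == "__main__":
    print(shared)
```

A newly launched service could read `shared` on its first day and learn which results users tend to select for common queries, which is the cold-start advantage the pooled data is meant to provide.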

Who will build it?

It will not be easy to build the Open Web Index and finance its development. The technical issues, though complicated, have been solved before, as Google’s and Bing’s indexes show. A more significant obstacle is money.

We believe that the Open Web Index, as a public infrastructure, should be financed by the public. A multinational actor like the European Union or the United Nations would be in the best position to start such a project. But the Open Web Index must be independent of any state actor who might be tempted to influence how the index is built and what is included or not included; we suggest establishing an independent foundation to build the index.

All it now takes is the political will to establish this infrastructure. An Open Web Index would allow us to truly “organize the world’s information and make it universally accessible and useful.”

References

Competition & Markets Authority (UK). (2020). Online platforms and digital advertising. Retrieved from https://www.gov.uk/cma-cases/online-platforms-and-digital-advertising-market-study

Epstein, R., & Robertson, R. E. (2015). The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences, 112(33), E4512–E4521. https://doi.org/10.1073/pnas.1419828112

European Commission. (2017). Antitrust: Commission fines Google €2.42 billion for abusing dominance as search engine by giving illegal advantage to own comparison shopping service — Factsheet. Retrieved from http://europa.eu/rapid/press-release_MEMO-17-1785_en.htm

Haider, J., & Sundin, O. (2019). Invisible Search and Online Search Engines. Oxford, New York: Routledge.

Lewandowski, D. (2014). Why we need an independent index of the Web. In R. König & M. Rasch (Eds.), Society of the Query Reader: Reflections on Web Search (pp. 49–58). Amsterdam: Institute of Network Cultures. https://networkcultures.org/query/wp-content/uploads/sites/4/2014/06/4.Dirk_Lewandowski.pdf

Lewandowski, D. (2017). Is Google Responsible for Providing Fair and Unbiased Results? In M. Taddeo & L. Floridi (Eds.), The Responsibilities of Online Service Providers (Vol. 31, pp. 61–77). Berlin Heidelberg: Springer. https://doi.org/10.1007/978-3-319-47852-4_4

Lewandowski, D. (2019). The web is missing an essential part of infrastructure: An Open Web Index. Communications of the ACM, 62(4), 24–27. https://doi.org/10.1145/3312479

Lewandowski, D., & Sünkler, S. (2019). What does Google recommend when you want to compare insurance offerings? Aslib Journal of Information Management, 71(3), 310–324. https://doi.org/10.1108/AJIM-07-2018-0172

Noble, S. U. (2018). Algorithms of Oppression: How Search Engines Reinforce Racism. New York, USA: New York University Press.

Otterbacher, J., Bates, J., & Clough, P. (2017). Competent Men and Warm Women. In G. Mark, S. Fussell, C. Lampe, M. c. Schraefel, J. P. Hourcade, C. Appert, & D. Wigdor (Eds.), Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI ’17) (pp. 6620–6631). New York, New York, USA: ACM Press. https://doi.org/10.1145/3025453.3025727

Sundin, O., & Carlsson, H. (2016). Outsourcing trust to the information infrastructure in schools. Journal of Documentation, 72(6), 990–1007. https://doi.org/10.1108/JD-12-2015-0148
