On the Myth of AI Democratization

Co-written by Vincenzo Lomonaco and Marta Ziosi

“The world’s most valuable resource is no longer oil, but data.” — Copyright © David Parkins, The Economist [1]

The last decade has witnessed tremendous advances in Artificial Intelligence (AI), to the point that many frame it not only as a groundbreaking technology but even as “the new electricity”, echoing the unique impact electricity had, and still has, on our society.

Despite the great hype and inflated hopes for the imminent future, it is undeniable that recent advances in AI under the name of “Deep Learning” or the more recent rebranding “Differentiable Programming” have radically pushed the boundaries of what’s possible, enabling a rich set of applications that were unthinkable before.

AI technologies are now employed in almost every digital product or service we use daily (movie recommendations, on-line shopping, smart home devices, surveillance systems, etc…) but also for groundbreaking, innovative frontiers like self-driving cars, personalized health-care and many others.

In a context in which many have already expressed concerns about the power and pervasiveness of such technologies [1][2], major IT companies are publicly declaring their intention to democratize AI, making it “for every person and every organization” [3][4][5], but also open to developers and researchers around the world through the transparent work of their top-notch AI research labs [6][7][8].

In this (not-so-brief) post we take a closer look at this AI democratization process, hoping to spark new interest in the subject and to start a broader conversation about something we think is going to strongly affect our present and future society.

Outline

  1. AI “Democratization”: Just a Marketing Stunt?
    1.1 Who Owns Data Owns Intelligence
    1.2 Growing Disparity through the Network Effect
    1.3 Winning the Race for top AI Talents
    1.4 The Huge Gap from Research Ideas to Deployment
  2. Other AI Democratization Initiatives
    2.1 Open Research Labs
    2.2 Blockchain-based Technologies
    2.3 Is it Enough?
  3. On the AI Market Regulations
    3.1 What Should we Care About?
    3.2 Real-time Collateral Damages
    3.3 Existing Regulations and Proposals
  4. Conclusions

AI “Democratization”: Just a Marketing Stunt?

While it is true that most of the cutting-edge AI research done inside major private IT companies is shared publicly or made quickly available to anyone through services which are generally free of charge, this is very far from the notion of democratization of Artificial Intelligence. These companies like to sweep the issue aside while implicitly assuming that AI accessibility means AI democratization. This is clearly wrong, since we need not just to use AI but also to have control over it. In fact, on the Internet we are not only “content users” but also “content producers”.

AI democratization today is just a business-strategic idea. It is hard to believe that IT companies are ready to give away one of their core assets so easily and generously. So, if on one hand they give back to the community much more than their predecessors did, on the other hand they retain many means of control over their AI technologies.

Nevertheless, many scientists fall for it. Some actually defend these companies’ openness efforts and value them as a sincere step towards AI democratization. But while this openness may critically accelerate shared algorithmic progress in AI, that’s just one ingredient of AI democratization: are they going to let you have the others?

Who Owns Data Owns Intelligence

The more we digitize our lives (i.e. the more digital products and services we use) the more valuable data we create, fueling what has been called the new “data economy” [9]. This in turn makes products and services even smarter, as the quality of AI technologies roughly scales with the amount of data they can crunch.

Yet, perhaps unsurprisingly, people are still quite unaware that every digital action they take is recorded and saved: from the first like or click they made on Facebook in 2004 to the last voice search they made on Google in 2018 [10]. Even less understood is the fact that AI technologies can also work very well with a handful of data once they have been trained on the bulk of it.

Researchers at the University of Cambridge and Stanford University showed in 2014 that, based only on people’s “likes” on a social platform, their algorithm was better able to predict a person’s personality traits than any of the human participants. It needed access to just 10 likes to beat a work colleague, 70 to beat a roommate, 150 to beat a parent or sibling, and 300 to beat a spouse [11].

It is clear that having access to every trace of your digital presence and being able to cross-correlate them across different platforms, services and people is an enormous power, especially considering the monopolizing presence IT giants like Facebook, Google, Amazon, Apple and Microsoft can have in our daily digital life.

Of course, data, as an increasingly important asset (and often the only significant advantage over the competition), is jealously kept private by these companies. This is entirely legitimate and strategically sound, but unfortunately completely at odds with the AI democratization process.

Indeed, while even simple algorithms can achieve incredible results from gigantic amounts of data, the opposite is hardly true: even the most sophisticated AI algorithm needs high-quality, high-volume data to be able to generalize to new situations. AI democratization cannot work if we do not democratize data first.

Growing Disparity through the Network Effect

The “network effect” is a well-known social and economic phenomenon in which an increased number of users improves the value of a good or service, in turn attracting even more participants and triggering a self-reinforcing cycle which becomes very difficult to break. This effect is very common, especially in the digital market, where it is possible to reach millions of users with a single click.

Time needed by Internet and non-Internet products to reach 50 Million users — Copyright © TechToday

This effect makes the situation even less “democratic”, enabling the enormous gathering of worldwide data (and hence intelligence) into the hands of very few companies. Is it not scary that the entire “search engine market” is dominated by just two companies (Google and Baidu together hold ~86% of the entire market share)?

This means that these companies are not just far from helping the democratization process of AI but are also in the dangerous, monopolistic and unique position of deciding the future of AI for everyone else, such as when and how it should be employed. This has already drawn much criticism from experts in different fields, like Tim Berners-Lee, who labeled the tech giants “obstacles to innovation”, or George Soros, who even argues about “far-reaching adverse consequences for democracy” [12].

Winning the Race for top AI Talents

Nevertheless, tech giants open up their AI research labs, courting the best scientists in the world and giving them complete freedom, huge salaries and all the resources they need. Researchers are free to collaborate with other institutions and make everything open-source while backed by tons of money and freed from research grant applications, teaching and other bureaucratic burdens.

From a research perspective this has led to great advancements and an enormous speed-up in the development of AI research. Yet, behind this “openness” policy we can find three major business motives:

  1. Attract the top AI talents worldwide (who like to be recognized by the research community for their public works and ideas) and make them work alongside their engineering teams to speed up the integration of new research ideas into products and services.
  2. Make outside developers improve (for free) open-source tools used for both research and production (e.g. TensorFlow, Caffe2, etc.), resulting in a further innovation speed-up inside the company.
  3. Be recognized as cutting-edge, progressive and innovative companies, increasing brand value.

This is to underline that nothing is generously given away in the name of AI democratization (as constantly advertised); everything has a clear business intent.

The Huge Gap from Research Ideas to Deployment

Another important obstacle to the actual democratization of AI is the huge gap between a research idea proven on a toy benchmark and the deployment of a real-world, reliable and working AI system.

In his famous essay “The Seven Deadly Sins of Predicting the Future of AI”, the pioneering roboticist Rodney Brooks stresses the importance of the speed of deployment, often overlooked in the enthusiasm and hype of the moment.

The truth is that just a few companies have the power to transform interesting research ideas into powerful digital services and products. And this is just another reason why the same tech giants are not worried about giving away their latest research findings (which, between 65% and 90% of the time, are not even reproducible [13]).

Other AI Democratization Initiatives

While major AI companies intentionally take advantage of the common illiteracy in AI by oversimplifying and undoubtedly undermining more serious AI democratization approaches, more than one concern has already been raised about the dangerous development of AI technologies in the hands of only a few organizations.

Oxford University philosopher Nick Bostrom, in his pioneering book “Superintelligence: Paths, Dangers, Strategies”, talks at length about the risk of a monopolized or secretive AI development and its negative impact on society as a whole in the event of an intelligence runaway (also known as “technological singularity”). He argues that “openness in AI development”, besides having an immediate positive impact in the short term, could lower the probability of an AI singleton (that is, a single actor which holds all the control) and hence existential risks.

But again, what do we mean by “openness in AI development”? While in a more recent essay on the matter [14] he admits that “Openness in AI development can refer to various things” and that “Openness is not a binary variable, but a vector with multiple dimensions that each admits of degrees”, he fails to point out the key attributes which can actually lead to a truly democratic AI development.

Open Research Labs

OpenAI, founded at the end of 2015, was the first non-profit AI research initiative to have “openness” explicitly built into its own brand identity:

“We must have democratization of AI technology and make it widely available. And that’s the reason that… [we] made OpenAI, to help spread out AI technology so it doesn’t get concentrated in the hands of a few.” — Elon Musk (co-founder of OpenAI) [15]

Yet, while we acknowledge the noble intent, we fail to see the difference between OpenAI and any other private research lab like FAIR, MSR AI, DeepMind, Google Brain, etc… whose openness standards are pretty much the same: publishing at the same conferences and releasing the related code. The truth is, openness in research is not enough to ensure that “AI technology doesn’t get concentrated in the hands of a few”, for the same reasons we talked about in the previous sections.

Blockchain-based Technologies

Especially due to the recent explosive growth of cryptocurrencies like Bitcoin or Ethereum, Blockchain technologies have sparked a lot of interest in Silicon Valley and the IT world in general, essentially enabling the creation of ever-growing, robust, collective (and distributed) datasets without the need for a trusted third party with privileged access to them.

While the technology today does not seem mature enough to directly support general-purpose applications (especially because of scaling and privacy issues), its core ideas, democratic by design, have inspired many new AI democratization initiatives.

Ocean Protocol, Enigma, and Datum, for example, aim to offer a safe, decentralized, privacy-enabled marketplace for sharing data and associated services (storage, compute and algorithms). The smart use of Blockchain technologies (storing not the data itself on the chain, but its meta-data) can enforce availability and integrity guarantees that serve as verifiable service agreements without the need for a trusted third party. Plus, they can naturally embed the monetization aspect in a tokenized system, which has already proved to be robust on the Blockchain.

Another group of blockchain-based initiatives like OpenMined, SingularityNet, SynapseAI and ConsensusAI share similar ideas but with an immediate focus on the AI world. OpenMined, for example, plans to offer AI training facilities in a fully homomorphic encryption (FHE) framework with outstanding privacy guarantees, in addition to a marketplace of trainable data.

Even at their early stage of development, the potential of these projects looks incredibly high thanks to their ability to create data commons without intermediation, hence especially rewarding the data owner. However, this does not seem enough to fuel a balanced AI democratization process.

Indeed, while an open marketplace for data, algorithms and computation would surely help even small communities and organizations to catch up with state-of-the-art AI systems, it does not mean that key data, compute and algorithmic (or deployment-ready) assets will be shared or made available at reasonable prices on the platform.

Is it Enough?

We have had a look at the most promising AI democratization initiatives, but at this point it should be pretty clear that in order to really make AI technologies available and reproducible by everyone we need not just the latest research idea, the biggest dataset or the engineering resources, but rather the combination of them.

We argue that for achieving the real democratization of AI, every community and organization should be provided reasonably equal access to:

  1. High-quality/high-quantity data.
  2. Cutting-edge AI algorithms and research tools.
  3. Computational and engineering resources for ready-to-deploy solutions.

On the AI Market Regulations

It is by now a fact that Google owns 88% of the market share in search advertising, Amazon 74% of the e-book market and (get ready for the surprising name as our dulcis in fundo) Facebook owns 77% of mobile social traffic. These numbers suggest that these giants are worthy of the “monopoly” label (no matter how much Zuckerberg denied it in his recent hearing!).

Even though the paragraphs above referred more generally to a wide range of AI initiatives, we now want to concentrate on these “giants” (Google, Facebook…). In fact, they are the ones that have access to incredibly large datasets and that use AI algorithms in developing their products. They are not merely AI companies but also Big Data companies. As such, we start our analysis from the key-holders of the biggest potential threats or benefits for AI democratization. Once we transfer our concerns about democracy to the market, the question presents itself: should we regulate?

If we are to hold that history is our “magistra vitae”, then looking into past examples of monopolies could be an informative and useful way to approach the question of how to regulate the AI market. Plainly, a fate similar to that of “Standard Oil” and the “Bell System” in the 20th century could be reserved for these tech giants who are at the forefront of AI research. These monopolies were broken up into smaller companies as soon as their powers were judged “too big to be safe”.

However, the break-up of these monopolies mainly aimed at having the Government create terms of competition which allowed smaller actors to participate in the industry as well, and prices to be driven by concerns other than one sole company’s interests. We think that the case of the Internet includes these concerns and yet also goes beyond them. Our case is one in which knowledge and data are collected and produced. This knowledge is about anything that concerns us and thus it is tainted with public concerns and interests.

A parallel could be drawn with the “Human Genome Project” in the 1990s [16]. This project aimed to identify and map all of the genes of the human genome. At the time, Clinton and Blair promptly secured an agreement on the public release of the information as a “common good”, of which nothing could be privately patented. This act was indeed “prompt”, as the private Celera Corporation had already sought to file patents over 6000 genes and, decades later, talk of selling this information readily got Google interested. Admittedly, the privatization of this kind of information, which concerns all of us, could have led to the “ownership” of genetically engineered organisms.

Even though this last concern almost sounds science-fictional, it makes a case for the regulation and government oversight on matters of knowledge that is ‘about’ us. The case of data here resonates.

The above-cited acts, similarly to the AI case, were motivated, among other things, by democratic concerns. Louis Brandeis, an advisor to Woodrow Wilson, had already voiced these concerns in his time by stating that “…in a democratic society the existence of large centres of private power is dangerous to the continuing vitality of a free people” [17].

Given these considerations, could considering the AI market regulation issue as a “free market” regulation issue be a fitting approach? After all, Google, Amazon and Facebook can reasonably be considered to be monopolies and the considerations put forward by Woodrow Wilson’s advisor about democracy resonate with our concerns about AI democratization. However, do similar definitions and similar concerns justify using the same approach?

We ought to at least challenge this claim before relying on it. To begin with, unlike oil and other goods, data is abundant rather than scarce. Furthermore, data is not only about quantity. In fact, algorithmic techniques such as data mining extract more value out of data and, once this aspect comes together with the “network effect”, by collecting more and more users the firm collects more and more data which it can use to generate more valuable information.

This is an advantage that is denied to other firms and services. Finally, there is a newly added aspect to these alleged monopolies: their surveillance systems span the entire economy. Thus, Google can see what people search for and Facebook what they share, exclusively.

These aspects are new to our knowledge about monopolies and free market regulation. They raise important questions on whether to consider them as “stronger” monopolies to regulate and break down by means of harsher laws or to completely change our frame of reference and create new regulations, uniquely tailored to our current situation.

What Should we Care About?

Instead of getting lost in the millions of questions facing us, we can at least pin down what we consider paramount to preserve and protect, regardless of the approach chosen.

Primarily, we value democracy. The latter being a vague term, it may seem that it can be secured by simply starting policies of “openness” and “transparency” such as the ones cited previously in the article. However, this would probably have been enough only if we were dealing with the typical old monopolies, for which people are simply consumers and not also producers of valuable data. The present case is different, as consumers also have a necessary role in the production process. One suggestion could be to shift a bit of the control and power from the receivers of data (tech giants) to the suppliers, us. How could this be achieved?

To begin with, data create for us an online identity which generates the need for, and yet still lacks, a new set of rights. On one side, regulation could start from identifying a new set of relevant rights for the newborn homo technologicus. This could help tackle more short-term problems such as privacy, surveillance and also job replacement by new programs and algorithms.

On the other, special attention should also be devoted to the wide range of companies and research centers that are involved in the development and deployment of these new technologies. In this area, it is paramount to establish ‘democratization’ processes that are procedural in practice rather than figurative in theory [18]. This approach could potentially slow down the pace of development, and this might turn out to be beneficial. Indeed, this way governments will have more time to build proper legislation and the effects of AI will presumably hit society more gradually. This could also help tackle more long-term problems such as existential risk and political instability.

Real-time Collateral Damages

On the note of “political instability”, we are reminded of the latest contested event concerning data breaches: the Cambridge Analytica case. This case study presents itself as an example of how advancements in technology are deeply intertwined with democratic processes.

It has recently been confirmed by Christopher Wylie, a whistleblower previously working for the company, that millions of Facebook profiles of US voters were collected during the American elections by the agency Cambridge Analytica. Their ultimate aim was “creating a gold standard of understanding personality from Facebook profile information” [19]. This was a powerful political tool for identifying potential swing voters and targeting them with crafted messages more likely to resonate with their opinions.

As the data were gathered from Facebook (more or less legally, depending on who reports the story), several questions have been raised about the role of the company in the face of these data breaches. “Given the stakes here, why do you think Facebook should not be regulated?” the CNN senior tech correspondent asked Zuckerberg. “I am actually not sure we should not be regulated” [20], he replied. The Facebook CEO thinks that the question is rather “What are the right regulations?”.

He has since been asked by Senator Markey to testify personally before Congress (as has recently happened) and by other lawmakers to endorse the “Honest Ads Act”, a bill aimed at preventing targeted, dishonest advertising online. Even though he is not sure whether he or another Facebook representative would be the best source of expertise in matters of regulation, he thinks it sensible that regulations such as the existing ones on ad transparency in TV and print media should be drafted for the Internet as well.

Existing Regulations and Proposals

The “Honest Ads Act” builds on regulation that currently applies to the field of Media & Communications in the US. We should thus ask: are there existing regulations in place, and are any being drafted, for the Internet and data?

We first stop and recognize that the few laws that exist differ greatly in implementation and approach between Europe and the US. The most striking difference resides in an important concern previously raised in this article, namely consumers’ rights. While the EU considers “data protection” to be a fundamental right of the individual, the US opts for an “ad-hoc” approach where sector-related laws target specific problems (specifically in matters of data related to health).

This difference is striking especially in one case. In 1995, the EU passed the “Data Protection Directive” [21]. This law grants more power to the individual by giving them the right to delete or correct any personal information about them online. In the US, the Bill of Rights provides an implicit right to privacy in some weaker form. However, this has not yet been extended to online privacy. Indeed, the existing US regulations only protect online data regarding the sectors of health and credit.

“Government AI Readiness Index”, Copyright © Richard Stirling, Hannah Miller and Emma Martinho-Truswell

As a result, tech giants are presented with strikingly different regulations depending on their area of operation. This is not only a form of inconsistency in itself; it also comes with the dangerous possibility that these differences prevent each law from being applied with its full authority. Indeed, potential loopholes allow companies to find their way through the mismatched components of these laws.

And what about AI regulations? On that matter, it is more difficult to talk about already existing laws. Given how recent the progress in AI is, all that we can comfortably talk about is the different “approaches” that countries all over the world are undertaking. The US and China have placed AI at the top of their policy agendas. Both countries have already laid down general policy plans to fund AI education, research and industry.

What represents a fundamental difference between these two cases is that in the US the AI potential resides mostly in the private sector, while in China it is the Government which holds most of the control over AI. The UK features important centers for AI research situated in leading universities such as Oxford and Cambridge, whose experts communicate with the Government in order to address potential risks and opportunities. As for Europe, both France and Italy have appointed leading researchers to come up with recommendations to the government on how to successfully integrate and develop the potential of AI in society, and more recently twenty-four EU countries pledged to band together to form a “European approach” to artificial intelligence [22].

As the above paragraph shows, apart from some exceptions such as the Future of Humanity Institute (Oxford), most of the attempts to deal with AI address its opportunities rather than its risks. On one hand, this could be due to the fact that AI is in itself still a rather vague concept which first ought to be developed to be understood and only later, potentially, constrained. On the other, this approach, by not addressing the issue in terms of “regulations” and “restrictions”, could point towards a different strategy. Is regulation really the way to go? Or does it present itself as a necessity only in the face of an undemocratic approach to the problem? As this article has proposed, democratization of AI could be facilitated by providing equal access to:

  1. High-quality/high-quantity data.
  2. Cutting-edge AI algorithms and research tools.
  3. Computational and engineering resources for ready-to-deploy solutions.

Once these three points are given and a democratic approach is secured, will the focus on regulations need its own separate space and discourse, or will several regulations become unnecessary once multiple actors work on AI in the light of the public sphere? It could be argued that several regulations are needed precisely because of the undemocratic approach that we have taken towards AI.

Conclusions

Ensuring that AI technologies are, and will be, used ethically and sustainably undoubtedly constitutes one of the greatest challenges of our time, and the exponential speed-up of technological progress [23] compels us to consider the problem with even more urgency.

AI democratization, that is, the power of each individual and organization to use, control and replicate AI services and technologies, is regarded by many as a required (and significantly important) condition for reaching that goal.

Nevertheless, cutting-edge AI technologies are mostly offered today through commercial products and services and controlled by private companies (often in dangerous monopolizing positions). In this blog post we have explored the issue, highlighting why we think this can be dangerous and very far from the notion of AI democratization, despite the AI “openness” policies often advertised by these companies.

It is worth noting, though, that an increasing number of AI democratization initiatives have emerged over the past few years with promising future developments. However, we feel that a more centralized and comprehensive view of the issue should be embraced by all parties involved. We hope with this blog post to stimulate a renewed and dutiful discussion on the issue, to ensure the ethical development and adoption of AI technologies in our individual lives and for the benefit of our increasingly digital society as a whole.