“Some people say, ‘Give the customers what they want.’ But that’s not my approach. Our job is to figure out what they’re going to want before they do. I think Henry Ford once said, ‘If I’d asked customers what they wanted, they would have told me, “A faster horse!”’ People don’t know what they want until you show it to them. That’s why I never rely on market research. Our task is to read things that are not yet on the page.” — Steve Jobs
One of the giants of digital advertising, Google (now Alphabet), is best known for its search service. The company started with a mission to organize the world’s information, and web search, one of its initial offerings, remains its most successful. For the past 20 years, Google has played a major role in shaping the Internet. One change I admire is its continuous, sometimes monopolistic, effort to transform the unstructured web into a semi-structured form. The adoption of the Knowledge Graph and the introduction of schema.org were among the smartest moves here, allowing near real-time, hassle-free information extraction from dynamic websites such as e-commerce stores or live-event pages. Beyond that, indirectly enforcing well-defined SEO rules, influencing web development tools (Angular, Flutter) and practices, and imposing order through client-side dominance (Chrome, Android) are other genius steps the company took. Even though these measures brought some structure to the Internet, something went wrong. Individuals and organizations started asking: “Who owns my data? And if it isn’t me, what do I get in return?” People and organizations began building walls. Sadly, the Internet is now getting fragmented as it gets structured.
Competition versus Monopoly
Competition is good for consumers because it keeps prices for products and services fair. However, when it comes to standardizing new and sophisticated technology, monopoly plays a crucial role. Standardization brings innovation: it spreads knowledge, levels the playing field in engineering, and grows a fast-expanding ecosystem around the technology. And when it comes to quality, a monopoly can deliver through-the-roof results.
When it comes to Google, everyone agrees that it holds a monopoly in the search industry, even though the company has not positioned itself to avoid that label. It delivers the best search service in the industry for free*. It is one of the entities pushing AI innovation most aggressively. It disrupted the telecommunications industry, empowering people in the lowest-income sectors of society. With YouTube, the entire media and publishing industry has been massively diversified. So, contrary to many voices out there, “Google is not evil.” But I would say it is time to retire or rethink some of its practices, painful as that may be since they are among its most profitable, for the greater good of the Internet and technology. The sky is not a limit for Google, so why not?
Public, Closed, and Private data
Classifying data into Public, Closed, and Private is not a generic approach; I introduce it here to convey a couple of points conveniently. The web, as a network of linked documents, is a public library of human knowledge and communication. We, the human race, all agree that our progress has always been driven by knowledge sharing. Knowledge aggregates generation after generation; nobody has to reinvent the wheel every time. Finding the right information with minimal effort multiplies the pace of development. As we know, most of this information is publicly accessible to anyone on the Internet; it’s at your fingertips if you know exactly where to look. But this public data resides in an unstructured format, which makes it difficult to find, first, the right document and, then, the right data recorded in it.
Google’s intelligent search mechanism, powered by a well-organized knowledge graph (built by closing a once-open but unsustainable system), solved this pain point and has been delivering a quality service for the past decade. It unlocked a wide, rewarding ecosystem under Google’s growing monopoly. Even though the system caused no intentional harm, as an after-effect of the centralized, tight holding of data and related technologies, a large amount of useful information that is not entertained by the Google ecosystem never makes its way in. Let’s call it (partially or fully) closed data.
The generators of closed data are mainly organizations; nowadays we call it big data, staying frozen inside them. If you are from the IT industry, you know that nearly every product or service now has an offering to address this closed nature of organizational data. Even though this is common practice, and an opportunity for product developers from a business perspective, organizations mostly don’t get to taste the best technologies on offer (in most cases, the price is data privacy). It’s worth noting that companies, including Google, are investing in homomorphic encryption to address this problem and provide a unified service across all customers. Still, the opportunity is limited by the centralized nature of these services, and gate-keeping remains a big problem for data accessibility across multiple vendors. Even with encrypted computation, vendor lock-in will persist by moving to a higher level, and a boundary will be redrawn around the organizations. Many organizations would likely release data under much more relaxed restrictions for the public good if it were freed from these boundaries.
Private data should stay private. The user (person or organization) should have control over it, whatever the conditions. This works best when computation is moved to the user. As of now, that is not fully practical and remains an open problem. One thing is for sure: it requires methods to establish controlled flows of data and computation beyond central boundaries.
Open-source software and Non-profit organizations
Looking back at history, open-source software disrupted the technology industry like a storm. In the very beginning, open-sourcing software was a hacker thing; now it’s standard. Companies ranging from startups to corporations have come to acknowledge the power of community, each for reasons of its own. Most startups see it as a customer-acquisition technique, while many corporations see it as a competition killer or a vehicle for propagating their own standards. Talent and product scouting and public relations are other possible motivations. Whatever the intentions, the results, as we all know, are impactful, and the software industry as a whole benefits. It doesn’t matter who authored a piece of code as long as it is valuable to someone, and it will be. This is a cooperative, non-zero-sum game. Continuous forking and improvement across developers, whether indie or from an organization or institution, boosts both technology education and social evolution. To me, the beauty lies in how the openness of source code brings trust and cooperation without revealing the identities of the engaging peers, thus enabling fairness in innovation.
The same thing can happen to data if we build a sustainable system. As with open-source software, data producers and consumers can come together to build a self-sustaining marketplace. Open-source software gave birth to cloud computing services; open data could give birth to storage and analytics services that anyone can offer. Everyone adds value to the system: data generators, data keepers, and data analysts. Data can be forked by generating derivatives (insights) that bring in even more value.
Non-profit organizations like Wikipedia are already working on this, and as we have seen, the derivative Wikidata has added more value to the overall system. Yet developers and companies that use Wikidata to generate more information and value keep the results closed. In most cases, the reason is that there is no mechanism for opening them up with incentives attached. Because the value gets locked up in the pipeline, non-profits like Wikipedia are always on the verge of existential threat. It is a known fact that the companies that extract the most value (Google, Amazon Alexa) give very little back to Wikipedia, which is funded mainly by generous contributions from the general public. This model is neither sustainable nor progressive. What we need is a self-sustaining data market.
Google 2.0 will not be a single person or company. It will be an open market for data and its management, where new and existing entrepreneurs and companies each have their own space. Of course, Google 1.0 can be a part of it. In the long run, this market can open up ever more closed data, as entities start recognizing the value of whatever data they hold. Individuals will generate and sell data and services as they wish.
Data as a foundation layer
To make this a reality, we need to design the foundation layer, which raises the question of how data is represented at that layer. It should be:
- uniform across networks (ontology should be the same)
- both machine and human-readable
- distributed (edge storage and processing)
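To make these requirements concrete, here is a minimal sketch of a record at such a layer: canonical JSON keeps it both human- and machine-readable, keys drawn from a shared vocabulary keep the ontology uniform across networks, and a content hash gives every node the same location-independent address (the idea behind IPFS content identifiers). The field names and the `content_address` helper are hypothetical illustrations, not any project's actual schema.

```python
import hashlib
import json

def content_address(record: dict) -> str:
    """Derive a location-independent ID from the record's content.
    Canonical JSON (sorted keys, fixed separators) guarantees every
    node computes the same address for the same data."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# A human-readable record whose keys come from a shared ontology
# (the "@type" term here is illustrative, not a real vocabulary entry).
record = {
    "@type": "Observation",
    "subject": "air-quality/berlin",
    "value": 42.0,
    "unit": "ug/m3",
}

addr = content_address(record)
print(addr)  # identical on every node holding an identical copy
```

Because the address is derived from the content rather than from a location, any node in a distributed network can verify that it holds the right data without trusting whoever served it.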
Numerous efforts are already happening in this domain. IPFS is setting up decentralized data storage with incentives for participants. Underlay is working on a distributed knowledge graph over the IPFS network, BigchainDB on a BFT NoSQL database over IPFS, Aquila Network on semantic search indexing to be integrated with IPFS, and so on.
A data marketplace’s core activity is bringing a buyer and a seller together. That, in turn, boosts an ecosystem of related services: finding the right data, subscribing to events, prediction markets, data restructuring, analytics services, isolated computation, and so on. The ecosystem will be driven by clusters of decentralized organizations and carefully designed token economics.
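As a toy sketch of that core activity, the code below pairs each buyer with the cheapest compatible seller. Every name here (`Offer`, `Bid`, the dataset labels) is hypothetical; a real decentralized market would add identity, escrow, and token settlement on top of a matcher like this.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    dataset: str   # what the seller is listing
    price: float   # asking price
    seller: str

@dataclass
class Bid:
    dataset: str      # what the buyer wants
    max_price: float  # the most the buyer will pay
    buyer: str

def match(offers: list, bids: list) -> list:
    """Pair each bid with the cheapest compatible offer, first come first served."""
    trades = []
    available = sorted(offers, key=lambda o: o.price)
    for bid in bids:
        for offer in available:
            if offer.dataset == bid.dataset and offer.price <= bid.max_price:
                trades.append((bid.buyer, offer.seller, offer.dataset, offer.price))
                available.remove(offer)  # each offer can be sold once
                break
    return trades

offers = [Offer("weather/daily", 5.0, "alice"), Offer("weather/daily", 3.0, "bob")]
bids = [Bid("weather/daily", 4.0, "carol")]
print(match(offers, bids))  # carol buys from bob at 3.0
```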
Data ownership is another important feature of data markets. When data is introduced to the market, the owner who created it gets credited throughout its lifetime. Data protection ensures that only authorized parties can access it, even though storage and networking are provided by trustless nodes.
As discussed previously, innovations in homomorphic encryption will multiply the possibilities for privacy-first computation and distribution of data without trusting anyone.
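The homomorphic property can be shown in a few lines with textbook RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product, so an untrusted node can compute on data it cannot read. This is only a toy; textbook RSA is insecure, and practical systems use schemes such as Paillier (additive) or BFV/CKKS (fully homomorphic).

```python
# Textbook RSA with tiny primes, for demonstration only.
p, q = 61, 53
n = p * q                          # public modulus (3233)
e = 17                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+ modular inverse)

def enc(m: int) -> int:
    return pow(m, e, n)

def dec(c: int) -> int:
    return pow(c, d, n)

a, b = 7, 6
# An untrusted node multiplies the ciphertexts without the private key:
product_cipher = (enc(a) * enc(b)) % n
assert dec(product_cipher) == a * b  # Enc(a) * Enc(b) decrypts to a * b
print(dec(product_cipher))  # 42
```

The node holding `product_cipher` never learns `a`, `b`, or their product; only the key holder can decrypt the result, which is the essence of privacy-first computation on trustless infrastructure.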
Decentralized collective intelligence
We at ‘a-mma’ believe that, in the long run, Artificial General Intelligence (AGI) will have to emerge out of human-machine interactions. As we approach that problem, the first and foremost thing to solve is a self-sustaining data layer accessible to both humans and machines. Decentralized data markets present a promising opportunity here.
a-mma (a_മ്മ) is a non-profit organization focused on the long-term development of swarm intelligence and related technologies. a-mma gives incubation and community support to commercial and non-commercial projects in this field of interest and doesn’t own them.
Jubin Jose is one of the early members of a-mma, still helping it reach a sustainable point of independent, decentralized operation.