Why is your data still a corporate commodity?

Rory Byrne
Published in Metro Platform
Jun 15, 2018
Photo: Chesnot / Getty Images

When you think of the term “personal data”, you probably think of natural facts such as your age, nationality, sexual orientation, or political opinions. You may even extend the notion as far as your music taste, your favorite movies, or the exact movement pattern of your mouse while visiting a website. Wait, what?

Since the dawn of the internet we have always accepted — if we were even aware of it — that personal data is a normal part of life online, a sort of collateral-damage which makes possible the wonders of Facebook and Google, and that we make our data available for those companies to collect and use to improve their — hold on…

Let’s rewind. The word ‘collect’ may be the first word that springs to mind here, but it’s not an accurate word.

Tech companies have carefully chosen their words to frame personal data as something that we possess and “allow” them to collect, whereas the truth is that we give them ‘information’ and they turn that information into ‘data’, a physical, digital commodity which is often referred to as the oil of the 21st Century.

What this means is that data — the stuff that powers AI — is a corporate commodity, created and owned by corporations, driven by the corporate interests of capitalism as opposed to ethical interests of the public. That model made sense once, when data was only narrowly useful in select corporate environments and we didn’t all have internet-connected supercomputers in our pockets, but today that model is not only stunting growth in the emerging AI industry, it’s downright dangerous to democracy in ways exemplified by Facebook and Emerdata (Cambridge Analytica) in recent months.

Create Is An Anagram For Collect

I want to first take a look at what “data” is and its two primary uses: transporting information and combining information.

All data is ones and zeros. We encode pieces of information as ones and zeros. As long as we all agree that “01000001” is the letter A, digital data is possible in its basic form.
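
To make the encoding idea concrete, here is a minimal sketch in Python. The helper names `to_bits` and `from_bits` are made up for illustration, and the sketch assumes the shared agreement is ASCII/UTF-8, under which the letter A is the byte 65.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 and show each byte as eight ones and zeros."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse of to_bits: turn space-separated 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


print(to_bits("A"))           # 01000001
print(from_bits("01000001"))  # A
```

The round trip only works because both sides use the same table. That agreement, not the bits themselves, is what makes the data meaningful.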

However, not all data is created equal.

The reality of the data collection I mentioned in the introduction is that when you press the ‘like’ button, you are simply informing Facebook that you think positively of, say, Britney Spears. What happens is that a piece of data (we’ll call it an “information packet”) is created by your internet browser and sent to a Facebook server as a series of pulses of light through optical cables which span the globe. When Facebook’s server receives that information, it first laughs at you and mocks your music taste, then turns the “information packet” into a physical, persistent “datapoint” on its hard drive.

Under the hood, an “information packet” and a “datapoint” are the same thing, but they have different purposes and different uses.

An “information packet” is a messenger, carrying information encoded as ones and zeros from your device to Facebook for about a fifth of a second before disappearing, and one of its primary goals is to protect itself from corruption by avoiding all interaction with other pieces of data.

The “datapoint” that is then created by Facebook is technically the same thing — ones and zeros — but it is much more powerful because it lives for a long time and can be easily combined with other “datapoints”. In fact, interaction with other pieces of data is its entire purpose, and so it has some extra features and information to make that easier.
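
The distinction can be sketched in a few lines of Python. This is purely illustrative and not how Facebook's systems actually work: the `Datapoint` fields, the JSON packet format, and the function names `make_packet` and `store_datapoint` are all assumptions invented for the example.

```python
import json
import time
from dataclasses import dataclass


@dataclass
class Datapoint:
    """Hypothetical persistent record: the packet's information plus extras."""
    user_id: str
    action: str
    subject: str
    received_at: float  # extra information added by the receiver


def make_packet(user_id: str, action: str, subject: str) -> bytes:
    """Sender side: encode the information as bytes, purely for transport."""
    info = {"user_id": user_id, "action": action, "subject": subject}
    return json.dumps(info).encode("utf-8")


def store_datapoint(packet: bytes, store: list, now=None) -> Datapoint:
    """Receiver side: decode the short-lived packet and keep an enriched copy."""
    info = json.loads(packet.decode("utf-8"))
    point = Datapoint(received_at=time.time() if now is None else now, **info)
    store.append(point)  # the packet is discarded; the datapoint persists
    return point


store = []
store_datapoint(make_packet("u1", "like", "Britney Spears"), store, now=1.0)
store_datapoint(make_packet("u1", "like", "Madonna"), store, now=2.0)

# Stored datapoints can be combined; a packet in transit never could be.
likes = [p.subject for p in store if p.user_id == "u1" and p.action == "like"]
print(likes)
```

The bytes in the packet and the stored record carry the same information. What differs is that the record persists, carries added context, and sits next to other records it can be queried with.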

We generate data for transportation purposes; Facebook then generates “improved” data for AI purposes.

“Datapoints”, as we have decided to name the more useful kind of data, can be combined and used billions of times in innumerable different ways with other datapoints, in a process we call “machine learning” or “AI”. This results in products such as Siri, targeted commercial ads, or even elections influenced using marketing campaigns tailored to each voter’s fears and desires.

Clearly “datapoints” are a much more valuable and useful version of “data” than mere “information packets”. So they must be really difficult to create, right?

Wrong.

Data is extremely easy to create. It’s created automatically every time you do anything online, and it is essentially free to store on the scale of an individual. However, we seem to accept that Facebook/Google are the only ones who can create it, store it, and therefore monetize it.

We have been tricked for decades into viewing data from a perspective which suits corporate interests while being simultaneously told that we are the owners.

This cognitive dissonance is one of the greatest tricks pulled this century, allowing corporate empires to be built upon a resource which is not only free to create but whose creation has only one ingredient: us.

Your Data Votes, Not You

One of the many uses for data is to help guide marketing campaigns, and at this stage I think most of us are at least aware of how common psychological marketing techniques are and how strongly they can affect us. If you own an Apple product, raise your hand.

While I am totally against the use of pseudo-science to sell shampoo, and the advertising of a ‘new’ $60 version of an old toy to children, the devil’s advocate in me can see the economic value of stimulating spending in a capitalist economy. Even if the means are deplorable, the end at least produces jobs and increases the flow of money, and that’s why we turn a blind eye.

That sympathy does not extend to the Orwellian activity of Emerdata.

Before they changed their name to Emerdata, Cambridge Analytica were said to have influenced elections and referendums around the world — including Brexit and the 2016 US presidential election.

They do this by building marketing campaigns designed to target certain political demographics with messages tailored to their desires, opinions, and fears.

This type of ‘political consultancy’ is therefore only possible with access to data (“datapoints”, not “information packets”) about your desires, opinions, and fears.

Since data is a commodity governed by corporate interests rather than public interests and we lack the regulation needed to bridge that gap, Facebook sells your data.

Facebook’s activity as a data collection company has never been regulated in any meaningful way, and they have been the target of continuous “social justice” for the past few months regarding their data ethics. However, in spite of the perceived backlash due to its role in supplying data to Emerdata’s campaigns, Facebook’s stock price has bounced back up.

Regulation, while it is certainly needed to some degree, is not enough to solve the problem on its own. That’s not its purpose either.

A tool as powerful as government regulation should be used as a surgical knife rather than a jackhammer.

GDPR, the set of new European data regulation laws, tries to be a broad and complete solution to a problem which requires combined effort from many players, which means it may be blunt enough to cause a lot of confusion and collateral damage in many industries.

In a capitalist society, regulation imposed by elected officials is not the only way that the public can influence the activity of corporate bodies, nor is it the most effective way.

We vote with our wallets and business actions, and that, combined with sensible regulation, will be the most effective strategy.

There is enough technological advancement today to make ethical data collection more than just a philosophical concept, and there is enough anger in the post-Brexit/Trump world for the public to take some responsibility for the companies they support.

Whether it’s us here at Metro or some other company, an organic, business-driven alternative to unethical data collection is on the way. We all need to support these honest ventures and help propagate a shift in perspective on what data is and who ought to control it.

We need to lay the foundation for the next generation of internet companies — the generation that actually gets it right.

The Data Monopoly Is Bad For Innovation

“Life starts at a billion examples”

…Or so Ray Kurzweil said recently in a talk at MIT, and much like how Ray Kurzweil merged his startup into Google because they had access to data, future innovators don’t seem to have much choice but to join one of the few billionaires in town.

However, most of these were, at one point, small companies run by 2–5 people who had innovative ideas and an open platform on which to build them.

Access to an open platform is key here, because nobody can deny that Tim Berners-Lee’s altruism in making the web free and open is what allowed the innovations of Larry Page and Jeff Bezos to succeed so well.

Those companies have since cemented their places in the tech industry by providing fundamental services for other tech companies — Google’s Android platform and Amazon’s AWS platform come to mind.

The idea is to provide a service so crucial and so useful that you become part of the furniture at every future company in the space.

Once a tech company reaches a certain size, it can adopt a business model which lives on a level of abstraction immune to ageing, a technological trump card that places it in the role of a demigod. And the more closed-off and monopolistic that trump card is, the easier it is to protect, especially in the absence of meaningful anti-trust enforcement in the United States, something which deserves a blog post of its own. Perhaps we can revive Teddy Roosevelt to write it?

As we saw at I/O last month, Google has its eyes on the most hyped trump card of all: AI. This focus is something that Eric Schmidt made clear during a talk at Stanford last year.

Here’s a quote from that talk in reference to machine learning, which was his focus throughout the talk:

“It’s possible that the next generation of companies can get to 90% market share models. That success model has not slowed down. There’s never been more opportunity than now to create these companies because the barriers are so low”

While he is correct in a way, his subtle wording completely ignores the ultimate barrier to entering the AI industry: data.

No altruistic platform exists for data the way it did for digital network communication back in 1999, and why would Google provide it when they can provide access to the resulting trained models as a service instead? The closed system is better for their bottom line, and if they can provide AI as a service then they will cement their place in tech for the third, if not fourth, time over.

Cracking that closed system and opening the door for data to be created locally as a personal commodity and traded using a secure, distributed ledger is what will open the floodgates for the AI industry.

We will see innovation coming from the little guy again, being built on open platforms again, just like it used to.

The future inventor of Artificial General Intelligence could well be alive today, and that person shouldn’t need to work for Google in order to change the world.

We have no shortage of examples of the immense power of data today. The resurgence of neural network research and reinforcement learning techniques has produced some incredibly impressive AI in the past 10 years, and this is just the beginning. However, the advancement of computer hardware, along with the birth of secure distributed ledgers, has opened other doors of possibility.

Data should be something that an individual owns and controls completely, from its inception, through its secure transfer, to its ultimate destruction if the individual wishes it so.

All of this should happen without a centralized authority touching the data or writing the data-collection code.

We have the technology now to make it as easy as possible for users to generate and securely send their own data in real-time to startup companies in an enriched, useful format that can immediately be used, and those users should be paid fairly. As easy, in fact, as it is for Facebook to receive the same service through my use of a web browser or their smartphone app.

I’ll go a step further, and say that we can make it even easier, we can make it more powerful, we can make it future-proof, and we can make it trustworthy by default.

That’s what Metro is about. We are two founders, recent graduates, and we are creating an ecosystem of ethically-built AI startups and the users who power them. It’s a platform for crowdsourcing data, and it allows users to generate their own online data and sell it to AI startups — or any project — in real-time, cutting out the centralized data collection companies entirely. We believe that he who wishes to collect data should not be the one to write the code for it, and so all data-collection code exists as community-made, open-source plugins called DataSources.

Check us out at our (WIP) website here. And if you’d like to follow our updates then sign up to our newsletter.

Join our Slack channel to discuss Self-Generated Data and Metro, and to start contributing to the platform.

If you’re interested in what we are doing or want to get involved, then feel free to give me a shout personally at rory@metro.exchange.

Join our Slack channel to discuss Self-Generated Data, Metro, and start contributing to the platform.

If you’re interested in what we are doing or want to get involved, then feel free to give me a shout personally at rory@metro.exchange.e simply informing Facebook that you think positively of, say, Britney Spears. What happens is that a piece of data — we’ll call it an “information packet” — is created by your internet browser and sent to a Facebook server as a series of pulses of light through optical cables which span the globe, and when Facebook’s server receives that information it first laughs at you and mocks your music taste, then turns the “information packet” into a physical, persistent, “datapoint” on its hard-drive.

Under the hood, an “information packet” and a “datapoint” are the same thing, but they have different purposes and different uses.

An “information packet” is a messenger, carrying information encoded as ones and zeros from your device to Facebook for about a fifth of a second before disappearing, and one of its primary goals is to protect itself from corruption by avoiding all interaction with other pieces of data.

The “datapoint” that is then created by Facebook is technically the same thing — ones and zeros — but it is much more powerful because it lives for a long time and can be easily combined with other “datapoints”. In fact, interaction with other pieces of data is its entire purpose, and so it has some extra features and information to make that easier.

We generate data for transportation purposes, Facebook then generates “improved” data for AI purposes

“Datapoints”, as we have decided to name the more useful kind of data, can be combined and used billions of times in innumerable different ways with other datapoints, in a process we call “machine learning” or “AI”. This results in products such as Siri, targeted commercial ads, or even elections influenced using marketing campaigns tailored to each voters fears and desires.

Clearly “datapoints” are a much more valuable and useful version of “data” than mere “information packets”. So they must be really difficult to create, right?

Wrong.

Data is extremely easy to create. It’s created automatically every time you do anything online, and it is essentially free to store on the scale of an individual. However, we seem to accept that Facebook/Google are the only ones who can create it, store it, and therefore monetize it.

We have been tricked for decades into viewing data from a perspective which suits corporate interests while being simultaneously told that we are the owners.

This cognitive dissonance is one of the greatest tricks pulled this century, allowing corporate empires to be built upon a resource which is not only free to create but whose creation has only one ingredient: us.

Your Data Votes, Not You

One of the many uses for data is to help guide marketing campaigns, and at this stage I think most of us are at least aware of how common psychological marketing techniques are and how strongly they can affect us. If you own an Apple product, raise your hand.

While I am totally against the use of pseudo-science to sell shampoo, and the advertising of a ‘new’ $60 version of an old toy to children, the devil’s advocate in me can see the economic value of stimulating spending in a capitalist economy. Even if the means are deplorable, the end at least produces jobs and increases the flow of money, and that’s why we turn a blind eye.

That sympathy does not extend to the Orwellian activity of Emerdata.

Before they changed their name to Emerdata, Cambridge Analytica were said to have influenced elections and referendums around the world — including Brexit and the 2016 US presidential election.

They do this by building marketing campaigns designed to target certain political demographics with messages tailored to their desires, opinions, and fears.

This type of ‘political consultancy’ is therefore only possible with access to data (“datapoints”, not “information packets”) about your desires, opinions, and fears.

Since data is a commodity governed by corporate interests rather than public interests and we lack the regulation needed to bridge that gap, Facebook sells your data.

Facebook’s activity as a data collection company has never been regulated in any meaningful way, and they have been the target of continuous “social justice” for the past few months regarding their data ethics. However, in spite of the perceived backlash due to its role in supplying data to Emerdata’s campaigns, Facebook’s stock price has bounced back up.

Regulation, while it is certainly needed to some degree, is not enough to solve the problem on its own. That’s not its purpose either.

A tool as powerful as government regulation should be used as a surgical knife rather than a jackhammer.

GDPR, the set of new European data regulation laws, tries to be a broad and complete solution to a problem which requires combined effort from many players, which means it may be blunt enough to cause a lot of confusion and collateral damage in many industries.

In a capitalist society, regulation imposed by elected officials is not the only way that the public can influence the activity of corporate bodies, nor is it the most effective way.

We vote with our wallets and business actions, and that combined with sensible regulation will be the most effective strategy.

There is enough technological advancement today to make ethical data collection more than just a philosophical concept, and there is enough anger in the post-Brexit/Trump world for the public to take some responsibility for the companies they support.

Whether it’s us here at Metro or some other company, an organic, business-driven alternative to unethical data collection is on the way. We all need to support these honest ventures and help propagate a shift in perspective on what data is and who ought to control it.

We need to lay the foundation for the next generation of internet companies — the generation that actually gets it right.

The Data Monopoly Is Bad For Innovation

“Life starts at a billion examples”

…Or so Ray Kurzweil said recently in a talk at MIT, and much like how Ray Kurzweil merged his startup into Google because they had access to data, future innovators don’t seem to have much choice but to join one of the few billionaires in town.

However, most of those billionaires’ companies were, at one point, small outfits run by 2–5 people who had innovative ideas and an open platform on which to build them.

Access to an open platform is key here, because nobody can deny that Tim Berners-Lee’s altruism in making the web free and open is what allowed the innovations of Larry Page and Jeff Bezos to succeed so well.

Those companies have since cemented their places in the tech industry by providing fundamental services for other tech companies — Google’s Android platform and Amazon’s AWS platform come to mind.

The idea is to provide a service so crucial and so useful that you become part of the furniture at every future company in the space.

Once a tech company reaches a certain size, it can adopt a business model which lives on a level of abstraction immune to ageing, a technological trump card that places it in the role of a demi-God. And the more closed-off and monopolistic that trump card is, the easier it is to protect, especially given the near-total absence of antitrust enforcement in the United States, something which deserves a blog post of its own. Perhaps we can revive Teddy Roosevelt to write it?

As we saw at I/O last month, Google has its eyes on the most hyped trump card of all, AI. This focus on AI is something that Eric Schmidt made clear during a talk at Stanford last year.

Here’s a quote from that talk in reference to machine learning, which was his focus throughout the talk:

“It’s possible that the next generation of companies can get to 90% market share models. That success model has not slowed down. There’s never been more opportunity than now to create these companies because the barriers are so low”

While he is correct in a way, his subtle wording completely ignores the ultimate barrier to entering the AI industry, data.

No altruistic platform exists for data the way it did for digital network communication back in 1999, and why would Google provide it when they can provide access to the resulting trained models as a service instead? The closed system is better for their bottom line, and if they can provide AI as a service then they will cement their place in tech for the third, if not fourth, time over.

Cracking that closed system and opening the door for data to be created locally as a personal commodity and traded using a secure, distributed ledger is what will open the floodgates for the AI industry.

We will see innovation coming from the little guy again, being built on open platforms again, just like it used to.

The future inventor of Artificial General Intelligence could well be alive today, and that person shouldn’t need to work for Google in order to change the world.

We have no shortage of examples of the immense power of data today. The resurgence of neural network research and reinforcement learning techniques has produced some incredibly impressive AI in the past 10 years, and this is just the beginning. However, the advancement of computer hardware, along with the birth of secure distributed ledgers, has opened other doors of possibility.

Data should be something that an individual owns and controls completely, from its inception, through its secure transfer, to its ultimate destruction if the individual wishes it so.

All of this should happen without any centralized authority touching the data, and without any centralized authority writing the data-collection code.

We have the technology now to make it as easy as possible for users to generate and securely send their own data in real-time to startup companies in an enriched, useful format that can immediately be used, and those users should be paid fairly. As easy, in fact, as it is for Facebook to receive the same service through my use of a web browser or their smartphone app.

I’ll go a step further, and say that we can make it even easier, we can make it more powerful, we can make it future-proof, and we can make it trustworthy by default.

That’s what Metro is about. We are two founders, recent graduates, and we are creating an ecosystem of ethically-built AI startups and the users who power them. It’s a platform for crowdsourcing data, and it allows users to generate their own online data and sell it to AI startups — or any project — in real-time, cutting out the centralized data collection companies entirely. We believe that he who wishes to collect data should not be the one to write the code for it, and so all data-collection code exists as community-made, open-source plugins called DataSources.

Check us out at our (WIP) website here. And if you’d like to follow our updates then sign up to our newsletter.

Join our Slack channel to discuss Self-Generated Data, Metro, and start contributing to the platform.

If you’re interested in what we are doing or want to get involved, then feel free to give me a shout personally at rory@getmetro.co.
