Who owns your Data? Organisation of the New Capital

Published in Urban AI · 11 min read · Jun 23, 2022

This essay was written by Saulė Gabrielė, Nadia Leonova and Lukas Utzig as part of our Emerging Leaders Program.

Introduction

These days we generate a lot of data simply by carrying our phones around and consenting to cookies on websites. While some of the data collected can be accessed by researchers for social good, other data is used to provide targeted advertisements. Companies buy and sell our data, but what is currently lacking is a mechanism for individuals to see where their personal data is being used and for what purpose. On the one hand, we may want more autonomy over how data about us is collected and used; on the other, we have already given up a lot of our data without this consideration. For any other form of data collection and storage to take place, we would need a mindset shift.

This paper examines how data collection and exchange might take place in the future, particularly with regard to personal oversight and centralised access, and how the proposed strategies may influence future research done by private companies and for the public good.

We argue that there is currently not enough public interest in, or understanding of, data ownership and use. We review the ideas of data management proposed in Building the New Economy (Pentland et al, 2021) and discuss the feasibility of those ideas in today’s context.

Building the New Economy: Data as Capital — Alex Sandy Pentland

Literature Review

Aggregated data can provide valuable insights into a variety of socio-economic factors, which in turn can be used to identify and analyse target audiences, while individual data points do not hold much value on their own. However, Pentland et al (2021) argue that individuals should be able to own their data and can rightfully expect it to be protected and secured so that they are not identified and targeted. They hold that having a third party manage data access is imperative if citizens are to reclaim control of their data. This is the central idea discussed in Building the New Economy (Pentland et al, 2021).

Pentland (2014) and Pentland et al (2021) criticise the way individuals currently give their data away to large entities, such as companies, without a proper bargaining process over its value or a safe and reliable record of permissions. Pentland compares this inequality of power to the situation of workers during 19th-century industrialisation, which led to the formation of unions for collective wage bargaining and the establishment of credit unions and cooperative banks to support low- and mid-income households with lending and financial services.

In the UK, trade unions are labour unions which first emerged during the Industrial Revolution to defend workers’ rights, such as salaries, work hours, and collective bargaining.

Using the terms ‘data union’ and ‘data cooperative’, Pentland et al (2021) lay out a strategy of decentralised community organisations whose members share a common bond, e.g. geographically, socially or through their consumer behaviour. These unions would hold a record of their members’ data, together with the history of usage rights that were granted, and they would represent the collective interest legally and financially. Members would gain in two ways: the aggregate data would be available for collective analysis, yielding insights and improvements for the community in areas such as health and transport, and the union would bargain for fair compensation when corporations use personal data for commercial purposes.

In the book Social Physics (2014), Pentland uses the term ‘public data commons’, referring to the economic concept of a shared public good that is accessible to and usable by everyone but also requires a strict set of rules for everyone to follow. This concept has evolved in his later work Building the New Economy (Pentland et al, 2021), where the unions and cooperatives are private entities, sometimes not-for-profit, motivated by a particular interest of their members. This indicates a conceptual shift from a system of individual use under state rules towards a fluid bargaining system in which large companies and collectives with many members negotiate opposing interests, re-establishing an equilibrium that has so far been missing.

Pentland, A. (2014). Social physics: How good ideas spread-the lessons from a new science. Penguin.

One of the proposed ways to achieve this is to create a third party that would be in control of data collection, storage, and access. Pentland et al (2021) argue that such data exchange mechanisms are important because they allow research, both for the public good and by private companies, to happen while maintaining the anonymity of the individual subjects of the research.

“Data exchanges are platforms that gather data from many different sources and allow third parties to run algorithms on this data. As a result, these third parties can generate insights (knowledge) with new sources of data. Hence, data exchanges give rise to the concept of shared data …” (Pentland et al, 2021, pg. 36)

In principle this idea can be implemented, and the OPAL initiative is currently doing so using some of these principles, such as “bringing code to data”: algorithms are run where the data is securely stored, without the data being shared, and with a focus on data privacy.

As described by Pentland et al (2021), data is nonfungible: the same amount of data may not be of the same value, which makes it difficult to trade. However, it is also nonexclusive: unlike labour or capital, it can be used for a variety of purposes at the same time. This in turn allows one to gain a variety of insights for different purposes by supplying an algorithm, which is checked and then applied to the data held by the data exchange.

One of the advantages highlighted by Pentland et al (2021) is that all companies, no matter how small or large, established or start-up, would have equal access to the information, for example about the customer base, should they wish to use it. This would allow fair market competition.
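The “bringing code to data” principle can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name DataExchange, the minimum group size, the commute field and the sample records are hypothetical, and it is not the OPAL implementation or any API described by Pentland et al (2021). It only shows the shape of the idea: the analyst submits the name of a reviewed algorithm, the exchange runs it on data that never leaves the store, and only an aggregate comes back.

```python
from statistics import mean
from typing import Callable, Dict, List

Record = Dict[str, float]
Algorithm = Callable[[List[Record]], float]

MIN_GROUP_SIZE = 3  # assumed privacy threshold, chosen only for illustration


class DataExchange:
    """Hypothetical exchange: holds raw records and only returns aggregates."""

    def __init__(self, records: List[Record], approved: Dict[str, Algorithm]):
        self._records = list(records)    # raw data stays inside the exchange
        self._approved = dict(approved)  # algorithms that passed a review

    def run(self, name: str) -> float:
        # Only vetted algorithms may touch the data.
        if name not in self._approved:
            raise PermissionError(f"algorithm {name!r} has not been approved")
        # Refuse to answer for very small groups, where an aggregate
        # could reveal an individual.
        if len(self._records) < MIN_GROUP_SIZE:
            raise ValueError("too few records to release an aggregate")
        return self._approved[name](self._records)


def average_commute(records: List[Record]) -> float:
    """Example vetted algorithm: one aggregate number, no individual rows."""
    return mean(r["commute_minutes"] for r in records)


# Made-up records for illustration only.
exchange = DataExchange(
    records=[
        {"commute_minutes": 20.0},
        {"commute_minutes": 30.0},
        {"commute_minutes": 40.0},
    ],
    approved={"average_commute": average_commute},
)
result = exchange.run("average_commute")
```

Because the same records can serve many approved algorithms at once, the sketch also reflects the nonexclusive nature of data: a second analyst could register a different algorithm without the first losing anything.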

Discussion

One of the main ideas described by Pentland et al (2021) is the data cooperative, and the need for such a third party to manage citizens’ data. These cooperatives would work like labour unions, defending the rights of their members. Moreover, such cooperatives would operate at community scale and would have strong geographical constraints. Anyone who wants to know more about a community and analyse its data could do so through that community’s data cooperative. If the community is widely involved in giving consent to the use of its data, such a request could be successful. Most often, however, the cooperative would not represent the full community population, so the data would be unrepresentative and of limited use.

In our view, such quality issues could be solved if data cooperatives were not geographically bounded but instead united the users of a particular platform, such as a social network like Facebook or a healthcare provider like the NHS. Users of a platform would share consent over their data with a union designed to manage only the data footprint of that platform. The data cooperative concept would then be clearer for businesses wishing to use the data for analysis: they could request the particular type of data they know they need for their research question or business. The data could still be split geographically, depending on the region needed. Cooperatives of this kind would address the data quality question, because data provided by a specific platform’s data union would cover that platform’s data features more completely. Avoiding geographical limitations would also allow such cooperatives to scale much faster, since they would not be limited to a specific community.

Atlas of Inequality is an example of insights that can be generated through location data analysis

Moreover, organising data cooperatives around a specific company or use case could motivate its users to join. Since users are already on the platform, membership could work as an upgrade of the terms and conditions, much as an employment contract may include a section on labour union rights. Such data unions could also defend users’ rights against the platform if needed, and would be more powerful, since it would not be one person fighting a company such as Facebook but millions who are dissatisfied with its data policy. We have already seen people unite in a similar way to protect their rights, for example after a data breach.

Another important factor in the success of data cooperatives is the indifference of the users whose data is being used as capital. For example, many people are unaware that their data is collected by cookies and that, instead of accepting all of them, they can opt out of everything that isn’t necessary for the provision of the service they are after. Similarly, probably very few people read the terms and conditions of the social platforms they register with. It has become all too common that if a platform is free of charge, users pay with their data instead. Even though data cooperatives on specific platforms could help to advocate for their users’ data-sharing rights, progress on any changes would be slow if data sharing is not a concern for the majority. Such indifference to data security might arise from a lack of data and technology literacy. Most people do not really understand what data is or what information is being collected around them. Even though Pentland et al (2021) advocate putting control of the data back in the hands of individuals, this needs universal support, and at the moment there seems to be a lack of understanding within the general population.

Even if data communities based on geographical proximity existed, the question of data quality would remain. If the community consists of people of various ages, very different types of data could be collected about them. Since older generations have smaller digital footprints, data in such a cooperative would be more scattered and less representative, and might be biased against the elderly and other minorities. Such data could describe individual people well, but would be of little use to businesses that want a full picture of multiple areas from aggregated, high-quality datasets.

It is difficult to get a full grasp of what data is being collected and how it is being used when we don’t yet have a full picture available to us and our data is simply out there, being used for any purpose someone finds useful, be it ethical research or targeted advertisement. Even when something as tangible as land ownership is in question, everyday citizens may not question it (Minton, 2009) until their rights are infringed, although some of us do question why outdated rules on matters such as land ownership (Minton, 2009) and access to nature (Right to Roam, 2022) still govern in the 21st century. It will take time to realise the scale of what we’ve given up by agreeing to share our data over recent years, and it will take time for regulations to catch up. Scandals, most notably Cambridge Analytica, have shone a spotlight on the issue, but it will take time to put regulations in place to prevent this from happening so easily and without any oversight. For now, the only way we can hold companies accountable is in the court of public opinion.
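The representativeness concern can be made concrete with a small sketch. The age groups and head counts below are invented purely for illustration and do not come from any dataset or from Pentland et al (2021); the function name representation_ratio is likewise hypothetical. The sketch compares each group’s share of cooperative members with its share of the wider community: a ratio below 1 means the group is under-represented in the cooperative’s data.

```python
from typing import Dict


def representation_ratio(
    members: Dict[str, int], population: Dict[str, int]
) -> Dict[str, float]:
    """Share of each group among members divided by its share in the population."""
    total_members = sum(members.values())
    total_population = sum(population.values())
    ratios = {}
    for group, count in population.items():
        member_share = members.get(group, 0) / total_members
        population_share = count / total_population
        ratios[group] = member_share / population_share
    return ratios


# Invented numbers: a community of 1,000 people and a cooperative of 80 members.
community = {"18-39": 500, "40-64": 250, "65+": 250}
cooperative = {"18-39": 60, "40-64": 15, "65+": 5}

ratios = representation_ratio(cooperative, community)
under_represented = [group for group, ratio in ratios.items() if ratio < 1.0]
```

With these made-up figures, the two older groups fall below 1, which is the pattern the paragraph above describes: an analyst relying on the cooperative’s data alone would see a younger community than the one that actually exists.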

Facebook–Cambridge Analytica data scandal

Conclusion

While Pentland offers a coherent framework in which data unions give citizens a way to control and monetise their data, key obstacles stand in the way of its successful implementation. First, the ongoing process of raising awareness about the value of data has not yet reached enough people. This inertia relates to a lack of data literacy among users. Another key problem is the common element that groups individuals into data unions. A purely community-based, geographical organisation could lead to heterogeneous and inconsistent datasets, as citizens generate data with different companies. This would make the aggregate dataset difficult for analysts to use and may lower its market price. In addition, it would raise the entry barrier for people to join the union, as they would have to actively create or join this entity and then organise their different data permissions with it.

A possible organising element would be to attach the union to a single company, or to have it operate across different platforms in a specific sector, such as social media. This would be similar to existing labour unions or works councils in companies. Users would specify their union membership when registering for a company’s service, which would lower the bar for people to sign up. This would make the data a union manages more homogeneous and more attractive to potential buyers and analysts who are looking for a specific data type or dataset.

As an outlook towards further investigation, it would be useful to study the emergence of labour unions in the US and Europe, and specifically the intensive pushback they face from corporations in the United States. Union membership there has declined over recent decades due to aggressive measures by companies. The prospect of global enterprises giving up their use of free data is likely to evoke a similar response, including political lobbying efforts. To overcome this, a thorough knowledge of negotiation processes and precedents in countries with higher union membership rates, such as Sweden or Denmark, will be necessary.

References

Minton, A. (2009). Ground control: Fear and happiness in the twenty-first-century city. London: Penguin Books.

Right to Roam. (2022). [online] Available at: https://www.righttoroam.org.uk/. (Accessed: 10/05/2022).

Pentland, A. (2014). Social physics: How good ideas spread-the lessons from a new science. Penguin.

Pentland, A., Lipton, A. and Hardjono, T. (2021). Building the New Economy: Data as Capital. The MIT Press.

Saulė Gabrielė Petraityte is a spatial data scientist from Lithuania working on data-driven cities projects. She is the CEO of Datahood and Co-Founder of GovTech Lab Lithuania.

Nadia Leonova is a consultant for the World Bank’s Global Facility for Disaster Reduction and Recovery (GFDRR). Her work focuses on the analysis of the impacts of natural disasters on urban environments. Nadia holds an MSc degree in Smart Cities and Urban Analytics and a BSc degree in Architecture.

Lukas Utzig is a researcher and designer holding a master’s degree in spatial research from the Space Syntax Lab, UCL. Currently he works as lead architect and urban designer for an international practice. In his research, he focusses on understanding spatial patterns of movement, segregation, and social networks.

Urban AI: The 1st Think Tank on Urban Artificial Intelligences