Digital Ownership in Data Science

Akif Berber
Sopra Steria NL Data & AI
13 min read · Jul 8, 2024
“Symbolize the digital world and the ethical dilemma of data ownership” by DALL-E

Introduction

Imagine walking through a busy city square and an artist paints your portrait. Who owns the painting: you or the artist? This question mirrors a similar issue in the digital world: when a company uses algorithms to track and analyze your online activity — like what you read or buy — does this data belong to you or the company? This issue of data ownership brings us into a complex ethical area, blurring the lines between personal and public spaces, and between the creator and the created.

As we dive deeper, the emergence of Large Language Models (LLMs) like ChatGPT brings these ethical questions into sharper focus. Because these models generate human-like text from patterns learned in extensive digital data, they raise questions about who owns both the material they learn from and the content they produce. We face two basic dilemmas of data ownership:

Human-Created Data: This includes content like photos on social media or blog posts. If you post a photo online, who does it belong to — you or the platform?

LLM-Generated Data: This involves content created by LLMs from existing data, like a new piece of writing based on learned patterns. When LLMs produce something new, who owns it — the LLM developers, the data providers, or is it a new form of ownership?

Exploring these issues shows we’re moving into new ethical areas in data science and AI. The role of LLMs in our digital world highlights the need for early ethical considerations to prepare for upcoming challenges. This article explores these issues and advocates a proactive approach to the ethical dilemmas presented by LLMs and data science, with the aim of promoting a culture of trust and ethical responsibility.

History of Data Ownership:

The concept of data ownership has significantly evolved with technological and societal changes. Initially, data was linked to physical records and databases, where ownership was clear and tied to the creator or collector. However, the digital revolution, led by the internet and cloud computing, changed this landscape drastically.

From Tangible to Digital: In computing’s early days, data ownership was straightforward, tied to physical items like documents and storage media. Owners controlled their data as long as it remained within their physical domain. Databases and digital storage then broadened the scope of data ownership, but the basic rule held: if you created or collected the data, it was yours.

The Internet Era: The internet made it hard to control data due to the ease of copying and sharing digital information. Issues like copyright infringement became widespread, leading to legal battles and the creation of digital rights management (DRM) technologies.

Cloud Computing and Big Data: Cloud computing complicated data ownership further. With data stored and processed remotely, it was challenging to define ownership, especially as big data analytics combined data from multiple sources, complicating ownership identification.

The Rise of LLMs and AI: The development of Large Language Models (LLMs) and AI, which generate new content from vast data, blurred the lines between original and derived data, raising questions about the ownership of AI-generated content.

Legal and Ethical Considerations: Legal and ethical issues around data ownership have evolved from focusing on copyright laws for tangible assets to addressing digital innovations. Regulations like the General Data Protection Regulation (GDPR) [1] emphasise data protection and privacy, granting individuals rights over their personal data.

Government and Corporate Dynamics: The relationship between governments, corporations, and individuals has shifted. Governments regulate data use for privacy protection, while corporations seek less restrictive regulations to use data for innovation and profit. Meanwhile, individuals demand more control and transparency over their data.

As technology and societal values continue to develop, navigating data ownership requires a balanced approach that respects privacy, promotes innovation, and ensures fair data use.

Dilemmas in Data Ownership:

“Dilemmas in data ownership” by DALL-E

The evolution of data ownership introduces many ethical dilemmas that challenge how we understand privacy, consent, bias, representation, transparency, and accountability. Solutions to these challenges are as varied as the dilemmas themselves.

Privacy and Consent: Cloud-based data storage and processing have raised significant privacy concerns. The risk of data breaches and unauthorised access highlights the need for strong privacy protections. Although legal frameworks like the California Consumer Privacy Act (CCPA) [2] aim to protect privacy, keeping up with technology is a constant challenge. Consent is crucial: users should know what data is collected and how it is used. Yet the volume of data and the complexity of data ecosystems make it hard for users to give truly informed consent.

Bias and Representation: Using large datasets for progress has put a spotlight on data bias. Algorithms might perpetuate biases from skewed data. This is not just a technical issue but an ethical one, needing a focus on diversity and fairness in data collection and design. Commons-based data management [3] can help address these biases by promoting more equal data use.
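One simple way to make the bias concern above concrete is to compare how often each group in a dataset receives a positive outcome. The sketch below is only an illustration: the field names (`group`, `approved`), the toy records, and the helper functions are hypothetical, and a gap in rates is a signal to investigate rather than proof of unfairness.

```python
from collections import defaultdict


def selection_rates(records, group_key, outcome_key):
    """Return the share of positive outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Largest difference in positive-outcome rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: eight decisions across two groups.
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "A", "approved": True},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

rates = selection_rates(decisions, "group", "approved")
gap = parity_gap(rates)
print(rates, gap)
```

A check like this fits naturally into data collection and model review, where a large gap can prompt a closer look at how the data was gathered.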

Transparency and Accountability: Cloud computing’s impact on customer services and internal procedures brings up transparency and accountability issues due to the ‘black box’ nature of algorithms. It’s often hard for users to see how their data is used or understand decision-making processes. This lack of transparency challenges accountability, especially when decisions have significant impacts.

Regulatory Compliance: The privacy concerns and regulatory compliance of cloud-based data warehousing add complexity. Organisations must navigate varying data protection regulations across jurisdictions, complicating operations. Despite the importance of privacy-preserving techniques, the decentralised nature of cloud data poses challenges [4].

Authentication and Authorisation Mechanisms: Securing data warehousing in cloud networks requires strong authentication and authorisation. Encryption and secure access protocols protect against unauthorised access and breaches. Whitelists and blacklists help organisations control data sharing, meeting legal and organisational needs [5].

Tackling these dilemmas needs efforts from everyone in the digital ecosystem. Collaboration is key to developing ethical frameworks that prioritise privacy, consent, and transparency, addressing biases and promoting fair data representation. The future of data ownership ethics relies on our ability to proactively navigate these challenges together.

Regulations and Policies in Data Ownership:

“Regulations and Policies in Data Ownership” by DALL-E

The complex ethical dilemmas of data ownership have necessitated the development of various regulations and policies worldwide. These legal frameworks aim to balance the rights and interests of individuals, businesses, and governments, promoting a digital ecosystem that is secure, equitable, and respectful of privacy. Among these, the General Data Protection Regulation (GDPR) in the European Union stands out as a landmark policy, influencing global data protection standards.

General Data Protection Regulation (GDPR): Adopted in 2016 and applicable since May 2018, the GDPR has significantly reshaped data protection practices across the European Union and beyond. It emphasises the principles of consent and transparency, granting individuals control over their personal data. Where processing relies on consent, the GDPR requires that consent to be freely given, specific, informed, and unambiguous, and it obliges organisations to tell users how their data is used. It also provides the right to data portability and the right to erasure (the “right to be forgotten”), giving individuals more control over their digital footprints.
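To show how consent, portability, and erasure might surface in a data system, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `PersonalDataStore` class, its method names, and the in-memory storage are hypothetical, and the code is not a statement of what the GDPR legally requires.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class UserRecord:
    user_id: str
    data: dict = field(default_factory=dict)
    consents: dict = field(default_factory=dict)  # purpose -> ISO timestamp


class PersonalDataStore:
    """Hypothetical in-memory store that tracks consent per purpose."""

    def __init__(self):
        self._records = {}

    def register(self, user_id, data):
        self._records[user_id] = UserRecord(user_id, dict(data))

    def give_consent(self, user_id, purpose):
        stamp = datetime.now(timezone.utc).isoformat()
        self._records[user_id].consents[purpose] = stamp

    def withdraw_consent(self, user_id, purpose):
        self._records[user_id].consents.pop(purpose, None)

    def may_process(self, user_id, purpose):
        """Processing is allowed only with recorded consent for this purpose."""
        record = self._records.get(user_id)
        return record is not None and purpose in record.consents

    def export(self, user_id):
        """Portability: hand the user their data in a machine-readable form."""
        record = self._records[user_id]
        payload = {
            "user_id": record.user_id,
            "data": record.data,
            "consents": record.consents,
        }
        return json.dumps(payload, sort_keys=True)

    def erase(self, user_id):
        """Erasure: remove the record; returns True if something was deleted."""
        return self._records.pop(user_id, None) is not None


store = PersonalDataStore()
store.register("u1", {"email": "user@example.org"})
store.give_consent("u1", "newsletter")
print(store.may_process("u1", "newsletter"))  # consent recorded for this purpose
print(store.may_process("u1", "profiling"))   # no consent for this purpose
```

Keeping consent per purpose, rather than as a single flag, reflects the idea that agreeing to one use of data does not imply agreeing to another.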

Consumer Privacy Act and Beyond: Various jurisdictions have enacted privacy laws similar to the GDPR, such as the California Consumer Privacy Act (CCPA) in the United States. These regulations share the common goal of enhancing data privacy and protection, though the specifics may vary. They typically include provisions for greater transparency, the right to access personal information, and the right to request the deletion of personal data.

Digital Governance and Data Sharing: Beyond individual privacy protections, there’s a growing emphasis on digital governance and the responsible sharing of data. The concept of data as a public good, especially non-personal data, has led to discussions around commons-based data management. This approach seeks to facilitate data sharing and innovation while ensuring that data remains a resource accessible to all, not monopolised by a few. It reflects a shift from ownership to stewardship, where data is managed for the collective benefit.

Legal Frameworks for Cloud Data: As cloud computing becomes integral to modern IT infrastructure, legal frameworks have evolved to address the security and privacy challenges inherent to cloud-based data warehousing. These regulations cover aspects such as data breaches, malware attacks, and unauthorised data access, aiming to create a secure cloud environment for both private and government organisations. Contractual agreements and data ownership clauses play crucial roles in mitigating risks associated with cloud data.

Future Directions and Global Impact: The influence of GDPR has sparked a global conversation on data protection, leading to the adoption of similar regulations in other countries. This trend towards harmonisation reflects a growing consensus on the importance of data privacy and the ethical use of data. However, the dynamic nature of technology and the emergence of new data-driven innovations, such as LLMs and AI, will continue to challenge existing legal frameworks. As such, ongoing dialogue and adaptation will be essential to ensure that regulations keep pace with technological advancements.

The landscape of regulations and policies around data ownership and protection is continually evolving, driven by technological advances and changing societal values. These frameworks form the backbone of our efforts to navigate the ethical complexities of the digital age, ensuring that innovation progresses hand in hand with respect for individual rights and equitable access to data. As we look ahead, the collaboration between governments, businesses, and individuals will be paramount in refining and expanding these legal protections to meet the challenges of tomorrow.

Data Ownership in the European Union:

The European Union (EU) stands as a prime example of how regional governance and legal frameworks address the complexities of data ownership, privacy, and digital ethics. At the heart of the EU’s approach is the General Data Protection Regulation (GDPR) [1], a robust data protection model that has set a global standard for privacy and data management.

European Union: Leading in Global Data Protection

The EU’s strategy for data ownership and privacy, driven by the GDPR [1], has had a profound impact worldwide. The GDPR’s introduction marked a pivotal shift in data protection policies, emphasising user consent, data portability, and the right to be forgotten. This comprehensive approach to privacy has not only raised the bar for data protection but has also influenced international companies and non-EU countries to consider adopting similar measures.

Digital Governance and Cross-Border Data Flows

The EU is actively developing digital governance frameworks to facilitate seamless cross-border data flows, ensuring that high privacy standards are maintained. Initiatives like the Digital Single Market are aimed at removing obstacles to data sharing across EU member states, thus promoting innovation and economic growth within a secure and trustworthy digital ecosystem.

Emerging Technologies and Ethical Considerations

As emerging technologies like LLMs and AI continue to evolve, the EU is at the forefront of integrating ethical considerations into their development and deployment. This involves establishing ethical guidelines for AI and investing in research focused on responsible AI practices, ensuring that technological advancements align with the EU’s core values of dignity, fairness, and privacy.

The EU exemplifies the challenging yet rewarding path of navigating data ownership in the digital era. Its approach underscores the importance of robust legal frameworks, sector-wide collaboration, and a strong commitment to ethical principles in leveraging data for societal gain. As the EU continues to adapt to new technological developments and global data trends, its experiences and models provide invaluable insights for the global community.

Solutions and Best Practices for Addressing Data Ownership Challenges

The evolving regulations in the European Union (EU) underline the need for varied strategies to manage data ownership challenges effectively. Here we outline solutions and best practices for responsible data management and ethical data use.

Typed and Scenario-Based Data Protection: It’s vital to recognise the diversity of data types and their uses. The EU’s GDPR [1] provides a versatile framework for protecting personal data across different digital scenarios. This strategy involves distinguishing between various data types, such as non-public corporate data, which might need trade secret protection, and data on public platforms that might require specific rights protection.

Proactive Data Sharing and Protection: Entities must proactively foster a data-sharing culture that respects both ownership and privacy. Inspired by the GDPR [1], organisations are increasingly adopting measures for controlled data disclosure. This includes creating whitelists and blacklists to manage data sharing in line with organisational preferences and legal requirements.
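The whitelist and blacklist idea can be sketched in a few lines of Python. This is an illustrative assumption rather than an established pattern from the sources cited here: the `SharingPolicy` class, the recipient names, and the field names are all hypothetical. The whitelist controls who may receive data, and the blacklist controls which fields are never shared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingPolicy:
    """Hypothetical policy: whitelist of recipients, blacklist of fields."""

    allowed_recipients: frozenset = frozenset()
    blocked_fields: frozenset = frozenset()

    def filter_for(self, recipient, record):
        """Return the shareable part of a record, or refuse the recipient."""
        if recipient not in self.allowed_recipients:
            raise PermissionError(f"{recipient!r} is not on the whitelist")
        return {
            key: value
            for key, value in record.items()
            if key not in self.blocked_fields
        }


policy = SharingPolicy(
    allowed_recipients=frozenset({"research-partner"}),
    blocked_fields=frozenset({"email", "name"}),
)
record = {"name": "Jane Doe", "email": "jane@example.org", "age_band": "30-39"}

shared = policy.filter_for("research-partner", record)
print(shared)
```

Refusing unknown recipients by default, instead of silently sharing nothing, makes attempted disclosures visible and easier to review.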

Strengthening Legal Frameworks and Compliance: As digital innovations advance, legal frameworks governing data use must also evolve. The GDPR [1] and similar global regulations are part of continuous efforts to tackle new digital challenges. Compliance demands an ongoing adaptation from all involved, from large corporations to small cloud-based businesses.

Enhancing Transparency and Accountability: Trust in digital ecosystems hinges on transparency and accountability. Beyond meeting legal standards, it’s important for stakeholders to understand how their data is used and safeguarded. Auditing and reporting mechanisms can clarify algorithms’ “black box” nature, shedding light on decision-making processes and personal data usage.
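As one possible form of an auditing mechanism, the sketch below keeps a tamper-evident log of who did what with whose data. The `AuditLog` class and its entry fields are hypothetical, and the hash-chaining approach is an assumption chosen for illustration; it shows that a record of data use can be checked for later alteration, not how any particular organisation audits its systems.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """Stable SHA-256 hash of an entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class AuditLog:
    """Hypothetical append-only log in which each entry hashes the previous one."""

    def __init__(self):
        self.entries = []

    def record(self, actor, action, subject):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "subject": subject, "prev": prev_hash}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Return True only if no entry was altered, removed, or reordered."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {key: entry[key] for key in ("actor", "action", "subject", "prev")}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.record("analytics-service", "read", "user:42")
log.record("support-agent", "export", "user:42")
print(log.verify())
```

Because each entry depends on the one before it, changing any past entry breaks verification for the rest of the chain.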

Fostering Collaboration for Data Governance: Effective data governance requires collaboration across government, business, and civil society. Developing standards and protocols that enable data sharing while protecting sensitive information is crucial. Such collaborative efforts are key to leveraging data for the public good, prioritising access over ownership to drive innovation and societal benefits.

Navigating the complex landscape of data ownership in the digital age, although challenging, is achievable through strategic solutions and best practices. Learning from the European Union’s approach provides valuable insights into managing data rights, privacy, and governance effectively, ensuring technology’s role in promoting the greater good.

Building an Ethical Framework:

As technology evolves and our reliance on data grows, we face new challenges in data ownership. Addressing these requires proactive measures, focusing on ethics, governance, and innovation, especially with technologies like Large Language Models (LLMs) and AI.

Anticipating Ethical Dilemmas: The first step is anticipating ethical dilemmas that new technologies might bring. This means continuous research and discussions among technologists, ethicists, and policymakers to spot potential privacy, bias, and transparency issues, particularly in how LLMs use personal data. Scenario planning and ethical simulations can help prepare for these challenges.

Establishing Ethical Principles: A proactive data ownership framework should be built on ethical principles like fairness, accountability, transparency, and respect for individual autonomy. These should guide AI and technology use, ensuring societal benefits and avoiding privacy erosion. The GDPR [1] offers a good model for consent and transparency, but we also need AI-specific principles like explicability and non-maleficence.

Adaptive Governance Models: We need governance models that can keep up with technological advances. This includes dynamic legal frameworks and oversight bodies that continuously evaluate AI and data practices. These bodies should work on a global scale, setting international data ethics standards and ensuring compliance with ethical guidelines.

Innovation in Data Management and Sharing: New data management and sharing models must prioritise ethical considerations. Exploring decentralised approaches, such as blockchain and federated learning, could provide secure, transparent data sharing methods, potentially changing data ownership models and improving privacy.
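Federated learning is worth a small illustration, because it changes where data has to live. In the sketch below, each client fits a one-parameter model on its own private data and sends only the resulting weight to a server, which averages the weights in proportion to each client's sample count (the federated averaging idea). The model, the toy data, and all function names are assumptions made for this example; real systems add secure aggregation, privacy safeguards, and far larger models.

```python
def local_update(weight, data, lr=0.1, epochs=20):
    """Fit y ~ weight * x on one client's private data with gradient descent."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(updates):
    """Average client weights, weighted by each client's number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(clients, rounds=5):
    """Run several rounds; raw data never leaves the client lists."""
    global_weight = 0.0
    for _ in range(rounds):
        updates = [(local_update(global_weight, data), len(data)) for data in clients]
        global_weight = federated_average(updates)
    return global_weight


# Hypothetical private datasets, both following y = 2 * x.
client_a = [(1.0, 2.0), (2.0, 4.0)]
client_b = [(1.0, 2.0), (3.0, 6.0), (2.0, 4.0)]

learned = train([client_a, client_b])
print(round(learned, 3))
```

The server only ever sees model weights and sample counts, which is why this style of training is often discussed as a way to collaborate on models without pooling the underlying data.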

Education and Empowerment: Educating stakeholders about data ethics and AI impacts is crucial. This means not just technical training for developers but also public campaigns to inform people about their data rights and digital privacy protection.

Encouraging Ethical Innovation: Promoting an environment that values ethical innovation is essential. Supporting R&D efforts that address social challenges through technology, considering ethical implications, and offering incentives for ethical AI solutions can encourage a focus on ethics in development.

Facing the future challenges in data ownership demands collective effort to establish a strong ethical framework. By preparing for ethical dilemmas, setting clear principles, creating adaptive governance, fostering innovation, educating stakeholders, and promoting ethical innovation, we can ensure AI and technology advancements benefit society, protect rights, and support a fair digital future.

Conclusion:

Standing at a pivotal moment of digital transformation, the push for ethical data science and ownership brings significant challenges and opportunities. This exploration has highlighted the complexities of managing data ownership amidst the rise of technologies like LLMs and AI. It’s clear that proactive, principled measures are essential to protect privacy, maintain transparency, and ensure fairness in the digital space.

A Collective Effort: Ethical data science is a collective journey requiring active participation from governments, businesses, technologists, ethicists, and the public. Each plays a critical role in creating a digital ecosystem that balances individual rights with innovation and societal advancement.

Key Principles for Ethical Data Ownership:

Anticipation and Adaptation: Keeping pace with ethical dilemmas necessitates ongoing vigilance and the ability to adapt. Our ethical frameworks, governance models, and regulations must evolve with technological advancements to meet current and future challenges.

Education and Engagement: Raising awareness and understanding of data ownership complexities is vital. Informing and empowering society about data ethics enables individuals to advocate for their rights and interests effectively.

Innovation with Integrity: We must encourage innovation that respects ethical principles and societal values. Ethical innovation is crucial for leveraging AI and data technologies to meet global challenges without sacrificing our core values.

A Call to Action: Moving forward requires active engagement and collaboration. We encourage everyone to participate in data science discussions, share insights, address challenges, and work together on ethical solutions. Open dialogue, shared standards, and a dedication to ethical progress are key to ensuring that data science continues to benefit society.

Vision for the Future: We look towards a future where ethical data use and ownership are foundational to a digital landscape marked by trust, fairness, and innovation. In this future, AI and LLM advancements are managed responsibly, enhancing human capabilities while aligning with our values. By addressing ethical challenges proactively and nurturing an ethical data science culture, we aim to create a more equitable, transparent, and inclusive digital world.

References

  1. General Data Protection Regulation (GDPR)
  2. California Consumer Privacy Act (CCPA)
  3. Fia, Tommaso. “An Alternative to Data Ownership: Managing Access to Non-Personal Data through the Commons.” Global Jurist, 2021, 21(1): 181–210.
  4. Zuziak, M.K., Hinrichs, O., Abdrassulova, A. et al. “Data Collaboratives with the Use of Decentralised Learning.” Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023: 615–625.
  5. Hummel, P., Braun, M., Dabrock, P. “Own data? Ethical reflections on data ownership.” Philosophy & Technology, 2021, 34(3): 545–572.
  6. Fadler, M., Legner, C. “Data ownership revisited: clarifying data accountabilities in times of big data and analytics.” Journal of Business Analytics, 2022, 5(1): 123–139.
  7. Baijens, J., Helms, R.W., Velstra, T. “Towards a framework for data analytics governance mechanisms.” 2020.
  8. Hart, D. “Ownership as an Issue in Data and Information Sharing: a philosophically based review.” Australasian Journal of Information Systems, 2002, 10(1).
  9. Chandra, S., Verma, S. “Big data and sustainable consumption: a review and research agenda.” Vision, 2023, 27(1): 11–23.
  10. Zhan, Y., Tan, K.H., Li, Y. et al. “Unlocking the power of big data in new product development.” Annals of Operations Research, 2018, 270: 577–595.
  11. Robinson, D., Yu, H., Zeller, W.P., Felten, E.W. “Government Data and the Invisible Hand.” Yale Journal of Law & Technology, 2008.
  12. Mikalef, P., Pappas, I.O., Krogstie, J. et al. “Big data analytics capabilities: a systematic literature review and research agenda.” Information Systems and E-Business Management, 2018, 16: 547–578.
