This cover represents the most frequently used terms in this article, their relationships and their musicality.

GDPR and Data Visualization: When Design Meets Compliance

Dataveyes
Dataveyes Stories
12 min read · Dec 23, 2019


(A French version of this article is available here)

Data visualization can serve GDPR compliance: it can improve interfaces for the general public, map the digital assets of businesses, and even support the audit of algorithms. That’s why data visualization should aim to improve interactions between humans and data. Here are our lessons learned on uses and best practices.

The General Data Protection Regulation (GDPR), which came into force in May 2018, amended the rules for the management of personal data in European Union countries. It aims to bring more transparency to citizens, strengthening their rights over the use of their personal data, and making the various players processing their data more accountable.

The GDPR not only introduces technical challenges, related to anonymization or data storage, but also design challenges. The design of an interface can influence users’ ability to consent to data collection: it can either help them make an informed choice, or, on the contrary, divert them.

Data visualization is an integral part of interface design. Through visuals and interactions, the translated data becomes easier to grasp and understand. As such, it has a role to play in building relationships that are more respectful of citizens’ rights and liberties.

The context is favorable. Six months after the GDPR was implemented, the CNIL (National Commission on Informatics and Liberty) carried out a survey in partnership with Ifop (French Institute of Public Opinion). The result: out of 1,003 respondents, 46% had already noticed the misuse of their personal data, and 66% said they were more sensitive than before to the protection of their personal data.

1. Leveraging data visualization to ensure GDPR compliance

1.1 Improving the interface of services collecting personal data

The GDPR places responsibility on the actors involved in data collection: it’s up to them to design collection methods that don’t deceive users, and to clearly inform them of the potential implications. The regulation states that an application or a website collecting personal data must ensure that users are in a position to give, or withhold, their informed consent. For this to happen, users need to understand how their data is used and where it comes from.

Using data visualization in the design process of an application or a website can help achieve this goal. Visual storytelling and interactions are key to helping users understand the information that lies in their data: they become able to grasp more complex concepts. The data reveals itself and becomes easier to apprehend.

My Companion by Dataveyes: an application to understand and control power consumption

We have worked on a prototype application connected to the data of a smart meter, to help the general public better understand and monitor their electricity consumption. We started with comments and feedback from the inhabitants of newly built homes equipped with electric sensors and smart regulators. They reported a sense of complexity, fear of disruption, doubts about their own capabilities, difficulty reading the data, and so on. Our prototype addresses these problems by leveraging human-data design and interactions: visualizations that facilitate the monitoring of electricity consumption over time, a notification-based approach, a gamification system with reward badges, and even a comparison with the neighborhood’s consumption, based on the principles of green nudges. Since smart meter companies often deprive users of useful information by not giving them access to raw data or informative graphs, we wanted to show how design can empower citizens.

An educational simulator for understanding energy flexibility, by Dataveyes

Here’s another of our projects: an educational simulator to explain the concept of energy flexibility, using data from sensors installed in test homes. Users can visualize data around a clock and interact with the home’s regulatory systems, following different scenarios. This way, we facilitate the user’s understanding of the complex electrical control system, and make it more transparent.

As we can see, the GDPR’s deployment is as much an obligation as it is an opportunity: a chance to build better mediation between users and data.

1.2 Mapping the collected data

Data visualization can help raise awareness of the magnitude of the data being collected.

First, for the general public: the striking clarity that can emerge from a data visualization is effective in making users more aware of the data they share. OCR’s project “Behind the Banner” for Adobe, Mozilla’s “Lightbeam”, and the CNIL’s “CookieViz” are such examples: they explain the process of targeted advertising through data storytelling, and encourage citizens to take an interest in it.

Behind the Banner by OCR: a data visualization experiment to explain targeted advertising

Secondly, for companies that collect personal data: they are now obligated to have a clear vision of their databases and processes in order to verify their compliance with the GDPR and to conduct impact analyses. Visualization tools such as DataGalaxy or Dawizz can help them: these tools label and map data in meaningful ways, with visuals such as treemaps or social graphs. They contribute to a better understanding of the data, a better ability to inventory it, and better governance. Where lengthy or tedious technical documentation falls short, data visualization introduces a visual language understood by all members of the company.
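To make this concrete, here is a minimal sketch of the kind of data-asset inventory such mapping tools build on. The dataset names, departments, and fields are hypothetical, not taken from any real tool:

```python
# Hypothetical minimal data inventory: each dataset records which
# personal-data fields it contains and its legal basis, so a treemap
# or graph view can group assets by department and flag the ones
# that need to appear in a GDPR processing register.
inventory = [
    {"name": "crm_contacts", "dept": "sales",
     "personal_fields": ["email", "phone"], "legal_basis": "contract"},
    {"name": "web_analytics", "dept": "marketing",
     "personal_fields": ["ip_address"], "legal_basis": "consent"},
    {"name": "product_catalog", "dept": "sales",
     "personal_fields": [], "legal_basis": None},
]

def gdpr_relevant(datasets):
    """Datasets holding personal data, i.e. candidates for an impact analysis."""
    return [d["name"] for d in datasets if d["personal_fields"]]

print(gdpr_relevant(inventory))  # ['crm_contacts', 'web_analytics']
```

Even this tiny structure is enough to feed a treemap (assets sized by department, colored by legal basis) instead of a documentation binder.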

1.3 Making data processing algorithms auditable

Finally, data visualization can be a valuable tool for auditing data processing algorithms and ensuring their GDPR compliance. This is especially true when these algorithms involve machine learning: their inner workings are often opaque, and even referred to as black boxes by data scientists themselves, particularly in the case of neural networks. If left unaudited, they can introduce biases and sometimes lead to discrimination. For example, the yield management algorithms used in dynamic pricing systems rely on many parameters, including personal data such as your postal code, to optimize sales of products or services.

In this case, data visualization can be used to make these algorithms and their results more tangible, explain how they transform data, and help identify biases.

Affinity Index by Dataveyes: an application to make Outbrain’s algorithms more legible.

We built one such visual tool, called “Affinity Index”, for the sponsored content recommendation specialist Outbrain. The aim of this project was to clarify the behavior of algorithms perceived as opaque: those that allocate sponsored articles to an inventory of available spaces, taking the reader’s affinities into account. We visualized the results collected over a year and, using colored bubbles, made the recommendation dynamics visible.

2. Human-data interaction design and GDPR best practices

2.1 Adopting an iterative, user-centered design approach

Beyond these different uses of data visualization, practices from human-data interaction design can contribute to a greater alignment with GDPR’s ambitions.

Adopting a user-centered design approach, such as design thinking, helps create GDPR-compliant projects by considering end users from the start. This practice goes hand in hand with the concept of privacy by design, present in the GDPR (data protection by design and by default). Thus, we recommend an iterative approach, guided by small-scale tests, prototypes, and user feedback. It makes it possible to verify, for example, that presenting certain data sets together doesn’t raise confidentiality problems, or that an offered service is comprehensible and valuable. Putting the product in users’ hands right from the start also helps designers build users’ confidence and support.

Working with data from the beginning of projects and prototyping visualizations early are also important, enabling a clear understanding of the data’s behavior. By testing several visualization options, it’s possible to identify the ones that cause the least bias and highlight the most valuable information for users.

Finally, we recommend a modular and iterative approach to projects, so that applications and services remain easier to maintain, as the user’s consent is likely to evolve over time.

Career mapping by Dataveyes: an HR data visualization tool, serving the employees

Using these methods, we designed a tool for the HR sector. Using data from job postings and internal role changes, we created an application that summarizes individual career histories and structures them into paths followed by employees from one job to another within their company. Where HR reporting tools are often simple dashboards with general statistical indicators, we made a career guidance tool for employees. The application shows the career paths within the company, without any institutional filter or company bias, thus providing more transparency.

2.2. Making data collection explicable, progressive, and optional

It all comes down to the moment the users are asked to give consent and allow the collection of their personal data. In practice, they trade personal data for a service. From the moment the product is designed, they must be able to evaluate whether this trade is a win-win… or not.

When users are asked to provide or allow access to information, such as geolocation, a good practice is to explain what this information will be used for, and especially what won’t be available if they don’t consent. For example, without geolocation, a calendar will not send a reminder when it’s time to leave for an appointment. Without a viewing history, there will be no relevant suggestions for upcoming videos. With such explanations, the user is better equipped to assess the impact of refusing data collection.

Allowing users to progressively share their data is another good practice, asking for it only when it’s essential to their experience and not before. All too often, the sharing of personal data is a prerequisite for the use of services: it’s an all-or-nothing approach. The gradual sharing practice is consistent with the GDPR’s requirements to only collect useful data, ideally matching the exact purpose of its subsequent processing.

This also implies that applications can run in degraded modes, with features that don’t require access to personal data. For example, it’s entirely possible to use a mapping application without continuously sharing one’s location: the “directions” functionality may be disabled, but this doesn’t prevent searching for a place. A privacy-friendly practice is therefore to design applications with a minimum set of features that don’t require any personal data, plus advanced features that take advantage of user data.
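One way to structure such an application is to gate each feature on the personal-data scopes the user has actually consented to. Here is a minimal sketch of that pattern; the feature and scope names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Feature:
    name: str
    required_scopes: frozenset  # personal-data scopes the feature needs

@dataclass
class App:
    features: list
    granted_scopes: set = field(default_factory=set)

    def available_features(self):
        # Expose only the features whose data requirements are fully
        # consented to; the rest stay disabled instead of blocking the app.
        return [f.name for f in self.features
                if f.required_scopes <= self.granted_scopes]

app = App(features=[
    Feature("search_place", frozenset()),            # works with no personal data
    Feature("directions", frozenset({"location"})),  # unlocked by consent
])
print(app.available_features())   # only 'search_place' before any consent
app.granted_scopes.add("location")
print(app.available_features())   # 'directions' appears after consent
```

The design choice matters: consent is a runtime state the user can change, not a precondition checked once at installation, so revoking a scope simply shrinks the feature list again.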

When we undertook the “Créa Carte” project for Société Générale in 2015, we had these good practices in mind. Créa Carte was a competition on the theme of generative art. Participants were invited to create an aesthetic pattern from data. They could submit their designs to the jury and try to win a prize, or simply add them to the collaborative gallery as a contribution to the collective artistic experience. During the process, the participants’ contact details were only requested if they wished to show their creation to the jury. All other features, constituting most of the experience, could be used anonymously. As for the data used as the basis of the aesthetic pattern, we made sure it was unintrusive: spatial coordinates describing a gesture.

An example of the user’s progressive involvement in the Dataveyes “Créa Carte” experience

2.3. Improving the understanding of the data’s origin

Designing an application that respects the rights of users and provides value is not always enough for a good human-data mediation. It’s also useful for users to understand how the data is used, so that they can reconsider their consent if need be. Not everyone knows about the technical intricacies of platforms such as Google or Facebook, whose workings can feel like magic and lead to distrust.

In order to overcome this problem, we recommend making the data sources explicit in all projects, even when the data is neither personal nor confidential. This can be done by providing access to a paragraph or a page describing the data collection and processing methodology. Pictograms with tooltips are another option to display the data’s origin without cluttering the interface.

Providing context for the data presented also helps to understand its origin. For example, indicating the number of people interviewed in a survey, the date when their opinion was collected, and the exact questions they answered. Or specifying when the data is processed by an emotion recognition or term identification algorithm, as may be the case with the analysis of social network content.

Ideally, the source data should be published as open data, when it doesn’t raise confidentiality issues, in order to contribute to data transparency and offer users more opportunities to question its use.

2.4 Relying on user intelligence

We must finally ensure that users are aware of the information extracted from their data. We believe we should always assume that users are able to grasp complex information if it’s explained clearly, rather than presume it doesn’t interest them or is out of their reach.

Thus, we prefer visualization modes that make the data tangible by, for example, assigning it physical properties. All too often, the designers’ reflex is to simplify the data so as not to frighten users, by using simplified scores or composite indices that are easier to display. These KPIs (Key Performance Indicators), however, can be black boxes: they don’t have meaningful units, they are often cut off from any analysis context, and they aggregate too many dimensions. This makes their variations hard, if not impossible, to interpret. Under these conditions, it’s difficult to know which underlying personal data is being used and whether it was worth sharing.

We recommend expressing the data in meaningful units as often as possible: for example, by displaying a number of people instead of a percentage, or by showing the probabilities assigned to several categories rather than imposing a single verdict.
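Both recommendations fit in a few lines of code. The sketch below reuses the Ifop survey figures quoted earlier; the category labels and scores are hypothetical:

```python
def as_people(percentage, sample_size):
    """Express a survey percentage as a number of respondents."""
    return round(percentage / 100 * sample_size)

def labeled_probabilities(scores, labels):
    """Show every category's probability instead of a single argmax verdict."""
    total = sum(scores)
    return sorted(((lbl, s / total) for lbl, s in zip(labels, scores)),
                  key=lambda pair: -pair[1])

# "46% of 1,003 respondents" becomes a count of people:
print(as_people(46, 1003))  # 461
# An algorithm's raw scores become interpretable probabilities:
print(labeled_probabilities([3.0, 1.0, 1.0], ["sports", "news", "culture"]))
```

“461 people” carries a unit a reader can picture; a ranked list of probabilities shows how confident the verdict really is, where an argmax would hide a near-tie.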

Transparency, purpose, relevance… if all the ingredients are there, users may well entrust their personal data voluntarily. To serve their individual interests, of course, but also to serve collective projects. The sharing of individual data can go beyond the relationship between consumers and digital service platforms, and become part of a voluntary, citizen initiative. This is already the case with quantified-self applications: sharing a car’s journeys, readings from personal pollution sensors, or noise measurements makes it possible to better document cities and to give citizens new levers for action.

In summary: best practices in human-data interaction design to comply with the GDPR

  1. Design projects iteratively, with prototypes and a user-centered approach.
  2. Describe the data collection and processing methodologies, and provide access to source data.
  3. Make the sharing of personal data progressive, desirable and explicable: it must be mutually beneficial.
  4. Visualize, make the data tangible and interpretable.
  5. Use meaningful indicators and avoid black-box-type composite indices.

Three projects designed with a thoughtful use of personal data

My neighborhood

While data collection is sometimes necessary at the individual level, it isn’t necessarily so for visualization. For the project “My neighborhood”, carried out for the Nexity real estate group, Dataveyes collected the precise locations of places frequented by a sample of inhabitants, but the application only displayed density areas and distances. The value for the end user lies not in precise geolocation, but in the accessibility of services in a given area.

My neighborhood by Dataveyes: an application to discover a neighborhood
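That separation between what is collected and what is displayed can be sketched as a simple aggregation step, run before anything reaches the interface. This is our own illustration, not the project’s actual pipeline, and the coordinates are hypothetical:

```python
from collections import Counter

def density_grid(points, cell_size=0.01):
    """Bucket precise (lat, lon) coordinates into coarse grid cells,
    so the interface only ever receives densities, never exact places."""
    return Counter((round(lat / cell_size), round(lon / cell_size))
                   for lat, lon in points)

# Hypothetical frequented places: two close together, one further away.
visits = [(48.8584, 2.2945), (48.8583, 2.2950), (48.8606, 2.3376)]
print(density_grid(visits))
```

With a roughly 1 km cell, nearby visits collapse into the same bucket: the map can show “this area is well frequented” without retaining a single exact address.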

Scope Santé

This website gathers information to help the general public choose health institutions. The aim is to deduce which information is relevant to users from their behavior on the website, without asking them to declare their medical situation via a profile (Facebook Connect, a form, cookie retrieval, etc.). For example, users who performed a hospital search with a “pediatric emergencies” filter will get specific information on pediatrics, without being identified or asked for personal data at any time.

Scope Santé by Dataveyes: a service to help each patient find the best hospital
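The underlying idea is stateless relevance: content is derived from the current session’s filters alone, with no profile stored and no identifier kept across sessions. A minimal sketch, with a hypothetical filter-to-topic mapping:

```python
# Hypothetical mapping from search filters to related editorial topics.
TOPIC_MAP = {
    "pediatric_emergencies": ["pediatrics", "childcare"],
    "maternity": ["obstetrics"],
}

def related_topics(session_filters):
    """Derive relevant content from the current session's filters only;
    nothing identifying is built up or persisted between sessions."""
    topics = []
    for f in session_filters:
        for topic in TOPIC_MAP.get(f, []):
            if topic not in topics:
                topics.append(topic)
    return topics

print(related_topics(["pediatric_emergencies"]))  # ['pediatrics', 'childcare']
```

Because the input is the ephemeral session state rather than a stored profile, there is simply no personal data to protect, delete, or consent to.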

Commute

The input data for these two experimental projects are audio files, recorded as users travel around the city, used to analyze the noise levels they’re exposed to. These projects could be very intrusive, except that the audio signal is not stored as is: it’s immediately converted into frequency, volume, tone, and other signal dimensions. This way, it’s impossible to listen to what was said or to retrace the places visited during the journeys.

Commute by Dataveyes: an application of daily commute data sonification
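As an illustration of this kind of on-the-fly reduction, here is a hedged sketch; the choice of descriptors is ours, not necessarily the one used in Commute. Each audio frame is collapsed into a volume and a rough pitch estimate, and the raw waveform is discarded:

```python
import math

def summarize_frame(samples, sample_rate):
    """Reduce a raw audio frame to aggregate descriptors. Speech cannot
    be reconstructed from these values once the samples are discarded."""
    # Root-mean-square amplitude: the perceived volume of the frame.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Zero-crossing rate gives a rough pitch estimate (two crossings per cycle).
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    pitch_hz = crossings * sample_rate / (2 * len(samples))
    return {"volume_rms": round(rms, 3), "pitch_hz": round(pitch_hz)}

# One second of a 440 Hz tone sampled at 8 kHz: only its volume
# (about 0.707) and its pitch (about 440 Hz) survive the reduction.
rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
print(summarize_frame(frame, rate))
```

The privacy guarantee comes from where the reduction happens: as long as it runs before storage, only a handful of numbers per frame ever exist at rest.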



We have been engineering Human-Data interactions since 2010.