The near future of digital profiling

Yangyang Ding
Nov 30, 2020 · 15 min read

Digital profiling is the process of gathering and analyzing information that exists online about an individual. The platforms that provide services have always controlled this powerful tool. In response, I used the Twitter API as the principal medium to conduct ‘digital profiling’ as a third party. The visual identity of the new digital profile is no longer an ad interest list, but a graph that stores personal information and could be used as an avatar. On top of this new visual identity, I speculated on several possible applications of the visual outcome. The idea put forward in this thesis is that shifting the purpose of digital profiling toward being human-centered rather than advertising-driven may draw worthwhile arguments about practicality and policy issues. This thesis is a speculative design project, an avatar design project, a UX design project, a generative design project, and an information design project. My aim is to use multiple mediums to demonstrate my vision, or wish, for the future of this technology and to propose a mutually beneficial strategy for tech companies.

We are currently facing potent social and economic shifts. Covid-19 is not necessarily the protagonist of these shifts but rather a catalyst. It is accelerating the transformation of offline life towards heightened cyber connection. The most significant change right now is working from home: the homebody economy has suddenly become the near-real future. In addition, many other shifts are happening online that average users are unable to perceive or relate to, because they are hidden, abstract, discrete, and constantly changing phenomena.

At the front-end of the internet, cheaper, smaller, and faster chips cybernate devices, bodies, and places to construct the ‘Internet of Everything’. Meanwhile, a growing number of features, such as cookies and the ‘Like’ button, are designed to help track data. At the back-end of the internet, digital profiling is being intensively developed. Subjects related to digital profiling, such as data collection, data processing, algorithms, and applications, are well developed for activities such as advertising.

It’s more important to understand the relationship between individuals and the system, and how to shift the balance between them. Opting out of the system is one utopian solution, but it is not practical. The supply and demand of personal data form a loop: more shared data piped into algorithms improves their performance, better algorithm performance leads to higher advertising revenue and better-customized services, and higher revenue increases investment in algorithm development. Shoshana Zuboff used the word ‘reciprocity’ 16 times in her article ‘Big Other’ to describe multiple levels of relationship between individuals and firms. The current situation is one of asymmetric reciprocity, as described in Varian’s claim: exchanging private information for new information and communication tools, which are essential requirements for social participation. From an individual’s perspective, sharing data raises issues such as data privacy, secondary usage, and targeted pricing. There are direct and indirect benefits and costs attached to sharing personal data.

Looking back in history, there is a spectrum between two extreme conditions: intact privacy on the left, and no privacy on the right. The ‘current’ condition is slowly moving from left to right, and its pace is getting faster. (Does it also follow Moore’s law?) If full data privacy becomes a pseudo-proposition in the future, and blockchain-based digital identity like SoLid and interoperability are not realized in the foreseeable future, then what should we expect for the near future? Research into the current tendencies of individual users and companies suggests that data collection is inexorable and will become more pervasive in the future.

Digital profiling

The digital profile is essentially the string of 1s and 0s that represents you in the systems. Its current visual representation can be seen on the ad settings page. Google, Facebook, and Instagram have different user interfaces. However, they all point to the reality that our digital profile is an ad interest list based on our online traces and activities.

It would be naive, but understandable, to think that we have control over this profile in the first place. We have some control over what we agree to share, but that is just the tip of the iceberg. It’s not hard to imagine that by merely building connections between pieces of data, you could infer behavioral patterns, even more so when algorithms take over this task. “Three layers of your digital profile” by Panoptykon is an onion-like map that reveals the hidden layers behind digital profiling.

Three layers of your digital profile by Panoptykon

The first layer is the one you control or trigger. It includes your profile information, your posts, likes, search queries, and other types of personal interactions. In other words, it is your online trace. The second layer is one step further. It consists of your behavioral patterns like your typing speed, mouse movements, location pattern, voice recognition results, internet connection frequency, etc. These are not conscious choices you make, but rather the metadata which is embedded with the data you shared. The third layer is composed of the interpretation of the first two layers. Your data will be preprocessed to remove noise and reduce complexity. After that, the data will be analyzed with multiple algorithms. There will be connections built between each piece of data to form specific patterns, and a comparison with other users will be conducted to help evaluate the relevance and validity.

There are seven steps in the profiling process: Preliminary grounding, data collection, data preparation, data mining, interpretation, application, and institutional decision. The purpose of profiling is as follows: “it is not merely a matter of computerized pattern-recognition; it enables refined price-discrimination, targeted servicing, fraud detection, and extensive social sorting.” From the perspective of tech companies, data collection helps them build profitable profiles that could help them link advertising to the targeted group of people. ‘Internet’ delivers people, ‘you are the end product.’

Commercial television delivers 20 million people a minute. In commercial broadcasting the viewer pays for the privilege of having himself sold.
It is the consumer who is consumed.
You are the product of t.v.
You are delivered to the advertiser who is the customer.
He consumes you.
The viewer is not responsible for programming —
You are the end product.

Richard Serra “Television Delivers People” (1973)

In the near future, I do not expect any fundamental change in digital profiling. Preliminary grounding identifies the goals of the analysis, and it will remain profit-oriented. Changes are already happening outside digital profiling itself, in law. The GDPR requires companies to bring more transparency into tracking and profiling. Thus, before blockchain-based digital identity and interoperability are realized, we could expect increasing transparency in digital profiling and a greater sense of control over personal data. Beyond transparency and access to the digital profile, is it also plausible to expect feedback, like psychological mind-mapping, that could benefit users to some degree?

Confirmation bias

Confirmation bias leads to a state of intellectual isolation: people tend to resonate with information that helps confirm and enhance their beliefs or hypotheses. The phenomenon had been studied long before the term was coined, and long before the internet and algorithms were invented.

This is what filter bubbles actually look like — MIT

‘The Filter Bubble’ is one book that critiques algorithms for enhancing confirmation bias. I agree with this opinion because the algorithms compensate for the diminishing initiative of the subject. One used to search proactively for information to confirm one’s beliefs. Now, one reactively receives the information that algorithms recommend. The book ‘We Are Data’ brought up the idea of ‘data derivatives’ to critique algorithms that extrapolate the future from the present and the past. The book ‘Why We’re Polarized,’ written by Ezra Klein, provides a political perspective: media, Congress, candidates, journalists, and voters form a system with a feedback loop that accelerates the adoption of more polarized strategies. Political polarization is just one of the effects and outcomes associated with confirmation bias.

Algorithms enhance confirmation bias. They also cause bad user experiences. There have been complaints about content fatigue, which is caused by an algorithm constantly pushing similar content. When Twitter applied the ‘while you were away’ algorithm, it pushed the ‘best tweets’ to the top of feeds, which drew complaints about restructured timelines. There is a button to opt out of the algorithm and return to a regular timeline. There is also an option to avoid content fatigue: tapping the post and selecting ‘not interested’. What is peculiar here is that to prevent content fatigue, you have to click ‘not interested’ on the very thing that is overwhelming your feed, which is your interest.

What’s more, diversity matters for confirmation bias. There are two major recommendation approaches: collaborative filtering and content-based filtering. Current algorithms combine one or more filtering approaches into a hybrid system. Different platforms adapt or favor one of the filtering approaches according to their strategy for differentiating their products. One crucial measure beyond accuracy for evaluating algorithms is ‘diversity.’ Though a hybrid system allows various recommendation algorithms, whether it effectively delivers diverse content is another matter entirely. The problem of finding the right mix for sequential consumption-based recommenders … individually adjusting the right level of diversification versus accuracy tradeoff.
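To make ‘diversity’ as an evaluation measure a little more concrete, here is a minimal sketch of one common way to score it: the average pairwise dissimilarity of the items in a feed, often called intra-list diversity. The tag sets, the feed examples, and the choice of Jaccard distance are all assumptions made for illustration; this is not how any particular platform measures diversity.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |a ∩ b| / |a ∪ b|; 0.0 means the two tag sets are identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(items: list) -> float:
    """Average pairwise Jaccard distance across a list of item tag sets."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feeds: each item is described by a set of topic tags.
narrow_feed = [{"cats", "pets"}, {"cats", "pets"}, {"cats", "memes"}]
broad_feed = [{"cats", "pets"}, {"politics"}, {"jazz", "music"}]

print(intra_list_diversity(narrow_feed))
print(intra_list_diversity(broad_feed))
```

A feed of near-identical items scores close to 0 and a feed of unrelated items scores close to 1, so a recommender could, in principle, trade a little accuracy for a higher score.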

There is very little feedback conveyed from the platform to users, let alone feedback about confirmation bias. The feedback discussed here has two layers: behavioral feedback and data analysis feedback. Eli Pariser (an author, activist, and entrepreneur focused on technology and media democracy) mentioned in his TED Talk that structured physical spaces give people social feedback. But in the online environment, other than plain text with emoticons, there is a lack of body language and expression generated by other users, and it wasn’t until recently that a dislike button was added on Facebook. Shoshana Zuboff mentioned in her article ‘Big Other’ that tech companies “eliminate the need for, or the possibility of, feedback loops between the firm and its populations.” The reason is to separate subjective meaning (revenue) from the objective results (profiling). This is an asymmetric reciprocal relationship between users and firms, which is reflected “in the fact that the typical user has little or no knowledge of Google’s business operations, the full range of personal data that they contribute to Google’s servers, the retention of that data, or how the data is instrumentalized and monetized.” Furthermore, concerns about using algorithms to modify behavior emerged after Facebook’s emotional contagion experiment was revealed to the public.

This essay is not trying to introduce a method for getting rid of confirmation bias (which we are incapable of doing ourselves) but rather to ask what we can do to nudge the situation. A good feedback loop might help change your behavior. Do internet users need feedback from the platform about their confirmation bias? How can information be designed to offer users feedback that prompts them to search proactively? Are some of Spotify’s algorithms good examples of a paradigm shift, since they allow users to steer slightly out of their comfort zone and expand their music listening?

What have I built?

In a nutshell, lack of control on the user’s side, lack of transparency in data collection and processing, and lack of a feedback loop for confirmation bias are the three main issues in digital profiling. As mentioned before, stopping data collection or changing the goal of digital profiling may not be easy to realize, because digital profiling serves to monetize traffic through algorithms. Raising the degree of transparency will require more legislation. Although there are opt-out options for people who recognize their confirmation bias, the political correctness of having those options seems to matter more than their practicality.

Mimesis starts by modifying the preliminary grounding, which is the first step of digital profiling, into constructing a feedback loop for users. The fundamental logic is to reuse existing data to conduct digital profiling a second time. This falls in line with the concept brought up in SoLid: reusing existing data. Since some platforms offer public, developer-friendly APIs, it is possible to reproduce their ‘digital profiling’ as a third party.

Platform: Twitter
Account: GigiHadid
Count: 116
Time range: 2018–2020
Mimesis 1.0
Tree Ring
LintonArt Shop

In this project, Twitter is the central platform. The Twitter API has three aggregated streams of Tweets: the home timeline, the user timeline, and the mention timeline. The user timeline consists of the user’s own tweets and retweets, which represent the user’s thoughts. Mimesis 1.0 pulls tweets from the Gigi Hadid user timeline and pipes them into a sentiment analysis model, which outputs 1, 0, or -1, meaning positive, neutral, or negative. It then uses Processing to map these values to green, red, and blue and draws circles from the inside to the outside. Similar to how a tree ring stores data about climate and atmospheric conditions, Mimesis 1.0 stores personal sentiment data. It is a visual representation of the internet user. It is an avatar, as well as a simulacrum. Mimesis 1.0 meets the requirement of substituting for the current visual of digital profiling. The result of the sentiment analysis represents how a user’s emotions flow through their expression, but it has a relatively loose connection with confirmation bias.
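The drawing step can be sketched in a few lines. The original was made with Processing; the version below is only an illustration in Python that writes SVG markup, and it skips the Twitter and sentiment-model steps by starting from a hand-written list of scores. The function name, the ring width, and the assumption that 1, 0, and -1 map to green, red, and blue in that order are mine, not taken from the project’s code.

```python
# Assumed mapping, following the order given in the text:
# 1 (positive) -> green, 0 (neutral) -> red, -1 (negative) -> blue.
SENTIMENT_COLORS = {1: "green", 0: "red", -1: "blue"}


def tree_ring_svg(sentiments, ring_width=4):
    """Return SVG markup with one concentric ring per sentiment score.

    The first score becomes the innermost ring, so the image grows
    outward over time like a tree ring.
    """
    size = 2 * ring_width * len(sentiments)
    center = size / 2
    rings = []
    for index, score in enumerate(sentiments):
        # Each ring is a stroked circle whose stroke is ring_width wide.
        radius = (index + 0.5) * ring_width
        rings.append(
            f'<circle cx="{center}" cy="{center}" r="{radius}" '
            f'fill="none" stroke="{SENTIMENT_COLORS[score]}" '
            f'stroke-width="{ring_width}"/>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size}" height="{size}">' + "".join(rings) + "</svg>"
    )


# Hypothetical scores, oldest tweet first.
example_scores = [1, 1, 0, -1, 1, 0]
svg_markup = tree_ring_svg(example_scores)
```

Saving `svg_markup` to a file with an `.svg` extension and opening it in a browser shows the rings; a real run would replace `example_scores` with the output of a sentiment model.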

‘It is the reflection of a basic reality. It masks and perverts a basic reality. It masks the absence of a basic reality. It bears no relation to any reality whatever: it is its own pure simulacrum.’

Jean Baudrillard, Simulacra and Simulations

Mimesis 1.0

Thomas Goetz (co-founder of Iodine) mentioned that a feedback loop involves four distinct stages: evidence, relevance, consequence, and operation. First comes the data: a behavior has to be calculated, recorded, and processed. Second, the information must be conveyed to the user in a way that makes it emotionally resonant, not in the raw-data form in which it was collected. But if we don’t know what to make of it, even persuasive evidence is useless, so we need a third stage: consequence. The knowledge needs to illuminate one or more paths forward. Finally, a clear moment must come when the person can recalibrate a behavior, make a decision, and act. Then the action is evaluated, and the feedback loop can run again; each step triggers new habits that get us closer to our objectives.

Data Distribution
Mimesis 1.0

All data collected from the users is processed and then conveyed back to them. Mimesis 1.0 did manage to complete the first two stages: it provides users with behavioral evidence that they can resonate with. It also illuminated the past pattern and left the future paths, such as being more positive or more negative, to the users. However, it is unable to break the loop of confirmation bias because the message unveiled in the graph is not related to diversity. Diversity of perspective matters in confirmation bias. If there were a visual representation of confirmation bias, it would be the ‘filter bubble,’ and the color inside the bubble has much less diversity. Thus, what Mimesis needs to illuminate is information relevant to the diversity of perspectives.

Mimesis 2.0 is the speculative result of the same process as Mimesis 1.0: extracting data, conducting analysis, and drawing the result. The difference between 1.0 and 2.0 is not only that the sentiment model is replaced with a cluster analysis model, but also that the process is much more complex. The MobileNet model in RunwayML (a democratized machine learning tool for artists and designers) is the one that I used and tested for 2.0. I used it as a low-fidelity stand-in because it performs similarly to the required cluster analysis model, which would need to be trained on enormous labeled datasets; its list of labels refers to the popular synsets in ImageNet. After data is piped into the model, it generates several clusters. The names of these clusters are a subset of the labels in the dataset, and they are mapped to colors.

Color mapping list (resource: ImageNet)
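One simple way to map cluster labels to colors, shown here only as a sketch, is to derive a stable hue from each label’s name so that the same label always gets the same color. This is an assumption for illustration and not the mapping used in the project; the function name, the fixed saturation and brightness, and the sample labels are hypothetical.

```python
import colorsys
import hashlib


def label_to_rgb(label: str) -> tuple:
    """Map a label to a repeatable (r, g, b) color with 0-255 channels.

    The hue comes from the first byte of a SHA-256 digest of the label,
    so the mapping is arbitrary but identical on every run.
    """
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    hue = digest[0] / 256  # value in [0, 1)
    red, green, blue = colorsys.hsv_to_rgb(hue, 0.7, 0.9)
    return round(red * 255), round(green * 255), round(blue * 255)


# Hypothetical labels in the style of ImageNet synset names.
for name in ["tabby cat", "sports car", "acoustic guitar"]:
    print(name, label_to_rgb(name))
```

A hash-based mapping is repeatable but carries no meaning: unrelated labels can land on similar hues. A designed palette, or one chosen by the user, would address that, which is exactly the open question about color mapping.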

Many details of the process remain unresolved in Mimesis 2.0. What is the definition of diversity? What is the benchmark for adequate diversity? For example, one user is mainly interested in animals, but the information about animals in his feed has depth and breadth. Another user has a wide range of interests, but the information about each topic lacks depth. Which scenario counts as having more diversity? At the same time, how does Mimesis 2.0 transform when the case changes from scenario 1 to scenario 2, and vice versa? Moreover, what is the process of color mapping? Is the result of the mapping based on some degree of universal agreement? Can users decide how colors are mapped? Although there are two variables in color, saturation and variety, that could be mapped to depth and breadth, what are the criteria for the mapping? Do all the questions above fall within the realm of how to design a compatible label system for a stream of diverse information?
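The two scenarios above can be told apart numerically if breadth and depth are given working definitions. The sketch below is one possible operationalization, not an answer to the open questions: breadth is the ‘effective number of topics’ (the exponential of the Shannon entropy of the topic distribution), and depth is the average number of distinct subtopics per topic. Both definitions, the function names, and the sample feeds are assumptions.

```python
import math
from collections import Counter


def effective_topics(posts):
    """Breadth: exp(Shannon entropy) of the top-level topic distribution.

    Returns 1.0 when every post shares one topic and k when posts are
    spread evenly over k topics.
    """
    counts = Counter(topic for topic, _ in posts)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


def average_depth(posts):
    """Depth: mean number of distinct subtopics seen within each topic."""
    subtopics = {}
    for topic, subtopic in posts:
        subtopics.setdefault(topic, set()).add(subtopic)
    return sum(len(s) for s in subtopics.values()) / len(subtopics)


# Hypothetical feeds as (topic, subtopic) pairs.
user_one = [("animals", "cats"), ("animals", "birds"),
            ("animals", "reptiles"), ("animals", "insects")]
user_two = [("animals", "cats"), ("politics", "elections"),
            ("music", "jazz"), ("sports", "tennis")]

for name, feed in [("user one", user_one), ("user two", user_two)]:
    print(name, round(effective_topics(feed), 2), round(average_depth(feed), 2))
```

Under these definitions the first user scores low on breadth and high on depth, and the second user the reverse. Which one counts as ‘more diverse’ still depends on how the two numbers are weighted.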

The ideal Mimesis should have a wide range of data input that could represent the user’s epistemology and ideology. Input data should be analyzed by a customized cluster analysis model trained on a specifically designed training data set. In other words, this model should generate neither clusters that are too general nor clusters that are too scattered. The system should find a balance between the number of colors being used and a level of complexity that users can easily comprehend. The complexity of the ideal Mimesis exceeds my imagination. But Mimesis 1.0 tests and proves the logic and process, and Mimesis 2.0 is a low-resolution speculation on the ultimate program. Thus, I believe it is manageable. Plus, tech companies already have a ‘Mimesis 2.0’: the ad interest list. It just needs some adjustment and one more step: drawing.


This is an information design project. This is an avatar design project. This is a UX design project. This is a generative design project. This is also a speculative design project.

There are multiple ways to define this project. Different definitions pursue different goals. I believe there are two major goals (based on Emotional Design) in a design project: changing perception and changing behavior.

Perception and behavior are tightly interconnected. Tim Brown from IDEO suggested designing simple digital tools that provide feedback as an additional way to nudge people into new behaviors. Thus, the best way to change perception is likely to let users use Mimesis continuously rather than to hand them an informative one-time report. There are multiple ways to convert information design outcomes into tools for other activities. Pantone linked global environmental issues with the color of 2019. Designers, artists, and industries keep reminding ordinary people of those issues as they keep applying the color in their work. Like the Casa da Música logo generator, Mimesis generates customized results based on one’s own information.

Casa da Música logo generator

Mimesis 1.0, a customized graph, could be interpreted as an avatar. The platform it is applied to is Tinder. You can choose to connect Mimesis to your profile, just as you can with Snapchat and Instagram photos. From a user’s perspective, bridging Snapchat and Instagram with Tinder increases the plurality of information, though it might be a concern for some users. Mimesis shares the same idea, offering Tinder users an option to show more about themselves. On top of that, Mimesis provides an alternative for matching algorithms. In color practice, there are multiple ways to combine different colors. What if matching algorithms could use Mimesis as a source and incorporate color combination theory to expand users’ options? Users’ cards would be pushed to others based not only on age, distance, and gender preferences but also on their Mimesis matches.

Mimesis 1.0

For Mimesis 2.0 and the ideal Mimesis, the speculative application happens on platforms that function as users’ information sources. In this scenario, the demonstration runs on Instagram’s home page, where Mimesis is used as the background of the app. There are two different stages of Mimesis. In the first stage, the colors do not change; they slowly float. The second stage is triggered by posts that the user likes. Each newly liked post is analyzed by the system, and the outcome of the analysis decides whether to increase, decrease, or change colors. Referring back to the four stages of the feedback loop (evidence, relevance, consequence, and operation), Mimesis embedded in the background functions as relevant evidence of the user’s liked posts. When a user’s likes are limited to a few categories of posts, the diversity of colors in Mimesis diminishes. This builds the consequence stage, which informs users of their ‘bubble.’ Finally, the ‘bubble’ nudges users to act, which completes the feedback loop.
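The second stage can be sketched as a small state update. In the illustration below, every like fades all existing colors a little and reinforces the color of the liked category, and colors that fade below a threshold disappear from the background. The class name, the decay factor, the threshold, and the category names are all hypothetical; the text does not specify how the colors should actually increase or decrease.

```python
class MimesisState:
    """Tracks one color weight per content category (illustrative only)."""

    def __init__(self, decay=0.8, threshold=0.1):
        self.decay = decay          # how much every color fades per like
        self.threshold = threshold  # colors below this weight are hidden
        self.weights = {}

    def like(self, category):
        """Fade all existing colors, then reinforce the liked category."""
        for key in self.weights:
            self.weights[key] *= self.decay
        self.weights[category] = self.weights.get(category, 0.0) + 1.0

    def visible_colors(self):
        """Return the categories whose color is still shown, sorted by name."""
        return sorted(
            name for name, weight in self.weights.items()
            if weight >= self.threshold
        )


state = MimesisState()
for category in ["animals", "music", "travel"]:
    state.like(category)
print(state.visible_colors())  # three categories are visible

# Liking only one category for a while lets the other colors fade out.
for _ in range(12):
    state.like("animals")
print(state.visible_colors())  # only "animals" remains
```

The shrinking list is the ‘consequence’ stage in miniature: the user sees the palette narrow as their likes narrow, and liking something outside the bubble immediately brings a color back.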

Finally, the speculative scenarios outlined here need to be developed into proofs of concept that could generate revenue, through further experimentation, research, and development of policy.

The idea put forward in this essay is that shifting the purpose of digital profiling, as demonstrated here by a third party, may draw worthwhile arguments about practicality and policy issues. My point is to demonstrate my vision, or wish, for the future of this technology and to propose a mutually beneficial strategy for tech companies.
