Subjective Data

I’m an artist, musician, and data scientist. My practice is centered in AI, particularly through data sonification, natural language processing, and generative music. For a few years now I’ve been working on an algorithm called TransProse, which identifies emotions in a piece of text and translates them into a musical piece with a similar undertone.

Through my work on emotions in AI, I’ve become particularly interested in the subjectivity of data, and have recently started further research into this area. Studying subjective data means studying things like:

Emotions and other subjective experiences: How do we incorporate these into AI, especially as AI starts moving towards more complex and personal areas? How can we capture abstractions like emotions in datasets, and how can we model things like personality and life experience?

Identifying subjectivity in “objective” datasets: If we examine commonly used machine learning datasets, what types of subjectivity do we find? This matters especially when we look at how the datasets were created, the motivations and demographics of the taggers (if taggers are present), where the data comes from (if not manually tagged), who funded the dataset, what the dataset leaves out, and where the dataset breaks down.

“Artisanal data”: In the current AI landscape, particularly in relation to art, it is possible to create and use explicitly subjective datasets for artistic purposes, with no claims to objectivity, completeness, or usefulness. (This, to me, is interesting and positive.) Two of my favorites are Shinseungback Kimyonghun’s Animal Classifier and Sebastian Schmieg’s Decision Space.

Bias retention over time: A culture’s current set of values is a type of bias and is inherently present in datasets. How can we avoid bias retention over time, and is it possible to update datasets with newer values?

Terminology: All the above issues require labels, terms, and vocabulary to talk about them more concretely. How can we create these and use them with regularity? Would it make sense to label the ingredients of our datasets like we label the ingredients of our food? (Data marinara, anyone?)

Check back here and at dataobscura.org for more!


Data Obscura is Hannah Davis, mykola bilokonsky, and collaborators investigating liminal spaces in the digital landscape.