Designing a taxonomy for annotating creative works required making multiple decisions – in other words, performing bias. In my case, creating data required reducing rich artworks into keywords describing agents and their actions. What felt particularly challenging was classifying characters. Assigning gender, age, race or ethnicity to fictive figures or personas was in many ways problematic. At the same time, it forced us to return to and rethink the notions of classes and attributes. Often, I simply wished I could externalize this component of classification to machine classifiers. Yet the extensive task of logging artworks into the database was simultaneously helping me understand that machine vision trained to classify humans behaved in troubling ways.
My analysis of artworks, or ‘artistic audits’ as I call them, eventually revealed that the lack of diversity in datasets results in misgendering machines, that binary gendering leads to stereotypical classification, and that machine vision continues to perpetuate the Western colonial gaze. I believe machine vision systems should be understood as Intuition Machines because datasets embed world views. As with human intuitions, harmful bias in machine vision leads to discrimination, and artworks are capable of showcasing how machine vision particularly discriminates against those already marginalized in society.
Machine Vision Datasets – Scrutinizing Images and Their Labels
Artistic research that critically examines visual datasets has pushed the Computer Vision community to revisit several influential datasets. Among the widely circulated examples are Trevor Paglen and Kate Crawford’s artwork ImageNet Roulette and their accompanying essay excavating.ai, which highlighted the problematic labeling of humans in multiple datasets. Artists Adam Harvey and Jules LaPlace’s project exposing.ai, on the other hand, drew attention to how images in facial datasets are used without consent. To a great extent, such critiques have been directed at the content of visual datasets and their biased labeling, or at how images are classified into arbitrary categories. Therefore, the responses that sought to ‘fix’ the problem mainly involved taking corrective measures on the problematic content of the datasets in question. Corrective practices such as increasing diversity by balancing demographics, erasing problematic categories, or removing publicly available datasets are sensible approaches, but address only part of the problem. Biases embedded in datasets already propagate through pre-trained models.
Both data curators and artists exhibiting content from datasets must engage with ethical questions about consent and reflect critically on how their work might inflict ‘data violence’. Many of these questions resonate with critical archival studies, which can be helpful for both data curators and artists in the emerging field of critical dataset studies. When creating the artwork Suspicious Behavior, we used material from datasets assembled to detect anomalous behavior in the domain of video surveillance, and had to critically reflect on how we engage with these visual datasets.
Suspicious Behavior © esc medien kunst labor, CYBORG-SUBJECTS exhibition. Photo: Martin Gross.
Suspicious Behavior – Critique of Curating Datasets
Suspicious Behavior is an artwork that invites readers to critically examine machine learning datasets from the perspective of an annotator, whose task is to attach labels to images. Artistic research for the project required a deep dive into the annotation practices of visual datasets. For the tutorial, we developed an interface mimicking designs that are used to increase the speed and accuracy of outsourced annotators. Our speculative interface asked the annotator to spot suspicious behavior in 10-second video segments. One of the speculative training modules gives the user 60 seconds to accurately annotate as many videos as possible. The test ends with a report declaring whether the result qualifies the trainee for the job. It becomes clear that to earn a living wage, the annotator can afford only to glance at images. Where ‘the “glance” is the norm,’ as Nicolas Malevé suggests, an annotation apparatus does not allow the questioning of how categories are made, or how classes are defined. Addressing this, Suspicious Behavior became an exercise in understanding when bias is introduced into machine vision: a way to examine the interplay between data curators and annotators, and the circumstances in which labels are attached to images.
The annotation interface in Suspicious Behavior. Screenshot by KairUs.
Annotation report in Suspicious Behavior. Screenshot by KairUs.
My takeaway from both projects is that we need to critically examine the content of datasets to understand machine vision bias. Yet even more importantly, we need to critically reflect on the processes of performing bias when we create datasets, since an unbiased dataset cannot exist. As computer vision datasets scale up from millions to billions of images, dataset curation requires increased automation, and classification becomes all the more externalized to machines. Yet choices are still made, requiring us to perform bias. Those choices are more about operations and processes than categories and labels. It is thus equally important that we understand and reflect on the practices of curating datasets.
Linda Kronman is a media artist and designer who is currently completing a PhD at the University of Bergen, researching how machine vision is represented in digital art as a part of the Machine Vision in Everyday Life project. Kronman is a member of the artist duo KairUs, where she creates art together with Andreas Zingerle, exploring topics such as surveillance, smart cities, IoT, cybercrime, online fraud, electronic waste and machine vision. KairUs has recently been recognized by the Austrian Federal Ministry of Arts and Culture, Civil Service and Sport with the BMKÖS/Mayer Outstanding Artist Award 2022 in the category of Media Art. This research was funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No 771800).