[This is one of the essays winning this competition: https://www.hiig.de/en/twentyforty-call-for-submissions/]
Amsterdam, Friday 4th March 2041,
The term “computer” used to refer to a human, someone doing computations the old-fashioned way: with paper and pen. The term “programmer” also used to refer to a human, someone writing programs the old-fashioned way: with keyboard and mouse. Now our “voice-ops” handle more than programming and computing. Beyond the digital world of measuring and analysing, they handle a large part of our intellectual world. They read and write our books as they do our computer code. Voice Operators are our librarians, our curators, our secretaries, and at times, our caretakers. They might soon become our journalists, our lawyers, and even our teachers. Should we let them?
Back in 2024, when the deep fake crisis ended with the ban on blacklisted AIs for public influence purposes, their infamous deep ontologies for language manipulation were leaked and publicly released. The open source community repurposed this back-alley monster of neuro-marketing, and soon gave birth to the first talking computers. Back then, we did not imagine the coming revolution, as we did not imagine the Internet revolution in the 20th century.
After toying with talking apps for entertainment, we developed the first Voice-Operated Programming technologies (VOP). Academics painstakingly devised ways to translate human intent into low-level computing operations. Starting from high-level descriptions of our goals, Voice Operators would hold a conversation until enough information was gathered to specify the programs we needed. The Voice Operators could then write the code, deploy the programs, and even conduct tests and debugging.
After a consensus emerged on VOP standards, these technologies flourished in the 2030s, allowing a large public to construct their own information systems. By the mid-2030s, our administrative matters were largely voice-operated. And since 2039, Voice-Operated Programming can even be performed by children.
At first, VOP required somewhat unnatural communication. Should the voice-op ask “Please specify ‘sending a reminder’?”, we would reply in a vernacular such as “The reminder is a summary, length is 3 sentences, content is action- and consequence-oriented. Send it through the user’s GDPR-certified contact channel”. Then our dialogs with voice-ops became more natural, and more personal. Now we would rather say “Drop these clients a note on their bill”.
This is, after all, thanks to the infamous deep fake technologies that can mimic our personal styles. Our styles of expression, but also our styles of cognition. Deep fake technologies allow ads to speak our own language, our own vernacular. They also allow computers to speak our own language, not formal computer vernacular. It saves us from writing computer code. Now it saves us from writing, period.
Voice Operators have profoundly transformed our society because they remodelled our work environment, our access to information, and finally, our sense of community. Our work environment is inevitably transformed: we barely need to type on a keyboard anymore, thanks to voice-ops, and we barely need to use a mouse either, thanks to track-cams. Ergonomically, typing text with our voices rather than our hands is far healthier. No more slouching and RSI, but instead, improved breathing capacity and blood oxygen balance.
However, voice-ops introduced considerable acoustic stress in our work lives. Office work is like a permanent meeting. Whether colleagues interact among themselves or with voice-ops, work sounds like a gigantic call centre. Fortunately, in most public spaces voice-ops are used with courtesy and parsimony. Sadly, in our social lives, noise-cancelling headphones and microphones are replacing face-to-face interactions. We spend most of our time equipped with noise-cancelling headphones, incidentally shunning each other. We let human touch turn into a vocal presence.
We also turn our books into a vocal presence. When texts are read by voice-ops, we cannot explore their content at our own pace. Voice-ops unravel the inner rhythm of our reflections. Fortunately, we have algorithms that render pauses in speech more naturally, and at a controllable pace. Yet pauses are imposed, and we understand less of a text when hearing it than when reading it in silence.
Fortunately, written texts remain largely present in our environment: our newspapers, our books, our reports. Yet, more and more of our texts are written by machines: by web searchers, by summarisers, by data analysers. They lay out our news briefs, write our reports, and tweak our ads. They write books too now: handbooks with Virtual Reality add-ons, school books with lessons and exercises, or novels with plots and well-crafted suspense.
The machines writing our texts, and our voice-ops’ words, have a seemingly real personality of their own. Although sometimes their awkwardness betrays their artificial nature, machines look all the more alive: with an awkwardness of their own as part of their personality.
Text machines and voice-ops can create the illusion of a companion, a seemingly real interlocutor. Most people spend at least 20% of their vocal interactions with voice-ops, and 40% of their reading time on machine-generated texts. This is comparable to the time we spend with close family members: machines are part of the family. They can also play a role in our psychological balance. Machines can generate artificial voices tailored to influence our emotions, for example to mitigate anxiety and anger or help us fall asleep [3,4,5]. Voice-ops can also provide solace to the lonely and the depressed [6,7]. They take part in our social and developmental fabric. Children can start learning and playing with voice-ops from kindergarten, but we do not yet know how this can impact their psychological and cultural development.
The toll of voice-ops on our social fabric and acoustic comfort is not the only worry. From a more practical perspective, Voice Operators add several layers of complexity to our computer systems. We used to program our systems ourselves, with programming languages that are hard to learn but that are explicit and unequivocal. To this initial 2-layer framework of code and compiler, we added a layer for Voice-Operated Programming and a layer for personalising our dialogs. We can program while being left largely ignorant of what exact programs are interpreted by VOP, and what VOP commands are interpreted from our dialogs. As a consequence, our computer programs are left largely uncontrolled. Most applications do not justify investing in thorough verifications of VOP results, and the job market for human programmers is gradually declining. Hence our programming workforce may decline too, and we may lose the ability to fully control most programs we use.
Beyond impacting our system governance and job markets, Voice Operators also impact our planet. The interpretative layers of voice-ops require additional computations to infer the programs to execute and discuss them with us. Furthermore, the resulting programs may consume more computing resources than man-made programs or optimised software architectures. Computing resources are extremely fast and cheap, but they entail considerable ecological costs. At the scale of the planet, voice-ops are widespread and their carbon footprint is significant. In our damaged ecosystems plagued by heat waves and severe blizzards, it is urgent to start limiting the use of voice-ops, at least for hazardous or superfluous applications. It is equally urgent to start measuring and regulating their ecological costs, like those of any factory under the UN Climate Control Regulations, and thus to start investing in an engineering workforce for optimising their programs.
[1–7] If I were writing this text in 2041, I would back up these points with scientific references.