Deconstructing unpleasantness in videogame sound effects. Part 2: Practice
In the previous post, I described several acoustic properties that, according to scientific literature, make sound unpleasant to hear. In the second part, I’m trying to apply that knowledge to analyze the actual sounds from videogames.
Let’s assume we can reduce the non-contextual unpleasantness of a sound to a set of acoustic properties. Then game audio designers must have already applied them in their work based on intuitive judgment. That’s why instead of designing horrible noises and torturing the test subjects with them, I decided to analyze real sound effects from well-known games. In the end, scientific knowledge on this front should demonstrate at least some parallels with professional intuition.
Disclaimer: I have no intention to prove or disprove scientific theories and hypotheses with this post. First, I’d let real researchers generate this kind of knowledge. Second, working with numeric data, graphs, and spreadsheets is not a part of my daily routine, so the chances I have messed something up are close to 100%. Third, in case you forgot, I’m not an academic, this post is not peer-reviewed, and I don’t have any supervisor to stop me from making stupid mistakes.
I did not examine many of the features mentioned in the previous post. Here, I mostly focus on assessing roughness and sharpness, which were frequently named as critical factors for auditory unpleasantness. Ambiguity was, well, too ambiguous. Others, like loudness and low pitch, were hard to test in a web-based experiment where each participant uses their own device. The attack time would require a rather specific set of data. Fluctuation strength was the feature I really wanted to examine, but I couldn’t find good enough tools to do that.
First, I needed the sounds to analyze. Gathering the data was much more challenging than I expected. Most players can name several cases of unpleasant sonic experience in games. Still, they usually refer to the sound of specific scenes or complex events, but not to individual sound effects. For obvious reasons, I had to limit the scope of my study to sound effects only.
After some googling and asking people within my social bubble, I filled a spreadsheet with 240 entries, pointing at unsettling, scary, disturbing, and disgusting sounds from videogames. Many of the entries came from old articles and forum posts, so my data is heavily biased towards the sounds from the older games.
Out of those 240 entries, I shortlisted 19 sounds with more than three mentions. A few of them were still somewhat ambiguous. For instance, “Headcrab sound from Half-Life 2” could refer to every sound the Headcrab makes. For such entries, I gathered a small set of sound effects that felt different enough from one another. When collecting those variations, I mostly relied on my subjective perception of unpleasantness, which could have biased the outcome. This way, I ended up with a list of 27 sound effects, and shamelessly moved on to sampling the sounds from YouTube.
I didn’t expect every sound in the list to be perceived as unpleasant solely because of its acoustic features. Some must have ended up there because of the message they communicate in the game. Even the most beautiful sound effect can become unsettling if it signifies something negative.
Since I was interested in context-agnostic acoustic features, I wanted people with limited gaming experience to rate the subjective unpleasantness of the sounds from my list. Together with my wife Julia, we found 24 people who don’t usually play games and didn’t mind participating in a small web-based experiment. The test subjects were mostly Russian speakers of different genders, aged 25 to 65.
I made a web survey with PsyToolkit¹ ² and asked the participants to rate the sounds they heard as “neutral or pleasant,” “somewhat unpleasant,” “very unpleasant,” or “hard to tell.” A Likert scale would have been a more obvious choice, but since I only cared about the degree of unpleasantness, I thought it would be excessive. I warned the participants that people had called some of the sounds in the survey unpleasant, but they didn’t know this was true for every sound they heard. The sounds naturally varied in duration from 1 to 10 seconds but were played at comparable loudness levels (normalized to -15.6 LUFS momentary) and in randomized order.
Data processing and results
To quantify the unpleasantness, I used the formula U = 2·VU + SU − NoP, where U stands for the unpleasantness score, and VU, SU, and NoP stand for the number of “very unpleasant,” “somewhat unpleasant,” and “neutral or pleasant” responses, respectively. “Hard to tell” responses do not enter the formula. As a sanity check, I also tried different weights for different answers, and they didn’t show a significant effect on the relative positions of the items in the sorted list. You can check the spreadsheet with the results here.
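The scoring formula above can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet logic used for the post: the function name, the answer strings used as dictionary keys, and the example ratings are all hypothetical. The weights argument makes it easy to repeat the sanity check with different weights.

```python
from collections import Counter

# Weights taken from the formula U = 2*VU + SU - NoP.
# "hard to tell" gets weight 0 because it does not appear in the formula.
WEIGHTS = {
    "very unpleasant": 2,
    "somewhat unpleasant": 1,
    "neutral or pleasant": -1,
    "hard to tell": 0,
}

def unpleasantness_score(responses, weights=WEIGHTS):
    """Sum the weighted answer counts for one sound."""
    counts = Counter(responses)
    return sum(weights[answer] * n for answer, n in counts.items())

# Hypothetical ratings from 24 participants (made up for illustration).
example = (
    ["very unpleasant"] * 10
    + ["somewhat unpleasant"] * 8
    + ["neutral or pleasant"] * 4
    + ["hard to tell"] * 2
)
print(unpleasantness_score(example))  # 2*10 + 8 - 4 = 24
```

Passing a different `weights` dictionary (for example, 3 instead of 2 for “very unpleasant”) and re-sorting the sounds is one way to check whether the ranking depends on the exact weights.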
The upper quartile (Q3) is composed of 7 sounds with the score from 23 to 31 (most unpleasant). I assume that these sounds have some context-independent acoustic properties that make them unpleasant to hear. The sounds are:
- Stalker scream from Dead Space 2
- Three different vocalizations of the Witch from Left 4 Dead
- One Fast Zombie scream from Half-Life 2
- Hunter scream from Left 4 Dead
- One Clicker vocalization from The Last of Us
The lower quartile (Q1) includes eight sounds with the unpleasantness score from -16 (least unpleasant) to 2. The test subjects mostly called them pleasant or neutral, so they have probably ended up on my list because of contextual reasons.
- The Creeper hiss from Minecraft
- Double Chainsaw sound from Resident Evil 4
- One of the Headcrab sound effects from Half-Life 2
- Cliff Racer scream from The Elder Scrolls III: Morrowind
- “Spotted” alarm sound from Metal Gear Solid series
- Stomps of Mr. X from Resident Evil 2 Remake
- Invasion sound effect from Dark Souls
- ReDead moan from The Legend of Zelda: Ocarina of Time
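For readers who want to reproduce the split on their own data, here is a minimal sketch of picking the lowest and highest quartile groups from a table of scores. The quartile method (the "inclusive" method of Python's statistics module) is an assumption, since other methods can place the cut points slightly differently. The sound names and scores below are made up.

```python
from statistics import quantiles

def split_by_quartiles(scores):
    """Return (lowest, highest) lists of names, split at the quartile cut points."""
    q1, _, q3 = quantiles(scores.values(), n=4, method="inclusive")
    ranked = sorted(scores, key=scores.get)
    lowest = [name for name in ranked if scores[name] <= q1]
    highest = [name for name in ranked if scores[name] >= q3]
    return lowest, highest

# Hypothetical scores, not the survey results.
scores = {
    "sound_a": -16, "sound_b": 2, "sound_c": 5, "sound_d": 9,
    "sound_e": 14, "sound_f": 20, "sound_g": 23, "sound_h": 31,
}
lowest, highest = split_by_quartiles(scores)
print(lowest, highest)
```

Tied scores at a cut point all land in the same group, which is one reason the two groups do not have to be the same size.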
All seven sounds in the upper quartile resemble either a human scream or a scream-like animal vocalization. This fact alone doesn’t say anything, because 16 out of 27 sound effects in my list are subjectively scream-like. And yet, I have several reasons to connect a scream-like temporal structure with unpleasant sensations.
First, this assumption aligns with a study by Arnal et al.³ that I’ve mentioned in the previous post. Second, there is a study on “nonlinear” sounds in movies, showing that filmmakers widely use scream-like sounds to alter our emotional responses⁴. Third, we are naturally hard-wired to pay more attention to human-made sounds⁵, so there is no surprise people name something similar to a scream when asked about unpleasant sounds. The initial list of 240 sounds was not random, and that many scream-like sounds have likely ended up there for a reason. Finally, most of the sounds in the lower quartile don’t show any similarity to screams. So the next step was to examine the acoustic properties of Q3 sounds in greater detail.
Modulation Power Spectrum
Arnal et al. connect the scream-like temporal structure to psychoacoustic roughness, specifically to temporal modulations in the range between 30 and 150 Hz³. I decided to analyze the roughness of the sounds using the same method as in their study: Modulation Power Spectrum (MPS). MPS is a 2-dimensional Fourier transform over a spectrogram that shows both spectral and temporal (amplitude) modulations of a signal on a graph. If you want to learn more about it, you can start with this excellent explanation on Stack Exchange.
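To make the “2-dimensional Fourier transform over a spectrogram” idea more concrete, here is a minimal NumPy sketch. It is not the Soundsig implementation and it skips the refinements a real tool would have. The function name, window length, hop size, and the use of a log-amplitude spectrogram are all assumptions made for illustration.

```python
import numpy as np

def modulation_power_spectrum(signal, sample_rate, win_len=1024, hop=256):
    """Return (temporal_mod_hz, spectral_mod_cycles_per_hz, mps) for a mono signal."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    # Log-amplitude spectrogram with shape (frequency bins, time frames).
    spec = np.log(np.abs(np.fft.rfft(frames, axis=1)).T + 1e-10)
    spec -= spec.mean()
    # 2D Fourier transform of the spectrogram; power is the squared magnitude.
    mps = np.abs(np.fft.fftshift(np.fft.fft2(spec))) ** 2
    frame_rate = sample_rate / hop      # spectrogram frames per second
    bin_width = sample_rate / win_len   # Hz between spectrogram rows
    temporal = np.fft.fftshift(np.fft.fftfreq(spec.shape[1], d=1.0 / frame_rate))
    spectral = np.fft.fftshift(np.fft.fftfreq(spec.shape[0], d=bin_width))
    return temporal, spectral, mps

# Example: one second of a steady 1 kHz tone.
sr = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
temporal, spectral, mps = modulation_power_spectrum(tone, sr, win_len=256, hop=64)
```

The horizontal axis of the resulting picture is temporal modulation in Hz (how fast the spectrogram changes over time), and the vertical axis is spectral modulation in cycles per Hz (how fast it changes across frequency). A steady tone has no temporal modulation, so its power sits in the column at 0 Hz; rough, scream-like sounds spread power out to temporal modulations of tens of Hz.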
To build the MPS graphs, I used the Python package called Soundsig. Since I have a fairly vague understanding of the underlying maths, I didn’t dare to touch the main code but made a few tweaks to the graph presentation. To be precise, I adjusted the limits for the X-axis and the color bar, added contours, and drew two vertical lines at 15 and 30 Hz. The 30 Hz line comes from the Arnal et al. study: this is how they defined the lower border of the roughness region³. The 15 Hz line comes from the definition of roughness in the book Psychoacoustics: Facts and Models by Fastl and Zwicker⁶.
To make sure the tool works correctly, I analyzed some reference tones using a 500 ms window. I used a C major chord built out of 3 sine waves; the same chord where each sine wave is amplitude modulated at 11, 29, and 73 Hz, respectively; Gaussian white noise, and a randomly picked female scream from a sound library.
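Similar reference signals can be generated with NumPy, as in the sketch below. The octave of the chord (C4, E4, G4), the sample rate, the full modulation depth, and the function names are assumptions; the text above only fixes the chord type and the three modulation rates. The library scream is a recording and cannot be synthesized this way.

```python
import numpy as np

SAMPLE_RATE = 44100
CHORD_HZ = (261.63, 329.63, 392.00)  # C4, E4, G4: the octave is an assumption
MOD_HZ = (11.0, 29.0, 73.0)          # modulation rates named in the text

def chord(duration, modulated=False, depth=1.0):
    """C major chord from three sines, optionally amplitude modulated per note."""
    t = np.arange(int(duration * SAMPLE_RATE)) / SAMPLE_RATE
    out = np.zeros_like(t)
    for carrier, mod in zip(CHORD_HZ, MOD_HZ):
        envelope = 1.0 + depth * np.sin(2 * np.pi * mod * t) if modulated else 1.0
        out += envelope * np.sin(2 * np.pi * carrier * t)
    return out / np.max(np.abs(out))  # peak-normalize to avoid clipping

def white_noise(duration, seed=0):
    """Gaussian white noise, peak-normalized."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(int(duration * SAMPLE_RATE))
    return noise / np.max(np.abs(noise))

plain = chord(2.0)
rough = chord(2.0, modulated=True)
noise = white_noise(2.0)
```

Signals like these are handy because the expected result is known in advance: the plain chord should show almost no temporal modulation, while the modulated chord should show it at 11, 29, and 73 Hz.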
After that, I used the same settings to build an MPS graph for every sound effect in my list. Here are the figures for Q3, with the most unpleasant sounds first:
And here are the figures for Q1, with the least unpleasant sounds first:
The MPS graphs for other sounds are available here.
The upper quartile sounds share a somewhat similar pattern that probably reflects a scream-like temporal structure. The lower quartile expectedly shows much more variety. All the sounds in the bottom row have an overall “thinner” spectrum graph that indicates fewer amplitude modulations that stretch into the “roughness” region. But the entire top row shows a lot of modulation power above 30 Hz; the least unpleasant sounds seem to be more “rough” than most samples from Q3.
The top right Q1 sound is the Cliff Racer screech from The Elder Scrolls III: Morrowind. Subjectively, I would rate it as scream-like, unpleasant, and weird. At the same time, I can imagine people finding it funny or goofy, which could explain why it received a low unpleasantness score.
The more interesting case here is the Creeper hiss from Minecraft — the least unpleasant sound rated far below anything else in the list. Only three people called it unpleasant, and nobody perceived it as very unpleasant. The sound itself has a relatively flat, white-noise-like spectrum with a moderate boost in the high end. The Double Chainsaw from Resident Evil 4 and Headcrab’s rustling sounds from Half-Life 2 are also quite noisy, which is reflected on their MPS graphs if you compare them to the white noise graph above.
What if noise-like sounds are less unpleasant to us, regardless of how “rough” they feel? One experiment shows that narrow-band noise is less unpleasant than pure tones with the same central frequency and sound pressure level⁷. Also, there is an idea that chaotic, nonlinear, almost-periodic sounds easily make us feel scared⁸. Seth Horowitz, in his book The Universal Sense, says that near-periodic nonlinear sounds resemble a screech of unbearable pain or panic and trigger a fight-or-flight response⁹. We probably feel this way because of unpredictability. Both periodic sounds and random noises are somewhat predictable, so our brain can easily filter them out after the initial processing stage. Quasi-periodic, nonlinear sounds are harder to ignore, which makes them cognitively demanding.
Alternatively, roughness alone could be insufficient to make the sound unpleasant. Maybe we are only sensitive to amplitude modulations in some specific parts of the spectrum? Arnal et al. focused their study on screams and scream-like alarms, where most of the energy comes from the vocal range. There is a parallel with a paper by Kumar et al.¹⁰, which connects unpleasantness to slower amplitude fluctuations (1–16 Hz) in combination with a lot of energy in the range from 2500 to 5500 Hz. The assumption about the frequency range makes a lot of sense once we take a look at the equal loudness curves that show higher sensitivity in the region centered at 3500 Hz.
Frequency spectra, roughness, sharpness, and spectral flatness
Unfortunately, I did not find a reliable tool to examine temporal modulations in the subsonic (1–16 Hz) range. But I could analyze the average frequency spectrum of the sounds from my list and again compare Q3 against Q1, looking at the region between 2.5 and 5.5 kHz.
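One simple way to put a number on “how much energy sits between 2.5 and 5.5 kHz” is the share of spectral power that falls into that band. The sketch below is only an illustration of that idea, not the method behind the spectrum images in this post; the function name and the test tones are hypothetical.

```python
import numpy as np

def band_energy_fraction(signal, sample_rate, low_hz=2500.0, high_hz=5500.0):
    """Fraction of the total spectral power that lies between low_hz and high_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    return power[in_band].sum() / power.sum()

# Two hypothetical test tones: one inside the band, one far below it.
sr = 44100
t = np.arange(sr) / sr
inside = band_energy_fraction(np.sin(2 * np.pi * 3500 * t), sr)
outside = band_energy_fraction(np.sin(2 * np.pi * 500 * t), sr)
print(round(inside, 3), round(outside, 3))
```

A value close to 1 means that almost all of the energy is in the band, and a value close to 0 means that almost none of it is. Comparing this number between sounds only makes sense if they are loudness-matched or if the fraction is used rather than the absolute power.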
Frequency spectra images of all 27 sounds from the list are available here.
Even though most of the energy in the spectra of the Q3 sounds lies in the mid-range, they generally show more density between 2500 and 5500 Hz, which is consistent with the assumption. In contrast, two of the Q1 sounds show salient high-frequency content. It is an indicator of psychoacoustic sharpness, another possible factor of unpleasantness I wanted to examine.
The AudioCommons Timbral Models Python package allows measuring both mean sharpness and roughness of sound files. Both models are regression-based and trained on subjective ratings. They output values between 0 and 100 that have nothing to do with aspers and acums — the measurement units used in psychoacoustics. Still, it was much better than nothing. And now is an excellent time to check the second sheet on the unpleasantness score spreadsheet and see the results of both evaluations.
The data shows no correlation between sharpness and the unpleasantness score. Many sounds in my list are indeed sharp, but the property is more or less equally distributed throughout the sorted list. But I wouldn’t draw any conclusions from this for at least one reason. Weber and Eilers mention that sharpness has more effect on psychoacoustic annoyance at higher loudness levels¹¹. When answering the survey, the participants used their own devices, and there was no way to find out how loud the sounds played. So, testing how sharpness influences unpleasantness without taking the loudness into account is probably not a good idea.
More surprisingly, the data also demonstrates no significant correlation between roughness and unpleasantness. Nevertheless, the upper half of the list appears to be more “rough” than the lower half. This observation made me return to my previous guess about noisiness that “neutralizes” the effect of roughness.
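For anyone who wants to re-check the correlation on the spreadsheet data, here is a small standard-library sketch of two common coefficients. Which coefficient fits this data better is an open question: Pearson measures a linear relation, while Spearman only looks at the rank order, which may suit survey-based scores better. The numbers in the example are made up and are not the measured values.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical roughness values and unpleasantness scores.
roughness = [12.0, 35.5, 40.1, 58.3, 61.0]
unpleasantness = [-3, 5, 4, 20, 27]
print(pearson(roughness, unpleasantness), spearman(roughness, unpleasantness))
```

With only a couple of dozen sounds, either coefficient can swing a lot when a few items are added or removed, so it is worth looking at the scatter plot as well as the number.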
If we filter out the “noisiest” sounds from the list, we can see a moderate correlation between roughness and the unpleasantness score. To assess the noisiness, I built spectral flatness graphs using the Iracema Python library. Even though the library comes with a separate noisiness feature extractor, that model takes its input from the pitch detection algorithm, and I wasn’t able to get reliable results from it with my data. So, based on the spectral flatness graphs, I filtered out six sounds: all four Half-Life 2 Headcrab noises, the already mentioned Creeper hiss from Minecraft, and the Double Chainsaw from Resident Evil 4. The results are on the third sheet called “SF-Filtered.” The spectral flatness and (nearly useless) spectral centroid graphs are available here.
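Spectral flatness itself is a simple measure: the geometric mean of the power spectrum divided by its arithmetic mean. The sketch below computes one value for a whole signal. It is a simplification and not the Iracema implementation, which I assume works frame by frame to produce the graphs over time. The function name and the test signals are hypothetical.

```python
import numpy as np

def spectral_flatness(signal, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum (0 to 1)."""
    power = np.abs(np.fft.rfft(signal)) ** 2 + eps  # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    return geometric_mean / np.mean(power)

# Hypothetical test signals: a pure tone and Gaussian white noise.
sr = 8000
t = np.arange(sr) / sr
tone_flatness = spectral_flatness(np.sin(2 * np.pi * 440 * t))
noise_flatness = spectral_flatness(np.random.default_rng(0).standard_normal(sr))
print(tone_flatness, noise_flatness)
```

A pure tone puts all of its power into one frequency, so its flatness is close to 0. White noise spreads the power evenly and scores much higher, although a single raw spectrum of random noise does not reach exactly 1 because the individual bins fluctuate.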
If we assume that this correlation is meaningful, it could lead to two conclusions. First, as I said, noisiness could somehow neutralize the effect of roughness when it comes to the unpleasantness of the sound. Second, amplitude modulations in the roughness range may only feel unpleasant when they apply to some specific region of the frequency spectrum or when the sound shows salient frequencies in that region.
I acknowledge that my research process was far from perfect. Both data collection and processing could be done in a much better manner. For example, the sounds varied in duration and dynamics, which could affect both subjective evaluations and measurements.
Besides that, I could have misunderstood or misused the audio analysis tools or misread their outputs. I am not a programmer, and I suck at signal processing maths, so I was using analysis tools without a good enough understanding of what they do. The tools could have introduced some undesirable processing stages, altering the output. For instance, the participants of the experiment listened to loudness-matched sounds, but the models could apply some normalization to the sound files.
Third, I know very little about statistics and data analysis, so I could have made a mistake evaluating the correlation and its statistical significance. And even if I didn’t, the sample size was way too small to draw serious conclusions from this study.
With all that in mind, let’s carefully assume that the results are meaningful. First, it seems that the scream-like spectral and temporal structure is a feature shared by the most unpleasant sounds in my list. Roughness, as it is, correlates with the unpleasantness of the less noisy sounds, or the ones with a less flat spectrum. I didn’t observe any noticeable correlation between sharpness and unpleasantness, but in this specific sample, it seems to correlate with roughness.
Initially, I started this study as an attempt to find a reliable technique for modulating the unpleasantness of sound effects in games. Controlling “scream-likeness” seems to be the best candidate for such a technique, but it is still hard to tell which exact acoustic features make a sound appear scream-like. A combination of roughness and spectral density at 2500–5500 Hz sounds plausible to me, but I need to study this further.
Still, it is not entirely clear what we should do if we want to “descream” an annoying sound effect. Maybe, demodulation based on the MPS analysis, combined with simple equalization, could do the trick? There is a lot to test on that front, and I’d like to somehow verify my hypothesis before pushing it further.
The weird relation between spectral flatness, roughness, and unpleasantness raises a separate question. Could it be that higher spectral flatness or noisiness makes a “rough” sound less unpleasant to hear? Or maybe such sounds feel less unpleasant because they don’t show a lot of energy in the frequency range I specified above? I’d love to find an answer to this question in a separate experiment. Hopefully, in a better designed and executed one.
Finally, I’m still curious about other acoustic features I listed in the previous post. Fluctuation strength deserves more attention when it comes to this topic. It’s a shame I couldn’t find a good enough tool to analyze my data for this feature, and I’d be happy to see one eventually. Zwicker’s entire psychoacoustic annoyance model⁶ could be a great analytical tool for a sound designer if there were an approachable, user-friendly implementation. Maybe there is one that I’ve missed? If yes, reach out to me and let me know.
This post has raised more questions than it has answered, and I hope you are not very disappointed with the inconclusive results. You are very welcome to experiment with the data I collected and draw your own conclusions or prove me wrong in any way you like. I very much appreciate your criticism because it can save me a lot of time in the future. And even though I didn’t deliver the answers you were waiting for, I hope you can make fair use of some of the analytical methods I used in your work and your creations.
[1]: G. Stoet, “PsyToolkit: A software package for programming psychological experiments using Linux”, Behavior Research Methods, vol. 42, no. 4, pp. 1096–1104, 2010. Available: 10.3758/brm.42.4.1096.
[2]: G. Stoet, “PsyToolkit: A novel web-based method for running online questionnaires and reaction-time experiments”, Teaching of Psychology, vol. 44, no. 1, pp. 24–31, 2017.
[3]: L. Arnal, A. Flinker, A. Kleinschmidt, A. Giraud and D. Poeppel, “Human Screams Occupy a Privileged Niche in the Communication Soundscape”, Current Biology, vol. 25, no. 15, pp. 2051–2056, 2015. Available: 10.1016/j.cub.2015.06.043.
[4]: D. Blumstein, R. Davitian and P. Kaye, “Do film soundtracks contain nonlinear analogues to influence emotion?”, Biology Letters, vol. 6, no. 6, pp. 751–754, 2010. Available: 10.1098/rsbl.2010.0333.
[5]: K. Stavropoulos and L. Carver, “Neural Correlates of Attention to Human-Made Sounds: An ERP Study”, PLOS ONE, vol. 11, no. 10, p. e0165745, 2016. Available: 10.1371/journal.pone.0165745.
[6]: H. Fastl and E. Zwicker, Psychoacoustics: Facts and Models. Berlin: Springer, 2007.
[7]: K. Kurakata, T. Mizunami and K. Matsushita, “Sensory unpleasantness of high-frequency sounds”, Acoustical Science and Technology, vol. 34, no. 1, pp. 26–33, 2013. Available: 10.1250/ast.34.26.
[8]: W. Fitch, J. Neubauer and H. Herzel, “Calls out of chaos: the adaptive significance of nonlinear phenomena in mammalian vocal production”, Animal Behaviour, vol. 63, no. 3, pp. 407–418, 2002. Available: 10.1006/anbe.2001.1912.
[9]: S. Horowitz, The Universal Sense. New York: Bloomsbury, 2013.
[10]: S. Kumar, K. von Kriegstein, K. Friston and T. Griffiths, “Features versus Feelings: Dissociable Representations of the Acoustic Features and Valence of Aversive Sounds”, Journal of Neuroscience, vol. 32, no. 41, pp. 14184–14192, 2012. Available: 10.1523/jneurosci.1759-12.2012.
[11]: R. Weber and R. Eilers, “Combined contribution of roughness and sharpness to the unpleasantness of modulated band pass noise”, DAGA ’07, Stuttgart, Germany, 2007, pp. 565–566. Deutsche Gesellschaft für Akustik. ISBN: 978-3-9808659-3-7.