VRChat — The Sound of Progress

Published in VRChat, Jul 31, 2019

Sound is an extremely important part of any experience, and VRChat is no different. Having good audio balance and spatialization in your world for both audio clips and user voice can make the difference between hours of immersive time spent and a quickly-dismissed world.

As part of our “Stability, Security, and SDK” mantra we talked about in our last post, we’re introducing some significant changes and upgrades to the sound system in VRChat. We’ll be going over them here. These features will be released in VRChat 2019.2.5, our next update — which should be live now!

Spatialization Upgrades

As a whole, we’ve upgraded our spatializer (Oculus Native Spatializer for Unity) to a newer version and tuned the settings a bit. This upgrade opens some pathways for us that weren’t available before, fixes a few issues, and gives us a chance to tune values we haven’t circled back around to in a while.

We’ve improved the default voice fall-off parameters as well. This means that when using default settings, other users should sound like you would expect them to in the real world. This helps a lot with your individual ability to “decode” which voice belongs to whom, where they are in relation to you, and how close they are.

In addition, we’ve removed the “Voice Prioritization” system we had in place. Voice Prioritization would quiet the voice of people farther from you when someone nearer to you spoke. This was needed previously for various technical reasons, but we found in testing it had no further useful effect. Thanks to the more effective spatialization, there’s no real need to quiet distant voices, because it is now very easy to discern who is closer.

We also have a directional voice bias in VRChat. This isn’t a new feature, but it is important to keep in mind. People that are facing you will be louder than people facing away from you. There’s a roughly 120 degree cone that projects forward from the speaker’s face, and if a listener is in that cone, they hear the speaker at full volume (as determined by the fall-off curve). If they’re outside of it, the speaker’s volume is reduced to 70%.
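As a rough illustration of the cone rule above, here is a minimal Python sketch. It assumes the 120 degrees is the full width of the cone (so 60 degrees either side of the speaker's forward direction) and that the edge is a hard switch between 100% and 70%; the real implementation may blend between the two. The function name and the vector representation are hypothetical, not part of VRChat.

```python
import math

CONE_ANGLE_DEG = 120.0  # full cone width, per the post ("roughly 120 degrees")
OUTSIDE_GAIN = 0.7      # volume multiplier outside the cone, per the post


def directional_gain(speaker_forward, speaker_pos, listener_pos):
    """Return 1.0 if the listener is inside the speaker's forward cone, else 0.7.

    Assumption: a hard edge at half the cone angle. This multiplier would be
    applied on top of whatever the distance fall-off curve gives.
    """
    to_listener = [l - s for l, s in zip(listener_pos, speaker_pos)]
    dist = math.sqrt(sum(c * c for c in to_listener))
    fwd_len = math.sqrt(sum(c * c for c in speaker_forward))
    if dist == 0.0 or fwd_len == 0.0:
        return 1.0  # degenerate case: treat as inside the cone

    cos_angle = sum(f * t for f, t in zip(speaker_forward, to_listener)) / (dist * fwd_len)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    angle_deg = math.degrees(math.acos(cos_angle))
    return 1.0 if angle_deg <= CONE_ANGLE_DEG / 2.0 else OUTSIDE_GAIN
```

For a speaker at the origin facing along +Z, a listener straight ahead gets a multiplier of 1.0, while a listener directly to the side (90 degrees off-axis) or behind gets 0.7.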

Finally, we’ve added a low-pass filter to voice at long ranges. This filter really helps when a group is having a conversation farther away from you, muffling them out a little bit more. Previously, their volume would simply lower — but that isn’t how your brain expects distant conversation to sound. We’ve tweaked the filter a bit so it isn’t just a duplication of reality, but instead helps you pick out conversations that matter near to you.

We’ve also made some other changes that attempt to ensure that voice audio ends up at higher volumes than world audio and avatar-based audio. This includes adding a compressor / limiter to the avatar audio channel that helps to diminish overly loud sounds and prevent malicious avatar audio from affecting users. We also have a compressor on the user voice channel to prevent similar issues, but these two compressors operate separately so you cannot affect user voice from overly loud avatar sounds (and vice versa). Read more about the avatar audio compressor (including tips on how to avoid hitting it) in our documentation.
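To make the compressor / limiter idea concrete, here is a small Python sketch of a static compressor curve followed by a hard ceiling, applied to each channel independently. The threshold, ratio, and ceiling values are purely illustrative assumptions; the post does not state the actual settings, and all function names are hypothetical.

```python
def compress_db(level_db, threshold_db=-6.0, ratio=4.0):
    """Static compressor curve: the part of the level above the threshold
    is divided by `ratio`. Threshold and ratio are illustrative guesses."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def limit_db(level_db, ceiling_db=0.0):
    """Hard limiter: never let the level exceed the ceiling."""
    return min(level_db, ceiling_db)


def process_channels(voice_db, avatar_db):
    """Each channel goes through its own compressor and limiter, so a loud
    avatar sound cannot pull the voice channel down (and vice versa)."""
    voice_out = limit_db(compress_db(voice_db))
    avatar_out = limit_db(compress_db(avatar_db))
    return voice_out, avatar_out
```

With these example settings, `process_channels(-12.0, 30.0)` leaves the voice level at -12.0 dB while the very loud avatar signal is brought down to the 0.0 dB ceiling.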

Some New Toys in the Toolbox

We’ve also got two new components for both avatar and world creators that should make dealing with audio significantly easier and more intuitive.

If you’re more of a visual learner, check out our updated SDK. We’ve put an example scene in there called Example-PlayerAudioOverride. If you’d like to learn more, read on!

VRC_SpatialAudioSource

The first of these components is VRC_SpatialAudioSource, which replaces ONSPAudioSource. It looks like this in the Editor:

Each of these properties has a tooltip, and we also have extensive documentation on the component itself and how it works. In short, these are the properties:

Gain — An additional boost to volume. The default value is a 10 dB boost. Avatar-based audio sources are limited to a 0 to 10 dB boost (unless overridden by the world’s settings). World-based sources can have a boost of 0 to 24 dB.
Far — The radius (in meters) where audio intensity falls to zero. By default, this is set to 40m for World audio.
Advanced: Near — The radius (in meters) where audio intensity begins to fall off. We strongly recommend you keep this at zero!
Advanced: Volumetric Radius — Normally audio sources are considered point sources. However, this value (in meters of radius) causes the source to sound as if it emits from a larger area. Be careful using this! It is meant for making distant audio sources sound “large” as you move past them. The listener should never get close to the object for best results. Keep it at zero, normally, especially for avatar audio sources. If you want 2D sound, then you’d use the next two options.
Advanced: Use AudioSource Volume Curve — This defaults to off. When off, this uses the spatializer’s “realistic” fall-off that simulates loss of audio intensity as the inverse-square of distance from the source. We recommend keeping this off! If it is checked, the AudioSource graph for falloff is used, which you’re free to edit to your heart’s content.
Advanced: Enable Spatialization — This defaults to on. When on, this uses our spatialization solution for making sounds appear as if they’re coming from the direction they’re emitting from. If you’d rather use Unity’s implementation (or use “2D” sound), you’ll need to turn this off.
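The Gain limits listed above can be sketched in a few lines of Python. The ranges (0 to 10 dB for avatar sources, 0 to 24 dB for world sources) come from the property list; the function names are hypothetical, the world-settings override is left out for brevity, and the conversion assumes the boost is an amplitude gain in the usual 20·log10 sense, which the post does not spell out.

```python
AVATAR_GAIN_RANGE_DB = (0.0, 10.0)  # default avatar limits, per the post
WORLD_GAIN_RANGE_DB = (0.0, 24.0)   # world source limits, per the post


def clamp_gain_db(gain_db, is_avatar):
    """Clamp a requested Gain value to the range allowed for its source type."""
    low, high = AVATAR_GAIN_RANGE_DB if is_avatar else WORLD_GAIN_RANGE_DB
    return max(low, min(high, gain_db))


def db_to_amplitude(gain_db):
    """Convert a decibel boost to a linear amplitude multiplier.

    Assumption: standard amplitude convention, 20 * log10(multiplier)."""
    return 10.0 ** (gain_db / 20.0)
```

For example, a request for 15 dB on an avatar source is clamped to 10 dB, while the same request on a world source passes through unchanged.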

The component also generates a few Unity Gizmos to show the Near and Far radius.

We hope this component helps with de-mystifying how audio behaves on Audio Sources in VRChat worlds and on avatars.

There are some limitations on avatar-based audio we’ve noted above, as well as some other behavior that’s slightly different for avatar audio. Read more about it in our documentation.

Finally, this component’s properties cannot be adjusted at run-time (via animations, for example). The values are set at initialization. You can still disable/enable the object or audio source itself, and you can still animate most properties of the Audio Source component itself.

VRC_PlayerAudioOverride

Ever wanted to have a stage where the performer’s voice or avatar audio can be louder, and the audience carries a bit less? Wait no more, as VRC_PlayerAudioOverride is here!

This component permits you to alter a player’s voice spatialization parameters as well as the avatar audio limits when they enter a region defined by a trigger collider. This is what the component looks like:

A VRC_PlayerAudioOverride component with default values. Note that Region is empty and Global is disabled — so by default, this won’t do anything.

You’ll notice you can also select which collider is used! You can use any collider (including Mesh Colliders) to define these regions. Make sure you check “Is Trigger” or else you’ll just have an invisible block you can’t walk into!

You should tune your world’s audio properties depending on the general layout of the world. For example, if you’ve got a wide-open world with a ton of space, you’ll probably want to drive up Voice Far a bit more. If you’ve got a world of claustrophobic corridors, you might want to drop it a little bit.

Here’s a summary of the properties on this component:

Global — If checked, these settings are global. Otherwise, settings only affect players who enter the trigger region. You can only use one global override per world; if there are several, we’ll only use the first one found. This is False by default.
Region — This field takes a GameObject with a Collider-class component on it. The collider must have Is Trigger enabled. When a player enters this region, their settings are changed. This is Empty by default.
Region Priority — Higher number means higher priority, can be negative. If a high-priority region overlaps a low-priority region, the high-priority region’s settings will be applied instead.
Voice Gain — An additional boost to voice volume. Must be between 0 and 24 dB.
Voice Far — The maximum range at which voice can be heard. At values close to this range, voice is going to be pretty quiet.
Advanced: Voice Near — The near radius, in meters, where volume begins to fall off. It is strongly recommended to leave the Near value at zero for realism and effective spatialization for user voices.
Advanced: Voice Volumetric Radius — An audio source is normally simulated as a point source. However, changing this value allows the source to appear to come from a larger area. This should be used carefully, and is mainly for distant audio sources that need to sound “large” as you move past them. The listener should never get close to the radius for best results. If the listener is inside the Volumetric Radius, strange behavior may result. It is strongly recommended to leave the Volumetric Radius value at zero for realism and effective spatialization for user voices.
Advanced: Voice Disable Low-Pass Filter — Disable the lowpass distance filter. As an aside, the Low-Pass filter kicks in at 1/2 of Far. May be useful for non-standard ranges.
Avatar Gain Limit — Limit for avatar audio gain in decibels. Setting this to -1 (or any negative value) will mute all avatar audio. This adjusts the limits, and does not boost avatar audio.
Avatar Far Limit — Limit for avatar audio max range, in meters. Setting this to zero will mute all avatar audio. This adjusts the limits, and does not boost avatar audio.
Advanced: Avatar Near Limit — Limit for avatar minimum range, in meters. This adjusts the limits, and does not boost avatar audio. See note on Voice Near.
Advanced: Avatar Volumetric Radius Limit — Limit for avatar volumetric audio, in meters. This adjusts the limits, and does not boost avatar audio. See note on Voice Volumetric Radius.
Advanced: Avatar Force Spatial — Force the “Enable Spatialization” setting on for all avatar audio sources. Turning this on means you don’t want to permit 2D avatar audio.
Advanced: Avatar Allow Custom Curve — Allow avatar audio to use a custom AudioSource intensity curve for falloff. Otherwise, Inverse Square falloff is enforced. Turn this off if you want to prevent users from defining their own falloff curve on avatar audio.
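The Global, Region, and Region Priority rules above can be summarized in a short Python sketch. The `AudioOverride` class and `resolve_override` function are hypothetical stand-ins, and `contains_player` replaces the real trigger collider test. One assumption to flag: the sketch lets any region the player is standing in win over the global override, which the property list implies but does not state outright.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AudioOverride:
    name: str
    is_global: bool
    priority: int
    voice_far: float
    contains_player: bool = False  # stand-in for the trigger collider test


def resolve_override(overrides: List[AudioOverride]) -> Optional[AudioOverride]:
    """Pick the override whose settings apply to a player.

    1. Among regions the player is inside, the highest Region Priority wins.
    2. Otherwise, the first global override found is used.
    3. Otherwise, None (the world's defaults apply).
    """
    regional = [o for o in overrides if not o.is_global and o.contains_player]
    if regional:
        return max(regional, key=lambda o: o.priority)
    for o in overrides:
        if o.is_global:
            return o  # only the first global override is honored
    return None
```

For a player standing inside both a "stage" region (priority 10) and a "lobby" region (priority -1), the stage settings apply; once the player leaves both, the global override takes over.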

This component’s properties also cannot be adjusted at run-time (via animations, for example). The values are set at initialization.

If you want to have a region toggle on or off, you should use an animation to move the region around (into the floor to disable it, for example). You can’t toggle the region on or off by disabling or enabling the component or GameObject — unfortunately, due to the way Unity fires relevant events, this won’t work.

What’s a Fall-off?

Now that we’ve covered your new toys, let’s talk about some technical details.

In the real world, audio intensity (volume, sorta) falls as the “inverse square” of your distance from the source.

An illustration of a chalkboard with the equation “Intensity varies inversely as the square of distance from source”. Image sourced from the public domain textbook “Basic Audio” by Norman Crowhurst, uploaded by the Epina eBook Team to the Virtual Institute of Applied Science website.

This means that if you’re right on top of your audio source (near-zero distance), you hear the sound at full intensity. Taking the intensity at 1 meter as the reference, if you’re 4 meters away, you’re now hearing audio at 1/16th of that intensity. If you’re 40 meters away, you’re now hearing the audio at 1/1600th of it!
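The arithmetic is easy to check with a few lines of Python. This sketch assumes an ideal point source and measures intensity relative to a reference distance of 1 meter; the function names are hypothetical.

```python
import math
from fractions import Fraction


def relative_intensity(distance_m, reference_m=1):
    """Intensity at `distance_m` as a fraction of the intensity at `reference_m`,
    following the inverse-square law: I is proportional to 1 / d^2."""
    return (Fraction(reference_m) / Fraction(distance_m)) ** 2


def intensity_change_db(distance_m, reference_m=1):
    """The same ratio expressed in decibels (10 * log10 of an intensity ratio)."""
    return 10.0 * math.log10(float(relative_intensity(distance_m, reference_m)))


print(relative_intensity(4))             # 1/16
print(relative_intensity(40))            # 1/1600
print(round(intensity_change_db(4), 2))  # -12.04
print(round(intensity_change_db(40), 2)) # -32.04
```

Put another way, every doubling of distance costs about 6 dB, which is why a sound can seem to drop away quickly at first and then much more slowly at long range.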

In VRChat, our spatializer roughly follows this curve. This is typical for most games and applications, but is very important for VR applications. There are some more nuances here (varying sound depending on direction of emission, of listener, delay of left/right channels, occlusion, reverb), but let’s stick to the surface level.

You may have seen earlier where we mentioned a hard limit for voice audio at a certain distance. VRChat has always had this, but you can now define this yourself. When using VRC_PlayerAudioOverride, the voice cut-off is equal to the Voice Far parameter.
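Here is a minimal Python sketch of how the cut-off and the distance low-pass filter relate to Voice Far. It relies on two facts from this post: voice is cut off at Far, and the low-pass filter kicks in at 1/2 of Far unless disabled. The function name, the returned dictionary, and the exact behavior at the boundaries are assumptions for illustration.

```python
def voice_state(distance_m, voice_far_m, low_pass_enabled=True):
    """Describe how a voice at `distance_m` is treated for a given Voice Far.

    - Beyond Voice Far the voice is not heard at all (hard cut-off).
    - From half of Voice Far outward, the low-pass filter applies,
      unless it has been disabled on the override.
    """
    audible = distance_m <= voice_far_m
    low_pass = audible and low_pass_enabled and distance_m >= voice_far_m / 2.0
    return {"audible": audible, "low_pass": low_pass}
```

With a Voice Far of 25 meters (an example value, not necessarily the default), a speaker at 5 meters is heard unfiltered, one at 20 meters is heard through the low-pass filter, and one at 30 meters is not heard at all.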

In short, you should be using VRC_PlayerAudioOverride to ensure that your falloff and voice settings make sense for the world you’re building. We’ve tried to tune our defaults (visible in our docs) for the best “general-case” use, but it is impossible to find a perfect setting for all worlds. Make sure you check your audio settings as part of quality-checking your worlds!

So, what’s next?

Like many things (all things?) in VRChat, we’re not done with audio. We have a few ideas we’re pushing toward, and these additions are just a stepping stone towards that objective.

We’re working on several ideas to give creators more freedom in controlling audio in their worlds. There are a lot of ways we can facilitate various activities in VRChat, and audio plays into that heavily.

Another very important piece of the audio puzzle is occlusion, or allowing objects to block (or partially muffle) audio. Occlusion is a feature we want to implement, but making it easy to use is important. We’re working towards it, so keep an eye (ear?) out!

There were many enthusiast and expert users of VRChat (and audio wizards!) who spoke up about the changes we were making in early Open Beta versions. As a direct result of that feedback, we circled around and re-approached the problem with a focus on ensuring creators were not having options closed to them with this update. Creators now have even more control over audio than before, and we’re very excited to see what our community thinks up with these new tools.

In other words, we’d like to offer thanks to our Community testers in both our closed testing as well as our Open Betas. We wanted to make sure our audio changes were a positive impact for all users, and your feedback made that possible. Although it meant that this update took significantly longer, we were able to iterate and work with our Community to figure out the best way to get the Audio Update out.

We plan on releasing some more posts regarding what we’re working on. We’re excited to get this new update out to users, hear feedback, and see what else we can do to help our creators make amazing experiences.

Thanks for reading, and we’ll talk again soon.