The Problem
In Rec Room we’ve had a problem for a while that if you’re near a group of players all talking, trying to talk to someone else can be a pretty annoying ordeal. All the voices intermingle and appear too close and loud to be able to tune them out of your own conversation, so it’s hard for anyone to focus on what anyone else is saying.
We wanted to make this more manageable, so we started off by researching a bit about how the sound of voices works in the real world. Turns out there’s a bunch of math involved, and a lot of fancy physics. We tried simulating these effects in some simplified ways, which gave us some promising results, but at a high performance cost. Beyond that, there were some strange side effects which felt weird in Rec Room by virtue of being more “realistic” (e.g. a voice would become louder if you pointed your ear at it). This, compounded by the fact that the system didn’t really provide satisfactory results when going through our engine’s relatively simple spatial blending, made us rethink our approach and attack the problem from another angle.
I started by taking some basic volume intensity measurements of people talking at different distances and facing in different directions within the distance ranges we were interested in, and found that the volume differences were pretty minor, especially at close distances. This made me suspect that most of what allows us to “tune out” other conversations around us is just our brain and inner ear doing some voodoo magic based on subtle cues of spatial sound. Our brain is just choosing to focus on one thing and tricking us into thinking certain voices are clearer than others. Most of the subtlety in those cues is lost, however, in the simplicity of the spatial audio model of the Unity engine, which is built more for speed and efficiency than for high fidelity. You can modify its behavior a bit by adding things to your scenes that change the way sound behaves in certain spaces, but they require careful tuning, precise positioning, and a bunch of experiential testing to make them sound right. That was not an option for us, given that most environments that players visit in Rec Room nowadays are not even built by us!
Directional Voice Chat
Our new attempt at solving this problem comprises three different effects that we layered together. Given our results from the previous prototype, this time I decided to try a more “game development” approach - that is, make something completely fake, but which feels like it’s doing the thing we want. This usually involves going super heavy-handed with stuff that is nuanced and subtle in real life.
The first thing I thought about was the direction in which people are facing when they talk. Intuition told me that if others faced away from you, you’d hear them less. Even though this is actually very subtle at close distances in reality, I figured that if someone is facing away from you, they probably aren’t really talking to you, so I aggressively attenuated the voices of players who face away from you.
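To make the idea concrete, here is a minimal sketch of a speaker-facing attenuation, not the actual Rec Room code. The function name `facing_gain`, the linear mapping, and the `min_gain` floor of 0.2 are all assumptions made for illustration; the only thing taken from the description above is that a voice gets quieter the more the speaker faces away from you.

```python
import math


def normalize(v):
    """Return v scaled to unit length, or a zero vector if v has no length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(c / length for c in v)


def facing_gain(speaker_pos, speaker_forward, listener_pos, min_gain=0.2):
    """Volume multiplier for a speaker's voice, as heard by the listener.

    Returns 1.0 when the speaker faces the listener directly and
    min_gain (a made-up tuning value) when they face directly away.
    """
    to_listener = normalize(
        tuple(l - s for s, l in zip(speaker_pos, listener_pos))
    )
    forward = normalize(speaker_forward)
    # Dot product of unit vectors: 1 = facing the listener, -1 = facing away.
    alignment = sum(f * d for f, d in zip(forward, to_listener))
    # Remap [-1, 1] to [0, 1], then to [min_gain, 1].
    t = (alignment + 1.0) / 2.0
    return min_gain + (1.0 - min_gain) * t
```

A steeper curve (for example, squaring `t`) would be one way to make the effect more heavy-handed; the linear ramp here is just the simplest choice.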
Next, I wanted to get a stronger effect of voices becoming more muffled and less intelligible as you move farther away from them. This happens naturally to some degree because higher frequencies need to be louder for us to be able to hear them. And I could tell you that I also wanted to simulate the effect of air, which attenuates higher frequencies faster than lower ones, but really I just wanted to get the more cinematic effect of people farther away seeming like muffled voices. You know how characters in movies can have a perfect conversation in a full restaurant while everyone around them is “talking”? Yeah, that’s not real, and neither was the low pass filter that we added to voices as they move away from you in Rooms :D. It starts artificially cutting off higher frequencies in players’ voices as distance grows to get this effect.
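As a rough illustration of a distance-driven low pass, here is a small sketch. Everything specific in it is an assumption: the names `cutoff_for_distance` and `one_pole_low_pass`, the 2 m to 20 m range, the 20 kHz to 1 kHz cutoff range, and the one-pole filter itself, which simply stands in for whatever filter the engine provides.

```python
import math


def cutoff_for_distance(distance, near=2.0, far=20.0,
                        max_cutoff_hz=20000.0, min_cutoff_hz=1000.0):
    """Map distance to a low-pass cutoff (all ranges are made-up tuning values).

    At or below `near` the voice is effectively unfiltered; at or beyond
    `far` it is muffled down to min_cutoff_hz.
    """
    t = (distance - near) / (far - near)
    t = max(0.0, min(1.0, t))
    # Interpolate in log-frequency so the muffling feels gradual to the ear.
    return max_cutoff_hz * (min_cutoff_hz / max_cutoff_hz) ** t


def one_pole_low_pass(samples, cutoff_hz, sample_rate_hz=48000.0):
    """Apply a simple one-pole low-pass filter to a list of samples."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)  # move a fraction of the way toward the input
        out.append(y)
    return out


# Usage: an 8 kHz tone heard from 1 m away versus 30 m away.
tone = [math.sin(2.0 * math.pi * 8000.0 * n / 48000.0) for n in range(480)]
near_voice = one_pole_low_pass(tone, cutoff_for_distance(1.0))
far_voice = one_pole_low_pass(tone, cutoff_for_distance(30.0))
```

In this sketch the far voice keeps its low frequencies but loses most of the 8 kHz content, which is the "muffled" quality described above.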
At this point the solution was already sounding pretty good. We could definitely hear conversations we cared about better than “background noise,” but it still felt like something was missing: the ability to “focus” or “pay attention to” a certain person more deliberately. That’s when Radiant Blur thought of having something similar to the directional attenuation we were doing on voice audio sources, but applied to the listener (i.e. yourself) as well.
So I implemented the same type of attenuation, but where all player voices decrease in volume as you face away from them. Kind of how your dog must hear when wearing the “cone of shame”!
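Here is a hedged sketch of how the listener-side attenuation might combine with the speaker-side one. The names `directional_gain` and `voice_gain`, the two `min_gain` floors, and the choice to multiply the two terms are assumptions for illustration only; the description above only says that the same kind of attenuation is applied from the listener's point of view.

```python
import math


def _unit(v):
    """Return v scaled to unit length, or a zero vector if v has no length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(c / length for c in v)


def directional_gain(origin, forward, target, min_gain):
    """Gain in [min_gain, 1] for how directly `forward` points at `target`."""
    to_target = _unit(tuple(t - o for o, t in zip(origin, target)))
    alignment = sum(f * d for f, d in zip(_unit(forward), to_target))
    return min_gain + (1.0 - min_gain) * (alignment + 1.0) / 2.0


def voice_gain(listener_pos, listener_forward, speaker_pos, speaker_forward,
               speaker_min_gain=0.2, listener_min_gain=0.4):
    """Combined volume multiplier for one speaker (floors are made-up values)."""
    # Quieter when the speaker faces away from the listener.
    speaker_term = directional_gain(
        speaker_pos, speaker_forward, listener_pos, speaker_min_gain)
    # Quieter when the listener faces away from the speaker ("cone of shame").
    listener_term = directional_gain(
        listener_pos, listener_forward, speaker_pos, listener_min_gain)
    return speaker_term * listener_term
```

Multiplying the terms means a voice is only at full volume when both players face each other, which matches the intent of treating facing direction as a signal of who is talking to whom.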
This is the “fakest” part of our Directional Voice Chat solution in the sense that it has no basis in any real physical effect of sound, but we found that it was surprisingly effective at capturing player intent regarding who they want to pay attention to while in conversation with multiple people in a room. So it remained as the third element in our voice chat solution.
Pros and Cons
We think Directional Voice Chat is a definite improvement toward the goals we set out to achieve. In our experience with it, it is much easier to hold conversations in crowded rooms and to pick out the voices that interest you from the ones you want to ignore. Also, since we simplified the approach so much, the resulting algorithm is very fast and shouldn’t impact performance, even with a lot of players in a Room on lower-powered devices.
The flip side to simplicity is, of course, missing out on some of the nicer things provided by higher fidelity approaches. It doesn’t currently take into account the environment at all or how sound bounces off of surfaces. This means you can get into situations where someone on the other side of a wall who is facing towards you sounds louder than someone on the same side of the wall as you who is facing away from you.
Since this feature is focused on the most common scenarios for player-to-player conversations, less common uses for voice chat might have undesirable side effects. Calling to someone across the room or playing music through your microphone, for example, might not be as easy as before. That’s why we decided to give players the option of disabling it through the Sub-room settings.
Even though this is a first approach and we might still need to iterate on it, we think this is definitely a step in the right direction for making voice chat more manageable and enjoyable in Rec Room. Happy chatting!