I’ve been working with audio my whole life. Three years ago I started working for Timber Soundscapes and got deeply involved in 3D sound for headphones, also referred to as binaural audio. 3D sound has gained a lot of interest lately because of its applications in VR and AR with head tracking: when the person moves their head, all sounds need to be repositioned for a convincing VR/AR experience.
But I also think there are great opportunities for remixing more traditional content. After all, more and more people use headphones to listen to music, watch videos, or play games. Normal stereo mixes always sound a bit ‘too close’ on headphones, and lack the natural space that speakers add.
Experimenting with 3D sound, coming from a background of mixing in stereo and surround, has been a great experience with a lot of lessons learned. I’d like to explain what I’ve learned, and what I think is really important to understand when you want to get into 3D audio.
It seems so easy when you see the websites of 3D audio plugin developers. With just one extra piece of software all of a sudden all sounds can fly around you in 3D. Everybody is promising 'total immersion', 'ease of use' and a 'set and forget' interface. And some of these plugins are offered for free, or included in the authoring software.
But when you speak to game developers or producers who have tried these plugins, you often hear disappointment. The 3D effect is hardly there, or very limited, and rarely what they expected.
That is because 3D audio isn't as easy as these plugin developers suggest. The theory is relatively simple, but it is the implementation that counts.
Not only the implementation of the plugin developer, but also of the game UI and game sound designer. There are three main reasons for this disappointment:
1 - to implement 3D audio properly, the designer needs to know about the psychology of sound and how humans locate sound, and (s)he needs to apply that knowledge during the design of the game.
2 - some plugins have a rather limited feature set, or take shortcuts with some calculations.
3 - thinking that the same plugin is equally good at positioning a jet fighter and a bee in a hive. It is like a film sound engineer using only one reverb program to simulate both a church and the interior of a car. Anybody who has worked in film knows that for realistic-sounding spaces you need different algorithms, and a large set of controls to fine-tune the simulation for each individual space.
And maybe we should mention number 4: how is 3D audio used?
You could differentiate between 'active 3D sound' and 'passive 3D sound'.
The first one uses an interactive playback system with head tracking, like the Oculus, where the user can turn their head and all sounds will be repositioned.
The second one is where the player uses normal headphones, and moving their head will not change the position of the audio.
We thought it would be good to summarise the most important factors that determine how faithful a 3D audio simulation will be.
Note: to keep this article readable some concepts are slightly simplified.
We know from which direction a sound is coming because our brain compares the signals arriving at the left and right ear. We check for differences in loudness/volume, differences in timing/phase, and differences in frequency content.
Normal stereo panning (or surround panning) only uses volume to position sounds. But volume doesn't tell us whether a sound is in the front or back; differences in volume between left and right will only move a sound in the left-right plane. That's why surround needs (at least) 5 speakers. Positioning based on volume works for all audible frequencies except the very low frequencies.
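To make the "volume only" point concrete, here is the standard constant-power pan law used in most stereo mixers, sketched in Python. The function names and the choice of a cosine/sine law are my own illustration, not taken from any particular product:

```python
import math

def constant_power_pan(sample, pan):
    """Constant-power pan law: pan in [-1, 1], -1 = hard left, +1 = hard right.
    Left/right gains follow cos/sin of the pan angle, so the summed power
    stays constant while only the left-right balance changes."""
    theta = (pan + 1.0) * math.pi / 4.0   # map [-1, 1] -> [0, pi/2]
    return sample * math.cos(theta), sample * math.sin(theta)

# Centred: both channels at ~0.707 (-3 dB), hard left: (1.0, 0.0)
left, right = constant_power_pan(1.0, 0.0)
```

Note that nothing in this law distinguishes front from back or up from down; it is purely a left-right balance, which is exactly the limitation the article describes.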
By combining volume differences with phase differences our ears have some more data to work with. If a sound is closer to either one of our ears (more to the left or to the right) the sound will reach that ear sooner than the other ear. This causes a difference in phase/time.
But localisation based on phase doesn't work for all frequencies. For humans the range is up to roughly 1500 Hz.
And just like volume differences, phase does not help in determining whether a sound is in the front, in the back, or elevated.
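To give a feel for the numbers involved in this timing cue, the classic Woodworth approximation for a spherical head estimates the interaural time difference (ITD) from the source azimuth. The head radius and speed of sound below are typical textbook values, assumed here for illustration:

```python
import math

def woodworth_itd(azimuth_deg, head_radius=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference for a spherical head
    (Woodworth model). Azimuth 0 = straight ahead, 90 = fully to one side.
    head_radius ~8.75 cm and c = 343 m/s are typical assumed values."""
    theta = math.radians(azimuth_deg)
    # Path difference: arc around the head plus the straight-line component
    return (head_radius / speed_of_sound) * (theta + math.sin(theta))

# A source fully to one side arrives roughly 0.65 ms earlier at the near ear
itd = woodworth_itd(90)
```

Sub-millisecond differences like this are what the brain evaluates below ~1500 Hz; above that frequency the wavelength becomes shorter than the head is wide and the phase comparison becomes ambiguous.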
The third cue is differences in frequency content. Depending on the angle at which a sound hits our ears, head, and body, different frequencies will be attenuated or emphasised.
These spectral cues are the most powerful for 3D localisation, both for front/back discrimination and for elevation.
But now something else kicks in...
To localise sounds more accurately we perform multiple measurements: we change the position of our head slightly, listen again, and compare the two measurements to determine a more exact position. We repeat that process until we're reasonably sure where the sound is coming from. If we do not manage, we turn our head and use our eyes.
In software, this part is only possible when using head tracking. And that's why head tracking makes a huge difference in 3D sound localisation.
But even without head tracking you can achieve much more convincing 3D positional audio for headphones than is possible with normal stereo, simply because you use many more positional cues than volume alone. And the unnatural ‘closeness’ that is associated with headphones is gone.
Also, research has shown that the more people use 3D audio, the more experienced they get at localising sounds, even with passive 3D.
In a way, by actively listening we train ourselves, especially when our eyes can confirm what we hear.
These things are good to know, and good game design takes these issues into consideration.
Especially if you work without head tracking it is important to keep the above in mind: if you introduce a new sound behind the listener, chances are they will not recognise it as coming from behind. The sound is new to them, so they cannot recognise the frequency differences caused by its position in the rear plane. It is best to introduce new sounds in the front or at the front sides; once the listener ‘knows’ how a sound sounds in front, they will recognise it when it is in the back.
Another example is a sound playing stationary in the rear; it is difficult to recognise as being in the back. Moving the sound towards the side and then back will help the listener locate it.
Most 3D audio software uses HRTFs (head related transfer functions) to mimic the volume, frequency and phase differences described above.
It is obvious that the quality and resolution of the HRTF set has a great impact on the quality of the 3D experience.
But here is the next caveat: because heads differ in shape and size, ideally each person should have their own HRTF set.
An HRTF set that works great for one person might not work convincingly for someone else. And obviously, the design and quality of the headphones also make a difference.
HRTFs have one other nasty problem: they introduce phase differences.
We just saw that this is needed to simulate the position of the sound, but it also means that all sounds that were carefully designed by the sound designer all of a sudden sound different. And often this means less clear, less defined, less low end. There are ways to deal with that, but the implementation is not done in the same way by all manufacturers.
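At its core, HRTF-based rendering means convolving a mono source with a pair of measured impulse responses (HRIRs), one per ear. The sketch below shows the principle with tiny made-up HRIRs; a real renderer would load a measured set with hundreds of taps per direction, and the numbers here are purely illustrative:

```python
def binaural_render(mono, hrir_left, hrir_right):
    """Render a mono signal to binaural stereo by convolving it with a
    pair of head-related impulse responses (HRIRs). Plain O(n*m)
    convolution for clarity; real engines use FFT-based convolution."""
    def convolve(x, h):
        y = [0.0] * (len(x) + len(h) - 1)
        for i, xi in enumerate(x):
            for j, hj in enumerate(h):
                y[i + j] += xi * hj
        return y
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIRs for a source on the listener's left: the right ear gets the
# click two samples later (phase cue) and quieter (volume cue), with a
# different spectral shape (frequency cue).
left, right = binaural_render([1.0, 0.0, 0.0],
                              hrir_left=[0.9, 0.2],
                              hrir_right=[0.0, 0.0, 0.5, 0.1])
```

Because the HRIRs are filters, they inevitably change the phase and spectrum of whatever passes through them, which is exactly why carefully designed sounds can come out sounding less defined.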
Usually an HRTF set describes the differences at a fixed distance, and to simulate changes in distance the plugin developer needs to add volume calculations.
But, with change of distance the frequency behaviour also changes, because not all frequencies attenuate the same when the distance changes.
And this also depends on the environment.
For proper distance simulation, additional filters need to be designed, and these need to be adjustable by the game (sound) designer.
HRTFs by themselves only describe the direction of a sound; it is by adding volume calculations and extra filters that you can simulate distance.
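A minimal distance model along these lines combines the inverse-distance law (-6 dB per doubling of distance) with a distance-dependent low-pass filter for air absorption. The gain formula is standard physics; the cutoff function below is a deliberately crude illustration with made-up constants, standing in for the tunable filters the article argues for:

```python
import math

def distance_gain_db(distance, ref_distance=1.0):
    """Inverse-distance law: -6 dB of attenuation per doubling of distance,
    clamped so sources closer than ref_distance are not boosted."""
    return -20.0 * math.log10(max(distance, ref_distance) / ref_distance)

def air_absorption_cutoff(distance, base_cutoff=20000.0, loss_per_m=50.0):
    """Rough sketch: lower a low-pass cutoff as distance grows, because
    high frequencies are absorbed by air more than low ones. loss_per_m
    is an illustrative constant, not a measured value."""
    return max(base_cutoff - loss_per_m * distance, 1000.0)

# Doubling the distance from 1 m to 2 m costs about 6 dB
gain_at_2m = distance_gain_db(2.0)
```

In practice both curves should be exposed as designer-adjustable parameters, because (as discussed below) the physically correct -6 dB per doubling often does not sound right in a game mix.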
What about the room?
Early reflections and reverb
Most manufacturers add some form of reverb and/or delays/early reflections. But this is trickier than it seems:
When a sound is made, it is reflected by the surrounding walls, the ceiling, the floor, and objects in the room. Even outdoors, depending on the volume of the sound, reflections take place.
The first reflections are called early reflections, and the later ones are called the reverb tail. Early reflections can help us locate the position of a sound, but can also obscure the location. This depends on the level and the amount of reflections, and how accurately the reflections are calculated.
Many manufacturers take shortcuts here. They calculate the reflections based on the assumption that the listener is always in the centre of the room.
Or they just add some early reflections without really taking the size of the room in account. Some also add reverb.
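A way to calculate reflections that does account for the actual source and room geometry is the image-source method: each wall produces a mirror image of the source, and each image's distance to the listener gives that reflection's delay and attenuation. Here is a minimal first-order sketch for a rectangular "shoebox" room with one corner at the origin; the coordinates are hypothetical:

```python
def first_order_image_sources(src, room):
    """Image-source method, first order only: mirror the source in each
    of the six walls of a shoebox room. src = (x, y, z) position of the
    source, room = (width, depth, height) in metres."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):          # the two walls on this axis
            img = list(src)
            img[axis] = 2.0 * wall - src[axis]  # mirror across the wall plane
            images.append(tuple(img))
    return images

# A source at (1, 2, 1) in a 4 x 5 x 3 m room has six first-order images
images = first_order_image_sources((1.0, 2.0, 1.0), (4.0, 5.0, 3.0))
```

Note that the listener's position enters only afterwards, when computing each image's delay. A renderer that assumes the listener is always at the room's centre is skipping exactly that step, which is the shortcut criticised above.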
Although reverb is a nice effect that can make things sound bigger or further away, reverb doesn’t really help in locating a sound. On the contrary, it often makes it more difficult to locate a sound. We have some suggestions for the sound designer if (s)he needs to use reverb in the sound design.
Most manufacturers suggest that with their 3D plugin all audio is positioned accurately, just by inserting the plugin. That would of course be great, but reality is different.
The results of a 3D audio production will be much better if the behaviour of the plugin is tailored to the application. Some plugins offer parameters that can be tweaked, while others have a very limited set of controls.
Any sound designer knows that often what is theoretically right, doesn’t sound right. And in the end that is what counts.
In real life it may be true that a doubling of distance means an attenuation of 6 dB, but that might just not sound right within your application. Then you need to be able to bend the rules of physics, and the 3D plugin needs to allow you to do that. In the end, a convincing result is the only thing that counts.
The same applies to reflections, frequency behaviour, etc. Can you choose the right settings? What are the right settings?
If you have a look at the film world you’ll see that film sound designers spend a great deal of time designing proper acoustic environments for their scenes. 3D game audio design shouldn’t be different. But you need the tools to be able to do this. That means a lot of tweaking should be possible. And you need the knowledge of what to tweak.
If you consider the above, it might be clear why we at Timber 3D do not offer a simple plugin, but prefer to work with the design team for the best 3D audio experience.
By working with Timber 3D you choose to work directly with the developers of the plugin, which means that we can offer a level of support, integration and customisation that is simply not possible with a standard plugin.
Our plugin is a combination of HRTFs and a sophisticated early reflections engine, combined with a large set of controls.
We calculate the real reflections based on the position and angle of the listener, the position and angle of the source and the size of the room.
We support head tracking but can also work without it. We’ve designed custom filters that keep the audio crisp and clear, despite running through HRTFs.
For us, that is just the starting point. If needed we can customise the plugin per project so that the 3D experience is maximised. We can help with the sound design to make sure that the sounds work well with the 3D engine. We can help with the 3D audio design in general, and we can help in maximising the use of our plugin.
And we can customise our plugin in many directions: different laws of physics, different reflections, etc. All depending on the kind of project.
We can also help in creating your own HRTF set for a signature sound, or multiple HRTF sets to give the user the option to choose between them.
We can also help in converting existing content into 3D audio, be it normal stereo or surround content, both for film and for game productions.
Want to know more?
Feel free to contact me at daniel -at- timber3d.com