3D (binaural) audio is becoming more and more popular, not only in games and augmented reality, but also in more traditional media like radio and film.
Why? Because more and more people are listening to and watching media over headphones on their mobile devices.
And everybody agrees that normal mixes optimised for speaker playback do not translate very well to headphone listening.
Everything seems to be too close, and stereo imaging seems to happen in between the ears, instead of around the listener.
3D audio gives the listener a much greater sense of depth, width and distance. Most people agree that once you’ve heard the same mix in 3D over headphones, you don’t want to go back to plain stereo on headphones.
An interesting side effect of 3D mixing is that although the mix is optimised for headphone use, it translates very well to speakers. Especially on smaller multimedia speaker systems and laptop speakers a 3D mix gives more width and depth than a conventional stereo mix.
This presents a challenge for today’s sound designers and mixing engineers. They need to master a new medium, with its own rules and its own dos and don’ts. Below we’ll try to explain some of the main differences between recording or producing for stereo or surround, and producing for 3D.
note: this blog was written with both film and game designers in mind. However, some remarks below apply more to game sound design, while others are focussed more on film mixing in 3D. Because of the interactive nature of game design, some aspects (distance, for instance) are dealt with differently in games than in film.
In the beginning of recorded sound there was mono... An orchestral recording was reduced from its original width and depth to one point in the room: the speaker of the old gramophone player. Strangely enough people would still recognise the orchestra, even though all spatial information was lost.
To create the illusion of width (and depth) stereo was invented; the playback system had two speakers instead of only one.
Two new techniques became possible:
• stereo recording
Instead of one, two mikes could be used to record the orchestra. One mike would record the 'left ear', and the other mike the 'right ear'.
Several stereo recording techniques were developed, some commonly used ones are AB, XY and MS.
Depending on how loud each mike would pick up the different instruments that make up the orchestra, together with tonal differences between left and right and the reflections of the room in each mike, a stereo image was created. It seemed that the instruments didn't only come from the left and right speaker, but magically also appeared in the space between the two speakers.
• stereo mixing
Even if a sound was recorded in mono, we could play it back over both speakers, and by varying the relative volume between the left and right speaker the listener was tricked into hearing the sound somewhere in between the two speakers.
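To illustrate, this volume-based positioning can be sketched as a constant-power pan law (a common choice; the exact curve varies per mixer, so the function name and law here are illustrative):

```python
import math

def constant_power_pan(sample, pan):
    """Position a mono sample between two speakers.

    pan: -1.0 = hard left, 0.0 = centre, +1.0 = hard right.
    A constant-power (sine/cosine) law keeps the perceived
    loudness roughly even across the stereo field.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0 .. pi/2
    left = sample * math.cos(angle)
    right = sample * math.sin(angle)
    return left, right
```

At centre pan both channels carry about 0.707 of the signal, so the summed acoustic power stays constant as the sound moves across the image.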
Then engineers discovered that more depth and space could be achieved if the sound engineer would not only use relative volumes to position a mono sound in the stereo image, but would also use a slightly different version of the same sound for L and R. (variations in phase, frequency content, reverb)
When recording with multiple mikes this was automatically the case, but sound designers and studio (post-production) engineers in both music and film would design sounds that had different content in the L and R channel, using time-based or other effects (reverb, the Haas effect, chorus, EQs, etc.)
This way of working stayed more or less the same, even when we went to surround. Positioning sounds was based on volume and other differences in sound character between the different speakers, differences either created in post or sound design, or by using multiple mikes. It became standard to create or record most sounds in multichannel format, either in stereo or even in 5.1 or 7.1.
An explosion or aeroplane sound effect from an effects library will typically be a stereo file, where the left and right channel differ from each other. Most instrument samples are also produced in stereo or surround.
The above shows that, coming from traditional sound design for stereo or surround, sound designers are used to working with or creating multi-channel sounds. When you start working in 3D sound design you need to unlearn this habit, because 3D audio works in a different way.
Back to mono
In 3D audio, each 3D processor places one sound source in a 3D space. Therefore a 3D process should always have a mono input.
This means that when you record a dog barking you no longer record it in stereo, but in mono. And when you design an explosion it will be in mono too. The 3D process will turn this into a stereo sound. The 3D processing calculates the volume, phase and tonal differences between L and R (and front and back) based on the position of the sound relative to the listener’s position, and our Timber engine also calculates the true reflections based on the positions of both the listener and the sound in the environment.
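As a rough illustration of one of the cues such a process computes, here is Woodworth's classic approximation of the interaural time difference, the tiny gap in arrival time between the two ears. Real binaural engines use full HRTF processing, so treat this as a sketch of a single cue, not how any particular engine works:

```python
import math

HEAD_RADIUS = 0.0875    # average human head radius, metres (assumed)
SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def interaural_time_difference(azimuth_deg):
    """Woodworth's approximation of the arrival-time difference
    (in seconds) between the ears for a distant source at the
    given azimuth (0 = straight ahead, 90 = hard to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))
```

A source straight ahead gives zero difference; a source at 90 degrees gives roughly 0.65 ms, which is about what our hearing uses to localise sounds sideways.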
The width of a sound
The main thing for the 3D sound designer is to consider the width of the sound (s)he needs to produce.
A dog's bark originates from one point in space: its mouth. There is nothing wrong with recording this bark in mono. The bark only 'becomes' stereo by interacting with the acoustic environment, and that is what the 3D engine will take care of.
If the object is large, and it emits sounds from several points (like the above mentioned orchestra), then the approach is different.
Let's take an aeroplane that lands and then turns around. The engines are in the back, far away from the front wheel, and when the aeroplane turns around these sounds will rotate around the centre of the aeroplane. The first instinct of a sound designer/mixer will be to create or record this in stereo.
But for 3D this needs to be approached differently: The rear engine and the front wheel should both get a separate 3D engine, so that they can both have their own position and can rotate relative to each other.
Depending on the importance of the aeroplane and audio cues in the scene you might even need more 3D engines, maybe you'll use one for the centre and two for the other wheels as well. Obviously you'd only do this if the aeroplane plays an important part in the story, because realtime 3D processing is more 'expensive' than playing back samples in stereo or mono.
This is another change in the approach for the sound design team. Sounds should be categorised by importance, and then, also depending on the playback platform, it is decided how much (if any) 3D processing is assigned to each sound.
These decisions will also depend on the platform; a dedicated game console or PC can easily instantiate between fifty and a hundred and fifty 3D engines, while a phone might be limited to five or ten.
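A minimal sketch of such an importance-based budget (the names and data structure here are hypothetical, not taken from any particular engine):

```python
def assign_3d_engines(sounds, engine_budget):
    """Greedy allocation: the most important sounds get one of the
    limited 3D engines; the rest fall back to plain stereo/mono
    playback. `sounds` is a list of (name, importance) pairs."""
    ranked = sorted(sounds, key=lambda s: s[1], reverse=True)
    gets_3d = {name for name, _ in ranked[:engine_budget]}
    return {name: "3d" if name in gets_3d else "stereo"
            for name, _ in sounds}

# on a phone we might only afford a couple of engines for this scene
plan = assign_3d_engines(
    [("plane engine", 9), ("front wheel", 7), ("crowd", 3)], 2)
```

In practice the budget would be chosen per platform, and importance might change at runtime as the story unfolds.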
For ambience you might use a combination of stereo and 3D tracks. And especially for ambience there are some tricks to get a great 3D ambience while still only using a couple of 3D engines. But that’s another blog to be written.
Another thing sound designers are used to is adding 'distance' to a sound. Library sound effects in particular often have volume changes already 'built in'. A typical aeroplane sample will fade the sound in from one side, pan it through the centre, and then fade it out on the other side. It might even have some reverb added already to simulate distance.
When you animate a sound in 3D, distance is already catered for in the 3D process; based on the changes in distance, the 3D process will automatically adjust the volume and frequency behaviour. If you start working with sound effects that already contain volume differences to simulate movement, these will work against you, because the 3D processing will add its own attenuation curves based on the distance of a sound.
You basically need to 'undo' the attenuation in the sound effect, so that the 3D process can recalculate it based on the movement and distance parameters.
And even then, for distance you'll notice that the 'real life' physical attenuation is not always what works within a film or game environment.
That's why it is important that you have control over the attenuation parameters of the 3D plugin.
Our Timber 3D process defaults to the 'real life' setting (-6dB/doubling of distance) but you can adjust this in both directions to make sounds fade faster over distance, or slower.
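In formula form, the 'real life' setting corresponds to the inverse-distance law, with a rolloff factor to bend it in either direction (a sketch of the general principle; Timber's actual parameter names and curve shapes may differ):

```python
import math

def distance_gain_db(distance, ref_distance=1.0, rolloff=1.0):
    """Gain in dB for a source at `distance` metres.

    rolloff=1.0 gives the 'real life' inverse-distance law
    (-6 dB per doubling of distance); rolloff above 1 makes
    sounds fade faster over distance, below 1 slower.
    """
    distance = max(distance, ref_distance)  # no boost inside the reference radius
    return -20.0 * rolloff * math.log10(distance / ref_distance)
```

With the default rolloff, doubling the distance from 1 m to 2 m costs about 6 dB; with rolloff 2.0 the same move costs about 12 dB, making the world feel smaller and the falloff more dramatic.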
We discovered that some 3D plugins offer a stereo input. It is often unclear how they process this input. Some might be summing the stereo input to mono which is then processed by one 3D engine. This can of course really harm the character of the sound effect if there are a lot of phase differences between the left and right channel.
Other plugins might handle this more correctly and allocate a 3D engine to each input channel. If they process it this way, then you're actually using two 3D engines (which doubles the processing power needed, which might be undesirable), and this only makes sense when you're able to specify the relative distance between the left and right input, the pivot point, etc.
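The damage naive mono summing can do is easy to demonstrate: two channels carrying the same signal in opposite phase cancel completely when summed. This toy example uses a pure sine; real stereo effects cancel only partially, but the principle is the same:

```python
import math

def mono_sum_peak(left, right):
    """Sum two channels to mono and return the peak sample level."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    return max(abs(s) for s in mono)

# 10 ms of a 1 kHz sine at a 48 kHz sample rate
left = [math.sin(2 * math.pi * 1000 * i / 48000) for i in range(480)]
right_in_phase = left[:]
right_inverted = [-s for s in left]

print(mono_sum_peak(left, right_in_phase))  # peak 1.0: nothing lost
print(mono_sum_peak(left, right_inverted))  # peak 0.0: the sound vanishes
```

A stereo effect with heavy L/R phase differences sits somewhere between these two extremes, which is exactly why its character suffers when a plugin silently sums it before spatialising.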
This is why we only offer mono input and leave it up to the sound designer how (s)he wants to deal with large objects like the aeroplane in the example.
Questions or remarks?
Feel free to contact me at daniel -at- timber3d.com