The immersive audio you’ve never heard of that could revolutionize virtual reality

Image credit: Dirac Research

Surround sound is a thing of the past. Ten years ago it might have been considered cutting edge, but with most music and video now consumed on cell phones, the fight is on for audio that moves.

Built around a 360-degree sphere, so-called immersive or spatial audio technology is being developed by Dirac Research, DTS, Dolby and THX, primarily for virtual reality (VR) headsets. But who can ignore the 2.5 billion smartphones in the world? The race is on to produce the definitive format for 3D audio.

What is immersive audio?

Designed primarily for virtual reality, but also for mobile devices, immersive audio has three parts.

The first is the channels. Home theaters use a 5.1 system to handle front center, front left, front right, rear left, rear right, and a subwoofer, and immersive audio is initially based on that same framework. The only difference is that it can now mimic an 11.1 or higher array.

Fraunhofer Sound Lab for Immersive Channels Reproduction | Credit: Fraunhofer IIS

The second part of immersive audio is ambisonics.

“Ambisonics are scene-based audio elements that describe not individual sources (as channel-based or object-based formats do), but rather the sound field as a whole from a point in space,” explains Julien Robilliard, product manager at Fraunhofer IIS, which invented the mp3 and AAC codecs.

Immersive sound can be produced using a Head-Related Transfer Function (HRTF), where binaural stereo microphones are placed in a mannequin’s ears and external sounds are recorded to create a profile of the head (in the future we could each get one personalized to the shape of our own head and face).

However, binaural sound is just smart stereo and is more suitable for headphones. For true 360-degree “ambisonic” audio recordings tailored to speakers, the microphones capture audio from four different positions.
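The four microphone signals are conventionally converted into four "B-format" channels: one omnidirectional (W) and three directional (X, Y, Z). As a rough illustration of what those channels hold, here is a minimal Python sketch that places a single mono sample into first-order B-format. It assumes the traditional convention in which W is scaled by 1/√2 and azimuth is measured counterclockwise from straight ahead; the function name is made up for this example and does not come from any of the products discussed here.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: azimuth 0 is straight ahead and positive angles
    turn to the left; elevation 0 is ear level and positive is up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back component
    y = sample * math.sin(az) * math.cos(el)  # left-right component
    z = sample * math.sin(el)                 # up-down component
    return w, x, y, z

# A sound directly to the listener's left, at ear level
w, x, y, z = encode_first_order(1.0, azimuth_deg=90.0, elevation_deg=0.0)
```

Because the whole scene is described by the same four channels no matter how many sources it contains, the sound field can be rotated or decoded to different speaker layouts after the recording has been made.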

The third part of immersive audio consists of audio objects.

An audio object is a mono track accompanied by metadata that specifies the exact position of that sound. “With virtual reality, you want to have the sounds that immerse you in the scene and that can be reproduced from any direction,” says Robilliard.
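To make the idea of "a mono track plus metadata" concrete, here is a small Python sketch of what an audio object might look like as a data structure. The field names, units and axis convention are assumptions made for illustration only; real formats such as MPEG-H or Dolby Atmos define their own metadata.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AudioObject:
    """Hypothetical audio object: mono samples plus position metadata."""
    samples: List[float]   # the mono track
    azimuth_deg: float     # 0 = straight ahead, positive = to the left
    elevation_deg: float   # 0 = ear level, positive = above
    distance_m: float      # distance from the listener
    gain: float = 1.0      # per-object level, adjustable by the renderer

    def position_xyz(self) -> Tuple[float, float, float]:
        """Convert the spherical metadata to x (front), y (left), z (up)."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        x = self.distance_m * math.cos(el) * math.cos(az)
        y = self.distance_m * math.cos(el) * math.sin(az)
        z = self.distance_m * math.sin(el)
        return x, y, z

# A sound two meters to the listener's left, at ear level
bee = AudioObject(samples=[0.0, 0.1, 0.0], azimuth_deg=90.0,
                  elevation_deg=0.0, distance_m=2.0)
x, y, z = bee.position_xyz()
```

Since the position travels with the sound rather than being baked into a fixed speaker channel, the playback device decides at the last moment how to reproduce it, whether over headphones or a speaker array.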

Why is immersive audio important?

“Sound in any immersive content experience plays an equally important – and often overlooked – role as visuals in transporting the viewer into the action,” said Canaan Rubin, director of production and content for the VR and AR production company Stroll.

The company uses ambisonic microphones installed on set to authentically capture sound in the round. “When playing our 360 content, audio technologies such as Dolby Atmos for VR, DTS Headphone:X, and the recently unveiled new version of Dirac VR all offer proprietary audio formats enhanced by HRTFs (Head-Related Transfer Functions) to deliver a true 3D sound experience,” explains Rubin.

Why is HRTF so important?

“Without it, headphone-based audio cannot accurately reproduce sound sources coming from the top, bottom, front or back of the subject, leaving your experience limited to the left-right plane,” Rubin explains. “This can happen due to the proximity of the headphone speakers to your eardrum, which negates the physical and psychological effects of hearing sound in a room.”

HRTF is essential for producing immersive sound | Credit: Dirac Research
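In practice, an HRTF is usually applied as a pair of short filters, one per ear, measured for a given direction. The Python sketch below shows the basic operation under that assumption: the mono signal is convolved with a left-ear and a right-ear impulse response. The impulse responses here are invented placeholders rather than measured data, and the function names are hypothetical.

```python
def convolve(signal, kernel):
    """Plain direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal into a (left, right) headphone pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Placeholder impulse responses for a source on the listener's left:
# the right ear hears the sound two samples later and quieter.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.6]

left, right = render_binaural([1.0, 0.5], hrir_left, hrir_right)
```

Real impulse responses are much longer and also shape the frequency content, which is what lets the brain tell front from back and up from down rather than just left from right.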

However, there are different rendering and processing technologies that are very important for transferring immersive audio to devices – and each has its own strengths.

Dirac VR explained

Although most of us are familiar with Dolby, DTS, and THX, Swedish sound company Dirac Research is relatively small but growing rapidly.

Fresh from putting its tech inside Xiaomi’s Mi AI smart speaker in early 2018, Dirac used the recent MWC to give TechRadar a demo of the second generation of its Dirac VR headphone technology.

It renders sounds coming from all directions in a sphere, but its main characteristic is that the rendering responds when you move your head. This is crucial because if you are wearing a VR headset you need each sound to stay fixed in one place in the virtual world, which means everything in the mix has to change position relative to your ears in real time.

This is dynamic positioning, which creates a 360-degree audio sphere where sound moves freely in all directions. It is incredibly impressive.

It can be used, for example, to create a soundstage where the band you are listening to appears to be in front of you. When you turn your head to the right, the sound in your left ear becomes louder. If you tilt your head up, the sound moves down through the mix. It can also be used to mimic the experience of being in a movie theater.

The second generation Dirac VR offers dynamic positioning | Credit: Dirac Research
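The head-turning behavior described above comes down to simple geometry: the renderer subtracts the head rotation from each source's position so the source stays fixed in the world. The Python sketch below illustrates this for left-right rotation only. It is not Dirac's algorithm; the function names are hypothetical, and the constant-power pan is a crude stand-in for proper HRTF filtering.

```python
import math

def relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of a world-fixed source as seen from the rotated head.

    Angles are in degrees and positive to the listener's left; the
    result is wrapped into the range [-180, 180).
    """
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def ear_gains(rel_az_deg):
    """Constant-power (left, right) gains for a head-relative azimuth."""
    s = math.sin(math.radians(rel_az_deg))
    left = math.sqrt(max(0.0, (1.0 + s) / 2.0))
    right = math.sqrt(max(0.0, (1.0 - s) / 2.0))
    return left, right

# The band is straight ahead; the listener turns 90 degrees to the right
# (a yaw of -90 in this convention), so the band is now on their left.
rel = relative_azimuth(source_az_deg=0.0, head_yaw_deg=-90.0)
left, right = ear_gains(rel)
```

Running this calculation for every source on every head-tracker update is what keeps the virtual band anchored in front of the listener while the head moves.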

“By fixing the sound sources in the horizontal plane, virtual environments such as movie theaters can be recreated with extreme precision, because the end user and the audio sources remain in static locations,” says Lars Isaksson, managing director and commercial director of AR/VR at Dirac Research.

Isaksson continues: “Our second-generation Dirac VR, however, places each user at the center of an ‘audio sphere’, allowing users to feel, for example, the sound of the wind swirling around their head or a plane arriving and departing on a tarmac.”

Most importantly, however, Dirac VR has a small CPU and memory footprint, so it performs well in small devices like phones.

“Although Dirac’s technology is less well known, it promises very efficient CPU performance given the HRTF processing and reverb engine it contains,” says Rubin.

Sound for gamers

Launched at MWC 2018, DTS Headphone:X 2.0 virtualizes stereo sound and transforms it into surround sound.

It is designed for gamers. The new version includes proximity cues and support for channel-based, scene-based and object-based audio.

DTS also offers DTS:X Ultra, which adds support for ambisonics and audio objects and, crucially, can be listened to over speakers as well as through headphones; it is intended for VR and AR games.

“What’s unique about DTS Headphone:X 2.0 is the way we wrote the algorithms, customized the HRTF, and used our extensive library of tuning curves from over 400 pairs of headphones,” said Rachel Cruz, director of product marketing for mobile and VR/AR at Xperi, which owns the DTS brand. “They give a competitive advantage because sometimes it’s the audio signal that tells your eyes where to look, and often you get it before a visual signal.”

It’s also a very personalized soundstage. “DTS:X allows you to manually amplify the sound of individual objects if you have trouble hearing a given object, such as dialogue, compared to the rest of the soundstage,” explains Rubin.

Dolby Atmos for VR, MPEG-H and Cingo

Although it gets a lot of press, Dolby Atmos is technically difficult to pin down because Dolby does not make the technologies it contains public.

Although it is more geared towards traditional surround sound and home cinema, Dolby Atmos for VR also deals with spatial sound. “Atmos offers auralization and spatialization of up to 128 objects simultaneously,” explains Rubin.

Plantronics Makes Headphones Compatible with Dolby Atmos | Credit: Plantronics

Germany’s Fraunhofer IIS, known for mp3, now has a container to manage immersive audio: MPEG-H Audio. Although the “H” doesn’t stand for anything in particular, think of it as height.

“This codec supports the broadcasting of channels, audio objects and ambisonics to televisions, soundbars, as well as mobile and VR devices,” explains Julien Robilliard, product manager at Fraunhofer IIS.

MPEG-H has been used in South Korea as part of 4K terrestrial broadcasts since May 2017, and Samsung TVs for sale there can decode it. THX and Qualcomm have just demonstrated their THX Spatial Audio Platform using MPEG-H.

Cingo Post-Processing Technology Delivers Authentic, Lifelike 3D Soundstage Reproduction Through Headphones | Credit: Fraunhofer IIS

So what happens when an MPEG-H bit stream arrives in a headset? “This is where Cingo comes in,” says Robilliard. “It’s a binaural rendering engine that tricks the brain into thinking that sounds are coming from outside the headphones.”

However, while Cingo supports the rendering of fully immersive 3D audio content with formats that add a dimension of height, it is MPEG-H that has the greatest future. “MPEG-H is our core business, and it’s the codec that allows all of these technologies – Dirac, Atmos, Cingo and DTS – to exist,” explains Robilliard.

MPEG-H is currently the only codec specified by the VR Industry Forum guidelines, but it’s not just for VR. It can carry anything from mono, stereo, binaural, 5.1 and 11.1 signals up to dynamic immersive audio to any compatible device.

While they probably won’t become mainstream until VR headsets start selling in greater numbers, immersive audio formats are only half the story, with MPEG-H destined to play a pivotal role. Says Robilliard: “If you don’t get the signals in your house, there’s no point in doing magic.”

This article has been updated, after some clarification from Fraunhofer IIS.
