Arc Ultra Speech Enhancement: Transforming Listening Experiences with State of the Art Speech Enhancement
Principal Audio Researcher, Advanced Technology
In the last Sonos tech blog post, we showed how collaborating with the Royal National Institute for Deaf People (RNID) allowed us to really understand what people with hearing loss need from a speech enhancement feature. We saw that the high levels of enhancement made possible by Arc Ultra’s new Speech Enhancement were able to strike the important balance between preserving the immersive quality of content and delivering crystal clear boosted dialogue. In this post, we’ll dive deeper into how people feel about the feature, looking at the results and feedback from our beta participants.
A new generation of speech enhancement
After conducting the research with the RNID, the Sound Experience team took what we’d learned and incorporated it into the tuning of the new speech enhancement feature. The Sound Experience team are responsible for ensuring that all Sonos speakers deliver the best possible experience, so it was their job to ensure that the feature not only improved dialogue clarity, but also maintained the quality of the non-dialogue audio.
To do this, they use a broad variety of film and TV content while tuning, illustrated in the figure below:
As we see here, on the left we have content for which no speech enhancement is required - simply because there’s no dialogue, so there’s no speech to enhance! On the right, we have the opposite: cases in which there is only dialogue. In these cases, there’s also no need for enhancement, as there’s no other content masking the dialogue. But, between these two cases, we find scenarios where it can be helpful to ‘lift’ the dialogue. While traditional speech enhancement approaches often provide a static ‘lift’ (as we touched on in the first blog post), the Sound Experience team used the extracted speech to develop a dynamic approach for speech enhancement: preserving the characteristics of content when speech isn’t present, and enhancing speech when it is.
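To make the idea of a dynamic lift concrete, here is a minimal Python sketch. It is not the tuning used on Arc Ultra: the function names, the 10 dB target speech-to-background ratio, and the 6 dB cap are all assumptions chosen for illustration. It also assumes that an upstream step has already split each audio frame into a speech part and a background part. All it shows is how a lift could stay at zero at both ends of the content spectrum and only kick in between them.

```python
import math

# Hypothetical values for illustration only; not Sonos tuning parameters.
TARGET_RATIO_DB = 10.0   # assumed speech-to-background ratio we aim for
MAX_LIFT_DB = 6.0        # assumed cap on how far dialogue may be lifted
SILENCE_FLOOR = 1e-6     # RMS below this is treated as "not present"


def dynamic_lift_db(speech_rms, background_rms,
                    target_ratio_db=TARGET_RATIO_DB,
                    max_lift_db=MAX_LIFT_DB):
    """Return the dialogue lift, in dB, for a single frame."""
    if speech_rms < SILENCE_FLOOR:
        return 0.0  # no dialogue in this frame: leave the content untouched
    if background_rms < SILENCE_FLOOR:
        return 0.0  # dialogue only: nothing is masking it
    ratio_db = 20 * math.log10(speech_rms / background_rms)
    shortfall_db = target_ratio_db - ratio_db
    # Lift only when speech falls short of the target, and never beyond the cap.
    return min(max(shortfall_db, 0.0), max_lift_db)


def apply_lift(speech_frame, background_frame, lift_db):
    """Remix one frame with the speech scaled by the given lift."""
    gain = 10 ** (lift_db / 20)
    return [gain * s + b for s, b in zip(speech_frame, background_frame)]
```

A static approach would apply the same gain to every frame. In this sketch the gain is recomputed per frame, so a music-only scene and a quiet two-person conversation both pass through unchanged, while a busy action scene gets a lift of up to the cap.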
One of the key innovations introduced on Arc Ultra is the option to select from multiple enhancement levels. This came about as the Sound Experience team realised there was no one-size-fits-all solution for speech enhancement: people use it for a wide variety of reasons, so the best solution is to give listeners more control. With Arc Ultra’s new Speech Enhancement, the Sound Experience team not only improved the existing levels using the AI-powered speech extraction, but also introduced a new Max level. This level was added in response to feedback from the RNID studies, and wouldn’t have been possible without the additional capability afforded by the AI methods.
After months of refinement, the Sound Experience team were happy that the feature was ready – it was time to put it to the test.
Transforming listening experiences
An important part of the development life cycle is beta testing, during which we give hundreds of beta pool members access to new products and features. This is the proving ground for everything we work on, helping to ensure that we’re delivering experiences that people love. This was no less crucial for Arc Ultra’s new Speech Enhancement: we’d just reworked speech enhancement from the ground up, so a lot was at stake. We had three questions:
Would it provide excellent dialogue clarity?
Were we able to maintain the quality of the sound experience?
Did people find multiple levels of enhancement useful?
To answer these questions, we put out a survey to our beta pool after they’d spent several weeks living with the feature. The survey captured information about the participants’ listening/watching habits, hearing, and - most importantly - how they felt about the new speech enhancement feature.
First, we’ll take a look at some information about participant demographics. A total of 215 people took part in this study. Of these, 15% - or 32 people - identified as having some form of hearing loss. This tells us that at least 15% of participants were likely to benefit from speech enhancement, but as we know from our earlier post, hearing loss isn’t the only reason for using speech enhancement.
To build a better picture of who may benefit, we also asked people whether they frequently used subtitles. As we see below, this tells us that 45% of survey respondents frequently used subtitles. This is pretty significant when you consider that only 15% identified as having some form of hearing loss. It also agrees with recent YouGov research, which showed that a significant proportion of people prefer to watch with subtitles - with over 60% of adults between 18 and 24 preferring to watch with subtitles.
Now that we know a bit about our beta participants, let’s take a look at which levels of speech enhancement they preferred:
As we see here, the vast majority of survey respondents preferred to watch with speech enhancement on, with only 15 of the 215 participants (about 7%) preferring to have it off. Across all respondents, ~10% preferred the low setting, opting for a gentle nudge in dialogue, while ~57% preferred the medium setting. The remaining ~25% of survey respondents opted for the higher levels of enhancement: high, which provides a more significant boost, and max, which was designed with hearing loss in mind.
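As a quick sanity check on these shares, the arithmetic can be reproduced in a few lines of Python. The counts (215 respondents, 15 with enhancement off) and the rounded per-level shares are the ones quoted above; the variable names are ours, and the small gap between the two totals is assumed to come from rounding in the reported figures.

```python
TOTAL_RESPONDENTS = 215
OFF_COUNT = 15


def share(count, total=TOTAL_RESPONDENTS):
    """Return count as a percentage of total."""
    return 100 * count / total


off_share = share(OFF_COUNT)
on_share = share(TOTAL_RESPONDENTS - OFF_COUNT)

# Rounded shares quoted in the text, as percentages of all respondents.
reported = {"low": 10, "medium": 57, "high_and_max": 25}
reported_on_total = sum(reported.values())

print(f"off: {off_share:.1f}%")
print(f"on:  {on_share:.1f}%")
print(f"sum of reported on-levels: {reported_on_total}%")
```

Roughly 93% of respondents kept enhancement on, which lines up with the ~92% obtained by adding the rounded per-level shares.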
While we expected the higher levels to be used less frequently than other levels, it was surprising to see that such a significant majority of respondents preferred to keep speech enhancement on. This sent a strong message that the feature was doing its job, and whether people were using it due to hearing loss, challenging dialogue, or watching content in a noisy environment, Arc Ultra’s Speech Enhancement was helping to deliver better sound experiences.
This message was further emphasised by the fact that a staggering 97% of participants felt that the sound quality of Arc Ultra was improved with speech enhancement enabled, with over 67% rating it 4 or higher out of 5 for sound quality being ‘significantly improved’:
The last statistic we’ll look at is how much the feature reduced the need for subtitles. This helps to quantify how much of an impact it’s making for the sound experience - it’s great that people are using it, but is it improving how they engage with content?
The answer to this is resoundingly positive, with one third of respondents saying that they use subtitles less than before when using Arc Ultra’s Speech Enhancement. This indicates that the feature is transforming how people consume content, allowing more people to engage with dialogue directly rather than through subtitles. This is a huge positive for both audiences and content creators: more viewers are able to enjoy fully immersive experiences, relying less on subtitles and connecting with the carefully crafted content directly.
It’s great to see how the success of Arc Ultra’s Speech Enhancement can be told with numbers, but they’re only part of the story. To finish this post, we’ll take a look at some direct quotes from participants, allowing us to get a better sense for how this has impacted people personally:
“Seamlessly enhances the vocal quality on demand”
“...it feels noticeably clearer, more natural, and significantly improves overall audio quality.”
“I immediately thought of my mother, who has a hard time hearing characters speak when there is too much background noise. I could clearly tell the difference between the levels.”
“I went from never using speech enhancement to actually finding a use for it. It definitely has its time and place not only in the Sonos feature set, but mine as well.”
“I was pretty blown away by how much better it sounds now than it did before. It has definitely improved, and with the introduction of high and max, it does make a difference in some movies where dialogue needs a boost.”
“... dialogue in movies and shows came through much clearer without making the rest of the audio sound flat or unnatural. It especially helped during scenes with heavy background music or sound effects, where voices used to get lost. This update definitely elevates the overall listening experience.”
We also asked some of the RNID participants for feedback, to understand how people with hearing loss felt about the feature:
“It’s brought a lot of TV to the fore… some of the sounds I wasn’t catching before.”
“It’s like going back in time and remembering all those sounds.”
“...clear to the point of not needing the subtitles.”
“The background sounds were still there, but the dialogue was much easier to listen to.”
This feedback didn’t just help to verify that the enhancement was doing its job - it was really touching to see that we had created a feature which, for some people, allowed them to experience sound that they simply weren’t able to experience before. It goes without saying that Arc Ultra’s new Speech Enhancement passed the test, and has since become available to all Arc Ultra customers. We’re proud to see that the feature is improving the sound experience for a broad variety of listeners, and hope to see it continue to deliver lasting impact on Arc Ultra and future soundbars.
Roll the credits!
Achieving Sonos’ first real-time AI audio feature was only possible thanks to the contributions of many researchers and engineers. To close out this series of Tech Blog posts, we’d like to thank the following:
RNID Researchers
Lauren Ward, who led the research from the RNID side, and Alastair Moore, who helped to design and run the listening tests.
Sonos Sound Experience Team
Paul Peace, who led the tuning of the feature, Harry Jones, who led soundboard collaborations and assisted with listening tests, and Bob Dizon, who architected the feature.
Sonos Voice Control Team
Mathieu Poumeyrol, lead developer on the Tract inference engine, and Julien Balian, who were instrumental in ensuring the AI model could be deployed on Arc Ultra’s hardware.
Thanks also to Clement Doire, Matthias Leimeister, and Adrien Gosse, who developed some of the key tools and contributed to many valuable conversations.
Sonos Audio Team
Huge thanks to Phil Knock, Govind Jeyaram, Madhur Murli and other audio software developers, without whom this would not have been possible.
Sonos App and UX Teams
Dinu Chiriac, who was responsible for implementing the Speech Enhancement UI. Thanks also to Louise Whitaker and Katie Toohil for their input on user experience and content strategy.
Sonos Beta and Customer Experience Teams
Thanks to Stacy Archibald, who ran Beta testing for the feature, and to Ben Schulsinger and Jack Harwich on our Customer Experience team.
Sonos Marketing, PR, and Communications Teams
Thanks to Will Fielder, Tanya Elm, Shakira Payne, Lindsey Carver, Katie Loeffler, Luana Mettel, and others for helping to spread the word about the new Speech Enhancement feature.
Sonos Researchers and Research Engineers
This work would not have been possible without the initial exploration undertaken by Chris Pike and Adib Mehrabi, or without the many contributions of Joe Todd, who led research engineering and technology transfer on the project.
Thanks also to Yesenia Lacouture, Orchisama Das, and Patrick McPherson.
Last, but not least, thanks to program and product management past and present: Matt Spexarth, Greg McAllister, and Ryan Armentrout.
This work was led by Matt Benatan and James Nesfield.