December 15, 2020

Trueplay Spectral Correction

Tim Sheen

Distinguished Audio System Engineer

Have you run Trueplay?

I expect if you’re reading this blog, you have, perhaps a number of times, though probably not as many times as I have. I hope you were pleased with the results.

But perhaps you weren’t pleased, or maybe you were expecting a big improvement and only got a small one. Perhaps you felt a little silly waving your phone around, and you wonder why we ask our customers to do this. Couldn’t it just be made as good as possible out of the box? In this article I hope to give you some answers to those questions.

And if you haven’t run Trueplay on all your Sonos products, I hope you will as soon as you finish reading. Beg, borrow, or steal (temporarily!) an iPhone if you don’t have one handy—I’ll get to that in a bit.

What is Trueplay?

You might be surprised to learn that even inside Sonos there’s been some friendly debate about just what Trueplay is. Of course, it’s a feature that guides you through using your iPhone or other iOS device to take an acoustic measurement in your listening area, then it uses a bunch of math to come up with adjustments for the product’s signal processing, and then it adapts the sound to the room. Auto Trueplay is an extension of Trueplay. It performs a similar function, using microphones built into the product rather than a mobile device microphone, and uses the actual music playing rather than special test tones. Auto Trueplay’s performance is limited compared to manual (‘wave-the-phone-around’) Trueplay since it can only respond to sound in the immediate vicinity of the product, but it’s better suited to a portable product. I won’t be discussing auto Trueplay in this article, but I expect to in a follow-on post at some point.

So what’s the debate about?

Well, is Trueplay just anything that makes the product sound better? That’s too broad a definition. We’ve settled on this:

"Trueplay covers a suite of technologies that optimize the sound quality or sound reproduction parameters of our speakers in response to their specific environment, setup and local conditions, and playback situation."

While Trueplay has traditionally included features that use microphones to improve the sound, Sonos’ broader definition anticipates features that take advantage of other sensors providing information about the product’s environment.

Rest assured, manual Trueplay uses the iPhone microphone only during the tuning process - while the device is being waved around. Once the final wave is done and the math is finished, the adjustments are transmitted to the products. The celebration tone plays and the process is over.

And why is manual Trueplay only offered on iOS devices? It’s not that it couldn’t work on other phones or tablets; it’s that acoustic measurements depend on reasonably well-calibrated microphones. Even iOS device models are sufficiently different from one another to require individual microphone calibration curves. We measure every new iOS device and create a Trueplay calibration curve for it. As you can imagine, this is a substantial undertaking. It’s not presently feasible to do this for the enormous range of non-iOS phones, which vary too much from device to device, sometimes even depending on the carrier they’re connected to, but we continue to investigate alternatives.

So that’s what Trueplay means to us. To help it mean more to you, let’s delve a bit deeper into acoustics and human auditory perception. We’ll largely avoid math, but since Trueplay uses a lot of math to work, there will be some of that in the discussion.

Why is Trueplay needed?

Music is made in rooms, mostly. It may be made in concert halls or clubs, places of worship, homes, and yes, sometimes not in a room.

collage showing an indoor rock concert, symphony hall, church nave, cathedral nave, indoor drummer, outdoor piano player

Why are rooms important?

Rooms are important because music tends to sound better indoors. The sound from the instruments or voices bounces off the walls, floor and ceiling, again and again, building up and fading out in a way we are used to and find comforting. The buildup of sound in a room makes the music much louder than it would be outside; the dying away gives it breadth and spaciousness.

For centuries special rooms have been built specifically for music—concert halls, houses of worship, opera houses, clubs, recording studios. Often these rooms serve multiple functions, but good sound tends to be on the architect’s brief. Over time, composers and performers have learned to take advantage of the acoustic behavior of those rooms.

And now we can record that great sound and play it back whenever and wherever we want!

Easy! Just put microphones in the right places, record the music and the reverberation together, and it should all work out just fine. If the room isn’t that great acoustically, no problem—just move the microphones up close to the performers and add artificial reverberation. Nowadays we can do a great job of simulating the acoustics of great rooms with computers and digital signal processing. On playback, if the speaker designer has done a decent job, the whole shebang should just flow seamlessly and it will sound great!

But often it doesn’t, unless you’re very, very lucky, or spend a lot of time and money adjusting your room, the location of your seating, and the position of your speakers. What prevents the recording from sounding as good in my room as the producer intended?

It comes down to three things, plus one more:

When we listen to recorded music over speakers (though not over headphones), we’re unavoidably hearing two sets of acoustics. The acoustics of your room are layered on top of the acoustics of the room where the music was recorded. That’s like putting two layers of frosting on a cake.
It’s unlikely that my speaker is located where a singer or musician would likely stand in my room. Even in a small performance room with a tiny stage the performers are normally well away from the walls, while speakers in homes are usually placed close to or against walls, where their sound is modified by the ‘local’ acoustics. Speaker designers know this, but they have to guess what the local acoustics are or provide placement instructions, and ‘voice’ the speaker appropriately. If the speaker isn’t in the location specified the results will differ from what was intended. This is especially true for speaker locations that are partially enclosed, such as in a bookcase.
The rooms we live in and listen to recorded music in aren’t much like the rooms music is typically played and recorded in. For one thing, they’re usually smaller. Low notes behave very differently in large rooms than in small ones. So, between the listening room’s local and general acoustics, the two layers of acoustic frosting are almost certainly gustatorially incompatible.

And there’s more to life than positioning speakers for optimum sound. If you’re listening on Sonos, you may be in your kitchen, your bedroom, even your bathroom—rooms where listening to music probably isn’t the primary activity. So, you may not be able to position your speakers and seating for optimum sound in all the places you want to have music.

Which of these is more like your speaker location?

pair of Sonos Play:5 speakers on a credenza

collage showing Sonos speakers in various locations in a house

How big a deal is this?

The following graph, made early in the Trueplay project, shows the room-averaged frequency response for 60 identical speakers (Sonos Play:1, all tone-related adjustments set to nominal). All these speakers were measured in Sonos employee homes and were left where they were already placed.

frequency response of Sonos Play:1 in many different home locations

This isn’t speaker-to-speaker variability. If it were, we’d be out of business! Speaker variability measured under controlled conditions is a tiny fraction of this. You can see that one speaker model can produce a range of output at 55 Hz spanning 25 dB.

The bass or treble controls can’t compensate for these differences. They affect too many other frequencies. If your speaker has a peak at 55 Hz, and you compensate by turning down the bass, you’ll lose almost all the sound at 40 Hz, and a lot of it at 80 Hz. Your As will come out OK, but you won’t get the low E, or much of the one an octave higher. And no, you can’t compensate for this with the typical multiband equalizer either—the bands just aren’t close enough together.
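To put note names on those numbers, here is a minimal sketch. It assumes twelve-tone equal temperament with A4 tuned to 440 Hz, which is the usual convention and not anything specific to Trueplay. The MIDI note numbers and the function name are only there for illustration.

```python
def note_frequency(midi_note: int, a4_hz: float = 440.0) -> float:
    """Equal-temperament frequency for a MIDI note number (A4 is note 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)

# The notes from the example: the low E, the A sitting on the peak,
# and the E one octave higher (E1, A1, E2 as MIDI note numbers).
LOW_E, LOW_A, NEXT_E = 28, 33, 40

for name, note in (("E1", LOW_E), ("A1", LOW_A), ("E2", NEXT_E)):
    print(f"{name}: {note_frequency(note):.1f} Hz")
```

The A lands on 55 Hz, while the two Es fall close to 40 Hz and 80 Hz, which is why a broad bass cut aimed at the A takes the Es down with it.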

You probably noticed a few other things about these curves, such as that even on average, they aren’t ‘flat.’ That is, on average the bass is higher than the midrange, and there’s a rolloff in the treble above 10 kHz.

What I want to emphasize isn’t the overall shape of the graphs, but the fact that they’re all different and by huge amounts. Whether the curves should average out to flat, and if not, what they should average to, is a question about the target curve. Good sounding speakers don’t measure flat in rooms. If they’re adjusted to do so, they’ll sound thin and bass-shy, compared to other speakers as well as music heard live.

As for the extreme treble, that’s partly an artifact of the measurement process, partly a consequence of some rather complex issues regarding the way speakers tend to ‘focus’ sound at high frequencies. It’s not the issue here, but along with the variability in the bass, the variability in the 5-10 kHz range certainly is.

Are the effects the same all over the room?

No. When it comes to peaks and valleys in the bass, the amount heard depends on where you’re listening from in the room. A good correction at one location probably won’t be as good at another. But room problems often depend as much or more on the speaker’s location than on the listener’s.

If the speaker is in a kitchen, it’s probably at the back of a countertop, and there’s a good chance there are cabinets just above. Sound bounces back and forth between the counter and the bottom of the cabinets, reinforcing some upper bass notes and not others, all the way up into the midrange, where human voices lie. The result is a tubby, hollow, boomy sound no matter where you are in the room. Correcting that improves the sound quality dramatically everywhere in the room.

If a speaker is placed in a corner, especially if it’s near the floor (unlikely) or near the ceiling (common for wall-mounted), just about all the bass frequencies will be exaggerated and it will sound boomy and dull. Compensate for this and it can sound great—in fact, in the days of mono hi-fi, the corner was the recommended location for speakers, which were often so large they could hardly be put anywhere else!

A single microphone-position measurement will scramble the problems caused by the speaker location with the problems associated with that specific measurement location. That’s why corrections are best made based on measurements taken in many places in the room, and why the graph above shows room-averaged measurements.

Trueplay can help!

Before going into just how Trueplay can help, let’s look at some of the ways a dedicated audiophile who doesn’t have Trueplay might go about improving the reproduction of music in their listening room. Let’s assume you’ve placed your speakers according to the manufacturer’s recommendation, convenience and aesthetics notwithstanding.

Maybe the sound is too ‘live,’ ‘echoey,’ or ‘hollow-sounding,’ so you’ve added carpets and other sound-absorbing materials to help damp it down a bit. If the room still has what we call a ‘flutter echo,’ a sort of buzzing sound when you clap your hands, adding smaller objects such as lightly-upholstered furniture, table lamps, and artwork may help to bounce the sound more randomly and break up the echoes. Already the goal of good sound feels like it’s overwhelming other lifestyle considerations...and you’re probably still not entirely happy with the bass, which is just not very even. You might still have the ‘55 Hz’ problem we showed above, maybe at some other frequency, or at several.

There’s a non-electronic solution for that: bass traps. A bass trap is a big box with a hole in it, more formally called a ‘Helmholtz resonator.’ The size of the box and the size of the hole interact to create a resonance that absorbs sound energy over a narrow range of frequencies, and if you tune one to a frequency where your room has too much bass, like 55 Hz in the earlier example, the trap will absorb 55 Hz but leave 40 and 80 Hz unaffected. But be warned: you probably won’t get by with just one bass trap, because you probably have several frequencies in the bass that demand attention.
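For a feel of how box size and hole size set the tuning, here is a rough sketch using the textbook lumped-element formula for a Helmholtz resonator. The dimensions are made up for the example, and the end-correction factor of 1.7 times the port radius is a common approximation that I'm assuming here. A real bass trap design would need measurement to confirm its tuning.

```python
import math

def helmholtz_frequency_hz(cavity_volume_m3, port_radius_m, port_length_m,
                           speed_of_sound_m_s=343.0, end_correction_factor=1.7):
    """Resonant frequency of a box (cavity) with a single round port."""
    port_area = math.pi * port_radius_m ** 2
    # The moving plug of air acts longer than the physical port (assumed correction).
    effective_length = port_length_m + end_correction_factor * port_radius_m
    return (speed_of_sound_m_s / (2.0 * math.pi)) * math.sqrt(
        port_area / (cavity_volume_m3 * effective_length))

# Hypothetical trap: a 42 litre box with a 10 cm wide, 10 cm long port.
tuned = helmholtz_frequency_hz(cavity_volume_m3=0.042,
                               port_radius_m=0.05,
                               port_length_m=0.10)
print(f"Tuned to about {tuned:.1f} Hz")
```

With these illustrative numbers the box tunes to roughly 55 Hz. A bigger box or a longer port moves the tuning lower, which is part of why traps for low bass get so large.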

This solution works. It’s often done in monitoring rooms of recording studios. Some audiophiles even do it at home (probably in just one room!). Aside from cost and aesthetic considerations, rooms treated this way feel a little ‘odd.’ Not like anechoic chambers, but the behavior in the bass is part of the identity of the room we’re in. If the sound of the room doesn’t reasonably match the appearance of the room we sense that something is out of the ordinary. We wind up with a room that is great for music, but not too good for ordinary living, even before taking into account all the funny boxes and in-your-face speakers.

There’s an alternative that gets you remarkably close to that result: equalization. It’s a bit of a dirty word among aficionados, but if you want even, smooth bass without a great deal of expense and inconvenience, it’s a very good solution.

The five-band equalizer that used to be in a lot of aftermarket car stereos won’t do it, nor even the 12- or 24-band equalizers popular a decade or two ago. You need a parametric equalizer (PE). They’ve been around for decades, though at one time they were very expensive.

A parametric equalizer allows you to electronically boost or cut the response in the signal path to the speaker, at a particular frequency (adjustable) with an adjustable sharpness. Like bass traps, one filter is never enough, but a PE often gives you eight or more, each independently adjustable. You’re not likely to want eight physical bass traps in your home!
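To make "a particular frequency with an adjustable sharpness" concrete, here is a sketch of a single parametric band, using the widely published Audio EQ Cookbook peaking-filter formulas. This is a generic illustration, not the filter design used in Sonos products. The 48 kHz sample rate, the Q of 5, and the 10 dB cut are assumptions chosen for the example.

```python
import cmath
import math

def peaking_biquad(f0_hz, gain_db, q, fs_hz=48000.0):
    """Coefficients (b, a) of one peaking band, Audio EQ Cookbook form."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)   # larger Q gives a narrower band
    cos_w0 = math.cos(w0)
    b = (1.0 + alpha * a_lin, -2.0 * cos_w0, 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * cos_w0, 1.0 - alpha / a_lin)
    return b, a

def response(b, a, freq_hz, fs_hz=48000.0):
    """Complex frequency response of the biquad at one frequency."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    z2 = z1 * z1
    return (b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2)

def magnitude_db(b, a, freq_hz, fs_hz=48000.0):
    return 20.0 * math.log10(abs(response(b, a, freq_hz, fs_hz)))

# One band aimed at the 55 Hz bump from the earlier example.
b, a = peaking_biquad(f0_hz=55.0, gain_db=-10.0, q=5.0)
for f in (40.0, 55.0, 80.0):
    print(f"{f:5.1f} Hz: {magnitude_db(b, a, f):6.2f} dB")
```

The cut is the full 10 dB at 55 Hz and only slight at 40 Hz and 80 Hz, which is the behavior a bass or treble control can't give you. A full parametric equalizer is just several of these bands in series.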

A PE isn’t very expensive any more and it’s certainly less expensive than bass traps. Aficionados sometimes use both. They’ll point proudly to their two or three hulking bass traps without telling you about the equalization taking care of the rest of the problem frequencies!

So is the problem solved?

No. Good luck trying to adjust a parametric equalizer by ear, especially if you try to do it using music!

It might go pretty well at first. You’ll attack the most egregious bass bump first and it will seem to work. The next couple will be much harder. You’ll listen to some other music and decide it’s all wrong. If you go down this path, you’ll never be able to enjoy music again. You’ll constantly be going back to tweak the parameters.

To get a PE set up right you need to make measurements! You need measurements to use bass traps also, but I tacitly assumed anyone going that route would do that. Making measurements isn’t so hard nowadays, although making room-averaged measurements is tedious. It’s easy to make mistakes, requires a measurement microphone and software and a whole bunch of other…and, well no wonder not many people do it.

Enter Trueplay.

Trueplay (the spectral part of it) does it for you. It does it quickly (in a minute or so) without any special equipment, just an iOS device with the Sonos app. The parametric equalizer embedded in Sonos products has sixteen filters—just try to adjust that by ear!

So you say, it’s decades-old technology, just brought kicking and screaming (or chirping) into the twenty-first century. Yes, in a way. But the devil’s in the details.

The test tone, for example. Unless the test signal contains all the frequencies in the audible spectrum, it won’t be possible to measure at all frequencies. Since no speaker can reproduce all frequencies, the tone has to be tailored to be within the speaker’s capability. This is true even if you’re adjusting a multi-kilowatt system for an outdoor rock festival!

Lots of candidate test tones meet the full-frequency requirement, but rooms aren’t perfectly quiet, no matter how hard you try, and we want to make it easy, not hard. So the tone has to be crafted to overcome room noise efficiently, with sufficient energy at low frequencies where room noise tends to be highest, and just sufficient at high frequencies to do the job without being too unpleasant. Finally, the Trueplay test tone needs to be compatible with making measurements with a microphone which is moving around.

Why? Measurements are mathematically easier to process when fixed microphone positions are used, but the positions have to be chosen by the user, and once chosen, the microphone must really be still. This is a burden. First you have to choose the positions, then you have to make the measurement at one location using your microphone (assuming you don’t have a pile of microphones!), and then move on to the next position. This is the classic way to do it, and it’s quite a bother. It’s much easier to just walk around with the microphone while the test tone plays, letting the system’s math sort out what it hears. Of course, to get a good room-averaged measurement you have to sweep the iPhone around a good sampling of places in the room—our video is intended as a guide there. And while it may seem silly to wave the phone up and down, getting a good sampling of the sound at different heights is just as important as getting samples at different locations horizontally.

Let’s do it!

So you start up Trueplay from the Sonos app on your iOS device. The app asks you to remove any case covering the microphone and to turn the phone over so you’re holding it by the top (to ensure that your hand doesn’t cover the microphone), and then it measures the noise level in the room. This is to help ensure that the measurement you’re about to make will succeed. Next, the app proceeds to the measurement phase.

If you haven’t run Trueplay using this device before, we’ll ask you to take a test 😀. No, not really, but we want you to watch the video if you haven’t done the process before. How you hold and wave the phone and how you move around the room will affect the results. If you’ve run Trueplay already using this device (and this install of the app), you can proceed without watching the video.

Once you tap start, the test tone begins to play. You walk around, waving the phone as shown in the video, feeling a bit silly, but in fact you’re making sophisticated acoustic measurements.

The test tone is periodic - it doesn’t have a beginning or end, except when it starts and when it stops. For a single (mono) product the period is about a third of a second. As you walk around the room, the microphone continuously picks up the sound and chops it up on the fly into individual periods, a process we call framing. Because the tone is periodic it doesn’t actually matter exactly where the cuts occur (as long as they occur exactly once per period), but for mathematical reasons which would require another article 😀 we avoid framing during the portion of the tone containing the chirp.

The Trueplay mathematical algorithm running on the phone as part of the app does some preliminary processing on each period as you walk around, to confirm that the particular chunk of captured sound has good signal-to-noise ratio (SNR), that is, the sound heard from the speaker is sufficiently louder than the sound from other sources in the room. If the chunk passes the test, the app accumulates it with the others up to that point. If not, it’s discarded. If too high a proportion of the periods are discarded, Trueplay stops to avoid wasting any more of your time and effort. The app also uses sound and the phone’s accelerometers to confirm that you’re moving around. If you stay in one place, you won’t get a useful result, so we stop Trueplay if we detect that happening. Insufficient spatial averaging is a problem with some measurement systems.
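The framing and signal-to-noise check can be sketched roughly as follows. This is only an illustration of the idea, not the app's actual code. The 15 dB threshold, the 50 percent rejection limit, the function names, and the use of a single noise-power number taken from the earlier room-noise measurement are all assumptions.

```python
import math

def frame_signal(samples, period_len):
    """Chop a recording into consecutive whole periods; a partial tail is dropped."""
    n_frames = len(samples) // period_len
    return [samples[i * period_len:(i + 1) * period_len] for i in range(n_frames)]

def mean_power(frame):
    return sum(x * x for x in frame) / len(frame)

def snr_db(frame, noise_power):
    # Floor the power so a silent frame yields a very low SNR, not a math error.
    return 10.0 * math.log10(max(mean_power(frame), 1e-20) / noise_power)

def accumulate_good_frames(samples, period_len, noise_power,
                           min_snr_db=15.0, max_reject_fraction=0.5):
    """Keep periods that are loud enough; give up if too many are rejected."""
    kept, rejected = [], 0
    for frame in frame_signal(samples, period_len):
        if snr_db(frame, noise_power) >= min_snr_db:
            kept.append(frame)
        else:
            rejected += 1
    total = len(kept) + rejected
    if total == 0 or rejected / total > max_reject_fraction:
        raise RuntimeError("too many noisy periods; stopping the measurement")
    return kept

# Toy recording: three clean periods followed by one buried in noise.
PERIOD = 8
loud = [math.sin(2.0 * math.pi * n / PERIOD) for n in range(PERIOD)]
quiet = [0.001 * x for x in loud]
recording = loud * 3 + quiet
good = accumulate_good_frames(recording, PERIOD, noise_power=1e-4)
print(len(good), "of", len(recording) // PERIOD, "periods kept")
```

The movement check that uses the accelerometers is left out of this sketch.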

For a single product, the tone plays for 45 seconds, or more than 150 periods. If all goes well (and it often does) by the time the tone stops you’ll have measured the sound of the speaker in more than 150 places in your room! Stereo paired speakers use a test tone with a period that’s twice as long, and it plays for one minute. You don’t get quite as many measurements, but it’s still plenty. Part of the reason the tone plays that long is simply to give you enough time to walk all over the room.

Congratulations - you’ve just made a room-averaged frequency response measurement!

Once the tone stops playing, we take over all the heavy lifting, and heavy lifting it is! It’s remarkable how powerful smartphone processors are: the math involves dozens of Discrete Fourier Transforms (DFT), numerical peak finders, and even a special kind of curve fitter involving a process called gradient descent.

Even while you walked around, the phone was busy doing math. From each one-period-long chunk of microphone signal, which was first adjusted using the microphone calibration curve specific to the iOS model, we extracted an impulse response: the idealized response of the speaker and the room, at each location, to an infinitely large, infinitely short pulse.

Producing an actual impulse is out of the question: the speaker and everything around it would be vaporized! But the concept of a linear system impulse response is enormously useful. Speakers and rooms are close enough to linear systems provided you don’t go too loud. An approximation to a system’s impulse response can be computed from measurements made with non-impulsive test tones processed with a bunch of math (sometimes called deconvolution), so that’s what we do during the Trueplay walk-around-the-room, an activity we call the room dance.

An impulse response is a time-domain function. The time-domain aspect is useful during the room dance, especially for the error checking I mentioned, since it gives us information about the varying distance from the speaker(s). It’s also important for home theater calibration. And if you take the DFT of a system’s impulse response, you get its frequency response, phase vs. frequency as well as amplitude vs. frequency.
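Here is a small sketch of that idea: recover an impulse response from one period of a periodic, non-impulsive test signal by dividing spectra, then read off magnitude and phase. It is a generic frequency-domain deconvolution, not Trueplay's actual algorithm. The 32-sample chirp, the toy "room" response, and the small regularization constant `eps` are all invented for the example. The DFT is written out the slow way so the block needs nothing beyond the standard library.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolve(x, h):
    """One steady-state period of a periodic signal x played through system h."""
    n = len(x)
    return [sum(x[(t - m) % n] * h[m] for m in range(len(h))) for t in range(n)]

def deconvolve(recorded, test_signal, eps=1e-12):
    """Estimate the impulse response and frequency response from one period."""
    y_spec, x_spec = dft(recorded), dft(test_signal)
    # Divide spectra; eps guards against bins where the test signal is weak.
    h_spec = [y * x.conjugate() / (abs(x) ** 2 + eps) for y, x in zip(y_spec, x_spec)]
    return [v.real for v in idft(h_spec)], h_spec

N = 32
test_signal = [math.cos(math.pi * n * n / N) for n in range(N)]   # toy chirp
true_ir = [1.0, 0.0, 0.5, 0.0, -0.25] + [0.0] * (N - 5)           # toy "room"
recorded = circular_convolve(test_signal, true_ir)

estimated_ir, freq_response = deconvolve(recorded, test_signal)
magnitude = [abs(h) for h in freq_response]          # amplitude vs. frequency
phase = [cmath.phase(h) for h in freq_response]      # phase vs. frequency
print([round(v, 3) for v in estimated_ir[:5]])
```

Because the test signal is periodic, one steady-state period of the recording behaves like a circular convolution. That is what lets the division of spectra work on a single frame.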

For the spectral correction, we don’t use the phase information.

This is somewhat controversial. There are room correction systems that attempt to correct for phase anomalies, and sometimes that can be beneficial. One situation in which it might be beneficial is if you’re trying to correct deficiencies in the speaker as well as effects of the room. It’s reasonable that an aftermarket room correction system might try to do that. After all, who knows what speakers it might be dealing with?

But we’re trying to compensate for the room’s acoustical behavior, not the speaker’s. Our philosophy is that the speaker should be as good as it possibly can be right out of the box. An impulse response may be a mathematical construct, but there are lots of musical sounds that are impulsive—for music it’s the spark of life. The pluck of the strings, the penetrating edge of the brass, the ting of the triangle, the thump of the drums, all these need to be reproduced accurately for the music to arrive with all its vibrancy intact. We call these portions of the musical sound transients.

It’s important for musical transients to remain transient, that is, the speaker shouldn’t stretch them out. It won’t, if its anechoic (before being affected by the room) impulse response is as compact as possible, that is, if it packs its energy into the shortest time possible. The mathematical term for a system with the most compact impulse response for a given magnitude frequency response is minimum phase.

Speakers can be made very close to minimum phase, so we do that as well as we know how. And we give them a frequency response which is as wide as possible given the constraints of the specific product, tailored to make them sound as good as they possibly can when placed in a good location in a room with good acoustics.

It isn’t Trueplay’s job to make up for deficiencies in our speakers. It’s there to mitigate the realities of the rooms where we want to enjoy music.

Are rooms minimum phase?

No. One way nonminimum phase behavior arises inside a system is when a signal takes multiple paths from the input to the output. Inside a speaker, that’s usually one path through the woofer and one through the tweeter. We can make that very close to minimum phase with careful time alignment of the woofer and tweeter. But room acoustics are all about sound taking multiple paths of different lengths as it bounces around the room before arriving at your ears—that’s not going to be minimum phase.

Nevertheless, there are certain characteristics of room acoustics which tend to behave in a minimum phase way, more or less. Bass humps and dips are one of them. While the sound has to bounce around the room for a bass hump to build up, the building up behaves a lot like a simple bandpass filter, a minimum phase one.

And the great thing about dealing with minimum phase signal degradations is that if you correct them with a minimum phase correction filter, the corrected result is minimum phase. That’s a fancy way of saying that correcting the amplitude frequency response of such a system also corrects the phase response, automagically. The musical transients are preserved.
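A toy demonstration of that claim: model a bass hump as a single minimum phase peaking filter (a simplification I'm assuming purely for illustration), then correct it with the matching cut. If the phase is fixed along with the amplitude, the complex product of the two responses should be 1 at every frequency, not just 1 in magnitude. The filter formulas are the standard Audio EQ Cookbook ones, and the 6 dB hump at 55 Hz is a made-up example.

```python
import cmath
import math

FS_HZ = 48000.0

def peaking_response(freq_hz, f0_hz, gain_db, q):
    """Complex response of one peaking filter (Audio EQ Cookbook form)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / FS_HZ
    alpha = math.sin(w0) / (2.0 * q)
    z1 = cmath.exp(-2j * math.pi * freq_hz / FS_HZ)
    z2 = z1 * z1
    num = (1.0 + alpha * a_lin) - 2.0 * math.cos(w0) * z1 + (1.0 - alpha * a_lin) * z2
    den = (1.0 + alpha / a_lin) - 2.0 * math.cos(w0) * z1 + (1.0 - alpha / a_lin) * z2
    return num / den

def corrected_response(freq_hz):
    room_hump = peaking_response(freq_hz, f0_hz=55.0, gain_db=+6.0, q=4.0)
    correction = peaking_response(freq_hz, f0_hz=55.0, gain_db=-6.0, q=4.0)
    return room_hump * correction

for f in (20.0, 40.0, 55.0, 80.0, 200.0, 1000.0):
    h = corrected_response(f)
    print(f"{f:7.1f} Hz: magnitude {abs(h):.6f}, phase {cmath.phase(h):+.6f} rad")
```

The magnitude comes out as 1 and the phase as 0 everywhere, to within rounding. A nonminimum phase problem, such as a late reflection, would not cancel this neatly.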

Correcting the room’s nonminimum phase degradations is also possible to some degree, but only in one location. When that’s done the sound tends to degrade even more than it already was in other parts of the room—even in locations quite close to the optimized one. Trueplay is all about making great sound fit with all aspects of living. Making the very best possible sound in one very constrained location (which is actually easier) at the expense of the sound in most of the rest of the room isn’t very friendly to everyday enjoyment of music.

So you’ve done a room-averaged frequency response measurement. Now what?

The room dance is done, the phone’s processor is cranking. What now?

While you actually measured a bunch of impulse responses as you walked around waving the phone, we saved only an average. It turns out there’s really no such thing as an ‘average impulse response.’ It’s different at every point and won’t average very well at all. So, each impulse response is turned into a power spectral density (PSD). These average nicely (with some additional corrections to deal with the possibility that you may have walked up close to the speaker at some point, and we don’t want to overweight those responses in the average) and the result is a pretty good representation of what you’ll hear in most places in the room.
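A tiny example shows why impulse responses don't average well but power spectra do. Two invented "positions" hear the same pulse with different delays. Averaging the impulse responses creates a deep notch that neither position actually has, while averaging the power spectra does not. The extra weighting for measurements taken close to the speaker is not included here.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def average_psd(impulse_responses):
    """Turn each impulse response into a power spectrum, then average those."""
    spectra = [dft(h) for h in impulse_responses]
    n_bins = len(spectra[0])
    return [sum(abs(s[k]) ** 2 for s in spectra) / len(spectra) for k in range(n_bins)]

def psd_of_average_response(impulse_responses):
    """The naive alternative: average the impulse responses first."""
    n = len(impulse_responses[0])
    count = len(impulse_responses)
    mean_h = [sum(h[t] for h in impulse_responses) / count for t in range(n)]
    return [abs(v) ** 2 for v in dft(mean_h)]

# Same unit pulse, arriving two samples later at the second position.
position_a = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
position_b = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

good = average_psd([position_a, position_b])
naive = psd_of_average_response([position_a, position_b])
for k in range(5):
    print(f"bin {k}: averaged PSD {good[k]:.3f}, PSD of averaged response {naive[k]:.3f}")
```

Each position on its own has a perfectly flat spectrum, and the averaged PSD keeps it flat. The naive average cancels completely in bin 2, purely because of the delay difference.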

Simple, right? Wherever in the frequency range the average PSD is too big we create a filter to cut the speaker’s frequency response by the same ratio (so they multiply out to 1). Wherever it’s too small, we boost the response. A parametric equalizer can do that.

Sort of. There are some problems:

The target. As mentioned earlier, making the response ‘flat’ (multiplying out to 1 at all frequencies), won’t result in good sound. The reasons are too complex to go into here, but each of our products is designed to sound good when optimally placed in a good room. Trueplay shouldn’t modify the sound much in that instance, though it will smooth out the bass, as if you added well-tuned bass traps. And to avoid changing the speaker’s overall sound balance, Trueplay must adjust the response to an appropriate in-room target which is created by the acoustical engineer as part of the design. Our targets are all pretty similar, but they differ slightly to best match the ‘basic sonic character’ of each product.
At any given location in a room, there are always frequencies at which multiple sound paths combine to cause perfect cancellation (this is one of the reasons not to use single point measurements). Proper smoothing helps with that. The amount of smoothing depends on frequency—less smoothing at lower frequencies, where correcting the detailed up-and-downness matters, more smoothing at higher frequencies, where cancellations happen frequently. Averaging over the room also helps, but it’s still a matter of probabilities. So, if the measured curve drops a lot at some frequency, while it may be correctly identifying a problem, it’s probably exaggerating it. And what if blindly-applied boost at some frequency overtaxes the capability of the speaker? It won’t result in damage, we protect against that, but it might degrade rather than improve the sound.
There might be a huge peak in the response at some frequency due to a strong room resonance. This probably isn’t completely bogus, but it may not be a good representation of the sound everywhere—maybe we shouldn’t cut the speaker’s response quite as much as the peak suggests.

So we apply limits to the correction, also tailored to the specific model of Sonos product you’re tuning with Trueplay. Once all this math is done, we have the desired correction.
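In decibels, "multiply out to the target" becomes a subtraction, and the limits become a simple clamp. The sketch below shows that arithmetic only. The boost and cut limits, the target values, and the function name are hypothetical; the real limits are tuned per product and aren't given in this article.

```python
def correction_db(measured_db, target_db, max_boost_db=6.0, max_cut_db=12.0):
    """Per-frequency correction: aim for the target, but never beyond the limits."""
    corrections = []
    for measured, target in zip(measured_db, target_db):
        wanted = target - measured          # positive means boost, negative means cut
        limited = max(-max_cut_db, min(max_boost_db, wanted))
        corrections.append(limited)
    return corrections

# Four made-up frequency points: slightly high, a big resonant peak,
# a deep cancellation dip, and nearly on target.
measured = [3.0, 15.0, -20.0, 0.5]
target = [2.0, 2.0, 0.0, 0.0]
print(correction_db(measured, target))
```

The small errors are corrected in full. The big peak is cut by only the allowed 12 dB, and the deep dip gets only 6 dB of boost rather than the 20 dB a blind inversion would ask of the speaker.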

Just a couple more steps and we’re done!

We’ve computed the correction, but its format is far too detailed. It’s in the form of the impulse response of the desired correction, sometimes called a finite impulse response (FIR) filter. While it’s possible to run this kind of filter in digital signal processing, it isn’t very practical, nor is it necessary. A sufficiently extensive parametric filter can do the job. Such a filter, in the class of infinite impulse response (IIR) filters, is computationally much simpler for the speaker to run, and also better suited for the correction, because it’s minimum phase.

But to use one, we have to adjust the center frequencies and Q values of each of the PE filters to match in aggregate, as well as possible, the frequency response of the FIR filter the initial correction calculation produces. This is where the gradient descent algorithm comes in, iteratively adjusting the coefficients (the poles and zeros) of the parametric filter to get a good match. All this math takes just a few seconds to execute on your smartphone!
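To give a flavor of the fitting step, here is a deliberately simplified sketch: plain gradient descent with a backtracking step size, fitting a single peaking band to a target curve. It adjusts center frequency, gain, and Q rather than poles and zeros directly, uses a numerical gradient, and fits one band instead of sixteen. None of it should be read as the actual Trueplay fitter. The target curve, starting guess, and all names are invented for the example.

```python
import cmath
import math

FS_HZ = 48000.0

def peaking_db(freq_hz, f0_hz, gain_db, q):
    """Magnitude in dB of one peaking filter (Audio EQ Cookbook form)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / FS_HZ
    alpha = math.sin(w0) / (2.0 * q)
    z1 = cmath.exp(-2j * math.pi * freq_hz / FS_HZ)
    z2 = z1 * z1
    num = (1.0 + alpha * a_lin) - 2.0 * math.cos(w0) * z1 + (1.0 - alpha * a_lin) * z2
    den = (1.0 + alpha / a_lin) - 2.0 * math.cos(w0) * z1 + (1.0 - alpha / a_lin) * z2
    return 20.0 * math.log10(abs(num / den))

def loss(params, freqs, target_db):
    """Mean squared error in dB; params = [log2(f0), gain_db, log2(Q)]."""
    f0, gain_db, q = 2.0 ** params[0], params[1], 2.0 ** params[2]
    return sum((peaking_db(f, f0, gain_db, q) - t) ** 2
               for f, t in zip(freqs, target_db)) / len(freqs)

def numeric_gradient(params, freqs, target_db, h=1e-4):
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += h
        down[i] -= h
        grad.append((loss(up, freqs, target_db) - loss(down, freqs, target_db)) / (2.0 * h))
    return grad

def fit_peaking_filter(freqs, target_db, start, iterations=200):
    params = list(start)
    current = loss(params, freqs, target_db)
    for _iteration in range(iterations):
        grad = numeric_gradient(params, freqs, target_db)
        largest = max(abs(g) for g in grad)
        if largest == 0.0:
            break
        step = 1.0 / largest       # first trial moves the steepest parameter by 1 unit
        for _halving in range(40):  # backtrack: halve the step until the loss drops
            trial = [p - step * g for p, g in zip(params, grad)]
            trial_loss = loss(trial, freqs, target_db)
            if trial_loss < current:
                params, current = trial, trial_loss
                break
            step *= 0.5
        else:
            break                  # no improving step found, so stop
    return params, current

# Target: the response of a known band, sampled on a log grid from 20 Hz to about 200 Hz.
freqs = [20.0 * 2.0 ** (i / 6.0) for i in range(21)]
target_db = [peaking_db(f, 55.0, -8.0, 4.0) for f in freqs]
start = [math.log2(70.0), -3.0, math.log2(2.0)]   # deliberately wrong first guess

initial_loss = loss(start, freqs, target_db)
fitted, final_loss = fit_peaking_filter(freqs, target_db, start)
print(f"loss: {initial_loss:.3f} -> {final_loss:.5f}")
print(f"f0 = {2.0 ** fitted[0]:.1f} Hz, gain = {fitted[1]:.2f} dB, Q = {2.0 ** fitted[2]:.2f}")
```

Working in log-frequency and log-Q keeps those parameters positive and puts all three on comparable scales, which is one common way to make a fit like this behave.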

When it’s done, the coefficients are transmitted to the speaker, where they become part of the overall digital signal processing it uses to deliver its great sound. The phone’s work (and yours) is done. Unlike some other products incorporating features like this, we also give you a toggle—you can turn the Trueplay equalization on and off and decide if you like it better or not (we think you will!).

If you don’t, it only takes a minute or two to do it over. Maybe a slightly different path around the room, or a little more time spent in the areas where you prefer to listen, will give a better result. If you still don’t like it, you can contact us. We’re always trying to understand the situations where it doesn’t work as well as we’d like so we can make it better.

Photo credits

Edgar Delgado rock concert indoors

Francisco Bricio symphony hall

Debby Hudson church

John Towner cathedral

Jean-Philippe Delberghe piano in home

Kyle Sorkness "Jeff Playing Drums" via photopin (license)

Nazar Yakymenko outdoor piano
