Testing Swap - An Exploration of QA at Sonos
Senior Software Development Engineer in Test
When developing a new feature or product for the Sonos system, we take the quality of our hardware and software very seriously. Any hiccup can seriously tarnish the opinion of the brand: each interaction a user has with our system is a chance to delight them, but if we miss our quality bar, that same interaction can lead to frustration. We maintain a high bar for quality, especially for new features and products, so that users consistently have delightful experiences with their Sonos system.
Our Testing Methodology
When we build a new feature for the Sonos system, we target each step of our Quality Assurance (or QA) process. We work in tandem with our Development team during the development cycle for any new feature or product, and because of this we naturally fall into a Shift Left testing practice.
In many traditional software development models, testing is pushed toward the end of the development cycle. This can cause real problems, especially if several issues are found with the product that late in the cycle: testing becomes a bottleneck and can even delay the product's release. Shift Left is a testing concept used to avoid those bottlenecks and delays. Instead of testing the product toward the end of development, we shift testing left in the schedule and continuously plan, build, and run tests throughout the cycle.
When following this concept of testing, we meet with the Development and Product Management teams early on to become familiar with the new product or feature, including any user experience requirements. This gives us enough time to begin planning test cases and build out the foundation for any test automation we’ll use to validate the new product or feature. Since we work alongside the Development team, we can continuously verify the quality of the development effort and modify the expected behavior of our tests and automation if any requirements change.
Sound Swap Overview
With the release of Sonos Roam in early 2021, we introduced a new feature to the Sonos system: Sound Swap (or just Swap), which lets users move audio seamlessly to and from their Sonos Roam.
Swap sounds like a relatively simple feature: pull music from, or push it to, the Sonos product closest to your Roam. The underlying logic to achieve this, however, is fairly complex. The Swap feature consists of two smaller flows: Room Detection and Source Delegation.
Room Detection determines which product is closest to your Sonos Roam at the time Swap is initiated, ensuring we swap audio with the desired product. This sub-flow uses our Chirp technology: ultrasonic signals that help determine location.
Source Delegation handles the actual transition of the audio stream to or from the Sonos Roam.
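Put together, the two sub-flows can be sketched in simplified form. This is a hypothetical illustration, not Sonos's actual implementation; the function and product names are invented for clarity:

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str

def detect_nearest_product(candidates, signal_strengths):
    """Room Detection (simplified): pick the product whose ultrasonic
    'chirp' response was strongest, i.e. the one closest to the Roam.
    signal_strengths maps product name -> measured response level."""
    return max(candidates, key=lambda p: signal_strengths.get(p.name, 0.0))

def swap(roam_is_playing, candidates, signal_strengths):
    """Source Delegation (simplified): push audio to the nearest product
    if the Roam is currently playing, otherwise pull audio from it."""
    target = detect_nearest_product(candidates, signal_strengths)
    direction = "push" if roam_is_playing else "pull"
    return direction, target

# Example: Roam is playing; the living room product responds loudest.
products = [Product("Living Room Beam"), Product("Kitchen One")]
strengths = {"Living Room Beam": 0.9, "Kitchen One": 0.2}
direction, target = swap(True, products, strengths)
# direction == "push", target.name == "Living Room Beam"
```

Even this toy version shows why each sub-flow needs its own test coverage: Room Detection can pick the wrong target, and Source Delegation can move audio in the wrong direction.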
When testing Swap, it’s important to target each flow as well as the feature as a whole. In addition to test planning, we hit every aspect of our QA process when testing this feature. Let’s explore each of these aspects in a little more detail.
Test Planning

The first step in any efficient testing effort is a comprehensive test plan. The best time to create one is early in the feature's development, once the requirements are well understood. The goal of the test plan is to define what needs to be tested, how it will be tested, and what must be in place before testing can begin.
Some key questions to consider:
What manual and automated test cases should be run to verify functionality and catch regressions?
What hardware and testbed requirements need to be met for automated testing?
What software “test hooks” need to be implemented on the product so we can test the feature more efficiently?
What should be done to test the performance and power consumption of the feature?
What mitigations should be put in place after evaluating potential security risks?
What possible regression risks to existing features and experiences could exist once this feature is implemented on the product?
What usability and quality questions can be measured using telemetry?
What documentation and guides should be in place before handing the feature off to the Release Management and Care teams to help with customer questions?
Asking these questions while creating our test plan forced us to dig deeper into the feature’s user experience and technical requirements, especially when coming up with concrete test cases.
Once these questions were answered and the test plan was created, we reviewed it with our feature and partner teams. This is a great time to get feedback on additional items to add, as well as align with everyone on the focus of testing the feature.
Manual Testing

For Swap, manual testing was used in three different ways throughout development. Early exploratory testing uncovered corner cases the system didn't yet handle, giving us time to fix issues that would otherwise have surfaced later in other testing efforts. Each developer ticket was also manually tested, with verification and other testing notes attached, which confirmed that the ticket's requirements were met and gave us confidence that no regressions were introduced as part of the work. Finally, we performed pre- and post-merge testing (more on this later).
Manual testing has some benefits over automated testing. For one, it can be much more improvisational about what's being tested. This was especially true during early exploratory testing, which aimed to flush out as many unhandled corner cases as possible. Manual testing also closely follows what an end user actually does with the feature. This helped us find usability issues in the Swap flow and allowed us to make corrections and improvements until it felt just right for the end user. While continuous automated testing has its own benefits, these are things we could only catch and fine-tune through manual testing.
Continuous Automated Testing
On the flip side, continuous automated testing (or just automation) keeps watch for regressions in the high-risk areas identified earlier during test planning. Not only can we watch for regressions in the feature itself, but also in other areas of the Sonos system. We have a large catalog of automated regression tests running every day, continuously checking almost every aspect of the Sonos system and its features.
Our automation runs every day on Sonos products set up in multiple "testbeds," each simulating a different Sonos customer's setup. Each testbed has a different set of Sonos products, so we can target specific tests to specific testbeds depending on their product requirements.
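Matching tests to testbeds comes down to simple product-requirement checks. A minimal sketch of that idea follows; the testbed names and product groupings are illustrative assumptions, not our real lab setup:

```python
# Hypothetical registry of testbeds and the products each contains.
TESTBEDS = {
    "bed_portable": {"Roam", "One"},
    "bed_home_theater": {"Arc", "Sub", "Beam"},
}

def eligible_testbeds(required_products):
    """Return the testbeds that contain every product a test requires.
    Set containment (<=) expresses 'all required products present.'"""
    return [
        bed for bed, products in TESTBEDS.items()
        if required_products <= products
    ]

# A Swap test needs a Roam plus at least one nearby product to swap with:
beds = eligible_testbeds({"Roam", "One"})
# beds == ["bed_portable"]
```

A scheduler built on a check like this can route each new test to every bed that can run it, rather than hard-coding test-to-bed assignments.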
For Swap testing, we were able to use pre-existing testbeds rather than build new ones. Some automated tests for this feature use button-spoofing to simulate pressing the Roam’s Play button to trigger Swap. To achieve this, we used our automation framework to send button driver commands to trigger Swap the way an end user would when using Roam.
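A button-spoofing step in such a test might look something like the sketch below. The device object and `send_button_event` command are hypothetical stand-ins, not Sonos's real automation framework:

```python
import time

class FakeRoam:
    """Stand-in device object. A real framework would send button
    driver commands to hardware over a debug/test-hook channel."""
    def __init__(self):
        self.events = []

    def send_button_event(self, button, action):
        # Record the spoofed button event instead of driving hardware.
        self.events.append((button, action))

def trigger_swap(device, hold_seconds=1.0):
    """Simulate the press-and-hold of the Play button that starts Swap."""
    device.send_button_event("PLAY", "press")
    time.sleep(hold_seconds)  # hold long enough to register as Swap
    device.send_button_event("PLAY", "release")

roam = FakeRoam()
trigger_swap(roam, hold_seconds=0.01)  # short hold to keep the demo fast
# roam.events == [("PLAY", "press"), ("PLAY", "release")]
```

The value of spoofing at the button-driver level is that the test exercises the same code path a physical button press would, without needing a robot finger.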
Branch and Merging Strategies
While in development, all Swap code was quarantined in its own feature-specific branch in our source control system. With so much code churn happening during development, we wanted to isolate and identify any regressions created by this work within the feature branch first. Once the feature was more stable and the risk of external regressions was much lower, our Test team ran through a pre-merge checklist before the feature was merged into a more centralized branch.
For large, isolated feature work, it’s a good practice to create pre-merge checklists and run through them before merging the code. This gives us greater confidence both that the feature we’re merging is functioning correctly and that we won’t create new issues in other areas of our system. The checklist should include the following:
Manual feature and regression testing is complete and no blocking issues were found.
Automated tests have been created for this feature and are passing successfully.
No other automated tests encounter blocking issues specific to the feature branch.
Unit tests and code analysis have been run and pass successfully.
No device crashes, deadlocks, or watchdog resets related to this feature were reported.
A telemetry dashboard has been created for this feature (more on this below).
Documentation on this feature’s logging behavior for debugging by others has been created.
No known blocking issues remain unresolved.
Once the feature is merged, we also run through a shorter post-merge checklist. For this, we mainly repeat the same manual testing on the central branch the code was merged into. Some items from the pre-merge checklist, like creating a telemetry dashboard, may be completed as part of this post-merge checklist instead, depending on time limitations.
Alpha & Beta Testing
An important step in our testing process is putting the feature in front of actual users and getting their feedback. After the feature has been merged to the centralized branch and has undergone pre- and post-merge testing, we roll it out to Alpha and Beta tester pools of various sizes. The Beta team created surveys with input from our Test and Development teams to ensure we ask the right questions and get the most useful feedback from our users.
As Alpha and Beta testers use the feature in their own homes, they report issues that are then turned into bugs for our team to work on. Alpha/Beta testing helps us find issues specific to a user's configuration that our internal testing may not have caught. Finally, we review the completed surveys as a team and make the corrections and improvements needed based on the feedback.
Telemetry & KPIs
The final aspect of our QA process is relying on telemetry to show us how the feature is working outside our testing setups. During the pre-merge testing, a dashboard was set up for Swap to track some key quality metrics. One notable metric we wanted to track was the various Swap failure types. This gave us a breakdown of the different kinds of failures we were seeing in regular use, along with possible code paths to review and determine what might be going on. Using this information, we were able to prioritize our work to address and minimize these failures.
We were able to use telemetry to track the distribution of Swap failure cases users were running into. Our strategy was to collect all reported failure cases, expected and unexpected. We filtered out the expected failure cases, such as trying to swap Airplay or Bluetooth audio, so we could focus on the issues our users shouldn’t encounter.
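The filtering step is straightforward to sketch. The failure-type labels below are invented for illustration; the real event names are internal:

```python
from collections import Counter

# Hypothetical labels for failures that are by design, e.g. trying to
# swap AirPlay or Bluetooth audio, which Swap doesn't support.
EXPECTED_FAILURES = {"SWAP_UNSUPPORTED_AIRPLAY", "SWAP_UNSUPPORTED_BLUETOOTH"}

def unexpected_failure_breakdown(events):
    """Count each failure type, dropping the expected cases so the
    breakdown shows only issues users shouldn't encounter."""
    return Counter(
        e["failure_type"]
        for e in events
        if e["failure_type"] not in EXPECTED_FAILURES
    )

events = [
    {"failure_type": "SWAP_UNSUPPORTED_AIRPLAY"},
    {"failure_type": "ROOM_DETECTION_TIMEOUT"},
    {"failure_type": "ROOM_DETECTION_TIMEOUT"},
    {"failure_type": "DELEGATION_ERROR"},
]
breakdown = unexpected_failure_breakdown(events)
# breakdown == Counter({"ROOM_DETECTION_TIMEOUT": 2, "DELEGATION_ERROR": 1})
```

A breakdown like this makes prioritization mechanical: the most frequent unexpected failure type gets investigated first.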
In addition, we used telemetry to collect another key metric: the latency performance of this feature. Each Swap flow takes a certain amount of time to complete as we need to first identify the product closest to Roam before we can move audio. Measuring the time the entire flow took to complete was a Key Performance Indicator (or KPI) we wanted to actively track, as this could make or break the experience of Swap.
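Latency KPIs like this are usually summarized as percentiles rather than averages, since a handful of slow Swaps can hide behind a healthy mean. A small sketch, using made-up sample durations and a simple nearest-rank percentile:

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    k = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[k]

# Hypothetical end-to-end Swap durations (Room Detection plus
# Source Delegation), in milliseconds.
latencies_ms = [850, 920, 1010, 1100, 1250, 1400, 2300, 980, 1050, 990]
p50 = percentile(latencies_ms, 50)  # typical experience
p95 = percentile(latencies_ms, 95)  # the slow tail that frustrates users
```

Tracking the p95 alongside the median is what surfaces regressions in the slow tail, which is exactly the part of the distribution that can make or break the Swap experience.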
We created a telemetry dashboard that displays the results of these KPIs and more in near real time. This helped us gauge the overall health of the feature, especially after it was released publicly. We still use this dashboard to identify areas that need improvement, and then to validate fixes once they've been implemented and released.
Following these testing practices allowed us to ship Swap on time with Roam’s release, while also having great confidence in its quality. We follow these practices with most features we develop at Sonos to ensure that we create the best possible experience for our users.