Tech Blog
Networking
April 4, 2022

Happy Music Streaming with MPEG-DASH

Won So

Principal Software Engineer

With the recent Sonos S2 software release 13.4.1, we introduced new capabilities that allow Sonos speakers to play Amazon Music’s UHD (Ultra High Definition) and Dolby Atmos content. In order to unlock these high-quality lossless music streams, we introduced key technical innovations:

  • We adopted MPEG-DASH (Dynamic Adaptive Streaming over HTTP), an industry standard adaptive streaming technology that allows Sonos speakers to dynamically switch streams based on bandwidth conditions.

  • We enabled a strong DRM (Digital Rights Management) protection option by adding support for MPEG-DASH streams using the Common Encryption (CENC) standard with Widevine.

In this article, we give a short tour of the history of media streaming over the Internet, an overview of adaptive streaming technologies and algorithms, and a look at how Sonos applied the technology to make a great music streaming device.


History of Internet Media Streaming

Streaming video and music over the Internet has become an essential part of our lives. As of 2021, 78% of US consumers were using a video streaming service [1], and nearly a third of people in the US subscribed to a music streaming service such as Amazon, Apple, Pandora, Spotify or YouTube [2]. The COVID-19 pandemic accelerated this trend: since it began, Internet usage has surged by up to 70%, with the largest increase coming from movie and music streaming [3]. Sonos has been a leader in this space; we currently support more than 130 music streaming services across 36 countries [4,5].

Though the idea of using the Internet for media streaming is old, it was not successful from the beginning. Success was only made possible through a long evolution of continuous technology innovations.

Early Days

There were many early technologies that tried to enable media streaming over the Internet. RealNetworks was the first company that pioneered this space [6]. It successfully streamed a live broadcast of a major league baseball game to thousands of its subscribers. A short while later, Microsoft captured the market with the Windows Media Technology products. After that, Macromedia (later acquired by Adobe Systems) popularized media streaming in a web browser using Flash Player.

In the early days, most media streaming technologies were built using proprietary protocols based on the User Datagram Protocol (UDP) [6]. Because of this, media streaming required dedicated software running on a client machine (e.g., Windows Media Player), and the stream provider had to maintain specialized infrastructure for servicing streaming traffic (e.g., a Windows Media Server cluster).


Advent of HTTP-based Streaming

Meanwhile, Internet traffic had grown significantly due to the huge success of the World Wide Web (WWW). By the mid-2000s, the vast majority of Internet traffic came from the web, and the Hypertext Transfer Protocol (HTTP) was the dominant protocol on the Internet; HTTP became the de facto "narrow waist" of the modern Internet [7].

In order to provide a reliable service to a massive number of users across the world, major content providers started using a content delivery network (CDN) - a geographically distributed network of web caches such as Akamai - instead of building their own web infrastructure.

With that, the idea of using HTTP for media streaming captured the majority of attention because it had strong technical advantages over using a proprietary protocol. These included:

  • By sharing the massive CDN infrastructure built for the web, using HTTP addressed potential scalability issues that would arise if media streaming required separate dedicated infrastructure for a proprietary protocol.

  • Adaptability would be purely controlled by streaming clients; this also helps with scalability because there is no extra per-client state required on a server.

  • HTTP would allow media streaming traffic to easily pass across corporate firewalls because it’s essentially the same traffic as the web traffic.

In 2007, Move Networks first introduced HTTP-based adaptive streaming [6]. In 2008, Microsoft launched Smooth Streaming. In 2009, Apple followed with HTTP Live Streaming (HLS). In 2012, MPEG-DASH (Dynamic Adaptive Streaming over HTTP) was adopted as the industry standard for Internet media streaming. Since then, it has been widely adopted by popular streaming services such as Netflix and YouTube.

Adaptive Bitrate Streaming (ABR)

Basic Idea

The basic idea behind HTTP-based adaptive bitrate (ABR) streaming is as follows:

  • Media files are encoded at several different bitrates and divided into short segments; each bitrate stream corresponds to a different quality of the same content.

  • The segments are hosted on a web server and on CDN web caches so that they can be downloaded with HTTP GET requests. A separate metadata file, called a "manifest", describes the temporal and structural relationships between segments.

  • The software running on a client device selects which quality of streams it downloads and plays on a per-segment basis. An adaptive algorithm usually drives this decision based on the available bandwidth and resources on the client device.

As a result, an end user is able to stream content with "very little buffering, fast start time and a good experience for both high-end and low-end connections" [8].

Figure 1: HTTP-based adaptive bitrate streaming in a nutshell

Figure 1 shows an example in which the service provider offers three qualities of streams - best, medium and low - for the same media (e.g., a music track). The client device starts playing the low quality stream, then up-shifts to the medium and best quality streams, and finally down-shifts back to the medium quality stream. These changes happen dynamically, based on the characteristics of the network the client device is using while streaming the content.
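To make the per-segment switching concrete, here is a minimal Python sketch of the pattern in Figure 1. The base URL, the directory layout, the segment file naming and the bitrates are all invented for illustration; real services define their own layout in the manifest. The point is only that segment N covers the same stretch of time in every quality, so the client can pick a different quality for each segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str          # quality label, as in Figure 1
    bitrate_kbps: int  # illustrative value, not taken from any real service


# Hypothetical three-tier ladder matching the "low / medium / best" example.
LADDER = [Rendition("low", 96), Rendition("medium", 320), Rendition("best", 1600)]


def segment_url(base: str, rendition: Rendition, index: int) -> str:
    """Build the URL of one segment (the URL layout is an assumption)."""
    return f"{base}/{rendition.name}/seg-{index:05d}.mp4"


def playback_plan(base: str, choices: list[str]) -> list[str]:
    """Map one quality choice per segment onto the segment URLs to fetch."""
    by_name = {r.name: r for r in LADDER}
    return [segment_url(base, by_name[c], i) for i, c in enumerate(choices, start=1)]


# Start low, up-shift twice, then down-shift back to medium.
plan = playback_plan(
    "https://cdn.example.com/track42",
    ["low", "medium", "best", "best", "medium"],
)
for url in plan:
    print(url)
```

Each URL is an ordinary HTTP GET target, which is why the same CDN caches that serve web pages can serve the stream.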

ABR in Music Streaming

While ABR streaming has been widely adopted in the video streaming industry, it has not been considered a necessary technology in the music streaming industry because of lower bandwidth requirements. However, in recent years, we see that more music service providers have started embracing ABR in order to offer new types of services that may require significantly higher bandwidth than traditional music streaming. These include:

  • High-Resolution (Hi-Res) audio: music streaming services are updating their catalogs with Hi-Res audio tracks to provide premium services to subscribers. For example, Amazon Music Unlimited offers Ultra High Definition (24-bit at 48 kHz) and High Definition (16-bit at 44.1 kHz) tracks encoded with a lossless codec (if you want to read more about Hi-Res audio, refer to this article).

  • Spatial and multi-channel audio: with the advent of spatial audio technologies, music streaming services started offering spatial audio to subscribers as there are more systems that support this. For example, Amazon Music Unlimited offers tracks mixed in Dolby Atmos.

HLS and MPEG-DASH Standards

Apple’s HLS and MPEG-DASH are the two most popular standards for HTTP-based adaptive bitrate streaming today. In terms of how they work, the standards are quite similar. The key differences are the manifest and media file formats used for packaging media content. As shown in Table 1, HLS uses an m3u8 playlist as its manifest and MPEG TS (Transport Stream) files as its media files, while DASH uses an XML Media Presentation Description (MPD) and ISOBMFF fragmented MP4 files, respectively. Sonos currently supports both technologies on our products.
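As an illustration of what a DASH manifest carries, the sketch below parses a tiny hand-written MPD with Python's standard library and lists the available bitrates. The sample MPD, its representation ids and its bitrates are invented for this example and are far simpler than a production manifest; only the MPD XML namespace and the `Representation` element with its `bandwidth` attribute come from the DASH standard.

```python
import xml.etree.ElementTree as ET

# XML namespace used by MPEG-DASH MPD documents.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hand-written, heavily simplified sample manifest (all values are illustrative).
SAMPLE_MPD = """\
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="audio/mp4" segmentAlignment="true">
      <Representation id="audio-1600" codecs="flac" bandwidth="1600000"/>
      <Representation id="audio-320" codecs="mp4a.40.2" bandwidth="320000"/>
      <Representation id="audio-950" codecs="flac" bandwidth="950000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""


def list_representations(mpd_text: str) -> list[tuple[str, int]]:
    """Return (id, bandwidth in bits per second) pairs, lowest bitrate first."""
    root = ET.fromstring(mpd_text)
    reps = [
        (rep.get("id"), int(rep.get("bandwidth")))
        for rep in root.iterfind(".//mpd:Representation", NS)
    ]
    return sorted(reps, key=lambda pair: pair[1])


for rep_id, bandwidth in list_representations(SAMPLE_MPD):
    print(f"{rep_id}: {bandwidth // 1000} Kbps")
```

An HLS client does the equivalent job by reading the variant entries of an m3u8 playlist instead of XML elements.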

Table 1: comparison between HLS and DASH

Principles of ABR Algorithms

For HTTP-based adaptive bitrate (ABR) streaming, a client device dynamically changes the quality of the media stream in real time based on available resources - mostly network bandwidth, but also CPU in some cases. In order to achieve the best results, an ABR algorithm targets the following goals:

  • Experience goal: minimize re-buffering events. For example, a user should not wait during playback.

  • Quality goal: deliver the best quality of media.

  • Stability goal: it may not be ideal to switch streams too frequently. This is usually not an issue for audio, but becomes an issue for video because each quality stream is associated with a different video resolution and a user is able to recognize a transition.

The first two goals pose an inherent trade-off in the design of an ABR algorithm. If the algorithm is too conservative, it causes no re-buffering events but ends up delivering poor-quality content. In the extreme case, such an algorithm would always select the lowest bitrate stream in order to avoid buffering. If the algorithm is too aggressive, however, it delivers the best-quality content but is likely to trigger re-buffering events. In the extreme case, such an algorithm would always select the highest bitrate stream in order to deliver the best quality media.

Therefore, the core design principle around an ABR algorithm is how to make a good trade-off between the conflicting user experience and best media quality goals. A good algorithm will avoid buffering events while maximizing the quality of the delivered media. There are numerous proposals to solve this challenge and they can be categorized into two approaches [9]. The first is a throughput or rate-based approach where the ABR algorithm maintains a model for throughput, estimates an available bandwidth, and adapts the bitrate based on the changes. The second is a buffer-based approach where the ABR algorithm adapts the bitrate based on the change of playback buffer occupancy.
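The two families can be sketched in a few lines of Python. This is a generic illustration, not the algorithm of any particular player: the bitrate ladder, the harmonic-mean estimator, the safety factor and the buffer thresholds (`reservoir`, `cushion`) are all assumptions chosen for the example.

```python
# Illustrative bitrate ladder in Kbps (assumed for this example).
LADDER_KBPS = [50, 192, 320, 950, 1600]


def rate_based_choice(throughput_samples_kbps: list[float], safety: float = 0.8) -> int:
    """Rate-based: estimate bandwidth from recent downloads, pick what fits.

    The estimate is the harmonic mean of recent throughput samples, scaled
    down by a safety factor to leave some headroom.
    """
    n = len(throughput_samples_kbps)
    estimate = n / sum(1.0 / s for s in throughput_samples_kbps)
    budget = estimate * safety
    fitting = [b for b in LADDER_KBPS if b <= budget]
    return fitting[-1] if fitting else LADDER_KBPS[0]


def buffer_based_choice(buffer_fraction: float,
                        reservoir: float = 0.2,
                        cushion: float = 0.9) -> int:
    """Buffer-based: map playback buffer occupancy (0.0 to 1.0) to a bitrate.

    Below the reservoir, play it safe; above the cushion, go for the top
    tier; in between, climb the ladder linearly with occupancy.
    """
    if buffer_fraction <= reservoir:
        return LADDER_KBPS[0]
    if buffer_fraction >= cushion:
        return LADDER_KBPS[-1]
    position = (buffer_fraction - reservoir) / (cushion - reservoir)
    return LADDER_KBPS[int(position * (len(LADDER_KBPS) - 1))]


print(rate_based_choice([1300, 1300]))  # budget is about 1040 Kbps
print(buffer_based_choice(0.6))         # buffer a bit over half full
```

The rate-based function never looks at the buffer, and the buffer-based function never looks at throughput, which is exactly the distinction between the two approaches.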

Sonos ABR Algorithm

The Sonos implementation of an ABR algorithm takes a hybrid approach to achieve the best result. An ABR state machine runs in the device firmware during streaming. It maintains a rate estimate while downloading and playing ABR content. When the rate estimate goes up or down, the algorithm up-shifts or down-shifts the bitrate. The algorithm also monitors the playback buffer during playback and reacts to changes in buffer occupancy. For example, if the playback buffer shrinks rapidly, the algorithm proactively down-shifts the bitrate in order to prevent a possible buffer underrun.
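The following Python sketch shows one way a hybrid controller of this kind could be structured. It is not the Sonos firmware: the class name, the exponentially weighted rate estimate, the one-tier-at-a-time stepping and every constant (`alpha`, `safety`, `drain_threshold`, the ladder) are assumptions made purely to illustrate how a rate signal and a buffer signal can be combined.

```python
from dataclasses import dataclass

# Illustrative bitrate ladder in Kbps (assumed for this example).
LADDER_KBPS = [50, 192, 320, 950, 1600]


@dataclass
class HybridAbr:
    alpha: float = 0.3             # weight of the newest throughput sample (assumed)
    safety: float = 0.9            # fraction of the estimate we dare to use (assumed)
    drain_threshold: float = 0.15  # buffer drop per update treated as "rapid" (assumed)
    rate_kbps: float = 0.0         # current rate estimate
    last_buffer: float = 0.0       # buffer occupancy at the previous update
    index: int = 0                 # current position in LADDER_KBPS

    def update(self, sample_kbps: float, buffer_fraction: float) -> int:
        """Feed one throughput sample and buffer level; return the chosen bitrate."""
        # Rate signal: exponentially weighted moving average of throughput.
        if self.rate_kbps == 0.0:
            self.rate_kbps = sample_kbps
        else:
            self.rate_kbps = self.alpha * sample_kbps + (1 - self.alpha) * self.rate_kbps

        # Buffer signal: how much occupancy was lost since the last update.
        drained = self.last_buffer - buffer_fraction
        self.last_buffer = buffer_fraction

        if drained >= self.drain_threshold and self.index > 0:
            # Buffer is shrinking fast: down-shift proactively.
            self.index -= 1
        else:
            budget = self.rate_kbps * self.safety
            if self.index + 1 < len(LADDER_KBPS) and LADDER_KBPS[self.index + 1] <= budget:
                self.index += 1  # rate estimate supports the next tier up
            elif LADDER_KBPS[self.index] > budget and self.index > 0:
                self.index -= 1  # rate estimate no longer supports this tier
        return LADDER_KBPS[self.index]


abr = HybridAbr()
ramp = [abr.update(1100, level) for level in (0.2, 0.5, 0.8, 1.0)]
print(ramp)                   # climbs one tier per update while the buffer fills
print(abr.update(1100, 0.7))  # a sudden buffer drop forces a down-shift
```

In this sketch the buffer signal takes priority over the rate signal, so a draining buffer triggers a down-shift even while the rate estimate still looks healthy.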

Figures 2 and 3 illustrate how the Sonos ABR algorithm works in a given scenario. In this scenario, the test ABR track has five bitrate tiers: 50, 192, 320, 950 and 1600 Kbps. The download bandwidth is throttled to 1000 Kbps, so we expect the bitrate to stabilize at 950 Kbps. The graphs show how three metrics change over time. The blue dotted line labeled "ProcessRate" shows the internal rate estimate in bps. The green solid line labeled "Buffer" shows the playback buffer occupancy as a percentage. The black solid line labeled "Bitrate" shows the chosen stream bitrate in bps.

Figure 2: ABR behavior during the startup phase

Figure 3: ABR behavior during the stabilization phase

Figure 2 shows what happens in the initial startup phase when playback first begins. There is a rapid ramp-up of bitrates, reaching 950 Kbps after some time. During this phase, "Buffer" rapidly reaches 100% and "ProcessRate" increases gradually. Figure 3 shows what happens later in the stabilization phase. At a certain point in time, the ABR algorithm tries up-shifting to 1600 Kbps. After the startup phase, "ProcessRate" tends to be lower than the actual available bandwidth. Since the download bandwidth is limited to 1000 Kbps, the connection cannot sustain 1600 Kbps, so the playback buffer drops sharply after the up-shift. The ABR algorithm detects this and quickly down-shifts the bitrate to 950 Kbps, preventing a playback buffer underrun. After this phase, the estimated rate ("ProcessRate") converges to the actual bandwidth, close to 1000 Kbps. Further excessive switching is prevented by a hysteresis-based mechanism.
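A quick back-of-the-envelope calculation shows why the buffer drops so quickly after the up-shift. The sketch below assumes an idealized link with no protocol overhead and measures the buffer in seconds of audio rather than as a percentage; the 30-second buffer depth is an invented figure, not a Sonos parameter.

```python
def buffer_drain_per_second(bitrate_kbps: float, bandwidth_kbps: float) -> float:
    """Seconds of buffered audio lost per second of playback.

    Playback consumes 1 s of audio per second, while the download delivers
    bandwidth / bitrate seconds of audio per second.
    """
    return max(0.0, 1.0 - bandwidth_kbps / bitrate_kbps)


def seconds_until_underrun(buffered_seconds: float,
                           bitrate_kbps: float,
                           bandwidth_kbps: float) -> float:
    """Time until the buffer is empty, or infinity if it is not draining."""
    drain = buffer_drain_per_second(bitrate_kbps, bandwidth_kbps)
    return float("inf") if drain == 0.0 else buffered_seconds / drain


# 1600 Kbps stream over a 1000 Kbps link: only 0.625 s of audio arrives per second.
print(buffer_drain_per_second(1600, 1000))     # 0.375
print(seconds_until_underrun(30, 1600, 1000))  # 80.0, with an assumed 30 s buffer
print(seconds_until_underrun(30, 950, 1000))   # inf: 950 Kbps is sustainable
```

Under these assumptions the 1600 Kbps tier loses more than a third of a second of buffer for every second played, while the 950 Kbps tier does not drain at all, which matches the expectation that the bitrate settles at 950 Kbps.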

Strong DRM Protection

Supporting the industry-standard MPEG-DASH streaming format opened an opportunity to add a state-of-the-art third-party DRM (Digital Rights Management) system to Sonos products. As part of the new Amazon Music service integration, we have added support for MPEG-DASH streams using the Common Encryption (CENC) standard [10] with Widevine. This is especially desirable for service providers like Amazon Music that offer high-bitrate audio streams encoded with a lossless codec (e.g., 24-bit FLAC Ultra HD), because the quality of the streamed audio is quite close to that of the original studio recording. Such streams call for stronger content protection than traditional low-bitrate streams encoded with a lossy codec.
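In a DASH manifest, protection is signaled with `ContentProtection` elements. The Python sketch below checks a hand-written sample MPD for the generic CENC signal and for the Widevine system identifier. The sample manifest is invented and omits details a real one would carry (such as key identifiers and license data), and the helper names are hypothetical; the two scheme URIs are the standard CENC and Widevine identifiers.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Generic "this content uses MP4 common encryption" signal.
CENC_SCHEME = "urn:mpeg:dash:mp4protection:2011"
# DRM system identifier registered for Widevine.
WIDEVINE_SCHEME = "urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"

# Hand-written, simplified sample manifest (illustrative only).
PROTECTED_MPD = """\
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="audio/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"/>
      <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"/>
      <Representation id="audio-1600" bandwidth="1600000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

CLEAR_MPD = """\
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="audio/mp4">
      <Representation id="audio-320" bandwidth="320000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""


def protection_schemes(mpd_text: str) -> set[str]:
    """Collect the schemeIdUri of every ContentProtection element."""
    root = ET.fromstring(mpd_text)
    return {
        cp.get("schemeIdUri", "").lower()
        for cp in root.iterfind(".//mpd:ContentProtection", NS)
    }


def uses_cenc_with_widevine(mpd_text: str) -> bool:
    """True if the manifest signals both CENC and the Widevine DRM system."""
    schemes = protection_schemes(mpd_text)
    return CENC_SCHEME in schemes and WIDEVINE_SCHEME in schemes


print(uses_cenc_with_widevine(PROTECTED_MPD))  # True
print(uses_cenc_with_widevine(CLEAR_MPD))      # False
```

Because CENC separates the encryption format from the DRM system, the same encrypted segments can in principle be paired with more than one DRM system, each announced by its own `ContentProtection` element.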

Happy music streaming with Sonos

We have adopted MPEG-DASH, an industry-standard adaptive streaming technology, to allow Sonos speakers to dynamically switch streams based on bandwidth conditions. Our implementation of an adaptive streaming algorithm enables Sonos users to experience high-bitrate audio content (UHD and Dolby Atmos) smoothly, with minimal audio dropouts.

Along with MPEG-DASH, we have also added support for streams protected using CENC with Widevine, which offers a strong DRM protection option for music service providers.

These technical innovations arrive just in time for new premium music services featuring Hi-Res and spatial audio, which require higher bandwidth and stronger content protection than traditional music streaming. We are proud of the innovations we have achieved, and we hope to bring an even better streaming experience to Sonos customers.

Enjoy happy music streaming with Sonos!

References

[1] Julia Stoll, SVOD service user shares in the U.S. 2015-2021, Statista.

[2] Tim Ingham, Nearly a third of people in the US are using music streaming subscriptions, Music Business Worldwide.

[3] Mark Beech, COVID-19 Pushes Up Internet Use 70% And Streaming More Than 12%, First Figures Reveal, Forbes.

[4] Sonos, Music services on Sonos.

[5] Sonos, Sonos Service Region.

[6] Alex Zambelli, A history of media streaming and the future of connected TV, the Guardian.

[7] Lucian Popa et al., HTTP: An Evolvable Narrow Waist for the Future Internet, EECS University of California at Berkeley, Technical Report No. UCB/EECS-2012-5.

[8] Liz Gannes, The Next Big Thing in Video: Adaptive Bitrate Streaming, GigaOM.

[9] Theodoros Karagkioules et al., A Comparative Case Study of HTTP Adaptive Streaming Algorithms in Mobile Networks, Proceedings of the 27th Workshop on Network and Operating Systems Support for Digital Audio and Video, ACM.

[10] Unified Streaming, Common Encryption (CENC), Unified Streaming Documentation.


© 2022 by Sonos, Inc.
All rights reserved. Sonos and Sonos product names are trademarks or registered trademarks of Sonos, Inc.
All other product names and services may be trademarks or service marks of their respective owners.