pyFLAC: Real-time lossless audio compression in Python
Senior Software Engineer, Advanced Technology
FLAC is the go-to compression algorithm for audio if you want to maintain a perfect reconstruction of the original data. Other audio compression techniques such as MP3 or AAC can remove perceptually redundant information in the signal. This might be OK for humans, but can distort the data from the perspective of digital signal processing algorithms.
Here in the Advanced Technology research department of Sonos, we are investigating ways in which audio can be used to infer and compensate for particular characteristics of our environment. One example of this is Trueplay, in which audio is used to capture the response of a particular room, which can subsequently be used to tune the speakers for optimized sound quality. You can read more about this in the excellent blog post from our very own Tim Sheen.
When researching and developing new features like Trueplay, we have to undertake large data collection efforts to gather an accurate picture of the many different locations in which one might find a Sonos speaker. In the initial stages of development, we often begin this process on prototype hardware such as a Raspberry Pi, in order to prove that an idea can be realised in practice. We may also need to train and verify machine learning models on the collected data before making further commitments to development.
When collecting raw audio data, the throughput can get reasonably large as more microphones are introduced, so any reductions we can make without removing information from the signal are worthwhile. For example, if we are transmitting 16-bit audio data from 3 devices, each with 4 microphones at 44.1kHz, then the throughput is just above 8Mbps. Since a perfect reconstruction of the original data is required, FLAC is the obvious candidate to reduce the bandwidth, and can do so by around 50% in many contexts. We will present the effectiveness of the FLAC algorithm in different scenarios later in the article.
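The throughput figure quoted above can be checked with a few lines of arithmetic. The function name here is purely illustrative and is not part of pyFLAC.

```python
def raw_throughput_bps(bit_depth, num_devices, mics_per_device, sample_rate_hz):
    """Uncompressed PCM throughput in bits per second."""
    return bit_depth * num_devices * mics_per_device * sample_rate_hz


bps = raw_throughput_bps(
    bit_depth=16, num_devices=3, mics_per_device=4, sample_rate_hz=44_100
)
print(f"{bps / 1e6:.2f} Mbps")  # 8.47 Mbps
```

A reduction of around 50% would bring this down to roughly 4.2 Mbps.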
FLAC: Background and limitations
FLAC stands for Free Lossless Audio Codec. Maintained by Xiph.org, it is free and open-source, and remains the most widely supported lossless audio codec. At Sonos, supporting high-quality audio is one of our key principles, and so FLAC playback is supported across our product line.
The FLAC library itself (libFLAC) is written in C, which is great for performance and efficiency, but not so great for fast-paced research and development. We often select Python as our language of choice for internal R&D projects, as it means less time writing code, and more time focussing on the research itself.
There are many different existing Python implementations for FLAC encoding/decoding. However, these tend to operate on files, rather than real-time streams, which is no good for continuous processing. Of course, we should never reinvent the wheel; instead, we can use CFFI to expose the functionality of libFLAC in Python.
So, welcome to pyFLAC: a Python library for real-time lossless audio compression using libFLAC.
In the spirit of open source, we are releasing pyFLAC as a free-to-use package which can be installed directly from PyPI using pip.
pip3 install pyflac
Below is a simple example of how you might use pyFLAC alongside python-sounddevice to capture raw audio data from a microphone, and then add the compressed audio to a queue for processing in a separate thread.
```python
import queue

import pyflac
import sounddevice as sd


class FlacAudioStream:
    def __init__(self):
        # Capture 16-bit audio from the default input device.
        self.stream = sd.InputStream(dtype='int16', callback=self.audio_callback)
        self.encoder = pyflac.StreamEncoder(
            callback=self.encoder_callback,
            sample_rate=self.stream.samplerate,
        )
        self.queue = queue.SimpleQueue()

    def audio_callback(self, indata, frames, sd_time, status):
        # Called by sounddevice with a numpy array of raw audio.
        self.encoder.process(indata)

    def encoder_callback(self, buffer, num_bytes, num_samples, current_frame):
        # Called by pyFLAC whenever compressed data is ready.
        self.queue.put(buffer)


audio = FlacAudioStream()
audio.stream.start()
```
The numpy array of raw audio from sounddevice is passed directly to the pyFLAC encoder. This is just for the purpose of illustration; in practice the encoding should be done outside of the high-priority audio callback, and a more detailed example is included in the pyFLAC documentation. Once the FLAC encoder has compressed data ready, the encoder callback will be called. In this simple example, we add the data to a queue to be processed in a separate thread.
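The separate processing thread is not shown in the example above. A minimal sketch of what it could look like, using only the Python standard library, is given here. The function names and the use of None as a stop signal are our own assumptions for illustration, not part of pyFLAC.

```python
import queue
import threading


def drain_queue(q, handle_chunk, stop_sentinel=None):
    """Pass each queued chunk to handle_chunk until the sentinel arrives."""
    while True:
        chunk = q.get()  # blocks until an item is available
        if chunk is stop_sentinel:
            break
        handle_chunk(chunk)


def run_consumer(chunks):
    """Push chunks through a queue and collect them on a worker thread."""
    q = queue.SimpleQueue()
    received = []
    worker = threading.Thread(target=drain_queue, args=(q, received.append))
    worker.start()
    for chunk in chunks:
        q.put(chunk)  # in the real stream, the encoder callback does this
    q.put(None)  # hypothetical stop signal for this sketch
    worker.join()
    return received


print(run_consumer([b"frame-1", b"frame-2"]))
```

In a real application, handle_chunk might send the compressed bytes over the network or append them to a file.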
Conversely, when pulling down data from FLAC-compressed audio streams, you can run it through the pyFLAC decoder so that the original numpy array of raw audio is passed back via the callback. The example below uses the SoundFile Python library to write the decoded data to a WAV file.
```python
import queue

import pyflac
import soundfile as sf


class FlacAudioStream:
    def __init__(self):
        self.output = None
        self.queue = queue.SimpleQueue()
        self.decoder = pyflac.StreamDecoder(callback=self.callback)

    def process(self):
        # Feed every queued chunk of compressed FLAC bytes to the decoder.
        while not self.queue.empty():
            self.decoder.process(self.queue.get())

    def callback(self, data, sample_rate, num_channels, num_samples):
        # Open the output file lazily, once the stream format is known.
        if self.output is None:
            self.output = sf.SoundFile(
                'output.wav', mode='w', channels=num_channels, samplerate=sample_rate
            )
        self.output.write(data)


audio = FlacAudioStream()
# Assumes audio.queue has already been populated with compressed FLAC data.
audio.process()
```
The pyFLAC decoder accepts compressed bytes of FLAC data via the process method. The decoder process method is non-blocking, as data is processed in a background thread. In this simple example, we assume that the queue has already been populated with compressed audio data. The decoder will call the callback when it has a numpy array of raw audio samples ready. Here, we just save them to a .wav file using soundfile as an example, although pyFLAC also includes some helper classes for converting to/from WAV files directly.
The pyFLAC encoder accepts some other arguments to its constructor which are not listed above:
blocksize: allows the user to specify the number of samples to be returned in the callback (by default this is left to libFLAC to determine an appropriate size).
verify: when set to True, the encoder will pass the data back through an internal decoder to verify the original signal against the decoded signal. An exception is raised if a mismatch occurs.
compression_level: an integer ranging from 0 to 8, which denotes the amount of compression you wish to apply. Level 0 is the fastest but compresses the least, 5 is the default and usually the best option, and 8 is the slowest but compresses the most. This table gives more detail on what is happening internally between compression level settings. The implications of using each of these levels in different scenarios are shown in the next section.
The FLAC algorithm attains different levels of compression on different kinds of audio data depending on the content. A more predictable audio signal can be compressed much more than white noise can, for example.
The graph below shows the effectiveness of the FLAC algorithm at each compression level setting, for three different types of content:
Music: Intervals - 5HTP (Stereo @ 44.1kHz)
Voice: Obama speech (Mono @ 44.1kHz)
5.1 Surround: 1917 movie trailer (6 channels @ 44.1kHz)
The compression ratio is calculated as the compressed size divided by the original size, so a lower number means better compression. The FLAC algorithm clearly compresses the 5.1 surround sound clip very well, probably because there is no content in the LFE and rear surround channels some of the time, so run-length encoding can be applied to compress the data in these periods of silence. The linear predictive coding (LPC) stage of the algorithm does a great job of approximating the speech data, such that the errors are small enough to be efficiently compressed with Golomb-Rice coding. You can see the effectiveness of this improve as the LPC order is increased above compression level 2.
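To give a feel for why prediction followed by Golomb-Rice coding works so well on predictable signals, here is a simplified sketch. It is not libFLAC's implementation: it uses a fixed second-order predictor rather than computing LPC coefficients, it only counts bits instead of writing a bitstream, and it ignores frame and subframe headers. All function names are our own.

```python
import math


def fixed_predictor_residuals(samples, order=2):
    """Residuals left after predicting each sample from the previous ones.

    Order 1 predicts s[n] = s[n-1]; order 2 predicts s[n] = 2*s[n-1] - s[n-2].
    The first `order` samples are warm-up values that would be stored
    verbatim, so they are not included in the returned residuals.
    """
    if order == 1:
        return [samples[n] - samples[n - 1] for n in range(1, len(samples))]
    if order == 2:
        return [samples[n] - 2 * samples[n - 1] + samples[n - 2]
                for n in range(2, len(samples))]
    raise ValueError("this sketch only supports order 1 or 2")


def rice_bits(residual, k):
    """Number of bits a Rice code with parameter k spends on one residual."""
    # Fold signed values onto non-negative ones: 0, -1, 1, -2 -> 0, 1, 2, 3
    folded = 2 * residual if residual >= 0 else -2 * residual - 1
    quotient = folded >> k
    return quotient + 1 + k  # unary quotient, stop bit, k-bit remainder


def best_rice_size(residuals, max_k=14):
    """Smallest total bit count over Rice parameters 0..max_k."""
    return min(sum(rice_bits(r, k) for r in residuals) for k in range(max_k + 1))


# A 440 Hz tone sampled at 44.1 kHz, quantised to integers.
tone = [round(10_000 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(4096)]
residuals = fixed_predictor_residuals(tone, order=2)

raw_bits = 16 * len(tone)
coded_bits = 16 * 2 + best_rice_size(residuals)  # two warm-up samples + residuals
print(f"raw: {raw_bits} bits, predicted + Rice coded: {coded_bits} bits")
```

Because the tone is so predictable, the residuals are far smaller than the samples themselves, so each one needs only a handful of bits instead of 16.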
These techniques are less effective on the music track, but the FLAC algorithm has another trick up its sleeve to compress the data. In this case, the music is a stereo mix, and there are only subtle differences between the left and right channels. The FLAC algorithm can exploit these similarities using a technique called mid-side encoding. You can read more about how FLAC performs when compressing different types of music at the FLAC website.
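Mid-side encoding is easy to sketch in a few lines. The snippet below follows the integer arithmetic described for the FLAC format, where the bit dropped from the mid channel is recovered from the parity of the side channel, so the transform stays lossless. The function names are ours, and a real encoder works on whole blocks rather than single samples.

```python
def to_mid_side(left, right):
    """Convert one left/right sample pair to mid/side."""
    mid = (left + right) >> 1  # average, rounded down
    side = left - right        # difference between the channels
    return mid, side


def from_mid_side(mid, side):
    """Recover the exact left/right pair from mid/side."""
    # left + right and left - right share the same parity, so the low bit
    # lost by the shift above can be restored from the side value.
    total = (mid << 1) | (side & 1)
    left = (total + side) >> 1
    right = (total - side) >> 1
    return left, right


left_channel = [1000, 1002, 998]
right_channel = [1001, 1001, 997]
for l, r in zip(left_channel, right_channel):
    mid, side = to_mid_side(l, r)
    assert from_mid_side(mid, side) == (l, r)
    print(f"L={l} R={r} -> mid={mid} side={side}")
```

When the two channels are nearly identical, the side channel stays close to zero and can be coded with very few bits.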
The highest compression level yields the smallest file size at the cost of taking more time to process. The following benchmarks, showing the relationship between compression level and CPU usage, were recorded on a Raspberry Pi 4B using just one core of the Cortex-A72 processor. The CPU usage is determined by averaging the time taken to encode a block of audio, and dividing this by the period of one block. Measurements are taken for each different content type and at different block sizes.
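The CPU usage calculation described above can be sketched as follows. This is only an illustration of the arithmetic, not our actual benchmarking code; encode_block stands in for whatever function encodes one block of audio, and the other names are hypothetical too.

```python
import time


def time_blocks(encode_block, blocks):
    """Return the wall-clock time in seconds taken to encode each block."""
    times = []
    for block in blocks:
        start = time.perf_counter()
        encode_block(block)
        times.append(time.perf_counter() - start)
    return times


def cpu_usage(encode_times_s, blocksize, sample_rate_hz):
    """Mean encode time per block divided by the duration of one block."""
    mean_encode_time = sum(encode_times_s) / len(encode_times_s)
    block_period = blocksize / sample_rate_hz
    return mean_encode_time / block_period


# Blocks of 4410 samples at 44.1 kHz last 0.1 s each, so a mean encode
# time of 0.02 s corresponds to about 20% of one core.
usage = cpu_usage([0.01, 0.03], blocksize=4410, sample_rate_hz=44_100)
print(f"{usage:.0%}")
```

A value of 1.0 (100%) would mean the encoder only just keeps up with real time on that core.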
As expected, changing the block size has little effect on the relative processing time, whereas the compression level has a significant impact. The default compression level (5) tends to be the best choice, as we see diminishing returns when selecting higher levels, which may not justify the additional processing time for typical application areas.
Since FLAC is tailored to compress audio using different bespoke techniques, we wondered how it compares to more generic compression algorithms such as GZIP. The chart below plots the FLAC-compressed size against the GZIP-compressed size and the original WAV size for the three content types.
GZIP uses the DEFLATE algorithm, which assigns “Huffman” codes (where shorter codes are assigned to more common elements) to compress the data. Much like the run-length encoding in FLAC, it also replaces repeated sequences, such as periods of silence, with short back-references. However, the extra techniques in the FLAC toolkit, such as mid-side encoding and linear predictive coding alongside Golomb-Rice coding, mean that much greater compression ratios can be achieved.
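The behaviour of a generic compressor at the two extremes is easy to reproduce with Python's built-in gzip module. The sketch below uses synthetic silence and white noise rather than the clips from our charts, so the numbers are not comparable with the results above; the helper names are our own.

```python
import gzip
import random
import struct


def compression_ratio(compressed, original):
    """Compressed size divided by original size (lower is better)."""
    return len(compressed) / len(original)


def to_pcm16(samples):
    """Pack integer samples as little-endian 16-bit PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
silence = to_pcm16([0] * 44_100)
noise = to_pcm16([rng.randint(-32768, 32767) for _ in range(44_100)])

silence_ratio = compression_ratio(gzip.compress(silence), silence)
noise_ratio = compression_ratio(gzip.compress(noise), noise)
print(f"silence: {silence_ratio:.4f}, white noise: {noise_ratio:.4f}")
```

Silence collapses to almost nothing, while white noise barely compresses at all. Real audio sits somewhere in between, which is where FLAC's audio-specific modelling makes the difference.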
Ultimately, the pyFLAC package has allowed us to transmit high-quality audio data more efficiently in our internal research projects, reducing data rates by around 40%. The ability to apply this to raw audio streams in real time means it can support large data collections without the need for additional post-processing from disk. We hope that by releasing the package on PyPI, others may also benefit from lossless audio compression in their Python projects.
If you would like to try out pyFLAC yourself you can find more comprehensive examples in the documentation at readthedocs.org. Furthermore, if you wish to contribute to the pyFLAC project or just view the source code, then head over to github.com/sonos/pyFLAC.