How Libvorbis Encodes Silent Audio to Reduce File Size
This article explains the mechanisms the libvorbis encoder uses to compress silent and near-silent passages in audio files. By combining psychoacoustic modeling, a minimal spectral floor, and residue coding built on vector quantization, libvorbis sharply reduces the bitrate during silent periods, so that silence consumes very little storage.
The Role of Variable Bitrate (VBR)
Libvorbis is natively a variable bitrate (VBR) encoder. Instead of allocating a fixed number of bits to every second of audio, it spends bits according to the complexity of the signal. A silent passage has almost no spectral content to describe, so the encoder emits very small packets and the instantaneous bitrate drops to a bare minimum, often below 1 kbps. The exact figure depends on the block size in use and on container overhead.
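To make the effect on file size concrete, the arithmetic can be sketched as a weighted average. The durations and bitrates below are hypothetical round numbers chosen for illustration, not measurements of libvorbis output.

```python
def average_bitrate_kbps(segments):
    """Average bitrate of a track given (duration_seconds, bitrate_kbps) segments."""
    total_seconds = sum(duration for duration, _ in segments)
    total_kilobits = sum(duration * rate for duration, rate in segments)
    return total_kilobits / total_seconds


def size_bytes(segments):
    """Payload size in bytes, taking 1 kbps as 1000 bits per second."""
    total_kilobits = sum(duration * rate for duration, rate in segments)
    return total_kilobits * 1000 / 8


# Hypothetical track: 60 s of music at 128 kbps followed by 30 s of silence at 1 kbps.
track = [(60.0, 128.0), (30.0, 1.0)]

average = average_bitrate_kbps(track)   # about 85.7 kbps over the whole track
total = size_bytes(track)               # 963750.0 bytes
silence_only = size_bytes([(30.0, 1.0)])  # 3750.0 bytes for the silent 30 s

print(f"average: {average:.1f} kbps, total: {total:.0f} bytes, silence: {silence_only:.0f} bytes")
```

Under these assumed numbers the silent third of the track accounts for well under one percent of the bytes.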
Frequency Analysis and MDCT
The encoding process begins by converting overlapping, windowed blocks of time-domain samples into the frequency domain with the Modified Discrete Cosine Transform (MDCT). When the input is digital silence, every MDCT coefficient (each of which represents the energy in one frequency bin) is exactly zero; when the input is only near-silent, the coefficients carry low-level noise. The encoder’s psychoacoustic model compares the spectrum against masking curves that include the absolute threshold of hearing. Components that fall below that threshold are inaudible, so the encoder does not spend bits on them and the frame can be treated as silent.
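The following minimal sketch shows the idea with a direct, unoptimized MDCT and the Vorbis window shape. The single flat `threshold` value is a hypothetical stand-in for the encoder's frequency-dependent curves, and libvorbis itself uses a fast transform rather than this O(N^2) loop.

```python
import math


def vorbis_window(length):
    """Vorbis window shape: sin(pi/2 * sin^2((n + 0.5) / length * pi))."""
    return [
        math.sin(0.5 * math.pi * math.sin((n + 0.5) / length * math.pi) ** 2)
        for n in range(length)
    ]


def mdct(block):
    """Direct MDCT: 2N windowed input samples produce N coefficients."""
    two_n = len(block)
    half = two_n // 2
    window = vorbis_window(two_n)
    coefficients = []
    for k in range(half):
        acc = 0.0
        for n in range(two_n):
            angle = math.pi / half * (n + 0.5 + half / 2) * (k + 0.5)
            acc += window[n] * block[n] * math.cos(angle)
        coefficients.append(acc)
    return coefficients


def is_inaudible(coefficients, threshold):
    """True if every coefficient magnitude is under the (hypothetical) flat threshold."""
    return all(abs(c) < threshold for c in coefficients)


BLOCK = 64
silence = [0.0] * BLOCK
faint_noise = [1e-6 if n % 2 == 0 else -1e-6 for n in range(BLOCK)]
tone = [math.sin(2 * math.pi * 5.5 * n / BLOCK) for n in range(BLOCK)]

silence_coeffs = mdct(silence)
noise_coeffs = mdct(faint_noise)
tone_coeffs = mdct(tone)

THRESHOLD = 1e-3  # illustrative value only, not taken from libvorbis
print(is_inaudible(silence_coeffs, THRESHOLD))
print(is_inaudible(noise_coeffs, THRESHOLD))
print(is_inaudible(tone_coeffs, THRESHOLD))
```

The silent and faint-noise blocks both fall under the threshold, while the full-scale tone does not.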
Floor Curve Simplification
Vorbis represents each channel's spectrum with two components: the “floor” (a coarse envelope of the spectrum) and the “residue” (the fine structure that remains once the floor is removed). For a silent frame the encoder does not need to describe an envelope at all. In the floor 1 format that libvorbis produces, the floor data for each channel begins with a one-bit flag; when that flag is cleared, the floor is marked as unused and the decoder outputs zeros for that channel in that frame. A silent channel therefore costs a single bit of floor data, whereas an active frame must encode the amplitude of every point on the floor curve.
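The cost difference between a silent and an active floor can be sketched with a toy bit packer. This is a simplified illustration, not the real floor 1 layout: the fixed 7-bit amplitude width and the function names are assumptions, and the actual format entropy-codes the curve points with codebooks.

```python
def encode_floor(amplitudes, bits_per_value=7):
    """Toy floor encoder: one flag bit, then fixed-width amplitudes if any are nonzero."""
    if all(a == 0 for a in amplitudes):
        return [0]  # flag cleared: floor unused, nothing else is written
    bits = [1]      # flag set: amplitudes follow
    for a in amplitudes:
        # Least significant bit first, matching the Vorbis bit-packing convention.
        bits.extend((a >> i) & 1 for i in range(bits_per_value))
    return bits


def decode_floor(bits, count, bits_per_value=7):
    """Toy floor decoder: returns None for an unused floor, else the amplitudes."""
    if bits[0] == 0:
        return None
    values = []
    position = 1
    for _ in range(count):
        value = 0
        for i in range(bits_per_value):
            value |= bits[position + i] << i
        position += bits_per_value
        values.append(value)
    return values


silent_floor = [0] * 8
active_floor = [10, 20, 30, 40, 50, 60, 70, 80]

silent_bits = encode_floor(silent_floor)
active_bits = encode_floor(active_floor)

print(len(silent_bits), len(active_bits))
print(decode_floor(silent_bits, 8), decode_floor(active_bits, 8))
```

In this toy layout the silent floor costs 1 bit and the eight-point active floor costs 57 bits.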
Residue Coding and Vector Quantization
The residue is what remains of the spectrum once the floor has been removed; the decoder multiplies the floor curve by the decoded residue to rebuild the spectrum. In digital silence the residue is all zeros. For active audio, libvorbis splits the residue into fixed-size partitions, assigns each partition a classification, and encodes the values with entropy-coded Vector Quantization (VQ) codebooks.
To keep silent frames small, the format avoids writing out zeros bin by bin. Vorbis lets a frame flag a channel's floor as unused, and when it does, the decoder does not read any residue for that channel. In near-silent frames, partitions whose contents quantize to zero are typically assigned a classification that has no codebooks attached, so only the short classification codeword is stored, and one codeword covers several partitions at once.
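A minimal sketch of partition classification, assuming a hypothetical two-class setup in which class 0 has no codebooks and class 1 codes every value. Real libvorbis setups use more classes and choose them from tuned thresholds, so the partition size and rule here are illustrative only.

```python
def classify_partitions(residue, partition_size=32):
    """Assign class 0 (nothing coded) to all-zero partitions, class 1 otherwise."""
    classes = []
    for start in range(0, len(residue), partition_size):
        partition = residue[start:start + partition_size]
        peak = max(abs(v) for v in partition)
        classes.append(0 if peak == 0 else 1)
    return classes


def coded_value_count(residue, partition_size=32):
    """Count residue values that would actually be sent to a VQ codebook."""
    count = 0
    for index, cls in enumerate(classify_partitions(residue, partition_size)):
        if cls != 0:
            start = index * partition_size
            count += len(residue[start:start + partition_size])
    return count


silent_residue = [0] * 256

# Near-silent frame: a single nonzero quantized value in the second partition.
near_silent_residue = [0] * 256
near_silent_residue[40] = 1

silent_classes = classify_partitions(silent_residue)
near_silent_classes = classify_partitions(near_silent_residue)

print(silent_classes, coded_value_count(silent_residue))
print(near_silent_classes, coded_value_count(near_silent_residue))
```

Only the one partition that contains a nonzero value is passed to a codebook; the other seven contribute nothing beyond their share of the classification codeword.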
Frame Size Selection
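Each Vorbis stream declares two block sizes, a short one and a long one, each a power of two between 64 and 8192 samples (libvorbis commonly uses 256 and 2048 at 44.1 kHz). The encoder chooses short blocks around transients to limit pre-echo, and long blocks for steady-state audio, which includes silence. Because consecutive blocks overlap by half, a run of long blocks produces one packet per half block length, so long blocks cover a stretch of silence with far fewer packets than short blocks would. Fewer packets means less per-packet overhead in the bitstream and its container.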
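The packet-count difference can be estimated with a short calculation. The sketch assumes a steady run of equal-size blocks and counts only one Ogg lacing byte per small packet; page headers and the packet payload itself are ignored, and the 256/2048 pair is used as a representative example.

```python
def packets_per_second(sample_rate, block_size):
    """Equal-size consecutive blocks overlap by half, so each packet advances block_size / 2 samples."""
    return sample_rate / (block_size / 2)


def lacing_bytes(seconds, sample_rate, block_size):
    """Assumed container cost: one Ogg lacing byte per packet shorter than 255 bytes."""
    return seconds * packets_per_second(sample_rate, block_size)


SAMPLE_RATE = 44100
SHORT_BLOCK = 256
LONG_BLOCK = 2048

short_rate = packets_per_second(SAMPLE_RATE, SHORT_BLOCK)
long_rate = packets_per_second(SAMPLE_RATE, LONG_BLOCK)

print(f"short blocks: {short_rate:.1f} packets/s")
print(f"long blocks:  {long_rate:.1f} packets/s")
print(f"ratio: {short_rate / long_rate:.0f}x")
print(f"lacing bytes for 10 s of long blocks: {lacing_bytes(10, SAMPLE_RATE, LONG_BLOCK):.0f}")
```

With these sizes, long blocks need one eighth as many packets as short blocks to cover the same stretch of silence.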