How Libvorbis Calculates Audio Residue Curves
This article explains how the libvorbis encoder calculates and processes the residue curve during Ogg Vorbis audio compression. It covers how the Modified Discrete Cosine Transform (MDCT) and the spectral floor relate to each other, and how the remaining spectral data, known as the residue, is normalized, quantized, and prepared for bitstream encoding.
To compress audio efficiently, libvorbis separates the frequency spectrum of an audio signal into two primary components: the floor (the general spectral envelope) and the residue (the fine-grained spectral detail that remains once the envelope is removed). Encoding the two parts separately lets the encoder spend fewer bits on detail that its psychoacoustic model predicts listeners will not hear.
Step 1: MDCT Transformation
Before any residue calculation occurs, the input time-domain audio samples are grouped into overlapping windows and transformed into the frequency domain using the Modified Discrete Cosine Transform (MDCT). A window of N samples yields N/2 MDCT coefficients, which together represent the frequency spectrum of that audio frame.
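The transform in this step can be sketched directly from the MDCT definition. The following is a minimal, unoptimized illustration and not libvorbis code: the function names are hypothetical, the plain sine window is a stand-in for the window Vorbis actually uses, and the O(N^2) loop replaces the fast algorithm a real encoder would need.

```python
import math

def sine_window(length):
    """Sine window (an illustrative stand-in, not the Vorbis window)."""
    return [math.sin(math.pi * (t + 0.5) / length) for t in range(length)]

def mdct(block):
    """Direct O(N^2) MDCT: a block of 2N samples yields N coefficients."""
    if len(block) % 2 != 0:
        raise ValueError("block length must be even")
    two_n = len(block)
    n = two_n // 2
    window = sine_window(two_n)
    coeffs = []
    for k in range(n):
        acc = 0.0
        for t in range(two_n):
            phase = math.pi / n * (t + 0.5 + n / 2) * (k + 0.5)
            acc += window[t] * block[t] * math.cos(phase)
        coeffs.append(acc)
    return coeffs

# A 16-sample frame produces 8 frequency-domain coefficients.
frame = [math.sin(0.3 * t) for t in range(16)]
print(len(mdct(frame)))  # 8
```

Consecutive blocks would overlap by half their length, so each input sample contributes to two frames.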
Step 2: Generating the Floor Curve
The psychoacoustic model analyzes the MDCT coefficients to estimate the masking thresholds of the human ear, identifying which quiet sounds are drowned out by louder sounds at nearby frequencies. From this analysis, libvorbis generates a “floor curve”. The Vorbis format defines two floor representations: Floor Type 0, based on line spectral pairs, and Floor Type 1, a piecewise-linear curve on a decibel scale, which is the type libvorbis uses in practice when encoding. Either way, the curve is a coarse approximation of the spectral envelope.
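To make the idea of a piecewise-linear floor concrete, here is a simplified sketch. It assumes the curve is described by a handful of (bin, level in dB) posts and interpolates between them with floating-point arithmetic; `render_floor` and the post values are hypothetical, and the real Floor Type 1 format uses integer line rendering and quantized amplitudes rather than this calculation.

```python
def render_floor(posts, n_bins):
    """Interpolate (bin, dB) posts into a linear-amplitude floor curve.

    posts must be sorted by bin, start at bin 0 and end at bin n_bins - 1.
    """
    if posts[0][0] != 0 or posts[-1][0] != n_bins - 1:
        raise ValueError("posts must span the whole spectrum")
    floor = []
    seg = 0
    for i in range(n_bins):
        # Advance to the segment whose right-hand post is at or beyond bin i.
        while seg + 2 < len(posts) and i > posts[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = posts[seg], posts[seg + 1]
        frac = (i - x0) / (x1 - x0)
        level_db = y0 + frac * (y1 - y0)
        # Convert decibels to linear amplitude.
        floor.append(10.0 ** (level_db / 20.0))
    return floor

# Envelope falling from 0 dB to -20 dB over bins 0..4, then flat to bin 8.
curve = render_floor([(0, 0.0), (4, -20.0), (8, -20.0)], 9)
```

Interpolating in decibels rather than in linear amplitude means a straight segment corresponds to an exponential change in amplitude.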
Step 3: Spectral Normalization (Residue Calculation)
The residue curve is calculated by dividing the original MDCT coefficients by the synthesized floor curve. Mathematically, for each frequency bin \(i\):
\[\text{Residue}[i] = \frac{\text{MDCT}[i]}{\text{Floor}[i]}\]
This process is called spectral normalization or flattening. Dividing the MDCT coefficients by the floor removes the broad dynamic range of the spectrum: wherever the floor tracks the signal level closely, the quotient has a magnitude near 1 regardless of how loud that region is in absolute terms. The resulting residue is a flattened, signed signal centered around zero that carries only the fine structural detail of the audio.
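The division above is simple enough to show in a few lines. This is a sketch of the formula only, with a hypothetical `compute_residue` helper and an assumed `eps` guard against division by zero; it does not reproduce how libvorbis organizes the computation internally.

```python
def compute_residue(mdct_coeffs, floor_curve, eps=1e-12):
    """Residue[i] = MDCT[i] / Floor[i], guarding against a zero floor."""
    if len(mdct_coeffs) != len(floor_curve):
        raise ValueError("MDCT and floor must have the same length")
    return [c / max(f, eps) for c, f in zip(mdct_coeffs, floor_curve)]

# The spectrum spans a 32:1 range; the residue does not.
spectrum = [8.0, -4.0, 0.5, -0.25]
floor = [8.0, 4.0, 0.5, 0.25]
print(compute_residue(spectrum, floor))  # [1.0, -1.0, 1.0, -1.0]
```

The floor is positive, so the sign of each MDCT coefficient carries over into the residue.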
Step 4: Vector Quantization of the Residue
Once the residue has been computed, it is divided into partitions along the frequency axis, and libvorbis compresses each partition with Vector Quantization (VQ):
- Partitioning: The residue spectrum is split into sub-vectors of a specific dimension (e.g., groups of 2, 4, or 8 coefficients).
- Codebook Matching: The encoder compares these sub-vectors against pre-defined multidimensional lookup tables called codebooks.
- Quantization: The encoder finds the codebook entry that most closely matches the residue sub-vector and replaces the actual audio data with the index of that entry.
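The three steps above amount to a nearest-neighbor search, sketched below. The codebook, the `quantize` helper, and the exhaustive squared-error search are all illustrative assumptions; real Vorbis codebooks are trained tables shipped in the stream headers, and the encoder's actual search strategy is not shown here.

```python
def quantize(residue, codebook, dim):
    """Replace each dim-sized sub-vector with the index of its nearest codeword."""
    if len(residue) % dim != 0:
        raise ValueError("residue length must be a multiple of dim")
    indices = []
    for start in range(0, len(residue), dim):
        vec = residue[start:start + dim]
        # Exhaustive search for the codeword with the smallest squared error.
        best = min(
            range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(vec, codebook[j])),
        )
        indices.append(best)
    return indices

# Toy 2-dimensional codebook with five entries (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
residue = [0.9, 1.2, -0.1, 0.05, -1.1, 0.8]
print(quantize(residue, codebook, 2))  # [1, 0, 3]
```

Six residue values become three small integers, which is where the compression comes from; the cost is the difference between each sub-vector and the codeword chosen for it.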
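During decoding, the Vorbis decoder reverses this process. It decodes the floor curve, decodes the residue indices back into normalized vectors using the shared codebooks, and multiplies the residue by the floor curve. Because quantization is lossy, the result is an approximation of the original MDCT spectrum rather than an exact copy; an inverse MDCT and overlap-add then turn it back into time-domain audio.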
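The decoder side can be sketched in the same style. The helper names, the codebook, and the floor values are hypothetical and chosen only for illustration; the point is that a table lookup followed by a per-bin multiplication undoes the normalization.

```python
def dequantize(indices, codebook):
    """Look up each index and concatenate the codewords into a flat residue."""
    residue = []
    for idx in indices:
        residue.extend(codebook[idx])
    return residue

def reconstruct_spectrum(indices, codebook, floor_curve):
    """Approximate MDCT[i] as decoded Residue[i] * Floor[i]."""
    residue = dequantize(indices, codebook)
    if len(residue) != len(floor_curve):
        raise ValueError("decoded residue and floor must have the same length")
    return [r * f for r, f in zip(residue, floor_curve)]

# Same toy codebook on both sides (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
floor = [2.0, 2.0, 1.0, 1.0, 0.5, 0.5]
print(reconstruct_spectrum([1, 0, 3], codebook, floor))
# [2.0, 2.0, 0.0, 0.0, -0.5, 0.5]
```

The output contains only values reachable from codebook entries, which illustrates why the reconstructed spectrum can differ from the encoder's input.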