How libvorbis Implements Sample-Accurate Seeking
This article explains how the libvorbis reference
library achieves exact, sample-accurate seeking within compressed Vorbis
audio streams. Because Vorbis is a variable-bitrate, packet-based audio
codec usually encapsulated in the Ogg container, seeking to a precise
sample requires a coordinated effort between the container layer’s
framing metadata and the codec’s windowing engine. By utilizing Ogg
granule positions, bisection search algorithms, and decoder “preroll”
discarding, libvorbis ensures that players can jump to any
individual sample without audio drift or decoding artifacts.
The Role of the Ogg Container and Granule Positions
Vorbis audio packets do not have a fixed size or duration, meaning a
decoder cannot calculate a sample’s location in a file simply by
multiplying time by a constant bitrate. To solve this,
libvorbis relies on the Ogg container format.
The Ogg container groups Vorbis packets into “pages.” Each Ogg page header contains a 64-bit field called the granulepos (granule position). In a Vorbis stream, the granulepos is the absolute count of PCM samples (per channel) that have been produced once the last packet completing on that page is decoded; a page on which no packet completes carries the special value -1. This metadata acts as a precise bridge between physical byte offsets in the file and temporal sample positions.
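To make the granulepos field concrete, here is a minimal sketch that reads it from the fixed 27-byte Ogg page header and converts it to seconds. The helper names (parse_page_header, granule_to_seconds) are hypothetical, and the synthetic header uses a zero checksum that a real demuxer would reject; the sample rate would normally come from the Vorbis identification header.

```python
import struct

# Fixed part of an Ogg page header: capture pattern, version, flags,
# granulepos (signed 64-bit), serial, page sequence, CRC, segment count.
# Little-endian, no padding: 27 bytes in total.
OGG_HEADER = struct.Struct("<4sBBqIIIB")


def parse_page_header(buf, offset=0):
    """Return (granulepos, serial, page_seq, n_segments) for the page at offset."""
    capture, version, _flags, granulepos, serial, seq, _crc, nsegs = (
        OGG_HEADER.unpack_from(buf, offset)
    )
    if capture != b"OggS" or version != 0:
        raise ValueError("not an Ogg page header")
    return granulepos, serial, seq, nsegs


def granule_to_seconds(granulepos, sample_rate):
    """Convert a Vorbis granule position (a sample count) to seconds."""
    return granulepos / sample_rate


# Synthetic header for illustration only (CRC left at 0, no segments).
header = OGG_HEADER.pack(b"OggS", 0, 0, 441000, 0x1234, 7, 0, 0)
granulepos, serial, seq, nsegs = parse_page_header(header)
print(granulepos, granule_to_seconds(granulepos, 44100))  # 441000 10.0
```

Unpacking the field as a signed integer lets the "no packet completes here" value show up directly as -1.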
The Seeking Process: Bisection Search
When an application requests a seek to a specific target sample, the
work is done largely by libvorbisfile, the helper library layered on top
of libvorbis and libogg (its ov_pcm_seek call is the usual entry point).
The seek proceeds in the following steps:
- Bisection Search: The library performs a binary search (bisection) over the physical file. It jumps to different byte offsets, locates the nearest Ogg page header, and reads its granulepos.
- Finding the Target Page: By comparing the target sample number with the granule positions of the pages, the search algorithm narrows down the byte range until it identifies the exact Ogg page that contains the target sample.
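The two steps above can be sketched as a bisection over byte offsets. This is a simplified model, not the library's code: the file is represented by a hypothetical list of Page records, "scanning forward for the next page header" is simulated with a lookup, every page is assumed to carry a valid granulepos, and only one logical stream is present.

```python
import bisect
from collections import namedtuple

# Hypothetical in-memory model of a file: where each page starts and the
# granulepos stored in its header.
Page = namedtuple("Page", ["offset", "granulepos"])


def next_page_index(pages, byte_offset):
    """Simulate scanning forward from byte_offset to the next page header."""
    offsets = [p.offset for p in pages]
    idx = bisect.bisect_left(offsets, byte_offset)
    return idx if idx < len(pages) else None


def find_target_page(pages, file_size, target):
    """Return the index of the first page whose granulepos exceeds target."""
    lo, hi = 0, file_size  # the wanted page starts in [lo, hi) or is `best`
    best = None
    while lo < hi:
        mid = (lo + hi) // 2
        idx = next_page_index(pages, mid)
        if idx is None or pages[idx].offset >= hi:
            hi = mid  # no page header starts in [mid, hi)
        elif pages[idx].granulepos > target:
            best = idx  # candidate; an earlier page may still qualify
            hi = pages[idx].offset
        else:
            lo = pages[idx].offset + 1  # target lies after this page
    return best


pages = [Page(0, 1024), Page(4000, 5120), Page(8500, 9216),
         Page(12000, 13312), Page(17000, 17408)]
print(find_target_page(pages, 21000, 9215))  # 2
print(find_target_page(pages, 21000, 9216))  # 3
```

Because a page's granulepos counts the samples finished by the end of that page, sample index 9216 is the first sample of the following page, which is why the two calls return different pages.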
Managing Codec Delay and MDCT Overlap (Preroll)
Identifying the correct page is not enough for sample-accurate playback. Vorbis uses the Modified Discrete Cosine Transform (MDCT) for audio compression, which relies on overlapping blocks (windows). To reconstruct any given audio frame \(N\), the decoder needs the data from frame \(N-1\) to perform the “overlap-add” process. A decoder that starts cold on the packet holding the target sample has no previous block to overlap with, so it cannot correctly reconstruct the leading samples of that packet; in Vorbis, the first packet decoded after a reset returns no audio at all and serves only to prime the decoder.
To prevent this, libvorbis implements a “preroll” mechanism:
- Decoding Early: The library begins the actual decoding process at the start of the Ogg page (or sometimes one page prior) containing the target sample, rather than at the exact target sample itself.
- Priming the MDCT: This allows the decoder’s synthesis engine to prime its overlap-add buffers with the preceding packet data, so that the samples at and after the target are reconstructed correctly.
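The cost of priming can be put in numbers. As I read the Vorbis I specification, the first packet decoded yields no output, and every later packet yields a quarter of the previous block size plus a quarter of the current one. The sketch below applies that rule; the function name and the block sizes (256 and 2048) are illustrative assumptions, not values taken from any particular stream.

```python
def samples_per_packet(blocksizes):
    """Return how many finished PCM samples each decoded packet yields."""
    counts = []
    prev = None
    for cur in blocksizes:
        if prev is None:
            counts.append(0)  # first packet only primes the overlap buffer
        else:
            counts.append(prev // 4 + cur // 4)
        prev = cur
    return counts


# One short block, two long blocks, one short block (illustrative sizes).
print(samples_per_packet([256, 2048, 2048, 256]))  # [0, 576, 1024, 576]
```

The leading zero is the preroll: one packet must be decoded purely for its overlap data before any usable samples appear.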
Sample Discarding
Once the decoder is primed and producing valid audio samples, the final step to achieve absolute accuracy is sample discarding. The library decodes the initial packets from the seek point but discards the resulting output samples instead of returning them. It continues to discard until its internal sample counter matches the exact target sample requested by the user. From that point on, decoded audio is returned to the calling application, which is responsible for passing it to the audio output device.
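The discard step amounts to trimming decoded output against a running sample counter. The following is a minimal sketch under stated assumptions: read_from is a hypothetical helper, decoded audio arrives as a sequence of chunks, and the absolute position of the first decoded sample (start_pos) is already known from the granule positions found during the search.

```python
def read_from(decoded_chunks, start_pos, target):
    """Yield decoded samples, dropping everything before index `target`.

    decoded_chunks: iterable of sample lists produced after the seek point.
    start_pos: absolute sample index of the first sample in the first chunk.
    """
    pos = start_pos
    for chunk in decoded_chunks:
        end = pos + len(chunk)
        if end > target:
            skip = max(0, target - pos)  # partial discard inside this chunk
            yield chunk[skip:]
        # chunks that end at or before the target are dropped entirely
        pos = end


# Stand-in "audio": each sample's value is its own absolute index.
chunks = [list(range(5120, 5696)),
          list(range(5696, 6720)),
          list(range(6720, 7296))]
out = [s for chunk in read_from(chunks, 5120, 6000) for s in chunk]
print(out[0], len(out))  # 6000 1296
```

The first chunk is discarded whole, the second is trimmed to start at the target, and the third passes through unchanged.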