How libvorbis Implements Sample-Accurate Seeking

This article explains how the libvorbis reference implementation, together with its companion library libvorbisfile, achieves sample-accurate seeking within compressed Vorbis audio streams. Because Vorbis is a variable-bitrate, packet-based audio codec usually encapsulated in the Ogg container, seeking to a precise sample requires coordination between the container layer’s framing metadata and the codec’s windowing engine. By combining Ogg granule positions, a bisection search, and the discarding of decoder “preroll” output, the libraries let players jump to any individual sample without audio drift or decoding artifacts.

The Role of the Ogg Container and Granule Positions

Vorbis audio packets do not have a fixed size or duration, meaning a decoder cannot calculate a sample’s location in a file simply by multiplying time by a constant bitrate. To solve this, libvorbis relies on the Ogg container format.

The Ogg container groups Vorbis packets into “pages.” Each Ogg page header contains a 64-bit field called the granulepos (granule position). In a Vorbis stream, the granulepos is the absolute, cumulative PCM sample position at the end of the last packet that completes on that page; a page on which no packet completes carries the value -1. This metadata acts as a precise bridge between physical byte offsets in the file and temporal sample positions.
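To make the granulepos field concrete, the following is a minimal sketch of reading it from the fixed 27-byte portion of an Ogg page header, following the field order of the Ogg page format. The function names read_granulepos and granulepos_to_seconds are illustrative and are not part of libogg or libvorbis; the sketch also skips CRC verification and the segment table, which a real parser would need.

```python
import struct

# Fixed part of an Ogg page header, little-endian, 27 bytes in total:
# capture pattern "OggS", version, header type flags, granule position
# (signed 64-bit), stream serial number, page sequence number, CRC,
# and the number of entries in the segment table that follows.
OGG_HEADER = struct.Struct("<4sBBqIIIB")


def read_granulepos(page_bytes):
    """Return the granule position stored in an Ogg page header.

    Illustrative helper, not a libogg function. No CRC check is done.
    """
    (capture, _version, _flags, granulepos,
     _serial, _sequence, _crc, _n_segments) = OGG_HEADER.unpack_from(page_bytes)
    if capture != b"OggS":
        raise ValueError("not positioned at an Ogg page boundary")
    return granulepos


def granulepos_to_seconds(granulepos, sample_rate):
    """For Vorbis the granule position counts PCM samples, so time = samples / rate."""
    return granulepos / sample_rate
```

For a 44.1 kHz stream, a page whose granulepos is 88200 ends exactly two seconds into the audio.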

When an application requests a seek to a specific target sample (for example through libvorbisfile’s ov_pcm_seek call), the seek logic in libvorbisfile, which is layered on top of libogg and libvorbis, performs the following steps:

  1. Bisection Search: The library performs a binary search (bisection) over the physical file. It jumps to a byte offset, scans forward to the next Ogg page header, and reads its granulepos.
  2. Finding the Target Page: By comparing the target sample number with the granule positions of the pages, the search algorithm narrows down the byte range until it identifies the exact Ogg page that contains the target sample.
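The two steps above can be modelled with a short sketch. It is a simplified illustration and not libvorbisfile's actual code: the file is represented as a sorted list of hypothetical (byte_offset, granulepos) pairs, the scan for the next page header is simulated with a lookup in that list, and every page is assumed to carry a valid granulepos (pages marked -1 are ignored). Under the assumption that granulepos counts the samples completed by the end of a page, the target page is the first one whose granulepos is greater than the target sample index.

```python
from bisect import bisect_left


def next_page_index(page_offsets, byte_pos):
    """Simulate scanning forward from byte_pos to the next page header."""
    return bisect_left(page_offsets, byte_pos)


def find_target_page(pages, file_size, target_sample):
    """Bisect over byte offsets to find the page containing target_sample.

    pages: sorted list of (byte_offset, granulepos) pairs (a toy model).
    Returns the page index, or None if the target lies past the stream end.
    """
    offsets = [offset for offset, _ in pages]
    lo, hi = 0, file_size
    best = None
    while lo < hi:
        mid = (lo + hi) // 2
        i = next_page_index(offsets, mid)
        if i == len(pages):
            hi = mid              # no page starts at or after mid
            continue
        offset, granulepos = pages[i]
        if granulepos > target_sample:
            best = i              # candidate; an earlier page may also qualify
            hi = mid
        else:
            lo = offset + 1       # target lies after this page
    return best
```

Each probe halves the byte range, so the number of page reads grows with the logarithm of the file size rather than with the file size itself.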

Managing Codec Delay and MDCT Overlap (Preroll)

Identifying the correct page is not enough for sample-accurate playback. Vorbis compresses audio with the Modified Discrete Cosine Transform (MDCT), which relies on overlapping blocks (windows). To reconstruct any given audio frame \(N\), the decoder needs the data from frame \(N-1\) to perform the “overlap-add” process. If decoding started exactly at the packet holding the target sample, that packet would have no preceding block to overlap with, so its samples could not be reconstructed correctly.

To prevent this, the library implements a “preroll” mechanism:

  * Decoding Early: The library begins the actual decoding process at the start of the Ogg page (or sometimes one page prior) containing the target sample, rather than at the exact target sample itself.
  * Priming the MDCT: This allows the decoder’s synthesis engine to prime its overlap-add buffers with the preceding packet data, so that the samples around the target are reconstructed from complete overlap information.
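The effect of priming can be seen in how many samples each packet yields. In Vorbis, a decoded packet returns a quarter of the previous block size plus a quarter of the current block size, and the first packet decoded after a reset returns nothing because it has no predecessor to overlap with. The sketch below is a toy calculation under that rule; samples_per_packet is an illustrative name, not a libvorbis function, and the block sizes are example values.

```python
def samples_per_packet(blocksizes):
    """Return the number of PCM samples each packet yields after a decoder reset.

    blocksizes: the block size of each packet, in decode order.
    The first packet only primes the overlap buffer and yields no output.
    """
    counts = []
    previous = None
    for current in blocksizes:
        if previous is None:
            counts.append(0)                          # priming packet
        else:
            counts.append(previous // 4 + current // 4)
        previous = current
    return counts
```

With block sizes 256, 2048, 2048, 256 the packets yield 0, 576, 1024 and 576 samples, which is why decoding has to begin at least one packet before the one that holds the target sample.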

Sample Discarding

Once the decoder is primed and producing valid audio samples, the final step to achieve absolute accuracy is sample discarding. The library decodes the initial packets from the seek point but discards the resulting output samples instead of returning them. It continues to discard samples until the internal sample counter matches the exact target sample requested by the user. From that sample onward, the library returns decoded audio to the application, which can then pass it to the audio output device.
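The discarding step comes down to simple position arithmetic, sketched here under stated assumptions: decoded audio arrives as a sequence of chunks, the absolute sample position of the first chunk is known from the seek, and discard_until is a hypothetical helper rather than a libvorbisfile function.

```python
def discard_until(chunks, start_position, target_sample):
    """Yield decoded audio beginning exactly at target_sample.

    chunks: iterable of sample sequences produced by the decoder.
    start_position: absolute sample index of the first sample in chunks.
    """
    position = start_position
    for chunk in chunks:
        end = position + len(chunk)
        if end > target_sample:
            # Drop only the part of this chunk that precedes the target.
            skip = max(0, target_sample - position)
            yield chunk[skip:]
        position = end
```

For example, with chunks [0, 1, 2], [3, 4, 5], [6, 7] starting at position 0 and a target of 4, the first chunk is dropped entirely, the second is trimmed to [4, 5], and the third passes through unchanged.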