How Vorbis Seeking Uses Granule Positions
Seeking within an Ogg Vorbis audio stream relies on “granule positions” for fast, sample-accurate navigation. This article explains how libvorbis, the reference library for the Ogg Vorbis audio format, together with its companion seeking layer libvorbisfile, uses granule positions as absolute sample counters to map byte offsets to exact audio samples, enabling precise seeking without decoding the whole file.
What is a Granule Position?
In the Ogg container format, a “granule position” (often abbreviated as granulepos) is a 64-bit integer stored in the header of each Ogg page. For Vorbis, it is neither a byte offset nor a generic timestamp: it is the absolute PCM sample position, counted from the very beginning of the audio stream, at the end of the last packet that completes on that page. A value of -1 marks a page on which no packet completes.
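To make the page-header layout concrete, here is a minimal sketch that reads the granule position out of the fixed 27-byte Ogg page header. The function name read_granulepos and the demo header are illustrative only; the sketch does not verify the CRC or read the segment table, both of which a real parser would need to do.

```python
import struct

# Fixed part of an Ogg page header (27 bytes, little-endian):
# capture pattern "OggS", stream structure version, header-type flags,
# granule position (signed 64-bit), bitstream serial number,
# page sequence number, CRC checksum, number of segments.
OGG_HEADER = struct.Struct("<4sBBqIIIB")

def read_granulepos(page: bytes) -> int:
    """Return the granule position stored in an Ogg page header."""
    if len(page) < OGG_HEADER.size:
        raise ValueError("truncated page header")
    fields = OGG_HEADER.unpack_from(page)
    if fields[0] != b"OggS":
        raise ValueError("not an Ogg page")
    return fields[3]

# Synthetic header for demonstration only (the CRC is left at zero,
# so this is not a valid page for a real decoder).
demo = OGG_HEADER.pack(b"OggS", 0, 0, 441000, 0x1234, 7, 0, 0)
granulepos = read_granulepos(demo)
```

Because the granule position sits at a fixed offset in every page header, a seeker can inspect it without decoding any audio.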
Because Vorbis is a variable bitrate (VBR) format, there is no constant relationship between the size of the file in bytes and the duration of the audio. Without granule positions, a decoder would have to scan and decode everything from the beginning of the file up to the requested timestamp, so the cost of a seek would grow linearly with the seek distance.
The Mechanism of Seeking with Granule Positions
When a player requests a seek to a specific time, the library (in practice the libvorbisfile layer, through calls such as ov_time_seek and ov_pcm_seek) uses granule positions to navigate the file in several steps:
1. Target Time Conversion
The playback software first converts the target seek time (in seconds) into a target sample number. For example, seeking to 10 seconds in a 44.1 kHz stream means looking for target sample number 441,000. This target sample number is the destination granule position.
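The conversion is a single multiplication. The helper names below are hypothetical, and rounding to the nearest sample is a choice made for this sketch rather than a statement about how any particular player rounds.

```python
def time_to_sample(seconds: float, sample_rate: int) -> int:
    """Convert a seek time in seconds to an absolute PCM sample index."""
    if seconds < 0:
        raise ValueError("seek time must be non-negative")
    return int(round(seconds * sample_rate))

def sample_to_time(sample: int, sample_rate: int) -> float:
    """Inverse mapping: absolute sample index back to seconds."""
    return sample / sample_rate

target = time_to_sample(10.0, 44100)  # the article's example: 441,000
```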
2. Bisection Search (Binary Search)
Because the file is variable bitrate, the decoder cannot simply jump to a calculated byte offset. Instead, the seeking algorithm performs a bisection search across the physical file:
* The seeker jumps to a physical byte location (often starting midway or using a rough estimation).
* It reads the nearest Ogg page header to extract its granulepos.
* By comparing the page’s granulepos to the target sample number, the algorithm determines if it is too early or too late in the stream.
* It adjusts its search boundaries and jumps again, repeating the process until it locates the exact Ogg page that contains the target sample.
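The steps above can be sketched against a synthetic in-memory stream. This is not libvorbisfile's actual code: make_page, next_page, and bisect_seek are hypothetical helpers, the pages carry zero-filled bodies and no valid CRC, and the sketch assumes that every page has a non-negative, increasing granulepos and that the capture pattern never appears inside a page body.

```python
import struct

HEADER = struct.Struct("<4sBBqIIIB")  # fixed 27-byte Ogg page header

def make_page(granulepos: int, payload_len: int) -> bytes:
    """Build a synthetic page (zero CRC, zero-filled body) for the demo."""
    lacing = [255] * (payload_len // 255) + [payload_len % 255]
    header = HEADER.pack(b"OggS", 0, 0, granulepos, 1, 0, 0, len(lacing))
    return header + bytes(lacing) + bytes(payload_len)

def next_page(data: bytes, offset: int):
    """Scan forward from offset for the next page.

    Returns (page_offset, granulepos), or None if no page follows.
    """
    pos = data.find(b"OggS", offset)
    if pos < 0 or pos + HEADER.size > len(data):
        return None
    return pos, HEADER.unpack_from(data, pos)[3]

def bisect_seek(data: bytes, target: int):
    """Find the first page whose granulepos is >= target.

    Returns (page_offset, granulepos), or None if target is past the end.
    """
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi) // 2
        found = next_page(data, mid)
        if found is None or found[1] >= target:
            hi = mid            # landed at or after the target: look earlier
        else:
            lo = found[0] + 1   # landed too early: look past this page
    return next_page(data, lo)

# Five pages of different byte sizes, as in a variable bitrate stream.
pages = [(1000, 300), (2500, 40), (4000, 900), (6000, 120), (9000, 500)]
stream = b"".join(make_page(gp, size) for gp, size in pages)
hit = bisect_seek(stream, 3000)  # lands on the page ending at sample 4000
```

Each probe reads only a page header, so the number of probes grows with the logarithm of the file size rather than with the seek distance.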
3. Preroll and Overlap Resolution
Vorbis uses Modified Discrete Cosine Transform (MDCT) windowing, which means audio frames overlap. To reconstruct a specific sample, the decoder cannot just start decoding at that exact sample; it requires context from the preceding frame to resolve the overlap.
Once the correct page is located via its granule position, libvorbis starts decoding a few frames prior to the target. It uses the page’s granule position to calculate the exact offset of the decoded samples. The decoder then discards the initial “preroll” samples used to prime the MDCT engine and begins playback at the precise requested sample, achieving sample-accurate seeking.
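The trimming arithmetic can be illustrated with a small sketch. It assumes, for illustration only, that the decoded block ends exactly at the page's granule position, so the absolute index of its first sample is the granule position minus the number of decoded samples. Both helper names are hypothetical, and the "samples" here are plain integers standing in for decoded PCM.

```python
def first_sample_index(page_granulepos: int, samples_decoded: int) -> int:
    """Absolute index of the first decoded sample, assuming the block
    ends exactly at the page's granule position."""
    return page_granulepos - samples_decoded

def trim_preroll(decoded: list, first_index: int, target: int) -> list:
    """Drop the samples that precede the requested target sample."""
    skip = target - first_index
    if skip < 0 or skip > len(decoded):
        raise ValueError("target is outside the decoded block")
    return decoded[skip:]

# Stand-in data: each "sample" is just its own absolute index.
block = list(range(2976, 4000))              # 1024 samples ending at 4000
start = first_sample_index(4000, len(block))
playable = trim_preroll(block, start, 3000)  # playback begins at sample 3000
```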
4. Managing Chained Streams
Ogg files can be “chained,” meaning multiple independent audio streams are concatenated into a single file. Each logical stream in a chain carries its own serial number and restarts its granule position count. When seeking within a chained file, the seeking library locates the stream boundaries, uses each stream’s final granule position to calculate the cumulative duration of the preceding streams, and then seeks within the stream that contains the target.
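A minimal sketch of the cumulative-duration bookkeeping follows. The Link class and locate function are hypothetical, and the sketch assumes each link's granule positions start at zero, which real files do not always guarantee. Durations are summed in seconds because the links of a chain may use different sample rates.

```python
from dataclasses import dataclass

@dataclass
class Link:
    rate: int    # sample rate of this logical stream, in Hz
    length: int  # final granule position of the stream, in samples

def locate(links: list, seconds: float) -> tuple:
    """Map a time in the whole file to (link index, sample within link)."""
    elapsed = 0.0
    for index, link in enumerate(links):
        duration = link.length / link.rate
        if seconds < elapsed + duration:
            return index, int((seconds - elapsed) * link.rate)
        elapsed += duration
    raise ValueError("seek target is beyond the end of the file")

# A 10-second link at 44.1 kHz followed by a 5-second link at 48 kHz.
chain = [Link(44100, 441000), Link(48000, 240000)]
where = locate(chain, 12.0)  # 2 seconds into the second link
```

Once the link and local sample are known, the search for the page proceeds within that link's byte range as in a single-stream file.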