Extract Album Art from Vorbis Comments
This article explains how developers can extract embedded album art from Vorbis comment blocks, the metadata format used in Ogg Vorbis and FLAC audio files. It covers the structure of the metadata, how the image data is encoded within the comment block, and the step-by-step logic required to decode and save the image file.
The Storage Standard: METADATA_BLOCK_PICTURE
While older Vorbis files occasionally used a simple
COVERART tag containing Base64-encoded image data, modern
implementations use the METADATA_BLOCK_PICTURE tag. This
tag is the convention recommended by Xiph.org and adopts the exact
structure of the FLAC picture metadata block.
Because Vorbis comments must be text-based, the raw binary of this
picture block is serialized into a single
Base64-encoded string and assigned to the
METADATA_BLOCK_PICTURE field.
Step-by-Step Extraction Process
To extract the image, a developer must perform the following programming steps:
1. Read the Vorbis Comments
Use an audio tagging library (such as TagLib, Mutagen, or the native
libvorbisfile library) to parse the audio file and locate
the Vorbis comment header. Search the key-value pairs for the key
METADATA_BLOCK_PICTURE (note that keys are case-insensitive
in Vorbis comments, though usually uppercase).
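As a minimal sketch of the lookup, assume the tagging library has already returned the comments as a list of KEY=value strings (the function name find_picture_values is hypothetical, not part of any library). The comparison normalizes case, and a list is returned because a file may carry more than one picture tag.

```python
def find_picture_values(comments):
    """Return the value of every METADATA_BLOCK_PICTURE comment.

    `comments` is assumed to be an iterable of "KEY=value" strings
    already extracted from the file by a tagging library.
    """
    values = []
    for comment in comments:
        # Split on the first "=" only: Base64 values may end in "=" padding.
        key, sep, value = comment.partition("=")
        # Field names are case-insensitive, so normalize before comparing.
        if sep and key.upper() == "METADATA_BLOCK_PICTURE":
            values.append(value)
    return values
```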
2. Decode the Base64 String
Retrieve the string value associated with
METADATA_BLOCK_PICTURE. This string contains only ASCII
characters representing Base64 data. Decode this string back into a raw
binary byte array.
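The decoding step might look like the following sketch, which uses Python's standard base64 module. The helper name decode_picture_value is an assumption made for illustration. Strict validation is enabled so that a corrupted tag fails early instead of silently producing garbage bytes.

```python
import base64


def decode_picture_value(value):
    """Decode a METADATA_BLOCK_PICTURE value into raw bytes.

    Raises binascii.Error (a ValueError subclass) if the value
    contains characters outside the Base64 alphabet or bad padding.
    """
    return base64.b64decode(value, validate=True)
```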
3. Parse the Binary Picture Block
Once decoded, the byte array is structured according to the FLAC picture block specification. All integer values in this block are stored in big-endian (network) byte order.
To locate the actual image data, parse the fields in order:
- Picture Type (4 bytes): Indicates what the picture represents (e.g., 3 for the front cover).
- MIME Type Length (4 bytes): An integer representing the length of the MIME type string in bytes.
- MIME Type (Variable length): A printable-ASCII string representing the file format (e.g., image/jpeg or image/png).
- Description Length (4 bytes): An integer representing the length of the description string in bytes.
- Description (Variable length): A UTF-8 string describing the image.
- Width (4 bytes): Width of the image in pixels.
- Height (4 bytes): Height of the image in pixels.
- Color Depth (4 bytes): Color depth of the image in bits per pixel.
- Color Count (4 bytes): Number of colors used for indexed-color images (such as GIF); otherwise 0.
- Picture Data Length (4 bytes): An integer representing the exact size of the binary image data in bytes.
- Picture Data (Variable length): The raw, compressed image file bytes (JPEG, PNG, etc.).
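The field list above can be walked sequentially with Python's struct module. The following is a sketch rather than a reference implementation: the Picture tuple and the parse_picture_block name are hypothetical, and the only validation performed is a bounds check against truncated input.

```python
import struct
from typing import NamedTuple


class Picture(NamedTuple):
    picture_type: int
    mime: str
    description: str
    width: int
    height: int
    depth: int
    colors: int
    data: bytes


def parse_picture_block(block):
    """Parse a decoded picture block into a Picture tuple."""
    pos = 0

    def read_bytes(count):
        nonlocal pos
        if pos + count > len(block):
            raise ValueError("truncated picture block")
        chunk = block[pos:pos + count]
        pos += count
        return chunk

    def read_u32():
        # ">I" is a big-endian unsigned 32-bit integer.
        return struct.unpack(">I", read_bytes(4))[0]

    picture_type = read_u32()
    mime = read_bytes(read_u32()).decode("utf-8")
    description = read_bytes(read_u32()).decode("utf-8")
    width, height, depth, colors = (read_u32() for _ in range(4))
    data = read_bytes(read_u32())
    return Picture(picture_type, mime, description,
                   width, height, depth, colors, data)
```

Reading each length prefix immediately before its payload keeps the cursor logic in one place, so a malformed length is caught by the same bounds check as every other field.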
4. Extract and Save the Image Data
To retrieve the raw image file, skip the header fields and read the exact number of bytes specified by the Picture Data Length field.
For example, to calculate the byte offset where the raw image data begins:
- Start offset = 4 (Picture Type) + 4 (MIME Type Length) + MIME Type Length + 4 (Description Length) + Description Length + 20 (Width, Height, Depth, Colors, and Picture Data Length fields).
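The offset formula can be checked with a short sketch. The function name picture_data_offset is hypothetical; it reads only the two length fields it needs and relies on struct raising an error if the block is too short. For a block whose MIME type is image/jpeg (10 bytes) and whose description is empty, the formula gives 4 + 4 + 10 + 4 + 0 + 20 = 42.

```python
import struct


def picture_data_offset(block):
    """Return the byte offset at which the raw image data begins."""
    # MIME Type Length sits right after the 4-byte Picture Type.
    (mime_len,) = struct.unpack_from(">I", block, 4)
    # Description Length follows the MIME type string.
    (desc_len,) = struct.unpack_from(">I", block, 8 + mime_len)
    # 20 = Width + Height + Depth + Colors + Picture Data Length.
    return 4 + 4 + mime_len + 4 + desc_len + 20
```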
Extract the bytes from this offset for the length specified in the
Picture Data Length field (in a well-formed block this runs to the end
of the decoded byte array). Write these bytes directly to a file on
disk, using the extension derived from the parsed MIME Type (e.g.,
.jpg for image/jpeg).
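Writing the file could be sketched as follows. The save_picture helper, its parameters, and the small extension table are all assumptions made for illustration; the table covers only a few common types and falls back to .bin for anything it does not recognize.

```python
from pathlib import Path

# Illustrative mapping only; extend it for other MIME types as needed.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
}


def save_picture(data, mime, directory=".", stem="cover"):
    """Write the image bytes to disk and return the resulting path."""
    extension = EXTENSIONS.get(mime.lower(), ".bin")
    path = Path(directory) / (stem + extension)
    # The bytes are already a complete image file, so write them unchanged.
    path.write_bytes(data)
    return path
```

Calling save_picture(data, "image/jpeg") with the extracted bytes would then produce cover.jpg in the current directory.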