SDIF: Sound Description Interchange Format

Effective musical application of the many sound analysis/synthesis tools
now available has been hampered by the absence of a common representation
for analyzed data. Although standard file formats abound for time-domain
audio samples, each institution has its own file formats for other kinds
of sound descriptions.

Sound descriptions supported by SDIF include:


  • STFT
  • phase vocoder
  • sinusoidal tracks (freq/amp/phase envelopes)
  • picked spectral peaks with frequency/phase/amplitude estimates
  • noise bands
  • harmonic sinusoidal tracks
  • pitch estimates
  • cepstral coefficients
  • resonant filter coefficients
  • LPC coefficients
  • formants
  • diphones
  • spectral envelopes (sampled and parametric)
  • "note lists"
  • wavelets
  • sampled sounds

Features of SDIF


  • Open interchange format
  • Freely available code (C and C++) to read and write
  • A suite of freely available tools and utilities for manipulating SDIF
    files: viewer, editor, converters to/from other common formats, sanity
    checkers, etc.
  • Will be used by CNMAT (CAST) and IRCAM (Chant, HMM, Additive, Studio
    Online)
  • Follows existing standards: IFF chunks (parent of AIFF), IEEE floats,
    big-endian byte order, etc.
  • 32-bit and 64-bit int and float data sizes
  • Mechanism for storing file creation info (date, program & options
    used, etc.)
  • 64-bit alignment of all data for easy reading
  • Straightforward data representation and semantics
  • Streamable in real time (stateless representation)
  • Arbitrary time sampling of all data
  • Extensible
  • Simple and efficient to read and write
  • Flexible
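
As a rough illustration of the byte-order and alignment conventions listed
above, the following Python sketch packs 32-bit IEEE floats in big-endian
order and zero-pads the result to a 64-bit boundary. The helper names
(padded_size, pack_floats_be) and the use of zero bytes for padding are
assumptions made for this example; they are not taken from the SDIF library
or specification.

```python
import struct

ALIGNMENT = 8  # bytes, i.e. 64 bits


def padded_size(n_bytes, alignment=ALIGNMENT):
    """Round n_bytes up to the next multiple of alignment."""
    return (n_bytes + alignment - 1) // alignment * alignment


def pack_floats_be(values, alignment=ALIGNMENT):
    """Pack values as big-endian IEEE 754 32-bit floats, zero-padded to alignment."""
    raw = struct.pack(f">{len(values)}f", *values)
    padding = padded_size(len(raw), alignment) - len(raw)
    return raw + b"\x00" * padding


# Three 32-bit floats occupy 12 bytes; padding brings the total to 16.
payload = pack_floats_be([440.0, 0.5, 0.0])
```

Because every block then starts on an 8-byte boundary, a reader can compute
the offset of the next block from the sizes alone, without scanning the data.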

SDIF Data Format in a Nutshell

SDIF is optimized both for streaming and for archiving. The SDIF format
is a sequence of time-tagged frames (IFF chunks), appearing in chronological
order, with multiple kinds of frames allowed in a single file or stream.
Thus, a single SDIF file might contain STFT results, a pitch estimate envelope,
and sinusoidal tracks from the same original sound. A library of standard
frame types defines formats for storing the common sound representations
listed above.
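
The idea of one chronological stream carrying several kinds of frames can be
sketched in Python as follows. This is only a minimal model: the Frame tuple,
the interleave and select helpers, and the labels "pitch" and "stft" are
hypothetical names chosen for the example, not SDIF frame types or library
calls.

```python
import heapq
from typing import NamedTuple


class Frame(NamedTuple):
    time: float      # time tag, in seconds
    kind: str        # hypothetical label, not an official SDIF frame type
    payload: object  # placeholder for the frame's data


def interleave(*streams):
    """Merge time-sorted frame streams into one chronological stream."""
    return heapq.merge(*streams, key=lambda frame: frame.time)


def select(stream, kind):
    """Yield only the frames of one kind, skipping all others."""
    return (frame for frame in stream if frame.kind == kind)


pitch = [Frame(0.00, "pitch", 220.0), Frame(0.02, "pitch", 221.5)]
stft = [Frame(0.00, "stft", None), Frame(0.01, "stft", None),
        Frame(0.02, "stft", None)]

merged = list(interleave(pitch, stft))
times = [frame.time for frame in merged]
pitch_only = list(select(merged, "pitch"))
```

A program interested only in pitch estimates can read the merged stream and
skip every frame whose kind it does not recognize.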

Frames consist of some number of 2D matrices of floating point numbers,
with each column corresponding to a parameter like frequency or amplitude
and each row representing an object like a filter, sinusoid, or noise band.
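
The row/column convention can be pictured with a small Python sketch. The
Matrix and Frame classes below, and the column names, are assumptions made
for illustration; they show the shape of the data, not the byte layout
defined by SDIF.

```python
from dataclasses import dataclass, field


@dataclass
class Matrix:
    columns: list  # parameter names, one per column
    rows: list     # one list of floats per object (sinusoid, filter, ...)

    def column(self, name):
        """Return the values of one parameter across all objects."""
        index = self.columns.index(name)
        return [row[index] for row in self.rows]


@dataclass
class Frame:
    time: float                                   # time tag, in seconds
    matrices: list = field(default_factory=list)  # any number of matrices


tracks = Matrix(
    columns=["frequency", "amplitude", "phase"],
    rows=[
        [440.0, 0.50, 0.0],  # first sinusoid
        [880.0, 0.25, 1.5],  # second sinusoid
    ],
)
frame = Frame(time=0.01, matrices=[tracks])
frequencies = tracks.column("frequency")
```

Here each row describes one sinusoid at the frame's time, and reading down a
column gives one parameter for every sinusoid in the frame.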

A few optional chunk types can appear at the beginning of an SDIF file
or stream, e.g., file creation info and one-time global information needed
to configure synthesis software to interpret the chunk data.

History

This effort was suggested by Xavier Rodet of IRCAM. Discussions with
scholars and vendors at the 1995 ICMC confirmed the need for this standard
in the academic and commercial communities. Adrian Freed coined the name
and invited ICMC delegates to participate in the development work. Discussions
over the following months led to new requirements for the standard,
particularly in the area of Internet applications. Also, although SDIF was
originally conceived as a portable (and not necessarily efficient) interchange
format, CNMAT has decided to use it as the primary working format for its
sound analysis and synthesis tools. This has resulted in a standard that is
grounded in real needs and tested in the field.

CNMAT and IRCAM are nearing consensus on the final details of the format,
and expect to publish a specification in the next few months. Already there
is a library for reading and writing SDIF, and a growing set of tools for
manipulating SDIF files.

We encourage feedback from all interested users of SDIF.

Future Work

After we standardize the SDIF formats for sound representations now in
common practice, we expect to experiment with the following topics and eventually
add them to the standard:


  • Structural info for frames
  • "Culling": taking a subset of a dataset for large-scale viewing
    or overview playback
  • Patches
  • etc.