SDIFF | CNMAT


SDIFF - Spectral Description Interchange File Format

What is SDIFF?

The idea is to create a new standardized file format to:

promote the multiplatform interchange of spectral descriptions,

reduce the considerable duplication of effort required for everybody to support
everybody else's extant data formats (there are more than 10 groups working on
spectral descriptions for audio and computer music),

encourage the development of new tools for manipulating spectral
descriptions,

promote the use of spectral descriptions in general.

What's new?

file format (syntax)

Thank you everyone for your feedback and support for this effort. Much of your
feedback has been concerned with the form SDIFF data is to be represented in. I
believe that most of our efforts should be concentrated on the content of the
data and successfully documenting how this data can be unambiguously interpreted
(the semantics). However, we will need a file format (the syntax) and I am
sifting through many candidates. Here are some possibilities:

Bento

AIFF-C

ASCII

a completely custom one

Naming

Many of the available file formats make claims about extensibility, but I
would like SDIFF to address a flaw that the existing formats share in this
respect: the namespace of object data types is owned and managed by a central
authority. In practice these central authorities are often inflexible or simply
disappear. I propose to resolve the namespace issue by using domain names, e.g.
cnmat.berkeley.edu. These are already managed by a central authority that is
going to be around in some form for a long time. This would facilitate a
relatively open standard where new ideas about spectral descriptions could be
quickly integrated into SDIFF.
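To make the domain-name idea concrete, here is a minimal C sketch of how a reader might split a qualified type name into its owning domain and its local type name. The `domain:type` syntax, the example name `cnmat.berkeley.edu:tracks` and the function `split_type_name` are all hypothetical; this page does not yet say how domain names and type names would be combined.

```c
#include <string.h>

/* Hypothetical syntax (not part of any SDIFF spec):
   "<owning domain>:<local type name>", e.g. "cnmat.berkeley.edu:tracks".
   Returns 0 on success, -1 if the name is malformed or a buffer is too small. */
int split_type_name(const char *full,
                    char *domain, size_t domain_size,
                    char *local, size_t local_size)
{
    const char *sep = strchr(full, ':');
    size_t domain_len, local_len;

    if (sep == NULL || sep == full || sep[1] == '\0')
        return -1;                          /* missing domain or local name */
    domain_len = (size_t)(sep - full);
    local_len = strlen(sep + 1);
    if (domain_len >= domain_size || local_len >= local_size)
        return -1;                          /* caller's buffers too small */

    memcpy(domain, full, domain_len);
    domain[domain_len] = '\0';
    memcpy(local, sep + 1, local_len + 1);  /* also copies the terminator */
    return 0;
}
```

Because each organization controls its own domain, two groups can both define a type called `tracks` without colliding, and nobody has to ask a central registry for permission.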

Another thing that you notice in standards is the huge size of the documents
that describe them and of the subsequent documents describing how to interpret
the standard. I would like to propose a way to cut through most of the verbiage
and reach what we are really doing this for: code we can build into our tools
that facilitates interchange. The idea is that the specification for a
particular spectral description would consist of:

example files in that format

commented ANSI C code that reads and interprets those files, rendering the
result as an AIFF sound file.

The C implementations would be designed for readability, simplicity and
portability at the expense, no doubt, of space and CPU efficiency.

We are examining the possibility of using URLs instead of domain names.
The URL would point to documentation and a Java program that can read and
interpret the SDIFF file.

Grouping

Many of you have raised the question of how to store information related to
SDIFF data in the same file. You have also expressed the need to group different
related SDIFF data types into single files. I believe we can achieve a simpler
spec. that is easier to understand and implement if we adopt a one type/one file
model for SDIFF data and use a more general mechanism to group files. I am
interested in your suggestions and experiences for the grouping mechanism.
Possibilities include:

Bento

Directories

Who has expressed interest in this effort?

Adrian Freed, CNMAT

Xavier Rodet, IRCAM

Steven Curtin, Ensoniq

Chris Muir, Gibson/G-wiz

Tom Erbe, CalArts

Xavier Serra

Greg Sandell

Cor Jansen

Brian E.D. Kingsbury, ICSI

Who is working on spectral descriptions?

In addition to the projects mentioned above I know of:

Robin Bargar, NCSA

McAulay/Quatieri, Lincoln Labs

Yinong Ding, TI

Brian George

Greg Sandell

Steve McAdams

Kelly Fitz, Lippold Haken, Lemur

Stephen William Berkley

Andrew Horner, horner@cws.ust.hk

Phil Burk, 3DO

What existing file formats do people use?

Thanks for sending me your current formats or plans. Here is what I have
received so far that I was able to readily put into HTML form:

IRCAM proposal (from Xavier Rodet)

 parameterFormat  -   Temporary general format for parameter files

The parameterFormat uses a magic word and an architecture word so that the
files can be understood by any machine (theoretically!):
-----------------------------------------------------------
- Nature of infos - | --  Name and Content -- |  - Size -
-----------------------------------------------------------
                    |                         |
Magic word          |  the 4 ASCII characters |   4 bytes
                    |  p a r m on same Endian |
                    |        or               |
                    |  m r a p on the other   |
                    |                         |
Architecture        |  arch, 8 ASCII chars    |   8 bytes
                    |                         |
Header size         |  4 bytes integer        |   4 bytes
                    |  Size of the header in  |   
                    |  bytes. Must always be  |   
                    |  >0 and multiple of 8   |   
                    |                         |
Type of data        |  Type of the data that  |   4 bytes
                    |  the file contains.     |
                    |  e.g.: FMTE, MFCC...    |
                    |                         |
Header beginning    |  Format of the data,    |   8 bytes
                    |  says what the file is  |
                    |  supposed to be.        |
                    |  e.g. "tmnd    " for    |
                    | time, mode, numb, data. |
                    |                         |
                    |                         |
Header remainder    |  Anything, according to |
                    |  type. Using ASCII here |
                    |  makes the file easier  |
                    |  to view, e.g. with     |
                    |  'more'.                |
                    |                         |
Data                |  The order of the three |
                    |  following variables    |
                    |  depends on the header  |
                    |  beginning.             |
                    |                         |
                    |  Time of first data set |   4 bytes 
                    |  float binary 4 bytes   | 
                    |                         | 
                    |  Number of data in      |   4 bytes
                    |  float binary 4 bytes   | 
                    |                         | 
                    |  'Mode' means the type  |   4 bytes
                    |  of data as in SVP      | 
                    |                         | 
                    |  1st data               |   Any according
                    |                         |   to mode
                    |  2nd data               |   
                    |   .....                 |   
                    |  Nth data               |   
                    |                         |   
                    |  Time of second data set|   4 bytes 
                    |  float binary 4 bytes   | 
                    |                         | 
                    |  Number of data in      |   4 bytes
                    |  float binary 4 bytes   | 
                    |                         | 
                    |  'Mode' means the type  |   4 bytes
                    |  of data as in SVP      | 
                    |                         | 
                    |  1st data               |   Any according to
                    |                         |   type and mode
                    |  2nd data               |   
                    |   .....                 |   
                    |  Mth data               |   
                    |                         |   
         etc...     |     etc...              |     etc...
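The magic-word trick in the table above can be sketched in C. This assumes the writer stored "parm" as a native 32-bit integer (so a machine with the other byte order reads it back as "mrap") and that the header size immediately follows the 4-byte magic word and the 8-byte architecture field, in the order the table lists them. `PARM_MAGIC`, `swap32`, `parm_needs_swap` and `parm_header_size` are hypothetical names, not part of the IRCAM proposal.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: the magic word was written as a native 32-bit integer built
   from 'p','a','r','m'; the other byte order sees it reversed ("mrap"). */
#define PARM_MAGIC ((uint32_t)'p' << 24 | (uint32_t)'a' << 16 | \
                    (uint32_t)'r' << 8 | (uint32_t)'m')

/* Reverse the four bytes of a 32-bit field. */
uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Read a native 32-bit value from a possibly unaligned byte buffer. */
static uint32_t load32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* 0: file matches this machine's byte order; 1: every 4-byte field must
   be swapped; -1: unknown magic word. */
int parm_needs_swap(const unsigned char *buf)
{
    uint32_t magic = load32(buf);
    if (magic == PARM_MAGIC) return 0;
    if (magic == swap32(PARM_MAGIC)) return 1;
    return -1;
}

/* Header size field at byte offset 12 (4-byte magic + 8-byte architecture).
   Returns -1 for an unknown magic word or a size that is not >0 and a
   multiple of 8. */
long parm_header_size(const unsigned char *buf)
{
    int swap = parm_needs_swap(buf);
    uint32_t size;

    if (swap < 0) return -1;
    size = load32(buf + 12);
    if (swap) size = swap32(size);
    if (size == 0 || size % 8 != 0) return -1;
    return (long)size;
}
```

A reader would call `parm_needs_swap` once on the first bytes of the file, then apply `swap32` to every later 4-byte integer or float field before interpreting it.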

Xavier Serra: SMS

"I am currently changing the file format for SMS files a bit. Once I have it done I will put it on my WWW page. The current version has the following header:

/* structure for header of SMS file */
typedef struct
{
        /* fixed part */
        int iSmsMagic;         /* magic number for SMS data file */
        int iHeadBSize;        /* size in bytes of header */
        int nRecords;          /* number of data records */
        int iRecordBSize;      /* size in bytes of data record */
        int iFormat;           /* type of data format */
        int iFrameRate;        /* rate in Hz of data records */
        int iStochasticType;   /* representation of stochastic coefficients */
        int nTrajectories;     /* number of trajectories in each record */
        int nStochasticCoeff;  /* number of stochastic coefficients in each
                                  record */
        float fAmplitude;      /* average amplitude of represented sound */
        float fFrequency;      /* average fundamental frequency */
        int iOriginalSRate;    /* sampling rate of original sound */
        int iBegSteadyState;   /* record number of beginning of steady state */
        int iEndSteadyState;   /* record number of end of steady state */
        float fResidualPerc;   /* percentage of the residual with respect to
                                  the original */
        int nLoopRecords;      /* number of loop records specified */
        int nSpecEnvelopePoints; /* number of breakpoints in spectral envelope */
        int nTextCharacters;   /* number of text characters */
        /* variable part */
        int *pILoopRecords;    /* array of record numbers of loop points */
        float *pFSpectralEnvelope; /* spectral envelope of partials */
        char *pChTextCharacters; /* textual information relating to the sound */
} SMSHeader;


After this header there are records of equal size. Once read, each record
fills the following structure:

/* structure with SMS data */
typedef struct
{
        float *pSmsData;           /* pointer to all SMS data */
        int sizeData;              /* size of all the data */
        float *pFFreqTraj;         /* frequency of sinusoids */
        float *pFMagTraj;          /* magnitude of sinusoids */
        float *pFPhaTraj;          /* phase of sinusoids */
        int nTraj;                 /* number of sinusoids */
        float *pFStocGain;         /* gain of stochastic component */
        float *pFStocCoeff;        /* filter coefficients for stochastic
                                      component */
        int nCoeff;                /* number of filter coefficients */
} SMS_DATA;
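Because the SMS records are all the same size, a reader can seek straight to any record instead of scanning the file. A minimal sketch, assuming `iHeadBSize` covers the entire header (fixed and variable parts) and that records follow it with no padding; `sms_record_offset` is a hypothetical helper, not part of SMS:

```c
/* Byte offset of record `index` (0-based) in an SMS file.
   Assumes records start right after the iHeadBSize-byte header and are
   packed back to back, each iRecordBSize bytes long.
   Returns -1 for an out-of-range index. */
long sms_record_offset(int iHeadBSize, int iRecordBSize,
                       int nRecords, int index)
{
    if (index < 0 || index >= nRecords)
        return -1;
    return (long)iHeadBSize + (long)index * (long)iRecordBSize;
}
```

The result can be passed to `fseek(fp, offset, SEEK_SET)` before reading `iRecordBSize` bytes into a buffer that backs an `SMS_DATA` structure.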

Greg Sandell: SHARC

I have the Lemur format as a Microsoft Word file

I have the ICSI Speech File Format (ISFF) as shar-archived manual pages

How can I contribute to the SDIFF effort?

Please let me know of anybody who is working in this area who is not
accounted for above.

To make sure everybody's needs are accounted for (if not actually
satisfied) in this effort, please send me documentation or links to
documentation of your existing file formats and/or future requirements.

Sign up with CNMAT's affiliate program to offset the cost of marshalling
the required resources (contact David Wessel, CNMAT).

When will the SDIFF spec. be ready?

The plan is to proceed as follows:

Post the spec. of a strong candidate.

Address its known weaknesses.

Incorporate the first round of your suggestions into the spec.

Implement a reader/writer for the format and correct the spec. where it is
not implementable.

Post the implementation here for everyone to build into their tools (with
example files).

Integrate everyone's experience into a revised version.

Do a final implementation of the revised spec.

Finalize the documentation and implementation.

What do you mean by "spectral descriptions" anyway?

I take a broad view. The goal is to cover:

STFT

phase vocoder

M&Q (tracks with births and deaths)

Picked spectral peaks and their frequency/phase/amplitude estimates

HMM tracks

pitch estimation

cepstral coefficients

resonant filter coefficients

formants

diphones

spectral envelopes

"note lists"

How on earth can all this be squeezed into one file format?

You can cover all these with one basic idea (suggested to me by Xavier
Rodet): represent the data as a sequence of frames of time tagged matrices.

So the basic structure is something like this:

Header:
- size of the header
- data type
- data version
- data owner/copyright info.
- column names and units
- sample rate

Data:
- frame1: time_tag number_of_rows number_of_columns matrix_of_data
- frame2: time_tag number_of_rows number_of_columns matrix_of_data
- ...

In most cases the rows correspond to frequency tracks, bins, or channels.
The columns correspond to parameters such as:

phase

frequency

amplitude

bandwidth

voiced/unvoiced measure

filter coefficients
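One possible in-memory form of such a frame, as a C sketch. The `SdiffFrame` type and `frame_get` accessor are hypothetical, and storing the matrix row by row (row-major) is an assumption the proposal does not yet fix. With rows as tracks and columns as parameters, element (row, col) sits at index `row * n_cols + col`. The 64-bit time tag and 32-bit floats follow the portability notes on this page.

```c
#include <stddef.h>

/* Hypothetical in-memory frame: a time tag plus an
   n_rows x n_cols matrix of 32-bit floats stored row by row. */
typedef struct {
    double time_tag;   /* time of this frame (64-bit, as suggested) */
    int    n_rows;     /* e.g. frequency tracks, bins or channels */
    int    n_cols;     /* e.g. frequency, amplitude, phase, ... */
    float *data;       /* n_rows * n_cols values, row-major */
} SdiffFrame;

/* Value of parameter `col` for track/bin/channel `row`. */
float frame_get(const SdiffFrame *f, int row, int col)
{
    return f->data[(size_t)row * (size_t)f->n_cols + (size_t)col];
}
```

For example, a frame with two sinusoidal tracks and the columns frequency, amplitude and phase has `n_rows = 2`, `n_cols = 3`, and six floats in `data`.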

Does anyone require more than 2-D matrices? Do we really need a different
number of columns on each frame?

How will SDIFF be supported on different platforms?

SDIFF is an interchange format, so portability is very important. The
following things are proposed to achieve this:

All data is IEEE 754 floating point. Everything is 32-bit, except
perhaps time tags, which should be 64-bit.

Strings are null-terminated ASCII.

No assumptions about file types based on file names.

A binary representation will be specified.

The binary representation will be the one implemented
by Java's portable binary data streams, i.e. big-endian.
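The big-endian rule can be implemented without depending on the host's byte order by shifting the bytes out explicitly. A minimal sketch, assuming the host represents `float` and `double` as IEEE 754 binary32 and binary64 (true of virtually all current machines); the function names are hypothetical. The bytes produced should match what Java's `DataOutputStream.writeFloat` and `writeDouble` emit, since those also write the most significant byte first.

```c
#include <stdint.h>
#include <string.h>

/* Write a 32-bit IEEE 754 float, most significant byte first. */
void put_float_be(unsigned char out[4], float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* reinterpret without aliasing issues */
    out[0] = (unsigned char)(bits >> 24);
    out[1] = (unsigned char)(bits >> 16);
    out[2] = (unsigned char)(bits >> 8);
    out[3] = (unsigned char)bits;
}

/* Write a 64-bit IEEE 754 double (e.g. a time tag), MSB first. */
void put_double_be(unsigned char out[8], double value)
{
    uint64_t bits;
    int i;
    memcpy(&bits, &value, sizeof bits);
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Read back a big-endian 32-bit float on any host. */
float get_float_be(const unsigned char in[4])
{
    uint32_t bits = (uint32_t)in[0] << 24 | (uint32_t)in[1] << 16 |
                    (uint32_t)in[2] << 8 | (uint32_t)in[3];
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For instance, `put_float_be(buf, 1.0f)` stores the bytes 3F 80 00 00 on both little- and big-endian hosts.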

The keen observer will notice that such a format may not be especially
space efficient, since it affords parameters more dynamic range and resolution
than is strictly necessary. This is to simplify the implementation
and guarantee a lossless interchange, both essential features in an interchange
standard. I would expect individual users on resource-constrained platforms to
develop formats optimized for space efficiency, with lossy and perhaps lossless
compression such as the scheme described by Andrew Horner at this year's ICMC.