SDIFF
The idea is to create a new standardized file format to promote:
Thank you everyone for your feedback and support for this effort. Much of your
feedback has been concerned with the form SDIFF data is to be represented in. I
believe that most of our efforts should be concentrated on the content of the
data and successfully documenting how this data can be unambiguously interpreted
(the semantics). However, we will need a file format (the syntax) and I am
sifting through many candidates. Here are some possibilities:
Many of the available file formats claim to be extensible, but I would like
SDIFF to address a flaw the existing formats share in this respect: the
namespace of object data types is owned and managed by a central authority. In
practice these central authorities are often inflexible or simply disappear. I
propose to resolve the namespace issue by using domain names, e.g.
cnmat.berkeley.edu. These are already managed by a central authority which is
going to be around in some form for a long time. This would facilitate a
relatively open standard where new ideas about spectral descriptions could be
quickly integrated into SDIFF.
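To make the domain-name idea concrete, here is a minimal C sketch. It assumes a hypothetical "domain/local-name" spelling for type names (e.g. "cnmat.berkeley.edu/tracks"); the separator and the TypeName, type_name_parse and type_name_equal names are illustrative only and do not come from any SDIFF document. It shows that organizations can mint type names independently, and a reader needs only string comparison to tell them apart.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical type name: a domain name owned by the defining organization,
   plus a local name that organization chooses freely. */
typedef struct {
    char domain[64];   /* e.g. "cnmat.berkeley.edu" */
    char local[64];    /* e.g. "tracks" (hypothetical) */
} TypeName;

/* Split "domain/local" at the first '/'.
   Returns 0 on success, -1 if either part is missing or too long. */
int type_name_parse(const char *s, TypeName *out)
{
    const char *slash = strchr(s, '/');
    size_t dlen, llen;

    if (slash == NULL)
        return -1;
    dlen = (size_t)(slash - s);
    llen = strlen(slash + 1);
    if (dlen == 0 || llen == 0)
        return -1;
    if (dlen >= sizeof out->domain || llen >= sizeof out->local)
        return -1;
    memcpy(out->domain, s, dlen);
    out->domain[dlen] = '\0';
    memcpy(out->local, slash + 1, llen + 1);   /* copies the terminator too */
    return 0;
}

/* DNS names are case-insensitive, so compare the domain part that way. */
static int domain_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Same type only if the owner and the local name both match exactly. */
int type_name_equal(const TypeName *x, const TypeName *y)
{
    return domain_equal(x->domain, y->domain) && strcmp(x->local, y->local) == 0;
}
```

Nothing in the sketch consults a registry: ownership of the domain is the only thing that keeps names from colliding. Comparing the local part case-sensitively is a choice made for this sketch.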
Another thing you notice about standards is the huge size of the documents
that describe them and of the subsequent documents describing how to interpret
the standard. I would like to propose a way to short-cut most of the verbiage
and reach what we are really doing this for: code we can build into our tools
that facilitates interchange. The idea is that the specification for a
particular spectral description would consist of:
The C implementations would be designed for readability, simplicity and
portability at the expense, no doubt, of space and CPU efficiency.
We are examining the possibility of using URLs instead of domain names.
The URL would point to documentation and a Java program that can read and
interpret the SDIFF file.
Many of you have raised the question of how to store information related to
SDIFF data in the same file. Also you have expressed the need to group different
related SDIFF data types into single files. I believe we can achieve a simpler
specification, easier to understand and implement, if we adopt a
one-type/one-file model for SDIFF data and use a more general mechanism to
group files. I am interested in your suggestions and experiences regarding the
grouping mechanism. Possibilities include:
In addition to the projects mentioned above I know of:
Thanks for sending me your current formats or plans. Here is what I have
received so far that I was able to readily put in HTML form:
parameterFormat - Temporary general format for parameter files
The parameterFormat uses a magic word and an architecture word so that it can be understood by any machine (theoretically!):
---------------------------------------------------------------------------
| Field          | Name and content                        | Size         |
---------------------------------------------------------------------------
| Magic word     | The 4 ASCII characters "parm" on the    | 4 bytes      |
|                | same endianness, or "mrap" on the other |              |
| Architecture   | arch, 8 ASCII characters                | 8 bytes      |
| Header size    | 4-byte integer: size of the header in   | 4 bytes      |
|                | bytes; must always be > 0 and a         |              |
|                | multiple of 8                           |              |
| Type of data   | Type of data the file contains,         | 4 bytes      |
|                | e.g. FMTE, MFCC...                      |              |
| Header         | Format of the data, i.e. what the file  | 8 bytes      |
| beginning      | is supposed to be, e.g. "tmnd" for      |              |
|                | time, mode, numb, data                  |              |
| Header remain  | Anything, according to type. Using      |              |
|                | ASCII here makes the file easier to     |              |
|                | view, e.g. with 'more'                  |              |
| Data           | The order of the three following        |              |
|                | variables depends on the header         |              |
|                | beginning:                              |              |
|                |   Time of first data set (binary float) | 4 bytes      |
|                |   Number of data (binary float)         | 4 bytes      |
|                |   'Mode': the type of data, as in SVP   | 4 bytes      |
|                |   1st data                              | any, by mode |
|                |   2nd data                              |              |
|                |   .....                                 |              |
|                |   Nth data                              |              |
|                |   Time of second data set (binary       | 4 bytes      |
|                |   float)                                |              |
|                |   Number of data (binary float)         | 4 bytes      |
|                |   'Mode': the type of data, as in SVP   | 4 bytes      |
|                |   1st data                              | any, by type |
|                |                                         | and mode     |
|                |   2nd data                              |              |
|                |   .....                                 |              |
|                |   Mth data                              |              |
|                |   etc...                                | etc...       |
---------------------------------------------------------------------------
"I am currently changing the file format for SMS files a bit. Once I have it done I will put it on my WWW page. The current version has the following header:
/* structure for header of SMS file */
typedef struct {
  /* fixed part */
  int iSmsMagic;              /* magic number for SMS data file */
  int iHeadBSize;             /* size in bytes of header */
  int nRecords;               /* number of data records */
  int iRecordBSize;           /* size in bytes of data record */
  int iFormat;                /* type of data format */
  int iFrameRate;             /* rate in Hz of data records */
  int iStochasticType;        /* representation of stochastic coefficients */
  int nTrajectories;          /* number of trajectories in each record */
  int nStochasticCoeff;       /* number of stochastic coefficients in each record */
  float fAmplitude;           /* average amplitude of represented sound */
  float fFrequency;           /* average fundamental frequency */
  int iOriginalSRate;         /* sampling rate of original sound */
  int iBegSteadyState;        /* record number of beginning of steady state */
  int iEndSteadyState;        /* record number of end of steady state */
  float fResidualPerc;        /* percentage of the residual with respect to the original */
  int nLoopRecords;           /* number of loop records specified */
  int nSpecEnvelopePoints;    /* number of breakpoints in spectral envelope */
  int nTextCharacters;        /* number of text characters */
  /* variable part */
  int *pILoopRecords;         /* array of record numbers of loop points */
  float *pFSpectralEnvelope;  /* spectral envelope of partials */
  char *pChTextCharacters;    /* textual information relating to the sound */
} SMSHeader;
After this header there are records of equal size. Once read, one record fills up the following structure:
/* structure with SMS data */
typedef struct {
  float *pSmsData;      /* pointer to all SMS data */
  int sizeData;         /* size of all the data */
  float *pFFreqTraj;    /* frequency of sinusoids */
  float *pFMagTraj;     /* magnitude of sinusoids */
  float *pFPhaTraj;     /* phase of sinusoids */
  int nTraj;            /* number of sinusoids */
  float *pFStocGain;    /* gain of stochastic component */
  float *pFStocCoeff;   /* filter coefficients for stochastic component */
  int nCoeff;           /* number of filter coefficients */
} SMS_DATA;
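Because the records following the SMS header all have the same size, record n can be located directly from two header fields as iHeadBSize + n * iRecordBSize, assuming iHeadBSize counts the variable part of the header too. The C sketch below computes that offset and then points the SMS_DATA fields into one record's float buffer. The order used (frequencies, magnitudes, phases, stochastic gain, filter coefficients) and the unit of sizeData are assumptions made for illustration; the real layout presumably depends on iFormat. sms_record_offset, sms_record_floats and sms_set_pointers are hypothetical helpers.

```c
/* Local copy of the SMS_DATA record structure quoted above,
   so that this sketch compiles on its own. */
typedef struct {
    float *pSmsData;    /* pointer to all SMS data */
    int sizeData;       /* size of all the data (counted in floats here) */
    float *pFFreqTraj;  /* frequency of sinusoids */
    float *pFMagTraj;   /* magnitude of sinusoids */
    float *pFPhaTraj;   /* phase of sinusoids */
    int nTraj;          /* number of sinusoids */
    float *pFStocGain;  /* gain of stochastic component */
    float *pFStocCoeff; /* filter coefficients for stochastic component */
    int nCoeff;         /* number of filter coefficients */
} SMS_DATA;

/* Byte offset of record n (0-based): the header is followed by
   records of equal size. Assumes iHeadBSize covers the whole header. */
long sms_record_offset(int iHeadBSize, int iRecordBSize, int n)
{
    return (long)iHeadBSize + (long)n * (long)iRecordBSize;
}

/* HYPOTHETICAL layout of one record's floats:
   [nTraj freqs][nTraj mags][nTraj phases][1 gain][nCoeff coefficients]. */
int sms_record_floats(int nTraj, int nCoeff)
{
    return 3 * nTraj + 1 + nCoeff;
}

/* Point the SMS_DATA fields into buf, which must hold
   sms_record_floats(nTraj, nCoeff) floats. */
void sms_set_pointers(SMS_DATA *d, float *buf, int nTraj, int nCoeff)
{
    d->pSmsData = buf;
    d->sizeData = sms_record_floats(nTraj, nCoeff);
    d->nTraj = nTraj;
    d->nCoeff = nCoeff;
    d->pFFreqTraj = buf;
    d->pFMagTraj = buf + nTraj;
    d->pFPhaTraj = buf + 2 * nTraj;
    d->pFStocGain = buf + 3 * nTraj;
    d->pFStocCoeff = buf + 3 * nTraj + 1;
}
```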
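The parameterFormat magic word lets a reader detect the writer's byte order without any table of architectures. The C sketch below reads the fixed part of a parameterFormat header from a memory buffer. It assumes the fields are packed in the order given in the table with no padding (28 bytes in all), that the header size is stored in the writer's native byte order, and that seeing "mrap" means every multi-byte integer must be byte-swapped; ParmHeader and parm_read_header are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of a parameterFormat header (hypothetical in-memory form). */
typedef struct {
    int      swapped;      /* 1 if written with the opposite byte order */
    char     arch[9];      /* 8 ASCII characters + terminator */
    uint32_t header_size;  /* in bytes, > 0 and a multiple of 8 */
    char     type[5];      /* e.g. "FMTE", "MFCC" */
    char     format[9];    /* header beginning, e.g. "tmnd" plus padding */
} ParmHeader;

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Parse the 28-byte fixed part: magic(4) arch(8) size(4) type(4) format(8).
   buf must hold the whole header. Returns 0 on success, -1 otherwise. */
int parm_read_header(const unsigned char *buf, size_t len, ParmHeader *h)
{
    uint32_t size;

    if (len < 28)
        return -1;
    if (memcmp(buf, "parm", 4) == 0)
        h->swapped = 0;          /* same byte order as this machine */
    else if (memcmp(buf, "mrap", 4) == 0)
        h->swapped = 1;          /* opposite byte order: swap integers */
    else
        return -1;               /* not a parameter file */

    memcpy(h->arch, buf + 4, 8);
    h->arch[8] = '\0';

    memcpy(&size, buf + 12, 4);  /* read in this machine's byte order */
    if (h->swapped)
        size = swap32(size);
    /* must be a positive multiple of 8, cover the fixed part, fit in buf */
    if (size < 28 || size % 8 != 0 || size > len)
        return -1;
    h->header_size = size;

    memcpy(h->type, buf + 16, 4);
    h->type[4] = '\0';
    memcpy(h->format, buf + 20, 8);
    h->format[8] = '\0';
    return 0;
}
```

The same swapped flag would then govern how the 4-byte time, count and mode fields of each data set are read.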
The plan is to proceed as follows:
I take a broad view. The goal is to cover:
You can cover all these with one basic idea (suggested to me by Xavier
Rodet): represent the data as a sequence of frames of time-tagged matrices.
So the basic structure is something like this:
In most cases the rows correspond to frequency tracks, bins, or channels.
The columns correspond to parameters such as:
Does anyone require more than 2-D matrices? Do we really need a different
number of columns on each frame?
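A minimal in-memory sketch of a sequence of time-tagged frames, each carrying a 2-D matrix with rows for tracks, bins or channels and columns for parameters. Frame, frame_init, frame_at, frame_free and frames_time_ordered are hypothetical names, and the time unit (seconds) is an assumption. Each frame keeps its own row and column counts, so this sketch allows the number of columns to vary from frame to frame; fixing it for a whole stream would simplify readers.

```c
#include <stdlib.h>

/* One frame: a time tag plus a rows x cols matrix stored row by row.
   Rows: tracks, bins or channels. Columns: parameters. */
typedef struct {
    double time;   /* time tag, assumed to be in seconds */
    int    rows;
    int    cols;
    float *data;   /* rows * cols values, row-major */
} Frame;

/* Allocate a zero-filled matrix. Returns 0 on success, -1 on failure. */
int frame_init(Frame *f, double time, int rows, int cols)
{
    f->time = time;
    f->rows = rows;
    f->cols = cols;
    f->data = NULL;
    if (rows <= 0 || cols <= 0)
        return -1;
    f->data = calloc((size_t)rows * (size_t)cols, sizeof *f->data);
    return f->data != NULL ? 0 : -1;
}

/* Address of the element in a given row (track) and column (parameter). */
float *frame_at(const Frame *f, int row, int col)
{
    return &f->data[(size_t)row * (size_t)f->cols + (size_t)col];
}

void frame_free(Frame *f)
{
    free(f->data);
    f->data = NULL;
}

/* A stream is a sequence of frames; check their time tags never go backwards. */
int frames_time_ordered(const Frame *frames, int n)
{
    int i;
    for (i = 1; i < n; i++)
        if (frames[i].time < frames[i - 1].time)
            return 0;
    return 1;
}
```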
SDIFF is an interchange format so portability is very important. The
following things are proposed to achieve this:
The keen observer will notice that such a format may not be especially
space efficient, since it affords parameters more dynamic range and resolution
than is strictly necessary. This is to simplify the implementation and
guarantee lossless interchange, both essential features in an interchange
standard. I would expect individual users on resource-constrained platforms to
develop formats optimized for space efficiency, with lossy and perhaps lossless
compression such as the one described by Andrew Horner at this year's ICMC.
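Lossless interchange of float parameters needs nothing more than writing each value's full bit pattern in one agreed byte order. The sketch below assumes IEEE 754 single-precision floats and big-endian byte order, both chosen here for illustration; put_f32_be and get_f32_be are hypothetical names. Because only the 32-bit pattern is copied and reordered, a value read back on any host is bit-for-bit identical to the one written.

```c
#include <stdint.h>
#include <string.h>

/* This sketch assumes float is IEEE 754 single precision (32 bits). */
_Static_assert(sizeof(float) == 4, "float must be 32 bits");

/* Write x as 4 bytes, most significant byte first (big-endian). */
void put_f32_be(unsigned char out[4], float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the bit pattern, no conversion */
    out[0] = (unsigned char)(bits >> 24);
    out[1] = (unsigned char)(bits >> 16);
    out[2] = (unsigned char)(bits >> 8);
    out[3] = (unsigned char)bits;
}

/* Read 4 big-endian bytes back into a float, whatever the host byte order. */
float get_f32_be(const unsigned char in[4])
{
    uint32_t bits = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                    ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

At 4 bytes per value this carries exactly the space cost described in the paragraph on space efficiency; a space-optimized format would trade some of it away.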