audio/video sync in SMPEG

Sathish Vasudevaiah vsatish at
Fri Feb 2 10:49:56 EST 2001


Recently I put some effort into trying to understand
the audio/video synchronisation in smpeg and wasn't
comfortable with what I understood. So I decided to
prepare this writeup on what I digested and place
it before the group for critique. In the process I hope
to learn the missing pieces.

Before I start I must admit that my knowledge of
MPEG1 comes from the brief chapter on MPEG1 in the
book "Video Demystified".

Also, what I write here could be wrong, very wrong.

In the following, XXXX marks the spots where I
am very confused and look forward to some guidance.


Synchronisation in smpeg:

In smpeg, a/v sync is broadly achieved using MPEG
Presentation Time Stamps (PTS).

Presentation time stamp:
" ..The optional PTS is a 33-bit number coded using three
fields, separated by marker bits. PTS indicates the
intended time of display by the decodder. The value
of PTS is the number of periods of a 90 kHZ system
clock. This field is present only if PTS_bits is
present and stream_ID not equal to private stream 2...."

From what I have seen so far, it appears that a time stamp
is calculated for each packet and is inserted into the
MPEGstream along with the packet data. The time stamp
is derived mainly from the PTS.


As the raw audio output (after decoding) is played back by
SDL, the SDL driver independently synchronises the audio playback
based on the expected time for one frame of audio.

typedef struct {
    int freq;        /* DSP frequency -- samples per second */
    Uint16 format;        /* Audio data format */
    Uint8  channels;    /* Number of channels: 1 mono, 2 stereo */
    Uint8  silence;        /* Audio buffer silence value (calculated) */
    Uint16 samples;        /* Audio buffer size in samples (power of 2) */
    Uint16 padding;        /* Necessary for some compile environments */
    Uint32 size;        /* Audio buffer size in bytes (calculated) */
} SDL_AudioSpec;

From this information two values are calculated:

#define frame_ticks        (this->hidden->frame_ticks)

          - based on the spec data structure, the time duration
            of one audio frame is calculated in ticks
            (nothing but milliseconds)

#define next_frame        (this->hidden->next_frame)
          - This contains the expected start time of
            the next frame.

      frame_ticks = (float)(spec->samples*1000)/spec->freq;
      next_frame = SDL_GetTicks()+frame_ticks;

One thing that has bugged me is the SDL_AudioSpec->samples member.
How is this computed? XXXX

The actual synchronisation....

        #ifndef USE_BLOCKING_WRITES /* Not necessary when
                                       using blocking writes */
        /* See if we need to use timed audio synchronization */
        if ( frame_ticks ) {
            /* Use timer for general audio synchronization */
            Sint32 ticks;

            ticks = ((Sint32)(next_frame - SDL_GetTicks()))-FUDGE_TICKS;

Preparing for the next frame...

       /* If timer synchronization is enabled, set the next write frame */
       if ( frame_ticks ) {
        next_frame += frame_ticks;

- synchronisation means a wait (the assumption being that
  the incoming data rate is much faster than the rate at
  which the driver is outputting) XXXX

  If it is not, won't there be underruns at the device driver level? XXXX

- FUDGE_TICKS - this is linked to RR scheduling delays
  (overheads). How do we calibrate this? XXXX

SMPEG: Time stamp processing in MPEGaudio

The MPEG PTS info is kept in this array found in
MPEGaudio class.

         /* Timestamp sync   */
         #define N_TIMESTAMPS 5
         double timestamp[N_TIMESTAMPS];

Here timestamp[] is used as a FIFO to store and then use the timestamps.

PTS -> stream -> MPEGring -> timestamp array[]
The timestamp info is put into the MPEGRing by mpeg audio decoder
(in  Decode_MPEGaudio).
      In MPEGaudio::run, the timestamp info is read from the MPEGstream
      and  put into the MPEGring buffer in the call
      audio->ring->WriteDone(...., timestamp);

Time stamps from the MPEG ring are compared with
audio->Time() (which internally uses the play_time variable) to
calculate the difference. This difference is added to
play_time to keep it in sync with the PTS. The main purpose of
the timestamp[] array seems to be to keep the Time() virtual method
returning the correct playback time.

The MPEG::seekIntoStream(int position) method is used by
MPEG::Rewind and MPEG::Seek to implement that functionality.
seekIntoStream uses the MPEGstream->time() method to
get the timestamp info.

In addition, there is a usage of the frags_playing and frag_time variables
as shown below. Their meanings have completely eluded me. XXXX

int Play_MPEGaudio(MPEGaudio *audio, Uint8 *stream, int len)
    /* Increment the current play time (assuming fixed frag size) */
    switch (audio->frags_playing++) {
      // Vivien: Well... the theorical way seems good to me :-)
    case 0:        /* The first audio buffer is being filled */
    case 1:        /* The first audio buffer is starting playback */
        audio->frag_time = SDL_GetTicks();
        break;
    default:    /* A buffer has completed, filling a new one */
        audio->frag_time = SDL_GetTicks();
        audio->play_time += ((double)len)/audio->rate_in_s;
        break;
    }


The basic synchronisation strategy is to skip frames if
the playback is slow (less than the desired framerate)
or wait if it is fast.

To do this, two times are important and are used:
         - the actual playback time
         - the desired or specified time in the encoded stream

There is another twist to this: video timing is
tied to the audio timing. This is done by using
the audio time stamps in the synchronisation
calculations. The 'SetTimeSource' method is used to put
a reference to the audio timing information in the video module.

          void MPEG::EnableAudio(bool enabled) {

The MPEG video module appears to have three ways of knowing the time:

                 - An inline function that returns the current playback
                   time of the MPEGaudio module.

                 - The Time() method in the base class MPEGaction, which
                   returns the current playback time of the video module.
                   It is a 'get' method for the attribute play_time,
                   but the attribute is modified directly in the video
                   module.

                 - vid_stream->current->show_time
                   This contains the fine-grained timing information.
                   The MPEG PTS gives the frame-level timing information;
                   the GoP timestamp, however, gives the timing at the
                   sub-frame level (like picture level).

- Other data structures used in the synchronisation

                This can have values from -1, 0, ..., n. When it
                is > 0, it indicates that that many frames have to
                be parsed but not displayed. However, the following
                is a mystery to me: XXXX

                 MPEGvideo::RenderFinal
                     /* Process all frames without displaying any */
                     _stream->_skipFrame = 1;

                 How is this relayed to _skipFrame? It is updated
                 only in timeSync and carries its value across
                 frame decoding.

                This contains the frame no. to which a seek can be done.

                 When a seek is done in seconds, the current_frame
                 attribute will not contain the correct value.
                 This flag indicates that the current frame no.
                 should be re-calculated using the GoP time code.

                   It may sound silly but I still haven't got the true
                   meaning of these variables.
                   At the beginning of the timeSync method, we have...
                   /* Update the number of frames displayed */

              XXXX Why increment both? Why do we need both?

Here is the sync loop...

   While loop
        mpegVidRsrc( 0, mpeg->_stream, 0 );
             < parse video stream >
             if (need to skip frame)
                    look for new Picture Start Code
             when a complete frame is decoded
                    call MPEGvideo::ExecuteDisplay( VidStream* vid_stream )
                        if( ! vid_stream->_skipFrame )
                            timeSync( vid_stream );

   end of loop

Sync logic

 'time behind'  =  audio playback time  -  video playback time

'time behind' range is used to compute the sync level:

               |           |          |                 |
-----Ahead----> <-------In Sync------> <----little-----> <------lot------>
                                          out of sync        out of sync

Based on this range, the no. of frames to be skipped is computed
and this information is used in the sync loop to skip frames.

XXXX  But won't 'time behind'/(one frame time) give the correct no.
      of frames to skip???

- The sync logic assumes RR scheduling for the threads. If
a different policy is in place for the threads (e.g. FIFO), the
sync breaks down.

- The strength of smpeg compared to other (open source) mpeg
decoders seems to be that smpeg is architecture-neutral, while
others seem to be optimised for i386.
