[sdlsound] SDL_sound v2's internal mixer format...

Wed Aug 11 07:19:16 EDT 2004

> I have to wonder why they chose an audio output format that's so painful under 
> powerpc.

CoreAudio was designed for professional audio editing applications. It's
also a very good example of Apple's less-than-positive attitude towards
game development in the late '90s.

It's worth noting that PowerPCs can tear up some floating point
calculations (especially if you can take advantage of Altivec)...modern
PowerPCs are probably better at floating point processing than integer
work. It's just the conversion between float and int that sucks.

Truth be told, this isn't wildly inaccurate on x86 hardware either; AMD
FPUs absolutely smoke integer performance, especially on SSE2 codepaths,
and Intel has slowly been moving towards this truth, too. Also, the big
cost nowadays in terms of CPU performance is not really one instruction
or another; it's almost universally memory bandwidth...when you beat up
the CPU cache, as you do when converting blocks of audio data, you see
the real weakness of modern processors.

The problem isn't the format of the mixing buffer, it's having to
convert the whole buffer before feeding the API or hardware.

Others have pointed out some worst-case scenarios: iTunes, including UI
and mp3 decoding overhead, eats less than 5% of the CPU on a modern
system. We're really talking about squeezing all the performance we can
from a system, but we _can_ suffer a little on the Mac, to be honest.

Therefore, in the short term, it makes more sense to optimize for
handhelds that don't have the luxury of FPUs and gigahertz processors.
This puts Float32 as the One True Way out of the question. Hell, it puts
the notion of a One True Way at all out of the question.

>   Does the audio output hardware actually /use/ the floating point 
> numbers or does it convert it all back into 16-bit before putting it through 
> the DAC?  Maybe we can do a workaround with assembly or bit-shifts to get 
> faster int/float conversion?

Right now all known consumer-level audio hardware eats 16-bit ints, to
my knowledge, including the on-board audio chip in every Powerbook and
G5. The assumption in CoreAudio is that this limitation goes away
eventually, and the API was designed to reflect that.

In this case, we're going to be converting no matter what; we're just
trying to minimize the overhead (see the rant on Sound Manager, below).
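
For reference, here is a minimal sketch of the conversion being talked about, assuming float samples that are nominally in the range [-1.0, 1.0]. The function name is made up for illustration, and this is plain scalar C; it is not what CoreAudio or SDL does internally, and a serious version would want an Altivec/SSE path.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n float samples (nominally -1.0..1.0) to signed 16-bit.
 * Out-of-range input is clamped instead of being allowed to wrap.
 * float_to_s16 is a hypothetical helper, not an SDL or CoreAudio call. */
static void float_to_s16(const float *src, int16_t *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        float s = src[i] * 32767.0f;
        if (s > 32767.0f)  s = 32767.0f;
        if (s < -32768.0f) s = -32768.0f;
        dst[i] = (int16_t)s;   /* cast truncates toward zero */
    }
}
```

The loop reads every input sample and writes every output sample once, which is exactly the kind of whole-buffer pass that beats up the cache when it gets repeated at several layers.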

> Actually, graphics are still integer.  OpenGL can take floats(but it doesn't 
> have to), but it still renders to integers.  The movement to floating-point 
> audio kind of already happened with formats like MP3.

DirectX9-level hardware handles floating point framebuffers, etc...at
least, it should, fwiw.

The benefit, as you nailed immediately, is that popular audio formats
(mp3 and ogg vorbis being at the forefront of this revolution) decode to
float by default. In the case of libvorbis, the ov_read() function
decodes to float and then converts the entire buffer before
returning...an expensive endeavor, especially when you don't
particularly _need_ an integer format. In this case, it's better to call
ov_read_float() and convert at mix-time, so you are both more CPU-cache
friendly and spend less time blocked in the decoder.

In the case of OSX, the SDL audio subsystem is a mess, since it's still
bound to Sound Manager. Sound Manager is a legacy nightmare on OSX. It
was driven by hardware interrupts on OS9, but is layered over CoreAudio
now. For the Ogg case, you end up doing:

- Decode to float32 in Ogg,
- Convert to int16 and feed to SDL_sound,
- SDL_sound or application feeds SDL audio callback (convert or memcpy),
- SDL feeds to Sound Manager buffer, possibly with another convert or
  memcpy,
- Convert to float32 and feed to CoreAudio HAL,
- Convert to int16 and feed to hardware buffer.
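
To make the "convert at mix-time" idea concrete, here is a rough sketch that would replace the first convert in the chain above. It assumes the decoder hands back planar float data (one array per channel, which is the layout ov_read_float() uses) and that the mixer buffer is interleaved int16. The function names are hypothetical, and the saturating add is my assumption about how overflow gets handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Add one float sample (nominally -1.0..1.0) onto an int16 accumulator,
 * saturating at the int16 limits. Hypothetical helper. */
static int16_t add_clamped(int16_t acc, float sample)
{
    float v = (float)acc + sample * 32767.0f;
    if (v > 32767.0f)  v = 32767.0f;
    if (v < -32768.0f) v = -32768.0f;
    return (int16_t)v;   /* cast truncates toward zero */
}

/* Mix `frames` frames of planar float audio, laid out as
 * pcm[channel][frame], on top of an interleaved int16 buffer.
 * The float->int16 conversion happens in the same pass as the mix,
 * so no separately converted copy of the decoded audio is written. */
static void mix_planar_float_to_s16(float **pcm, int channels,
                                    size_t frames, int16_t *out)
{
    size_t f;
    int c;
    for (f = 0; f < frames; f++) {
        for (c = 0; c < channels; c++) {
            int16_t *slot = &out[f * (size_t)channels + (size_t)c];
            *slot = add_clamped(*slot, pcm[c][f]);
        }
    }
}
```

The decoded floats get touched once, while they are likely still in cache, instead of being converted into a second buffer that the mixer then has to read again.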

Any one of these is acceptable by itself, but it really does add up. We
had OpenAL built on top of Sound Manager at one point. An Apple intern
took the exact same mixer code and had it target CoreAudio HAL directly
instead of Sound Manager...more or less this was a question of changing
what you wrote to where but not much else. UT2003 went from 25% of the
CPU time being spent in OpenAL to 5% or so. Apple's official stance is
"Sound Manager is there for compatibility, but you're insane to use it."
Obviously, this is true.

But this wasn't a discussion about MacOS so much as a discussion of
handhelds and how to make this efficient on all platforms.

So here's where I sit with SDL_sound v2:

First, we have to have multiple backends inside the mixer. There's just
no way around it. There's no good reason to piss away CPU time on a
handheld to make a nice MacOS path, and vice versa. The good news is
that this can all be hidden from the end user, even if it's a little
nasty for the library developers. So it goes; I consider that
unfortunate, but acceptable.

In this case, we need to fix SDL in two ways:
1) SDL needs to accept a float32 path, which can be propagated to
SDL_sound.
2) SDL itself needs CoreAudio support.

That basically "fixes" MacOSX.

As for SDL_sound:
1) The internal mixer tries to accommodate whatever SDL or the hardware
wants (as dictated via SDL_OpenAudio()), meaning a Float32 path when we
get the infrastructure in place, but most immediately int16.
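
One way the "multiple backends, hidden from the end user" idea could look is sketched below. Everything here is hypothetical: the enum, the function names, and the assumption that the mixer picks a backend once, based on the format the audio device was opened with. Each backend mixes one buffer on top of another in a single format, which is the approach argued for in point 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format tags; a real mixer would key off whatever format
 * the audio device was actually opened with. */
typedef enum { MIX_FMT_S16, MIX_FMT_F32 } mix_format;

/* Every backend mixes n samples of src on top of dst, in place. */
typedef void (*mix_fn)(void *dst, const void *src, size_t n);

static void mix_s16(void *dst, const void *src, size_t n)
{
    int16_t *d = (int16_t *)dst;
    const int16_t *s = (const int16_t *)src;
    size_t i;
    for (i = 0; i < n; i++) {
        int32_t v = (int32_t)d[i] + (int32_t)s[i];
        if (v > 32767)  v = 32767;    /* saturate instead of wrapping */
        if (v < -32768) v = -32768;
        d[i] = (int16_t)v;
    }
}

static void mix_f32(void *dst, const void *src, size_t n)
{
    float *d = (float *)dst;
    const float *s = (const float *)src;
    size_t i;
    for (i = 0; i < n; i++)
        d[i] += s[i];   /* float has headroom; no clamp in this pass */
}

/* Pick the backend once, at init time, not per sample. */
static mix_fn choose_mixer(mix_format fmt)
{
    return (fmt == MIX_FMT_F32) ? mix_f32 : mix_s16;
}
```

Both loops walk two contiguous buffers front to back, which keeps them cache-friendly and leaves the door open for SIMD versions behind the same function pointer.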

2) I reject the mixing to int32; it's an extra conversion if we mix to an
int32 buffer, or we can mix and convert sample-by-sample (mix every
sample point of every playing sample to an int32 register, convert the
register, store, go to next sample point), but this is extremely
cache-unfriendly to keep jumping between different buffers in memory,
and it makes it difficult or impossible to use SIMD effectively. It's
better to pick a format, mix a sample to that format, mix the next one
on top of it, etc, and ignore clipping issues. It's probably best to say
that clipping isn't a wildly important concern in most games, and for
those that it is...well, hopefully we'll all move to float32 someday. :)

3) The callbacks are on their own. My current opinion on this falls
somewhere between "you shouldn't be using them without a damned good
reason" and "there's a reason it's in the 'hardcore' section of the
docs." There are ways we can make this saner for the app writer, though:

4) We can pass the callback something that tells it the format it'll be
eating.

5) Let the app force a format at init time and have the mixer or SDL
swallow the conversion overhead (with SSE/MMX/Altivec/whatever support)
if they _require_ a specific format. This is less ideal in terms of
performance, but a good tradeoff in terms of application programmer
time.

6) Offer some sort of union like Tyler discussed to make the casting
less nasty in the callback.
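
As a rough illustration of points 4 and 6 together, here is one possible shape for such a callback. None of these names exist in SDL or SDL_sound, and this is only a guess at the kind of union being discussed, not the actual proposal. The callback is told the format and gets a union, so it picks the right pointer member instead of casting a void pointer itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format tag passed to the callback (point 4). */
typedef enum { CB_FMT_S16, CB_FMT_F32 } cb_format;

/* Hypothetical union to avoid casts in the callback (point 6). The
 * caller fills in the member that matches the format tag. */
typedef union {
    void    *raw;
    int16_t *s16;
    float   *f32;
} sample_ptr;

typedef void (*post_mix_cb)(sample_ptr buf, size_t samples,
                            cb_format fmt, void *userdata);

/* Example callback: track the peak absolute level (0.0..1.0 scale),
 * the sort of read-only visualization use a callback is meant for. */
static void peak_meter(sample_ptr buf, size_t samples,
                       cb_format fmt, void *userdata)
{
    float *peak = (float *)userdata;
    size_t i;
    for (i = 0; i < samples; i++) {
        float v = (fmt == CB_FMT_F32) ? buf.f32[i]
                                      : (float)buf.s16[i] / 32768.0f;
        if (v < 0.0f) v = -v;
        if (v > *peak) *peak = v;
    }
}
```

The callback only reads the buffer, which fits the guideline that callbacks are for looking at the final mix, not post-processing it.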

7) Try very hard to provide all the functionality in the mixer for which
an app would need a callback, so they don't need one at all. This is
just Good Policy in general, since it makes app development easier, and
allows us to optimize one library instead of making each app reinvent
the same SSE2 code. There are _always_ exceptions, but the rough
guideline is that the callback should only be needed for visualizing the
final mixed buffer (e.g. the oscilloscope thingy in xmms, whatever),
and not for post-processing on that buffer. Sure, there will be apps
that need this in unusual ways, but we should minimize this for
everyone's sake.

I think that is best.

--ryan.