Here are my rough notes on an extension for audio capture through OpenAL. After I wrote this up, Joe pointed out to me that there is already an extension in the Linux implementation (and it even uses the same name!), but it lacks a few features this proposal offers: notably, there is no way to enumerate capture devices, capture from multiple devices at once, or query for available data without an explicit read. To my knowledge, that extension is only used by the Linux version of Tribes 2 at this time, and is undocumented beyond the implementation's source code, so similarities aside, I don't consider this a huge reinventing of the wheel.
I'm most interested in the following comments:
- Thoughts on sections labelled "RFC".
- Notes like "that isn't feasible in a given existing implementation".
- Notes like "that would be awkward for future hardware/implementations that are likely to appear".
- Notes like "that would suck at the application level".
- Thoughts on a name other than AL_EXT_capture (which is used below).
This is just a request for comments at this point. As I have an immediate need for capture capabilities for ut2004, if this document is still being formalized when that ships, we'll ship with an extension called AL_EXT_sdfkjsdkfjsdkf that fills my immediate needs and replace it later. :)
(RFC: The name is already in use; we need something cooler.)
AL_EXT_capture is used to record audio data in real time from hardware that accepts input. This is exposed as an AL extension for two reasons:
- Portability. It doesn't make sense for application developers to write platform-dependent recording code for every target when they chose OpenAL as an abstraction over system audio APIs in the first place.
- Functionality. It's possible that the AL implementation may conflict with system APIs and hardware in ways unknown to the application developer, and thus by supporting audio capture in the implementation itself, conflicts can be managed more effectively and efficiently, if not eliminated altogether.
First, basic usage without explanation or error checks:
#include <AL/al.h>
#include <AL/alc.h>
static ALbyte buf[SAMPS];  /* SAMPS is defined elsewhere */
ALbyte *capture_example(void)  /* hypothetical function name */
{
    if (!alcIsExtensionPresent(NULL, "ALC_EXT_capture")) return(NULL);
    ALCdevice *dev = alcCaptureOpenDevice(NULL, 11025, AL_FORMAT_MONO8, SAMPS);
    alcCaptureStart(dev);
    ALCint samples = 0;  /* busy-wait until SAMPS samples have accumulated */
    while (samples < SAMPS) alcGetIntegerv(dev, ALC_CAPTURE_SAMPLES, 1, &samples);
    alcCaptureSamples(dev, buf, sizeof (buf));
    alcCaptureStop(dev);
    alcCaptureCloseDevice(dev);
    return(buf);  /* buf has SAMPS samples of MONO8 audio at 11025 Hz. */
}
To explain verbosely:
- You check whether alcIsExtensionPresent(NULL, "ALC_EXT_capture") returns AL_TRUE. If so, the entry points exist and can be found with alcGetProcAddress(). If the extension isn't present, this AL implementation has no audio capture facilities and/or doesn't export the entry points you need. Checking for this extension per device doesn't make sense, since the next step is usually to enumerate the available recording devices (or open the default one without caring what the hardware really is). (RFC: I'm open to capture devices returning AL_TRUE and regular AL output devices returning AL_FALSE when a device handle is specified, but the primary use is going to be determining whether the entry points exist at all.)
- You can see what devices are available for recording with the device enumeration extension that is still in proposal; the only difference is that instead of querying ALC_DEVICE_SPECIFIER and ALC_DEFAULT_DEVICE_SPECIFIER, you'd use ALC_CAPTURE_DEVICE_SPECIFIER and ALC_CAPTURE_DEFAULT_DEVICE_SPECIFIER (RFC: shorter names welcome). Only capture devices are reported there. Capture devices are not reported alongside the regular output devices through the normal device enumeration (see the notes on "half-duplex", below).
- You open the capture device with alcCaptureOpenDevice(). It takes a device specifier string (or NULL for a "default" capture device chosen at the implementation's discretion), a sample rate, a data format, and the minimum number of samples the device must buffer at a time (likely twice what you plan to process at once, so you can read one half while the other half fills without dropping samples). It returns NULL if the device could not be opened for whatever reason. The implementation should try to accommodate the application's format request, and convert internally where possible. (RFC: Perhaps a way to query capabilities? Or treat these format details as hints and query for what you really got? What happens when the user specifies a data format based on an extension...surely we don't expect the implementation to record to Ogg Vorbis. :) ). At the implementation's discretion, you may open and use multiple capture devices simultaneously.
- Recording does not begin when you successfully open the device. You must call alcCaptureStart(devHandle) to begin capturing samples from the hardware. The implementation is encouraged to optimize the situation where the device is open but not started by dropping samples or not "listening" for them at all by disabling interrupt handlers, etc. (Re)starting a device doesn't guarantee the state of previously captured samples (RFC: but see the notes on half-duplex operation, below).
- RFC: Worth adding queries to see if a device is in the stop/start state?
- Once recording starts, the device's internal buffer will begin to fill (RFC: Should we allow sync vs async processing?). If the internal buffer is completely full, new samples will replace the oldest ones. You can see how many samples have accumulated with alcGetIntegerv(devHandle, ALC_CAPTURE_SAMPLES, 1, &samples). Most applications will want to check this regularly until enough samples have built up to merit processing. This should be a "fast" call.
- When you want to move samples from the internal buffer to the application's address space, you use alcCaptureSamples(devHandle, buf, bufsize). Up to bufsize octets will be stored in buf, assuming that many are currently available, starting with the oldest unread sample. This is a "slow" call, since it involves at least a memcpy(), if not a trip across the PCI bus, so poll ALC_CAPTURE_SAMPLES until you have enough data to merit the copy. Once retrieved, the data can be processed by the application, become buffer data for 3D spatialization and playback in an output device's context, etc. (RFC: Everywhere else, this extension specifies buffer sizes as numbers of samples. My concern is programmer screwups leading to buffer overflows, which passing sizeof (buf) here avoids, but I'm flexible on what works best.)
- If you don't care about recording at a given point (a cut scene, level loading, etc.), you should stop capturing samples with alcCaptureStop(devHandle). The implementation is encouraged to optimize the case of a stopped capture device.
- When you are done, you close the device with alcCaptureCloseDevice(devHandle). (RFC: Could we recycle the alcCloseDevice() entry point? We need a separate Open entry point to prevent opening output devices and to specify recording parameters, but there's little reason to separate the close functionality beyond API consistency.) The device is implicitly stopped when closed.
- All entry points, with the exception of alcCaptureOpenDevice(), return ALvoid. All entry points set error state obtainable with alcGetError().
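The enumeration step can be sketched in plain C. This assumes the capture specifier list uses the same layout as the proposed device enumeration extension (each device name NUL-terminated, the whole list ended by an extra NUL); the helpers count_capture_devices() and capture_device_at() are hypothetical and not part of any proposed API. In real code, the list would come from alcGetString(NULL, ALC_CAPTURE_DEVICE_SPECIFIER).

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout: "Microphone\0Line In\0\0", i.e. one NUL-terminated
   name per device, with an extra NUL marking the end of the list. */

/* Hypothetical helper: count the device names in such a list. */
static size_t count_capture_devices(const char *list)
{
    size_t count = 0;
    if (list == NULL)
        return 0;
    while (*list != '\0') {
        count++;
        list += strlen(list) + 1;  /* skip this name and its NUL */
    }
    return count;
}

/* Hypothetical helper: return the index-th device name, or NULL if
   index is past the end of the list. */
static const char *capture_device_at(const char *list, size_t index)
{
    if (list == NULL)
        return NULL;
    while (*list != '\0') {
        if (index == 0)
            return list;
        index--;
        list += strlen(list) + 1;
    }
    return NULL;
}
```

A name returned by capture_device_at() would then be passed to alcCaptureOpenDevice() in place of NULL.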
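The sizing rules above (buffer length in samples for alcCaptureOpenDevice(), but bufsize in octets for alcCaptureSamples()) are easy to mix up, which is the concern behind the RFC on octets vs. samples. A small sketch of the arithmetic follows. It assumes a "sample" means one sample frame (one value per channel), which the proposal doesn't spell out for stereo formats, and it uses a hypothetical enum in place of the real AL_FORMAT_* constants.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the four basic AL formats; real code
   would use AL_FORMAT_MONO8 etc. from <AL/al.h>. */
enum capture_format { FMT_MONO8, FMT_MONO16, FMT_STEREO8, FMT_STEREO16 };

/* Octets per sample frame: channels * bytes per channel. */
static size_t frame_size(enum capture_format fmt)
{
    switch (fmt) {
    case FMT_MONO8:    return 1;
    case FMT_MONO16:   return 2;
    case FMT_STEREO8:  return 2;
    case FMT_STEREO16: return 4;
    }
    return 0;
}

/* Samples to request from alcCaptureOpenDevice() when the application
   processes `chunk` samples at a time: double it, so one half can be
   read while the other half fills. */
static size_t capture_buffer_samples(size_t chunk)
{
    return chunk * 2;
}

/* bufsize, in octets, to hand to alcCaptureSamples() for `samples`
   sample frames of the given format. */
static size_t capture_read_octets(enum capture_format fmt, size_t samples)
{
    return samples * frame_size(fmt);
}
```

For example, an application that processes 1024 STEREO16 samples at a time would ask for a 2048-sample capture buffer and read 4096 octets per alcCaptureSamples() call.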
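To make the buffer semantics in the ALC_CAPTURE_SAMPLES and alcCaptureSamples() items concrete, here is a toy model of the device's internal buffer as a ring. It is not an implementation and isn't tied to any real driver: new samples overwrite the oldest once the ring is full, the available count is what ALC_CAPTURE_SAMPLES would report, and reads copy up to bufsize octets starting with the oldest unread sample. It assumes MONO8, so one sample is one octet; the names are hypothetical, and ring_read() returns the number copied purely for convenience, whereas the proposed alcCaptureSamples() returns ALvoid.

```c
#include <stddef.h>

/* Ring size in samples, standing in for the buffer size passed to
   alcCaptureOpenDevice(). Kept tiny so the overwrite is easy to see. */
#define CAP 8

struct capture_ring {
    unsigned char data[CAP];
    size_t head;   /* index of the oldest unread sample */
    size_t count;  /* samples available: what ALC_CAPTURE_SAMPLES reports */
};

/* "Hardware" side: append one sample, replacing the oldest when full. */
static void ring_push(struct capture_ring *r, unsigned char s)
{
    size_t tail = (r->head + r->count) % CAP;
    r->data[tail] = s;
    if (r->count == CAP)
        r->head = (r->head + 1) % CAP;  /* oldest sample was overwritten */
    else
        r->count++;
}

/* Application side: copy up to bufsize octets, oldest first, the way
   alcCaptureSamples() is described. Returns the number copied. */
static size_t ring_read(struct capture_ring *r, unsigned char *buf,
                        size_t bufsize)
{
    size_t n = bufsize < r->count ? bufsize : r->count;
    for (size_t i = 0; i < n; i++)
        buf[i] = r->data[(r->head + i) % CAP];
    r->head = (r->head + n) % CAP;
    r->count -= n;
    return n;
}
```

Pushing ten samples into this eight-sample ring leaves the first two lost and the third as the oldest readable sample, which is the "replacing the oldest samples" behavior described above.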
Half-duplex and duplicate devices: Devices listed in the capture enumeration may be on the same physical hardware as an output device, but they will still be exposed as separate device objects. Capture devices are never allowed to do output, and output devices are never allowed to capture. Trying to interchange device handles is an error condition. As such, a sound card with a line-in and a microphone jack may expose two capture devices and one output device. Apple's iSight camera might expose a single capture device for the microphone and no output devices. If the card supports "full-duplex", these devices may be used at the same time. If a card is so-called "half-duplex", then opening its capture and output devices at the same time may be an error condition. (RFC: Perhaps there could be ways around this by manipulating a context's suspend/process state in sync with the capture device's start/stop state? Perhaps device enumerations should just hide logical devices when the hardware is in use, to prevent opening them at all? Is it even worth worrying about in this day and age?)
That's all; comments welcome.
--ryan.