Note: This is actually a modification to the Q3 toolchain source, rather than to the Q3 game source proper.

Purpose

Faster assembling of Q3VM bytecode.

Overview

The bulk of time spent in q3asm is in constructing the symbol table and in searching the symbol table (this becomes apparent under any sort of profiling). The original implementation of q3asm uses a linked list that is repeatedly traversed for insertion (to sort symbols by value) and searching. This method is extremely slow.

This patch utilizes various abstract data types (ADT) to speed up the construction and resolution of the symbol table by name. For sorting by symbol value, qsort() is used. Although qsort increases heap memory use (by converting 1LL to array, qsort()ing, then converting back to 1LL), the speed benefit outweighs the space cost. Memory is relatively cheap these days, anyway.

My personal development box is an Athlon/1.2GHz with 256MB RAM. The assembly stage of my mod, Project FI, as of CVS 2002 Jan 22, dropped from 68 seconds to 3 2, more than a twenty-fold decrease in time.

Usage

Ensure that you're actually using the modified q3asm during the linking stage.

The option "-v" enables verbose progress reporting. Otherwise, q3asm only reports errors.

(Hashtable implementation) An additional option, "-b" is provided to specify the size of the symbols hash table. If not specified, the default value of 2048 is used.

Limitations

(Hashtable implementation) Efficiency of the hash table is determined by the hash function used.

(Hashtable implementation) Although there is no limit to specifiable hash table size, there are certain sane ranges. Stock Q3 1.27f game source does not have any more than 11_000 symbols in any one module. Table sizes excessively larger than this is wasteful. The hash table uses about 12 bytes for every starting bucket, so 1_000_000 buckets will cause q3asm to start by allocating about 12MB of memory before even looking at a file. Reasonable values are between 1000 and 100_000; default of 2048 is used if table size is not specified.

(Balanced Binary Search Tree implementation) There is a slight and noticeable speed penalty (compared to hashtable) during insertions, but this cost is amortized across the second pass, when the tree is searched far more quickly than hashtables.

Programming Notes

The new hash function HashString() uses a modified version of the default hash function from Kazlib 1.19 (kazlib-1.19/hash.c:817, hash_fun_default), and is hereby duly credited.

As far as I can tell, sorting the symbols list isn't functionally critical; the QVM output will still run correctly. Sorting the symbols, however, maintains byte-for-byte combatibility with the original q3asm. And qsort really is quite quick. Nonetheless, if you still want that tiny extra bit of speed, you can change sort_symbols() into a no-op (like making it return immediately).

Quicksort really is quick, but a little annoying to set up and get working properly when dealing with singly-linked lists.

(Hashtable implementation) Statistics on the hash table(s) may be displayed, the symbol table's then the opcode table's. All values should nominally be as small as possible, as smaller values indicate a more-efficient hash table.

(Balanced Binary Search Tree implementation) Statistics on the the trees may appear at the end of linking. These values can't really help change anything... they're just there to look pretty.

The assembler attempts to minimize output, reporting only errors. An optional verbose mode (with the "-v" option) displays assembly progress messages and assembly statistics.

Log Samples

Output of compilation script using q3asm packaged with linux-q3sdk-1.1b provided by Loki Software, Inc: q3asm.log0 (68 seconds).

Output of compilation script using q3asm-turbo1 patched against q3asm.c provided by Q3ToolSource.exe(?): q3asm.log1 (3 seconds).

Output of compilation script using q3asm-turbo3: q3asm.log3 (2 seconds).

Output of compilation script using q3asm-turbo9: q3asm.log9 (2 seconds).


(2006.09.18 0031 UTC) Unified diff, against unmodified q3asm.c (Q3A SDK 1.32): q3asm-turbo10.diff

(2002.10.28 0045 UTC) Unified diff, against unmodified q3asm.c (Q3A SDK 1.32): q3asm-turbo9.diff

(2002.10.28 0045 UTC) Unified diff, against unmodified q3asm.c (Q3A SDK 1.32): q3asm-turbo8.diff

(2002.09.14 0448 UTC) Unified diff, against unmodified q3asm.c (Q3A SDK 1.27): q3asm-turbo7.diff

(Q3A SDK 1.27 and below) Non-Windows systems may also need the patch against common/cmdlib.c to eliminate a WIN32-ism (MS-DOS text format): cmdlib.c.diff

(Q3A SDK 1.27 and below) Makefile for GNU Make for the q3asm/ directory (Unix text format). Note that Q3A SDK 1.32 comes with a vendor-supplied (by Id) Makefile: Makefile.q3asm.txt
If you get warnings about 'getwd', this is because getwd() is prone to buffer-overruns, a common security hole. Since q3asm is not security-sensitive (compared to something like a web server), getwd() is mostly harmless. Ignore the warning.

History (ChangeLog):


Postscript: D!ABLO also has a speed-optimized assembler targetted for the Microsoft compiler at http://www.shadowflare.com/scripts/download.php?fID=14 (original forum message http://www.quake3world.com/ubb/Forum4/HTML/007332.html?)
Postpostscript (2005.10.04): With the release of Q3 engine proper under the GNU General Public License, I also grant permission to use these patches under the same GPL license. This was already permissible, but I may well be explicit about it while I have the chance.
Postpostpostscript (2006.09.17) Now that I feel confident about the tree implementation, I plan to direct further development towards ioquake3.
-- PhaethonH (PhaethonH@gmail.com)