Quake 3 Virtual Machine (Q3VM) Specifications

Synopsis

A description of the Q3VM architecture and operations.

Introduction

The Q3VM is a virtual machine used by Quake III to run the game module. The VM is a sort of sandbox to limit the damage a rogue or malicious QVM program can wreak. Though not perfect, it is much safer than full access to native machine language, which could otherwise more easily allow the spread of viruses or the corruption of system resources. In addition, Q3VM programs, like JVM programs, are write-once, run-on-many-platforms (at least on platforms for which a Q3VM interpreter or compiler exists).

The instruction set for the Q3VM is derived from the bytecode interpreter target of LCC, with minor differences. The VM heavily relies on stack operations (stack-based machine).

Memory

The memory bus is little-endian, with unsigned 32-bit addresses over a flat, zero-based, octet-addressable memory space.

Memory is accessible in words of 4 octets (32 bits), 2 octets (16 bits), and a single octet (8 bits).
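The access widths above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name VMMemory and its load/store methods are hypothetical and do not come from any Q3VM implementation.

```python
import struct


class VMMemory:
    """Flat, zero-based, octet-addressable data memory (hypothetical helper)."""

    FORMATS = {1: "<B", 2: "<H", 4: "<I"}   # little-endian, unsigned

    def __init__(self, size):
        self.mem = bytearray(size)

    def load(self, addr, width):
        """Read an unsigned little-endian value of 1, 2, or 4 octets."""
        return struct.unpack_from(self.FORMATS[width], self.mem, addr)[0]

    def store(self, addr, width, value):
        """Write the low `width` octets of value in little-endian order."""
        mask = (1 << (8 * width)) - 1
        struct.pack_into(self.FORMATS[width], self.mem, addr, value & mask)


mem = VMMemory(16)
mem.store(0, 4, 0x11223344)
print(hex(mem.load(0, 1)))   # 0x44: the lowest address holds the least significant octet
print(hex(mem.load(0, 2)))   # 0x3344
```

Because the same address can be read at any of the three widths, a narrower load simply sees the low-order octets of a wider store.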

Code and data occupy distinct address spaces. Thus code address 0 is not the same as data address 0. This separation also precludes self-modifying code.

Code segment

Q3VM is CISC-like in the sense that opcodes do not occupy equal amounts of space. Some opcodes take a parameter embedded in the code segment. As a result, the length of the code segment has no bearing on the number of instructions in the code segment. The QVM file format header manifests this distinction.

Jump addresses are specified in terms of instructions, rather than bytes. Thus, the VM code memory requires special processing when loading the code segment to identify instruction offsets.
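One way a loader might do this special processing is to walk the code segment once and record the byte offset at which each instruction starts. The sketch below is an assumption-laden illustration, not the engine's loader: the function name instruction_offsets is hypothetical, and the parameter sizes are those given in the Q3VM Bytecodes table later in this document.

```python
# Parameter sizes in octets, per the bytecode table in this document:
# ENTER(3), LEAVE(4), CONST(8), LOCAL(9) and the compare-and-branch opcodes
# 11..26 carry 4 octets; ARG(33) carries 1; everything else carries none.
PARAM_SIZE = {3: 4, 4: 4, 8: 4, 9: 4, 33: 1}
PARAM_SIZE.update({op: 4 for op in range(11, 27)})


def instruction_offsets(code, instruction_count):
    """Return the byte offset of each instruction in the raw code segment."""
    offsets = []
    pos = 0
    for _ in range(instruction_count):
        if pos >= len(code):
            raise ValueError("code segment ended before instruction count was reached")
        offsets.append(pos)
        pos += 1 + PARAM_SIZE.get(code[pos], 0)   # opcode octet plus its parameter
    return offsets


# CONST 5; CONST 7; ADD; POP: four instructions in 12 octets
code = bytes([8, 5, 0, 0, 0,  8, 7, 0, 0, 0,  38,  7])
offsets = instruction_offsets(code, 4)
print(offsets)   # [0, 5, 10, 11]
```

A jump to instruction 2 then resolves to byte offset offsets[2], which is why the instruction count and the code length are independent quantities.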

Processor/CPU

While the Q3VM is currently implemented only in emulation, the concept of a VM's CPU is a convenient notion.

Registers

These are guesses derived from modern machines, other fictional machines, and Forth.

Opcodes

List of opcodes recognized by the Q3VM. Each opcode occupies one octet, and some take 1 or 4 additional octets as a parameter. All multi-octet values are treated as little-endian (a legacy of Id's MS-DOS days).

Opcode mnemonics ("names") follow a pattern (intriguingly, the opcode values don't, as they do on other architectures). A short operation name (usually up to four letters, though INDIR and the ADDR forms are longer), then a letter indicating a data type, then a digit indicating data size (usually the number of required octets).

Data type and size codes:

B
block (sequence of octets, of arbitrary length (e.g. character strings or structs))
F4
little-endian IEEE-754 32-bit single-precision floating point value.
P4
Four-octet pointer (memory address) value.
I4
Four-octet signed integer. Corresponds to Q3VM's C data type "signed int".
I2
Two-octet signed integer. Corresponds to Q3VM's C data type "signed short".
I1
One-octet signed integer. Corresponds to Q3VM's C data type "signed char".
U4
Four-octet unsigned integer. Corresponds to Q3VM's C data type "unsigned int".
U2
Two-octet unsigned integer. Corresponds to Q3VM's C data type "unsigned short".
U1
One-octet unsigned integer. Corresponds to Q3VM's C data type "unsigned char".
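As a rough illustration of the naming pattern, a mnemonic can be split into its operation name, type letter, and size digit with a regular expression. This is a heuristic sketch only, not taken from lcc or q3asm; the function name split_mnemonic is hypothetical, and the letter V (void, as in JUMPV) is added to the type letters listed above.

```python
import re

# Type letters from the list above, plus V (void) as seen in JUMPV and LABELV.
MNEMONIC = re.compile(r"^([A-Z]+)([BFIPUV])(\d?)$")


def split_mnemonic(name):
    """Split an lcc-style mnemonic into (operation, type letter, size or None)."""
    m = MNEMONIC.match(name)
    if m is None:
        raise ValueError("not an lcc-style mnemonic: " + name)
    op, type_letter, size = m.groups()
    return op, type_letter, int(size) if size else None


for name in ("CNSTI4", "INDIRF4", "ASGNB", "ADDRGP4", "JUMPV"):
    print(name, split_mnemonic(name))
```

The greedy match on the operation name backtracks by one letter, so the last letter before the optional digit is always taken as the type code.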

Generalized opcodes list (lcc bytecode):

CNSTt v
CONSTANT. Reads v as a value of type t. For floats, v is the bit pattern of the float value.
ASGNt
ASSIGN. Pops a value from stack, interpreted according to t, to use as the assignment value. Pops another value from stack, interpreted as a pointer value (memory address) to use as the assignment destination.
INDIRt
INDIRECTION. Pops a value from stack, interpreted as a pointer value (memory address). Retrieves the 32-bit value from the indicated memory location, and interprets the 32-bit value according to t, then pushes to stack. (think of it as pointer dereferencing)
CVFt
CONVERT TO FLOAT. Pops a value from stack, interpreted according to t, converted to its equivalent float (F4) form, then pushed to stack.
CVIt
CONVERT TO (signed) INTEGER. Pops a value from stack, interpreted according to t, converted to its equivalent signed integer (I4) form, then pushed to stack.
CVUt
CONVERT TO UNSIGNED INTEGER. Pops a value from stack, interpreted according to t, converted to its equivalent unsigned integer (U4) form, then pushed to stack.
CVPt
CONVERT TO POINTER. Pops a value from stack, interpreted according to t, converted to its equivalent pointer (P4) form, then pushed to stack.
NEGt
ARITHMETIC NEGATE. Pops a value from stack, interpreted according to t, negated arithmetically (multiplied by -1 or -1.0), then pushed to stack.
ADDRGP4 v
ADDRESS OF GLOBAL (pointer). Takes v as the memory address of a global variable/function/symbol and pushes that address to stack. Applying INDIRt to the result retrieves the value stored there.
ADDRLP4 v
ADDRESS OF (from?) LOCAL (pointer). Address of a local variable. Local Pointer(?) plus v. First local variable is at 0.
ADDRFP4 v
ADDRESS (from?) FRAME (pointer). Address of an argument (with respect to the frame pointer). Frame Pointer plus (minus?) v. First argument is at 0, second argument at 4, etc (XXX: verify).
ADDt
ADD. Pops a value from stack, interpreted according to t, as second operand. Pops another value from stack, interpreted according to t, as first operand. The two values are arithmetically added, and the sum is pushed to stack.
SUBt
SUBTRACT. Pops a value from stack, interpreted t, as second operand. Pops another value from stack, interpreted t, as first operand. Subtracts second operand from first operand, pushes difference to stack.
MULt
MULTIPLY. Pops a value from stack, interpreted t, as second operand. Pops another value from stack, interpreted t, as first operand. Multiplies the values together, pushes product to stack.
DIVt
DIVIDE. Pops a value from stack, interpreted t, as second operand. Pops another value from stack, interpreted t, as first operand. Second operand divides into first operand, pushes quotient to stack (XXX: integer division C style?).
MODt
MODULUS. Pops a value from stack, as t, as second operand. Pops another value from stack, as t, as first operand. The second operand divides into the first operand, pushes remainder to stack.
LSHt
LEFT SHIFT. Pops a value from stack, as t, as second operand. Pops another value from stack, as t, as first operand. First operand bit-wise shifts left by number of bits specified by second operand, pushes result to stack.
RSHt
RIGHT SHIFT. Pops a value from stack, as t, as second operand. Pops another value from stack, as t, as first operand. First operand bit-wise shifts right by number of bits specified by second operand, pushes result to stack. Sign-extension depends on t.
BANDt
BINARY/BITWISE AND. Pops a value from stack (t) as second operand. Pops another value from stack (t) as first operand. The two operands combine by bitwise AND, pushes result to stack.
BCOMt
BINARY/BITWISE COMPLEMENT. Pops a value from stack (t). Flips (almost-)every bit to its opposite value. Changing the high bit (bit 31) depends on t.
BORt
BINARY/BITWISE OR. Pops a value from stack (t) as second operand. Pops another value from stack (t) as first operand. Combines the two operands by bitwise OR, pushes result to stack.
BXORt
BINARY/BITWISE XOR. Pops two values, XORs, pushes result to stack.
EQt v
EQUAL-TO. Pops two values from stack (as type t), compares the two values. Jumps to address v if true.
GEt v
GREATER-THAN OR EQUAL-TO. Pops a value from stack (type t) as second operand. Pops another value from stack (type t) as first operand. Compares if first operand is greater than or equal to the second operand. Jumps to address v if true.
GTt v
GREATER-THAN. Pops a second operand (type t) then a first operand (type t). Compares if first operand is greater than the second operand. Jumps to address v if true.
LEt v
LESS-THAN OR EQUAL-TO. Pops a second operand (type t) then a first operand (type t). Compares if first operand is less than or equal to the second operand. Jumps to address v if true.
LTt v
LESS-THAN. Pops a second operand (t) then a first operand (t). Compares if first operand is less than the second operand. Jumps to address v if true.
NEt v
NOT-EQUAL. Pops two values from stack (as type t), compares the two values. Jumps to address v if the first operand is not equal to the second operand.
JUMPV
Pops a pointer value from stack, sets PC to the value (jump).
RETt
Pop value from stack, shrink stack to eliminate local frame, push value back to stack. Or copy top of stack to bottom of frame and shrink stack size/pointer?
ARGt
Pop value from stack as type t, store into arguments-marshalling space.
CALLt
Pops value from stack as address of a function. Makes a procedure call to the function, which is expected to return a value of type t.
pop
Pop stack (discard top of stack). (functions of void type still return a (nonsense) 32-bit value)
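Two details in the list above are easy to get backwards: binary operations pop the second operand first, and CNSTF4 carries a float as its raw bit pattern. A small sketch, assuming a plain Python list as the stack; the helper names cnst_f4 and binary_op are hypothetical.

```python
import struct


def cnst_f4(bits):
    """CNSTF4 v: reinterpret the 32-bit pattern v as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def binary_op(stack, fn):
    """Pop the second operand, then the first, and push fn(first, second)."""
    second = stack.pop()
    first = stack.pop()
    stack.append(fn(first, second))


stack = []
stack.append(10)                        # CNSTI4 10
stack.append(3)                         # CNSTI4 3
binary_op(stack, lambda a, b: a - b)    # SUBI4 computes 10 - 3, not 3 - 10
print(stack)                            # [7]

stack.append(cnst_f4(0x3F800000))       # CNSTF4 1065353216 pushes 1.0
print(stack)                            # [7, 1.0]
```

The operand order matters only for the non-commutative operations (SUB, DIV, MOD, the shifts, and the ordered comparisons).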

List of assembly directives:

equ s v
Assign integer value v to symbol s.
data
Assemble to DATA segment (initialized 32-bit variables).
code
Assemble to CODE segment (instructions).
lit
Assemble to LIT segment (initialized 8-bit variables).
bss
Assemble to BSS segment (uninitialized block storage segment).
LABELV symbol
Assigns the value of the currently computed Program Counter to the symbol named symbol. (i.e. the current assembled address value is assigned to symbol).
byte l v
Write initialized value v of l octets. 1-octet values go into LIT segment. 2-octet values are invalid (fatal error). 4-octet values go into DATA segment.
skip v
Skip v octets in the segment, leaving the octets uninitialized.
export s
Export symbol s for external linkage.
import s
Import symbol s.
align v
Ensure memory alignment at a multiple of v, skipping octets if needed.
address x
??? (evaluates expression x and append result to DATA segment)
file filename
Set filename to filename for status and error reporting.
line lineno
Indicates current source line of lineno.
proc locals args
Start of procedure (function) body. Set aside locals octets for local variables, and args octets for arguments marshalling (for all possible subcalls within this procedure).
endproc locals args
End of procedure body. Clean up args bytes (for arguments marshalling) and locals bytes (for local variables).

Q3VM Bytecodes

Specific opcodes (Q3VM bytecode): (TOS = Top Of Stack; NIS = Next In Stack (next value after TOS)) (Hack syntax: $PARM = code parameter)

Byte  Opcode  Parameter size  Description
0 UNDEF - undefined opcode.
1 IGNORE - no-operation (nop) instruction.
2 BREAK - ??
3 ENTER 4 Begin procedure body, adjust stack $PARM octets for frame (always at least 8 (i.e. 2 words)). Frame contains all local storage/variables and arguments space for any calls within this procedure.
4 LEAVE 4 End procedure body, $PARM is same as that of the matching ENTER.
5 CALL - make call to procedure (code address <- TOS).
6 PUSH - push nonsense (void) value to opstack (TOS <- 0).
7 POP - pop a value from stack (remove TOS, decrease stack by 1).
8 CONST 4 push literal value onto stack (TOS <- $PARM)
9 LOCAL 4 get address of local storage (local variable or argument) (TOS <- (frame + $PARM)).
10 JUMP - branch (code address <- TOS)
11 EQ 4 check equality (signed integer) (compares NIS vs TOS, jump to $PARM if true).
12 NE 4 check inequality (signed integer) (NIS vs TOS, jump to $PARM if true).
13 LTI 4 check less-than (signed integer) (NIS vs TOS, jump to $PARM if true).
14 LEI 4 check less-than or equal-to (signed integer) (NIS vs TOS, jump to $PARM if true).
15 GTI 4 check greater-than (signed integer) (NIS vs TOS), jump to $PARM if true.
16 GEI 4 check greater-than or equal-to (signed integer) (NIS vs TOS), jump to $PARM if true.
17 LTU 4 check less-than (unsigned integer) (NIS vs TOS), jump to $PARM if true.
18 LEU 4 check less-than or equal-to (unsigned integer) (NIS vs TOS), jump to $PARM if true.
19 GTU 4 check greater-than (unsigned integer) (NIS vs TOS), jump to $PARM if true.
20 GEU 4 check greater-than or equal-to (unsigned integer) (NIS vs TOS), jump to $PARM if true.
21 EQF 4 check equality (float) (NIS vs TOS, jump to $PARM if true).
22 NEF 4 check inequality (float) (NIS vs TOS, jump to $PARM if true).
23 LTF 4 check less-than (float) (NIS vs TOS, jump to $PARM if true).
24 LEF 4 check less-than or equal-to (float) (NIS vs TOS, jump to $PARM if true).
25 GTF 4 check greater-than (float) (NIS vs TOS, jump to $PARM if true).
26 GEF 4 check greater-than or equal-to (float) (NIS vs TOS, jump to $PARM if true).
27 LOAD1 - Load 1-octet value from address in TOS (TOS <- [TOS])
28 LOAD2 - Load 2-octet value from address in TOS (TOS <- [TOS])
29 LOAD4 - Load 4-octet value from address in TOS (TOS <- [TOS])
30 STORE1 - lowest octet of TOS is 1-octet value to store, destination address in next-in-stack ([NIS] <- TOS).
31 STORE2 - lowest two octets of TOS is 2-octet value to store, destination address in next-in-stack ([NIS] <- TOS).
32 STORE4 - TOS is 4-octet value to store, destination address in next-in-stack ([NIS] <- TOS).
33 ARG 1 TOS is 4-octet value to store into arguments-marshalling space of the indicated octet offset (ARGS[offset] <- TOS).
34 BLOCK_COPY - ???
35 SEX8 - Sign-extend 8-bit (TOS <- TOS).
36 SEX16 - Sign-extend 16-bit (TOS <- TOS).
37 NEGI - Negate signed integer (TOS <- -TOS).
38 ADD - Add integer-wise (TOS <- NIS + TOS).
39 SUB - Subtract integer-wise (TOS <- NIS - TOS).
40 DIVI - Divide integer-wise (TOS <- NIS / TOS).
41 DIVU - Divide unsigned integer (TOS <- NIS / TOS).
42 MODI - Modulo (signed integer) (TOS <- NIS mod TOS).
43 MODU - Modulo (unsigned integer) (TOS <- NIS mod TOS).
44 MULI - Multiply (signed integer) (TOS <- NIS * TOS).
45 MULU - Multiply (unsigned integer) (TOS <- NIS * TOS).
46 BAND - Bitwise AND (TOS <- NIS & TOS).
47 BOR - Bitwise OR (TOS <- NIS | TOS).
48 BXOR - Bitwise XOR (TOS <- NIS ^ TOS).
49 BCOM - Bitwise complement (TOS <- ~TOS).
50 LSH - Bitwise left-shift (TOS <- NIS << TOS).
51 RSHI - Arithmetic (signed) right-shift (TOS <- NIS >> TOS).
52 RSHU - Bitwise (unsigned) right-shift (TOS <- NIS >> TOS).
53 NEGF - Negate float value (TOS <- -TOS).
54 ADDF - Add floats (TOS <- NIS + TOS).
55 SUBF - Subtract floats (TOS <- NIS - TOS).
56 DIVF - Divide floats (TOS <- NIS / TOS).
57 MULF - Multiply floats (TOS <- NIS * TOS).
58 CVIF - Convert signed integer to float (TOS <- TOS).
59 CVFI - Convert float to signed integer (TOS <- TOS).
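To make the TOS/NIS notation concrete, here is a minimal interpreter sketch for a handful of the bytecodes in the table. It is an illustration under several assumptions, not the engine's interpreter: instructions are taken as already decoded into (opcode, parameter) pairs, jump targets are instruction indices, stack cells wrap at 32 bits, shift counts are assumed to lie in 0..31, and the names run and signed are hypothetical.

```python
MASK = 0xFFFFFFFF


def signed(v):
    """Interpret a 32-bit stack cell as a signed integer."""
    v &= MASK
    return v - (1 << 32) if v & 0x80000000 else v


def run(program):
    """Execute predecoded (opcode, parameter) pairs; return the final opstack."""
    stack, pc = [], 0
    while pc < len(program):
        op, parm = program[pc]
        pc += 1
        if op == 8:                          # CONST: TOS <- $PARM
            stack.append(parm & MASK)
        elif op == 7:                        # POP: discard TOS
            stack.pop()
        elif op == 10:                       # JUMP: code address <- TOS
            pc = stack.pop()
        elif op == 35:                       # SEX8: sign-extend the low 8 bits
            v = stack.pop() & 0xFF
            stack.append((v - 0x100 if v & 0x80 else v) & MASK)
        elif op in (13, 38, 39, 51, 52):
            tos = stack.pop()
            nis = stack.pop()
            if op == 13:                     # LTI: jump to $PARM if NIS < TOS
                if signed(nis) < signed(tos):
                    pc = parm
            elif op == 38:                   # ADD: TOS <- NIS + TOS
                stack.append((nis + tos) & MASK)
            elif op == 39:                   # SUB: TOS <- NIS - TOS
                stack.append((nis - tos) & MASK)
            elif op == 51:                   # RSHI: sign-propagating shift
                stack.append((signed(nis) >> tos) & MASK)
            else:                            # RSHU: zero-filling shift
                stack.append(nis >> tos)
        else:
            raise ValueError("opcode not handled in this sketch: %d" % op)
    return stack


# if (3 < 5) push 200 else push 100
branch = [(8, 3), (8, 5), (13, 6), (8, 100), (8, 7), (10, None), (8, 200)]
print(run(branch))                                                    # [200]
print([hex(v) for v in run([(8, 0xFFFFFFF0), (8, 4), (51, None)])])   # ['0xffffffff']
print([hex(v) for v in run([(8, 0xFFFFFFF0), (8, 4), (52, None)])])   # ['0xfffffff']
```

The last two lines show the only difference between RSHI and RSHU: whether the vacated high bits are filled with copies of the sign bit or with zeroes.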

QVM File Format

QVM file format. Multi-octet words are little-endian (LE).

QVM header
offset  size  description
0       4     magic (0x12721444 LE, 0x44147212 BE)
4       4     instruction count
8       4     length of CODE segment
12      4     offset of CODE segment
16      4     length of DATA segment
20      4     offset of DATA segment
24      4     length of LIT segment
28      4     length of BSS segment
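Reading the header amounts to unpacking eight little-endian 32-bit words and checking the magic. The sketch below follows the field order exactly as given in the header table of this document; that order is worth checking against the engine's own header definition before relying on it. The names parse_header and FIELDS are hypothetical, and the sample header is synthetic.

```python
import struct

QVM_MAGIC = 0x12721444
# Field order as listed in this document's header table (an assumption here).
FIELDS = ("magic", "instruction_count", "code_length", "code_offset",
          "data_length", "data_offset", "lit_length", "bss_length")


def parse_header(blob):
    """Decode the 32-octet QVM header into a dict keyed by FIELDS."""
    if len(blob) < 32:
        raise ValueError("file too short for a QVM header")
    header = dict(zip(FIELDS, struct.unpack_from("<8I", blob, 0)))
    if header["magic"] != QVM_MAGIC:
        raise ValueError("bad magic: 0x%08x" % header["magic"])
    return header


# Synthetic header for illustration only (not taken from a real .qvm file).
sample = struct.pack("<8I", QVM_MAGIC, 4, 12, 32, 8, 44, 4, 16)
hdr = parse_header(sample)
print(sample[:4].hex())            # 44147212
print(hdr["instruction_count"])    # 4
```

The first print shows why the table lists two magic values: the same four octets read as 0x12721444 in little-endian order and 0x44147212 in big-endian order.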

Following the header is the code segment, aligned on a 4-byte boundary. The code segment can be processed octet-by-octet to identify instructions. For bytecodes that take a multi-octet parameter, the parameter is stored in LE order. Big-endian machines may want to byte-swap the parameter for native-arithmetic purposes. The instruction count field in the header identifies when the sequence of instructions ends. Comparisons of code length, instruction counts, and file positions can be used to verify proper processing.

The data segment follows the code segment. The assembler (qvm generator) needs to ensure the data segment begins on a 4-octet boundary (by padding with zeroes). The data segment consists of a series of 4-octet LE values that are copied into VM memory. For big-endian machines, each word may be optionally byte-swapped (for native arithmetic) on copy.

The lit segment immediately follows the data segment. This segment should already be aligned on a 4-octet boundary. The lit segment consists of a series of 1-octet values, copied verbatim to VM memory. No byte-swapping is needed.

The bss segment is not stored in the qvm file, since this segment consists of uninitialized values (i.e. values that don't need to be stored). The VM must ensure there is still sufficient memory to accommodate the bss segment. The QVM header specifies the size of the bss segment. The VM may optionally initialize the bss portion of memory with zeroes.
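Putting the three data-side segments together, a loader might build the VM's data memory roughly as follows. This is a sketch under assumptions the text does not state outright: the segments are placed consecutively from data address 0 in the order DATA, LIT, BSS, and the bss portion is zero-filled. The function name build_data_memory and the sample blob are hypothetical.

```python
import struct


def build_data_memory(blob, data_offset, data_length, lit_length, bss_length):
    """Lay out DATA, LIT and BSS consecutively in a fresh VM data memory."""
    lit_offset = data_offset + data_length       # LIT immediately follows DATA
    data = blob[data_offset:lit_offset]
    lit = blob[lit_offset:lit_offset + lit_length]
    if len(data) != data_length or len(lit) != lit_length:
        raise ValueError("segment runs past end of file")
    memory = bytearray(data_length + lit_length + bss_length)   # BSS stays zero
    # DATA holds 4-octet LE words. They are kept in LE form here; a big-endian
    # host wanting native words could byte-swap each one at this point.
    memory[:data_length] = data
    memory[data_length:data_length + lit_length] = lit          # copied verbatim
    return memory


data_words = struct.pack("<2I", 1, 0xDEADBEEF)
lit_bytes = b"hi\x00\x00"
blob = b"\x00" * 32 + data_words + lit_bytes    # 32 placeholder octets before DATA
memory = build_data_memory(blob, 32, 8, 4, 16)
print(len(memory))                               # 28
print(hex(struct.unpack_from("<I", memory, 4)[0]))   # 0xdeadbeef
```

Only DATA and LIT consume file space; the bss length contributes to the size of the allocation but not to the size of the file.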

Updated 2003.02.23
Updated 2011.07.11 - change of contact e-mail

PhaethonH < PhaethonH@gmail.com >