A description of the Q3VM architecture and operations.
The Q3VM is a virtual machine used by Quake III to run its game modules (game, cgame, and ui). The VM acts as a sandbox that limits the damage a rogue or malicious QVM program can wreak. Though not perfect, it is much safer than giving a module full access to native machine code, which would more easily allow the spread of viruses or the corruption of system resources. In addition, Q3VM programs, like Java programs on the JVM, are write-once, run-on-many-platforms (at least on platforms for which a Q3VM interpreter or compiler exists).
The instruction set for the Q3VM is derived from the bytecode interpreter target of LCC, with minor differences. The VM is a stack-based machine and relies heavily on stack operations.
Memory: little-endian, with a flat, zero-based, unsigned 32-bit address space that is addressable down to the single octet.
Memory is accessible in words of 4 octets (32 bits), 2 octets (16 bits), and single octets (8 bits).
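A minimal sketch of octet-addressed, little-endian access, assuming the VM's data memory is held in a plain byte array. The helpers below (hypothetical names) read 1-, 2-, and 4-octet values and store a 4-octet value in a way that does not depend on the host's own byte order. They deliberately omit the address masking or bounds checking that a sandboxed VM would need.

```c
#include <stdint.h>

/* mem is the VM data image; addr is a zero-based data address. */

/* Single octet, as LOAD1 would fetch it. */
uint32_t vm_read1(const uint8_t *mem, uint32_t addr)
{
    return mem[addr];
}

/* 2-octet little-endian value: lowest octet at the lowest address. */
uint32_t vm_read2(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

/* 4-octet little-endian value, octets in increasing significance. */
uint32_t vm_read4(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* Matching 4-octet little-endian store. */
void vm_write4(uint8_t *mem, uint32_t addr, uint32_t value)
{
    mem[addr]     = (uint8_t)(value & 0xFF);
    mem[addr + 1] = (uint8_t)((value >> 8) & 0xFF);
    mem[addr + 2] = (uint8_t)((value >> 16) & 0xFF);
    mem[addr + 3] = (uint8_t)(value >> 24);
}
```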
Code and data occupy distinct address spaces. Thus code address 0 is not the same as data address 0. This separation also precludes self-modifying code.
Q3VM is CISC-like in the sense that instructions do not all occupy the same amount of space: some opcodes take a parameter embedded in the code segment. As a result, the number of instructions cannot be computed from the length of the code segment alone, which is why the QVM file header records both values.
Jump addresses are specified in terms of instructions, rather than bytes. Thus, when loading the code segment, the VM must scan it to find the byte offset at which each instruction begins.
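The load-time scan can be sketched as follows. This is an illustration, not id's loader: `param_size` and `build_instruction_offsets` are hypothetical names, the opcode numbers come from the opcode table in this document, and the parameter sizes follow id's `vm_interpreted.c`, which also gives BLOCK_COPY a 4-octet parameter.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode numbers from the opcode table in this document. */
enum {
    OP_ENTER = 3, OP_LEAVE = 4, OP_CONST = 8, OP_LOCAL = 9,
    OP_EQ = 11, OP_GEF = 26,      /* conditional branches are EQ..GEF */
    OP_ARG = 33, OP_BLOCK_COPY = 34,
    OP_CVFI = 59                  /* highest defined opcode */
};

/* Octets of parameter following an opcode, or -1 for an unknown opcode. */
int param_size(uint8_t op)
{
    if (op > OP_CVFI)
        return -1;
    if (op == OP_ARG)
        return 1;
    if (op == OP_ENTER || op == OP_LEAVE || op == OP_CONST || op == OP_LOCAL ||
        (op >= OP_EQ && op <= OP_GEF) || op == OP_BLOCK_COPY)
        return 4;
    return 0;
}

/*
 * Record in offsets[i] the byte offset at which instruction i starts.
 * Returns 0 on success, or -1 if the code segment ends early or holds
 * an unknown opcode.
 */
int build_instruction_offsets(const uint8_t *code, size_t code_len,
                              uint32_t insn_count, uint32_t *offsets)
{
    size_t pc = 0;  /* byte position in the code segment */
    for (uint32_t i = 0; i < insn_count; i++) {
        if (pc >= code_len)
            return -1;
        int psize = param_size(code[pc]);
        if (psize < 0 || pc + 1 + (size_t)psize > code_len)
            return -1;
        offsets[i] = (uint32_t)pc;
        pc += 1 + (size_t)psize;  /* skip opcode octet plus parameter */
    }
    return 0;
}
```

A branch or JUMP target given as instruction index i then corresponds to byte offset offsets[i].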
While the Q3VM is currently implemented only in emulation, it is still convenient to speak of the VM's CPU.
These are guesses derived from modern machines, other fictional machines, and Forth.
List of opcodes recognized by the Q3VM. Every opcode occupies one octet and is followed by 0, 1, or 4 additional octets of parameter. All multi-octet values are treated as little-endian (a legacy of id Software's MS-DOS days).
Opcode mnemonics ("names") follow a pattern; intriguingly, the numeric opcode values do not, unlike on some other architectures. A mnemonic is a short operation name (for example ADD, CNST, or INDIR), then a letter indicating a data type, then a digit indicating data size (usually the number of required octets). For example, ADDI4 adds two 4-octet signed integers.
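As a sketch of the naming pattern, the hypothetical `split_mnemonic` below splits an lcc-style mnemonic into its operation name, type letter, and optional size digits. It assumes the type letter is a single uppercase letter immediately before the size digits (or at the very end when there is no size, as in ASGNB), and it does not check the letter against lcc's actual set of type codes.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result of splitting a mnemonic such as "ADDI4". */
typedef struct {
    char name[16];  /* operation name, e.g. "ADD" */
    char type;      /* data type letter, e.g. 'I' */
    int size;       /* data size from trailing digits, 0 if absent */
} Mnemonic;

/* Returns 1 on success, 0 if the string does not fit the pattern. */
int split_mnemonic(const char *s, Mnemonic *out)
{
    size_t end = strlen(s);
    int size = 0, scale = 1;

    /* Trailing digits, if any, give the data size. */
    while (end > 0 && isdigit((unsigned char)s[end - 1])) {
        size += (s[end - 1] - '0') * scale;
        scale *= 10;
        end--;
    }
    /* Need at least one name letter plus the type letter. */
    if (end < 2 || !isupper((unsigned char)s[end - 1]))
        return 0;
    if (end - 1 >= sizeof out->name)  /* leave room for the terminator */
        return 0;

    out->type = s[end - 1];
    memcpy(out->name, s, end - 1);
    out->name[end - 1] = '\0';
    out->size = size;
    return 1;
}
```

For instance, "CVFI4" splits into name CVF, type I, size 4, and "ADDRLP4" into ADDRL, P, 4.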
Data type and size codes:
Generalized opcodes list (lcc bytecode):
List of assembly directives:
Specific opcodes (Q3VM bytecode). Notation: TOS = Top Of Stack; NIS = Next In Stack (the value just below TOS); $PARM = the parameter embedded in the code segment after the opcode.
Byte | Opcode | Parameter size (octets) | Description |
---|---|---|---|
0 | UNDEF | - | undefined opcode. |
1 | IGNORE | - | no-operation (nop) instruction. |
2 | BREAK | - | ?? |
3 | ENTER | 4 | Begin procedure body: reserve $PARM octets of stack for the frame (always at least 8, i.e. 2 words). The frame holds all local storage and variables, plus argument space for any calls made within this procedure. |
4 | LEAVE | 4 | End procedure body, $PARM is same as that of the matching ENTER. |
5 | CALL | - | make call to procedure (code address <- TOS). |
6 | PUSH | - | push nonsense (void) value to opstack (TOS <- 0). |
7 | POP | - | pop a value from stack (remove TOS, decrease stack by 1). |
8 | CONST | 4 | push literal value onto stack (TOS <- $PARM) |
9 | LOCAL | 4 | get address of local storage (local variable or argument) (TOS <- (frame + $PARM)). |
10 | JUMP | - | branch (code address <- TOS) |
11 | EQ | 4 | check equality (signed integer) (NIS vs TOS, jump to $PARM if true). |
12 | NE | 4 | check inequality (signed integer) (NIS vs TOS, jump to $PARM if true). |
13 | LTI | 4 | check less-than (signed integer) (NIS vs TOS, jump to $PARM if true). |
14 | LEI | 4 | check less-than or equal-to (signed integer) (NIS vs TOS, jump to $PARM if true). |
15 | GTI | 4 | check greater-than (signed integer) (NIS vs TOS, jump to $PARM if true). |
16 | GEI | 4 | check greater-than or equal-to (signed integer) (NIS vs TOS, jump to $PARM if true). |
17 | LTU | 4 | check less-than (unsigned integer) (NIS vs TOS, jump to $PARM if true). |
18 | LEU | 4 | check less-than or equal-to (unsigned integer) (NIS vs TOS, jump to $PARM if true). |
19 | GTU | 4 | check greater-than (unsigned integer) (NIS vs TOS, jump to $PARM if true). |
20 | GEU | 4 | check greater-than or equal-to (unsigned integer) (NIS vs TOS, jump to $PARM if true). |
21 | EQF | 4 | check equality (float) (NIS vs TOS, jump to $PARM if true). |
22 | NEF | 4 | check inequality (float) (NIS vs TOS, jump to $PARM if true). |
23 | LTF | 4 | check less-than (float) (NIS vs TOS, jump to $PARM if true). |
24 | LEF | 4 | check less-than or equal-to (float) (NIS vs TOS, jump to $PARM if true). |
25 | GTF | 4 | check greater-than (float) (NIS vs TOS, jump to $PARM if true). |
26 | GEF | 4 | check greater-than or equal-to (float) (NIS vs TOS, jump to $PARM if true). |
27 | LOAD1 | - | Load 1-octet value from address in TOS (TOS <- [TOS]) |
28 | LOAD2 | - | Load 2-octet value from address in TOS (TOS <- [TOS]) |
29 | LOAD4 | - | Load 4-octet value from address in TOS (TOS <- [TOS]) |
30 | STORE1 | - | lowest octet of TOS is 1-octet value to store, destination address in next-in-stack ([NIS] <- TOS). |
31 | STORE2 | - | lowest two octets of TOS are the 2-octet value to store, destination address in next-in-stack ([NIS] <- TOS). |
32 | STORE4 | - | TOS is 4-octet value to store, destination address in next-in-stack ([NIS] <- TOS). |
33 | ARG | 1 | TOS is 4-octet value to store into arguments-marshalling space of the indicated octet offset (ARGS[offset] <- TOS). |
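34 | BLOCK_COPY | 4 | Copy a block of $PARM octets from the address in TOS to the address in NIS ([NIS] <- [TOS], $PARM octets). |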
35 | SEX8 | - | Sign-extend 8-bit (TOS <- low 8 bits of TOS, sign-extended to 32 bits). |
36 | SEX16 | - | Sign-extend 16-bit (TOS <- low 16 bits of TOS, sign-extended to 32 bits). |
37 | NEGI | - | Negate signed integer (TOS <- -TOS). |
38 | ADD | - | Add integer-wise (TOS <- NIS + TOS). |
39 | SUB | - | Subtract integer-wise (TOS <- NIS - TOS). |
40 | DIVI | - | Divide (signed integer) (TOS <- NIS / TOS). |
41 | DIVU | - | Divide unsigned integer (TOS <- NIS / TOS). |
42 | MODI | - | Modulo (signed integer) (TOS <- NIS mod TOS). |
43 | MODU | - | Modulo (unsigned integer) (TOS <- NIS mod TOS). |
44 | MULI | - | Multiply (signed integer) (TOS <- NIS * TOS). |
45 | MULU | - | Multiply (unsigned integer) (TOS <- NIS * TOS). |
46 | BAND | - | Bitwise AND (TOS <- NIS & TOS). |
47 | BOR | - | Bitwise OR (TOS <- NIS \| TOS). |
48 | BXOR | - | Bitwise XOR (TOS <- NIS ^ TOS). |
49 | BCOM | - | Bitwise complement (TOS <- ~TOS). |
50 | LSH | - | Bitwise left-shift (TOS <- NIS << TOS). |
51 | RSHI | - | Algebraic (signed) right-shift (TOS <- NIS >> TOS). |
52 | RSHU | - | Bitwise (unsigned) right-shift (TOS <- NIS >> TOS). |
53 | NEGF | - | Negate float value (TOS <- -TOS). |
54 | ADDF | - | Add floats (TOS <- NIS + TOS). |
55 | SUBF | - | Subtract floats (TOS <- NIS - TOS). |
56 | DIVF | - | Divide floats (TOS <- NIS / TOS). |
57 | MULF | - | Multiply floats (TOS <- NIS * TOS). |
58 | CVIF | - | Convert signed integer to float (TOS <- (float)TOS). |
59 | CVFI | - | Convert float to signed integer (TOS <- (int)TOS). |
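To make the operand order concrete (NIS is the left operand, TOS the right), here is a sketch of an interpreter loop for a small subset of the opcodes above. It assumes a hypothetical pre-decoded `Insn` array whose branch targets are instruction indices, and it simply halts when the program counter runs off the end. It is not id's interpreter and leaves out ENTER, LEAVE, CALL, and memory access entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode numbers from the table above (only the subset this sketch runs). */
enum {
    OP_POP = 7, OP_CONST = 8, OP_JUMP = 10, OP_EQ = 11, OP_LTI = 13,
    OP_NEGI = 37, OP_ADD = 38, OP_SUB = 39, OP_MULI = 44
};

/* Hypothetical pre-decoded instruction: opcode plus its $PARM (0 if none). */
typedef struct {
    uint8_t op;
    int32_t param;  /* branch targets are instruction indices */
} Insn;

#define STACK_MAX 64

/*
 * Execute prog[0..count). Halts when pc reaches count (a simplification;
 * real procedures return via LEAVE). Stores the final TOS in *result and
 * returns 0, or returns -1 on stack under/overflow, a bad target, or an
 * opcode outside the subset.
 */
int run_subset(const Insn *prog, size_t count, int32_t *result)
{
    int32_t stack[STACK_MAX];
    size_t sp = 0;  /* number of values on the operand stack */
    size_t pc = 0;  /* index of the next instruction */

    while (pc < count) {
        const Insn *in = &prog[pc++];

        switch (in->op) {
        case OP_CONST:                          /* TOS <- $PARM */
            if (sp == STACK_MAX) return -1;
            stack[sp++] = in->param;
            break;
        case OP_POP:                            /* discard TOS */
            if (sp < 1) return -1;
            sp--;
            break;
        case OP_NEGI:                           /* TOS <- -TOS (wraps) */
            if (sp < 1) return -1;
            stack[sp - 1] = (int32_t)(0u - (uint32_t)stack[sp - 1]);
            break;
        case OP_JUMP: {                         /* code address <- TOS */
            if (sp < 1) return -1;
            int32_t target = stack[--sp];
            if (target < 0 || (size_t)target > count) return -1;
            pc = (size_t)target;
            break;
        }
        case OP_ADD: case OP_SUB: case OP_MULI: {  /* TOS <- NIS op TOS */
            if (sp < 2) return -1;
            uint32_t b = (uint32_t)stack[--sp];    /* TOS */
            uint32_t a = (uint32_t)stack[sp - 1];  /* NIS, overwritten */
            uint32_t r = in->op == OP_ADD ? a + b
                       : in->op == OP_SUB ? a - b
                       : a * b;
            stack[sp - 1] = (int32_t)r;  /* unsigned math wraps mod 2^32 */
            break;
        }
        case OP_EQ: case OP_LTI: {              /* NIS vs TOS, both popped */
            if (sp < 2) return -1;
            int32_t tos = stack[--sp];
            int32_t nis = stack[--sp];
            int taken = in->op == OP_EQ ? nis == tos : nis < tos;
            if (taken) {
                if (in->param < 0 || (size_t)in->param > count) return -1;
                pc = (size_t)in->param;
            }
            break;
        }
        default:
            return -1;                          /* not in this sketch */
        }
    }
    if (sp < 1) return -1;
    *result = stack[sp - 1];
    return 0;
}
```

For example, CONST 7, CONST 3, SUB leaves 4 on the stack, and CONST 2, CONST 5, LTI n jumps to instruction n because 2 < 5.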
QVM file format. Multi-octet words are little-endian (LE).
QVM header:
Offset | Size | Description |
---|---|---|
0 | 4 | magic: 0x12721444 when read as LE (0x44147212 if read as BE) |
4 | 4 | instruction count |
8 | 4 | offset of CODE segment |
12 | 4 | length of CODE segment |
16 | 4 | offset of DATA segment |
20 | 4 | length of DATA segment |
24 | 4 | length of LIT segment |
28 | 4 | length of BSS segment |
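A minimal header reader, sketched under the assumption that the whole file is already in memory. The field order follows `vmHeader_t` in id's `qfiles.h`, which stores each segment's offset before its length; the struct and function names here are hypothetical. Each field is assembled from little-endian octets, so the code behaves the same on big- and little-endian hosts.

```c
#include <stddef.h>
#include <stdint.h>

#define QVM_MAGIC 0x12721444u

/* Field order as in vmHeader_t (qfiles.h). */
typedef struct {
    uint32_t magic;
    uint32_t instruction_count;
    uint32_t code_offset;
    uint32_t code_length;
    uint32_t data_offset;
    uint32_t data_length;
    uint32_t lit_length;
    uint32_t bss_length;
} QvmHeader;

/* Assemble a 32-bit value from 4 little-endian octets. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse and sanity-check the 32-octet header. Returns 0 on success. */
int parse_qvm_header(const uint8_t *file, size_t file_len, QvmHeader *h)
{
    if (file_len < 32)
        return -1;
    h->magic             = le32(file + 0);
    h->instruction_count = le32(file + 4);
    h->code_offset       = le32(file + 8);
    h->code_length       = le32(file + 12);
    h->data_offset       = le32(file + 16);
    h->data_length       = le32(file + 20);
    h->lit_length        = le32(file + 24);
    h->bss_length        = le32(file + 28);

    if (h->magic != QVM_MAGIC)
        return -1;
    /* CODE, DATA and LIT must lie inside the file (64-bit sums avoid overflow). */
    if ((uint64_t)h->code_offset + h->code_length > file_len)
        return -1;
    if ((uint64_t)h->data_offset + h->data_length + h->lit_length > file_len)
        return -1;
    return 0;
}
```

BSS is only a size, so it is not checked against the file length.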
Following the header is the code segment, aligned on a 4-byte boundary. The code segment can be processed octet-by-octet to identify instructions. For bytecodes that take a multi-octet parameter, the parameter is stored in LE order; big-endian machines may want to byte-swap it for native arithmetic. The instruction count field in the header indicates where the sequence of instructions ends. Comparing the code length, the instruction count, and the file position reached after decoding can be used to verify proper processing.
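The decode-and-verify pass might look like the sketch below (hypothetical names). Parameter sizes follow the opcode table, with BLOCK_COPY taking 4 octets as in id's `vm_interpreted.c`; each parameter is assembled from little-endian octets, so no separate byte-swap step is needed. Accepting up to 3 trailing octets as alignment padding is an assumption, not something this document specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Parameter octets per opcode, or -1 for an unknown opcode. */
static int param_size(uint8_t op)
{
    if (op > 59) return -1;                     /* 59 = CVFI, highest opcode */
    if (op == 33) return 1;                     /* ARG */
    if (op == 3 || op == 4 || op == 8 || op == 9 ||  /* ENTER LEAVE CONST LOCAL */
        (op >= 11 && op <= 26) ||               /* branches EQ..GEF */
        op == 34)                               /* BLOCK_COPY */
        return 4;
    return 0;
}

/* Hypothetical decoded form: opcode plus its parameter (0 if none). */
typedef struct {
    uint8_t op;
    int32_t param;
} Insn;

/*
 * Decode insn_count instructions into out[]. Returns 0 if the count and
 * the code length agree (allowing up to 3 padding octets), else -1.
 */
int decode_code(const uint8_t *code, size_t code_len,
                uint32_t insn_count, Insn *out)
{
    size_t pc = 0;
    for (uint32_t i = 0; i < insn_count; i++) {
        if (pc >= code_len) return -1;          /* ran out before the count */
        uint8_t op = code[pc++];
        int psize = param_size(op);
        if (psize < 0 || pc + (size_t)psize > code_len) return -1;

        uint32_t param = 0;
        for (int k = 0; k < psize; k++)         /* LE: lowest octet first */
            param |= (uint32_t)code[pc + (size_t)k] << (8 * k);
        pc += (size_t)psize;

        out[i].op = op;
        out[i].param = (int32_t)param;
    }
    /* Assumption: fewer than 4 leftover octets are alignment padding. */
    return code_len - pc < 4 ? 0 : -1;
}
```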
The data segment follows the code segment. The assembler (qvm generator) needs to ensure the data segment begins on a 4-octet boundary (by padding with zeroes). The data segment consists of a series of 4-octet LE values that are copied into VM memory. For big-endian machines, each word may be optionally byte-swapped (for native arithmetic) on copy.
The lit segment immediately follows the data segment. This segment should already be aligned on a 4-octet boundary. The lit segment consists of a series of 1-octet values, copied verbatim to VM memory. No byte-swapping is needed.
The bss segment is not stored in the qvm file, since it consists of uninitialized values (i.e. values that don't need to be stored). The VM must still reserve enough memory to accommodate the bss segment, whose size is given in the QVM header. The VM should fill the bss portion of memory with zeroes, because compiled C code expects globals and statics without an explicit initializer to start at zero.
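Putting the three data-side segments together, a loader might build the VM's data image as in this sketch: DATA words (optionally converted to host order), then LIT octets copied verbatim, then a zero-filled BSS area. The function name and parameters are hypothetical, and checking that DATA and LIT actually lie inside the file is left to the caller.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build the data image as DATA, then LIT, then BSS. `file` holds the whole
 * .qvm and data_offset comes from the header. Returns a malloc'd buffer of
 * data_len + lit_len + bss_len octets (caller frees), or NULL on failure.
 */
uint8_t *build_data_image(const uint8_t *file, uint32_t data_offset,
                          uint32_t data_len, uint32_t lit_len,
                          uint32_t bss_len, int swap_to_native)
{
    if (data_len % 4 != 0)            /* DATA is a series of 4-octet words */
        return NULL;

    size_t total = (size_t)data_len + lit_len + bss_len;
    uint8_t *mem = calloc(total ? total : 1, 1);  /* zero-filled: BSS starts at 0 */
    if (mem == NULL)
        return NULL;

    /* LIT immediately follows DATA in the file; LIT is copied verbatim. */
    memcpy(mem, file + data_offset, (size_t)data_len + lit_len);

    /* Optionally turn each LE DATA word into a host-order word. */
    if (swap_to_native) {
        for (uint32_t i = 0; i < data_len; i += 4) {
            const uint8_t *p = file + data_offset + i;
            uint32_t w = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
            memcpy(mem + i, &w, sizeof w);  /* stored in host byte order */
        }
    }
    return mem;
}
```

On a little-endian host the optional conversion leaves the DATA octets unchanged; it only matters on big-endian hosts that want native-order words.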
Updated 2003.02.23
Updated 2011.07.11 - change of contact e-mail