


DOOM3 Source Code Analysis



Table of Contents

DOOM3 SOURCE CODE REVIEW: INTRODUCTION (PART 1 OF 6)

From notes to articles...
Background
First contact
Architecture
The Code
Unrolling the loop
Renderer
Profiling
Scripting and Virtual Machine
Interviews
Recommended readings
One more thing

DOOM3 SOURCE CODE REVIEW: DMAP (PART 2 OF 6)

The editor
Code overview
0. Loading the level geometry
1. MakeStructuralBspFaceList & FaceBSP
2. MakeTreePortals
3. FilterBrushesIntoTree
4. FloodEntities & FillOutside
5. ClipSidesByTree
6. FloodAreas
7. PutPrimitivesInAreas
8. Prelight
9. FixGlobalTjunctions
10. Write output
History
Recommended readings

DOOM3 SOURCE CODE REVIEW: RENDERER (PART 3 OF 6)

Architecture
Frontend/Backend/GPU collaboration
Doom3 Renderer Frontend
Doom3 Renderer Backend
Interactive surfaces
So much more....
Recommended readings

DOOM3 SOURCE CODE REVIEW: PROFILING (PART 4 OF 6)

Overview
Main Thread
Game Logic
Renderer
Renderer: Frontend
Renderer: Backend
Flat stats

DOOM3 SOURCE CODE REVIEW: SCRIPTING VM (PART 5 OF 6)

Architecture
Compiler
Interpreter
Recommended readings

DOOM3 SOURCE CODE REVIEW: INTERVIEWS (PART 6 OF 6)

1996-2007: All plans and interviews from John Carmack
2012 Q&A with John Carmack
2004 (October): Interview for "The making of Doom3" book
2004 Quakecon keynote
John Carmack: 2003 at id Software studio
2001 First video at MacWorld

DOOM3 SOURCE CODE REVIEW: INTRODUCTION

(PART 1 OF 6)

JUNE 8, 2012

On November 23, 2011 id Software maintained the tradition and released the source code of their previous engine. This time it was the turn of idTech4, which powered Prey, Quake 4 and of course Doom 3. Within hours the GitHub repository was forked more than 400 times and people started to look at the game's internal mechanisms and port the engine to other platforms. I also jumped on it and promptly completed the Mac OS X Intel version, which John Carmack kindly advertised.

In terms of clarity and comments this is the best code release from id Software after the Doom iPhone codebase (which is more recent and hence better commented). I highly recommend that everybody read, build and experiment with it.

Here are my notes regarding what I understood. As usual I have cleaned them up: I hope it will save someone a few hours and I also hope it will motivate some of us to read more code and become better programmers.

Part 1: Overview

Part 2: Dmap 

Part 3: Renderer

Part 4: Profiling 

Part 5: Scripting

Part 6: Interviews (including Q&A with John Carmack) 

From notes to articles...

I have noticed that I am using more and more drawings and less and less text in order to explain codebases. So far I have used Gliffy to draw but this tool has some frustrating limitations (such as the lack of an alpha channel). I am thinking of authoring a tool specialized in drawings for 3D engines using SVG and JavaScript. I wonder if something like this already exists? Anyway, back to the code...

Background

Getting our hands on the source code of such a groundbreaking engine is exciting. Upon release in 2004 Doom III set new visual and audio standards for real-time engines, the most notable being "Unified Lighting and Shadows". For the first time the technology allowed artists to express themselves on a Hollywood scale. Even 8 years later the first encounter with the HellKnight in Delta-Labs-4 still looks insanely great:

First contact

The source code is now distributed via Github which is a good thing since the FTP server from id Software was almost always down or overloaded.

[pic]

The original release from TTimo compiles well with Visual Studio 2010 Professional. Unfortunately Visual Studio 2010 "Express" lacks MFC and hence cannot be used. This was disappointing upon release but some people have since removed the dependencies.

Windows 7 :
===========

git clone

[pic]

For code reading and exploring I prefer to use XCode 4.0 on Mac OS X: the search speed from Spotlight, the variable highlights and the "Command-Click" to reach a definition make the experience superior to Visual Studio. The XCode project was broken upon release but it was easy to fix with a few steps and there is now a GitHub repository by "bad sector" which works well on Mac OS X Lion.

MacOS X :
=========

git clone

Notes : It seems "variable highlights" and "Control-Click" are also available in Visual Studio 2010 after installing the Visual Studio 2010 Productivity Power Tools. I cannot understand why this is not part of the vanilla install.

Both codebases are now in the best state possible : One click away from an executable !

← Download the code.

← Hit F8 / Command-B.

← Run !

Trivia : In order to run the game you will need the base folder containing the Doom 3 assets. Since I did not want to waste time extracting them from the Doom 3 CDs and updating them, I downloaded the Steam version. It seems the id Software team did the same since the released Visual Studio project still contains "+set fs_basepath C:\Program Files (x86)\Steam\steamapps\common\doom 3" in the debug settings!

Trivia : The engine was developed with Visual Studio .NET (source). But the code does not feature a single line of C# and the version released requires Visual Studio 2010 Professional in order to compile.

Trivia : The id Software team seems to be fans of the Matrix franchise: Quake III's working title was "Trinity" and Doom III's working title was "Neo". This explains why you will find all of the source code in the neo subfolder.

Architecture

The solution is divided in projects that reflect the overall architecture of the engine:

|Projects   |Builds (Windows)|Builds (Mac OS X)|Observations |
|Game       |gamex86.dll     |gamex86.so       |Doom3 gameplay |
|Game-d3xp  |gamex86.dll     |gamex86.so       |Doom3 eXPansion (Resurrection of Evil) gameplay |
|MayaImport |MayaImport.dll  |-                |Part of the asset creation toolchain: loaded at runtime in order to open Maya files and import monsters, camera paths and maps. |
|Doom3      |Doom3.exe       |Doom3.app        |Doom 3 engine |
|TypeInfo   |TypeInfo.exe    |-                |In-house RTTI helper: generates GameTypeInfo.h, a map of all the Doom3 class types with each member's size. This allows memory debugging via the TypeInfo class. |
|CurlLib    |CurlLib.lib     |-                |HTTP client used to download files (statically linked into gamex86.dll and doom3.exe). |
|idLib      |idLib.lib       |idLib.a          |id Software library. Includes parser, lexer, dictionary... (statically linked into gamex86.dll and doom3.exe). |

Like every engine since idTech2 we find one closed-source binary (Doom3.exe) and one open-source dynamic library (gamex86.dll):

[pic] Most of the codebase has been accessible since October 2004 via the Doom3 SDK: Only the Doom3 executable source code was missing. Modders were able to build idlib.a and gamex86.dll but the core of the engine was still closed source.

Note : The engine does not use the Standard C++ Library: all containers (map, linked list...) are re-implemented, but libc is extensively used.

Note : In the Game module each class extends idClass. This allows the engine to perform in-house RTTI and also instantiate classes by classname.

Trivia : If you look at the drawing you will see that a few essential frameworks (such as the filesystem) are in the Doom3.exe project. This is a problem since gamex86.dll needs to load assets as well. Those subsystems are dynamically obtained by gamex86.dll from doom3.exe (this is what the arrow materializes in the drawing). If we use a PE explorer on the DLL we can see that gamex86.dll exports one method: GetGameAPI:

[pic]

Things work exactly the way Quake2 loaded the renderer and the game DLLs: by exchanging object pointers.

When Doom3.exe starts up it:

← Loads the DLL into its process memory space via LoadLibrary.

← Gets the address of GetGameAPI in the DLL using win32's GetProcAddress.

← Calls GetGameAPI.

gameExport_t * GetGameAPI( gameImport_t *import );

At the end of the "handshake", Doom3.exe has a pointer to a idGame object and Game.dll has a pointer to a gameImport_t object containing additional references to all missing subsystems such as idFileSystem.

Gamex86's view on Doom 3 executable objects:

typedef struct {

int version; // API version

idSys * sys; // non-portable system services

idCommon * common; // common

idCmdSystem * cmdSystem; // console command system

idCVarSystem * cvarSystem; // console variable system

idFileSystem * fileSystem; // file system

idNetworkSystem * networkSystem; // network system

idRenderSystem * renderSystem; // render system

idSoundSystem * soundSystem; // sound system

idRenderModelManager * renderModelManager; // render model manager

idUserInterfaceManager * uiManager; // user interface manager

idDeclManager * declManager; // declaration manager

idAASFileManager * AASFileManager; // AAS file manager

idCollisionModelManager * collisionModelManager; // collision model manager

} gameImport_t;

Doom 3's view on Game/Mod objects:

typedef struct

{

int version; // API version

idGame * game; // interface to run the game

idGameEdit * gameEdit; // interface for in-game editing

} gameExport_t;

Notes : A great resource to better understand each subsystem is the Doom3 SDK documentation page: it seems to have been written in 2004 by someone with a deep understanding of the code (so probably a member of the development team).

The Code

Before digging, some stats from cloc:

./cloc-1.56.pl neo

2180 text files.

2002 unique files.

626 files ignored.

v 1.56 T=19.0 s (77.9 files/s, 47576.6 lines/s)

-------------------------------------------------------------------------------

Language files blank comment code

-------------------------------------------------------------------------------

C++ 517 87078 113107 366433

C/C++ Header 617 29833 27176 111105

C 171 11408 15566 53540

Bourne Shell 29 5399 6516 39966

make 43 1196 874 9121

m4 10 1079 232 9025

HTML 55 391 76 4142

Objective C++ 6 709 656 2606

Perl 10 523 411 2380

yacc 1 95 97 912

Python 10 108 182 895

Objective C 1 145 20 768

DOS Batch 5 0 0 61

Teamcenter def 4 3 0 51

Lisp 1 5 20 25

awk 1 2 1 17

-------------------------------------------------------------------------------

SUM: 1481 137974 164934 601047

-------------------------------------------------------------------------------

The number of lines of code is not usually a good metric for anything, but here it can be very helpful to assess the effort required to comprehend the engine. At 601,047 lines of code, the engine is twice as "difficult" to understand as Quake III. A few stats regarding the number of lines of code across id Software engines:

|#Lines of code |Doom |idTech1 |idTech2 |idTech3 |idTech4 |

|Engine |39079 |143855 |135788 |239398 |601032 |

|Tools |341 |11155 |28140 |128417 |- |

|Total |39420 |155010 |163928 |367815 |601032 |

[pic]

Note : The huge increase in idTech3 tools comes from the lcc codebase (the C compiler used to generate QVM bytecode).

Note : No tools are counted for Doom3 since they are integrated into the engine codebase.

From a high level here are a few fun facts:

← For the first time in id Software history the code is C++ instead of C. John Carmack elaborated on this during our Q&A.

← Abstraction and polymorphism are used a lot across the code. But a nice trick avoids the vtable performance hit on some objects.

← All assets are stored in human readable text form. No more binaries. The code is making extensive usage of lexer/parser. John Carmack elaborated on this during our Q&A.

← Templates are used in low level utility classes (mainly idLib) but are never seen in the upper levels so they won't make your eyes bleed the way Google's V8 source code does.

← In terms of code commenting it is the second best codebase from id Software; the only better one is Doom iPhone, probably because it is more recent than Doom3. 30% comments is still outstanding and it is rare to find a project that well commented! In some parts of the code (see the dmap page) there are actually more comments than statements.

← OOP encapsulation makes the code clean and easy to read.

← The days of low-level assembly optimization are gone. A few tricks such as idMath::InvSqrt and spatial locality optimizations are here, but mostly the code just uses the tools when they are available (GPU shaders, OpenGL VBO, SIMD, Altivec, SMP, L2 cache optimizations (R_AddModelSurfaces per-model processing)...).

It is also interesting to take a look at the idTech4 Coding Standard defined by John Carmack (I particularly appreciated the comments about const placement).

Unrolling the loop

Here is the main loop unrolled with the most important parts of the engine: 

// OS Specialized object

idCommonLocal commonLocal;

// Interface pointer (since Init is OS dependent it is an abstract method)

idCommon * common = &commonLocal;

int WINAPI WinMain( HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow )

{

// Min = 201,326,592 Max = 1,073,741,824
Sys_SetPhysicalWorkMemory( 192 << 20, 1024 << 20 );

// Background thread handling sound mixing and input generation
Sys_StartAsyncThread {
while( 1 ){
usleep( 16666 );
common->Async();
// Unlock other thread waiting for inputs
Sys_TriggerEvent( TRIGGER_EVENT_ONE );
// Check if we have been cancelled by the main thread (on shutdown).
pthread_testcancel();
}
}

Sys_ShowConsole

while( 1 ){

// Show or hide the console

Win_Frame();

common->Frame(){

// Game logic

session->Frame()

{

for (int i = 0 ; i < gameTicsToRun ; i++ )

RunGameTic(){

// From this point execution jumps in the GameX86.dll address space.

game->RunFrame( &cmd );

for( ent = activeEntities.Next(); ent != NULL; ent = ent->activeNode.Next() )

// let entities think

ent->GetPhysics()->UpdateTime( time );

}

}

// normal, in-sequence screen update

session->UpdateScreen( false );

{

renderSystem->BeginFrame

// Renderer front-end. Doesn't actually communicate with the GPU !!

idGame::Draw

renderSystem->EndFrame

// Renderer back-end. Issue GPU optimized commands to the GPU.

R_IssueRenderCommands

}

}

}

}

For more details here is the fully unrolled loop that I used as a map while reading the code.

It is a standard main loop for an id Software engine, except for Sys_StartAsyncThread, which indicates that Doom3 is multi-threaded. The goal of this thread is to handle the time-critical functions that the engine doesn't want limited by the frame rate:

← Sound mixing.

← User input generation.

Trivia : idTech4 high-level objects are all abstract classes with virtual methods. This would normally incur a performance hit since each virtual method address would have to be looked up in a vtable before calling it at runtime. But there is a "trick" to avoid that: all objects are instantiated statically as follows:

idCommonLocal commonLocal; // Implementation

idCommon * common = &commonLocal; // Pointer for gamex86.dll

Since a statically allocated object has a known concrete type, the compiler can optimize away the vtable lookups. The interface pointer is used during the handshake so doom3.exe can exchange object references with gamex86.dll.

Trivia : Having read most engines from id Software I find it remarkable that some method names have NEVER changed since the Doom engine: the method responsible for pumping mouse and joystick inputs is still called IN_frame().

Renderer

Two important parts:

← Since Doom3 uses a portal system, the preprocessing tool dmap is a complete departure from the traditional BSP builder. I reviewed it in depth on a dedicated page.

[pic]

← The runtime renderer has a very interesting architecture since it is broken in two parts with a frontend and backend: More on the dedicated page. 

[pic]

Profiling

I used Xcode's Instruments to check where the CPU cycles were going. The results and analysis are here.

Scripting and Virtual Machine

In every idTech product the VM and the scripting language totally changed from the previous version... and they did it again: details are here.

Interviews

While reading the code, several novelties puzzled me so I wrote to John Carmack and he was nice enough to reply with in-depth explanations about:

← C++.

← Renderer broken in two pieces.

← Text-based assets.

← Interpreted bytecode.

I also compiled all videos and press interviews about idTech4. It is all in the interviews page.

Recommended readings

As usual a few books that you may enjoy if you enjoy the code:

[pic][pic]

One more thing

Summer is coming and it was not always easy to focus...

[pic][pic]

...but overall it was a blast to read most of it. Since idTech5 source code will not be released anytime soon (if ever) this leaves me with idTech3 (Quake III) not yet reviewed. Maybe I will write something about it if enough people are interested.

DOOM3 SOURCE CODE REVIEW: DMAP

(PART 2 OF 6)

Like every id Software engine, the maps produced by the design team are heavily preprocessed by a tool in order to increase performance at runtime.

For idTech4 the tool is named dmap and its goal is to read a soup of polyhedra from a .map file, identify areas connected by inter-area-portals and save those in a .proc file.

The goal is to power the runtime portal system of doom3.exe. There is an amazing 1992 paper by Seth Teller, "Visibility Computations in Densely Occluded Polyhedral Environments": it describes well how idTech4 works, with many explanatory drawings.

The editor

Designers produce level maps via CSG (Constructive Solid Geometry): They use polyhedrons that usually have 6 faces and place them on the map.

Those blocks are called brushes and the following drawing shows 8 brushes used (I use the same map to explain each step in dmap).

A designer may have a good idea of what is "inside" (on the left) but dmap receives a brushes soup where nothing is inside and nothing is outside (on the right).

|Designer view |Dmap view as brushes are read from the .map file |
|[pic]         |[pic] |

[pic]

A brush is not defined via its faces but by its planes. It may seem very inefficient to give planes instead of faces but it is very helpful later when checking if two faces are on the same plane. There is no inside or outside since the planes are not oriented "consistently": a plane's orientation can point outside or inside the volume indifferently.

Code overview

Dmap's source code is very well commented, just look at the amount of green: there are more comments than code!

bool ProcessModel( uEntity_t *e, bool floodFill ) {

bspface_t *faces;

// build a bsp tree using all of the sides

// of all of the structural brushes

faces = MakeStructuralBspFaceList ( e->primitives );

e->tree = FaceBSP( faces );

// create portals at every leaf intersection

// to allow flood filling

MakeTreePortals( e->tree );

// classify the leafs as opaque or areaportal

FilterBrushesIntoTree( e );

// see if the bsp is completely enclosed

if ( floodFill && !dmapGlobals.noFlood ) {

if ( FloodEntities( e->tree ) ) {

// set the outside leafs to opaque

FillOutside( e );

} else {

common->Printf ( "**********************\n" );

common->Warning( "******* leaked *******" );

common->Printf ( "**********************\n" );

LeakFile( e->tree );

// bail out here. If someone really wants to

// process a map that leaks, they should use

// -noFlood

return false;

}

}

// get minimum convex hulls for each visible side

// this must be done before creating area portals,

// because the visible hull is used as the portal

ClipSidesByTree( e );

// determine areas before clipping tris into the

// tree, so tris will never cross area boundaries

FloodAreas( e );

// we now have a BSP tree with solid and non-solid leafs marked with areas

// all primitives will now be clipped into this, throwing away

// fragments in the solid areas

PutPrimitivesInAreas( e );

// now build shadow volumes for the lights and split

// the optimize lists by the light beam trees

// so there won't be unneeded overdraw in the static

// case

Prelight( e );

// optimizing is a superset of fixing tjunctions

if ( !dmapGlobals.noOptimize ) {

OptimizeEntity( e );

} else if ( !dmapGlobals.noTJunc ) {

FixEntityTjunctions( e );

}

// now fix t junctions across areas

FixGlobalTjunctions( e );

return true;

}

0. Loading the level geometry

A .map file is a list of entities. The level is the first entity in the file and has a "worldspawn" class. An entity contains a list of primitives that are almost always brushes. The remaining entities are lights, monsters, player spawning locations, weapons, etc.

Version 2

// entity 0

{

"classname" "worldspawn"

// primitive 0

{

brushDef3

{

( 0 0 -1 -272 ) ( ( 0.0078125 0 -8.5 ) ( 0 0.03125 -16 ) ) "textures/base_wall/stelabwafer1" 0 0 0

( 0 0 1 -56 ) ( ( 0.0078125 0 -8.5 ) ( 0 0.03125 16 ) ) "textures/base_wall/stelabwafer1" 0 0 0

( 0 -1 0 -3776) ( ( 0.0078125 0 4 ) ( 0 0.03125 0 ) ) "textures/base_wall/stelabwafer1" 0 0 0

( -1 0 0 192 ) ( ( 0.0078125 0 8.5 ) ( 0 0.03125 0 ) ) "textures/base_wall/stelabwafer1" 0 0 0

( 0 1 0 3712 ) ( ( 0.006944 0 4.7 ) ( 0 0.034 1.90) ) "textures/base_wall/stelabwafer1" 0 0 0

( 1 0 0 -560 ) ( ( 0.0078125 0 -4 ) ( 0 0.03125 0 ) ) "textures/base_wall/stelabwafer1" 0 0 0

}

}

// primitive 1

{

brushDef3

}

// primitive 2

{

brushDef3

}

}

.

.

.

// entity 37

{

"classname" "light"

"name" "light_51585"

"origin" "48 1972 -52"

"texture" "lights/round_sin"

"_color" "0.55 0.06 0.01"

"light_radius" "32 32 32"

"light_center" "1 3 -1"

}

Each brush is described as a set of planes. The sides of a brush are called faces (also called windings); each face is obtained by clipping its plane against every other plane in the brush.

Note : During the loading phase a really neat and fast "plane hashing system" is used: idPlaneSet, built on top of an idHashIndex. It is really worth taking a look at.

1. MakeStructuralBspFaceList & FaceBSP

The first step is to slice the map via Binary Space Partitioning. Every single non-transparent face in the map will be used as a splitting plane.

The heuristic to select a splitter is:

1 : If the map is bigger than 5000 units: slice using an axis-aligned plane in the middle of the space. In the following drawing a 6000x6000 space is sliced three times.

[pic]

2 : When there are no more parts bigger than 5000 units: use the faces marked as "portal" (they have the material textures/editor/visportal). In the following drawing the portal brushes are in blue.

[pic]

3 : Finally use the remaining faces. Select the face that is collinear with the most other planes AND splits the fewest faces; also try to favor axial splitters. The splitting planes are marked in red.

[pic]

[pic]

The process stops when no more faces are available: the BSP tree leafs each represent a convex subspace:

[pic]

2. MakeTreePortals

The map is now divided into convex subspaces but those subspaces have no awareness of each other. The goal of this step is to connect each leaf to its neighbors by creating portals automatically. The idea is to start with six portals bounding the map: they connect "outside" to "inside" (the root of the BSP). Then for each node in the BSP: split each portal touching the node, add the splitting plane as a portal and recurse.

[pic]

[pic]

The original six portals are going to be split and propagated all the way down to the leafs. This is not as trivial as it seems since each time a node is split, every portal it is connected to must also be split.

In the drawing on the left one portal is connecting two BSP sibling nodes. Upon following the left child, its splitting plane cuts the portal in two. We can see that the other node's portals must also be updated so they no longer connect to a sibling but to its "nephews".

At the end of the process the six original portals have been split into hundreds of portals and new portals have been created on the splitting planes: each leaf in the BSP has gained awareness of its neighbors via a linked list of portals connecting it to leafs sharing an edge:

[pic]

3. FilterBrushesIntoTree

[pic]

This step works like a game of shape sorter, with the BSP being the board and the brushes being the shapes. Each brush is sent down the BSP in order to discover which leafs are opaque.

This works because of a well-defined heuristic: if a brush crosses a splitting plane a little but by not more than EPSILON, it is not split. Instead it is sent whole to the side of the plane where all the other elements of the brush are.

Now "inside" and "outside" are starting to be visible.

A leaf hit by a brush is considered opaque (solid) and is hence marked accordingly.

[pic]

4. FloodEntities & FillOutside

Starting from a player spawning entity, a floodfill algorithm is triggered. It marks the leafs reachable by entities.

[pic]

The final step, FillOutside, goes through each leaf and marks it opaque if it is not reachable.

[pic]

We now have a level where each subspace is either reachable or opaque: navigation via leaf portals can now be consistent by checking whether the target leaf is opaque or not.

5. ClipSidesByTree

It is now time to discard the useless parts of the brushes: each original brush side is sent down the BSP. If a side is within an opaque space it is discarded. Otherwise it is added to the side's visibleHull list.

This results in a "skin" of the level: only the visible parts are kept.

[pic]

From this point on, only the sides' visibleHulls are considered for the remaining operations.

6. FloodAreas

Now dmap groups leafs together with area IDs: for each leaf a floodfilling algorithm is triggered. It tries to flow everywhere using the portals associated with the leaf.

This is where the designers' work is tremendously important: areas can be identified only if visportals (the portal brushes mentioned in Step 1) were manually placed on the map. Without them dmap will identify only one area and the entire map will be sent to the GPU every frame.

The recursive floodfilling algorithm is stopped only by areaportals and opaque nodes. In the following drawing an automatically generated portal (in red) will allow the flood to continue but a designer-placed visportal (in blue, also called an areaportal) will stop it, yielding two areas:

[pic]

[pic]

At the end of the process each non-opaque leaf belongs to an area and the inter-area-portals (in blue) have been identified.

[pic]

7. PutPrimitivesInAreas

This step combines the areas identified in Step 6 and the visibleHulls calculated in Step 5 in another game of "shape sorter": this time the board is the areas and the shapes are the visibleHulls.

An array of areas is allocated and each brush's visibleHull is sent down the BSP: surfaces are added to the area array at index areaID.

Note : Pretty cleverly, this step also optimizes entity spawning. Entities marked "func_static" are instantiated now and associated with an area. This is a way to "melt" boxes, barrels and chairs into an area (and also have their shadow volumes pre-generated).

8. Prelight

For each static light, dmap pre-calculates the shadow volume geometry. Those volumes are later saved in the .proc file as is. The only trick is that each shadow volume is saved under the name "_prelight_light" concatenated with the light ID, so the engine can match the light from the .map file with the shadow volume from the .proc file:

shadowModel { /* name = */ "_prelight_light_2900"

/* numVerts = */ 24 /* noCaps = */ 72 /* noFrontCaps = */ 84 /* numIndexes = */ 96 /* planeBits = */ 5

( -1008 976 183.125 ) ( -1008 976 183.125 ) ( -1013.34375 976 184 ) ( -1013.34375 976 184 ) ( -1010 978 184 )

( -1008 976 184 ) ( -1013.34375 976 168 ) ( -1013.34375 976 168 ) ( -1008 976 168.875 ) ( -1008 976 168.875 )

( -1010 978 168 ) ( -1008 976 167.3043518066 ) ( -1008 976 183.125 ) ( -1008 976 183.125 ) ( -1010 978 184 )

( -1008 976 184 ) ( -1008 981.34375 184 ) ( -1008 981.34375 184 ) ( -1008 981.34375 168 ) ( -1008 981.34375 168 )

( -1010 978 168 ) ( -1008 976 167.3043518066 ) ( -1008 976 168.875 ) ( -1008 976 168.875 )

4 0 1 4 1 5 2 4 3 4 5 3 0 2 1 2 3 1

8 10 11 8 11 9 6 8 7 8 9 7 10 6 7 10 7 11

14 13 12 14 15 13 16 12 13 16 13 17 14 16 15 16 17 15

22 21 20 22 23 21 22 18 19 22 19 23 18 20 21 18 21 19

1 3 5 7 9 11 13 15 17 19 21 23 4 2 0 10 8 6

16 14 12 22 20 18

}

9. FixGlobalTjunctions

Fixing T-junctions is usually important in order to avoid visual artifacts, but it is even more important in idTech4: the geometry is also used to generate the shadow volumes written to the stencil buffer. T-junctions are twice as annoying.

10. Write output

In the end all this preprocessing is saved to a .proc file:

← For each area a set of surface faces grouped by material.

← The BSP Tree with areaID for leafs.

← Inter-area-portals winding.

← Shadow Volumes.

History

Many code segments in dmap feature similarities with code found in the preprocessing tools of Quake (qbsp.exe), Quake 2 (q2bsp.exe) and Quake 3 (q3bsp.exe). That's because the Potentially Visible Set was generated via a temporary portal system:

← qbsp.exe read a .map and generated a .prt file that contained connectivity information between leafs in the BSP: portals (exactly like Step 2, MakeTreePortals).

← vis.exe used the .prt as input. For each leaf:

➢ floodFill into connected leafs using portals.

➢ Before flooding into a leaf: test for visibility by clipping the next portal against the clipping planes generated from the two previous portals (many people claimed that visibility was done by casting thousands of rays but this is a myth that many still believe nowadays).

A drawing is always better: let's say qbsp.exe identified 6 leafs connected by portals and now vis.exe is running to generate the PVS. This process is executed for each leaf but this example focuses exclusively on leaf 1.

[pic]

Since a leaf is always visible from itself, the initial PVS for leaf 1 is as follows:

|Leaf ID                     |1 |2 |3 |4 |5 |6 |
|Bit vector (PVS for leaf 1) |1 |? |? |? |? |? |

The floodfilling algorithm starts: the rule is that as long as there are not two portals in the path, the leaf is considered visible from the starting point. This means we reach leaf 3 with the following PVS:

|Leaf ID                     |1 |2 |3 |4 |5 |6 |
|Bit vector (PVS for leaf 1) |1 |1 |1 |? |? |? |

[pic]

Once in leaf 3 we can actually start checking for visibility:

By taking two points from portal n-2 and portal n-1 we can generate clipping planes and test whether the next portals are potentially visible.

In the drawing we can see that the portals leading to leafs 4 and 6 will fail the tests while the portal toward leaf 5 will succeed. The floodfilling algorithm will then recurse into leaf 5. In the end the PVS for leaf 1 will be:

|Leaf ID                     |1 |2 |3 |4 |5 |6 |
|Bit vector (PVS for leaf 1) |1 |1 |1 |0 |1 |0 |

In idTech4 the PVS is not generated, instead the portal data is conserved. The visibility of each area is computed at runtime by projecting the portal windings into screen space and clipping them against each others.

Recommended readings

The great article by Sean Barrett, "The 3D Software Rendering Technology of 1998's Thief: The Dark Project", mentions Seth Teller's 1992 thesis work in three parts, "Visibility Computations for Global Illumination Algorithms": a lot can be read about visibility precomputation, virtual light sources, portals, portal sequences, gross/fine culling, general observers and visible supersets.

[pic] [pic] [pic] 

Michael Abrash's Graphics Programming Black Book: chapter 60 is pure gold when it comes to explaining how to split a segment with a plane.

[pic]

The proof of the segment-splitting formula is in "Computer Graphics: Principles and Practice":

[pic]

More about T-junction fixing in "Mathematics for 3D Game Programming and Computer Graphics":

[pic]

DOOM3 SOURCE CODE REVIEW: RENDERER

(PART 3 OF 6)

idTech4 renderer features three key innovations:

← "Unified Lighting and Shadows": the level's faces and the entities' faces go through the same pipeline and shaders.

← "Visible Surface Determination": a portal system allows VSD to be performed at runtime: no more PVS.

← "Multi-pass Rendering".

By far the most important is that idTech4 is a multi-pass renderer. The contribution of each light in the view is accumulated in the GPU framebuffer via additive blending. Doom 3 takes full advantage of the fact that color framebuffer registers saturate instead of wrapping around.

CPU register (wrap around) :

============================

1111 1111

+ 0000 0100

---------

= 0000 0011

GPU register ( saturate) :

==========================

1111 1111

+ 0000 0100

---------

= 1111 1111
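The two behaviors can be reproduced on the CPU with a small sketch (hypothetical helper functions mimicking the blend unit; in the engine the GPU does this for free):

```cpp
#include <algorithm>
#include <cstdint>

// CPU-style addition: an 8-bit register wraps around on overflow.
uint8_t AddWrap(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(a + b);            // 255 + 4 -> 3
}

// GPU framebuffer-style addition: the blend unit clamps to 255.
uint8_t AddSaturate(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(std::min(255, a + b)); // 255 + 4 -> 255
}
```

Saturation is what makes additive light accumulation safe: an already bright pixel stays bright instead of wrapping back to near-black.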

I built a custom level to illustrate additive blending. The following screenshot shows three lights in a room resulting in three passes, with the result of each pass accumulated in the framebuffer. Notice the white illumination at the center of the screen where all lights blend together.

[pic]

I modified the engine in order to isolate each light pass; they can be viewed using the left and right arrows:

[pic]


I modified the engine further in order to see the framebuffer state AFTER each light pass. Use the left and right arrows to move in time.

[pic]


Trivia: It is possible to take the result of each light pass, blend them manually with Photoshop (Linear Dodge to mimic OpenGL additive blending) and reach the exact same visual result.

Additive blending combined with support for shadows and bump mapping resulted in an engine that can still produce very nice results even by 2012 standards:

[pic]

Architecture

The renderer is not monolithic like in previous idTech engines but rather broken down into two parts called the Frontend and the Backend:

← Frontend:

1. Analyze world database and determine what contributes to the view.

2. Store the result in an Intermediate Representation (viewDef_t) and upload/reuse cached geometry in the GPU's VBOs.

3. Issue a RC_DRAW_VIEW command.

← Backend:

1. The RC_DRAW_VIEW wakes up the backend.

2. Use the Intermediate Representation as input and issue commands to the GPU using the VBOs.

[pic]
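The hand-off between the two halves can be sketched as a tiny command queue. These are hypothetical simplified types (drawCommand_t, viewDef_t with a single field); the real engine threads its intermediate representation through a command list:

```cpp
#include <vector>

// Hypothetical stand-ins for the engine's intermediate representation.
enum renderCommand_t { RC_NOP, RC_DRAW_VIEW };

struct viewDef_t     { int numDrawSurfs; };          // frontend output
struct drawCommand_t { renderCommand_t op; viewDef_t view; };

// Frontend: analyze the world, emit an RC_DRAW_VIEW with the IR attached.
void R_AddDrawViewCmd(std::vector<drawCommand_t>& queue, const viewDef_t& v) {
    queue.push_back({RC_DRAW_VIEW, v});
}

// Backend: consume commands, issue (here: count) draw calls per view.
int RB_ExecuteBackEndCommands(const std::vector<drawCommand_t>& queue) {
    int drawCalls = 0;
    for (const drawCommand_t& cmd : queue)
        if (cmd.op == RC_DRAW_VIEW)
            drawCalls += cmd.view.numDrawSurfs;
    return drawCalls;
}
```

The queue is what makes the split SMP-friendly: the frontend only writes commands, the backend only reads them, so each half could live on its own core.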

The architecture of the renderer bears a striking similarity to LCC, the retargetable compiler that was used to generate the Quake3 Virtual Machine bytecode:

[pic]

I initially thought the renderer design was influenced by the LCC design, but the renderer is built in two parts because it was meant to be multi-threaded on SMP systems. The frontend would run on one core and the backend on another core. Unfortunately, due to instability on certain drivers, the extra thread had to be disabled and both ends run on the same thread.

Genesis trivia: Archeology can be done with code as well. If you look closely at the unrolled renderer code (frontend, backend) you can clearly see that the engine switches from C++ to C (from objects to static methods).

This is due to the genesis of the code: the idTech4 renderer was written by John Carmack using the Quake3 engine (a C codebase) before he was proficient in C++. The renderer was later integrated into the idTech4 C++ codebase.

How much Quake is there in Doom3? Hard to tell, but it is funny to see that the main method in the Mac OS X version is:

- (void)quakeMain;

Frontend/Backend/GPU collaboration

Here is a drawing that illustrates the collaboration between the frontend, the backend and the GPU:

[pic]

1. The Frontend analyzes the world state and issues two things:

← An intermediate representation containing a list of each light contributing to the view. Each light contains a list of the entity surfaces interacting with it.

← Each light-entity interaction that is going to be used for this frame is also cached in an interaction table. Data is usually uploaded to a GPU VBO.

2. The Backend takes the intermediate representation as input. It goes through each light in the list and makes OpenGL draw calls for each entity that interacts with the light. The draw commands obviously reference the VBOs and textures.

3. The GPU receives the OpenGL commands and renders to the screen.

Doom3 Renderer Frontend

The frontend performs the hard part: Visible Surface Determination (VSD). The goal is to find every light/entity combination affecting the view. Those combinations are called interactions. Once each interaction has been found, the frontend makes sure everything needed by the backend is uploaded to GPU RAM (it keeps track of everything via an "interaction table"). The last step is to generate an intermediate representation that will be read by the backend so it can generate OpenGL commands.

In the code this is how it looks:

- idCommon::Frame

- idSession::UpdateScreen

- idSession::Draw

- idGame::Draw

- idPlayerView::RenderPlayerView

- idPlayerView::SingleView

- idRenderWorld::RenderScene

- build params

//This is the frontend

- ::R_RenderView(params)

{

R_SetViewMatrix

R_SetupViewFrustum

R_SetupProjection

//Most of the beef is here.

static_cast<idRenderWorldLocal *>(parms->renderWorld)->FindViewLightsAndEntities()

{

//Walk the BSP and find the current Area

PointInArea

// Recursively pass portals to find lights and entities interacting with the

// view.

FlowViewThroughPortals

}

//Improve Z-buffer accuracy by moving the far plane as close as the farthest entity.

R_ConstrainViewFrustum

// Find entities that are not in a visible area but still casting a shadow

// (usually enemies)

R_AddLightSurfaces

// Instantiate animated models (for monsters)

R_AddModelSurfaces

R_RemoveUnecessaryViewLights

// A simple C qsort call. C++ sort would have been faster thanks to inlining.

R_SortDrawSurfs

R_GenerateSubViews

R_AddDrawViewCmd

}

Note : The switch from C to C++ is obvious here.
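The comment next to R_SortDrawSurfs refers to the classic qsort vs std::sort trade-off: the C comparator is called through a function pointer while the C++ comparator can be inlined. A minimal illustration with a hypothetical drawSurf_t (the real struct carries materials and geometry):

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

struct drawSurf_t { int sortKey; };   // hypothetical, keyed by material/depth

// C style: the comparator is invoked through a function pointer on every
// comparison, which the compiler usually cannot inline.
int CompareSurf(const void* a, const void* b) {
    return static_cast<const drawSurf_t*>(a)->sortKey -
           static_cast<const drawSurf_t*>(b)->sortKey;
}

void SortC(std::vector<drawSurf_t>& s) {
    qsort(s.data(), s.size(), sizeof(drawSurf_t), CompareSurf);
}

// C++ style: the lambda has its own type, so std::sort instantiates a
// specialized sort and can inline the comparison.
void SortCpp(std::vector<drawSurf_t>& s) {
    std::sort(s.begin(), s.end(),
              [](const drawSurf_t& a, const drawSurf_t& b) {
                  return a.sortKey < b.sortKey;
              });
}
```

Both produce the same ordering; only the per-comparison call overhead differs.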

It is always easier to understand with a drawing, so here is a level. Thanks to the designer's visportals, the engine sees four areas:

[pic]

Upon loading the .proc, the engine also loaded the .map containing all the light and moving entity definitions. For each light the engine has built a list of the areas it impacts:

[pic]

Light 1 :
=========

- Area 0
- Area 1

Light 2 :
=========

- Area 1
- Area 2
- Area 3

At runtime we now have a player position and monsters casting shadows. For scene correctness, all monsters and shadows must be found.

[pic]

Here is the process:

1. Find in which area the player is by walking the BSP tree in PointInArea.

2. FlowViewThroughPortals: starting from the current area, floodfill into the other visible areas using the portal system. Reshape the view frustum each time a portal is passed: this is beautifully explained in the Real-Time Rendering bible:

[pic]

Now we have a list of every light contributing to the screen and most entities, stored in the interaction table:

Interaction table (Light/Entity) :

==================================

Light 1 - Area 0

Light 1 - Area 1

Light 1 - Monster 1

Light 2 - Area 1

Light 2 - Monster 1

The interaction table is still incomplete: the interaction Light2-Monster2 is missing, so the shadow cast by Monster2 would be missing.

3. R_AddLightSurfaces will find the entities not in the view but casting shadows by going through each light's area list.

Interaction table (Light/Entity) :

==================================

Light 1 - Area 0

Light 1 - Area 1

Light 1 - Monster 1

Light 2 - Area 1

Light 2 - Monster 1

Light 2 - Monster 2

4. R_AddModelSurfaces: all interactions have been found, so it is now time to upload the vertices and indices to the GPU's VBOs if they are not there already. Animated monster geometry is instantiated here as well (model AND shadow volume).

5. All "intelligent" work has been done. Issue a RC_DRAW_VIEW command via R_AddDrawViewCmd that will trigger the backend to render to the screen.
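The interaction table built in the steps above boils down to a set of (light, entity) pairs. A sketch with hypothetical minimal types (the engine uses idInteraction objects hanging off lights and entities):

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// (lightId, entityId) pairs: one entry per interaction.
using Interactions = std::set<std::pair<int, int>>;

// Hypothetical inputs: each light knows which areas it touches, and each
// area knows which entities it currently contains. Walking every light's
// area list is what R_AddLightSurfaces does to catch shadow casters that
// are outside the view (the Monster2 case above).
Interactions BuildInteractionTable(
        const std::vector<std::vector<int>>& lightAreas,   // light -> areas
        const std::vector<std::vector<int>>& areaEntities) // area  -> entities
{
    Interactions table;
    for (std::size_t light = 0; light < lightAreas.size(); ++light)
        for (int area : lightAreas[light])
            for (int entity : areaEntities[area])
                table.insert({static_cast<int>(light), entity});
    return table;
}
```

Using a set makes the table idempotent: an entity straddling two areas of the same light still yields a single interaction.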

Doom3 Renderer Backend

The backend is in charge of rendering the Intermediate Representation while accounting for the limitations of the GPU. Doom3 supported five GPU rendering paths:

← R10 (GeForce256)

← R20 (GeForce3)

← R200 (Radeon 8500)

← ARB (OpenGL 1.X)

← ARB2 (OpenGL 2.0)

As of 2012 only ARB2 is relevant to modern GPUs: not only do standards provide portability, they also increase longevity.

Depending on the card's capability, idTech4 enabled bump-mapping (a tutorial about it using a Hellknight I wrote a few years ago) and specular-mapping, but all paths try their hardest to save as much fillrate as possible with:

← The OpenGL scissor test (specific to each light, generated by the frontend).

← Filling the Z-buffer as a first step.

The backend unrolled code is as follows:

idRenderSystemLocal::EndFrame

R_IssueRenderCommands

RB_ExecuteBackEndCommands

RB_DrawView

RB_ShowOverdraw

RB_STD_DrawView

{

RB_BeginDrawingView // clear the z buffer, set the projection matrix, etc

RB_DetermineLightScale

RB_STD_FillDepthBuffer // fill the depth buffer and clear color buffer to black.

// Go through each light and draw a pass, accumulating result in the framebuffer

RB_STD_DrawInteractions

{

// 5 GPU specific paths

switch (renderer)

{

R10 (GeForce256)

R20 (GeForce3)

R200 (Radeon 8500)

ARB (OpenGL 1.X)

ARB2 (OpenGL 2.0)

}

// disable stencil shadow test

qglStencilFunc( GL_ALWAYS, 128, 255 );

RB_STD_LightScale

//draw any non-light dependent shading passes (screen,neon, etc...)

int processed = RB_STD_DrawShaderPasses( drawSurfs, numDrawSurfs )

// fog and blend lights

RB_STD_FogAllLights();

// now draw any post-processing effects using _currentRender

if ( processed < numDrawSurfs )

RB_STD_DrawShaderPasses( drawSurfs+processed, numDrawSurfs-processed );

}

In order to follow the backend steps, I took a famous scene from a Doom3 level and froze the engine at every step in the rendition:

[pic]

Since Doom3 uses bumpmapping and specular mapping on top of the diffuse texture, rendering a surface can take up to 3 texture lookups. Since a pixel can potentially be impacted by 5-7 lights, it is not crazy to assume 21 texture lookups per pixel... not even accounting for overdraw. The first step of the backend is to reach zero overdraw: disable every shader, write only to the depth buffer and render all geometry:

[pic]

The depth buffer is now filled. From now on depth write is disabled and depth test is enabled.

Rendering first to the z-buffer may seem counter-productive at first, but it is actually extremely valuable to save fillrate:

← It prevents running expensive shaders on non-visible surfaces.

← It prevents rendering non-visible shadows to the stencil buffer.

← Since surfaces are rendered in no particular order (neither back to front nor front to back) there would be a lot of overdraw. This step totally removes overdraw.
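The fillrate saving can be demonstrated with a toy depth buffer: after the depth-only pass, shading only runs where a fragment's depth equals the stored depth (GL_EQUAL test, depth writes off). This is a CPU sketch with hypothetical types, not the GL path itself:

```cpp
#include <limits>
#include <vector>

struct Fragment { int pixel; float depth; };

// Pass 1: depth only - keep the closest depth per pixel, no shading cost.
std::vector<float> FillDepth(int numPixels,
                             const std::vector<Fragment>& frags) {
    std::vector<float> depth(numPixels, std::numeric_limits<float>::max());
    for (const Fragment& f : frags)
        if (f.depth < depth[f.pixel]) depth[f.pixel] = f.depth;
    return depth;
}

// Pass 2: run the expensive shader only on fragments that survived pass 1.
// Returns the number of shader invocations.
int ShadeVisible(const std::vector<float>& depth,
                 const std::vector<Fragment>& frags) {
    int shaded = 0;
    for (const Fragment& f : frags)
        if (f.depth == depth[f.pixel]) ++shaded;
    return shaded;
}
```

With two overlapping fragments on one pixel, only the closest one is shaded: the overdraw cost moved from the expensive lighting shader to the cheap depth-only pass.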

Note that the color buffer is cleared to black: Doom3's world is naturally pitch black since there is no "ambient" light. In order to be visible, a surface/polygon must interact with a light. This explains why Doom3 was so dark!

After this the engine is going to perform 11 passes (one for each light). I broke down the rendering process; the next slideshow shows each individual light pass. You can move in time with the left and right arrows.

[pic]


Now the details of what happens in the GPU framebuffer:

[pic]

Stencil buffer and Scissor test:

Before each light pass, if a shadow is cast by the light then the stencil test has to be enabled. I won't elaborate on the depth-fail/depth-pass controversy and the infamous move of Creative Labs: the released source code features the depth-pass algorithm, which is slower since it requires building better shadow volumes. Some people have managed to put the depth-fail algorithm back in the source, but be aware that this is only legal in Europe!

In order to save fillrate, the frontend generates a screen-space rectangle to be used as a scissor test by OpenGL. This avoids running shaders on pixels where the surface would have been pitch black anyway due to distance from the light.
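Computing the scissor amounts to projecting the light's bounds to a screen-space rectangle and clamping it to the viewport. A minimal sketch with a hypothetical ScreenRect type (the engine has its own idScreenRect):

```cpp
#include <algorithm>

struct ScreenRect { int x1, y1, x2, y2; };

// Clamp a light's projected bounds to the viewport. The backend would then
// call glScissor(x1, y1, x2 - x1, y2 - y1) before drawing the light pass,
// so fragments outside the rectangle are rejected before shading.
ScreenRect ClipToViewport(ScreenRect light, int width, int height) {
    ScreenRect r;
    r.x1 = std::max(0, light.x1);
    r.y1 = std::max(0, light.y1);
    r.x2 = std::min(width,  light.x2);
    r.y2 = std::min(height, light.y2);
    return r;
}
```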

The stencil buffer just before light pass 8. Any non-black area will be lit while the others will prevent writing to the framebuffer: the mask principle is clearly visible.

[pic]

The stencil buffer just before light pass 7. The scissor set to save fillrate is clearly visible.

[pic]

Interactive surfaces

The last step in the rendition is RB_STD_DrawShaderPasses: it renders all surfaces that don't need light. Among them are the screens and the amazing interactive GUI surfaces, one of the parts of the engine John Carmack was most proud of. I don't think this part of the engine ever got the respect it deserves. Back in 2004 the introduction cinematic used to be a video that would play fullscreen; after the video played, the level would load and the engine would kick in... but not in Doom III:


Steps :

← Level load.

← Cinematic starts playing.

← At 5mn5s the camera moves away.

← The video we just saw was a SCREEN IN THE GAME ENGINE !

I remember when I saw this for the first time: I thought it was a trick. I thought the video player would cut and the designer had a texture on the screen and a camera position that would match the last frame of the video. I was wrong: idTech4 can actually play videos in interactive GUI surface elements. For this it reused RoQ, the technology that Graeme Devine brought with him when he joined id Software.

Trivia: The RoQ used for the intro was impressive for 2005 and it was an audacious move to have it on a screen within the game:

← It is 30 frames per seconds.

← Each frame is 512x512: Quite a high resolution at the time

← Each frame is generated in idCinematicLocal::ImageForTime on the CPU and uploaded on the fly to the GPU as an OpenGL texture.

But the interactive surfaces can do so much more than that thanks to scripting and its ability to call native methods. Some people got really interested and managed to get Doom 1 running in it!

[pic]

Trivia: The interactive surface technology was also reused to design all the menus in Doom3 (settings, main screen, etc.).

So much more....

This page is only the tip of the iceberg and it is possible to go so much deeper.

Recommended readings

[pic]

If you are reading this and you don't own a copy of Real-Time Rendering, you are depriving yourself of priceless information.

DOOM3 SOURCE CODE REVIEW: PROFILING

(PART 4 OF 6)

Xcode comes with a great tool for profiling: Instruments. I used it in sampling mode during a play session (removing the game loading and level GPU pre-caching altogether):

Overview

The high-level view shows the three threads running in the process:

← Main thread where game logic and rendition occur.

← Auxiliary thread where inputs are collected and sound effects are mixed.

← Music thread (consuming 8% of resources), created by CoreAudio and calling idAudioHardwareOSX at regular intervals (note: sound effects are done with OpenAL but do not run in their own thread).

[pic]

Main Thread

[pic]

The Doom 3 main thread runs... QuakeMain! Amusingly, the team that ported Quake 3 to Mac OS X must have reused some old code. Inside, the time repartition is as follows:

← 65% dedicated to graphic rendition (UpdateScreen).

← 25% dedicated to game logic: this is surprisingly high for an id Software game.

Game Logic

The game logic occurs in gamex86.dll space (or game.dylib on Mac OS X). It accounts for 25% of the main thread time, which is unusually high. Two reasons:

← The virtual machine runs and allows entities to think. All of the bytecode is interpreted and the scripting language seems to have been overused.

← The physics engine is more complex (LCP solvers) and hence more demanding than in previous games. It is run on each object and includes ragdoll and interaction solving.

[pic]

Renderer

As previously described the renderer is made of two parts:

← Frontend (idSessionLocal::Draw), accounting for 43.9% of the rendition process. Note that Draw is a pretty poor name since the frontend does not perform a single draw call to OpenGL!

← Backend (idRenderSystemLocal::EndFrame), accounting for 55.9% of the rendition process.

[pic]

The load distribution is pretty much even, which is not that surprising since:

← The frontend performs a lot of calculations with regard to Visible Surface Determination.

← The frontend also performs model animation and shadow silhouette finding.

← The frontend uploads vertices to the GPU.

← The backend spends a lot of time setting up parameters for the shaders and communicating with the GPU (i.e. submitting triangle indices or per-vertex normal matrices for bumpmapping in glDrawElements).

Renderer: Frontend

Renderer Frontend:

No surprise here: most of the time (91%) is spent uploading data to the GPU VBOs (R_AddModelSurfaces). A little bit of time (4%) goes to walking areas, trying to find all interactions (R_AddLightSurfaces). A minimal amount (2.9%) is spent on Visible Surface Determination: traversing the BSP and running the portal system.

[pic]

Renderer: Backend

Renderer Backend:

The backend obviously triggers a buffer swap (GLimp_SwapBuffers) and spends some time synchronizing (10%) with the screen, since the game was running in a double-buffered environment. 5% is the cost of totally avoiding overdraw with a first pass aiming to populate the Z-buffer (RB_STD_FillDepthBuffer).

[pic]

Flat stats

[pic]

If you feel like loading the Instruments trace and exploring yourself: here is the profile file.

DOOM3 SOURCE CODE REVIEW: SCRIPTING VM

(PART 5 OF 6)

From idTech1 to idTech3 the only thing that completely changed every time was the scripting system:

← idTech1: QuakeC running in a Virtual Machine.

← idTech2: C compiled to an x86 shared library (no virtual Machine).

← idTech3: C compiled to bytecode with LCC, running in QVM (Quake Virtual Machine). On x86 the bytecode was converted to native instructions at loadtime.

idTech4 is no exception, once again everything is different:

← The scripting is done via an Object Oriented language similar to C++.

← The language is fairly limited (no typedef, five basic types).

← It is always interpreted by a virtual machine: there is no JIT conversion to native instructions like in idTech3 (John Carmack elaborated on this during our Q&A).

A good introduction is to read the Doom3 Scripting SDK notes.

Architecture

Here is the big picture:

Compilation: At loadtime, idCompiler is fed one predetermined .script file. A series of #include directives results in a script stack that contains all the script strings and every function's source code. It is scanned by an idLexer that generates basic tokens. Tokens enter the idParser and one giant bytecode is generated and stored in the idProgram singleton: this constitutes the Virtual Machine RAM and contains both the .text and .data VM segments.

[pic]

Virtual Machine: At runtime the engine allocates real CPU time to each idThread (one after another) until the end of the linked list is reached. Each idThread contains an idInterpreter that saves the state of the virtual CPU. Unless the interpreter goes wild and runs for more than 5,000,000 instructions, it will not be pre-empted: this is cooperative multitasking.

Compiler

The compilation pipeline is similar to what can be found in any compiler such as Google's V8 or Clang, except that there is no preprocessor. Hence features such as comment skipping, macros and directives (#include, #if) have to be handled in the lexer and the parser.

Since the idLexer is reused all across the engine to parse every text asset (maps, entities, camera paths), it is very primitive. As an example, it only returns five types of tokens:

← TT_STRING

← TT_LITERAL

← TT_NUMBER

← TT_NAME

← TT_PUNCTUATION

So the parser actually has to do much more work than in a "standard" compiler pipeline.
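A toy lexer along those lines, returning only the five coarse token classes (hypothetical ReadToken; the real idLexer is far more featureful, handling punctuation tables, escapes and comment skipping):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum tokenType_t { TT_STRING, TT_LITERAL, TT_NUMBER, TT_NAME, TT_PUNCTUATION };

// Classify the token starting at 'pos' and advance past it.
// Assumes well-formed input with a token available at 'pos'.
tokenType_t ReadToken(const std::string& src, std::size_t& pos) {
    while (pos < src.size() && isspace((unsigned char)src[pos])) ++pos;
    char c = src[pos];
    if (c == '"')  { while (src[++pos] != '"') {}  ++pos; return TT_STRING;  }
    if (c == '\'') { while (src[++pos] != '\'') {} ++pos; return TT_LITERAL; }
    if (isdigit((unsigned char)c)) {
        while (pos < src.size() && isdigit((unsigned char)src[pos])) ++pos;
        return TT_NUMBER;
    }
    if (isalpha((unsigned char)c) || c == '_') {
        while (pos < src.size() &&
               (isalnum((unsigned char)src[pos]) || src[pos] == '_')) ++pos;
        return TT_NAME;
    }
    ++pos;                    // any single leftover character
    return TT_PUNCTUATION;
}
```

Note how coarse the classes are: the parser, not the lexer, has to decide whether a TT_NAME is a keyword, a type or an identifier.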

At startup the idCompiler loads the first script, script/doom_main.script; a series of #includes builds a stack of scripts that are combined into one giant one.

The parser seems to be a standard top-down recursive descent parser. The scripting language grammar seems to be LL(1), necessitating no backtracking (even though the lexer has the capability to "unread" up to one token). If you ever got a chance to read the dragon book you will not be lost... otherwise this is a good reason to get started!

Interpreter

At runtime, events trigger the creation of idThreads, which are not operating system threads but Virtual Machine threads. They are given runtime by the CPU. Each idThread has an idInterpreter that keeps track of the instruction pointer and the two stacks (one for data/parameters and one to keep track of function calls).

Execution occurs in idInterpreter::Execute until the interpreter relinquishes control of the Virtual Machine: this is cooperative multitasking.

idThread::Execute

bool idInterpreter::Execute(void)

{

doneProcessing = false;

while( !doneProcessing && !threadDying )

{

instructionPointer++;

st = &gameLocal.program.GetStatement( instructionPointer );

//op is an unsigned short: the VM can have up to 65,536 opcodes

switch( st->op ) {

.

.

.

}

}

}

Once the idInterpreter relinquishes control, the next idThread::Execute method is called until no more threads need execution time. The overall architecture reminded me a lot of the Another World VM design.
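The cooperative scheduling described above can be sketched with toy types (hypothetical VmThread/RunAll, not the engine's idThread API, which walks a linked list):

```cpp
#include <vector>

// Hypothetical VM thread: runs until it finishes or hits the instruction cap.
struct VmThread {
    int remainingInstructions;   // work left before the thread is done

    // Execute at most 'cap' instructions; return true when the thread is done.
    // The cap plays the role of the engine's runaway-interpreter limit.
    bool Execute(int cap) {
        int ran = 0;
        while (remainingInstructions > 0 && ran < cap) {
            --remainingInstructions;   // "interpret" one opcode
            ++ran;
        }
        return remainingInstructions == 0;
    }
};

// Cooperative multitasking: give each thread a slice, one after another,
// looping until every thread has finished. Returns the number of rounds.
int RunAll(std::vector<VmThread>& threads, int cap) {
    int rounds = 0;
    bool allDone = false;
    while (!allDone) {
        allDone = true;
        for (VmThread& t : threads)
            if (!t.Execute(cap)) allDone = false;
        ++rounds;
    }
    return rounds;
}
```

A thread that never yields would starve the others, which is why the real interpreter enforces the 5,000,000-instruction safety limit.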

Trivia: The bytecode is never converted to x86 instructions since it was not meant to be heavily used. But in the end too much was done via scripting, and Doom3 would probably have benefited immensely from a JIT x86 converter just like Quake3 had.

Recommended readings

A great way to understand more about the virtual machine is to read the classic Compilers: Principles, Techniques, and Tools:

[pic]

DOOM3 SOURCE CODE REVIEW: INTERVIEWS

(PART 6 OF 6)

This page groups most of the interviews I found about idTech4, sorted in reverse chronological order.

1996-2007: All plans and interviews from John Carmack

Every John Carmack plan from 1996 to 2007: here.

Every John Carmack interview from 1996 to 2007: here.

2012 Q&A with John Carmack

A few questions asked while reading the source code.

Fabien Sanglard - What motivated the move to C++ for idTech4?

John Carmack - There was a sense of inevitability to it at that point, but only about half the programmers really had C++ background in the beginning. I had C and Objective-C background, and I sort of "slid into C++" by just looking at the code that the C++ guys were writing. In hindsight, I wish I had budgeted the time to thoroughly research and explore the language before just starting to use it.

You may still be able to tell that the renderer code was largely developed in C, then sort of skinned into C++.

Today, I do firmly believe that C++ is the right language for large, multi-developer projects with critical performance requirements, and Tech 5 is a lot better off for the Doom 3 experience.

Fabien Sanglard - So far only .map files were text-based, but with idTech4 everything is text-based: binary seems to have been abandoned. It slows down loading significantly since you have to idLexer everything... and in return I am not sure what you got. Was it to make it easier for the mod community?

John Carmack - In hindsight, this was a mistake. There are benefits during development for text based formats, but it isn't worth the load time costs. It might have been justified for the animation system, which went through a significant development process during D3 and had to interact with an exporter from Maya, but it certainly wasn't for general static models.

Fabien Sanglard - The rendering system is now broken down into a frontend/backend: it reminds me of the design of a compiler, which usually has a frontend->IR->backend pipeline. Was this inspired by the design of LCC which was used for Quake3 bytecode generation? I wonder what the advantages are over a monolithic renderer like Doom, idTech1 and idTech2.

John Carmack - This was explicitly to support dual processor systems. It worked well on my dev system, but it never seemed stable enough in broad use, so we backed off from it. Interestingly, we only just found out last year why it was problematic (the same thing applied to Rage's r_useSMP option, which we had to disable on the PC) – on Windows, OpenGL can only safely draw to a window that was created by the same thread. We created the window on the launch thread, but then did all the rendering on a separate render thread. It would be nice if doing this just failed with a clear error, but instead it works on some systems and randomly fails on others for no apparent reason.

The Doom 4 codebase now jumps through hoops to create the game window from the render thread and pump messages on it, but the better solution, which I have implemented in another project under development, is to leave the rendering on the launch thread, and run the game logic in the spawned thread.

Fabien Sanglard - Quake3's VM converted the bytecode to x86 instructions at loadtime, combining the security of Quake1 and the speed of Quake2. In idTech4 the bytecode is always interpreted: why not have an "onload" bytecode to x86 compiler? Did you decide the speed gain was not worth the development time?

John Carmack - Q1 and Q3 implemented all of the "game code" in the (potentially) interpreted language. D3 was only supposed to use the interpreted code for "scripting" events. It still got overused, and we did have performance issues related to it. Our takeaway was to severely deprecate its use for Rage – there is still a scripting engine there, but it is really only used for commanding things to happen in the levels, not anything resembling enemy or weapon behavior. We still believe this is the correct call – real programming should be done in real programming languages, with proper debugging and tool support.

Fabien Sanglard - The frontend/backend form a pipeline which is very friendly to SMP systems/functional programming: is it an approach that was satisfactory and then generalized in idTech5 to every subsystem (physics, renderer, network, etc.)?

John Carmack - D3 was set up to have game code and the rendering front end run on one core, and the rendering back end that actually issued OpenGL calls on another. This provided good balance on the PC, where OpenGL driver overhead is high. For Rage, we optimized more for the consoles where graphics API overhead is very low, running all rendering on one thread and just the game code on another thread. In most performance limited areas, the game code still dominated.

More CPU cycles in Rage are spent in a general "job system" that takes lists of relatively fine grained work and parcels them out between all available cores. This was pretty much required for taking good advantage of the Cell processor on the PS3, but it is generally a better direction than manual thread scheduling once you are above two or three cores.

Fabien Sanglard - Is there any aspect of the design and architecture of the code you were particularly proud of in idTech4?

John Carmack - I think the in-game GUI system is also worth mentioning – it added a lot to the character of the game.

2004 (October): Interview for "The Making of Doom3" book

In October 2004 a pretty good book by Steven L. Kent was released: "The Making of Doom III", where you can find a very insightful interview with John Carmack in the last chapter. Since the book is out of print, I don't think it is an issue to transcribe part of the interview here:

|Steven Kent - Was the Doom3 graphic engine harder to create than past engines ? |

| |

|John Carmack - That side of the development went really nicely. The features were all pretty much ready years ago, and I spent a year or so |

|tuning it up and adding the different options and parameterizations that people needed to get exactly the effects that they wanted. All of that|

|went kind of according to our original schedule. |

| |

| |

|Steven Kent - I understand you created Doom3 in C++. |

| |

|John Carmack - DOOM is our first game programmed in C++. I actually did the original renderer in straight C working inside the QUAKE III Areana|

|framework, but most of it has been objectified since then, and all of our new code is set up that way. |

| |

|It's been a mixed bag. There have been some bad things from going about it that way; but in general, it's been moderately positive for the |

|development stuff. Having as much as we do now, having the large objects, it's been a useful thing.  |

| |

|Steven Kent - Just how much code went into creating the doom 3 rendering engine ? |

| |

|John Carmack - The actual rendering side of things, the core medium, is not all that big. It's not that much larger that the previous stuff we |

|have done. |

|That's actually something that causes me a fair amount of concern. We have more programmers working on DOOM3...We've had five programmers at a |

|time, which is much more that we have had on previous projects. Perhaps even more significantly, individual programmers have been creating |

|subsystems effectively from scratch, where in the previous games I wrote the face of everything. I produced a functional system, and we would |

have usually a secondary programmer kind of flesh out the stuff that I wrote; but I wrote the entire basic framework, and it fit together nicely. I had a consistent vision throughout everything.

With DOOM 3, we started off with multiple programmers writing large subsystems from scratch, which means that things don't fit together as nicely as when we started off with one person setting everything up. There are always little inefficiencies you get when you have different people [who] don't [always] think in exactly the same way. You look for the synergies between the different areas and the ways you can simplify things down. There's always a strong desire with functionality to kind of pile things on.

Historically, I always resisted this. I've been one of the big believers in keeping it as simple and clear as possible. It pays off in maintenance and flexibility and stuff like that.

As we have more people working on things, a lot of features get added. That is definitely a two-edged sword. Sure, new features are great; but what's not always obvious is that every time you add a feature, you are at the very least increasing entropy in the code, if not actually breaking things. You are adding bugs and inefficiencies.

That's one of my larger concerns with increasing the feature count and the number of developers. In previous games, when it all came from me, any time there was any problem I could go in very rapidly and find what the source of the problem was. Now, we've got lots of situations where, if something is not right, it could be like, "Oh, that's Jan Paul's code, or Jim's code, or Robert's code." It's not so much a case where one person can just go in and immediately diagnose and fix things. So, there is a level of inefficiency.

It's certainly manageable. Lots of projects that are managed in the world today require huge numbers of resources and complexity, but you add this additional layer of oversight and accept this additional level of inefficiency.

Steven Kent - What new features have you added to the Doom3 engine?

John Carmack - Well, the fundamental thing about it on the rendering side is that it completely, properly unifies the lighting of surfaces. With previous games, you always had to use a collection of tricks and hacks to do your lighting. We would do light maps, blurring, ray-casted light maps for the static lighting, and static lights on static surfaces in the games. We used a different level-point Gouraud thing doing the static lights on dynamic surfaces moving around, and then mushing together all of the dynamic lights onto the dynamic surfaces to modify the Gouraud shading.

There was this matrix of four things: you would have static surfaces, dynamic surfaces, static lights, and dynamic lights, and there were four different sorts of ways that things got rendered. You might have lights this way and that way for one, and you might have shadows a different way and lighting a different way for another thing. That was more or less forced because of the limitations that we had to work with in terms of what the hardware and processors could do. I always thought that was a negative thing. Things shaded differently as to whether they were going to move or not. I referred to it as the "Hanna-Barbera effect": you could always tell these rocks were going to fall away because they looked a little different than the cel painting behind them. The big goal for DOOM 3 was to unify the lighting and shading so that everything behaved the same no matter where it came from, whether it's moving around or a fixed part of the world.

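The four-case matrix Carmack describes collapses into a single code path once every surface/light pair runs the same per-pixel math. A minimal sketch of the idea (illustrative names only, not actual idTech4 code), using one diffuse term evaluated identically for every combination:

```cpp
#include <algorithm>

// Hypothetical sketch: the same diffuse term is evaluated for every
// combination of static/dynamic surface and static/dynamic light,
// instead of four specialized render paths.
struct Vec3 { float x, y, z; };

static float Dot(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// intensity = max(0, N . L) -- identical math no matter whether the
// surface or the light is moving or fixed, so everything shades alike.
float DiffuseTerm(const Vec3 &normal, const Vec3 &toLight) {
    return std::max(0.0f, Dot(normal, toLight));
}
```

Because the one path is shared, static cases can still be optimized behind the scenes, but, as Carmack notes below, the resulting pixels are exactly the same.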

Lots of effort still goes into optimizing things when they are static, but the resulting pixels are exactly the same. Now, somewhat tied in with that, lighting becomes this first-class object rather than lots of lights mushed around with the world, kind of painted with light like the QUAKE series would do. The bump mapping ties in with the lighting and the shadowing to produce the DOOM 3 look and visuals. This is the visual style... In the next five years, we'll see this become the standard. Right now, there is the standard sort of QUAKE level of rendering in graphics, where you have light-mapped worlds and Gouraud-shaded characters. That is pretty much where the industry standard is right now. The industry standard will be basically bump-mapped surfaces and proper shadowing for the next five years or so. That's what defines the graphics side fundamentally on a technical level.

Now, what you do with that light-surface interaction crosses the bridge between what the game does, what the scripting does, the interactions with the renderer, and how models are built; it ties in with lots of areas that you have as technical data points. There are two different particle systems in DOOM 3. One can be [used for] moving things around and affecting things dynamically, like the smoke drifting out of guns. The other is more of a static effect type of thing, like smoke and bubbles coming in the world. You'll differentiate those two for performance reasons, because the things that are just effects in the world... you don't want to mess with them if they are not in view. So, particle effects are just dynamic models that get tossed in when necessary.

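The performance split between the two particle systems comes down to one update-time decision: decorative world effects are skipped when nobody can see them, while gameplay-driven emitters always tick. A hedged sketch of that decision (all names are illustrative, not the actual idTech4 API):

```cpp
// Illustrative sketch of the two-particle-system split: world-decoration
// emitters are visibility-culled, gameplay-driven emitters are not.
struct ParticleEmitter {
    bool isWorldEffect;  // static ambience (smoke, bubbles in the world)
                         // vs. dynamic gameplay effect (gun smoke)
    bool inView;         // result of a visibility query this frame
};

// Decorative effects that are out of view are not simulated at all;
// gameplay effects always run so their state stays consistent.
bool ShouldSimulate(const ParticleEmitter &e) {
    return e.isWorldEffect ? e.inView : true;
}
```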

The animation system is a big part of it. The animation subsystem is a big part of the coding in DOOM 3, where the motion of the characters determines a lot of things that happen. All this requires a very complicated set of interactions, and that is a lot of what Jim [Dosé] has been working on. People look at that and think of it as sort of a renderer feature, but the way DOOM 3 is set up, it's not really part of the renderer. It's mostly part of the game code. The renderer just looks at it as, "Okay, here is something that is generated as part of a model surface. Now I need to make lighting and shadowing for it." And that also then ties in with the animation system and how it interacts with the ragdoll system that Jan Paul wrote, which interacts with the physics and the precise collision detection, which does feedback with the renderer and model data structure. All of that kind of goes back and forth a lot.

The scripting system that we have, to let the level designers add more complex stuff, is something that is actually an outgrowth of fairly old technology. That took an interesting developmental path. Back in the original QUAKE days, the game code was done in the QuakeC interpreted language. One of the licensees, Ritual, where Jim Dosé used to work, evolved and expanded that [the scripting system] in a lot of ways for their game Sin. Then that technology was licensed for Alice and used in Heavy Metal. They had been developing this branched path while we had gone with QUAKE II back to the in-code DLLs and stayed that way with QUAKE III Arena.

| |

We actually brought most of that evolution back in with DOOM 3. There was a rewrite where we restructured and cleaned up and got to apply the lessons learned. But that is not a good way to write game code. I actually think we made some mistakes by doing more stuff in script than we should have, for development reasons. It [scripting] is a convenient thing for level designers, to be able to make more interesting things happen than they could with just tying things together in the level editor.

One of our big, not so much technological improvements, but structural architectural improvements, is the integration of all of our utilities into the executable. And that was something that actually saved a bunch of code. I moaned and complained about the code size for everything, but integrating the utilities saved probably some tens of thousands of lines of code that we used to have duplicated in slightly modified forms. We had three places that code could live: the game itself, the level editor, and the off-line utilities. All of them had similar sets of things that were not quite similar enough that they could share a library or something. Pulling all this together was a nice way to unify all of that, and one of the strong reasons for unifying them was also to allow the editor to use the renderer exactly as the game uses it, which is something we have never done in our previous titles. It allows designers to see exactly what their level is going to look like with all the lighting, shadowing, and bump mapping, animated textures, animated particles, and all of that stuff, without having to actually load it up into the game.

One of the real gating factors to creativity in the QUAKE generation of games was these significant preprocess times that you had to go through to get your simple, shaded view in the editor into the game with all of the rendering effects. In small areas, it might only have been several minutes, but in the full-size levels, the times were too long. Even when we were using these big, expensive multiprocessor machines, there were a lot of levels that would take over 30 minutes to process. Some of the licensees did not make as effective decisions on the complexity issues of the maps, they did not have the big expensive processing machines, and they would have levels that would take up to eight hours to process. There's an interesting slope of interactivity that you get where the most creative aspect is when you are messing with something interactively, where you are actually twiddling a kn...
