Monday, September 26, 2011

File Management Using Resource Files

Bruno Sousa, Fireworks Interactive
gems2@fireworks-interactive.com


As games increase in size (I think the grand prize goes to Phantasmagoria with
seven CDs), there is a need for organization of the game data. Having 10 files in
the same directory as the executable is acceptable, but having 10,000 is not. Moreover,
there is the directory structure, sometimes going five or more levels deep, which
is a pain to work with. Because our games will hardly resemble Windows Explorer, we
need to find a clean, fast way to store and organize our data. This is where resource
files come into play. Resource files give us the power to encapsulate files and directories into a single file, with a useful organization. They can also take advantage of compression, encryption, and any other features we might need.

What Is a Resource File?
We already use resource files all the time in our daily work—examples of these are
WinZip, the Windows installer, and backup programs. A resource file is nothing
more than a representation of data, usually from multiple files, but stored in just one
file (see Listing 1.15.1). Using directories, we can make a resource file work just like a
hard drive's file system does.

Listing 1.15.1 Resource file structure.
Signature = "SEALRFGNU" + '\0'
Version = 1.0
Flags = NULL
Number of Files = 58
Offset of First File = 19
[File 1]
[File 2]
[File 3]
[File .]
[File .]
[File .]
[File Number Of Files - 1]
[File Number Of Files]
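As a sketch, the header of Listing 1.15.1 could be mapped to a C++ structure like the one below. The field names, fixed sizes, and helper function are assumptions for illustration, not code from the book.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical in-memory layout for the Seal resource file header.
struct SealHeader {
    char     szSignature[10];   // "SEALRFGNU" plus the terminating '\0'
    uint32_t dwVersion;         // file format version
    uint32_t dwFlags;           // special flags; always NULL (0) in version 1.0
    uint32_t dwNumberOfLumps;   // number of lumps stored in the file
    uint32_t dwOffset;          // byte offset of the first lump
};

// Verifies the signature so we can reject files that are not Seal resources.
bool HasValidSignature(const SealHeader& h) {
    return std::memcmp(h.szSignature, "SEALRFGNU", 10) == 0;
}
```

Checking the signature first means a corrupt or unrelated file is rejected before any lump data is interpreted.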

Each lump (we will start calling files "lumps" from now on) in the resource file
has its own structure, followed by all of the data (see Listing 1.15.2).


Listing 1.15.2 File lump structure.
File Size = 14,340
Filename = "/bmp/Bob.bmp" + '\0'
Flags = COMPRESSED
FlagsInfo = 0xF34A400B
[Byte 1]
[Byte 2]
[Byte 3]
[Byte .]
[Byte .]
[Byte .]
[Byte File Size - 1]
[Byte File Size]
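The lump header of Listing 1.15.2 can be sketched the same way. The fixed 256-byte name field and the helper below are assumptions; the point is that the lump's data bytes immediately follow its header.

```cpp
#include <cstdint>

// Hypothetical in-memory form of the lump header from Listing 1.15.2.
struct SealLumpHeader {
    uint32_t dwFileSize;       // size of the lump data in bytes (14,340 above)
    char     szFilename[256];  // fixed-length, NULL-terminated name, e.g. "/bmp/Bob.bmp"
    uint32_t dwFlags;          // COMPRESSED, ENCRYPTED, ...
    uint32_t dwFlagsInfo;      // algorithm-specific value (0xF34A400B above)
};

// The lump's data bytes immediately follow its header, so their absolute
// position in the resource file is the header's position plus its size.
uint32_t LumpDataOffset(uint32_t headerPos) {
    return headerPos + static_cast<uint32_t>(sizeof(SealLumpHeader));
}
```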

Design

Before we do anything else, we'll need to name our resource system. We can then use
the name to give each component a special naming scheme, one that differentiates it from the other parts of the game. Let's call this system the "Seal Resource File System," abbreviated to SRFS, and use "slC" as the class prefix.

First, we need a resource file header. By looking at Listing 1.15.1, it's easy to see
that we are keeping our system simple. That doesn't mean it isn't powerful; it means it was designed to accommodate the most-needed features while retaining a fairly understandable syntax and structure.
Our resource file header gives us all the relevant information about the system. Multiple file types are used in games, and for each type there is usually a file header containing something unique to differentiate it from other file types. SRFS is no different, so the first data in its header is the file signature. This is usually a 5- to 10-character string, and is required so that we can identify the file as a valid Seal resource file.

The version information is straightforward: it keeps track of the file's version, which is required for a simple reason. If we decide to upgrade our system by adding new features or sorting the lumps differently, we need a way to verify whether the file being used supports those new features, and if so, use the latest code; otherwise, we fall back to the older code. Backward compatibility across versions is an important design issue and should not be forgotten.

The next field in the header is for special flags. The first version of the file system doesn't use it, so it must always be NULL (0); possible uses for this flag are described in the For the Future section. Following this is the number of lumps contained in the resource file, and the offset to the first lump. This offset is required to get back to the beginning of the resource file if we happen to get lost, and it also helps support future versions of this system: extra information could be added after the header in later versions, and the offset will still point to the first lump.
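The version check described above might be sketched as follows; the enum and function names are hypothetical, chosen only to show the decision being made:

```cpp
#include <cstdint>

// Which reading code path to use for a given file version.
enum class LoaderPath { Legacy, Current, Unsupported };

LoaderPath PickLoader(uint32_t fileVersion, uint32_t newestVersion) {
    if (fileVersion > newestVersion)
        return LoaderPath::Unsupported;  // written by a newer tool than this build
    if (fileVersion < newestVersion)
        return LoaderPath::Legacy;       // fall back to the older reading code
    return LoaderPath::Current;          // use the latest features
}
```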

We now move to our lump header, which holds the information we need to start
retrieving our data. We start with the lump size in bytes, followed by the name and directory, stored as a fixed-length, NULL-terminated string. Following this is the flags member, which specifies the type of algorithm(s) used on the lump, such as encryption or compression. After that is information about those algorithms, which can contain a checksum for encryption or dictionary information for compression (the exact details depend on the algorithms). Finally, after all of this comes the lump data itself, stored in binary form.

Our system has only two modules: a resource file module and a lump module. To be able to use a lump, we need to load it from the resource file and possibly decrypt or decompress it into a block of memory, which can then be accessed normally. Some systems prefer to encapsulate all functionality into the resource file module, and even allow direct access to lump data from within this module. This approach certainly has advantages, but its biggest disadvantage is probably that we need to have the whole resource in memory at once, unless we use only raw data or complicated algorithms to dynamically uncompress or decrypt our lump data to memory. This is a difficult process and beyond the scope of this gem.

We need functions to open the resource file, read the header, open individual
lumps, read information from lumps, and get data from lumps. These are covered in
the Implementation section.

Implementation

The sample code included on the CD is written in C++, but in the text we will use
pseudocode so it will be easy to implement in any language.
The slCLump Module
Our lump module is similar to file streams in C++ or other language implementations
of files in that we can write to it. Unfortunately, updating the resource file with a
lump is very troublesome due to the nature of C++ streams. We can't add data to the
middle of the stream—we can only replace it—and we can't modify the parent
resource file.
DWORD dwLumpSize;
STRING szLumpName;
DWORD dwLumpPosition;
BYTE [dwLumpSize] abData;
The variable dwLumpSize is a double word (32 bits) that specifies the size of the
lump, szLumpName is a string holding the lump's name, dwLumpPosition keeps the
lump's pointer position, and abData is an array of bytes with the lump information.
Here are the slCLump module functions:
DWORD GetLumpSize (void);
STRING GetLumpName (void);

DWORD Read (BYTE [dwReadSize] abBuffer, DWORD dwReadSize);
DWORD Write (BYTE [dwWriteSize] abBuffer, DWORD dwWriteSize);
DWORD Seek (DWORD dwSeekPosition, DWORD dwSeekType);
BOOLEAN IsValid (void);
GetLumpSize() retrieves the lump's size, and GetLumpName() retrieves the lump's
name. Read() reads dwReadSize bytes into abBuffer, and Write() does the exact
opposite, writing dwWriteSize bytes from abBuffer into the lump. Seek() moves the lump's pointer by
a given number from a seek position, and IsValid() verifies that the lump is valid.
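As a rough illustration, a minimal in-memory version of this interface might look like the following in modern C++. The member names follow the text, but the internals are assumptions; reading advances the lump's position the way a file stream would.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Minimal in-memory sketch of the lump interface described above.
class slCLump {
public:
    slCLump(std::string name, std::vector<uint8_t> data)
        : m_szLumpName(std::move(name)), m_abData(std::move(data)), m_dwLumpPosition(0) {}

    uint32_t GetLumpSize() const { return static_cast<uint32_t>(m_abData.size()); }
    std::string GetLumpName() const { return m_szLumpName; }

    // Copies up to dwReadSize bytes into abBuffer, starting at the current
    // position; returns the number of bytes actually read.
    uint32_t Read(uint8_t* abBuffer, uint32_t dwReadSize) {
        uint32_t left = GetLumpSize() - m_dwLumpPosition;
        uint32_t n = dwReadSize < left ? dwReadSize : left;
        std::memcpy(abBuffer, m_abData.data() + m_dwLumpPosition, n);
        m_dwLumpPosition += n;
        return n;
    }

    bool IsValid() const { return !m_abData.empty(); }

private:
    std::string m_szLumpName;       // the lump's name
    std::vector<uint8_t> m_abData;  // the lump's data bytes
    uint32_t m_dwLumpPosition;      // current read position
};
```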

The slCResourceFile Module


This module has all the functionality needed to load any lump inside the resource.
The module members are nearly the same as those in the resource file header.
DWORD dwVersion;
DWORD dwFlags;
DWORD dwNumberOfLumps;
DWORD dwOffset;
STRING szCurrentDirectory;
FILE fFile;

The use of these members has already been described, so here is a brief definition
of each: dwVersion is a double word that specifies the file version, dwFlags is a double
word containing any special flags for the resource file, dwNumberOfLumps is the number of
lumps in the resource, dwOffset gives us the position in bytes where the first lump is
located, szCurrentDirectory is the directory we are in, and fFile is the actual C++
stream.
Now for the real meat of our system: the slCResourceFile functions that
we use to access each lump individually.
void OpenLump (STRING szLumpName, slCLump inOutLump);
BOOLEAN IsLumpValid (STRING szLumpName);
void SetCurrentDirectory (STRING szDirectory);
STRING GetCurrentDirectory (void);
Each of these functions is very simple. IsLumpValid () checks to see if a file with a
given szLumpName exists in the resource. SetCurrentDirectory () sets the resource file
directory to szDirectory. This directory name is prepended to each lump's name
when accessing individual lumps within the resource file. GetCurrentDirectory()
returns the current directory.
Now for our Open function. This function opens a lump within the resource file,
and the logic behind the algorithm is described in pseudocode.
Check flags of Lump
if Compressed and Encrypted
    OpenLumpCompressedEncrypted (szLumpName, inOutLump)
else if Compressed
    OpenLumpCompressed (szLumpName, inOutLump)
else if Encrypted
    OpenLumpEncrypted (szLumpName, inOutLump)
else
    OpenLumpRaw (szLumpName, inOutLump)
end if
Depending on the lump type, the appropriate function to open the lump is
called, thus maintaining a clean design and simple code. The source of each function is
included on the CD.
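In C++, the same dispatch can be sketched with flag bits. The flag values and the returned function names are illustrative stand-ins, not the CD's actual code:

```cpp
#include <cstdint>
#include <string>

// Hypothetical flag bits for a lump.
constexpr uint32_t COMPRESSED = 0x1;
constexpr uint32_t ENCRYPTED  = 0x2;

// Returns the name of the Open* routine that would handle this lump.
// Note the combined case is tested first, mirroring the pseudocode above.
std::string PickOpener(uint32_t dwFlags) {
    bool c = (dwFlags & COMPRESSED) != 0;
    bool e = (dwFlags & ENCRYPTED) != 0;
    if (c && e) return "OpenLumpCompressedEncrypted";
    if (c)      return "OpenLumpCompressed";
    if (e)      return "OpenLumpEncrypted";
    return "OpenLumpRaw";
}
```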

Last Words about the Implementation
Some support functions that are used to open the file or to keep track of internal
information are not represented in the preceding text. It is advisable to
check the source code on the CD, which is well commented and easy to follow. The
algorithms for compression and encryption are simple RLE compression and bit-wise
encryption; the actual implementations are beyond the scope of this gem
and must be researched separately. Information about useful public domain algorithms
can be found at [Wotsit00], [Wheeler00], and [Gillies98].

Conclusion

This system can be easily upgraded or adapted to any project. Some possibilities
include supporting date and time validation, copy protection algorithms, checksums,
a data pool, and better compression and encryption algorithms. There is no limit.
References
[Hargrove98] Hargrove, Chris, "Code on the Cob 6," available online at
www.loonygames.com/content/1.11/cote/, November 2-6, 1998.
[Towner00] Towner, Jesse, "Resource Files Explained," available online at
www.gamedev.net/reference/programming/features/resfiles/, January 11, 2000.
[Wheeler00] Wheeler, David J., et al., "TEA, The Tiny Encryption Algorithm," available
online at www.cl.cam.ac.uk/ftp/users/djw3/tea.ps.
[Wotsit00] Wotsit.org, "The Programmer's File Format Collection: Archive Files,"
available online at www.wotsit.org, 1996-2000.
[Gillies98] Gillies, David A. G., "The Tiny Encryption Algorithm," available online
at http://vader.brad.ac.uk/tea/tea.shtml, 1995-1998.

Sunday, September 25, 2011

Resource Files Explained

  http://www.gamedev.net/page/resources/_/reference/programming/game-programming/300/resource-files-explained-r902
By Jesse Towner

How This Document Is Organized
  • Introduction
  • Resource Files In Depth
  • Design and Theory
  • Implementation
  • For the Future

Introduction

The time has come for you to venture into a new world of the game development universe: the elegant world of resource files. Perhaps you're wondering what a resource file is, or you may have a vague idea of what they are and are interested in using them. Perhaps you're even fluent in the use of resource files. Regardless of which position you're in, we'll now go over what resource files are, and what they aren't.

Let's begin by taking a non-technical approach to visualizing what they are; this way, we can have a much better overall understanding of the concept behind them. Suppose you have a large leather magic bag, which has the property of being essentially bottomless. You can take various items and put them in the bag, and you can take these same items out. Although you don't necessarily have the option of taking a good gander at the contents of the bag, there is no problem in finding a particular item in order to take it out. This bag is also virtually weightless: you can easily hold it, carry it, and manage it.

Now let's take this strange model of thought and apply it to the software-programming field, in our case, video game development. The magical leather bag becomes the resource file, and the items that we could put into the bag become the various pieces of data used by the game: bitmaps, sounds, game-logic scripts, and so on. Additionally, the idea of being able to easily manage the bag still holds true, only in this case, we are managing the data in the resource file.

So basically, a resource file acts as an abstraction layer for managing multiple items of data; however, this is only one of many advantages that surface once you partake in the use of such files. Before we discuss the other advantages to using them, as well as going over a couple of the disadvantages, we will have a look at where resource files are already in use today.


Resource Files In Depth

Almost everywhere you look you will see some form of a resource file in use, perhaps even unknowingly. You may be wondering if resource files are in fact database files, and indeed, they are. Nevertheless, we still use the term "resource file" to discern between the two: a database usually keeps track of similar items of data, whereas a resource file has the ability to store dissimilar items of data. Keeping this concept in mind, we can see that resource files are used by a great number of computer games as well as applications.

Let's look at a couple of specific examples of resource files in use. The "Zip File" archiving format, in wide use today, is a great illustration of what a resource file is supposed to be. It can take multiple individual files and store them internally in a single file. Additionally, you have the option of compressing the files you add to a Zip file for optimum storage and transference. This is a good example because it shows that you can take the data you want to add to a resource file and mutate it for the better without destroying it. Another example is the WAD file format used by id Software in the Doom engine. This format allowed the various items of game data to be stored in a single file: images, sprites, sounds, music, map data, and the like. This is the type of resource file format that we'll be most interested in, although we will implement our own original file format.

So, why do we even want to go through all the trouble of implementing resource files in our games or game engines? Why not just have all of the individual files located in a number of sub-directories within the directory of our game? Here, I will explain the reasons behind wanting to use resource files to store data for games. Obviously, being able to easily manage multiple items of data was an advantage we already discussed. Without resource files, in a game with a lot of external data (images, sounds, etc.), you would have to store the data as individual files in a complex directory structure. This can amount to a huge mess that is almost impossible to manage and keep up to date. With resource files, all of your data is located in one or a few large resource files: all of the data is there, elegantly encapsulated within a single construct. Additionally, one cannot go on without mentioning the professional "look" that resource files provide, which is a big plus.

But is there another advantage to having all of the files located within a resource file? Indeed, there is, and it happens to be something very important: security. Without resource files, most of the game data would be located out in the open, and most likely in its native format. This would allow any average Joe who happened to have your product installed to modify, replace, or pirate the game data. In many cases, this is rather undesirable. When the data is in a resource file, the end user of your software has greater difficulty getting access to the individual items of data; of course, anyone can get access to data in a resource file with a fair amount of hackin' effort. However, having your data located within a resource file acts as a slight deterrent against this sort of activity. Also, when using resource files you can add additional security support. For instance, you may want to add a form of checksum error checking, where you make sure the size of each lump or the data itself hasn't changed; this protects against illegal modification of the data by adding, removing, or modifying bytes. Even though you may decide to restrict access to the data in your resource files, many people ship game editors with their products anyway. Regardless, you can still restrict unofficial modification of the file.

The question we must ask ourselves now is, "should we go through all of the trouble of implementing a resource file format?" Sure, you can also tackle security issues by creating your own file formats for the individual types of data like images and sounds, and you can get around the complex directory structure problem. So, is there another reason for using resource files? The answer is undeniably yes. Due to the inefficiency of the file systems used by Windows and DOS (FAT16 and FAT32), a lot of disk space is lost. In the FAT16 file system, clusters on large partitions can be as large as 32KB, and a file always occupies whole clusters; therefore, a file that is 2KB in size can in reality take up 32KB of disk space. The FAT32 file system is much better, but it still isn't perfect. When using resource files, all of the data is packed together, allowing for more efficient disk space usage.
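A quick calculation makes the waste concrete: a file's real on-disk footprint is its size rounded up to whole clusters.

```cpp
#include <cstdint>

// Bytes actually consumed on disk when files occupy whole clusters.
uint64_t DiskSpaceUsed(uint64_t fileSize, uint64_t clusterSize) {
    uint64_t clusters = (fileSize + clusterSize - 1) / clusterSize;  // round up
    return clusters * clusterSize;
}
```

With 32KB clusters, a 2KB file consumes a full 32KB; packing many small lumps into one resource file avoids paying that overhead per file.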

Now, if those aren't good enough reasons for using resource files to store game data, I don't know what is! Of course, there are a couple of disadvantages to using resource files. The first is that it requires more time and effort to implement a good resource file manager; but once you have your implementation up and running, it's smooth riding from there on. The second is that it requires more memory to operate: you must use a fair amount of RAM when working with resource files. Nonetheless, this is a trivial problem, especially considering that most PCs now ship with at least 64 megabytes of RAM. It is quite apparent that the advantages most certainly outweigh the disadvantages, so what are we waiting for? Let's begin our adventure into the technical side of resource files.


Design and Theory

I'm sure you're quite worked up about jumping in and implementing a resource file management library, and heck, so am I! But first, we must go over some design issues. I will use the term "lump" to refer to a data item within a resource file; thus, a lump can be an image, a game object, or any piece of data we may stick in a resource file. Also, when I talk of an interfacing API, I mean a front-end API that can be used to manage resource files.

In general, the structure of a resource file can be broken up into three sections: the header, the lump information table, and the actual data lump section. The header section contains general information about the file; for instance, the number of data lumps within the resource file, the location of the lump information table within the file, and a means for identifying resource files from other file types. The header section is critical because it provides the interfacing API with the information required to load the file in from disk. The lump information table contains information about each data lump within the file. The information stored about each lump might entail its position and size within the file, as well as its name or identification medium. For each data lump in the resource file, there is one corresponding entry or node in the lump information table; thus, the table is a list of nodes where each node corresponds to and contains information about a particular data lump. The data lump section is where the data for each lump is actually stored. The lumps are generally stored in a linear fashion, such that the data for "Lump One" is located first, the data for "Lump Two" is found next, and so on. These data lumps may be compressed or encrypted depending upon the file format implementation desired by the programmer.

I should also mention the position of each of these sections within the file. For a few reasons that will later become apparent, we will position the lump information table at the end of the file, with the actual data lump section in the middle. We do this because it is much more efficient; if we didn't, we'd end up doing multiple passes during the save process. Now that we have the general structure of our files, let's discuss the interfacing API that we must design.

So, what do we need in a resource file management library? The interface must be elegant and intuitive, while at the same time providing enough flexibility and scalability to deem it reusable. Let's first discuss the user-friendly interface; then we'll talk about scalability. What sort of functionality are we going to need? We should provide routines for quick creation, modification, and clean-up; hence, we'll need three routines for opening, saving, and closing resource files. We must also provide routines for managing lumps; these will consist of routines for creating, deleting, modifying, loading, and unloading data lumps. Those are the basic routines we'll need, although some extra routines for error checking and for displaying information about a resource file wouldn't hurt. But what about flexibility and scalability?

Indeed, this is something that we must take into consideration if we're going to use our resource file management library more than once. Basically, we should provide some means of handling different data lump types within our manager. But how are we going to accomplish this? It's all quite simple really: we'll create a lump handler system. A lump handler will consist of three low-level routines, all for managing a particular data lump type. These routines allow the interfacing API to load, to save, and to unload a particular data type. So, we would create one lump handler for managing a bitmap lump type, a second handler for managing sound lumps, and so on. The interfacing API would then look at a lump, figure out which lump handler to use, and call the correct routines. In order to provide scalability, we should allow a programmer to create his or her very own custom lump handlers and allow them to be added to the lump-handling list on the fly. So, we'll need two more routines within our interfacing API: one to add a lump handler, and another to remove a handler. Because we'll be using OOP and C++ to implement our resource file management library, a lump handler will be an object that is used by the resource file management class. Anyway, now that we have an outline of what we need to do to get a good implementation happening, let's actually implement this baby!


Implementation

Let us now begin our design of the structures that will make up our resource file format. We'll start with the structure of the resource file's header.

// Structure defining the header of our resource files.
typedef struct tagRESFILEHEADER
{
   CHAR cSignature[4];      // 4-character identification value.
   BYTE byVersionLo;        // Minor version.
   BYTE byVersionHi;        // Major version.
   WORD wFlags;          // Special flags for the resource file.
   DWORD dwNumLumps;        // Number of lumps in the resource file.
   DWORD dwChecksum;        // Checksum (0FFFFFFFFh ^ file size).
   DWORD dwLumpListOffset;  // Offset of the lump info list.
} RESFILEHEADER, *LPRESFILEHEADER;

As you can see, the header is indeed a critical part of a resource file; let's go over the various fields in this structure. The first field, "cSignature", is merely a file identification medium that we use to distinguish resource files from other file types: it contains the non-null-terminated string "RESF". The next two fields are the resource file's version numbers; we can use these values for keeping track of future revisions of the file format. The "wFlags" field holds any special settings the resource file may have. The "dwNumLumps" field holds the number of data lumps currently within the file. Next, "dwChecksum" works as a simple and quick error checking method, used to make sure the size of the resource file was not illegally modified. Finally, "dwLumpListOffset" is the offset into the resource file where the lump information table is stored.
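The checksum scheme noted in the structure's comment (0FFFFFFFFh XOR the file size) can be verified with a couple of small helpers; the function names here are hypothetical:

```cpp
#include <cstdint>

// The stored checksum is the bitwise complement of the file size.
uint32_t ComputeChecksum(uint32_t fileSize) {
    return 0xFFFFFFFFu ^ fileSize;
}

// A size change (bytes added or removed) makes the stored and recomputed
// values disagree, flagging illegal modification of the file.
bool ChecksumOk(uint32_t storedChecksum, uint32_t actualFileSize) {
    return storedChecksum == ComputeChecksum(actualFileSize);
}
```

This only detects changes to the file's length, not in-place byte edits; a stronger scheme (such as a CRC over the data) would be needed for that.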

Although the Lump Info Table is stored at the end of the file, we'll go over the structure that is used to create the table.

// Structure of each element in the lump info list. Holds information which
// can be used to retrieve the contents of a lump.
typedef struct tagRESFILELUMPINFO
{
   DWORD dwSize;      // Size of the lump, in bytes.
   DWORD dwOffset;    // Offset in the file of the lump.
   DWORD dwType;      // The type of lump.
   BYTE byNumChar;    // Length in characters of the lump's name.

   // ... The lump's name string goes here, but it can be of variable length;
   // therefore, we manually load it in to memory pointed to by 'lpName' in
   // the RESFILELUMP structure.

} RESFILELUMPINFO, *LPRESFILELUMPINFO;

Basically, the lump information table is an array of RESFILELUMPINFO structures. The "dwLumpListOffset" field of the RESFILEHEADER structure simply points to the first entry in this list. When a resource file is opened and loaded in, this entire table is loaded into memory as a linked list, which allows us to easily add new lumps to the resource file. However, when we load in a resource file, we load the list into memory using another structure…
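Turning the on-disk table into that linked list can be sketched as follows. The node here is a stripped-down stand-in for the real lump structure (no file I/O, and memory clean-up is omitted for brevity):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Simplified in-memory lump node: just size, offset, and the next pointer.
struct LumpNode {
    uint32_t dwSize;
    uint32_t dwOffset;
    LumpNode* lpNextLump;
};

// Builds a linked list from (size, offset) table entries, preserving order.
// Appending through a tail pointer keeps the build O(n).
LumpNode* BuildLumpList(const std::vector<std::pair<uint32_t, uint32_t>>& table) {
    LumpNode* root = nullptr;
    LumpNode** tail = &root;
    for (const auto& e : table) {
        *tail = new LumpNode{e.first, e.second, nullptr};
        tail = &(*tail)->lpNextLump;
    }
    return root;
}
```

Adding a new lump later is then just splicing a node into the list, which is the reason the table is held as a list rather than a fixed array.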

// Structure used to keep track of a lump in memory.
typedef struct tagRESFILELUMP
{
   DWORD dwSize;      // Size of the lump, in bytes.
   DWORD dwOffset;    // Offset in the file of the lump.
   DWORD dwType;      // The type of lump.
   BOOL bNoFree;      // Should we automatically free the data?
   LPSTR lpName;      // Pointer to the name of the lump.
   LPVOID lpData;  // Pointer to the data of the lump.
   LPVOID *lplpData;  // Pointer to the data address variable.
   tagRESFILELUMP *lpNextLump;   // Pointer to the next lump in the list.
} RESFILELUMP, *LPRESFILELUMP;
Basically, this is the same as RESFILELUMPINFO, although this structure has a few more fields for managing lumps in memory. The "lpName" field is the equivalent of a file name, only in this case, it happens to be a lump name. I should make a point of how the "lpData" field can be used: it doesn't necessarily have to point to a raw data buffer; it can also point to a structure. Then, it's merely a process of casting to the structure pointer type. This is useful when writing lump handlers that directly load lumps into structures; for example, loading a bitmap lump into a DirectDraw surface. Notice that the "dwType" field is evident in both of the lump structures. It is used for identification when matching a particular lump with the corresponding lump handler: the management class just searches through the lump handler list until it finds a match. Let's have a look at the structure used for keeping track of a lump handler.
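That lookup can be sketched like this, using a simplified stand-in for the handler structure (only the fields relevant to the search are kept):

```cpp
#include <cstdint>

// Stripped-down handler node: the lump type it serves and the next handler.
struct Handler {
    uint32_t dwType;
    Handler* lpNextHandler;
};

// Walks the handler list and returns the first handler matching dwType,
// or nullptr when no match is found (the caller can then fall back to a
// default "Raw Data" handler).
Handler* GetLumpHandler(Handler* list, uint32_t dwType) {
    for (Handler* h = list; h != nullptr; h = h->lpNextHandler)
        if (h->dwType == dwType)
            return h;
    return nullptr;
}
```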

typedef struct tagRESFILELUMPHANDLER
{
   DWORD dwType;
   BOOL (__stdcall * lpLoad)(fstream *, LPRESFILELUMP);
   BOOL (__stdcall * lpSave)(fstream *, LPRESFILELUMP);
   VOID (__stdcall * lpUnload)(LPRESFILELUMP);
   tagRESFILELUMPHANDLER *lpNextHandler;
} RESFILELUMPHANDLER, *LPRESFILELUMPHANDLER;
This structure also keeps a "dwType" field handy so that we can match it to a lump. Note the function pointers – they point to the functions that load, save, and unload a lump of that type. The loading and saving functions take a file stream pointer, along with a pointer to the lump's info. Because the management class takes care of positioning the file pointer to the beginning of the lump, writing these routines is rather trivial. In my implementation, I defined a "Raw Data" lump type that becomes the default lump type when one can't be determined. Here are the lump handling functions for the "Raw Data" lump type.

//- resLoadRawLump ------------------------------------------------------------
// Description: Loads in the data for a raw data lump.
// Parameters:  lpStream - pointer to the file stream object.
//              lpLump   - pointer to the lump node.
// Returns:  TRUE if successful, FALSE otherwise.
//-----------------------------------------------------------------------------
BOOL __stdcall resLoadRawLump(fstream *lpStream, LPRESFILELUMP lpLump)
{
   // First, allocate enough memory to the load the data into.
   if ((lpLump->lpData = LPVOID(new CHAR[lpLump->dwSize])) == NULL)
      return FALSE;

   // Next, load in the data.
   lpStream->read(PCHAR(lpLump->lpData), lpLump->dwSize);
   if (lpStream->fail())
   {
      delete [] PCHAR(lpLump->lpData);
      lpLump->lpData = NULL;
      return FALSE;
   }

   // It worked, so return true.
   return TRUE;
}

//- resSaveRawLump ------------------------------------------------------------
// Description: Saves the data of a raw data lump to a file.
// Parameters:  lpStream - pointer to the file stream object.
//              lpLump   - pointer to the lump node.
// Returns:  TRUE if successful, FALSE otherwise.
//-----------------------------------------------------------------------------
BOOL __stdcall resSaveRawLump(fstream *lpStream, LPRESFILELUMP lpLump)
{
   // Write the data to the file.
   lpStream->write(PCHAR(lpLump->lpData), lpLump->dwSize);
   if (lpStream->fail())
      return FALSE;

   // It worked, so return true!
   return TRUE;
}

//- resUnloadRawLump ----------------------------------------------------------
// Description: Unloads the data of a raw data lump.
// Parameters:  lpLump   - pointer to the lump node.
//-----------------------------------------------------------------------------
VOID __stdcall resUnloadRawLump(LPRESFILELUMP lpLump)
{
   // Delete the memory.
   delete [] PCHAR(lpLump->lpData);
   lpLump->lpData = NULL;
}

It's now time to take all of the structures and put them to good use; it's time to have a look at the resource file management class which happens to be our interfacing API.

// The actual Resource File Management class definition. The objects of the
// CResFile class can be used to create, load, or save resource files.
class CResFile
{
 protected:
   // Internal structure used to keep track of a lump handler.
   typedef struct tagRESFILELUMPHANDLER
   {
      DWORD dwType;
      BOOL (__stdcall * lpLoad)(fstream *, LPRESFILELUMP);
      BOOL (__stdcall * lpSave)(fstream *, LPRESFILELUMP);
      VOID (__stdcall * lpUnload)(LPRESFILELUMP);
      tagRESFILELUMPHANDLER *lpNextHandler;
   } RESFILELUMPHANDLER, *LPRESFILELUMPHANDLER;

   static BOOL m_bHandlerActive;
   static LPRESFILELUMPHANDLER m_lpLumpHandlerList;

   RESFILEHEADER m_resFileHeader;   // Header structure for the resource file.
   LPRESFILELUMP m_lpLumpList;      // Pointer to the root of the lump linked list.

   CHAR m_cFileName[MAX_PATH];      // Local storage of the resource file's name.
   CHAR m_cFileMode[4];    // Current file access mode.
   fstream m_fStream;      // File stream object.
   fstream m_fSaveStream;     // Temporary file stream object for save operations.

 // Private internal methods.
   LONG GetFileSize(VOID);
   LPRESFILELUMPHANDLER GetLumpHandler(DWORD);
   BOOL LoadLumpList(VOID);
   BOOL SaveLumpList(VOID);
   VOID UnloadLumpList(VOID);
   BOOL SaveLumps(VOID);
   static VOID Shutdown(VOID);

 public:

 //- Constructor --------------------------------------------------------------
 // Description: Default constructor of this class - clears out data structures.
 //----------------------------------------------------------------------------
   CResFile();

 //- Constructor --------------------------------------------------------------
 // Description: See the CResFile::Open() method description.
 //----------------------------------------------------------------------------
   CResFile(LPCSTR lpFileName, LPCSTR lpFileMode)
      { this->Open(lpFileName, lpFileMode); }

 //- Open ---------------------------------------------------------------------
 // Description: Opens a resource file either for reading from, writing to, or
 //              modification. Read mode opens up a resource file for read only
 //              operations, and in this mode, the resource file can not be
 //              modified. Write mode creates a new resource file (or overwrites
 //              any existing file with the same name) and allows the programmer
 //              to add lumps to the file. Modification mode opens up an existing
 //              file (or creates one if one doesn't exist) and allows the
 //              programmer to add more lumps to the file or to load in the lumps.
 // Parameters:  lpFileName - name of the resource file (may include path) to open.
 //    lpFileMode - string containing the access mode, can be of the
 //            following contents:
 //                              "r"  - Read Mode
 //   "w"  - Write Mode
 //   "r+" - Modification Mode
 // Returns:  TRUE if successful, FALSE otherwise.
 //----------------------------------------------------------------------------
   BOOL Open(LPCSTR lpFileName, LPCSTR lpFileMode);

 //- Save ---------------------------------------------------------------------
 // Description: Saves the contents of a resource file in memory to disk. If
 //              for some reason you want to save the file with an alternate
 //              file name (i.e. "save as" operations), then pass the new file name
 //              string pointer in the "lpAltFileName" parameter. Otherwise,
 //              you can just pass NULL for this parameter to keep the original
 //              file name.
 // Parameters:  lpAltFileName - alternate file name.
 // Returns:  TRUE if successful, FALSE otherwise.
 //----------------------------------------------------------------------------
   BOOL Save(LPCSTR lpAltFileName);

 //- Close --------------------------------------------------------------------
 // Description: Closes the resource file if one is currently open.
 //----------------------------------------------------------------------------
   VOID Close(VOID);

 //- RegisterLumpHandler ------------------------------------------------------
 // Description: Registers a lump handler with the CResFile class. A lump handler
 //              consists of three functions, one each for loading, saving, and
 //              unloading a lump of a particular type. You must also pass the
 //              value which will be used to identify whether lumps should be
 //              handled by the lump handler or not - each value must be unique.
 // Parameters:  dwType   - the type of lumps this handler will handle.
 //              lpLoad   - pointer to the loading function.
 //              lpSave   - pointer to the saving function.
 //              lpUnload - pointer to the unloading function.
 // Returns:  TRUE if successful, FALSE otherwise.
 //----------------------------------------------------------------------------
   static BOOL RegisterLumpHandler(
      DWORD dwType,
      BOOL (__stdcall *lpLoad)(fstream *, LPRESFILELUMP),
      BOOL (__stdcall *lpSave)(fstream *, LPRESFILELUMP),
      VOID (__stdcall *lpUnload)(LPRESFILELUMP)
      );

 //- RemoveLumpHandler --------------------------------------------------------
 // Description: Removes a lump handler previously added by a message to the
 //              CResFile::RegisterLumpHandler() method.
 // Parameters:  dwType - the lump type of the handler to remove.
 // Returns:  TRUE if successful, FALSE otherwise.
 //----------------------------------------------------------------------------
   static BOOL RemoveLumpHandler(DWORD dwType);

 //- LumpExists ---------------------------------------------------------------
 // Description: Checks to see if a lump, with a particular name, exists.
 // Parameters:  lpName - string of the lump to check for existence.
 // Returns:  TRUE if it exists, FALSE if it doesn't.
 //----------------------------------------------------------------------------
   BOOL LumpExists(LPCSTR lpName);

 //- CreateLump ---------------------------------------------------------------
 // Description: Creates a new lump and adds it to the active resource file.
 //              Note that it will not be saved with the file unless the
 //              CResFile::Save() method is messaged.
 // Parameters:  lpName - Name of the lump.
 //              dwType - Particular type of the lump (used for loading/saving)
 //              lpData - Pointer to the data or data structure that will be
 //        stored in the lump.
 //              dwSize - Size of the data in bytes (used only for RAW data lumps).
 //              bFree  - Set this to TRUE if you want the data (pointed to by
 //        lpData) to be de-allocated when the file is closed.
 // Returns:  TRUE if successful, FALSE otherwise.
 //----------------------------------------------------------------------------
   BOOL CreateLump(
   LPCSTR lpName,
   DWORD dwType,
   LPVOID lpData,
   DWORD dwSize,
   BOOL bFree=FALSE
   );

 //- DeleteLump ---------------------------------------------------------------
 // Description: Removes or deletes a particular lump, designated by 'lpName',
 //              from the active resource file.
 // Parameters:  lpName - name of the lump to remove from the file.
 // Returns:  TRUE if successful, FALSE otherwise.
 //----------------------------------------------------------------------------
   BOOL DeleteLump(LPCSTR lpName);

 //- LoadLump -----------------------------------------------------------------
 // Description: Loads in a lump from a resource file. Depending upon its
 //              type, it will use the specially designed routine for loading
 //              it in. However, if such a routine doesn't exist, it will
 //              treat the lump as raw data.
 // Parameters:  lpName   - name of the lump to load in.
 //              lplpData - pointer to the location where the address of the data
 //          or data structure will be stored.
 // Returns:  TRUE if successful, FALSE otherwise.
 //----------------------------------------------------------------------------
   BOOL LoadLump(LPCSTR lpName, LPVOID *lplpData);

 //- UnloadLump ---------------------------------------------------------------
 // Description: Unloads a lump from memory that was previously loaded in from
 //              a resource file using the CResFile::LoadLump() method.
 // Parameters:  lpName - name of the lump to unload from memory.
 // Returns:  TRUE if successful, FALSE otherwise.
 //----------------------------------------------------------------------------
   BOOL UnloadLump(LPCSTR lpName);

 //- Destructor ---------------------------------------------------------------
 // Description: Deallocates any memory and closes the resource file if it
 //              is still open.
 //----------------------------------------------------------------------------
   ~CResFile();
};

To open and load in a resource file's Lump Info Table, we first read in the header, then seek to the position in the file where the Info Table begins. At this point, we load each entry in the table into a linked list. This is all done by the Open() method of the CResFile class; additionally, this method sets up the header and the Lump Info Table if a new resource file is opened. Now, when someone wants to load in a data lump, the name of the lump is passed to the LoadLump() method; this method searches through the Lump Info list until it finds a matching node. Once a match is found, it seeks to the position in the resource file where that particular data lump begins and gets ready to load it in. As was mentioned earlier, the lump handler list is searched at this point for a lump handler that matches the lump type. Once the lump has been loaded in, the LoadLump() method returns a pointer to the data or the lump's structure, depending upon the type of lump. Note that a lump is only loaded into memory once; if LoadLump() is messaged more than once with the same lump name, it will return the same pointer. If you want individual copies of a lump in memory, you must perform the "cloning" process on your own. Of course, you could always add this functionality to the resource file manager.
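The open-then-load sequence described above can be sketched in miniature. The example below is a deliberate simplification - it stores only a name and a payload per lump (not the full signature, type, and flag fields from the listings) and writes to a memory stream rather than a real file - but it follows the same pattern: header first, data lumps next, lump info table last, then seek back to patch the header. None of these function names come from the gem's actual source.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Simplified lump: just a name and a payload.
struct Lump { std::string name; std::string data; };

// Write: data lumps first, then the lump info table, then go back and
// patch the table offset into the header. Raw little-endian writes are
// fine for an in-memory demo; a portable version would swap bytes.
void WriteRes(std::ostream &out, const std::vector<Lump> &lumps) {
    uint32_t count = (uint32_t)lumps.size(), tableOfs = 0;
    out.write((char*)&count, 4);
    std::streampos headerPos = out.tellp();
    out.write((char*)&tableOfs, 4);            // placeholder for table offset
    std::vector<uint32_t> offsets;
    for (const Lump &l : lumps) {              // the data lumps themselves
        offsets.push_back((uint32_t)out.tellp());
        uint32_t sz = (uint32_t)l.data.size();
        out.write((char*)&sz, 4);
        out.write(l.data.data(), sz);
    }
    tableOfs = (uint32_t)out.tellp();          // lump info table
    for (size_t i = 0; i < lumps.size(); ++i) {
        uint32_t n = (uint32_t)lumps[i].name.size();
        out.write((char*)&n, 4);
        out.write(lumps[i].name.data(), n);
        out.write((char*)&offsets[i], 4);
    }
    out.seekp(headerPos);                      // patch the header
    out.write((char*)&tableOfs, 4);
}

// Read: header, seek to the table, find the named entry, seek to its data.
std::string LoadLump(std::istream &in, const std::string &want) {
    uint32_t count = 0, tableOfs = 0;
    in.read((char*)&count, 4);
    in.read((char*)&tableOfs, 4);
    in.seekg(tableOfs);
    for (uint32_t i = 0; i < count; ++i) {
        uint32_t n = 0, ofs = 0;
        in.read((char*)&n, 4);
        std::string name(n, '\0');
        in.read(&name[0], n);
        in.read((char*)&ofs, 4);
        if (name == want) {
            in.seekg(ofs);
            uint32_t sz = 0;
            in.read((char*)&sz, 4);
            std::string data(sz, '\0');
            in.read(&data[0], sz);
            return data;
        }
    }
    return "";
}
```

Note that the lump data is found in two seeks: one to the table, one to the lump itself - which is exactly why the offsets are kept in the table.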

The process of saving a resource file is basically the same – we write out the header, then the data lumps, and finally the lump info table. When we first write out the header, we don't know the position of the lump info table within the file; thus, we have to seek back to the beginning of the file and update the header section once all of the data has been saved. As the actual data lumps are being saved, we update the internal lump info linked list with the positions of the data lumps. Finally, we write out the Lump Info Table and return to the beginning of the file to update the header. This job is done by the Save() method of the CResFile class; of course, the file must have been opened using a write or modification access mode. Again, the lump handler list is queried for the correct lump handler when saving the lumps to the file.

Creating a new lump is simple; all we have to do is add a new node to the Lump Info Linked List and set a couple of flags in the node's structure indicating that we just created it. Deleting a lump is merely a matter of removing it from the list. These functions are performed by the CreateLump() and DeleteLump() methods of the CResFile class. If the lump type is not recognized, the raw data lump handler is used.

Adding a custom lump handler is simple. Just pass pointers to your lump handling functions to the RegisterLumpHandler() method, along with the lump type that the handler will process. You can also remove a lump handler using the RemoveLumpHandler() method.
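A minimal sketch of what such a handler registry might look like is shown below. The types and function names here are illustrative stand-ins, not the gem's actual code - the real handler functions also receive an fstream* so they can read from and write to the file.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for the gem's lump node and handler signature.
struct LumpNode { uint32_t type; void *data; };
typedef bool (*LoadFn)(LumpNode *);

// Registry keyed by lump type, mirroring RegisterLumpHandler() /
// RemoveLumpHandler(). Each type must be unique, so a duplicate
// registration fails.
static std::map<uint32_t, LoadFn> g_handlers;

bool RegisterLumpHandler(uint32_t type, LoadFn load) {
    return g_handlers.emplace(type, load).second;
}
bool RemoveLumpHandler(uint32_t type) { return g_handlers.erase(type) == 1; }

// When no handler matches, fall back to treating the lump as raw data,
// just as the article describes.
bool RawLoad(LumpNode *) { return true; }

bool DispatchLoad(LumpNode *lump) {
    auto it = g_handlers.find(lump->type);
    return (it != g_handlers.end() ? it->second : RawLoad)(lump);
}

// Example custom handler: counts how many "text" lumps it has loaded.
static int g_textLoads = 0;
bool TextLoad(LumpNode *) { ++g_textLoads; return true; }
```

Registering TextLoad for a type and then loading a lump of that type routes through it; any other type silently falls back to the raw handler.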

I recommend that you have a good look at the source code at this point; it is well commented and should give you a good understanding of the logic behind a resource file management library.


For the Future

Once you have your resource file management library up and running, you'll most likely want to make an editor so that you can easily create and manage various resource files. There are a couple of approaches you can take. Probably the easiest is to create a console application that accepts parameters on the command line. You may also want to build an editor with a GUI similar to Windows Explorer, or incorporate a resource file editor into your game editor so that you have one application that acts as your tool set when creating the various resources. You will also probably want to create your own custom lump handling routines.

Still, is there anything else we can do to improve our resource file implementation? Of course there is; the possibilities are limitless, but I'll only go over a few of them here. One thing you can do to organize the lumps in your resource files is to build an internal directory structure, emulating the one found in DOS, for instance. You could create numerous directories within the resource file and organize the data lumps into them. This makes maintenance of resource files more efficient.

Another interesting addition would be a data and file I/O streaming system for your resource file management library to use. You would create a base streaming class and derive other streaming classes from it. That way, you could have a compression streaming class, an encryption streaming class, an embedded data streaming class, and so on. Then, you could add settings to your resource file format that allow you to compress and encrypt data as it is written to the file.

If you want your resource file management library to be usable across a wide range of platforms, you will have some extra work cut out for you. Of most concern is the endian byte order used by the host platforms. You will have to take this into account by swapping bytes so that the data is compatible with the machine; this process would take place during the loading and saving of a resource file.

Another interesting concept is that of data caching. Since accessing a hard drive is much faster than reading from a CD or DVD drive, a temporary cache file can be created on the hard drive. The most frequently accessed data is then placed in the cache file for quick access. This greatly speeds up the loading sequences in your game once the cache file is created and operational.

Of course, there are many other things you could do with a resource file management library – it all depends on what you want to use the management library for. Anyway, I hope you've enjoyed this article, good luck with your coding endeavors, and code-on!


Saturday, September 24, 2011

Video RAM - how much do you really need?

http://www.yougamers.com/articles/13801_video_ram_-_how_much_do_you_really_need-page2/

 

By: Nick Evanson Nov 02, 2007


So what's it used for then?

Since this isn't supposed to be a detailed thesis on modern rendering techniques, let's keep this simple! Video RAM is used for three things: storing the permanent data needed to make images, storing temporary data created along the way and storing the final image. For a long time, the biggest slice of this was the permanent stuff; the materials and information used to generate the pictures seen.
The world of Oblivion minus textures and shaders: each of the dots connecting the lines is called a vertex and games use thousands of them.
This info is stored in memory as a buffer (a geek term for "chunk of memory"), and modern 3D games often have three of these: one each for vertices, indexes and textures. Vertices are points in 3D space that are used to build the actual scenes, but they also carry further details such as what colour they should be and what texture needs to be applied to them. An index buffer contains lists of vertices, which might seem like just a repetition of the vertex buffer, but they're only lists, not the actual vertices themselves: index buffers are used to help speed up the processing of vertices when there are lots of them. The daddy of them all, though, especially today, is the texture buffer.
Whenever you look at something in a 3D game, you're actually looking at a 2D image that has been mapped onto a shape. In the early days of graphics cards, textures were tiny (often no more than 64 x 64 pixels), but these days they can be huge; Crysis, for example, routinely uses textures up to 2048 x 1024, which can weigh in at over 1MB each. A single megabyte might not sound like much, but NVIDIA's original Riva 128 graphics card (released ten years ago) only had 4MB of video memory!
How about some figures to put things into perspective? Take Futuremark's 3DMark series: their first one, 3DMark99, used up to 5MB of textures for a single rendered frame; fast forward to 2003 and 3DMark03 now waves around texture buffers 50MB in size, and their latest version makes that look like nothing. But wait a moment - haven't graphics cards had more RAM than this anyway? Indeed they have: NVIDIA's GeForce 256, released in 1999, had 32MB and one could easily buy cards with 256MB in 2003. So they've always had enough then, yes? Actually, yes they have... until recently.
Three big rendering technologies have become standard in the past few years and they place huge demands on the RAM: normal mapping, anti-aliasing and post-processing.

Getting greedy now!

One of the hundreds of textures used in the current Crysis demo.
We would have actually run into problems with a lack of video memory many years ago, if it wasn't for something called compression. A single 128 x 128 texture, in 24 bit colour (8 bits for each channel), is only 48kB in size but a 1024 x 1024 "alpha" texture (some of it is transparent) can be as large as 4MB! Fortunately, textures can be compressed without losing too much of their original detail and they can be squashed up to 8 times smaller. There are different types of textures, though, and not all compress without causing problems - one culprit in particular is called a normal map.
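The sizes quoted above follow directly from width x height x bytes-per-pixel. A quick helper (illustrative only, uncompressed sizes) confirms the figures:

```cpp
#include <cassert>

// Uncompressed texture size in bytes: one pixel costs bitsPerPixel / 8
// bytes, so the whole texture is width * height * that.
long TextureBytes(long width, long height, long bitsPerPixel) {
    return width * height * (bitsPerPixel / 8);
}
```

A 128 x 128 texture at 24-bit colour comes out at exactly 48kB, and a 1024 x 1024 texture with an 8-bit alpha channel (32 bits per pixel) at exactly 4MB, matching the article's numbers; compression then shrinks these by up to a factor of eight.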
These are used with the "base" texture (the surface of a wall, for example) to make them look rougher or have more detail; they also tend to be pretty big too. Normal (or bump, or parallax) mapping is ubiquitous in 3D games today and quite often a game has several texture maps for a single surface, especially if it's water - for example, the surface of the water in the first HDR test of 3DMark06 uses two normal maps, four wave maps, and reflection and refraction textures. So despite compression, the size of the texture buffer has grown ever more rapidly of late and we'll look at how much memory they can potentially use later in this article.
Anti-aliasing has also exploded in terms of usage. Explaining how it works and what the different types are is a topic for another time, but virtually all desktop graphics cards employ a method called multisampling. This procedure helps to smooth out the edges of objects in the 3D world by "blurring" the boundary between the object and the background: think of it like a dark charcoal line that somebody has run a thumb along. Unfortunately, the improvement in image quality doesn't come free, and one of the costs lies in using additional video memory. As a very rough guideline, one can estimate the maximum amount of RAM multisample anti-aliasing will need by using this formula: 4 bytes x ((X x Y) x (1 + 2A)), where X and Y are the resolution values (e.g. 1024 x 768) and A is the anti-aliasing level. For example, running at 1280 x 1024 with 4xAA could use up to 45MB of memory, just for the AA. Could is the important word here, though, as different graphics cards, drivers and games will consume different amounts - some more than this, some less.
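The estimate is easy to check in code. Assuming 4 bytes per pixel (which is what makes the 1280 x 1024, 4xAA case come out at 45MB), the formula becomes:

```cpp
#include <cassert>

// Rough upper bound on multisample AA memory: 4 bytes per pixel times
// (1 + 2*A) surfaces, where A is the anti-aliasing level. This mirrors
// the article's guideline formula and is only an estimate - real drivers
// allocate differently.
long AAMemoryBytes(long x, long y, long aaLevel) {
    return 4L * x * y * (1 + 2 * aaLevel);
}
```

For 1280 x 1024 at 4xAA this gives 47,185,920 bytes, which is exactly 45MB.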
Post-processing is also de rigueur in 3D games right now. The term means that after (or sometimes during) rendering of the final image, additional effects are applied to it for creative purposes. One of the most commonly known, and commonly over-used, post-processing tricks is bloom: this simulates how lenses cause overly bright objects to look brighter than they should; other effects include motion blur, depth of field and bullet trails. These involve increased use of the video memory because the image has to be completely rendered several times before finally being displayed, and those "intermediate" images have to be stored somewhere!
Futuremark's 3DMark06 highlights just how much difference the lack of post-processing (top) makes...
...in games where they've been designed to use it (bottom). Part of the payback involves more RAM usage.

What happens when you run out of RAM?

Contrary to popular belief, it's not quite the end of the world if your graphics card doesn't have enough onboard RAM to store everything. The drivers and hardware handle things quite nicely, and games can be programmed to ensure that performance doesn't crash into the floor (especially those that constantly stream data out of system memory, like Oblivion). The most common side-effect is something called "texture thrashing" - if there isn't enough onboard memory to store all of the textures needed, then some will remain in system memory. When the graphics processor wants to use them, it copies them across into its RAM, deleting other stuff to make room. Cue a spot of stuttering or slowdown in the frame rate; this is because it takes quite a bit longer to swap textures around than to just access them in the onboard (or to give it the correct name, local) RAM.
Some graphics cards, mostly low-cost, budget models, only have sufficient local memory to store rendered frames and a few other bits and bobs: nearly all of the textures remain in the system memory. One might think that this is a stupid idea but budget cards are low in performance (compared to their high end relatives) anyway, so there's no point in slapping on several hundred MBs of high speed GDDR3, when the rest of the card will just slow things up. However, if you're an enthusiast gamer, and you want the best possible performance with greatest of visuals, then you want to avoid running out of VRAM.
All of this nerd talk is to get to the point that the RAM on a graphics card is for more than just storing textures, and its usage is growing very quickly. With all this in mind, how much are our games actually using? To examine the amount of video memory being consumed, we used the appropriate plugin for the hardware monitor section of RivaTuner. This method is only suitable for Windows XP, because Vista doesn't differentiate video RAM from system RAM - it's all the same, as far as the operating system and games are concerned. We used a current generation NVIDIA graphics card with 512MB of onboard memory for all of the testing; when using anti-aliasing, ATI drivers allocate RAM in a different way to how RivaTuner's plugin expects, so the tool displays the wrong amount. This isn't a problem because however much memory is being used by a game, it's pretty much the same regardless of which brand of graphics card one is using.

multi-threaded game

Even though multicore processors have been available for the PC for well over a year, and the Xbox 360 has already sold millions, there is still a lack of knowledge regarding the development of game engines for multicore platforms. This article will attempt to provide a view to game engine parallelism on an architecture level.
As shown by Gabb and Lake[1], instead of looking at multithreading on the level of individual components, we can find better performance by looking for ways to add parallelism to the whole game loop. We will be looking at three different threading-supported architecture models for the basic game loop and comparing them with regard to qualities such as scalability and expected performance.
There are two main ways to break down a program to concurrent parts: function parallelism, which divides the program to concurrent tasks, and data parallelism, which tries to find some set of data for which to perform the same tasks in parallel. Of the three compared models, two will be function parallel, and one data parallel.

Synchronous function parallel model

One way to include parallelism to a game loop is to find parallel tasks from an existing loop. To reduce the need for communication between parallel tasks, the tasks should preferably be truly independent of each other. An example of this could be doing a physics task while calculating an animation. Figure 1 shows a game loop parallelized using this technique.

Figure 1. A game loop parallelized using the synchronous function parallel model. The animation and the physics tasks can be executed in parallel.

Costa[2] presents a way to automate the scaling of this kind of architecture. The idea is to divide the functionality into small tasks, build a graph of which tasks precede which (such as the graph in Figure 1), and then supply this task-dependency graph to a framework. The framework in turn will schedule the proper tasks to be run, minding the number of available processor cores. The alternative to using such a framework is to rewrite parts of your code for each core count you plan to support.

One major concern with both function parallel models is that they have an upper limit on how many cores they can support. This is the limit of how many parallel tasks it is possible to find in the engine. The number of meaningful tasks is decreased by the fact that threading very small tasks will yield negligible results. The synchronous function parallel model imposes an additional limit – the parallel tasks should have very few dependencies on each other. For example, it is not sensible to run a physics task in parallel with a rendering task if the rendering needs the object coordinates from the physics task.
The expected performance of the model can be directly seen from the length of the longest path of execution in the game loop. The length of this path of execution is directly tied to the amount of parallelism in the loop. As this model generally allows the least amount of parallelism of the three models, the same can be expected from the model's performance.
As the synchronous function parallel model assumes there are very few connections between tasks that run in parallel, existing components do not require many changes. For example if running the physics component update is a task that can be run concurrently with the sound mixer, neither component needs special support to operate.

Asynchronous function parallel model

Gabb and Lake propose an alternative function parallel model. The difference is that this model doesn't contain a game loop. Instead, the tasks that drive the game forward update at their own pace, using the latest information available. For example the rendering task might not wait for the physics task to finish, but would just use the latest completed physics update. By using this method it is possible to efficiently parallelize tasks that are interdependent. Figure 2 shows an example of the asynchronous function parallel model.

Figure 2. The asynchronous function parallel model enables interdependent tasks to run in parallel. The rendering task does not wait for the physics task to finish, but uses the latest complete physics update.

As with the synchronous model, the scalability of the asynchronous function parallel model is limited by how many tasks it is possible to find from the engine. Fortunately the communication between threads by only using the latest information available effectively reduces the need for the threads to be truly independent. Thus we can easily have a physics task run concurrently with a rendering task – the rendering task would use a previous physics update to get the coordinates for each object. Based on this, the asynchronous model can support a larger amount of tasks, and therefore a larger amount of processor cores, than the synchronous model.

As Gabb and Lake point out, communication between tasks in this model presents an intriguing timing problem – one that the other models don't have. Suppose there are three tasks that work concurrently: an input task, a physics task that uses input to move game objects, and a rendering task that uses physics results to draw the objects. Optimally, the input task would complete just before the physics task starts, which would complete just before the rendering task starts. In the worst case scenario, the rendering would start just before the physics task is complete, and the physics task would start just before the input task is complete. This would result in an input-to-display time of roughly twice that of the optimal scenario, and the time would fluctuate between the optimal and the worst case on each frame. Gabb and Lake suggest a remedy of calibrating some tasks to run more often than others, such as having the input task run twice as often as the physics task. This may help alleviate the problem, but it will not eliminate it.
Since the asynchronous model assumes little or no synchronization between the concurrent tasks, the performance of the model is not limited as much by the serial parts of the program. Therefore the main performance limitation comes from the ability to find enough useful parallel tasks. The thing to keep in mind here is that the tasks should be well balanced - having one large task and several very small ones could signify a performance bottleneck.
Since the asynchronous model relies heavily on tasks not directly connecting to each other, but on communication using the last available information, there may be changes needed for current components to function on this model. At the very least each component needs a thread safe way to inquire the latest state update. Such changes should be easy enough to make, and they can even be added as an additional wrapper layer to the component.
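Such a wrapper can be as small as a mutex-guarded snapshot. The class below (an illustrative sketch, not from the article) lets a producer task publish its latest state while any other task copies out the newest version, which is exactly the "latest information available" communication the asynchronous model relies on:

```cpp
#include <cassert>
#include <mutex>

// Thread-safe holder for a component's most recent state. The producer
// calls Publish() each update; readers call Snapshot() and get a copy of
// whatever is newest, without waiting for the producer to finish a frame.
template <typename T>
class LatestState {
    mutable std::mutex m_lock;
    T m_state{};
    unsigned m_version = 0;   // bumped on every publish
public:
    void Publish(const T &s) {
        std::lock_guard<std::mutex> g(m_lock);
        m_state = s;
        ++m_version;
    }
    // Copies the state out under the lock; the optional version lets a
    // reader detect whether anything changed since its last snapshot.
    T Snapshot(unsigned *version = nullptr) const {
        std::lock_guard<std::mutex> g(m_lock);
        if (version) *version = m_version;
        return m_state;
    }
};
```

The copy-out keeps the critical section tiny, so the rendering task never stalls the physics task for longer than one assignment.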

Data parallel model

In addition to finding parallel tasks, it is possible to find some set of similar data for which to perform the same tasks in parallel. With game engines, such parallel data would be the objects in the game. For example, in a flying simulation, one might decide to divide all of the planes into two threads. Each thread would handle the simulation of half of the planes (see Figure 3). Optimally the engine would use as many threads as there are logical processor cores.

Figure 3. A game loop using the data parallel model. Each object thread simulates a part of the game objects.


An important issue is how to divide the objects into threads. One thing to consider is that the threads should be properly balanced, so that each processor core gets used to full capacity. A second thing to consider is what will happen when two objects in different threads need to interact. Communication using synchronization primitives could potentially reduce the amount of parallelism. Therefore a recommended plan of action is to use message passing accompanied by using latest known updates as in the asynchronous model. Communication between threads can be reduced by grouping objects that are most likely to interact with each other. Objects are more likely to come into contact with their neighbors, so one strategy could be to group objects by area.
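A bare-bones version of this object-splitting scheme might look like the following (illustrative only - a real engine would balance and group the slices more carefully, as discussed above, rather than cutting the array into equal chunks):

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// The flying-simulation example: each object thread simulates its own
// slice of the plane array.
struct Plane { float x = 0.0f, vx = 1.0f; };

void UpdateSlice(std::vector<Plane> &planes, size_t begin, size_t end,
                 float dt) {
    for (size_t i = begin; i < end; ++i)
        planes[i].x += planes[i].vx * dt;
}

// Divide the objects into threadCount contiguous slices and update them
// in parallel; threadCount would normally be the number of logical cores.
void UpdateAll(std::vector<Plane> &planes, unsigned threadCount, float dt) {
    std::vector<std::thread> pool;
    size_t chunk = (planes.size() + threadCount - 1) / threadCount;
    for (unsigned t = 0; t < threadCount; ++t) {
        size_t b = t * chunk;
        size_t e = std::min(planes.size(), b + chunk);
        if (b < e)
            pool.emplace_back(UpdateSlice, std::ref(planes), b, e, dt);
    }
    for (std::thread &th : pool) th.join();
}
```

Because each slice touches only its own objects, no locking is needed during the update; interaction between objects in different slices is where the message passing described above comes in.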
The data parallel model has excellent scalability. The amount of object threads can be automatically set to the amount of cores the system is running, and the only non-parallelizable parts of the game loop would be ones that don't directly deal with game objects (Read input and Render tasks in Figure 3). While the function parallel models can still get the most out of a few cores, data parallelism is needed to fully utilize future processors with dozens of cores.

The performance of the data parallel model is directly related to how large a part of the game engine can be parallelized by data. As the amount of processor cores goes up, the data parallel parts of the engine take less time to run. Fortunately these are usually also the performance heavy parts of a game engine. If an engine can use data parallelization for most parts of the game loop, then the data parallel model will give the best performance of the three models.
The biggest drawback of the model is the need to have components that support data parallelism. For example, a physics component would need to be able to run several physics updates in parallel, and be able to correctly calculate collisions with objects that are in these separate threads. A potential solution is to leave the physics calculation completely out of the data parallel part of the game loop, and use internal parallelization in the physics component. Fortunately many other components won't need extensive changes. For example a component calculating the animations of graphical objects has no interaction between concurrent threads, and won't need any special support for parallelism.

Conclusion

Because the synchronous function parallel model does not require special changes to engine components, and is really just an enhancement of a regular game loop, it is well suited for adding some amount of parallelism to an existing game engine. The model is not suited for future use because of its weak scaling support and low amount of parallelism.
The asynchronous function parallel model can be recommended for new game engines because of the high degree of possible parallelism and the fact that existing components need only a few changes. This model is a good choice for game engines aimed at multicore processors with a relatively small number of cores. The only drawback is the need to tune thread running times to minimize the impact of worst-case thread timings. More research is needed to know how this timing fluctuation actually affects gameplay.
The data parallel model is definitely something to think about for the future. It will be needed when the number of processor cores increases beyond the number of tasks available to a function parallel model. For current use, the increased scalability doesn't offer enough benefit compared to the trouble of coding custom components to support this type of data parallelism.
The current trend seems to be toward creating engine components that each use some internal form of parallelization. While this lets engine developers avoid threading issues, it may leave large parts of the program sequential, which results in poor performance. The view presented in this article is that whole game loops can be parallelized, not just parts of them. The models presented here can be a good starting point for developing specialized game engine architectures.

Suggested Reading

[1] Gabb, H., Lake, A. 2005. Threading 3D Game Engine Basics. http://www.gamasutra.com/features/20051117/gabb_01.shtml
[2] Costa, S. 2004. Game Engineering for a Multiprocessor Architecture. MSc Dissertation, Computer Games Technology, Liverpool John Moores University.
[3] Wilson, K. 2006. Managing Concurrency: Latent Futures, Parallel Lives. http://www.gamearchitect.net/Articles/ManagingConcurrency1.html

Havok's Memory System

Chapter 2. Memory Management

Havok's memory management system is designed to maximize performance based on how Havok uses memory, as well as to provide the flexibility to fit into the memory requirements of your application. This chapter describes the components of the memory system, how to customize the system to your needs, and how to get detailed reports on memory use at any time.

1. Overview

There are three major components in the Havok memory system:

hkMemoryAllocator
The lowest level interface through which blocks can be allocated and freed
hkMemoryRouter
A per-thread collection of hkMemoryAllocators; for example, it contains allocators for the regular heap, temporary allocations, and debug allocations.
hkMemorySystem
High-level interface to the system as a whole, e.g. statistics and setting up a new thread's memory router. Generally contains all the allocators for all threads.
The following diagram shows their relationships: the memory system owns and arranges the plumbing of the allocators; the memory routers point to allocators and only see a small part of the system; and the allocators themselves have an even more restricted overall view.
Each of these components is discussed in more detail in the following sections.

1.1. hkMemoryRouter Class

hkMemoryRouter is a facade object through which all allocations and deallocations are routed. It is little more than a structure containing pointers to allocators for each allocation pattern. The class is nonvirtual - its behaviour is customized by replacing the allocator pointers:

heap
Allocations which will typically last longer than a single frame, for example rigid bodies, constraints, the world and their contained arrays. May be freed by any thread and have no particular lifetime.
temp
Temporary allocations. Lifetime is restricted to the current scope, so the same thread that allocates also frees. Very often, but not always, in a LIFO pattern.
debug
Allocator used by profiling and checking code so as not to change the main memory profile. For example monitor streams, memory checking utilities and debug visualizers all use this allocator.
solver
Special allocator used exclusively by the physics constraint solver. Requests are made only from the constraint solver and their lifetime is temporary, so solver memory may be freed outside of the physics step. Requests are often for large contiguous blocks, so this allocator has been separated from the temp allocator and, unlike the temp allocator, will likely be shared between all threads.
stack
Allocator for pure last-in-first-out allocations, essentially a restricted form of the temp allocator. On most platforms with the default setup, this is the temp allocator. (On the PlayStation®3 SPU, this is not actually a virtual allocator class but is a simple pointer bump and all calls are inlined.)
Each thread has its own router instance, though usually many of the objects behind the facade are shared across multiple threads.
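The facade structure described above can be pictured with a minimal sketch. The `Allocator` and `MemoryRouter` names here are illustrative stand-ins, not the real Havok classes; only the allocation-pattern roles (heap, temp, debug, solver, stack) come from the text.

```cpp
#include <cstddef>
#include <cstdlib>

// Minimal stand-in for an allocator interface. Real allocators would
// override blockAlloc/blockFree; this default just forwards to malloc/free.
class Allocator {
public:
    virtual ~Allocator() {}
    virtual void* blockAlloc(std::size_t size) { return std::malloc(size); }
    virtual void blockFree(void* p, std::size_t /*size*/) { std::free(p); }
};

// Facade in the spirit of hkMemoryRouter: no virtual methods of its own,
// just pointers to the allocator chosen for each allocation pattern.
// Behavior is customized by swapping these pointers.
struct MemoryRouter {
    Allocator* heap   = nullptr;  // long-lived allocations, any-thread free
    Allocator* temp   = nullptr;  // scope-limited, same-thread free
    Allocator* debug  = nullptr;  // profiling/checking side data
    Allocator* solver = nullptr;  // constraint-solver scratch blocks
    Allocator* stack  = nullptr;  // strict LIFO; often aliases temp
};
```

A memory system would build one such router per thread, wiring shared allocators (e.g. the solver allocator) into several routers while giving each thread its own temp allocator.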

1.2. hkMemoryAllocator Interface

hkMemoryAllocator is the basic interface for allocations, providing the pure virtual methods blockAlloc and blockFree to allocate and deallocate blocks of memory. Note that the size must be explicitly given to blockFree which reduces per-allocation overhead.

Note

By default, Havok expects allocated memory to be 16-byte aligned. You will need to take this into account if you override the memory allocation methods.
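A custom block allocator honoring the 16-byte alignment expectation might look like the following sketch. The class name is hypothetical and it is not derived from the real Havok interface; it only mirrors the `blockAlloc`/`blockFree` shape the text describes, and it assumes a C++17 toolchain providing `std::aligned_alloc` (MSVC would need `_aligned_malloc`/`_aligned_free` instead).

```cpp
#include <cstddef>
#include <cstdlib>

// Sketch of a block allocator returning 16-byte-aligned memory.
class SixteenByteAllocator {
public:
    void* blockAlloc(std::size_t size) {
        // std::aligned_alloc requires the size to be a multiple of the
        // alignment, so round the request up to the next multiple of 16.
        std::size_t rounded = (size + 15) & ~static_cast<std::size_t>(15);
        return std::aligned_alloc(16, rounded);
    }
    void blockFree(void* p, std::size_t /*size*/) {
        // The explicit size is unused here, but a pooling allocator could
        // use it to return the block to the right size class without a header.
        std::free(p);
    }
};
```

Note how the explicit size parameter on `blockFree` is what lets an implementation avoid storing a per-allocation header, as described in the next section.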

1.2.1. Buffer allocations

The bufAlloc, bufFree and bufRealloc methods may be overridden to optimize the handling of resizable blocks. They are used where the caller can accept a larger block than requested, and they give the allocator the opportunity to resize an allocation without copying. hkArray is the major user of this interface. An implementation may take advantage of this by leaving space after buffer allocations so that a future reallocation can succeed without moving the buffer. The default implementation of the buffer allocator is built on the block allocator.

1.2.2. Optimizing for space

Memory managers often store "magic" information, such as size, just in front of the allocated memory. On modern machines the overhead for this may be up to 64 bytes because of alignment issues. Havok's memory manager tries to save this space using two techniques:
  • Many simple classes (e.g. hkAabb) know their size, so the memory allocator doesn't need to store it explicitly - the class knows how much space to free when it is deleted. These classes have their sizes compiled directly into their overridden operator new and operator delete. See HK_DECLARE_NONVIRTUAL_CLASS_ALLOCATOR.
  • For classes derived from hkReferencedObject, the allocated size is stored directly in the instance. See m_memSizeAndFlags in HK_DECLARE_CLASS_ALLOCATOR.
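The first technique (compiling the size into the class's own operators) can be shown with a small sketch. The `Aabb` struct and the `trackedAlloc`/`trackedFree` helpers are hypothetical, used only to make visible that the size reaches the free call without ever being stored next to the allocation.

```cpp
#include <cstddef>
#include <cstdlib>

// Records what the "allocator" was told, so we can check that the size
// is supplied by the caller rather than read back from a stored header.
static std::size_t g_lastFreedSize = 0;

static void* trackedAlloc(std::size_t size) { return std::malloc(size); }
static void trackedFree(void* p, std::size_t size) {
    g_lastFreedSize = size;  // size comes from the call site, no header needed
    std::free(p);
}

// In the spirit of HK_DECLARE_NONVIRTUAL_CLASS_ALLOCATOR: the class size
// is baked into its operators at compile time via sizeof.
struct Aabb {
    float min[3];
    float max[3];
    static void* operator new(std::size_t size) { return trackedAlloc(size); }
    static void operator delete(void* p) { trackedFree(p, sizeof(Aabb)); }
};
```

Because `sizeof(Aabb)` is a compile-time constant inside `operator delete`, the allocator never pays the per-allocation header cost that a general-purpose `malloc` does.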

1.2.3. Allocator utilities

The default allocators all supply 16-byte-aligned allocations and require the size to be given when deallocating. hkMemoryRouter has utility methods - easyAlloc, alignedAlloc - to allocate blocks with greater alignment and to stash the allocation size so it need not be supplied. They work by allocating a little more than requested and storing some bookkeeping information before the returned pointer, so the corresponding utility free must be used.
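The "stash bookkeeping before the returned pointer" technique can be sketched like this. The function names echo the text but the implementation is illustrative, not Havok's; a fixed 16-byte header is assumed so the payload stays 16-byte aligned when the underlying allocator is.

```cpp
#include <cstddef>
#include <cstdlib>

// Header size chosen as 16 so the returned pointer keeps the underlying
// allocator's 16-byte alignment.
constexpr std::size_t kHeaderSize = 16;

// Allocate a little extra and record the requested size just before the
// pointer handed back to the caller.
void* easyAlloc(std::size_t size) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(size + kHeaderSize));
    if (!raw) return nullptr;
    *reinterpret_cast<std::size_t*>(raw) = size;  // bookkeeping before payload
    return raw + kHeaderSize;
}

// The matching free: step back to the header, recover the stored size,
// then release the whole block. The size is returned here only to make
// the recovery visible.
std::size_t easyFree(void* p) {
    unsigned char* raw = static_cast<unsigned char*>(p) - kHeaderSize;
    std::size_t size = *reinterpret_cast<std::size_t*>(raw);
    std::free(raw);
    return size;
}
```

This is why the documentation insists the corresponding utility free must be used: passing an `easyAlloc` pointer to a plain block free would miss the header.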

1.2.4. Marking unused memory

Some memory allocator implementations overwrite freed/uninitialized memory with known values to aid in debugging. hkFreeListAllocator overwrites freed memory and newly allocated memory with 0x7ffa110c when HK_DEBUG is defined. The allocator used in hkCheckingMemorySystem always overwrites its memory.
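A fill routine of this kind is easy to sketch. The fill value 0x7ffa110c is the one the documentation names for hkFreeListAllocator; the `debugFill` function itself is a hypothetical stand-in, assuming block sizes that are multiples of four bytes.

```cpp
#include <cstddef>
#include <cstdint>

// The value hkFreeListAllocator writes over freed and newly allocated
// memory in debug builds, per the documentation above.
constexpr std::uint32_t kFreeFill = 0x7ffa110c;

// Stamp a block with the 32-bit pattern so stale reads of freed or
// uninitialized memory show a recognizable value in the debugger.
void debugFill(void* p, std::size_t size) {
    std::uint32_t* words = static_cast<std::uint32_t*>(p);
    for (std::size_t i = 0; i < size / sizeof(std::uint32_t); ++i)
        words[i] = kFreeFill;
}
```

A distinctive, unlikely-to-be-valid pattern like this makes use-after-free bugs stand out: a pointer or float read from freed memory takes an obviously bogus value instead of whatever the previous owner left behind.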

1.3. hkMemorySystem Interface

The hkMemorySystem is for operations on the memory system as a whole. For example it has methods for getting memory statistics and for setting up and tearing down the memory system.

1.3.1. Initialization

The memory system used by Havok must be initialized before the hkBaseSystem is initialized.
To initialize a memory system, two tasks need to be performed:
  • Set a global memory system object for global methods on the memory system
  • Initialize each of the allocators in a hkMemoryRouter instance and return it
Since all knowledge of the memory setup is encapsulated in the memory system class, each thread must ask the system to set up its memory router appropriately. One thread must be designated the "main" thread - not necessarily the main system thread, just the first Havok-aware thread to be initialized and the last to be deinitialized. The relevant methods on hkMemorySystem are mainInit and mainQuit. (Note that hkMemoryInitUtil calls these methods for you.)
After the main thread has been set up, each worker thread should call threadInit and threadQuit. These methods are implicitly called from mainInit and mainQuit, so they should not be called in the main thread.

1.3.2. Garbage Collection

Many allocators cache their freed memory to satisfy future requests and do not release it to their parents unless explicitly requested. The memory system has several methods to release such cached memory. Because of the lockless nature of the thread-local allocators, normally each thread must release its own cache back to the shared pool, and a later call releases the shared cache back to the higher-level allocator. This must be done manually by calling the memory system's garbageCollect method (see also garbageCollectThread and garbageCollectShared). This method causes the allocator to compact memory - potentially making more, and larger, chunks of memory available. A garbage collection takes some work, so calling it every frame is not recommended. It is better to call it between levels, or at a point where you know your memory usage is going to change significantly.
Note that a garbage collection cannot make all 'free' memory available - it only releases a block of memory if all of the chunks it contains are free. If you allocated all of memory in 128-byte chunks and then freed every other chunk, a 256-byte allocation would still fail: no pool would have been freed, and memory would now be fragmented such that there is no contiguous 256-byte block.
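The collection rule described above (a pool is returned to its parent only when every chunk inside it is free) can be modeled with a toy sketch. The `Pool` type and `collectible` function are purely illustrative, not part of the Havok API.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a fixed-size-chunk pool: one flag per chunk, true = free.
struct Pool {
    std::vector<bool> chunkFree;
    bool fullyFree() const {
        for (bool f : chunkFree)
            if (!f) return false;
        return true;
    }
};

// A garbage collection can only release pools whose chunks are all free.
std::size_t collectible(const std::vector<Pool>& pools) {
    std::size_t n = 0;
    for (const Pool& p : pools)
        if (p.fullyFree()) ++n;
    return n;
}
```

With two chunks per pool, freeing every other chunk across all pools leaves every pool half-occupied, so nothing is collectible - exactly the 128/256-byte scenario above.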
When hkFreeListAllocator cannot allocate, it automatically performs a garbage collection and attempts the allocation again. It is still recommended that you perform a garbage collection independently of this mechanism; otherwise the automatic garbage collection could happen in a time-critical section of your application.
Note that calling threadQuit implicitly releases cached memory. Also hkFreeListMemorySystem will release cached memory when the temporary parts are released.

1.3.3. Reusing Temporary Memory

Some memory is only needed during specific intervals of execution. For instance, the temp allocators, solver allocators, and all thread stacks are completely empty outside of Havok SDK calls and could be reused while the Havok SDK is not in use. Most calls require both a stack and a temp allocator, but the solver allocator is only ever accessed from the constraint solver, so it can safely be reused when the world is not being stepped.
To support this, memory systems may allow partially tearing down a thread's resources. The following example shows a worker thread freeing its temporary parts while sleeping between workloads. The main thread would have a similar mainInit and mainQuit pair to free any shared resources.
hkMemorySystem& memSystem = hkMemorySystem::getInstance();
hkMemoryRouter memRouter;

// Set up the long-lived parts of this thread's router once.
memSystem.threadInit( memRouter, "worker", hkMemorySystem::FLAG_PERSISTENT );
hkBaseSystem::initThread( &memRouter );

while( waitForStartSignal() == HK_SUCCESS )
{
    // Acquire temp/solver/stack memory only while there is work to do.
    memSystem.threadInit( memRouter, "worker", hkMemorySystem::FLAG_TEMPORARY );
    doWork();
    // Hand the temporary parts back before sleeping between workloads.
    memSystem.threadQuit( memRouter, hkMemorySystem::FLAG_TEMPORARY );
    sendWorkDoneSignal();
}

hkBaseSystem::quitThread();
memSystem.threadQuit( memRouter, hkMemorySystem::FLAG_PERSISTENT );

1.3.4. Memory Statistics

All memory systems support getting a summary of all memory allocated and in use by the memory system, via the getHeapStatistics method. This contains information for each allocator in the router. Values which are not applicable for a given memory system are set to -1.
Some memory systems also support taking snapshots to gather detailed data about the memory currently allocated by the memory system, via the getMemorySnapshot method. These snapshots contain the following information about each allocation and can be used to produce detailed reports on memory use:
  • A pointer to the allocation.
  • The size (in bytes) of the allocation.
  • A trace of the call stack when the allocation was made.
Please see the Memory Reporting section for more information on working with memory statistics.

1.4. Overriding Class New and Delete

Havok uses several macros to declare class local memory management. These ensure that all Havok objects are allocated by the appropriate allocators in the Havok memory system.
The HK_DECLARE_CLASS_ALLOCATOR macro overrides class new and delete so that classes derived from hkReferencedObject are automatically handled correctly by the memory system. Note that the memory manager expects to be able to find a hkReferencedObject at the object address, so when using multiple inheritance the hkReferencedObject should come first in the inheritance list.
Unfortunately, non-virtual classes cannot inherit from a memory management class, since some compilers do not implement the empty base class optimization. Each of these classes must instead use the HK_DECLARE_NONVIRTUAL_CLASS_ALLOCATOR macro. The memory class parameter is the same as in the virtual case. The name of the class must also be passed so that its size can be compiled into its operator new and operator delete.
Some classes can only usefully provide placement operator new and operator delete (most notably the memory classes themselves). These classes use HK_DECLARE_PLACEMENT_ALLOCATOR() and must be allocated statically or via another memory system and constructed in place.