http://www.spuify.co.uk/?p=454
Obligatory Introductory Parable
I really like Sushi, it’s tasty and convenient. I like the immediacy of being able to go into a Sushi restaurant complete with conveyor belt and being able to take a seat and grab something fresh and delicious from the belt without blowing my entire lunch hour. Having said that, what I really wouldn’t like is to be a member of staff in a Sushi restaurant, especially if it was my job to show the diners to their seats and here’s why…In programming we have to manage memory. Memory is a precious resource which must be carefully managed, much like the seats in our metaphorical sushi restaurant. Every time we allocate memory dynamically we’re reserving memory from something called the “heap”. In C and C++ this is typically done through the use of the malloc function and the new operator respectively. To continue the somewhat fishy analogy (last one I promise!), this is like our intrepid groups of diners asking to be seated in our sushi restaurant. The real shame though is what happened in our hypothetical scenario happens in the context of memory also, but the results are much worse than a couple of empty tummies. It is called fragmentation and it is a nightmare!
What’s wrong with malloc and new?
Sadly, the rest of the discussion won’t have such a fishy flavour to it as this post is going to talk about malloc and new and why they have a very limited place in the context of embedded systems (such as games consoles). While fragmentation is just one facet of problems caused by dynamic memory allocation, it is perhaps the most serious, but before we can come up with some alternatives we should take a look at all of the problems:1. malloc and new try to be all things to all programmers…
They will as soon as allocate you a few bytes as they will a few megabytes. They have no concept of what the data is that they’re allocating for you and what its lifetime is likely to be. Put another way, they don’t have the bigger picture that we have as programmers.
2. Run-time performance is relatively bad…
Allocations from the standard library functions or operators typical require descending into the kernel to service the allocation requests (this can involve all sorts of nasty side effects to your application’s performance, including flushing of translation lookaside buffers, copying blocks of memory, etc). For this reason alone using dynamic memory allocation can be very expensive in terms of performance. The cost of the free or delete operations in some allocation schemes can also be expensive as in many cases a lot of extra work is done to try to improve the state of the heap for subsequent allocations. “Extra work” is a fairly vague term, but can mean the merging of memory blocks and in some cases can mean walking an entire list of allocations your application has made! Certainly not something you’d want to be wasting valuable processor cycles on if you can avoid it!
3. They cause fragmentation of your heap…
If you’ve never been on a project that has suffered from the problems associated with fragmentation then count yourself very lucky, but rest of us know that heap fragmentation can be a complete and utter nightmare to address.
4. Tracking dynamic allocations can be tricky…
Dynamic allocation comes with the inherent risk of memory leaks. I’m sure we all know what memory leaks are, but if not, have a read of this. Most studios try to build some tracking infrastructure on top of their dynamic allocations to try and track what memory is in play and where.
5. Poor locality of reference…
There is essentially no way of knowing where the memory you get back from malloc or new will be in relation to any of the other memory in your application. This can lead to more us suffering more increasingly expensive cache misses than we need to, as we end up dancing through memory like Billy Elliot on Ecstasy.
So what is the alternative? The idea behind this post is provide you with the details (and some code!) for a few different alternatives that you can use in place of malloc and new to help combat the problems we’ve just discussed.
The Humble Linear Allocator
As the name of this section suggests this allocator is certainly the simplest of all those I’m going to present, although truth be told; they’re all simple (and that’s part of the beauty). The linear allocator essentially assumes that there is no fine grained de-allocation of allocated memory resources and proceeds to make allocations from a fixed pool of memory one after the other in a linear fashion. Check out the diagram below.It is of course possible to support aligned allocations by applying the required bitwise operations to the offset during the course of the allocation. This can be incredibly useful, but be aware that depending on the size of the data you’re allocating from your buffer (and in some cases the order in which those allocations are made) you may find that you get some wasted space in the buffer between allocations. This wasted space is typically okay for alignments of a reasonable size, but can get prohibitively wasteful if you are allocating memory which requires a much larger alignment, for example 1MB. The ordering of your allocations from linear allocators can have a drastic effect on the amount of wasted memory in these types of situations.
/** Header structure for linear buffer. */
typedef struct _LinearBuffer {
uint8_t *mem; /*!< Pointer to buffer memory. */
uint32_t totalSize; /*!< Total size in bytes. */
uint32_t offset; /*!< Offset. */
} LinearBuffer;
/* non-aligned allocation from linear buffer. */
void* linearBufferAlloc(LinearBuffer* buf, uint32_t size) {
if(!buf || !size)
return NULL;
uint32_t newOffset = buf->offset + size;
if(newOffset <= buf->totalSize) {
void* ptr = buf->mem + buf->offset;
buf->offset = newOffset;
return ptr;
}
return NULL; /* out of memory */
}
To reset the allocator (perhaps at the end of a level), all we need to do is set the value of offset to zero. Just like with all allocations you would do, clients of the allocator need to ensure they’re not hanging on to any pointers that you’ve effectively de-allocated by doing this, otherwise you risk corrupting the allocator. Any C++ objects you’ve allocated from the buffer will also need their destructors calling manually.
The Stack
Let’s get something out of the way before we start into this allocator. When I talk about a “stack allocator” in this particular case, I’m not talking about the call stack, however, that stack does have an important part to play in avoiding run-time heap allocations as we shall see later on. So what am I talking about then? Well, the linear allocator I just described above is an excellent solution to many allocation problems, but what if you want slightly more control over how you free up your resources? The stack allocator will give you this.Towards the end of my description of the linear allocator, I mentioned that to reset the allocator you can simply set the offset to zero in order to free up all the resources in the allocator. The principle of setting the offset to a particular value is the principle that guides the implementation of the stack allocator. If you are not familiar with the concept of the stack data structure then now is probably a good time to get acquainted, you can do so here.
Back? Okay, each allocation from our stack allocator will optionally be able to get a handle which represents the state of the stack allocator just before that allocation was made. This means that if we restore the allocator to this state (by changing its offset) we are effectively ‘freeing’ up the memory to be reused again. This is shown in the diagram below.
typedef uint32_t StackHandle;
void* stackBufferAlloc(StackBuffer* buf, uint32_t size, StackHandle* handle) {
if(!buf || !size)
return NULL;
const uint32_t currOffset = buf->offset;
if(currOffset + size <= buf->totalSize) {
uint8_t* ptr = buf->mem + currOffset;
buf->offset += size;
if(handle)
*handle = currOffset; /* set the handle to old offset */
return (void*)ptr;
}
return NULL;
}
void stackBufferSet(StackBuffer* buf, StackHandle handle) {
buf->offset = handle;
return;
}
The Double Stack
If you’re comfortable with the stack concept above, we can now move on to the double-ended stack. This is similar to the stack allocator we just described except that there are two stacks, one which grows from the bottom of the buffer upward and one which grows from the top of the buffer downward. This is shown in the diagram below.I don’t think I really need insult your intelligence by including a code sample for the double stack allocator due to its inherent resemblance to the single stack allocator discussed previously.
The Pool
We’re going to shift gears a little now from the family of allocators described above that are based on linearly advancing pointers or offsets and move to something a little different. The pool allocator I’m about to describe is designed to work with data types that are of the same kind or size. It splits up the memory buffer it is managing into equally sized chunks, and then allows the client to allocate and free these chunks at will (see the diagram below). To do this, it must keep a track of which chunks are free and I’ve seen several ways of implementing this. I personally shy away from implementations such as those using a stack of indices (or pointers) to available chunks, due to the extra storage required which can often be prohibitive, but I’ve seen them around. The implementation I shall describe here uses no additional storage to manage which chunks in the pool are free. In fact the header structure for this allocator contains only two member variables, making it the smallest of all the allocators described in this post.
/* allocate a chunk from the pool. */
void* poolAlloc(Pool* buf) {
if(!buf)
return NULL;
if(!buf->head)
return NULL; /* out of memory */
uint8_t* currPtr = buf->head;
buf->head = (*((uint8_t**)(buf->head)));
return currPtr;
}
/* return a chunk to the pool. */
void poolFree(Pool* buf, void* ptr) {
if(!buf || !ptr)
return;
*((uint8_t**)ptr) = buf->head;
buf->head = (uint8_t*)ptr;
return;
}
/* initialise the pool header structure,
and set all chunks in the pool as empty. */
void poolInit(Pool* buf, uint8_t* mem, uint32_t size, uint32_t chunkSize) {
if(!buf || !mem || !size || !chunkSize)
return;
const uint32_t chunkCount = (size / chunkSize) - 1;
for(uint32_t chunkIndex=0; chunkIndex<chunkCount; ++chunkIndex) {
uint8_t* currChunk = mem + (chunkIndex * chunkSize);
*((uint8_t**)currChunk) = currChunk + chunkSize;
}
*((uint8_t**)&mem[chunkCount * chunkSize]) = NULL; /* terminating NULL */
buf->mem = buf->head = mem;
return;
}
A Note on Stack-based Allocation (alloca is your friend)
Earlier on, you may recall that I said there’d be a mention of stack based allocations in the context of the call stack. I’m sure you’ve seen code which conceptually looks something like this:myFunction() {There is a function you can use which comes with most C compilers which should mean (depending on the size of your allocation) that you won’t have to resort to heap allocations for temporary buffers of this ilk. That function is
myTemporaryMemoryBuffer = malloc(myMemorySize);
/* do processing limited to this function. */
free(myTemporaryMemoryBuffer);
}
alloca
. How alloca
works internally is architecture dependant, but essentially it performs allocations by adjusting the stack frame for your function to allow you to write data to an area, this can be as simple as moving the stack pointer (just like the linear allocator we mentioned right at the start). The memory returned to you by alloca is then freed up when the function returns.There are a few caveats with using alloca that you should be aware of however. Be sure to check the size of your allocation to make sure you’re not requesting an unreasonable amount from the stack, this can cause nasty crashes later on, if your gets so large that it stack overflows. For this reason it is also best to know all the places where your function will be called in the context of the program’s overall flow if you’re contemplating allocating a large chunk with alloca. Use of alloca can affect portability in some limited circumstances (apparently), but I’ve yet to come across a compiler that doesn’t support it.
For more details you can consult your favourite search engine.
A Final Thought…
Often the memory one allocates during the course of a processing task is temporary and persists only for the lifetime of a single frame. Taking advantage of this type of knowledge and moving allocations of this type to separate memory buffers is essential to combating the adverse affects of fragmentation in a limited memory system. Any of the above allocation schemes will work, but I would probably suggest going with one of the linear ones, as they are much easier to reset than the pool implementation I’ve described here. Noel Llopis talks about this topic in more detail in this excellent blog post.The right allocator for you depends on many factors and what the problem you’re trying to solve demands. The advice I would offer is to think carefully about the patterns of allocations and de-allocations you wish to perform with your system, think about the sizes and lifetimes of these allocations and try to manage them with allocators that make sense in that context. It can sometimes help to draw memory layouts on paper (yeah, with a pen and everything) or to graph roughly how you expect your memory usage will look over time, believe it or not, this can really help you to understand how a system produces and consumes data. Doing these things can help you make calls about how to separate your allocations to make memory management as easy, quick and fragmentation-free as possible. Remember, what I’ve talked about here is just a small selection of the some of the simplest strategies that I tend to favour when writing code for console games. It is my hope that if you’ve made it this far and you weren’t already doing this stuff, that your brain is alive with ideas about code you’ve written in the past which could have taken advantage of these techniques to mitigate the substantial drawbacks associated with dynamic memory allocations, or of other strategies you can exploit to solve your memory management problems without dynamic allocation in limited memory systems.
In closing, I believe that programmers should be mindful of the impact of making dynamic allocations (especially in console games) and think twice about using the malloc function or the new operator. It is easy to have the attitude that you’re not doing many allocations so it doesn’t really matter, but this type of thinking spread across a whole team quickly snowballs and results in a death by a thousand paper cuts. If not nipped in the bud, fragmentation and the performance penalties arising from the use of dynamic memory allocations can have catastrophic consequences later on in your development lifecycle which are not easy to solve. Projects where memory allocation and management is not thought through and managed properly often suffer from random crashes after prolonged playing session due to out of memory conditions (which by the way are near impossible to reproduce) and cost hundreds of programmer hours trying to free up memory and reorganise allocations.
Remember: You can never start thinking about memory early enough in your project and you’ll always wish you had started earlier!
More Information
Here are some links to related topics, or to topics that I didn’t have time to cover:http://gamesfromwithin.com/start-pre-allocating-and-stop-worrying
http://en.wikipedia.org/wiki/Circular_buffer
http://en.wikipedia.org/wiki/Buddy_memory_allocation
http://www.memorymanagement.org/articles/alloc.html
Thanks to Sarah, Jaymin, Richard and Dat for proof reading this drivel.
没有评论:
发表评论