Monday, September 14, 2009

Spillage!

Hi,

One of the things I was determined to implement in BlitzMax right from the start was a decent 'register allocator'.

Register allocation is the process of mapping 'local' program variables (i.e. those variables declared 'Local' in BlitzMax) to CPU registers (i.e. the finite set of 'variables' inside the actual x86 CPU).

Blitz3D and BlitzPlus largely ignored this. Instead, all local variables were simply stored in memory (on the stack), and loaded into CPU registers as required. For example, Blitz3D code like this…

Local x,y,sum
x=1
y=2
sum=x+y

…would involve approximately the following steps:

1) Write the constant '1' into the memory location for variable 'x'.
2) Write the constant '2' into the memory location for variable 'y'.
3) Read the variables 'x' and 'y' from memory and add them together.
4) Store the result into the memory location for variable 'sum'.

That's 2 reads and 3 writes from/to system memory, and system memory is generally slow (even with fancy caches) when compared with the CPU (but I guess it'll be catching up soon…).

BlitzMax on the other hand will try to represent the variables x,y and sum using CPU registers, so equivalent BlitzMax code might involve something like…

1) Write the constant '1' to CPU register EAX.
2) Write the constant '2' to CPU register ECX.
3) Write the sum of EAX and ECX to CPU register EDX.

…none of which 'hits' memory so is theoretically much faster.

And, happily, this time at least, theory matches practice - BlitzMax code runs significantly faster than equivalent Blitz3D code in general, and this is almost completely due to BlitzMax's register allocator.

There is, however, one fairly major drawback to writing a register allocator - it's *@#!$^& hard! ESPECIALLY on the crappy Intel X86 which has only about 8 general purpose registers.

In particular, the trickiest thing to deal with is 'spillage' - handling the situation where there just aren't enough CPU registers to go round and some local variables MUST be 'spilled' to memory.

In this case, a spilled variable ends up behaving much like a Blitz3D/BlitzPlus variable above does - it must be constantly loaded/stored from/to memory. Therefore, the most important job of the register allocator becomes minimizing the number of spilt variables.

This is tricky because it can be hard to know precisely what effect spilling a variable will have. Spilling a variable may even cause a 'cascade' of more spills, because even though a spilled variable is stored in memory, it still needs to be moved into a CPU register at SOME point to be used for anything useful (e.g. arithmetic, memory addressing etc)!

And every now and then, someone will come up with a little piece of code in BlitzMax that causes this effect. The code will usually be small, involve about 8 or so local variables and will cause the compiler to lock up completely as the register allocator gets caught in an endless allocate/spill/allocate/spill loop...

In fact, it happened again just yesterday - hence this post!

I *think* I have it sorted now, but to be honest I seriously doubt I'll ever be able to say with 100% confidence that I know it works!

Peace out!
Mark

Saturday, September 5, 2009

Snow Leopard (and OS upgrades in general)

Hi,

As a software developer, I'm pretty much required to upgrade my OS's whenever the latest/greatest OS comes out. However, as a bit of an old fart, it has definitely become less a less of a thrilling experience!

The worst software upgrade experience was easily Vista. No doubt, as an early adopter, I encountered many of it's teething problems earlier than others. But these were pretty major, including: terrible support for non-MS networks; incredibly slow file management; generally dodgy network behaviour (LOTS of network timeouts); and an incredibly 'random' (if more attractive) GUI.

Some of these issues were resolved via various registry hacks, and no doubt service packs improved things, but I only lasted a few months on Vista before reverting to XP as my 'main' OS. And that is the first time I have *ever* gone backwards OS wise. I suspect Windows 7 will be better - but for the love of god, if it doesn't have a 'make all windows look like this' button I'm gonna go Joe 90 on someone!

But before all this makes me sound like too much of a Microsoft hater (I only hate them a *little* bit, honest!) I would like to stress that my favourite OS of all time is easily Windows 2000.

Perhaps it was the timing, perhaps I'd spent too long on the Win95 line, but I really, really liked Win2K: It was small, it was fast, it was stable and it was no-nonsense: All the 'options' dialogs were presented as nice tidy consistent 'key/property' dialog thingys. It rocked, and at the time I was looking forward to more of the same.

Unfortunately, I think Microsoft failed to realize they were onto a good thing, and decided instead to 'ape' the Mac: The next release was XP, with big ugly buttons and unnecessarily dumbed down dialogs, and then Vista with the window renderer. I guess they had too though...

I like the Mac, but I don't use it as my primary development machine - there are too many little annoying things about it: No 'up a folder' browse button; no full path name in browser window title bar; no ability to mess with 'hidden' files (ie: files starting with '.' - you can hack the OS to show them, but you can't modify them) - and more. Of course, at this point you're supposed to use the command shell (and the Mac shell is unix-grunty...) but stuff that - I'm a child of the GUI generation! I don't like the shell!

Which brings us to Snow Leopard, the latest MacOS update. It is apparently faster; you can apparently do 'Microsoft exchangey' things with it; and it only cost 60NZD, BUT: For what I use a computer for, there is absolutely no difference whatsoever!

And frankly, I'm disappointed - hell, if they'd forced a new desktop wallpaper on me I might even feel a bit better! From Apple, I'm a little surprised: 10.4 had the spotlight feature added (which would have required serious hacking of the file system) and 10.5 had the clever time machine feature (ditto), but 10.6...?

But perhaps I'm being a bit harsh - OS development and maintenance costs, and perhaps it's more honest of Apple to release a cheap 'tweak' update than try to pretend it's something new.

All in all though, it definitely feels like OS development (not counting sexy windows) is moving sideways much more than it's moving fowards. Perhaps we've exhausted the UNIX paradigm? Perhaps (gulp) there's nothing better?

Whatever - it's been a LONG time since I've been excited about OS technology. Which is a bit of a bummmer.

I am, howver, very much looking forward to whatever google do with Chrome...

Peace Out!
Mark

Wednesday, September 2, 2009

BlitzMax Garbage Collection part 1...

Hi,

Ok, a few random comments on the history and design of BlitzMax's garbage collection system - probably not for the faint of heart...

The GC system is BlitzMax has actually gone through several iterations by now. The first was, ahem, a little rough. It was basically a reference counting system that required the programmer to manually call a 'FlushMem' command every now and then. Worse than that, they couldn't just hide the FlushMem in a function somewhere, it had to be in the right place 'on the stack' as it couldn't 'flush' anything higher up the stack.

Yes, users found it a little confusing!

But it worked, and I eventually managed to get rid of the need for FlushMem entirely. This is the GC BlitzMax has used for a long time now and it has one very cool thing going for it: it's *fast*!

However, it has 2 fairly serious disadvantages - it can't handle 'cycles' of objects (this is where a chunk of memory directly or indirectly points to itself - such memory can 'leak') and it doesn't work nicely with multithreading. And EVERYONE wants multithreading these days, right?

So I decided it was time to implement a 'real' garbage collector - one that could handle both cycles and threading.

Fortunately, thanks to the wonders of open source software, there is an excellent GC available that is *almost* perfect for BlitzMax: the 'Boehm-Demers-Weiser' (BDWGC) conservative garbage collector:

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

This is already in use by many pieces of software, so is well tested, and is pretty easy to 'drop into' an existing project such as BlitzMax.

And, best of it all - it just works! By using this in your software, you can allocate memory and forget about it. It is one seriously cool piece of programming...

However, there were a few things I was a little uncomfortable with about it:

* BDWGC was designed to handle any old generic code, yet I 'know' stuff about BlitzMax code that they don't - surely I could use this knowledge to make the GC more efficient? One example: C/C++ etc code can often generate 'derived' pointers. This is where a pointer points 'into the middle' of an object, not at the beginning. However, with BlitzMax code you don't have to worry about derived pointers (not coincidentally due to changes I had to make to get rid of FlushMem way back!) so the code in BDWGC to 'validate' a pointer is unnecessarily complex.

* There appears to be a bug in BDWGC to do with 'cyclic finalizers' which can cause crashes or leaks. No doubt, this will be fixed eventually, but hey, it's open source so surely I can fix it myself? Unfortunately…

* The BDWGC code is pretty hard to follow. It's designed to run on about a gazillion versions of EVERY OS ever, has a ton of optional features and is 'macro-ed' up the wazoo. After a few hours trying to hunt down the cyclic finalizer problem, I realized I was out of my depth and gave up (Actually, this may have more to do with me - I kind of suck at reading other people's code...)

So, the last effort on the GC front has been to write a 'proper' cycle/threading friendly GC on my own (MSGC), that can take advantage of the way BlitzMax works - and the results are looking good!

So far, MSGC is faster than BDWGC (although not as much as I would like!) with everything we've tested it with and appears to be very stable (although I'm sure the upcoming public release will change that).

It's been a pretty interesting experience writing it too, and I'll go into some of the details in a future posting, but that's probably enough for now.

Peace out!
Mark