The third insight and success!
The third insight is we don’t need to use three bits per cell for the neighbor counts. Each cell has at most five neighbors. This is a base-6 number. 6^4 is 1,296 which will fit comfortably in 11 bits! I don’t have to worry about packing and unpacking the bits because we only use this for table-lookup. Now we’ve got something that will work. Woohoo!
The new format looks like this:
E | ABCD | NNNNNNNNNNN E : edge ABCD : life/death status NN : neighbor counts stored as four base-6 integers.
We can even make the table smaller than the old Qlife table by arranging the bits so the neighbor count comes first. This is even more cache friendly! As we will see later, not all of the 11-bit numbers will appear so L1 cache performance should be excellent.
It gets even better. Here's another possible improvement. The old Qlife had code for every case of the current-generation and next-generation states. If we did that in the new version it would mean 256 code cases (512 when you factor in the edge bit.) That's not too bad but it occurred to me that we can really think of this as a base-3 number where each digit represents:
0 : this cell doesn’t change 1 : this cell dies 2 : this cell is born
There are 3^4 states or 81 code routines per pass (162 with the edge bit.) It is effectively one less since the 000 case will never be called. Less code makes this even more efficient than the old version! This doesn’t necessarily mean we require a table-lookup to get the address of the code to jump to. We can still generate it directly from the base-3 number if we wish.
It's amazing to me that just a simple change in my attitude toward the problem yielded a solution where none seemed possible before. These insights all tumbled out almost at once.
This should yield a much faster life simulator. I could have written this for my 386 in 1992! The only thing I lacked was the insight to do it. It took me eight years to find a better way.
How much faster will it be? We can't know for sure until it's built but the difference is surely substantial. It wouldn't surprise me if it were twice as fast. Now why couldn't I figure this out eight years ago?
One thing still nagged at me. I wanted to fit a 'visible' bit into the data structure. After all this trimming was there still room one more bit? Should I even bother? I've already got a much improved design for a life simulator and the problem I originally sat out to solve has been solved.
But wouldn't it be cool if there were a way to fit that bit in somewhere?