Another insight, another solution
I wondered exactly what the distribution of those 472 states looked like. I wrote a little program to generate a map of all 1,296 states that marked the ones that were possible. Here it is.
That pattern looks so regular it just jumps out and grabs you! There has to be a way to compress it so that the values to use for each cell are still simple integers.
I showed this to James Grimmelmann who suggested viewing this not as a 2x2 matrix of cells but as a two 2x1 matrices stacked on top of each other. The advantage of this arrangement is the two elements on both ends are not used which makes it not a 36 cell arrangement but a 32 cell arrangement. Just enough to fit in 5 bits! You just use two of these and now we're down from eleven bits to ten! Thanks, James!
That would get me space for the bit I wanted but I had the feeling we could make it even smaller. The smaller the table (or to be more precise, the fewer cache lines the table used) the faster it would run. And speed is the thing.
I couldn't construct an analytical solution (maybe someone else can but it is beyond my ability.) I could, however, write a program to brute-force test every possibility until it found the one that worked. This worked out quite well. Source is here: qtable.cpp
This program got it down to just 696 states! It's still ten bits but it's an even smaller table. Here is what the three best equivalent solutions look like when they are graphed.
qtable.cpp should be optimized further to make sure we are not just building the smallest table but building the table that uses the fewest cache lines.
Now I can put the visiblity bit in place!
Then a wonderful thing happened. As part of a conversation with Terje Mathisen it became obvious that it didn't really make sense to use a 'visible' bit. What makes sense is to separate the rendering from the generation entirely. At some fixed rate, no more than the screen refresh rate, the rendering process would sample the life grid and render an entire scene at once. The generation process could be clocked at any independent speed under control of the user. This design makes a lot more sense since there's no point in trying to render faster than the speed which the screen can be updated. It also simplifies the code considerably since we no longer have to draw individual life cells to the screen. That logic just goes away.
All I can say is it's a good thing that I didn't realize this earlier or I never would have tried to make room for another bit!
Another observation is the table is a mirror image of itself. There might be some way to take advantage of this although it doesn't come to mind.
Now I'm down to fifteen bits to represent four cells. The old format used sixteen bits for three cells. I never would have thought this to be possible.
So, is that it? This is good design and represents a huge improvement over my old Qlife. But there was one more insight -- a really big one -- that had eluded me entirely.