patternjavaModerate

Optimize Conway's Game of Life

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

lifegameconwayoptimize

Problem

I have coded up an implementation of Conway's Game of Life and I have a performance bottleneck in it which I wish to be optimized. The main logic is in the Universe class. I have omitted all code which is not applicable here for brewity:

```
public class Universe {

private static final int FLIP_INDEX = 0;
private static final int FLOP_INDEX = 1;
private final boolean[][][] universeDoubleBuffer;
private final int height;
private final int width;
private int flipFlopIndex = FLIP_INDEX;

public Universe(boolean[][] universeState) {
height = universeState.length;
width = universeState[0].length;
universeDoubleBuffer = new boolean[2][height][width];
for (int y = 0; y < height; y++) {
for (int x = 0; x < width; x++) {
universeDoubleBuffer[FLIP_INDEX][y][x] = universeState[y][x];
}
}
}

public boolean[][] recalculateUniverseState() {
int newFlipFlopIndex = (flipFlopIndex == FLIP_INDEX ? FLOP_INDEX : FLIP_INDEX);
for (int y = 0; y < height; y++) {
for (int x = 0; x < width; x++) {
int liveNeighbors = countLiveNeighbors(x, y);
boolean isLiving = universeDoubleBuffer[flipFlopIndex][y][x];
if (!isLiving && liveNeighbors == 3) {
universeDoubleBuffer[newFlipFlopIndex][y][x] = true;
} else if (isLiving && (liveNeighbors == 2 || liveNeighbors == 3)) {
universeDoubleBuffer[newFlipFlopIndex][y][x] = true;
} else {
universeDoubleBuffer[newFlipFlopIndex][y][x] = false;
}
}
}
stampPatterns();
logger.info("Old:" + Arrays.deepToString(universeDoubleBuffer[flipFlopIndex]));
logger.info("New:" + Arrays.deepToString(universeDoubleBuffer[newFlipFlopIndex]));
flipFlopIndex = newFlipFlopIndex;
return universeDoubleBuffer[flipFlopIndex];
}

private int countLiveNeighbors(int x, int y) {
int result

Solution

First of all, this should be obvious, but when you have performance issues in code that involves a tight inner loop, you want to simplify that loop as much as you can. If you can save one cycle in the loop by spending a dozen or a hundred cycles somewhere else, do it, because every cycle in the loop gets multiplied by 5000².

Also, what you really want to minimize in the inner loop are memory accesses. Accessing local variables is fast, since they're typically cached and the JVM knows that their values cannot change unless the code in the method itself changes them. In comparison, pulling a random element from a large array is relatively slow, since it requires a full RAM access, as is, typically, accessing a member variable or calling an uncached method.

I've done something like this before,* so, rather than listing every improvement I might make to your code, let me start by sketching how I might rewrite your core update loop:

// assume these arrays are (height + 2) by (width + 2)
boolean[][] oldBuffer = universeDoubleBuffer[flipFlopIndex],
            newBuffer = universeDoubleBuffer[newFlipFlopIndex];

for (int y = 1; y <= height; y++) {
    int environment
        = (oldBuffer[y-1][0] ? 32 : 0) + (oldBuffer[y-1][1] ?  4 : 0)
        + (oldBuffer[y  ][0] ? 16 : 0) + (oldBuffer[y  ][1] ?  2 : 0)
        + (oldBuffer[y+1][0] ?  8 : 0) + (oldBuffer[y+1][1] ?  1 : 0);

    for (int x = 1; x <= width; x++) {
        environment = ((environment % 64) * 8)
            + (oldBuffer[y-1][x+1] ? 4 : 0)
            + (oldBuffer[y  ][x+1] ? 2 : 0)
            + (oldBuffer[y+1][x+1] ? 1 : 0);

        newBuffer[y][x] = lookupTable[ environment ];
    }
}

A few things to note about this code:

-
There is no countLiveNeightbors() method; in fact, the code does not explicitly count live neighbors at all. Instead, the pattern of the nine cells including and surrounding the cell at (x, y) is maintained in the variable environment, which is used as an index to a 512-element lookup table.

For example, if the surroundings of the current cell look like this (# = live cell, _ = dead cell):

# # _
_ # #
# _ _

then the value of the environment variable will be:

(1 * 256) + (1 * 32) + (0 * 4) +
(0 * 128) + (1 * 16) + (1 * 2) +
(1 *  64) + (0 *  8) + (0 * 1) = 370

Of course, you'll have to set up this lookup table before using it, but that is something you can do outside the core loop (e.g. in your constructor), so it's not performance-critical.

(In fact, while I haven't benchmarked this, I'd at least consider making lookupTable a local variable and rebuilding it every time the update method is called, as the extra cost of rebuilding the table might be balanced out by extra optimization opportunities available to the compiler / JVM if it knows that no other code can modify the table. You may want to test it both ways and see which one is faster.)

One additional advantage of such a lookup table based implementation is that, by changing the lookup table, it can simulate any two-state cellular automaton using the Moore neighborhood, not just Conway's Game of Life specifically.

-
By reusing the parts of the environment pattern that are shared between adjacent cells, the code only needs to do three reads from the oldBuffer array per cell, as opposed to nine in your version. Uncached array access is expensive, so this is likely to provide a significant speedup. (Also, like your code, mine also makes sure to access the buffers as close to sequentially as possible, iterating first by rows and then by columns. This is also important for CPU cache locality.)

-
The code above doesn't update the cells at the edges of the array, which means it doesn't have to worry about (literal) edge cases such as array indices being out of bounds. (Hopefully, the compiler / JVM may notice this too, and may omit some of its internal array bounds checks.)

If you want your grid to be surrounded by dead cells, you can just initially mark these border cells as dead and leave them untouched in your update method. Alternatively, if, say, you want the grid to wrap around, you can update the edge cells in a separate loop (which can be less efficient, since it only runs for a tiny fraction of all the cells).

Actually, there are a few ways in which the code I suggested above could be optimized further. For example, an obvious optimization is to get rid of the two-dimensional array accesses in the inner loop, since they require two array lookups each.

There are (at least) two ways you could do this:

-
a) In the outer loop, save the previous, current, and next rows of oldBuffer in local variables, like this:

boolean[] prevRow = oldBuffer[y-1],
          currRow = oldBuffer[y],
          nextRow = oldBuffer[y+1];

-
b) Make the buffers themselves one-dimensional arrays, and adjust the indexing so that, instead of buffer[y][x], you use buffer[ y * (width+2) + x ]. You can precalculate th

Code Snippets

// assume these arrays are (height + 2) by (width + 2)
boolean[][] oldBuffer = universeDoubleBuffer[flipFlopIndex],
            newBuffer = universeDoubleBuffer[newFlipFlopIndex];

for (int y = 1; y <= height; y++) {
    int environment
        = (oldBuffer[y-1][0] ? 32 : 0) + (oldBuffer[y-1][1] ?  4 : 0)
        + (oldBuffer[y  ][0] ? 16 : 0) + (oldBuffer[y  ][1] ?  2 : 0)
        + (oldBuffer[y+1][0] ?  8 : 0) + (oldBuffer[y+1][1] ?  1 : 0);

    for (int x = 1; x <= width; x++) {
        environment = ((environment % 64) * 8)
            + (oldBuffer[y-1][x+1] ? 4 : 0)
            + (oldBuffer[y  ][x+1] ? 2 : 0)
            + (oldBuffer[y+1][x+1] ? 1 : 0);

        newBuffer[y][x] = lookupTable[ environment ];
    }
}

# # _
_ # #
# _ _

(1 * 256) + (1 * 32) + (0 * 4) +
(0 * 128) + (1 * 16) + (1 * 2) +
(1 *  64) + (0 *  8) + (0 * 1) = 370

boolean[] prevRow = oldBuffer[y-1],
          currRow = oldBuffer[y],
          nextRow = oldBuffer[y+1];

// assume that buffer is a (height) by (width) byte array, and that
// xQueue and yQueue are (height * width) int arrays, of which the
// first (activeCells) elements contain the current active cell coords

// loop over active cells, and find those whose state will change
int changedCells = 0;
for (int i = 0; i < activeCells; i++) {
    int x = xQueue[i], y = yQueue[i];
    byte packedState = buffer[y][x];  // = 4*neighbors + 2*state + active
    boolean wasAlive = (oldState & 2 == 2);

    // quickly check if cell will live under the Game of Life rules
    // (10-11 = alive, 2 neighbors; 12-13 = dead, 3 neighbors; 14-15 = alive, 3 neighbors)
    boolean isAlive = (10 <= packedState && packedState <= 15);

    // add cell to queue if it died or was born
    // we know that changedCells <= i, so it's safe to reuse the same queue space
    if (wasAlive != isAlive) {
        xQueue[changedCells] = x;
        yQueue[changedCells] = y;
        changedCells++;
        // assume active bit is already 1, so no need to change
    }
    else {
        buffer[y][x] = packedState - (byte)1;  // unset active bit
    }
}

// carry out deferred state updates
activeCells = changedCells;
for (int i = 0; i < changedCells; i++) {
    int x = xQueue[i], y = yQueue[i];
    boolean wasAlive = (buffer[y][x] & 2 == 2);
    byte delta = (byte)(wasAlive ? -4 : +4);  // change in neighbor count * 4

    // update neighbor counts of adjacent cells, and mark them as active
    int yMin = (y > 0 ? y-1 : 0), yMax = (y < height-1 ? y+1 : height-1);
    int xMin = (x > 0 ? x-1 : 0), xMax = (x < width-1 ? x+1 : width-1);
    for (int ny = yMin; ny <= yMax; ny++) {
        for (int nx = xMin; nx <= xMax; nx++) {
            byte newState = buffer[ny][nx] + delta;
            // if this cell is not yet active, add it to the queue for next time step
            if (newState & 1 == 0) {
                xQueue[activeCells] = nx;
                yQueue[activeCells] = ny;
                activeCells++;
                newState += (byte)1;
            }
            buffer[ny][nx] = newState;
        }
    }
    // the middle cell only needs to adjusted by +/-2
    buffer[y][x] -= (byte)(delta / 2);
}

Context

StackExchange Code Review Q#42718, answer score: 17

Revisions (0)

No revisions yet.