patternMinor

Are today's massive parallel processing units able to run cellular automata efficiently?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

unitsefficientlyrunmassivearetodaycellularautomataparallelprocessing

Problem

I wonder whether the massively parallel computation units provided in graphic cards nowadays (one that is programmable in OpenCL, for example) are good enough to simulate 1D cellular automata (or maybe 2D cellular automata?) efficiently.

If we choose whatever finite grid would fit inside the memory of the chip, can we expect one transition of a cellular automaton defined on this grid to be computed in (quasi)constant time?

I assume 2D cellular automata would require more bandwidth for communication between the different parts of the chips than 1D automata.

I'd also be interested by the same question in the case of FPGA programming or custom chips.

Solution

Excellent question. I believe the answer is yes.

Evolving a cellular automaton is essentially equivalent to performing a stencil computation. On some 1D, 2D, or 3D grid, successive values of points (or cells) are computed based on the last value of point's neighborhood. In a simple 1D CA, this neighborhood might be the cell and the two cells to the left and right. There are lots of examples of stencil computations being performed on GPUs; ORNL's SHOC benchmark suite for OpenCL/CUDA contains a 2D stencil example, for instance.

The basic idea is to have each thread get a local copy of the neighborhood for several points, then compute the next values for points determined by that neighborhood. By appropriately using the memory hierarchy in e.g. CUDA (registers, shared, constant, texture, and global memories) and the SIMT processing model (e.g., by appropriately computing the transition function without introducing excessive warp divergence), good performance can be achieved.

This answer would be a lot better if I were to give an example, but I am too busy to write any code right now... But in theory, I think it should be feasible to efficiently simulate CAs on GPUs by modeling them after stencil computations. Lots of considerations go into writing a good stencil computation for GPUs, though.

Context

StackExchange Computer Science Q#256, answer score: 7

Revisions (0)

No revisions yet.