.gitignore | ||
2**20.png | ||
2**21.png | ||
2**22.png | ||
2**23.png | ||
2**24.png | ||
2**25.png | ||
2**26.png | ||
check.sh | ||
common.c | ||
README.md | ||
requirements.txt | ||
sandpile_1d.c | ||
sandpile_add.c | ||
sandpile_arithmetic.py | ||
sandpile_array.py | ||
sandpile_fifo.c | ||
sandpile_numba.py | ||
sandpile_numpy.py | ||
sandpile_pointers.c | ||
sandpile_pythran.py | ||
sandpile_quarter_fifo.c | ||
sandpile_quarter.c | ||
sandpile_recursive.py | ||
sandpile_struct.c | ||
sandpile-video.c | ||
sandpile.c | ||
sandpile.py |
Searching for a fast Abelian Sandpile implementation
For context about the Abelian Sandpile Model see: https://en.wikipedia.org/wiki/Abelian_sandpile_model
I'm focusing only on the "flattening" step which I like to call "apply_gravity" in my code.
I'm focusing only in flattening a single huge pile of sand placed in the middle.
Here the performance I'm getting on an intel i9-9980HK
to flatten a
20_000 sand grain tower:
Profiling Python implementations
sandpile.py
This is the easy implementation, just applying the fireing rule repeateadly.
Mean +- std dev: 8.87 sec +- 0.27 sec
sandpile_arithmetic.py
This one uses integer division to fire columns instead of repeteadly removing 4 grains.
Mean +- std dev: 1.85 sec +- 0.04 sec
sandpile_array.py
Try using Python's arrays instead of lists.
Mean +- std dev: 3.13 sec +- 0.12 sec
sandpile_numba.py
Just compiling with numba.
Mean +- std dev: 68.7 ms +- 1.2 ms
sandpile_numpy.py
Just using numpy.
Mean +- std dev: 972 ms +- 20 ms
sandpile_pythran.py
Mean +- std dev: 306 ms +- 8 ms
sandpile_recursive.py
This one is "recursive" but only to the left, where the linear scan won't go until the next run. Those bonus fires can push grains to the next line where they're picked again by the linear scan.
Mean +- std dev: 1.48 sec +- 0.03 sec
Profiling C implementations
sandpile.c
This is the "dumb" implementation in C.
command: Mean +- std dev: 31.6 ms +- 0.2 ms
sandpile_1d.c
This one does not rely on pointers of pointers to represent a 2d
array. So no [x][y]
access. Instead it represents the surface
as an 1d line, in which, if the current cell is i
:
- the previous cell is at
surface[i - 1]
- the cell in the next row is at
surface[i + width]
- the cell in the previous row is at
surface[i - width]
- the next cell is at
surface[i + 1]
command: Mean +- std dev: 30.0 ms +- 0.2 ms
sandpile_add.c
This one starts with a small (1024 typically) power of two stack of sand grains, applies gravity on it, and multiply the resulting grid by two. This is done iteratively until the needed number of grains is reached.
The resulting grid is the same, but the performance are not better.
command: Mean +- std dev: 52.0 ms +- 0.6 ms
sandpile_fifo.c
This one is the "smart" one, instead of doing a full scan of the grid, a FIFO of points to fire is kept up to date each time a cell is firered. Sadly "smart" is not smart.
command: Mean +- std dev: 114 ms +- 3 ms
sandpile_pointers.c
This one stores, for each cells, 4 pointers to the west, south, north, and east cells so incrementing them is easy.
command: Mean +- std dev: 38.2 ms +- 0.4 ms
sandpile_quarter.c
This is my fastest implemtation, it relies on the 8-fold symetry of flattening a centered pile. I in fact only rely on a 4-fold symmetry for simplicity, so it could be enchanced again.
command: Mean +- std dev: 8.75 ms +- 0.04 ms
sandpile_quarter_fifo.c
This is a merge of the fifo
implementation and the quarter
one.
command: Mean +- std dev: 34.1 ms +- 0.4 ms
sandpile_struct.c
This one is just to measure the cost of using a struct to store together the width and the values.
command: Mean +- std dev: 37.5 ms +- 0.3 ms
TODO
- Try a CUDA implementation.
- Try a multithreaded implementation (beware of the lock bottleneck!).