Adding Python implementations.

This commit is contained in:
Julien Palard 2023-10-18 23:43:41 +02:00
parent 22a489d9f8
commit 316195cd64
Signed by: mdk
GPG Key ID: 0EFC1AC1006886F8
18 changed files with 450 additions and 59 deletions

143
README.md
View File

@ -7,80 +7,123 @@ I'm focusing only on the "flattening" step which I like to call "apply_gravity"
I'm focusing only in flattening a single huge pile of sand placed in the middle.
Here the performance I'm getting on an intel `i9-9980HK` to flatten a
100_000 sand grain tower:
20_000 sand grain tower:
## sandpile_quarter.c
# Profiling Python implementations
This is my fastest implemtation, it relies on the 8-fold symetry of
flattening a centered pile. I in fact only rely on a 4-fold symmetry
for simplicity, so it could be enchanced again.
## sandpile.py
> Mean +- std dev: 208 ms +- 5 ms
This is the easy implementation, just applying the fireing rule repeateadly.
> Mean +- std dev: 8.87 sec +- 0.27 sec
## sandpile_arithmetic.py
This one uses integer division to fire columns instead of
repeteadly removing 4 grains.
> Mean +- std dev: 1.85 sec +- 0.04 sec
## sandpile_array.py
Try using Python's arrays instead of lists.
> Mean +- std dev: 3.13 sec +- 0.12 sec
## sandpile_numba.py
Just compiling with numba.
> Mean +- std dev: 68.7 ms +- 1.2 ms
## sandpile_numpy.py
Just using numpy.
> Mean +- std dev: 972 ms +- 20 ms
## sandpile_pythran.py
> Mean +- std dev: 306 ms +- 8 ms
## sandpile_recursive.py
This one is "recursive" but only to the left, where the linear scan
won't go until the next run. Those bonus fires can push grains to the
next line where they're picked again by the linear scan.
> Mean +- std dev: 1.48 sec +- 0.03 sec
# Profiling C implementations
## sandpile.c
This is the "dumb" implementation in C.
> command: Mean +- std dev: 31.6 ms +- 0.2 ms
## sandpile_1d.c
This one does not rely on pointers of pointers to represent a 2d
array. So no `[x][y]` access. Instead it represents the surface as an
1d line, in which, if the current cell is `i`:
array. So no `[x][y]` access. Instead it represents the surface
as an 1d line, in which, if the current cell is `i`:
- the previous cell is at `surface[i - 1]`
- the cell in the next row is at `surface[i + width]`
- the cell in the previous row is at `surface[i - width]`
- the next cell is at `surface[i + 1]`
> Mean +- std dev: 731 ms +- 9 ms
## sandpile.c
This is the "dumb" implementation.
> Mean +- std dev: 750 ms +- 10 ms
## sandpile_struct.c
This one is just to measure the cost of using a struct to store together the width and the values.
> Mean +- std dev: 936 ms +- 16 ms
## sandpile_quarter_fifo.c
This is a merge of the `fifo` implementation and the `quarter` one.
> Mean +- std dev: 831 ms +- 14 ms
## sandpile_pointers.c
This one stores, for each cells, 4 pointers to the west, south, north,
and east cells so incrementing them is easy.
> Mean +- std dev: 933 ms +- 27 ms
> command: Mean +- std dev: 30.0 ms +- 0.2 ms
## sandpile_add.c
This one starts with a small (1024 typically) power of two stack of
sand grains, applies gravity on it, and multiply the resulting grid by
two. This is done iteratively until the needed number of grains is
reached.
This one starts with a small (1024 typically) power of two stack
of sand grains, applies gravity on it, and multiply the resulting
grid by two. This is done iteratively until the needed number of
grains is reached.
The resulting grid is the same, but the performance are not better.
> Mean +- std dev: 1.18 sec +- 0.03 sec
The resulting grid is the same, but the performance are not
better.
> command: Mean +- std dev: 52.0 ms +- 0.6 ms
## sandpile_fifo.c
This one is the "smart" one, instead of doing a full scan of the grid,
a FIFO of points to fire is kept up to date each time a cell is
firered.
This one is the "smart" one, instead of doing a full scan of the
grid, a FIFO of points to fire is kept up to date each time a cell
is firered. Sadly "smart" is not smart.
> command: Mean +- std dev: 114 ms +- 3 ms
> Mean +- std dev: 2.81 sec +- 0.05 sec
## sandpile_pointers.c
This one stores, for each cells, 4 pointers to the west, south,
north, and east cells so incrementing them is easy.
> command: Mean +- std dev: 38.2 ms +- 0.4 ms
## sandpile_quarter.c
This is my fastest implemtation, it relies on the 8-fold symetry
of flattening a centered pile. I in fact only rely on a 4-fold
symmetry for simplicity, so it could be enchanced again.
> command: Mean +- std dev: 8.75 ms +- 0.04 ms
## sandpile_quarter_fifo.c
This is a merge of the `fifo` implementation and the `quarter`
one.
> command: Mean +- std dev: 34.1 ms +- 0.4 ms
## sandpile_struct.c
This one is just to measure the cost of using a struct to store
together the width and the values.
> command: Mean +- std dev: 37.5 ms +- 0.3 ms
## TODO

View File

@ -1,12 +1,38 @@
for file in sandpile.c sandpile_*.c
TARGET=20000
echo "# Profiling Python implementations"
echo
for file in sandpile*.py
do
echo "## $file"
dest=${file%.c}
cc -DNDEBUG -Ofast -W -Wall -pedantic --std=c99 "$file" -lm -lpng -o "$dest"
rm -f "$dest-100000.png"
python -m pyperf command --fast --quiet "./$dest" 100000
./"$dest" 16 --stdout
./"$dest" 100000 "$dest-100000.png"
printf "## %s\n\n" "$file"
module="${file%.py}"
python -c "print(__import__('$module').__doc__)"
grep '^#compiling:' "$file" | cut -d: -f2- | bash -x
printf "> "
python -m pyperf timeit --fast --quiet --setup "from $module import main" "main($TARGET, False)"
python -c "__import__('$module').main($TARGET, True)" > $module-$TARGET.txt
echo
echo
done
sha1sum ./*-100000.png
sha1sum ./*-$TARGET.txt
rm ./*-$TARGET.txt
echo
echo "# Profiling C implementations"
echo
for file in sandpile.c sandpile_*.c
do
printf "## %s\n\n" "$file"
grep '^///' "$file" | cut -b5-
dest="${file%.c}"
cc -DNDEBUG -Ofast -W -Wall -pedantic --std=c99 "$file" -lm -lpng -o "$dest"
rm -f "$dest-$TARGET.png"
printf "> "
python -m pyperf command --quiet "./$dest" $TARGET
./"$dest" $TARGET "$dest-$TARGET.png"
echo
echo
done
sha1sum ./*-$TARGET.png
rm ./*-$TARGET.png

5
requirements.txt Normal file
View File

@ -0,0 +1,5 @@
numpy
numba
pyperf
pythran
cython

View File

@ -8,6 +8,9 @@
#include "common.c"
/// This is the "dumb" implementation in C.
void apply_gravity(int width, int **terrain)
{
bool did_something;

40
sandpile.py Normal file
View File

@ -0,0 +1,40 @@
"""This is the easy implementation, just applying the fireing rule repeateadly."""
import sys
def apply_gravity(terrain):
"""
$ python -m pyperf timeit --fast -s 'from examples.sandpile1 import main' 'main(10_000, False)'
...........
Mean +- std dev: 11.1 sec +- 0.2 sec
"""
while True:
width = len(terrain)
did_someting = False
for x in range(width):
for y in range(width):
if terrain[x][y] >= 4:
terrain[x][y] -= 4
terrain[x - 1][y] += 1
terrain[x + 1][y] += 1
terrain[x][y + 1] += 1
terrain[x][y - 1] += 1
did_someting = True
if not did_someting:
return
def main(height, show=True):
width = int(height ** .5) + 1
terrain = [[0] * width for _ in range(width)]
terrain[width // 2][width // 2] = height
apply_gravity(terrain)
if show:
import numpy
numpy.set_printoptions(threshold=sys.maxsize, linewidth=999)
print(numpy.array(terrain))
if __name__ == "__main__":
main(int(sys.argv[1]))

View File

@ -7,6 +7,15 @@
#include "common.c"
/// This one does not rely on pointers of pointers to represent a 2d
/// array. So no `[x][y]` access. Instead it represents the surface
/// as an 1d line, in which, if the current cell is `i`:
///
/// - the previous cell is at `surface[i - 1]`
/// - the cell in the next row is at `surface[i + width]`
/// - the cell in the previous row is at `surface[i - width]`
/// - the next cell is at `surface[i + 1]`
void apply_gravity(int width, int *terrain)
{

View File

@ -8,6 +8,14 @@
#include "common.c"
/// This one starts with a small (1024 typically) power of two stack
/// of sand grains, applies gravity on it, and multiply the resulting
/// grid by two. This is done iteratively until the needed number of
/// grains is reached.
///
/// The resulting grid is the same, but the performance are not
/// better.
void sandpile_multiply(int width, int **terrain, int mul)
{
for (int x = 0; x < width; x++)

41
sandpile_arithmetic.py Normal file
View File

@ -0,0 +1,41 @@
"""This one uses integer division to fire columns instead of
repeteadly removing 4 grains."""
import sys
def apply_gravity(terrain):
"""
$ python -m pyperf timeit --fast -s 'from examples.sandpile2 import main' 'main(10_000, False)'
...........
Mean +- std dev: 2.42 sec +- 0.04 sec
"""
width = len(terrain)
while True:
did_someting = False
for x in range(width):
for y in range(width):
if terrain[x][y] >= 4:
div, terrain[x][y] = divmod(terrain[x][y], 4)
terrain[x - 1][y] += div
terrain[x + 1][y] += div
terrain[x][y + 1] += div
terrain[x][y - 1] += div
did_someting = True
if not did_someting:
break
def main(height, show=True):
width = int(height ** .5) + 1
terrain = [[0] * width for _ in range(width)]
terrain[width // 2][width // 2] = height
apply_gravity(terrain)
if show:
import numpy
numpy.set_printoptions(threshold=sys.maxsize, linewidth=999)
print(numpy.array(terrain))
if __name__ == "__main__":
main(int(sys.argv[1]))

35
sandpile_array.py Normal file
View File

@ -0,0 +1,35 @@
"""Try using Python's arrays instead of lists."""
from array import array
import sys
def apply_gravity(terrain, width):
while True:
did_someting = False
for x in range(width ** 2):
if terrain[x] >= 4:
div, terrain[x] = divmod(terrain[x], 4)
terrain[x - 1] += div
terrain[x + 1] += div
terrain[x - width] += div
terrain[x + width] += div
did_someting = True
if not did_someting:
break
def main(height, show=True):
width = int(height ** .5) + 1
terrain = array("Q", [0] * width ** 2)
terrain[width // 2 * width + width // 2] = height
apply_gravity(terrain, width)
if show:
import numpy
numpy.set_printoptions(threshold=sys.maxsize, linewidth=999)
print(numpy.array(terrain).reshape((width, width)))
if __name__ == "__main__":
main(int(sys.argv[1]))

View File

@ -8,6 +8,9 @@
#include "common.c"
/// This one is the "smart" one, instead of doing a full scan of the
/// grid, a FIFO of points to fire is kept up to date each time a cell
/// is firered. Sadly "smart" is not smart.
typedef struct point {
int x;

44
sandpile_numba.py Normal file
View File

@ -0,0 +1,44 @@
"""Just compiling with numba."""
import numba
import numpy as np
@numba.njit(numba.void(numba.int64[:,:]))
def apply_gravity(terrain):
"""
$ python -m pyperf timeit --fast -s 'from examples.sandpile_numpy import main' 'main(10_000, False)'
Mean +- std dev: 19.2 ms +- 0.3 ms
$ python -m pyperf timeit --fast -s 'from examples.sandpile_numpy import main' 'main(100_000, False)'
Mean +- std dev: 1.83 sec +- 0.04 sec
2**16 grains : 1.6s
2**17 grains : 3.9s
2**18 grains : 13s
2**19 grains : 51s
2**20 grains : 3m5
"""
shape = np.shape(terrain)
while True:
did_someting = False
for x, y in np.ndindex(shape):
if terrain[x, y] >= 4:
div, terrain[x, y] = divmod(terrain[x][y], 4)
terrain[x - 1][y] += div
terrain[x + 1][y] += div
terrain[x][y + 1] += div
terrain[x][y - 1] += div
did_someting = True
if not did_someting:
return
def main(height, show=True):
width = int(height**0.5) + 1
terrain = np.zeros((width, width), dtype=np.int64)
terrain[width // 2, width // 2] = height
apply_gravity(terrain)
if show:
import sys
np.set_printoptions(threshold=sys.maxsize, linewidth=999)
print(terrain)

30
sandpile_numpy.py Normal file
View File

@ -0,0 +1,30 @@
"""Just using numpy."""
import numpy as np
import sys
def apply_gravity(terrain):
tumbled = terrain.copy()
while True:
np.divmod(terrain, 4, tumbled, terrain)
if not np.any(tumbled):
return
terrain[1:, :] += tumbled[:-1, :]
terrain[:-1, :] += tumbled[1:, :]
terrain[:, 1:] += tumbled[:, :-1]
terrain[:, :-1] += tumbled[:, 1:]
def main(height, show=True):
width = int(height**0.5) + 1
terrain = np.zeros((width, width), dtype=np.int64)
terrain[width // 2, width // 2] = height
apply_gravity(terrain)
if show:
np.set_printoptions(threshold=sys.maxsize, linewidth=999)
print(terrain)
if __name__ == "__main__":
main(int(sys.argv[1]))

View File

@ -7,6 +7,10 @@
#include "common.c"
/// This one stores, for each cells, 4 pointers to the west, south,
/// north, and east cells so incrementing them is easy.
typedef struct cell cell;
struct cell {

36
sandpile_pythran.py Normal file
View File

@ -0,0 +1,36 @@
"""Compiling with pythran."""
#compiling: pythran sandpile_pythran.py
import numpy as np
#pythran export apply_gravity(int[][])
def apply_gravity(terrain):
shape = np.shape(terrain)
while True:
did_someting = False
for x, y in np.ndindex(shape):
if terrain[x, y] >= 4:
div, terrain[x, y] = divmod(terrain[x][y], 4)
terrain[x - 1][y] += div
terrain[x + 1][y] += div
terrain[x][y + 1] += div
terrain[x][y - 1] += div
did_someting = True
if not did_someting:
return
#pythran export main(int, bool)
def main(height, show):
width = int(height ** .5) + 1
terrain = np.zeros((width, width), dtype=np.int64)
terrain[width // 2, width // 2] = height
apply_gravity(terrain)
if show:
print(np.array(terrain))

View File

@ -1,3 +1,7 @@
/// This is my fastest implemtation, it relies on the 8-fold symetry
/// of flattening a centered pile. I in fact only rely on a 4-fold
/// symmetry for simplicity, so it could be enchanced again.
/*
Some performance measures:

View File

@ -8,6 +8,10 @@
#include "common.c"
/// This is a merge of the `fifo` implementation and the `quarter`
/// one.
typedef struct terrain {
int width;
int **cells;

53
sandpile_recursive.py Normal file
View File

@ -0,0 +1,53 @@
"""This one is "recursive" but only to the left, where the linear scan
won't go until the next run. Those bonus fires can push grains to the
next line where they're picked again by the linear scan."""
import sys
def topple(terrain, x, y):
div, terrain[x][y] = divmod(terrain[x][y], 4)
terrain[x - 1][y] += div
if terrain[x-1][y] >= 4:
topple(terrain, x - 1, y)
terrain[x + 1][y] += div
terrain[x][y - 1] += div
terrain[x][y + 1] += div
def apply_gravity(terrain):
"""
$ python -m pyperf timeit --fast -s 'from examples.sandpile3 import main' 'main(10_000, False)'
...........
Mean +- std dev: 2.02 sec +- 0.04 sec
"""
width = len(terrain)
while True:
did_someting = False
for x in range(width):
for y in range(width):
if terrain[x][y] >= 4:
topple(terrain, x, y)
did_someting = True
if not did_someting:
return
def main(height, show=True):
width = int(height ** .5) + 1
terrain = [[0] * width for _ in range(width)]
terrain[width // 2][width // 2] = height
apply_gravity(terrain)
if show:
import numpy as np
np.set_printoptions(threshold=sys.maxsize, linewidth=999)
print(np.array(terrain))
if __name__ == "__main__":
main(int(sys.argv[1]))

View File

@ -7,6 +7,9 @@
#include "common.c"
/// This one is just to measure the cost of using a struct to store
/// together the width and the values.
typedef struct terrain {
int width;