Adding Python implementations.
This commit is contained in:
parent
22a489d9f8
commit
316195cd64
143
README.md
143
README.md
|
@ -7,80 +7,123 @@ I'm focusing only on the "flattening" step which I like to call "apply_gravity"
|
|||
I'm focusing only in flattening a single huge pile of sand placed in the middle.
|
||||
|
||||
Here the performance I'm getting on an intel `i9-9980HK` to flatten a
|
||||
100_000 sand grain tower:
|
||||
20_000 sand grain tower:
|
||||
|
||||
|
||||
## sandpile_quarter.c
|
||||
# Profiling Python implementations
|
||||
|
||||
This is my fastest implemtation, it relies on the 8-fold symetry of
|
||||
flattening a centered pile. I in fact only rely on a 4-fold symmetry
|
||||
for simplicity, so it could be enchanced again.
|
||||
## sandpile.py
|
||||
|
||||
> Mean +- std dev: 208 ms +- 5 ms
|
||||
This is the easy implementation, just applying the fireing rule repeateadly.
|
||||
> Mean +- std dev: 8.87 sec +- 0.27 sec
|
||||
|
||||
|
||||
## sandpile_arithmetic.py
|
||||
|
||||
This one uses integer division to fire columns instead of
|
||||
repeteadly removing 4 grains.
|
||||
> Mean +- std dev: 1.85 sec +- 0.04 sec
|
||||
|
||||
|
||||
## sandpile_array.py
|
||||
|
||||
Try using Python's arrays instead of lists.
|
||||
> Mean +- std dev: 3.13 sec +- 0.12 sec
|
||||
|
||||
|
||||
## sandpile_numba.py
|
||||
|
||||
Just compiling with numba.
|
||||
> Mean +- std dev: 68.7 ms +- 1.2 ms
|
||||
|
||||
|
||||
## sandpile_numpy.py
|
||||
|
||||
Just using numpy.
|
||||
> Mean +- std dev: 972 ms +- 20 ms
|
||||
|
||||
|
||||
## sandpile_pythran.py
|
||||
|
||||
> Mean +- std dev: 306 ms +- 8 ms
|
||||
|
||||
|
||||
## sandpile_recursive.py
|
||||
|
||||
This one is "recursive" but only to the left, where the linear scan
|
||||
won't go until the next run. Those bonus fires can push grains to the
|
||||
next line where they're picked again by the linear scan.
|
||||
> Mean +- std dev: 1.48 sec +- 0.03 sec
|
||||
|
||||
|
||||
# Profiling C implementations
|
||||
|
||||
## sandpile.c
|
||||
|
||||
This is the "dumb" implementation in C.
|
||||
> command: Mean +- std dev: 31.6 ms +- 0.2 ms
|
||||
|
||||
|
||||
## sandpile_1d.c
|
||||
|
||||
This one does not rely on pointers of pointers to represent a 2d
|
||||
array. So no `[x][y]` access. Instead it represents the surface as an
|
||||
1d line, in which, if the current cell is `i`:
|
||||
array. So no `[x][y]` access. Instead it represents the surface
|
||||
as an 1d line, in which, if the current cell is `i`:
|
||||
|
||||
- the previous cell is at `surface[i - 1]`
|
||||
- the cell in the next row is at `surface[i + width]`
|
||||
- the cell in the previous row is at `surface[i - width]`
|
||||
- the next cell is at `surface[i + 1]`
|
||||
|
||||
> Mean +- std dev: 731 ms +- 9 ms
|
||||
|
||||
|
||||
## sandpile.c
|
||||
|
||||
This is the "dumb" implementation.
|
||||
|
||||
> Mean +- std dev: 750 ms +- 10 ms
|
||||
|
||||
|
||||
## sandpile_struct.c
|
||||
|
||||
This one is just to measure the cost of using a struct to store together the width and the values.
|
||||
|
||||
> Mean +- std dev: 936 ms +- 16 ms
|
||||
|
||||
|
||||
## sandpile_quarter_fifo.c
|
||||
|
||||
This is a merge of the `fifo` implementation and the `quarter` one.
|
||||
|
||||
> Mean +- std dev: 831 ms +- 14 ms
|
||||
|
||||
|
||||
## sandpile_pointers.c
|
||||
|
||||
This one stores, for each cells, 4 pointers to the west, south, north,
|
||||
and east cells so incrementing them is easy.
|
||||
|
||||
> Mean +- std dev: 933 ms +- 27 ms
|
||||
> command: Mean +- std dev: 30.0 ms +- 0.2 ms
|
||||
|
||||
|
||||
## sandpile_add.c
|
||||
|
||||
This one starts with a small (1024 typically) power of two stack of
|
||||
sand grains, applies gravity on it, and multiply the resulting grid by
|
||||
two. This is done iteratively until the needed number of grains is
|
||||
reached.
|
||||
This one starts with a small (1024 typically) power of two stack
|
||||
of sand grains, applies gravity on it, and multiply the resulting
|
||||
grid by two. This is done iteratively until the needed number of
|
||||
grains is reached.
|
||||
|
||||
The resulting grid is the same, but the performance are not better.
|
||||
|
||||
> Mean +- std dev: 1.18 sec +- 0.03 sec
|
||||
The resulting grid is the same, but the performance are not
|
||||
better.
|
||||
> command: Mean +- std dev: 52.0 ms +- 0.6 ms
|
||||
|
||||
|
||||
## sandpile_fifo.c
|
||||
|
||||
This one is the "smart" one, instead of doing a full scan of the grid,
|
||||
a FIFO of points to fire is kept up to date each time a cell is
|
||||
firered.
|
||||
This one is the "smart" one, instead of doing a full scan of the
|
||||
grid, a FIFO of points to fire is kept up to date each time a cell
|
||||
is firered. Sadly "smart" is not smart.
|
||||
> command: Mean +- std dev: 114 ms +- 3 ms
|
||||
|
||||
> Mean +- std dev: 2.81 sec +- 0.05 sec
|
||||
|
||||
## sandpile_pointers.c
|
||||
|
||||
This one stores, for each cells, 4 pointers to the west, south,
|
||||
north, and east cells so incrementing them is easy.
|
||||
> command: Mean +- std dev: 38.2 ms +- 0.4 ms
|
||||
|
||||
|
||||
## sandpile_quarter.c
|
||||
|
||||
This is my fastest implemtation, it relies on the 8-fold symetry
|
||||
of flattening a centered pile. I in fact only rely on a 4-fold
|
||||
symmetry for simplicity, so it could be enchanced again.
|
||||
> command: Mean +- std dev: 8.75 ms +- 0.04 ms
|
||||
|
||||
|
||||
## sandpile_quarter_fifo.c
|
||||
|
||||
This is a merge of the `fifo` implementation and the `quarter`
|
||||
one.
|
||||
> command: Mean +- std dev: 34.1 ms +- 0.4 ms
|
||||
|
||||
|
||||
## sandpile_struct.c
|
||||
|
||||
This one is just to measure the cost of using a struct to store
|
||||
together the width and the values.
|
||||
> command: Mean +- std dev: 37.5 ms +- 0.3 ms
|
||||
|
||||
|
||||
## TODO
|
||||
|
|
44
check.sh
44
check.sh
|
@ -1,12 +1,38 @@
|
|||
for file in sandpile.c sandpile_*.c
|
||||
TARGET=20000
|
||||
|
||||
echo "# Profiling Python implementations"
|
||||
echo
|
||||
for file in sandpile*.py
|
||||
do
|
||||
echo "## $file"
|
||||
dest=${file%.c}
|
||||
cc -DNDEBUG -Ofast -W -Wall -pedantic --std=c99 "$file" -lm -lpng -o "$dest"
|
||||
rm -f "$dest-100000.png"
|
||||
python -m pyperf command --fast --quiet "./$dest" 100000
|
||||
./"$dest" 16 --stdout
|
||||
./"$dest" 100000 "$dest-100000.png"
|
||||
printf "## %s\n\n" "$file"
|
||||
module="${file%.py}"
|
||||
python -c "print(__import__('$module').__doc__)"
|
||||
grep '^#compiling:' "$file" | cut -d: -f2- | bash -x
|
||||
printf "> "
|
||||
python -m pyperf timeit --fast --quiet --setup "from $module import main" "main($TARGET, False)"
|
||||
python -c "__import__('$module').main($TARGET, True)" > $module-$TARGET.txt
|
||||
echo
|
||||
echo
|
||||
done
|
||||
sha1sum ./*-100000.png
|
||||
|
||||
sha1sum ./*-$TARGET.txt
|
||||
rm ./*-$TARGET.txt
|
||||
|
||||
echo
|
||||
echo "# Profiling C implementations"
|
||||
echo
|
||||
for file in sandpile.c sandpile_*.c
|
||||
do
|
||||
printf "## %s\n\n" "$file"
|
||||
grep '^///' "$file" | cut -b5-
|
||||
dest="${file%.c}"
|
||||
cc -DNDEBUG -Ofast -W -Wall -pedantic --std=c99 "$file" -lm -lpng -o "$dest"
|
||||
rm -f "$dest-$TARGET.png"
|
||||
printf "> "
|
||||
python -m pyperf command --quiet "./$dest" $TARGET
|
||||
./"$dest" $TARGET "$dest-$TARGET.png"
|
||||
echo
|
||||
echo
|
||||
done
|
||||
sha1sum ./*-$TARGET.png
|
||||
rm ./*-$TARGET.png
|
||||
|
|
|
@ -0,0 +1,5 @@
|
|||
numpy
|
||||
numba
|
||||
pyperf
|
||||
pythran
|
||||
cython
|
|
@ -8,6 +8,9 @@
|
|||
#include "common.c"
|
||||
|
||||
|
||||
/// This is the "dumb" implementation in C.
|
||||
|
||||
|
||||
void apply_gravity(int width, int **terrain)
|
||||
{
|
||||
bool did_something;
|
||||
|
|
|
@ -0,0 +1,40 @@
|
|||
"""This is the easy implementation, just applying the fireing rule repeateadly."""
|
||||
|
||||
import sys
|
||||
|
||||
|
||||
def apply_gravity(terrain):
|
||||
"""
|
||||
$ python -m pyperf timeit --fast -s 'from examples.sandpile1 import main' 'main(10_000, False)'
|
||||
...........
|
||||
Mean +- std dev: 11.1 sec +- 0.2 sec
|
||||
"""
|
||||
while True:
|
||||
width = len(terrain)
|
||||
did_someting = False
|
||||
for x in range(width):
|
||||
for y in range(width):
|
||||
if terrain[x][y] >= 4:
|
||||
terrain[x][y] -= 4
|
||||
terrain[x - 1][y] += 1
|
||||
terrain[x + 1][y] += 1
|
||||
terrain[x][y + 1] += 1
|
||||
terrain[x][y - 1] += 1
|
||||
did_someting = True
|
||||
if not did_someting:
|
||||
return
|
||||
|
||||
|
||||
def main(height, show=True):
|
||||
width = int(height ** .5) + 1
|
||||
terrain = [[0] * width for _ in range(width)]
|
||||
terrain[width // 2][width // 2] = height
|
||||
apply_gravity(terrain)
|
||||
if show:
|
||||
import numpy
|
||||
numpy.set_printoptions(threshold=sys.maxsize, linewidth=999)
|
||||
print(numpy.array(terrain))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main(int(sys.argv[1]))
|
|
@ -7,6 +7,15 @@
|
|||
|
||||
#include "common.c"
|
||||
|
||||
/// This one does not rely on pointers of pointers to represent a 2d
|
||||
/// array. So no `[x][y]` access. Instead it represents the surface
|
||||
/// as an 1d line, in which, if the current cell is `i`:
|
||||
///
|
||||
/// - the previous cell is at `surface[i - 1]`
|
||||
/// - the cell in the next row is at `surface[i + width]`
|
||||
/// - the cell in the previous row is at `surface[i - width]`
|
||||
/// - the next cell is at `surface[i + 1]`
|
||||
|
||||
|
||||
void apply_gravity(int width, int *terrain)
|
||||
{
|
||||
|
|
|
@ -8,6 +8,14 @@
|
|||
#include "common.c"
|
||||
|
||||
|
||||
/// This one starts with a small (1024 typically) power of two stack
|
||||
/// of sand grains, applies gravity on it, and multiply the resulting
|
||||
/// grid by two. This is done iteratively until the needed number of
|
||||
/// grains is reached.
|
||||
///
|
||||
/// The resulting grid is the same, but the performance are not
|
||||
/// better.
|
||||
|
||||
void sandpile_multiply(int width, int **terrain, int mul)
|
||||
{
|
||||
for (int x = 0; x < width; x++)
|
||||
|
|
|
@ -0,0 +1,41 @@
|
|||
"""This one uses integer division to fire columns instead of
|
||||
repeteadly removing 4 grains."""
|
||||
|
||||
import sys
|
||||
|
||||
|
||||
def apply_gravity(terrain):
|
||||
"""
|
||||
$ python -m pyperf timeit --fast -s 'from examples.sandpile2 import main' 'main(10_000, False)'
|
||||
...........
|
||||
Mean +- std dev: 2.42 sec +- 0.04 sec
|
||||
"""
|
||||
width = len(terrain)
|
||||
while True:
|
||||
did_someting = False
|
||||
for x in range(width):
|
||||
for y in range(width):
|
||||
if terrain[x][y] >= 4:
|
||||
div, terrain[x][y] = divmod(terrain[x][y], 4)
|
||||
terrain[x - 1][y] += div
|
||||
terrain[x + 1][y] += div
|
||||
terrain[x][y + 1] += div
|
||||
terrain[x][y - 1] += div
|
||||
did_someting = True
|
||||
if not did_someting:
|
||||
break
|
||||
|
||||
|
||||
def main(height, show=True):
|
||||
width = int(height ** .5) + 1
|
||||
terrain = [[0] * width for _ in range(width)]
|
||||
terrain[width // 2][width // 2] = height
|
||||
apply_gravity(terrain)
|
||||
if show:
|
||||
import numpy
|
||||
numpy.set_printoptions(threshold=sys.maxsize, linewidth=999)
|
||||
print(numpy.array(terrain))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main(int(sys.argv[1]))
|
|
@ -0,0 +1,35 @@
|
|||
"""Try using Python's arrays instead of lists."""
|
||||
|
||||
|
||||
from array import array
|
||||
import sys
|
||||
|
||||
|
||||
def apply_gravity(terrain, width):
|
||||
while True:
|
||||
did_someting = False
|
||||
for x in range(width ** 2):
|
||||
if terrain[x] >= 4:
|
||||
div, terrain[x] = divmod(terrain[x], 4)
|
||||
terrain[x - 1] += div
|
||||
terrain[x + 1] += div
|
||||
terrain[x - width] += div
|
||||
terrain[x + width] += div
|
||||
did_someting = True
|
||||
if not did_someting:
|
||||
break
|
||||
|
||||
|
||||
def main(height, show=True):
|
||||
width = int(height ** .5) + 1
|
||||
terrain = array("Q", [0] * width ** 2)
|
||||
terrain[width // 2 * width + width // 2] = height
|
||||
apply_gravity(terrain, width)
|
||||
if show:
|
||||
import numpy
|
||||
numpy.set_printoptions(threshold=sys.maxsize, linewidth=999)
|
||||
print(numpy.array(terrain).reshape((width, width)))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main(int(sys.argv[1]))
|
|
@ -8,6 +8,9 @@
|
|||
|
||||
#include "common.c"
|
||||
|
||||
/// This one is the "smart" one, instead of doing a full scan of the
|
||||
/// grid, a FIFO of points to fire is kept up to date each time a cell
|
||||
/// is firered. Sadly "smart" is not smart.
|
||||
|
||||
typedef struct point {
|
||||
int x;
|
||||
|
|
|
@ -0,0 +1,44 @@
|
|||
"""Just compiling with numba."""
|
||||
import numba
|
||||
import numpy as np
|
||||
|
||||
|
||||
@numba.njit(numba.void(numba.int64[:,:]))
|
||||
def apply_gravity(terrain):
|
||||
"""
|
||||
$ python -m pyperf timeit --fast -s 'from examples.sandpile_numpy import main' 'main(10_000, False)'
|
||||
Mean +- std dev: 19.2 ms +- 0.3 ms
|
||||
$ python -m pyperf timeit --fast -s 'from examples.sandpile_numpy import main' 'main(100_000, False)'
|
||||
Mean +- std dev: 1.83 sec +- 0.04 sec
|
||||
|
||||
2**16 grains : 1.6s
|
||||
2**17 grains : 3.9s
|
||||
2**18 grains : 13s
|
||||
2**19 grains : 51s
|
||||
2**20 grains : 3m5
|
||||
"""
|
||||
shape = np.shape(terrain)
|
||||
while True:
|
||||
did_someting = False
|
||||
for x, y in np.ndindex(shape):
|
||||
if terrain[x, y] >= 4:
|
||||
div, terrain[x, y] = divmod(terrain[x][y], 4)
|
||||
terrain[x - 1][y] += div
|
||||
terrain[x + 1][y] += div
|
||||
terrain[x][y + 1] += div
|
||||
terrain[x][y - 1] += div
|
||||
did_someting = True
|
||||
|
||||
if not did_someting:
|
||||
return
|
||||
|
||||
|
||||
def main(height, show=True):
|
||||
width = int(height**0.5) + 1
|
||||
terrain = np.zeros((width, width), dtype=np.int64)
|
||||
terrain[width // 2, width // 2] = height
|
||||
apply_gravity(terrain)
|
||||
if show:
|
||||
import sys
|
||||
np.set_printoptions(threshold=sys.maxsize, linewidth=999)
|
||||
print(terrain)
|
|
@ -0,0 +1,30 @@
|
|||
"""Just using numpy."""
|
||||
|
||||
import numpy as np
|
||||
import sys
|
||||
|
||||
|
||||
def apply_gravity(terrain):
|
||||
tumbled = terrain.copy()
|
||||
while True:
|
||||
np.divmod(terrain, 4, tumbled, terrain)
|
||||
if not np.any(tumbled):
|
||||
return
|
||||
terrain[1:, :] += tumbled[:-1, :]
|
||||
terrain[:-1, :] += tumbled[1:, :]
|
||||
terrain[:, 1:] += tumbled[:, :-1]
|
||||
terrain[:, :-1] += tumbled[:, 1:]
|
||||
|
||||
|
||||
def main(height, show=True):
|
||||
width = int(height**0.5) + 1
|
||||
terrain = np.zeros((width, width), dtype=np.int64)
|
||||
terrain[width // 2, width // 2] = height
|
||||
apply_gravity(terrain)
|
||||
if show:
|
||||
np.set_printoptions(threshold=sys.maxsize, linewidth=999)
|
||||
print(terrain)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main(int(sys.argv[1]))
|
|
@ -7,6 +7,10 @@
|
|||
|
||||
#include "common.c"
|
||||
|
||||
|
||||
/// This one stores, for each cells, 4 pointers to the west, south,
|
||||
/// north, and east cells so incrementing them is easy.
|
||||
|
||||
typedef struct cell cell;
|
||||
|
||||
struct cell {
|
||||
|
|
|
@ -0,0 +1,36 @@
|
|||
"""Compiling with pythran."""
|
||||
|
||||
#compiling: pythran sandpile_pythran.py
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
#pythran export apply_gravity(int[][])
|
||||
|
||||
def apply_gravity(terrain):
|
||||
shape = np.shape(terrain)
|
||||
while True:
|
||||
did_someting = False
|
||||
for x, y in np.ndindex(shape):
|
||||
if terrain[x, y] >= 4:
|
||||
div, terrain[x, y] = divmod(terrain[x][y], 4)
|
||||
terrain[x - 1][y] += div
|
||||
terrain[x + 1][y] += div
|
||||
terrain[x][y + 1] += div
|
||||
terrain[x][y - 1] += div
|
||||
did_someting = True
|
||||
|
||||
if not did_someting:
|
||||
return
|
||||
|
||||
|
||||
|
||||
#pythran export main(int, bool)
|
||||
|
||||
def main(height, show):
|
||||
width = int(height ** .5) + 1
|
||||
terrain = np.zeros((width, width), dtype=np.int64)
|
||||
terrain[width // 2, width // 2] = height
|
||||
apply_gravity(terrain)
|
||||
if show:
|
||||
print(np.array(terrain))
|
|
@ -1,3 +1,7 @@
|
|||
/// This is my fastest implemtation, it relies on the 8-fold symetry
|
||||
/// of flattening a centered pile. I in fact only rely on a 4-fold
|
||||
/// symmetry for simplicity, so it could be enchanced again.
|
||||
|
||||
/*
|
||||
|
||||
Some performance measures:
|
||||
|
|
|
@ -8,6 +8,10 @@
|
|||
|
||||
#include "common.c"
|
||||
|
||||
/// This is a merge of the `fifo` implementation and the `quarter`
|
||||
/// one.
|
||||
|
||||
|
||||
typedef struct terrain {
|
||||
int width;
|
||||
int **cells;
|
||||
|
|
|
@ -0,0 +1,53 @@
|
|||
"""This one is "recursive" but only to the left, where the linear scan
|
||||
won't go until the next run. Those bonus fires can push grains to the
|
||||
next line where they're picked again by the linear scan."""
|
||||
|
||||
import sys
|
||||
|
||||
|
||||
def topple(terrain, x, y):
|
||||
div, terrain[x][y] = divmod(terrain[x][y], 4)
|
||||
|
||||
terrain[x - 1][y] += div
|
||||
|
||||
if terrain[x-1][y] >= 4:
|
||||
topple(terrain, x - 1, y)
|
||||
|
||||
terrain[x + 1][y] += div
|
||||
terrain[x][y - 1] += div
|
||||
terrain[x][y + 1] += div
|
||||
|
||||
|
||||
|
||||
def apply_gravity(terrain):
|
||||
"""
|
||||
$ python -m pyperf timeit --fast -s 'from examples.sandpile3 import main' 'main(10_000, False)'
|
||||
...........
|
||||
Mean +- std dev: 2.02 sec +- 0.04 sec
|
||||
|
||||
"""
|
||||
width = len(terrain)
|
||||
while True:
|
||||
did_someting = False
|
||||
for x in range(width):
|
||||
for y in range(width):
|
||||
if terrain[x][y] >= 4:
|
||||
topple(terrain, x, y)
|
||||
did_someting = True
|
||||
if not did_someting:
|
||||
return
|
||||
|
||||
|
||||
def main(height, show=True):
|
||||
width = int(height ** .5) + 1
|
||||
terrain = [[0] * width for _ in range(width)]
|
||||
terrain[width // 2][width // 2] = height
|
||||
apply_gravity(terrain)
|
||||
if show:
|
||||
import numpy as np
|
||||
np.set_printoptions(threshold=sys.maxsize, linewidth=999)
|
||||
print(np.array(terrain))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main(int(sys.argv[1]))
|
|
@ -7,6 +7,9 @@
|
|||
|
||||
#include "common.c"
|
||||
|
||||
/// This one is just to measure the cost of using a struct to store
|
||||
/// together the width and the values.
|
||||
|
||||
|
||||
typedef struct terrain {
|
||||
int width;
|
||||
|
|
Loading…
Reference in New Issue