Title: Python: Introducing ppipe : Parallel Pipe

> /!\ this was highly experimental code written in 2011.

> Today you should _NOT_ use it, just look at it if the subject amuses you.

> Better take a look at `asyncio` if you want this kind of tool.

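For comparison, a rough modern equivalent of what this post builds by hand (running a slow predicate over items with a pool of threads) can be written with `concurrent.futures`. This is my sketch, not part of the original code; `slow_predicate` and `parallel_filter` are made-up names:

    :::python
    import time
    from concurrent.futures import ThreadPoolExecutor

    def slow_predicate(x):
        # stands in for a long-running check, e.g. a network call
        time.sleep(0.1)
        return x % 2 == 0

    def parallel_filter(predicate, items, workers=4):
        # evaluate the predicate on all items concurrently, then
        # keep the items whose predicate returned True
        items = list(items)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            keep = list(pool.map(predicate, items))
        return [item for item, k in zip(items, keep) if k]

    print(parallel_filter(slow_predicate, range(8)))  # -> [0, 2, 4, 6]
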
This post is about my pipe Python module, so if you don't know it yet, you
should first read [the first article about
pipes](http://dev-tricks.net/pipe-infix-syntax-for-python). The idea
behind `ppipe` (parallel pipe) is to transparently make a pipe
asynchronous, multithreaded or not. Multithreading isn't an easy piece of
code for everybody: it pollutes your code with locks, queues, and other
machinery far away from the actual simple task you tried to do. Fortunately,
one kind of multithreading can be nicely handled with a simple design
pattern well implemented in Python: the queue. The queue handles all the
locking so you don't have to worry about it; just make your workers work,
enqueue, dequeue, work... but you still have to create the workers!

A pipe is not far from the concept of a queue: every part of a pipe command
works on a piece of data and then gives it to the next worker. So it's not
hard to imagine an asynchronous pipe in which every part of a pipe command
can work at the same time, and then to imagine that n threads can be
started for a single step of a pipe command, leading to a completely
multithreaded application with ~0 lines of code bloat for thread creation
and synchronization in your actual code.

So I tried to implement it, keeping the actual contract, which is very
simple: every pipe should take an iterable as input. (I was tempted to
change it to "every pipe must take a Queue as input"... but by keeping the
contract unchanged, normal pipes and parallel pipes can be mixed.) I
created a branch you'll find on
[github](https://github.com/julienpalard/pipe/tree/parallel_pipe) with a
single new file, `ppipe.py`, that is actually not importable; it's only a
proof of concept that can be launched. Here is the test I wrote using
ppipe:

    :::python
    print "Normal execution :"
    xrange(4) | where(fat_big_condition1) \
              | where(fat_big_condition2) \
              | add | lineout

    print "Parallel with 1 worker"
    xrange(4) | parallel_where(fat_big_condition1) \
              | where(fat_big_condition2) \
              | add | lineout

    print "Parallel with 2 workers"
    xrange(4) | parallel_where(fat_big_condition1, qte_of_workers=2) \
              | parallel_where(fat_big_condition2, qte_of_workers=2) | add | stdout

    print "Parallel with 4 workers"
    xrange(4) | parallel_where(fat_big_condition1, qte_of_workers=4) \
              | parallel_where(fat_big_condition2, qte_of_workers=4) | add | stdout

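The worker/queue pattern that `parallel_where` builds on can be sketched in a few lines of plain Python. This is only a minimal illustration of the pattern, written in modern Python 3, not ppipe's actual code; all names here are mine:

    :::python
    import threading
    import queue

    def worker(in_q, out_q, condition):
        # each worker pulls items from in_q, applies the condition,
        # and pushes matching items to out_q; None is the stop signal
        while True:
            item = in_q.get()
            if item is None:
                break
            if condition(item):
                out_q.put(item)

    in_q, out_q = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(in_q, out_q, lambda x: x % 2 == 0))
        for _ in range(2)  # two workers, like qte_of_workers=2
    ]
    for t in threads:
        t.start()
    for item in range(8):
        in_q.put(item)
    for _ in threads:
        in_q.put(None)  # one stop signal per worker
    for t in threads:
        t.join()

    results = sorted(out_q.get() for _ in range(out_q.qsize()))
    print(results)  # -> [0, 2, 4, 6]

The queue does all the synchronization: the workers never touch a lock themselves.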
The idea is to compare a normal pipe (Normal execution) with an
asynchronous pipe (Parallel with 1 worker, as 1 worker is the default),
and then with 2 and 4 workers, which can be given to a ppipe using
`qte_of_workers=`. fat\_big\_condition1 and 2 are just f\*cking
long-running pieces of code, like fetching something far, far away on the
internet... but for our tests, let's use time.sleep:

    :::python
    def fat_big_condition1(x):
        log(1, "Working...")
        time.sleep(2)
        log(1, "Done !")
        return 1

    def fat_big_condition2(x):
        log(2, "Working...")
        time.sleep(2)
        log(2, "Done !")
        return 1

They always return 1... and they log using a simple log function that
makes fat\_big\_condition1 log in the left column and
fat\_big\_condition2 log in the right column:

    :::python
    stdoutlock = Lock()

    def log(column, text):
        stdoutlock.acquire()
        print ' ' * column * 10,
        print str(datetime.now().time().strftime("%S")),
        print text
        stdoutlock.release()

And here is the output (the integers are the current second, so the times
don't start at 0...):

    Normal execution :
    57 Working...
    59 Done !
    59 Working...
    01 Done !
    01 Working...
    03 Done !
    03 Working...
    05 Done !
    05 Working...
    07 Done !
    07 Working...
    09 Done !
    09 Working...
    11 Done !
    11 Working...
    13 Done !

    // As you can see here, only one condition is executed at a time,
    // which is the normal behavior for a non-threaded program.

    Parallel with 1 worker
    13 Working...
    15 Done !
    15 Working...
    15 Working...
    17 Done !
    17 Done !
    17 Working...
    17 Working...
    19 Done !
    19 Done !
    19 Working...
    19 Working...
    21 Done !
    21 Done !
    21 Working...
    23 Done !

    // Just by adding parallel_ to the first where, you now see that it's
    // asynchronous and that the two conditions can work at the
    // same time, interlacing the output a bit.

    Parallel with 2 workers
    23 Working...
    23 Working...
    25 Done !
    25 Working...
    25 Done !
    25 Working...
    25 Working...
    25 Working...
    27 Done !
    27 Done !
    27 Done !
    27 Working...
    27 Done !
    27 Working...
    29 Done !
    29 Done !

    Parallel with 4 workers
    55 Working...
    55 Working...
    55 Working...
    55 Working...
    57 Done !
    57 Done !
    57 Done !
    57 Done !
    57 Working...
    57 Working...
    57 Working...
    57 Working...
    59 Done !
    59 Done !
    59 Done !
    59 Done !

    // And now with 2 and 4 workers you can clearly see what happens:
    // with 2 workers, the input is computed in pairs, and with 4
    // workers, all the input can be computed at once. But the 4
    // workers of the 2nd condition have to wait for data before
    // starting to work, so in the last test you have 8 threads:
    // only the first 4 work during the first two seconds, then
    // only the other 4 work.

To make the creation of a ppipe simple, I extracted all the 'threading'
machinery into a function usable as a decorator, so writing a
parallel\_where gives:

    :::python
    @Pipe
    @Threaded
    def parallel_where(item, output, condition):
        if condition(item):
            output.put(item)

You can see the queue here! :-) Enjoy!
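The `Threaded` decorator itself isn't shown in this post. A plausible minimal version of the idea (my reconstruction in modern Python 3, not the original code, and without `@Pipe` since that comes from the pipe module) would spawn `qte_of_workers` threads that each call the wrapped function with every input item and a shared output queue, then yield from that queue until all workers are done:

    :::python
    import threading
    import queue
    from functools import wraps

    def Threaded(func):
        # turn func(item, output, *args) into a generator-based
        # pipe stage backed by a pool of worker threads
        @wraps(func)
        def wrapper(iterable, *args, qte_of_workers=1):
            in_q = queue.Queue()
            out_q = queue.Queue()
            DONE = object()  # sentinel signalling a finished worker

            def work():
                while True:
                    item = in_q.get()
                    if item is DONE:
                        out_q.put(DONE)
                        break
                    func(item, out_q, *args)

            threads = [threading.Thread(target=work)
                       for _ in range(qte_of_workers)]
            for t in threads:
                t.start()
            for item in iterable:
                in_q.put(item)
            for _ in threads:
                in_q.put(DONE)  # one stop signal per worker

            # yield results until every worker has reported DONE
            finished = 0
            while finished < qte_of_workers:
                result = out_q.get()
                if result is DONE:
                    finished += 1
                else:
                    yield result
            for t in threads:
                t.join()

        return wrapper

    # usage: a parallel "where" built on the sketch above
    @Threaded
    def parallel_where(item, output, condition):
        if condition(item):
            output.put(item)

    print(sorted(parallel_where(range(8), lambda x: x % 2 == 0,
                                qte_of_workers=2)))  # -> [0, 2, 4, 6]

Results come out in whatever order the workers finish, which is why the usage example sorts them; the original ppipe has the same property.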