The first thing I found was that the original Python code uses a package called `multiprocessing`. The purpose of this module is to parallelize code across the cores of a multi-core machine. The researchers had in part identified DCP as a potentially valuable tool because they already knew their task was trivially parallelizable. So they essentially took a for loop in their code that was perfectly parallel and did the syntax juggling needed to use `multiprocessing`. Here's an excerpt from my test case as an example:
```python
def test(**kwargs):
    # option 1 - simple for loop
    results = [func(x, **kwargs) for x in xx]

    # option 2 - parallel for loop
    packedfunc = partial(func, **kwargs)
    pool = mp.Pool()
    results = pool.map(packedfunc, xx, chunksize=1)
    pool.close()
    pool.join()

    # either way we get back results
    return results
```
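The excerpt leaves out `func`, `xx`, and the imports. A minimal stand-in (my own assumption, loosely matching the "couple of dummy multiplications" described below, not the researchers' actual function) makes it runnable:

```python
import multiprocessing as mp
from functools import partial

# Hypothetical stand-ins for the names the excerpt assumes. The real func
# did scientific work; this one just does a few multiplications and a sum.
def func(x, amplitude=1.0, epsilon=0.0):
    return sum(amplitude * (x * i) + epsilon for i in range(10))

xx = list(range(8))
```

With these definitions in place, both option 1 and option 2 in `test` return the same list of eight numbers.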
In the real case we had a function calling a function calling a function, so I suspected some overhead there played a role, but we can ignore the details of `func` because this timing call tells the story:

```python
timeit.timeit("test(amplitude=0.2, epsilon=0.1)",
              setup="from __main__ import test", number=2000)
```

It took 243.281s for option 2 and only 1.432s for option 1.
So the first lesson is: don't assume that because you are using the `multiprocessing` module you are getting an *n*-fold speed-up, where *n* is the number of cores you have. Someone could poke through my example (it was stripped down to just 36 lines, 10 of which are above) and figure out exactly why, but in the context of DCP there is one important factor to consider: Python does not support multi-core execution through multithreading. Instead, `multiprocessing` spins up a new Python interpreter on each core, and that costs overhead.
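That overhead is easy to demonstrate in isolation. The sketch below is my own illustration, not the researchers' code: it times a trivial task both serially and through a pool. On a typical machine the pool version loses by orders of magnitude, because interpreter startup and per-item pickling dwarf the microseconds of actual work:

```python
import multiprocessing as mp
import time

def tiny(x):
    # trivial work, comparable to the dummy multiplications in my test case
    return x * 2.0

def time_serial(xx):
    t0 = time.perf_counter()
    [tiny(x) for x in xx]
    return time.perf_counter() - t0

def time_parallel(xx):
    # pays to start fresh interpreter processes, plus pickle every
    # argument and result across process boundaries, one item at a time
    t0 = time.perf_counter()
    with mp.Pool() as pool:
        pool.map(tiny, xx, chunksize=1)
    return time.perf_counter() - t0

if __name__ == "__main__":
    xx = list(range(100))
    print(f"serial:   {time_serial(xx):.6f}s")
    print(f"parallel: {time_parallel(xx):.6f}s")
```

Note that `chunksize=1` maximizes the per-item communication cost; larger chunks amortize it, but nothing recovers the fixed cost of spinning up the worker processes.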
My example code exacerbates this because it executes so quickly (just a couple of dummy multiplications and array manipulations), but even in the real-world code the slow-down was around 8x, and those calculations took a minute or more to run. So what else could it be?
When I dug deeper into the timing issue I found that in addition to
A quick Google search of speed comparisons between the two confirmed what I was seeing: when it comes to crunching numbers, Node beats Python:
- High portability (any device with a browser)
- Great performance (thanks to the wizards working on V8)
- Easy parallelizability (thanks to DCP)