python multiprocessing and threads 01 — pydata: Huiming's learning notes

1. `Multiprocessing` v.s. `Threading` in Python

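`multiprocessing` spawns multiple processes, and each process has its own independent memory; `threading` spawns multiple threads, which share memory. In Python, starting threads is faster than starting processes, but threads are held back by the GIL: in CPU-heavy workloads, multithreading can actually be very slow, because only one thread executes Python bytecode at a time and extra time is spent switching between threads. If the workload is dominated by I/O, multithreading can speed it up significantly.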
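A minimal sketch of the I/O-bound case, assuming `time.sleep` is an acceptable stand-in for a blocking I/O call (like a real socket or file read, it releases the GIL while waiting). The helper names `fake_io`, `run_sequential` and `run_threaded` are hypothetical, chosen for this illustration.

```python
import threading
import time


def fake_io(delay):
    """Hypothetical stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)


def run_sequential(n_calls, delay):
    """Run n_calls waits one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n_calls):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n_calls, delay):
    """Run n_calls waits in separate threads and return the elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n_calls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(4, 0.2):.2f}s")  # roughly 4 * 0.2
    print(f"threaded:   {run_threaded(4, 0.2):.2f}s")    # roughly 0.2
```

The waits overlap because a thread blocked on I/O does not hold the GIL. Replacing `fake_io` with pure-Python number crunching would remove most of the speed-up, which is the CPU-bound case described above.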

The threading module uses threads; the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken, or two threads may write to the same memory at the same time. In CPython, the global interpreter lock (GIL) protects the interpreter's own internal state from this, but it does not make multi-step operations in your code atomic, so shared data may still need explicit locks.

Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference.
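A small sketch of the memory difference described above. The module-level list `shared` and the helper functions are hypothetical. The `if __name__ == "__main__":` guard matters because start methods such as spawn re-import the main module in each child process.

```python
import multiprocessing
import threading

# Module-level list that every worker appends to.
shared = []


def append_item(value):
    """Append to `shared` in whatever memory space this function runs in."""
    shared.append(value)


def run_with_threads(n):
    """All threads see the same `shared` list, so the parent sees n items."""
    shared.clear()
    threads = [threading.Thread(target=append_item, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(shared)


def run_with_processes(n):
    """Each child appends to its own copy of `shared`; the parent's list stays empty."""
    shared.clear()
    procs = [multiprocessing.Process(target=append_item, args=(i,)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return len(shared)


if __name__ == "__main__":
    print("threads:", run_with_threads(4))      # 4: same memory space
    print("processes:", run_with_processes(4))  # 0: separate memory
```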

`Multiprocessing`

Pros

Separate memory space
Code is usually straightforward
Takes advantage of multiple CPUs & cores
Avoids GIL limitations for CPython
Eliminates most needs for synchronization primitives unless you use shared memory (instead, it's more of a communication model for IPC)
Child processes are interruptible/killable
Python multiprocessing module includes useful abstractions with an interface much like threading.Thread
A must with CPython for CPU-bound processing

Cons

IPC a little more complicated with more overhead (communication model vs. shared memory/objects)
Larger memory footprint
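A minimal sketch of two points from the lists above: results travel back through a `multiprocessing.Queue` (the communication model rather than shared objects), and a child process can be stopped from outside with `terminate()`. All function names are hypothetical.

```python
import multiprocessing
import time


def send_squares(numbers, results_queue):
    """Child process: send results back through a Queue instead of shared objects."""
    for n in numbers:
        results_queue.put(n * n)


def sleep_forever():
    """Child process that never finishes on its own."""
    while True:
        time.sleep(1)


def collect_squares(numbers):
    results_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=send_squares, args=(numbers, results_queue))
    proc.start()
    # Drain the queue before join(): a child that still has buffered items
    # in a Queue may not exit, so joining first could deadlock.
    results = [results_queue.get() for _ in numbers]
    proc.join()
    return results


def start_then_terminate():
    proc = multiprocessing.Process(target=sleep_forever)
    proc.start()
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess() on Windows
    proc.join()
    return proc.is_alive()


if __name__ == "__main__":
    print(collect_squares([1, 2, 3]))  # [1, 4, 9]
    print(start_then_terminate())      # False
```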

Threading

Pros

Lightweight - low memory footprint
Shared memory - makes access to state from another context easier
Allows you to easily make responsive UIs
CPython C extension modules that properly release the GIL will run in parallel
Great option for I/O-bound applications

Cons

CPython - subject to the GIL
Not interruptible/killable
If not following a command queue/message pump model (using the Queue module, named queue in Python 3), then manual use of synchronization primitives becomes a necessity (decisions are needed for the granularity of locking)
Code is usually harder to understand and to get right - the potential for race conditions increases dramatically

Source: Multiprocessing vs Threading Python
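A sketch of the two threading styles mentioned in the Cons list: a shared counter guarded by an explicit `threading.Lock`, and a command-queue worker that owns its data so no lock is needed. The function names are hypothetical.

```python
import queue
import threading


def locked_counter(n_threads, increments):
    """Shared state plus an explicit Lock around every update."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            # `counter += 1` is a read-modify-write; the lock makes it atomic.
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def doubled_via_queue(items):
    """Command-queue style: only the worker thread touches `results`."""
    tasks = queue.Queue()
    results = []

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel value: no more work
                break
            results.append(item * 2)

    t = threading.Thread(target=worker)
    t.start()
    for item in items:
        tasks.put(item)
    tasks.put(None)
    t.join()  # after join, reading `results` from this thread is safe
    return results


if __name__ == "__main__":
    print(locked_counter(4, 10_000))     # 40000
    print(doubled_via_queue([1, 2, 3]))  # [2, 4, 6]
```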

2. `multiprocessing.Process` v.s. `multiprocessing.Pool`

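There are two ways to run work in multiple processes: `multiprocessing.Process` or `multiprocessing.Pool`. According to the reference below, `multiprocessing.Process` runs one worker per process, while `multiprocessing.Pool` keeps a fixed set of worker processes that take tasks in turn; either way, the results should be the same.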

Source: Python Multiprocessing Process or Pool for what I am doing?
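A sketch of the two approaches side by side, assuming a hypothetical `square_with_pid` helper that also reports which process did the work. The Process version starts one process per item and returns results through a Queue; the Pool version reuses a fixed number of workers. Both give the same squares.

```python
import multiprocessing
import os


def square_with_pid(x):
    """Return x squared and the PID of the process that computed it."""
    return x * x, os.getpid()


def _send_result(index, x, results_queue):
    # Tag each result with its input index so the parent can restore the order.
    results_queue.put((index, square_with_pid(x)))


def one_process_per_item(data):
    """multiprocessing.Process: start a separate process for every item."""
    results_queue = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=_send_result, args=(i, x, results_queue))
        for i, x in enumerate(data)
    ]
    for p in procs:
        p.start()
    collected = sorted(results_queue.get() for _ in data)  # sort by input index
    for p in procs:
        p.join()
    return [result for _, result in collected]


def fixed_pool(data, workers=2):
    """multiprocessing.Pool: a fixed set of workers takes the items in turn."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(square_with_pid, data)


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    by_process = one_process_per_item(data)
    by_pool = fixed_pool(data, workers=2)
    print([sq for sq, _ in by_process])  # [1, 4, 9, 16, 25, 36]
    print([sq for sq, _ in by_pool])     # [1, 4, 9, 16, 25, 36]
    print(len({pid for _, pid in by_process}), "processes used by Process")
    print(len({pid for _, pid in by_pool}), "processes used by Pool (at most 2)")
```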

3. `apply`, `apply_async` v.s. `map`, `map_async`

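`multiprocessing.Pool` offers `apply` (`apply_async`) and `map` (`map_async`). In my own tests: `apply` runs only one task at a time, and the next task starts only after the previous one finishes, even when the pool is created with 10 processes; `apply_async` starts all 10 processes at once, but tasks finish in no particular order, so results seen as they complete (for example, printed by the workers) may not follow the order of the input data; `map` also uses all 10 processes, and its results come back in the same order as the input data. `map` and `map_async` gave the same results; the difference is that `map_async` returns immediately, and you call `get()` on the returned object to wait for the results.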
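A sketch that reproduces the comparison above with a 4-process pool instead of 10, assuming a hypothetical `slow_square` whose sleep time shrinks as `x` grows, so later inputs tend to finish first. It shows where the ordering comes from: a callback sees completion order, while calling `get()` on the handles in submission order gives input order.

```python
import multiprocessing
import time


def slow_square(x):
    """Sleep longer for smaller x, so later inputs tend to finish first."""
    time.sleep(0.2 / (1 + x))
    return x * x


def compare_pool_methods(data):
    with multiprocessing.Pool(processes=4) as pool:
        # apply: each call blocks, so only one task is running at any moment.
        applied = [pool.apply(slow_square, (x,)) for x in data]

        # apply_async: every task is submitted at once. The callback runs in the
        # parent as each task completes, so `finished` is in completion order.
        finished = []
        handles = [
            pool.apply_async(slow_square, (x,), callback=finished.append)
            for x in data
        ]
        # Calling get() on the handles in submission order restores input order.
        ordered = [h.get() for h in handles]

        # map: blocks and returns results in input order.
        mapped = pool.map(slow_square, data)

        # map_async: returns an AsyncResult immediately; get() waits for the list.
        mapped_async = pool.map_async(slow_square, data).get()

    return applied, ordered, finished, mapped, mapped_async


if __name__ == "__main__":
    applied, ordered, finished, mapped, mapped_async = compare_pool_methods([1, 2, 3, 4])
    print("apply:      ", applied)       # [1, 4, 9, 16]
    print("apply_async:", ordered)       # [1, 4, 9, 16]
    print("completion: ", finished)      # completion order; may differ from input
    print("map:        ", mapped)        # [1, 4, 9, 16]
    print("map_async:  ", mapped_async)  # [1, 4, 9, 16]
```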

`apply` v.s. `apply_async`

Pool.apply is like Python 2's built-in apply, except that the function call is performed in a separate process. Pool.apply blocks until the function is completed.

Pool.apply_async is also like the built-in apply, except that the call returns immediately instead of waiting for the result. An AsyncResult object (the ApplyResult class in CPython's implementation) is returned. You call its get() method to retrieve the result of the function call. The get() method blocks until the function is completed. Thus, pool.apply(func, args, kwargs) is equivalent to pool.apply_async(func, args, kwargs).get().
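The equivalence stated above can be checked directly. `power` and `compare_apply_forms` are hypothetical names; the positional arguments and keyword arguments are passed as the second and third arguments of `apply` and `apply_async`.

```python
import multiprocessing


def power(base, exponent=2):
    """Hypothetical example function with a positional and a keyword argument."""
    return base ** exponent


def compare_apply_forms(base, exponent):
    with multiprocessing.Pool(processes=2) as pool:
        # Blocks until the worker returns the value.
        blocking = pool.apply(power, (base,), {"exponent": exponent})
        # Returns an AsyncResult right away; get() then blocks. With a timeout,
        # get() raises multiprocessing.TimeoutError if the result is not ready.
        handle = pool.apply_async(power, (base,), {"exponent": exponent})
        non_blocking = handle.get(timeout=10)
    return blocking, non_blocking


if __name__ == "__main__":
    print(compare_apply_forms(3, 4))  # (81, 81)
```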

`apply` v.s. `map`

pool.apply(f, args): f is only executed in ONE of the workers of the pool. So ONE of the processes in the pool will run f(args).

pool.map(f, iterable): This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. So you take advantage of all the processes in the pool.

Source: Python multiprocessing.Pool: when to use apply, apply_async or map?
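A sketch of the apply vs. map difference described above, assuming a hypothetical `tag_with_pid` helper that records which worker process handled each item. With an explicit `chunksize=2`, consecutive pairs of items form one task, so both items of a pair report the same PID, while the whole `apply` call is a single task for one worker.

```python
import multiprocessing
import os


def tag_with_pid(x):
    """Return the item together with the PID of the worker that handled it."""
    return x, os.getpid()


def apply_vs_map(data, workers=3, chunksize=2):
    with multiprocessing.Pool(processes=workers) as pool:
        # apply: a single call, executed by exactly one worker.
        one_call = pool.apply(tag_with_pid, (data[0],))
        # map: `data` is split into chunks of `chunksize` items; each chunk is one
        # task, so all items in a chunk are handled by the same worker.
        tagged = pool.map(tag_with_pid, data, chunksize=chunksize)
    return one_call, tagged


if __name__ == "__main__":
    one_call, tagged = apply_vs_map(list(range(6)))
    print("apply ran in PID", one_call[1])
    for item, pid in tagged:
        print(item, pid)  # items 0-1, 2-3 and 4-5 share a PID
```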

Reference

Multiprocessing vs Threading Python (summary)
Python - parallelizing CPU-bound tasks with multiprocessing (comparison of actual computation times)
Multiprocessing: comparison of processes and threads
An introduction to Python's multiprocessing module (how to use map/apply)
Asynchronous execution
Python Multiprocessing Process or Pool for what I am doing? (multiprocessing.Process vs. multiprocessing.Pool)
Python multiprocessing.Pool: when to use apply, apply_async or map?
Multiprocessing: How to use Pool.map on a function defined in a class?
