Multiprocessing in Python and Jupyter Notebooks on Windows
When you are using Python for data science you will probably process and analyze large amounts of data. To speed up your code you can use the multiprocessing library (API documentation). This library is built into Python and allows you to run multiple processes in parallel. When you use multiprocessing on a Unix-based operating system you are basically golden and everything works as expected. On Windows, however, there are some weird quirks to keep in mind, and if you try to use it in a Jupyter Notebook on Windows you will run into even more problems. Luckily there are some workarounds to get it working.
Multiprocessing on Linux vs. Windows #
The main difference between Windows and a Unix-based OS is that Windows does not have a fork() system call. When you start a new process on Windows it is spawned: a fresh Python interpreter is started and the main module is imported into it. As a result the main module is executed again from the beginning, whereas on Unix the child process starts at the point where you called multiprocessing.Process().
How can we demonstrate this behavior? Let's create a simple Python script that prints a message when it starts and when it ends.
import multiprocessing as mp
from time import sleep

print('Before defining mp_func')

def mp_func():
    print('Starting mp_func')
    sleep(1)
    print('Finishing mp_func')

if __name__ == '__main__':
    p = mp.Process(target=mp_func)
    p.start()
    print('Waiting for mp_func to end')
    p.join()
When we run this script in Linux we get the following output:
Before defining mp_func
Waiting for mp_func to end
Starting mp_func
Finishing mp_func
This is the behaviour we would expect to see. Now when we run this script in Windows we get the following output:
Before defining mp_func
Waiting for mp_func to end
Before defining mp_func
Starting mp_func
Finishing mp_func
We see that Before defining mp_func is printed twice. This is because the main module is executed twice: the first time when we run the script and a second time when the new process is started. This happens because Windows spawns the new process instead of forking it.
Now it also becomes clear why the if __name__ == '__main__': block at the end is necessary. If we omitted it and included the code directly in the main module, every spawned process would re-execute that code and try to spawn yet another process, leading in theory to an endless chain of processes on Windows. Luckily, Python detects this situation and raises a RuntimeError instead when you try to start a new process without the if __name__ == '__main__': block present:
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
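If you are on Linux and want to reproduce the Windows behaviour, you can explicitly request the spawn start method. The snippet below is a small sketch that is not part of the original example; it simply reuses the mp_func script from above with multiprocessing.get_context('spawn'), so you can watch the main module being executed twice on Linux as well.

import multiprocessing as mp
from time import sleep

print('Before defining mp_func')

def mp_func():
    print('Starting mp_func')
    sleep(1)
    print('Finishing mp_func')

if __name__ == '__main__':
    # 'spawn' is the default start method on Windows; requesting it
    # explicitly makes Linux start a fresh interpreter and re-import
    # the main module, just like Windows does.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=mp_func)
    p.start()
    print('Waiting for mp_func to end')
    p.join()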
Multiprocessing in Jupyter Notebooks #
The most common use case for multiprocessing in Jupyter Notebooks is parallelizing a for-loop. Let's say we want to calculate the square of a list of numbers. We can do this in a for-loop like this:
def square(x):
    return x**2

numbers = [1, 2, 3, 4, 5]

squares = []
for number in numbers:
    squares.append(square(number))

print(squares)
Which will give us the following output:
[1, 4, 9, 16, 25]
Now let’s say we want to speed up this code by using multiprocessing. We can do this by using the multiprocessing.Pool()
class. This class will allow us to run multiple processes in parallel. We can use the map()
method to apply a function to a list of arguments. Let’s see how this works:
from multiprocessing import Pool

def square(x):
    return x**2

numbers = [1, 2, 3, 4, 5]

with Pool() as pool:
    squares = pool.map(square, numbers)

print(squares)
When we run this in a Jupyter Notebook on Windows, however, it does not work and fails with an AttributeError. On a Unix-based OS this will give us the following output:
[1, 4, 9, 16, 25]
What happens under the hood when Python spawns a process is that Python pickles (serializes) the arguments and a reference to the worker function and sends them to the new process. The new process starts a fresh Python interpreter and re-imports the main module, which is why the main module is executed again from the beginning. The new process then calls the worker function, square() in this example, with the unpickled arguments. After it has finished, it pickles (serializes) the return value and sends it back to the parent process, which unpickles (deserializes) it and continues executing the main module.
There is one problem with this approach. A Jupyter Notebook is not a regular Python script but a web application that runs an interactive Python kernel in the background. The functions you define in notebook cells live in the kernel's main module, for which there is no script file on disk that the spawned process can re-import. The worker function therefore cannot be found and unpickled in the new process.
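To make this concrete, here is a small sketch (not from the original post) showing that pickling a function only stores a reference to it, the module name and the function name, not the function body. The spawned process has to re-import that module and look the name up again, which is exactly what fails for a function defined in a notebook cell.

import pickle
import pickletools

def square(x):
    return x**2

# Pickling a module-level function stores a reference such as
# '__main__' / 'square', not the function's code.
payload = pickle.dumps(square)
pickletools.dis(payload)   # shows a (STACK_)GLOBAL opcode referencing '__main__' and 'square'
print(square.__module__)   # '__main__' -- the module the child process must be able to re-import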
There are four possible solutions to the problem.
Solution 1: Import the worker function from a separate module #
The first solution is to define the worker function in a separate Python file and then import it from that module. This works because the worker function is no longer part of the main module and can therefore be pickled and sent to the new process. Let's see how this works:
# worker.py
def square(x):
    return x**2

# Cell in Jupyter Notebook
from multiprocessing import Pool
from worker import square

numbers = [1, 2, 3, 4, 5]

with Pool() as pool:
    squares = pool.map(square, numbers)

print(squares)
Which will give us the following output:
[1, 4, 9, 16, 25]
Solution 2: Use a ThreadPool instead of a Pool #
The second solution is to use a multiprocessing.pool.ThreadPool instead of a multiprocessing.Pool. This works because a ThreadPool does not spawn new processes but new threads. The main module is not executed again and nothing has to be pickled and sent to another process. Keep in mind that, because of the GIL, threads will mainly speed up I/O-bound work rather than CPU-bound work. Let's see how this works:
from multiprocessing.pool import ThreadPool

def square(x):
    return x**2

numbers = [1, 2, 3, 4, 5]

with ThreadPool() as pool:
    squares = pool.map(square, numbers)

print(squares)
Which will give us the following output:
[1, 4, 9, 16, 25]
Solution 3: Use joblib’s embarrassingly parallel for loops #
joblib is a third-party library that provides tools for parallel computing. You can install joblib by running:
pip install joblib
joblib provides the Parallel class, which allows you to run multiple processes in parallel, and the delayed() helper, which wraps each function call so it can be executed by the workers. An added benefit of using joblib is that it offers automatic batching of tasks and can show the current progress. joblib uses loky as its backend for spawning processes, which in turn uses cloudpickle instead of pickle.
The benefit of using joblib is that it can be used directly in a Jupyter Notebook on Windows and it has a very simple API.
Let’s see how this works:
from joblib import Parallel, delayed

def square(x):
    return x**2

numbers = [1, 2, 3, 4, 5]

squares = Parallel(n_jobs=-1, verbose=1)(delayed(square)(number) for number in numbers)
print(squares)
Which will give us the following output:
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[1, 4, 9, 16, 25]
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 0.4s finished
Solution 4: Use dask for parallel computing #
dask is a third-party library that provides tools for parallel computing. You can install dask by running:
pip install dask
One of these tools is the dask.delayed decorator, which lets you build up a collection of tasks and run them in parallel. Like joblib, dask offers automatic batching of tasks and uses cloudpickle under the hood, so it can be used directly in a Jupyter Notebook on Windows.
The benefit of using dask is that, if needed, you can scale the parallel computation up to a cluster of machines (see the sketch after the example below).
Let's see how this works:
import dask

@dask.delayed
def square(x):
    return x**2

numbers = [1, 2, 3, 4, 5]

squares = []
for number in numbers:
    s = square(number)
    squares.append(s)

print(*dask.compute(squares))
Which will give us the following output:
[1, 4, 9, 16, 25]
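As a rough sketch of what scaling out could look like, the snippet below is not from the original post and assumes the distributed extra is installed (pip install "dask[distributed]"). Creating a Client with no arguments starts a local cluster of worker processes; pointing it at a scheduler address instead would run the very same delayed tasks on a cluster of machines.

import dask
from dask.distributed import Client

@dask.delayed
def square(x):
    return x**2

if __name__ == '__main__':
    # Client() starts a local cluster of worker processes;
    # Client('tcp://scheduler:8786') would connect to a remote
    # scheduler instead (hypothetical address).
    client = Client()
    numbers = [1, 2, 3, 4, 5]
    squares = [square(number) for number in numbers]
    # With a Client active, dask.compute() uses the distributed scheduler.
    print(*dask.compute(squares))
    client.close()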
Conclusion #
In this post we have seen that multiprocessing in Python has some quirks on Windows and even more in Jupyter Notebooks, and that multiprocessing on Windows behaves differently from multiprocessing on Linux. There are four possible solutions to the problem. For I/O-bound tasks you can use a multiprocessing.pool.ThreadPool to speed up your code. When dealing with CPU-bound tasks you can use a multiprocessing.Pool as long as you import the worker function from a separate module. If you want a simpler API you can use joblib for parallelizing for-loops. Lastly, if you want to scale your parallel computing up to a cluster of machines you can use dask.