Python Parallel Processing with tqdm

zhangyuting
2 min read · Jun 2, 2020

It’s important to monitor the progress of a parallel processing task, and a progress bar is a simple way to do that. tqdm is an excellent tool for showing a progress bar in Python, and it is widely used in machine learning.

In this article, I will use Python's concurrent.futures module to run tasks in parallel with processes or threads. I will also show several ways to use tqdm with it.

concurrent.futures

Added in Python 3.2, the concurrent.futures module provides a high-level interface for asynchronously executing callables.

The asynchronous execution can be performed with threads, using ThreadPoolExecutor, or separate processes, using ProcessPoolExecutor. Both implement the same interface, which is defined by the abstract Executor class.
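
To make the shared interface concrete, here is a minimal sketch. The names square and run_all are hypothetical helpers made up for illustration, not part of the library. Because run_all only relies on the submit method that Executor defines, it should not care which kind of pool it receives.

```python
import concurrent.futures


def square(n):
    """Toy workload (hypothetical helper, for illustration only)."""
    return n * n


def run_all(executor, numbers):
    """Submit one task per number and collect the results in input order."""
    futures = [executor.submit(square, n) for n in numbers]
    return [future.result() for future in futures]


# A thread pool is used here; run_all only touches the Executor interface.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    squares = run_all(executor, range(5))

print(squares)
```

Passing a ProcessPoolExecutor to run_all instead should work the same way, provided the script is protected by an if __name__ == "__main__": guard and the submitted function can be pickled.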

In my opinion, parallel code written with an executor is more elegant than the alternatives. I will show you several examples later.

ThreadPoolExecutor

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
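
One way to add a progress bar to this pattern is to wrap the as_completed iterator in tqdm. The sketch below is only an illustration under a few assumptions: tqdm is installed (pip install tqdm), and slow_double is a made-up stand-in workload that replaces the network requests so the example runs offline. Since as_completed yields a future each time a task finishes, the bar advances once per completed task. The total argument is passed explicitly because the iterator has no length.

```python
import concurrent.futures
import time

from tqdm import tqdm  # third-party package: pip install tqdm


def slow_double(n):
    """Stand-in workload (hypothetical): sleep briefly, then double n."""
    time.sleep(0.01)
    return n * 2


inputs = list(range(20))
results = {}

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Map each future back to its input, as in the URL example
    future_to_input = {executor.submit(slow_double, n): n for n in inputs}
    # tqdm cannot infer the length of the iterator, so pass total explicitly
    completed = concurrent.futures.as_completed(future_to_input)
    for future in tqdm(completed, total=len(future_to_input)):
        results[future_to_input[future]] = future.result()
```

Tasks finish in arbitrary order, which is why the results are keyed by input rather than appended to a list.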
