Multiprocessing with Python

get those cores working!

3 ways to do concurrency in Python:

asyncio

  • primarily for I/O-bound tasks (e.g. reading/writing to disk, network calls)
  • cooperative pausing/waiting: tasks yield control to the event loop at await points (see the sketch below)
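
A minimal sketch of the cooperative model, using asyncio.sleep as a stand-in for a real network or disk call (the fetch function and its inputs are made up for illustration):

import asyncio

async def fetch(name, delay):
    # "await" is the cooperative pause: the event loop runs other tasks while this one waits
    await asyncio.sleep(delay)  # stand-in for a network call
    return f"{name} done"

async def main():
    # the three 1-second waits overlap, so this takes ~1s total, not 3s
    print(await asyncio.gather(fetch("a", 1), fetch("b", 1), fetch("c", 1)))

asyncio.run(main())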

threading

  • non-cooperative (pre-emptive) pausing/waiting: the OS decides when threads switch
  • good for I/O-bound tasks
  • good for running long tasks in the background (example below)

Global Interpreter Lock (GIL) - only one thread can execute Python bytecode at a time, so threads won't speed up CPU-bound work.
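
A minimal threading sketch, assuming blocking I/O: time.sleep releases the GIL just like a blocking read would, so the two waits overlap even though only one thread runs Python bytecode at a time.

import threading
import time

def download(name):
    time.sleep(1)  # stand-in for blocking I/O; the GIL is released while waiting
    print(f"{name} finished")

threads = [threading.Thread(target=download, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()  # both finish after ~1s because the waits overlap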

multiprocessing

primarily for CPU-bound tasks: each process gets its own interpreter and its own GIL, so work can run on multiple cores in parallel

code

import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool  # same API, but backed by threads

def etl(filename):
    # placeholder ETL step: times itself and returns (filename, duration)
    start = time.time()
    # ... real extract/transform/load work goes here ...
    return filename, time.time() - start

filenames = ['file1.txt', 'file2.txt', 'file3.txt']

if __name__ == '__main__':  # guard so spawned child processes don't re-run this block
    with Pool(processes=25) as pool:
        results = pool.imap(etl, filenames)

        for filename, duration in results:
            print(f"{filename} took {duration:.2f}s")

concurrent.futures

code

from concurrent.futures import ThreadPoolExecutor, as_completed

def process_data_in_parallel(MY_DATA_LIST, workers=3, test=False):  # tune workers as needed
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # map each future back to the item that was submitted
        futures = {executor.submit(MY_FUNCTION, x): x for x in MY_DATA_LIST}

        for future in as_completed(futures):
            result = future.result()
            # update the SQLite table immediately after each item finishes
            if not test:
                update_record(DB_TWITTER, table, result)  # project-specific helper
            else:
                print(f"ℹ️ Test mode: not updating SQLite table with {result}")
