3 ways to run Python code concurrently:

asyncio:
- primarily for I/O-bound tasks (e.g. reading/writing to disk, network calls, etc.)
- cooperative pausing/waiting: tasks yield control voluntarily at await points

threading:
- non-cooperative pausing/waiting: the OS preempts threads at any time
- good for I/O-bound tasks
- good for running long tasks in the background
- Global Interpreter Lock (GIL): only one thread executes Python bytecode at a time, so threads don't speed up CPU-bound work

multiprocessing:
- primarily for CPU-bound tasks (each process gets its own interpreter and its own GIL)
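A minimal sketch of the cooperative waiting that asyncio does (names are illustrative; asyncio.sleep stands in for real I/O such as a network call):

```python
import asyncio

async def fetch(name, delay):
    # await hands control back to the event loop while this task "waits on I/O"
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # both coroutines wait concurrently, so total time is ~0.1s, not 0.2s
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1))

results = asyncio.run(main())
print(results)  # ['a done', 'b done']
```

gather preserves the order the coroutines were passed in, even though they run concurrently.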
code:

    from multiprocessing import Pool
    from multiprocessing.dummy import Pool as ThreadPool  # drop-in thread-based Pool with the same API

    filenames = ['file1.txt', 'file2.txt', 'file3.txt']

    if __name__ == '__main__':
        with Pool(processes=25) as pool:
            # etl(filename) is assumed to return a (filename, duration) tuple
            results = pool.imap(etl, filenames)
            for filename, duration in results:
                print(f"{filename} took {duration:.2f}s")
concurrent.futures

Working code:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def process_data_in_parallel(MY_DATA_LIST, workers=3, test=False):  # change workers as needed
        with ThreadPoolExecutor(max_workers=workers) as executor:
            futures = {executor.submit(MY_FUNCTION, x): x for x in MY_DATA_LIST}
            for future in as_completed(futures):
                result = future.result()
                # Update the SQLite table immediately after each item finishes processing
                if not test:
                    update_record(DB_TWITTER, table, result)
                else:
                    print(f"ℹ️ Test mode: not updating SQLite table with {result}")
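A self-contained sketch of the same pattern with concrete stand-ins for the placeholders (square is a hypothetical work function; results are collected into a list instead of written to SQLite):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(x):
    return x * x

def run_in_parallel(data, workers=3):
    results = []
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {executor.submit(square, x): x for x in data}
        for future in as_completed(futures):
            # futures complete in finish order, not submission order
            results.append(future.result())
    return results

print(sorted(run_in_parallel([1, 2, 3, 4])))  # [1, 4, 9, 16]
```

Because as_completed yields futures as they finish, sort (or map results back via the futures dict) when order matters.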