Python — How to measure thread execution time in multithreaded application?

K-ashif

Published in

The Startup

6 min readJun 22, 2020

In this piece we will see how to measure thread execution time in a multithreaded python program.

Problem

Let’s start with an anecdote — I was working on a python project and it had lot of legacy code (libraries/methods for handling different tasks), this application spawns lot of threads for many IO bound tasks and the time it takes to complete these tasks depends on various variable factors like the platform application runs on, network latency, API response time, disk IO etc. We observed that the time taken for application to run was quite random and as we scaled and supported multiple platforms we started seeing more time drifts. Though we already knew the factors causing these time drifts but it was essential for us to quantify the time taken by application(piece by piece), so that we can identify the poor performing parts of code. To accomplish this, we had to have the data of all different pieces of the program. As we knew, major chunk of the application were these ‘n’ no. of threads application instantiate for performing various IO bound tasks hence to tune application performance we need the the numbers. In this post, i will share ‘decorator’ based approach which can be utilized to measure the thread execution time your programs.

We will simplify this problem so that we can understand it with an example, below is an example(simpler version) of similar problem, a python program which spawns 10 worker threads to perform some tasks. We will use python’s time.sleep() method to add random delay to our workers as an input to simulate real world example. We will also measure program’s overall execution time(do checkout timeit module approach as well on measure program’s execution time).

import threading
import time# Capture program start time
start_time = time.perf_counter()

thread_pool = list()                  # container to hold threads
total_delay = 0# container with random int (5, 20)
delay_index = [8, 15, 18, 12, 19, 7, 14, 8, 16, 16]
# A worker method
def worker(delay: int) -> None:
    # Doing something
    time.sleep(delay)
    print(f"Worker_{delay} took {delay} secs to complete")
# Create worker threads and add them to thread_pool
for i in range(len(delay_index)):
    _delay = delay_index[i]
    total_delay += _delay
    t = threading.Thread(target=worker,, args(_delay,),name=f'Worker_{_delay}'))    thread_pool.append(t)
# start worker threads
for _thread in thread_pool:
    _thread.start()             
    print(f'--- Started {_thread.name}')
# join the threads
for _thread in thread_pool:
    _thread.join()               
    print(f'--- Completed {_thread.name}')
    print(f'{_thread.name} took {_thread.thread_duration} secs ')
    duration_from_decorator += _thread.thread_duration
# Capture program execution time
end_time = time.perf_counter()
execution_time = end_time - start_time# print stats
print(f'Total execution time: {execution_time} secs')
print(f'Total no of threads: {len(thread_pool)}')
print(f'Average total time: {execution_time / len(thread_pool)}')

Above was our sample program, let‘s’ unpack what is happening:

worker: This method will simulates program’s IO bound tasks, it will complete the task after delay secs (delay is an input to the worker method).
delay_index : I have used random.randint(5, 20) to generate random integers, this would be the input to worker method (we will use same input later as well).
Then we instantiated 10 worker thread objects using threading module, we assigned worker method to target and then appended instantiated thread objects thread_pool list.
Later we used thread_pool to get thread objects, we start them and wait till they are done.
You’ll notice, we are also measuring program execution_time along with some print statements.

Now, lets run the program and check the output below:

You’ll notice we have ran the program 3 times (Run-1, Run-2, Run-3) with same input to worker method i.e. delay_index = [8, 15, 18, 12, 19, 7, 14, 8, 16, 16].
Note that theAverage time for each thread comes around ~ 1.900 secs. We use following formula to calculate Average time (total execution time / no of threads).
Total time program takes to run is around ~19.00 secs despite using multihreading and thats because we have delay of 19 secs at delay_index[4].

# Run-1
Total execution time: 19.004492659 secs
Total no of threads: 10
Average time: 1.9004492659# Run-2
Total execution time: 19.005527918000002 secs
Total no of threads: 10
Average time: 1.9005527918000003# Run-3
Total execution time: 19.004814707 secs
Total no of threads: 10
Average time: 1.9004814707000002

Solution : using decorators

Now, we will write a decorator for the program which will measure thread execution time. Decorator will take thread as an input function and in wrapper func we will calculate the execution time using python time module . We will then use this time decorator in our program to wrap worker method to measure its execution time and at the end we will take a look at the results.

Note following statement threading.current_thread().thread_duration = end — start, we used start = time.perf_counter() and end = time.perf_counter() to calculate the thread execution time and then assigned it to thread object as thread_duration .

import threading
from functools import wrapsdef thread_time_decorator(thread):
    @wraps(thread)                 # Use wraps to preserve metadata
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        thread(*args, **kwargs)
        end = time.perf_counter()
        threading.current_thread().thread_duration = end - start
    return wrapper

Final Program with Time decorator

Below is the program with thread_time_decorator:

import threading
import time
from functools import wraps
import pprint
import json# Capture program start time
start_time = time.perf_counter()

thread_pool = list()                  # container to hold threads
total_delay = 0
thread_time_map = dict()
duration_from_decorator = 0


# container with random int (5, 20)
delay_index = [8, 15, 18, 12, 19, 7, 14, 8, 16, 16]


def thread_time_decorator(thread):
    @wraps(thread)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        thread(*args, **kwargs)
        end = time.perf_counter()
        threading.current_thread().thread_duration = end - start
    return wrapper


def worker(delay: int) -> None:
    # Do something
    time.sleep(delay)  # int -> emulates time for random tasks
    print(f"Worker_{delay} took {delay} secs to complete")


for i in range(len(delay_index)):
    _delay = delay_index[i]
    total_delay += _delay
    wrapped_worker = thread_time_decorator(worker)
    t = threading.Thread(target=wrapped_worker, args=(_delay,), name=f'Worker_{_delay}')
    thread_pool.append(t)

for _thread in thread_pool:
    _thread.start()
    print(f'--- Started {_thread.name}')

for _thread in thread_pool:
    _thread.join()
    print(f'--- Completed {_thread.name}')
    print(f'{_thread.name} took {_thread.thread_duration} secs ')
    duration_from_decorator += _thread.thread_duration
    thread_time_map[_thread.name] = _thread.thread_durationthread_time_map['Total sum'] = sum(thread_time_map.values())
thread_time_map['Total No of threads'] = len(thread_pool)
thread_time_map['Average thread time'] = thread_time_map['Total No of threads'] / len(thread_pool)# dump stats to file
with open('/tmp/program_stats.json', 'w') as file_obj:
    time.sleep(4)
    json.dump(thread_time_map, file_obj, indent=4)
# Capture program end time
end_time = time.perf_counter()
execution_time = end_time - start_time


print(f'Total execution time: {execution_time} secs')
print(f'Total no of threads: {len(thread_pool)}')
print(f'Average time: {execution_time / len(thread_pool)}')
print(f'Decorated Threads total duration: {duration_from_decorator}')
print(f'Decorated Average: {duration_from_decorator / len(thread_pool)}')
pprint.pprint(thread_time_map)

Lets unpack again:

We have added thread_time_decorator to our program, we decorated worker method with following statement wrapped_worker = thread_time_decorator(worker). We will now be using wrapped_worker method instead of worker method.
We also have thread_time_map = dict() to hold worker thread name and its execution time values which we will print at the end of program execution.
We also added intentional sleep time of 4 secs during file write operation. Additional delay of 4 secs will be reflected in output.

Output

Below we can see the output of program after including thread_time_decorator:

Input delay_index: [8, 15, 18, 12, 19, 7, 14, 8, 16, 16]
Total execution time: 23.007319214 secs
Total no of threads: 10
Average time: 2.3007319213999997
Decorated Threads total duration: 109.014385562
{   
    'Average thread time': 10.9014385562,
    'Total No of threads': 10,
    'Total thread sum': 109.014385562,
    'Worker_12': 12.002929894,
    'Worker_14': 14.000238368000002,
    'Worker_15': 15.001009709,
    'Worker_16': 16.002071014,
    'Worker_18': 18.003824902999998,
    'Worker_19': 19.000254499,
    'Worker_7': 7.000348325,
    'Worker_8': 8.00370885
}# Thread stats from program_stats.json$ cat /tmp/program_stats.json 
{
    "Worker_8": 8.002169541,
    "Worker_15": 15.004676658000001,
    "Worker_18": 18.002442814000002,
    "Worker_12": 12.001904778,
    "Worker_19": 19.005101655999997,
    "Worker_7": 7.004905184,
    "Worker_14": 14.002090611,
    "Worker_16": 16.004202229,
    "Total sum": 109.02749347099999,
    "Total No of threads": 10,
    "Average thread time": 10.902749347099999
}

First line prints input i.e. delay_index: [8, 15, 18, 12, 19, 7, 14, 8, 16, 16] (same as of previous example).
Next line shows Total execution time (time program took to run) , it has increased to +4 secs due to time.sleep(4).
Average Time also has increased to .4 secs thats again due to additional time.sleep(4) (check out in code Decorated Threads total duration was calculated).
Next we print thread_time_map which has thread execution time per thread along with some additional fields that could be useful at scale.
At the end, we validated contents of/tmp/program_stats.json file.

Note: This was a simpler version of the problem and its solution. Similar generic approach can be build for timing threads in other programs.

Python — How to measure thread execution time in multithreaded application?

Problem

Solution : using decorators

Final Program with Time decorator

Output

Written by K-ashif