patternpythonMinor
Visualize the runtimes of list algorithms
Viewed 0 times
visualizethealgorithmsruntimeslist
Problem
I wrote a small script to visualize the runtimes of list algorithms.
The script works by creating a list of lists with sampled lengths from 1 to
My questions are:
other type of concurrency (like asyncio)?
`from time import time
from random import randrange
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from matplotlib import pyplot as plt
def func_timer(func, *params, verbose=False):
"""Takes in a function and some parameters to the function and returns the execution time"""
start = time()
func(*params)
t = time() - start
if verbose:
print('function {func_name} took {time} seconds to complete given parameters'.format(
func_name=func.__name__, time=t))
return t
def list_generator(max_size, num_samples, n=10, upper_num=100):
"""Generates random integer lists with (sampled) lengths from range 1 to max_size.
The difference between lengths (length spacing) of each list is delta, where delta is
defined as the floor division of max_size and num_samples.
max_size: int
The max length of a generated list
num_samples: int
The number of lists to generate
returns:
lst_of_rand_lsts: list of lists
"""
def get_rand_list(n):
"""Returns a list of random numbers."""
return [randrange(upper_num) for x in range(n)]
assert max_size > num_samples
delta = max_size // num_samples # uses the floor function to get difference between each sample.
lst_lengths = [delta*x for x in range(1, num_samples + 1)] # Shift range by 1 to avoid making an empy list.
return (get_rand_list(x) for x in lst_lengths)
de
The script works by creating a list of lists with sampled lengths from 1 to
max_size. Next, I map the given algorithm over each list element and replaced the list with the resulting runtime. The runtimes of each list are plotted against the length of each list to generate a runtime plot.My questions are:
- Does my code deviate from best practices?
- Could my code get a performance increase by using co-routines or some
other type of concurrency (like asyncio)?
`from time import time
from random import randrange
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from matplotlib import pyplot as plt
def func_timer(func, *params, verbose=False):
"""Takes in a function and some parameters to the function and returns the execution time"""
start = time()
func(*params)
t = time() - start
if verbose:
print('function {func_name} took {time} seconds to complete given parameters'.format(
func_name=func.__name__, time=t))
return t
def list_generator(max_size, num_samples, n=10, upper_num=100):
"""Generates random integer lists with (sampled) lengths from range 1 to max_size.
The difference between lengths (length spacing) of each list is delta, where delta is
defined as the floor division of max_size and num_samples.
max_size: int
The max length of a generated list
num_samples: int
The number of lists to generate
returns:
lst_of_rand_lsts: list of lists
"""
def get_rand_list(n):
"""Returns a list of random numbers."""
return [randrange(upper_num) for x in range(n)]
assert max_size > num_samples
delta = max_size // num_samples # uses the floor function to get difference between each sample.
lst_lengths = [delta*x for x in range(1, num_samples + 1)] # Shift range by 1 to avoid making an empy list.
return (get_rand_list(x) for x in lst_lengths)
de
Solution
An even more vital question than does it deviate from best practices, is whether it actually does what you intend for it to do! Does it accurately time exection of the function in you are testing?
I believe not due to the following reasons:
-
You're setting up the time testing method within the actual time taking. You're timing on the outside of the
Lets say that test setup takes 100 ms, and your method 1 ms for 1 item, and 10 ms for 1000 ms. Your total time will vary from 101 ms to 110 ms, so you'll likely conclude that it is almost constant time, as 101 ms ~ 110 ms, but the real situation is that is has ten-doubled, 1 ms ~ 10 ms.
Best practice in Python, seems to use the timeit module for timing executions. This module can setup your runs, and execute them multiple times and eliminate some of these caveats.
Although you really gotta keep your head straight even when using this module, as timing execution depends on a lot of sub parameters like: operating system, python interpreter, other load on testing platform, memory issues, code issues, sub effects of memoisations or other programming paradigm, and list goes on.
I believe not due to the following reasons:
- Using
timeone a single run is at best inaccurate. Any code when run only once is kind of a random indicator as to how long time it'll take to do on the average. The shorter time it takes to execute, the more repeats you should have. You might also run into issues related to which current platform you are on, and imprecision in thetime.time()call, as the documentation only promises a precision of a second.
-
You're setting up the time testing method within the actual time taking. You're timing on the outside of the
runtime_generator, and do functools.partial and ThreadPoolExecutor within that function. In the best situations this only skew all of your data, in a more likely situation your method disappears in the time to set up the test run. This effect will be worse the faster your tested function are.Lets say that test setup takes 100 ms, and your method 1 ms for 1 item, and 10 ms for 1000 ms. Your total time will vary from 101 ms to 110 ms, so you'll likely conclude that it is almost constant time, as 101 ms ~ 110 ms, but the real situation is that is has ten-doubled, 1 ms ~ 10 ms.
Best practice in Python, seems to use the timeit module for timing executions. This module can setup your runs, and execute them multiple times and eliminate some of these caveats.
Although you really gotta keep your head straight even when using this module, as timing execution depends on a lot of sub parameters like: operating system, python interpreter, other load on testing platform, memory issues, code issues, sub effects of memoisations or other programming paradigm, and list goes on.
Context
StackExchange Code Review Q#110312, answer score: 3
Revisions (0)
No revisions yet.