HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Describing the asymtotic complexity of processing two-dimensional data structures

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
thedescribingtwoasymtoticdimensionalprocessingdatastructurescomplexity

Problem

Consider the following Python function:

def count_chars(list_of_strings):
  chars = 0
  for s in list_of_strings:
    for c in s:  # intentionally avoiding len() here for the sake of example
      chars += 1;
  return chars


Clearly this is a linear operation, since we iterate over every element/character once. But I've often seen the Big-O of this function as $O(n*k)$ where $n$ is the length of the list, and $k$ is the length of the longest string. In a sense, that's polynomial or quadratic (at a minimum, I've heard people refer to this type of algorithm as such).

What is the appropriate way to refer to this type of function? Is there an intuitive way to understand this is linear, despite multiple variables? In particular, how can we discuss that this is worse than a solution that uses len() (which is $O(1)$) without getting hung up on the fact that, in practice, both are linear?

Context: I'm posing an interview question that, while more complex than this example, involves identifying and eliminating $k$. I've had candidates struggle to see that and it's felt like a communication issue more than a understanding problem.

I'm looking for ways to help guide a discussion, and to recognize that while this is linear, it is still asymptotically worse than an operation that avoids hitting each character individually.

Solution

The solution is that you shouldn't just ask whether it is "linear" -- ask what variable it is "linear" in.

Your example code has running time that is linear in the total length of the input strings. There's a long history of characterizing the running time of an algorithm as a function of the length of the input to the algorithm, and that's an entirely valid thing to do.

It's also true that its running time is $O(nk)$ where $n$ is the number of strings and $k$ is the length of the longest string. This might be a looser bound.

Both statements are true, and don't contradict each other. It's just a matter of identifying which upper bound on the running time is more useful.

Context

StackExchange Computer Science Q#49598, answer score: 6

Revisions (0)

No revisions yet.