HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Find common preamble of a list of strings

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
preamblefindliststringscommon

Problem

Think of a set of text lines starting with a common string, e.g. indented code.

my $preamble = reduce {
    my $len = min(length $a, length $b);
    --$len while substr($a, 0, $len) ne substr($b, 0, $len);
    return substr($a, 0, $len);
} @lines;


Note that after the first invocation chances are high that $a already has the final result.

I'm wondering if there is a better approach in comparing the two strings in the reduce block. The while loop does not seem to be the best approach. It's also less readable since it does not convey the intent of the code (find common preamble of $a and $b).

Update:
After feedback from amon here is an alternative:

my $preamble = reduce {
    my $len = min(length $a, length $b);
    my ($current_prefix, $string) = (substr($a, 0, $len), substr($b, 0, $len));

    while($current_prefix ne $string) {
        chop $current_prefix;
        chop $string;
    }

    return $current_prefix;
} @lines;


I think this is an improvement in both performance and readability.

Solution

First of all, I think it is very good that you used reduce, as it clearly shows how the algorithm works. At least, to a reader who understands functional idioms.

One problem in your code is that you keep using $a and $b. These two names do not convey any meaning. We could do instead:

my $prefix = reduce {
  my ($current_prefix, $string) = ($a, $b);
  ...


Then, I would shorten the $current_prefix until it is at the beginning of the other $string:

until (0 == index $string, $current_prefix) {
  chop $current_prefix;
}


I did not use the statement modifier form of until or while – I don't believe that they make code much easier to read the way a postfix if or for can do. Note that the empty string occurs at the beginning of any string, so the termination condition of that loop is safe.

If you do not like index or chop, you could equivalently use the less efficient

until ($string =~ /\A\Q$current_prefix/) {
  $current_prefix =~ s/.\z//s;
}


Either way, the concept of shortening the prefix by removing one character at a time until it fits should be easier to read than all the substringing you used.

Cutting the prefix to the length of the string would just be an optimization:

$current_prefix = substr $current_prefix, 0, length $string;


Put together, we would get the following code:

my $prefix = reduce {
  my ($current_prefix, $string) = ($a, $b);

  # the prefix cannot be longer than the string
  $current_prefix = substr $current_prefix, 0, length $string;

  # remove characters from the prefix until it occurs at the beginning.
  # "" is always a prefix, so the loop properly terminates.
  until (0 == index $string, $current_prefix) {
    chop $current_prefix;
  }

  return $current_prefix;
} @strings;


The code might be easier to understand for people who don't know reduce if you express it in the imperative form:

my ($prefix, @strings) = @original_strings;
for my $string (@strings) {
  # the prefix cannot be longer than the string
  $prefix = substr $prefix, 0, length $string;

  # remove characters from the prefix until it occurs at the beginning.
  # "" is always a prefix, so the loop properly terminates.
  until (0 == index $string, $prefix) {
    chop $prefix;
  }
}
# now $prefix is the prefix of all @original_strings


Oh look, it's shorter too!

Code Snippets

my $prefix = reduce {
  my ($current_prefix, $string) = ($a, $b);
  ...
until (0 == index $string, $current_prefix) {
  chop $current_prefix;
}
until ($string =~ /\A\Q$current_prefix/) {
  $current_prefix =~ s/.\z//s;
}
$current_prefix = substr $current_prefix, 0, length $string;
my $prefix = reduce {
  my ($current_prefix, $string) = ($a, $b);

  # the prefix cannot be longer than the string
  $current_prefix = substr $current_prefix, 0, length $string;

  # remove characters from the prefix until it occurs at the beginning.
  # "" is always a prefix, so the loop properly terminates.
  until (0 == index $string, $current_prefix) {
    chop $current_prefix;
  }

  return $current_prefix;
} @strings;

Context

StackExchange Code Review Q#37054, answer score: 3

Revisions (0)

No revisions yet.