HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Slow anagrams detection

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
detectionslowanagrams

Problem

This is my code for this question on hackerrank.
The idea is to find the amount of substrings of a given word that are anagrams of each other.

The code works fine, but it times out with longer strings. I've downloaded the test cases to make sure the result is correct and it is, it just takes too long.
I can do this in C and pass all tests, but I was wondering if there is a faster alternative in Perl.

I also tried using numbers, i.e. multiplying the ord()-97 of the characters in the two substrings, and then doing a single check on the number, to see if they are anagrams of each other.
That is much faster, but only works on small strings because it overflows quickly.

In this specific case, I'm not really that concerned about style, as much as performance. How can this be improved?

```
use strict;
use warnings;

use Data::Dumper;

sub get_all_substr {
my ($input_string) = @_;

my @results;
push @results, $input_string =~ /(?=(.{$_}))/g for 1 .. length($input_string);

return \@results;
}

sub is_anagram {
my ($string_a, $string_b) = @_;

my $len_a = length($string_a);

return 0 if (length($string_b) != $len_a);

my %vec_a;
my %vec_b;

foreach my $i (0..$len_a-1) {
$vec_a{substr($string_a, $i, 1)}++;
$vec_b{substr($string_b, $i, 1)}++;
}

return 0 if (scalar(keys(%vec_a)) != scalar(keys(%vec_b)));

foreach my $key (keys(%vec_a)) {
if (!defined($vec_b{$key}) || ($vec_a{$key} != $vec_b{$key})) {
return 0;
}
}

return 1;
}

my @results;

my $t = ;
chomp $t;

while($t--) {
my $input_string = ;
chomp $input_string;

my $count = 0;
my $strings;
$strings = get_all_substr($input_string);

my $max_size = scalar(@{$strings})-1;
foreach my $i (0..$max_size) {
foreach my $k ($i+1..$max_size) {
if (is_anagram($strings->[$k], $strings->[$i])) {
$count++;
}
}
}
push @results, $count;
}

Solution

Here's my Perl solution to this problem, for comparison. It simply iterates through every non-empty substring of the original and sorts the characters of each one. The hash %count keeps track of the count of equivalent substrings

Once the counts have been established, the problem requires that the number of possible unordered pairs be calculated for each set of equivalent substrings

Because all of N items may be paired with N-1 others we have N (N-1) possible pairs. But a pair like N0 N1 is the same as N1 N0, so we must divide that expression by two, so the number of possible unordered pairs of N items is N (N-1) / 2. That expression appears literally in the code below

use strict;
use warnings 'all';

my ($n) = split ' ', <>;

while ( <> ) {

    my @s = /[a-z]/g;

    my %count;

    for my $i ( 0 .. $#s ) {
        for my $j ( $i .. $#s ) {
            my $ss = join '', sort @s[$i .. $j];
            ++$count{$ss};
        }
    }

    my $total = 0;

    for my $n ( values %count ) {
        $total += $n * ($n-1) / 2;
    }

    print $total, "\n";
}

Code Snippets

use strict;
use warnings 'all';

my ($n) = split ' ', <>;

while ( <> ) {

    my @s = /[a-z]/g;

    my %count;

    for my $i ( 0 .. $#s ) {
        for my $j ( $i .. $#s ) {
            my $ss = join '', sort @s[$i .. $j];
            ++$count{$ss};
        }
    }

    my $total = 0;

    for my $n ( values %count ) {
        $total += $n * ($n-1) / 2;
    }

    print $total, "\n";
}

Context

StackExchange Code Review Q#139151, answer score: 2

Revisions (0)

No revisions yet.