HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Sherlock and Anagrams

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
andanagramssherlock

Problem

Given a string S, find the number of "unordered anagrammatic pairs"
of substrings.


Input Format


First line contains T, the number of testcases. Each testcase
consists of string S in one line.


Output Format


For each testcase, print the required answer in one line.


Sample Input#00

2
abba
abcd




Sample Output#00

4
0


In the below code I am finding all the possible sub strings and then checking all the pairs of equal length to check if they are anagrams or not.

public class Solution {
    private static int N = 26;

    private static boolean isAnagram(String a, String b) {
        int []countA = new int[N];
        int []countB = new int[N];

        for (char c : a.toCharArray()) {
            countA[c - 'a']++;
        }

        for (char c : b.toCharArray()) {
            countB[c - 'a']++;
        }

        for (int i = 0; i  subsets = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            for (int j = 1; j <= length - i; j++) {
                subsets.add(text.substring(i, i + j));
            }
        }

        for (int i = 0; i < subsets.size(); i++) {
            for (int j = i + 1; j < subsets.size(); j++) {
                String s1 = subsets.get(i);
                String s2 = subsets.get(j);
                if (i != j &&
                    s1.length() == s2.length() &&
                    isAnagram(s1, s2)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);
        int N = s.nextInt();

        for (int i = 0; i < N; i++) {
            System.out.println(getPairsCount(s.next()));
        }
    }
}


Questions:

I have to agree it took me a lot of time to find the above solution because I spend a lot of time into finding the linear algorithm. For this solution I think that it's \$\mathcal{O}(n^3)\$ solution which feels slow to

Solution

You can speed up your code by changing changing strategy. The overall goal is to determine the occurrence counts (frequencies) of all possible 'normalised' substrings of the given input, and then to use these counts for computing the number of pairings.

'Normalised' means processed in a way that gives all anagrams the same physical representation, either by sorting the substrings on character code (which is \$\mathcal{O}(N log N)\$ where \$N\$ is the length of the string to be normalised) or by frequency counting (which is \$\mathcal{O}(N)\$).

Byte-sized variables are sufficient for frequency counting since inputs are no longer than 100 characters, which means the normalised form of any string will by an array of 26 bytes. Which version is better depends on the (unknown) input distribution, but normalisation by sorting is simpler to arrange in Java than frequency counting and hence preferrable.

The number of possible substrings of a string of length \$K\$ is \$K * \frac{(K + 1)}{2}\$, normalisation was discussed above, and dictionary lookups are amortised \$O(M)\$ where \$M\$ is the length of what's being looked up. Hence overall complexity of this simple approach is quadratic, modulo a logarithmic component if normalisation is done by sorting instead of frequency counting (but that doesn't change the overall picture much), and modulo an additional linear factor if you look up variable-length substrings instead of fixed-length frequency count arrays.

Given the shortness of the inputs, this simple solution should be fast enough to be accepted. If the envelope needs to be pushed then there is no ready-made recipe. The only thing safe to say is that enumerating substrings one by one will invariably add a quadratic component to the complexity.

Also, the constant factor dropped by the asymptotic can be several orders of magnitude depending on how smart you go about enumerating the normalised forms of all substrings. For example, you could enumerate all substrings starting at a given position by starting with the character at that position, adding it to the map, inserting the next character in sorted order, adding that to the map, and so on. That would make the cost of a single normalisation \$\mathcal{O}(log N)\$ instead of the \$\mathcal{O}(N log N)\$ for taking a substring and sorting it, or \$\mathcal{O}(1)\$ if frequency counting is used. A similar strategy can be used with frequency counts. This incremental approach is what makes the overall complexity quadratic instead of cubic (but beware of var-length strings as discussed above).

Also, the are data structures like tries that can handle operations on substrings in linear time instead of quadratic, but the algorithms get really involved. You really need to know your basic data structures and algorithms inside out in order to see whether they can help you with a given problem. I'm not certain whether Sherlock and Anagrams can be solved with less than quadratic complexity without resorting to tries.

However, as a general strategy you should take the problem and look for ways of not doing work, to strip the problem down to its irreducible core of work that cannot possibly be avoided. And remember that counting things can be easier computationally speaking than enumerating them and incrementing a count. E.g. if you know that there are \$K\$ somethings then you can compute the number of their possible pairings in \$\mathcal{O}(1)\$ with a bit of combinatorics, instead of enumerating and counting at a cost of \$\mathcal{O}(K^2)\$. Another classic example is the length of the longest common subsequence of two strings which can be computed in \$\mathcal{O}(N^2)\$ even though the number of possible common subsequences is exponential.

Context

StackExchange Code Review Q#151229, answer score: 8

Revisions (0)

No revisions yet.