HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Ashton and String Hackerrank

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
andstringhackerrankashton

Problem

Online coding challenge Hacker Rank.

I tried to solve it using the naive appraoch first but its failing on some of the inputs and rest its getting timed out. How to get started in problem like these so that I could reduce the time complexity involved?

public class Solution {
    private static Set substrings(String text) {
        int length = text.length();
        Set set = new TreeSet<>();

        for (int i = 0; i < length; i++) {
            for (int j = i + 1; j <= length; j++) {
                set.add(text.substring(i, j));
            }
        }
        return set;
    }

    public static void main(String[] args) {
        Scanner scanner =  new Scanner(System.in);
        int T = scanner.nextInt();
        String s = scanner.next();
        int K = scanner.nextInt();
        StringBuilder sb = new StringBuilder();
        for (String sub : substrings(s)) {
            sb.append(sub);
        }
        System.out.println(sb.charAt(K - 1));
    }
}

Solution

###Program only handles one testcase###

Your program reads in T but doesn't iterate T times, so it only handles the first test case. This could be the cause of the "wrong result" error.

###Memory limit exceeded###

Your program solves the problem correctly, but it uses \$O(n^3)\$ memory, so on large inputs it will run out of memory (possibly giving you a wrong answer instead of a timeout). For a string of size 100000 (which is the maximum according to the problem description), the total length of all substrings is: 166671666700000, or \$1.7 * 10^{14}\$, which would require 170 terabytes of memory.

###Solution using: Suffix array + LCP array###

In the problem description, you will notice on the "Need Help?" section on the right hand side that there are links to two topics:

  • Suffix array



  • LCP array



You probably will want to read those links later, because they will give useful information on how to generate the two arrays in the most efficient way possible. But for now, I will explain how, if you have already generated the suffix array and LCP array, you can solve the rest of the problem in \$O(n)\$ time and no additional space other than the \$O(n)\$ space used for the two arrays.

###Definitions###

First, some quick definitions of my own (if you haven't read through those above links yet):

-
The suffix array for a string of length \$n\$ is just a sorted array of the \$n\$ suffixes of the string. To save space, the array contains integers instead of strings, where the integer is the index of the starting character of the suffix in the original string. For example, if suffix[3] = 5, that is equivalent to suffix[3] = original_string.substring(5).

-
The LCP array holds the length of the longest common prefix between two successive strings of the suffix array. For example, if suffix[5] = "abcd" and suffix[6] = "abyz", then LCP[6] = 2 because the two strings have a common prefix of length 2.

###Generating substrings in order###

The key to finding the kth character in the concatenated substrings is that you can generate all substrings in lexicographical order by iterating across the suffix array. Let's take an example of the string "abcd". The suffix array for this string is:

suffix[0] = "abcd"
suffix[1] = "bcd"
suffix[2] = "cd"
suffix[3] = "d"


If you take each possible prefix string of each possible suffix string, you will have generated all possible substrings of the original string. For example, if you take suffix[0] = "abcd", there are four possible prefix strings:

"a", "ab", "abc", "abcd"


If you take suffix[1] = "bcd", there are three possible prefix strings:

"b", "bc", "bcd"


Now, if you notice, the order in which I just generated the substrings are the first 7 substrings of the original string in correct lexicographical order. If you continue through the suffix array like this, all of the strings you generate will be in the order that you are supposed to concatenate to the final string (except for duplicate substrings, which I will talk about later).

You will also notice that for a particular length suffix, it is easy to compute how long all of its substrings will be. For example, if the length of the suffix is 4 (such as "abcd"), then the length of the substrings it generates is 1 + 2 + 3 + 4. In general, if a suffix has length n, then the length of the substrings it generates is n * (n+1) / 2.

###Solving the problem (no duplicate substrings)###

This gives us the following code to find the kth character, assuming that there are no duplicate substrings. For each suffix in the suffix array, we find out if K falls within the substrings for that suffix. If it does, we find out which substring it is and print out the required character. If it does not, we quickly move on to the next suffix.

int     n         = s.length();
    Integer [] suffix = GenerateSuffixArray(s);

    for (int i=0;i count) {
            K -= count;
            continue;
        }

        // K must be found within this suffix.  Now iterate through its
        // substring to find the exact substring and character.
        for (long j=1;j<=len;j++) {
            if (K <= j) {
                System.out.println(s.charAt((int) (suffix[i] + j - 1)));
                break;
            }
            K -= j;
        }
        break;
    }


###Handling duplicate substrings###

Duplicate substrings are handled by skipping past them. For example, suppose you had two suffixes in your suffix array like this:

suffix[0] = "abcd"
suffix[1] = "abxyabcd"


Notice that the substrings generated would be:

"a", "ab", "abc", "abcd"
"a", "ab", "abx", "abxy", "abxya", "abxyab", "abxyabc", "abxyabcd"


Because there is a common prefix of length 2, the first 2 substrings generated are the same. So when we handle the second suffix, we should skip those because they were already handled previously.

###Sample solution###

Here is a sample solution

Code Snippets

suffix[0] = "abcd"
suffix[1] = "bcd"
suffix[2] = "cd"
suffix[3] = "d"
"a", "ab", "abc", "abcd"
"b", "bc", "bcd"
int     n         = s.length();
    Integer [] suffix = GenerateSuffixArray(s);

    for (int i=0;i<n;i++) {
        long len   = n - suffix[i];
        long count = len * (len + 1) / 2;

        // If K is not found within the substrings generated by this
        // suffix, then go on to the next suffix.
        if (K > count) {
            K -= count;
            continue;
        }

        // K must be found within this suffix.  Now iterate through its
        // substring to find the exact substring and character.
        for (long j=1;j<=len;j++) {
            if (K <= j) {
                System.out.println(s.charAt((int) (suffix[i] + j - 1)));
                break;
            }
            K -= j;
        }
        break;
    }
suffix[0] = "abcd"
suffix[1] = "abxyabcd"

Context

StackExchange Code Review Q#152062, answer score: 4

Revisions (0)

No revisions yet.