HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Counting all substrings with exactly k distinct characters

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
distinctsubstringscountingallwithcharactersexactly

Problem

Problem Statement : Given a string of lowercase alphabets, count all possible substrings (not necessarily distinct) that has exactly k distinct characters. Example:

Input: abc, k = 2 Output: 2 Possible substrings are {"ab", "bc"}

I have written the solution with a two pointer approach. I am not sure how to calculate the time complexity of the program?

According to me complexity should be O(n*k)

public static void main(String[] args)
    {
    String s = "abacuusttlvbnc";
    int k=3;
    char[] sArr = s.toCharArray();
    int strLen = sArr.length;
    Set set = new LinkedHashSet();
    int l=0;
    int r=0;

    while(l<=strLen-k){  // will run (arrayLength - k) times
        for(int i=0;i<k;i++){   // will run k times for every while iteration
            set.add(sArr[l]);
            l++;
        }
        if(set.size()==k){
            System.out.println("substring : " + set);
        }
        set.clear(); // O(k) for every while iteration
        r++;
        l= r;
     }
  }
}


Output :

substring : [b, a, c]
substring : [a, c, u]
substring : [u, s, t]
substring : [t, l, v]
substring : [l, v, b]
substring : [v, b, n]
substring : [b, n, c]

Solution

Consider

public static void findSubstringsWithKDistinctCharacters(String s, int k) {
    char[] letters = s.toCharArray();

    for (int i = 0, n = letters.length - k; i  uniques = new LinkedHashSet();

        for (int j = i, m = i + k; j < m; j++) {
            uniques.add(letters[j]);
        }

        if (uniques.size() == k) {
            System.out.println("substring : " + uniques);
        }
    }
}


This is a simpler version of what you wrote. It gets rid of your r variable entirely, as it is unnecessary.

I also changed the names of sArr and set to things that I find more descriptive.

I moved the code into its own method so that it can be called multiple times.

There are two reasons to move the declaration of uniques into the loop. One, this is less code. Two, if you changed this code to produce a list of results rather than print the results, the other version doesn't work. You'd have multiple copies of whatever the last set was rather than unique copies. I'd only use the clear version if I knew that performance of this method was a bottleneck.

Bug

Unfortunately, both this version and your original do not match the linked problem statement:


Input: aba, k = 2

Output: 3

Possible substrings are {"ab", "ba", "aba"}


Input: aa, k = 1

Output: 3

Possible substrings are {"a", "a", "aa"}

They only find two solutions for each of these, as they stop counting once there are k characters (distinct or not) in the substring. They should keep going until they've verified that the next character isn't a duplicate of a character already in the substring.

Complexity

Calling the time complexity \$\mathcal{O}(n\cdot k)\$ is reasonable. It's slightly more accurate to say that it is \$\mathcal{O}(n\cdot k - k^2)\$, but \$k\$ is never larger than \$n\$ so it's reasonable to view this as \$\mathcal{O}(n\cdot k)\$. Note that if you fix the algorithm, it would be \$\mathcal{O}(n^2)\$, as the worst case is an input of all the same character. That gives substrings up to length \$n\$ where this is limited to length \$k\$.

Code Snippets

public static void findSubstringsWithKDistinctCharacters(String s, int k) {
    char[] letters = s.toCharArray();

    for (int i = 0, n = letters.length - k; i <= n; i++) {
        Set<Character> uniques = new LinkedHashSet<Character>();

        for (int j = i, m = i + k; j < m; j++) {
            uniques.add(letters[j]);
        }

        if (uniques.size() == k) {
            System.out.println("substring : " + uniques);
        }
    }
}

Context

StackExchange Code Review Q#137842, answer score: 4

Revisions (0)

No revisions yet.