patternjavaMinor
Counting all substrings with exactly k distinct characters
Viewed 0 times
distinctsubstringscountingallwithcharactersexactly
Problem
Problem Statement : Given a string of lowercase alphabets, count all possible substrings (not necessarily distinct) that has exactly k distinct characters. Example:
Input: abc, k = 2 Output: 2 Possible substrings are {"ab", "bc"}
I have written the solution with a two pointer approach. I am not sure how to calculate the time complexity of the program?
According to me complexity should be O(n*k)
Output :
Input: abc, k = 2 Output: 2 Possible substrings are {"ab", "bc"}
I have written the solution with a two pointer approach. I am not sure how to calculate the time complexity of the program?
According to me complexity should be O(n*k)
public static void main(String[] args)
{
String s = "abacuusttlvbnc";
int k=3;
char[] sArr = s.toCharArray();
int strLen = sArr.length;
Set set = new LinkedHashSet();
int l=0;
int r=0;
while(l<=strLen-k){ // will run (arrayLength - k) times
for(int i=0;i<k;i++){ // will run k times for every while iteration
set.add(sArr[l]);
l++;
}
if(set.size()==k){
System.out.println("substring : " + set);
}
set.clear(); // O(k) for every while iteration
r++;
l= r;
}
}
}Output :
substring : [b, a, c]
substring : [a, c, u]
substring : [u, s, t]
substring : [t, l, v]
substring : [l, v, b]
substring : [v, b, n]
substring : [b, n, c]Solution
Consider
This is a simpler version of what you wrote. It gets rid of your
I also changed the names of
I moved the code into its own method so that it can be called multiple times.
There are two reasons to move the declaration of
Bug
Unfortunately, both this version and your original do not match the linked problem statement:
Input: aba, k = 2
Output: 3
Possible substrings are {"ab", "ba", "aba"}
Input: aa, k = 1
Output: 3
Possible substrings are {"a", "a", "aa"}
They only find two solutions for each of these, as they stop counting once there are
Complexity
Calling the time complexity \$\mathcal{O}(n\cdot k)\$ is reasonable. It's slightly more accurate to say that it is \$\mathcal{O}(n\cdot k - k^2)\$, but \$k\$ is never larger than \$n\$ so it's reasonable to view this as \$\mathcal{O}(n\cdot k)\$. Note that if you fix the algorithm, it would be \$\mathcal{O}(n^2)\$, as the worst case is an input of all the same character. That gives substrings up to length \$n\$ where this is limited to length \$k\$.
public static void findSubstringsWithKDistinctCharacters(String s, int k) {
char[] letters = s.toCharArray();
for (int i = 0, n = letters.length - k; i uniques = new LinkedHashSet();
for (int j = i, m = i + k; j < m; j++) {
uniques.add(letters[j]);
}
if (uniques.size() == k) {
System.out.println("substring : " + uniques);
}
}
}This is a simpler version of what you wrote. It gets rid of your
r variable entirely, as it is unnecessary. I also changed the names of
sArr and set to things that I find more descriptive. I moved the code into its own method so that it can be called multiple times.
There are two reasons to move the declaration of
uniques into the loop. One, this is less code. Two, if you changed this code to produce a list of results rather than print the results, the other version doesn't work. You'd have multiple copies of whatever the last set was rather than unique copies. I'd only use the clear version if I knew that performance of this method was a bottleneck. Bug
Unfortunately, both this version and your original do not match the linked problem statement:
Input: aba, k = 2
Output: 3
Possible substrings are {"ab", "ba", "aba"}
Input: aa, k = 1
Output: 3
Possible substrings are {"a", "a", "aa"}
They only find two solutions for each of these, as they stop counting once there are
k characters (distinct or not) in the substring. They should keep going until they've verified that the next character isn't a duplicate of a character already in the substring. Complexity
Calling the time complexity \$\mathcal{O}(n\cdot k)\$ is reasonable. It's slightly more accurate to say that it is \$\mathcal{O}(n\cdot k - k^2)\$, but \$k\$ is never larger than \$n\$ so it's reasonable to view this as \$\mathcal{O}(n\cdot k)\$. Note that if you fix the algorithm, it would be \$\mathcal{O}(n^2)\$, as the worst case is an input of all the same character. That gives substrings up to length \$n\$ where this is limited to length \$k\$.
Code Snippets
public static void findSubstringsWithKDistinctCharacters(String s, int k) {
char[] letters = s.toCharArray();
for (int i = 0, n = letters.length - k; i <= n; i++) {
Set<Character> uniques = new LinkedHashSet<Character>();
for (int j = i, m = i + k; j < m; j++) {
uniques.add(letters[j]);
}
if (uniques.size() == k) {
System.out.println("substring : " + uniques);
}
}
}Context
StackExchange Code Review Q#137842, answer score: 4
Revisions (0)
No revisions yet.