patternjavaMinor
Finding longest substring containing k distinct characters
Viewed 0 times
containingdistinctsubstringfindingcharacterslongest
Problem
Below is my method to find the longest substring having
I have used
The maximum length of input string can be up to million characters. The above method works fine for smaller strings but slows down if input string is large. How can I tweak my logic to improve running time for large strings?
A few test cases:
Input:
Input:
k distinct characters.I have used
Set to keep track of distinct characters.public static int maxLengthSubstring(String str,int k) {
int i=0;
int j=i;
int max=0;
//use set to keep track of distinct characters
Set set=new TreeSet<>();
char arr[] = str.toCharArray();
int len = arr.length;
while(ik){
int substrLen=Math.abs(i-j);
if(substrLen>max){
//update the length of maximum substring found
max=substrLen;
}
//move to the next character in the array
i=i+1;
j=i;
//clear the set containing previously found distinct characters
set.clear();
continue;
}
//if set contains the exact number for distinct characters
//then store the length of the substring and move to next iteration
if(set.size()==k){
int substrLen = Math.abs(i-(j+1));
if(substrLen>max){
//update the length of maximum substring found
max=substrLen;
}
}
j++;
}
return max;
}The maximum length of input string can be up to million characters. The above method works fine for smaller strings but slows down if input string is large. How can I tweak my logic to improve running time for large strings?
A few test cases:
Input:
str="zxzxzxzxzx cvcvcvcvcvcvcvcvcv"
k=2
Output=18
Input:
str="aaaaaaaa abcabc aaaabbbbaaaabbbbbbabaa"
k=3
Output=23 (longest substring in this case is " aaaabbbbaaaabbbbbbabaa" including space character)
Solution
//use set to keep track of distinct characters
Set set=new TreeSet<>();Use spaces, they're free. According to conventions, there should be spaces around each operator. If you want to use them more sparingly, it's your choice, but think about how they separate things. You code looks like composed of three things
Set set=new TreeSet<>();which is rather confusing.
You write "keep track of distinct characters" and declare
Set. Why not Set?Don't use
TreeSet unless you need it. HashSet is faster.Don't call it
set, when you can call it e.g. foundCharacters.char arr[] = str.toCharArray();Naming.
while(i<len && j<len){Spacing. Always prefer "for" loops as they're much clearer. They also allow you to declare variables in a smaller scope. I'd go for
for (int i=0, j=0; i<len && j<len; ++j) ...(add spaces to adhere to conventions), but you're modifying
j somewhere in the middle (which is unexpected with a "for" loop). So feel free to stick with "while".int substrLen=Math.abs(i-j);This looks like you lost track of what your variables do. I guess,
j>=i always holds, but you should know. You may want to call them start and end.int substrLen = Math.abs(i-(j+1));This makes me suspect. Why do you need a different expression for computing the same thing?
Optimizations
How can I tweak my logic to improve running time for large strings?
What the problem? You run through the string until you find out that they're too many characters. Then you throw away all information collected so far and start over at the next index. This implies a complexity of \$O(n^2)\$, which is way too slow for \$n=10^6\$.
When you find out that you collected too many distinct characters, you could move the start index until their number gets smaller again. For this, a
Set is insufficient, as you have to know, when you get rid of the last occurrence of a given character.So instead of remembering what characters you ran across, you need to count their occurrences. A
Map would do. Guava Multiset` would be even better as a multiset is just like a set but allowing duplicates.
But the best solution is an array. As there are just \$2^{16}\$ characters (let's ignore surrogates), you can keep track of them in new int[Character.MAX_VALUE+1]`. This gives you a nice speed bonus of maybe factor 10 (it's just a bonus, it would not help you with the quadratic complexity and it's most probably not needed with the new algorithm).Just a snippet to get you started:
private int distinctChars;
private int[] counts = new int[Character.MAX_VALUE+1];
private void addCharacter(char c) {
if (counts[c] == 0) {
distinctChars++;
}
counts[c]++;
}
private void removeCharacter(char c) {...}
public int maximumLength(String str, int limit) {
...
for (int i=0, j=0; j<len; ) {
if (distinctChars <= limit) {
max = Math.max(max, j-i);
addCharacter(str[j++]);
} else {
removeCharacter(str[i++]);
}
}
...
}Code Snippets
//use set to keep track of distinct characters
Set<Integer> set=new TreeSet<>();char arr[] = str.toCharArray();while(i<len && j<len){for (int i=0, j=0; i<len && j<len; ++j) ...int substrLen=Math.abs(i-j);Context
StackExchange Code Review Q#96632, answer score: 5
Revisions (0)
No revisions yet.