Counting substrings with a given number of different characters in O(N)
Problem
Given a string $S$ of length $n$, and a number $k$, count the number of substrings (regardless of their length) that contain exactly $k$ different characters.
The obvious solution takes $O(n^2)$ time (fix substring start point, and move its end point while keeping track of the set of seen characters).
Is there a way to do that in $O(n)$ time? Or at least in $O(n c)$ time, where $c$ is the number of unique characters in the string $S$?
I tried to keep track of two pointers (start and end) and move only one of them forward at each step, while updating the hash table with character counts. But counting substrings became really messy.
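For reference, the quadratic baseline described above might look like the following minimal sketch. The function name `count_exactly_k_quadratic` is made up for illustration; it is handy mainly as a correctness check for faster approaches.

```python
def count_exactly_k_quadratic(s: str, k: int) -> int:
    """Count substrings of s with exactly k distinct characters in O(n^2) time."""
    total = 0
    for start in range(len(s)):
        seen = set()  # distinct characters of s[start..end]
        for end in range(start, len(s)):
            seen.add(s[end])
            if len(seen) == k:
                total += 1
            elif len(seen) > k:
                # Extending the substring can never reduce the distinct count.
                break
    return total
```

The early `break` does not change the worst case: a string such as `"aaaa...a"` with `k = 1` still visits every (start, end) pair.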
Solution
You can solve this in $O(n)$ time using three pointers ($i$, $j_\text{min}$, and $j_\text{max}$) that all move leftward.
Let $S$ be the string. We'll let $i$ range from $n$ down to $1$, and for each value of $i$, we're going to count the number of substrings that start at position $i$.
For each $i$, find the smallest $j_\text{min} \ge i$ such that $S[i..j_\text{min}]$ has exactly $k$ unique characters, as well as the largest $j_\text{max} \ge i$ such that $S[i..j_\text{max}]$ has exactly $k$ unique characters. Because the number of unique characters in $S[i..j]$ never decreases as $j$ grows, the substrings starting at $i$ with exactly $k$ unique characters are precisely those ending at some $j$ with $j_\text{min} \le j \le j_\text{max}$. So the count for this $i$ is $j_\text{max} - j_\text{min} + 1$, or $0$ if $S[i..n]$ has fewer than $k$ unique characters (in which case no such $j$ exists).
You can do this efficiently by keeping an array (or hashtable) with character counts for $S[i..j_\text{min}]$ and one with character counts for $S[i..j_\text{max}]$; you'll update these each time you decrement $i$, $j_\text{min}$, or $j_\text{max}$. (In addition to the array/hashtable, also keep track of the number of characters with non-zero count, and update this each time you update the array or hashtable.) When you decrement $i$, you can update the array/hashtable with character counts for $S[i..j_\text{min}]$, then use that to see whether you need to decrease $j_\text{min}$ and by how much (updating the array/hashtable each time you decrement $j_\text{min}$). The same goes for $j_\text{max}$.
Finally, sum up the counts you get for each value of $i$.
Note that $j_\text{min}$ starts out at $n$ and only ever decreases: decrementing $i$ adds a character on the left, which can never reduce the number of unique characters in any $S[i..j]$, so $j_\text{min}$ can only stay the same or get smaller. Since it starts at $n$ and never drops below $1$, it is decremented at most $n$ times in total. The same is true for $j_\text{max}$. Consequently, we do $O(n)$ updates to the array/hashtable (summed over all the steps of the algorithm), so the total running time of this algorithm is $O(n)$ (expected time if a hashtable is used rather than an array).
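As a rough illustration, here is one way the right-to-left sweep could be written in Python. This is a sketch under a few assumptions rather than a definitive implementation: indices are 0-based, the names `_Window` and `count_exactly_k` are invented for the example, and instead of storing $j_\text{min}$ directly it tracks `lo`, the largest end whose window has at most $k-1$ unique characters (so $j_\text{min}$ corresponds to `lo + 1`). That variant makes the case "fewer than $k$ unique characters in $S[i..n]$" come out as a count of zero without special handling.

```python
from collections import defaultdict


class _Window:
    """Character counts for a window, plus the number of distinct characters."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.distinct = 0  # number of characters with non-zero count

    def add(self, ch):
        if self.counts[ch] == 0:
            self.distinct += 1
        self.counts[ch] += 1

    def remove(self, ch):
        self.counts[ch] -= 1
        if self.counts[ch] == 0:
            self.distinct -= 1


def count_exactly_k(s: str, k: int) -> int:
    """Count substrings of s with exactly k distinct characters."""
    if k <= 0:
        return 0
    n = len(s)
    upper = _Window()  # counts for s[i..hi]: at most k distinct characters
    lower = _Window()  # counts for s[i..lo]: at most k - 1 distinct characters
    hi = lo = n - 1    # inclusive right ends; both only ever move left
    total = 0
    for i in range(n - 1, -1, -1):
        # Extend both windows one character to the left.
        upper.add(s[i])
        while upper.distinct > k:
            upper.remove(s[hi])
            hi -= 1
        lower.add(s[i])
        while lower.distinct > k - 1:
            lower.remove(s[lo])
            lo -= 1
        # Ends j with lo < j <= hi give exactly k distinct characters.
        total += hi - lo
    return total
```

For example, `count_exactly_k("aba", 2)` counts the substrings `"ab"`, `"ba"`, and `"aba"`. Each of `hi` and `lo` is decremented at most about $n$ times over the whole run, which is where the linear bound comes from.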
Credit: Thanks to @aaaaajack for a major improvement to my algorithm, and to @Raphael for further improvements.
Context
StackExchange Computer Science Q#68188, answer score: 5