patterncppMinor
Number of occurence of two letters separted by a finit gap
Viewed 0 times
occurencenumbersepartedgaptwofinitletters
Problem
Lets we have a string
I have to perform a function that calculates number of occurrence of letter
Example:
I used this code for DNA sequence (alphabet
complexity:
basem:calculate the number of occurence for a gap
basegap: calculate the number of occurence for a gap from 1 to X
Running Example:
```
a
[1] "atgccatcactcagtaaagaagcggccctggttcatgaagcgttagttgcgcgaggactggaaacaccgctgcgcccgcccgtgcatgaaatggataacgaaacgccgaaaagccttattgctggtcatatgaccgaaatcatgcagctgctgaatctcgacctggctgatgacagtttgatggaaacgcggcatcgcatcgctaaaatgtatgtcgatgaaattttctccggtctggattacgccaatttcccgaaaatcaccctcattgaaaacaaaatgaaggtcgatgaaatggtcaccgtccgcgatatcactctgaccagcacctgtgaacaccattttgttaccatcgatggcaaagcgacggtggcctatatcccgaaagattcggtgatcggtctgtcaaaaattaaccgcattgtgcagttctttgcccagcgtccgcaggtgcaggaacgtctgacgcagcaaattcttgttgccgcgctacaaacgctgctgggcaccaataacgtggctgtctcgatcgacgcggtgcattactgcgtgaaggcgcgtggcatccgcgatgcaaccagtgccacgacaacgtcctctcttggtggattgttcaaatccagtcagaatacgcgc
Seq of size N and contain z alphabet { A,B,C….}.I have to perform a function that calculates number of occurrence of letter
i which is separated with l letters (gap) form the letter j:Example:
I=A, j=T and gap=10
That have to find the number of occurrence of AT,A-T,A--T,A---T,A----T,,,A----------T(- is any alphabet)I used this code for DNA sequence (alphabet
z=4) and suppose that is N(L*N)complexity:
inline const short int V (const char x){
switch(x){
case 'A':case 'a':
return 0;
break;
case 'C':case 'c':
return 1;
break;
case 'G':case 'g':
return 2;
break;
default:
return 3;
break;
}
}
// [[Rcpp::export]]
IntegerVector basem(std::string seq ,int gap ,int N){
IntegerVector res(16);
short int x=0;
short int y=0;
for(int i=0;i<N-gap;i++){
x=V(seq[i]);
y=V(seq[i+gap]);
res[4*x+y]++;
}
return res;
}
// [[Rcpp::export]]
List basegap (std::string sequence,int x){
int n=sequence.size();
List result ;
for(int j=1;j<x;j++) {
result.push_back(basem(sequence,j,n));
}
return result;
}basem:calculate the number of occurence for a gap
basegap: calculate the number of occurence for a gap from 1 to X
Running Example:
```
a
[1] "atgccatcactcagtaaagaagcggccctggttcatgaagcgttagttgcgcgaggactggaaacaccgctgcgcccgcccgtgcatgaaatggataacgaaacgccgaaaagccttattgctggtcatatgaccgaaatcatgcagctgctgaatctcgacctggctgatgacagtttgatggaaacgcggcatcgcatcgctaaaatgtatgtcgatgaaattttctccggtctggattacgccaatttcccgaaaatcaccctcattgaaaacaaaatgaaggtcgatgaaatggtcaccgtccgcgatatcactctgaccagcacctgtgaacaccattttgttaccatcgatggcaaagcgacggtggcctatatcccgaaagattcggtgatcggtctgtcaaaaattaaccgcattgtgcagttctttgcccagcgtccgcaggtgcaggaacgtctgacgcagcaaattcttgttgccgcgctacaaacgctgctgggcaccaataacgtggctgtctcgatcgacgcggtgcattactgcgtgaaggcgcgtggcatccgcgatgcaaccagtgccacgacaacgtcctctcttggtggattgttcaaatccagtcagaatacgcgc
Solution
I assume you are trying to find the number of pairs of a letter
Use a sliding window algorithm. The sliding window will contain all indexes of an element
Step 1. Store all indexes of the letter
Step 2. Store all indexes of the letter
Example:
Step 3 (more complicated):
Let me explain some of the lines.
The condition
The first condition
All you have to do after this to output
Although there are 2 nested loops, this algorithm runs in
All of the code is in C++ as that is my main language, though the concept can be applied to any language. You might want to look up other examples of the sliding window algorithm like this question from the Australian Informatics Olympiad.
i and a letter j (i and j are arbitrary letters) that are at most L indices away from each other.Use a sliding window algorithm. The sliding window will contain all indexes of an element
i that are within the range of a certain index with the letter j.Step 1. Store all indexes of the letter
i in a sorted fashion (scan the input in order) O(N) Step 2. Store all indexes of the letter
j in a sorted fashion (similar to Step 1) O(N)Example:
string s; letter1, letter2; <- Store indices
for (int i = 0; i < N; i++){
if (s[i] == 'a'){ <- Our letter 'i'
letter1.push_back(i);
}
if (s[i] == 'c'){ <- Our letter 'j'
letter2.push_back(i);
}
}Step 3 (more complicated):
int track = 0, sum = 0; window; = (letter2[i] - L)){
window.push(letter1[track]);
track++;
}
sum = sum + window.size();
}Let me explain some of the lines.
while (!window.empty() && window.front() < (letter2[i] - L)){
window.pop_front();
}The condition
!window.empty() is to prevent a run-time error when calling window.front() later on. window.front() < (letter2[i] - L) implies that the smallest index in the queue is less than the minimum allowed value in the range, thus we should exclude it. We do not have to consider it ever again as the smallest possible value in the allowed range only increases as we iterate over letter2. We continue removing the elements till all of the elements are within the range or there are no elements left.while (track = (letter2[i] - L)) window.push(letter1[track]);
track++;
}The first condition
track < letter1.size() is also to prevent a run-time error. The next condition is to check if the element is less than or equal the maximum allowed value in the range. If the element is also more than or equal to the smallest allowed value in the range, then we add it to the queue. We still increment track even if no value is pushed into the queue as at the end of each iteration of the loop, letter1[track] is always the smallest element larger than the allowed range during that iteration or is out of bounds.All you have to do after this to output
sum.Although there are 2 nested loops, this algorithm runs in
O(N) where N is the number of letters. Why? Because each index of the letter i is pushed and popped from the queue at most once each throughout the entire loop. This is at most 2*N operations, which is linear time.All of the code is in C++ as that is my main language, though the concept can be applied to any language. You might want to look up other examples of the sliding window algorithm like this question from the Australian Informatics Olympiad.
Code Snippets
string s; <- Assume input is already in this string
vector<int> letter1, letter2; <- Store indices
for (int i = 0; i < N; i++){
if (s[i] == 'a'){ <- Our letter 'i'
letter1.push_back(i);
}
if (s[i] == 'c'){ <- Our letter 'j'
letter2.push_back(i);
}
}int track = 0, sum = 0; <- Track the front-most index and the running sum of pairs
queue<int> window; <- Our sliding window
for (int i = 0; i < letter2.size(); i++){ <- For every letter 'j'
while (!window.empty() && window.front() < (letter2[i] - L)){
window.pop_front();
}
while (letter1[track] <= (letter2[i] + L) && (letter1[track] >= (letter2[i] - L)){
window.push(letter1[track]);
track++;
}
sum = sum + window.size();
}while (!window.empty() && window.front() < (letter2[i] - L)){
window.pop_front();
}while (track < letter1.size() && letter1[track] <= (letter2[i] + L) ){
if ((letter1[track] >= (letter2[i] - L)) window.push(letter1[track]);
track++;
}Context
StackExchange Code Review Q#141930, answer score: 2
Revisions (0)
No revisions yet.