HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Time Complexity of Rabin-Karp matching algorithm

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
rabintimealgorithmcomplexitymatchingkarp

Problem

I asked a question on Rabin-Karp Searching algorithm here, which I am reading from the book "Introduction to Algorithms" 3rd edition Cormen et al..

After reading few para of the section on Rabin-Karp, I got some more confusions:

In the third paragraph the authors say that the if we could find p (decimal value of pattern P[1....m] ) in time O(m) and all the ts values (i.e decimal value of length-m sub-string T[s+1....s+m], s=0,1,2,,,,n-m) in time O(n-m+1), then we could determine all valid shifts s in time O(m) + O(n-m+1) by comparing p with each of the ts values.

How is this possible? O(m) is for finding p, O(n-m+1) is for finding all ts, so total pre-processing time so far is O(m) + O(n-m+1). This is the total pre-processing time; the comparison has yet to start, I have to spend some extra $ for doing comparison of a decimal p with each of the (n-m+1)-ts values.

1-Then why the authors say in the first para that the pre-processing time is O(m)? Why it is not O(m) + O(n-m+1) which include processing time of p and all ts values?

2- Now if we talk about worst case matching time, what should be that? So in the worst my decimal number p (already calculated ) will be compared with each of the another (m-n+1) decimal numbers, which are the values of ts (already calculated, no extra cash needed for doing this job now ). The worst case is when I am most unlucky and I have to compare every value of ts with p. Right?

Based on my understanding,(if I am right) the worst case matching time should be O(m-n+1) and not O((m-n+1)m) as claimed by the authors in the first para. For example let us say my Pattern is P[1...m]=226 and Text is T[1....n]=224225226. so my p is decimal 226, and ts is decimal value of T[s+1, s+2, s+3], for s=0,1,2...6 as n=9, and m=3. The ts values will be as follows:

s=0 => T[224]=> ts=224

s=1 => T[242]=> ts=242

s=2 => T[422]=> ts=422

s=3 => T[225]=> ts=225

s=4 => T[252]=> ts=252

s=5 => T[522]=> ts=522

s=6 => T[226]=> ts=226

Now you will be

Solution

As pointed out by vzn above, the wiki article has all the details you are looking for.

Firstly, what is pre-processing and what is the online runtime, depends on what is remaining constant and what is varying. For example, if we are given some fixed text and are asked to match various small strings with that text, pre-processing would be hashing the text.

In this case, let us say we have only one string to be compared with one text, to avoid confusion about pre-processing. The operations that we need to perform are

  • Hash for string to be searched for ($p$) = $O(m)$



  • Hash for each $m$-sized substring in the text ($T$) (assuming rolling hash) = $O(n-m+1)$



  • Number of hash comparisons (one per each $m$ sized substring in $T$) = $O(n-m+1)$



  • If there are $r$ matches in the hashes, then we compare each of those substrings of $T$ with $p$ (each comparison being $O(m)$) = $O(rm)$



But in worst case, the number of matches in hash can be as large as $n-m+1$. So worst case complexity will be $O(m(n-m+1))$.

Context

StackExchange Computer Science Q#10258, answer score: 6

Revisions (0)

No revisions yet.