HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

String matching algorithm - check if a string matches a pattern

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
matchespatternalgorithmcheckstringmatching

Problem

This looks like quite the challenge; given a pattern $P$ (of length $n$) and a string $S$ (of length $m$), how would you check whether the string matches the pattern? For instance:

  • If $P$ = "xyx" and $S$ = "foobarfoo" then $S$ matches $P.$



  • If $P$ = "acca" and $S$ = "carbuscarbus" then $S$ does not match $P.$



My thoughts so far: This looks like a dynamic programming problem. We can define a boolean

$M(i, j)$ = True iff pattern $P[i:n]$ matches $S[j:m]$

We then need $M(0, 0).$ Note here that the substring notation is $P[i:n] = P_iP_{i+1}...P_n$.

I couldn't think of an efficient recurrence beyond this. It seems like I would have to introduce an extra mapping (from a substring of $P$ to a substring of $S$, indicating that the earlier "matches" the latter; for example in the first example, ("x", "foo") would be part of the mapping. You then replace "x" by "foo" in the substring of P). This would make it $O(2^n)$ or worse though.

If you don't see the intuitive meaning, here's a more rigorous definition: S matches P iff there exists a mapping $$T : P[i:j] \rightarrow S[a:b]$$ for $i, j \in [1, 2 ...n]$ and $a, b \in [1, 2 ...m]$, such that replacing $P[i:j]$ with $S[a:b]$ transforms P into S. For example, in the first case where $P$ = "xyx" and $S$ = "foobarfoo", $T$ is as follows:

  • T('x') = 'foo'. {Here $i = 1, j = 1, a = 1, b = 3$}



  • T('y') = 'bar'. {Here $i = 2, j = 2, a = 4, b = 6$}



Replacing all instances (in $P$) of 'x' with 'foo' and 'y' with 'bar', gives us 'foobarfoo' i.e. $S$.

Thus, $S$ matches $P.$

Solution

There are various algorithms for pattern matching in string.

Exact String matching algorithms

  • Brute Force



  • Rabin Karp



  • Boyer-Moree



  • KMP



  • Aho Corasick etc.



Approximate String Matching Algorithm

Approximate String matching is applied in text searching, pattern
recognition and signal processing applications. For a
text T[1..n] and pattern P[1...m], we are supposed to
find all the occurrences of pattern in the text whose edit
distance to the pattern is at most K. The edit distance
between two strings is defined as minimum number of
character insertion, deletion and replacements needed to
make them equal.

Approximate string matching problem is solved with the
help of dynamic programming.

You can have a look through all these algorithms.
These algorithms will take optimum complexity like O(m),O(n+m) (in case of KMP for example).

Context

StackExchange Computer Science Q#64072, answer score: 5

Revisions (0)

No revisions yet.