patternjavaMinor
Simplified regular expression engine
Viewed 0 times
expressionsimplifiedregularengine
Problem
Task:
On an alphabet set [a-z], a simplified regular expression is much
simpler than the normal regular expression.
It has only two meta characters: '.' and '*'.
'.' -- exact one arbitrary character match.
'*' -- zero or more arbitrary character match.
Note that it is different from the normal regular expression in that
'' is on its own and does NOT act as a decorator on the leading
character, I.e. in our simplified regular expression. "" is
equivalent to ".*" in regular expression. Write a parser which
takes a pattern string and a candidate string as input and decides to
either accept or reject the candidate string based on the pattern
string.
to validate the input strings.
Example:
```
| pattern string | candidate string | result |
| abc | abc | accept |
| * | abc | accept |
| *abc | abc | accept |
| *abc | aaabbbabc | accept |
| a*bc | aaabbbabc | accept |
| a*bc | abc | accept |
| a* | abc | accept |
| a* | a | accept |
| a* | aa | accept |
| a* | abcdef | accept |
| abc | abc | accept |
| * | abc | accept |
| … | abc | accept |
| .* | abc | accept |
| .bc* | abc | accept |
| .bca | abca | accept |
| * | /EMPTY STRING/ | accept |
| abc | abcd | r
On an alphabet set [a-z], a simplified regular expression is much
simpler than the normal regular expression.
It has only two meta characters: '.' and '*'.
'.' -- exact one arbitrary character match.
'*' -- zero or more arbitrary character match.
Note that it is different from the normal regular expression in that
'' is on its own and does NOT act as a decorator on the leading
character, I.e. in our simplified regular expression. "" is
equivalent to ".*" in regular expression. Write a parser which
takes a pattern string and a candidate string as input and decides to
either accept or reject the candidate string based on the pattern
string.
- Only the full match is acceptable. The parser rejects partial matches.
- You can safely assume all the characters in both input pattern string and candidate string are valid characters, i.e. you don't need
to validate the input strings.
- Your parser need to at least pass all the following test cases.
- You can write the parser in any programing language you like.
- Write up an analysis on the run-time complexity of your code.
Example:
```
| pattern string | candidate string | result |
| abc | abc | accept |
| * | abc | accept |
| *abc | abc | accept |
| *abc | aaabbbabc | accept |
| a*bc | aaabbbabc | accept |
| a*bc | abc | accept |
| a* | abc | accept |
| a* | a | accept |
| a* | aa | accept |
| a* | abcdef | accept |
| abc | abc | accept |
| * | abc | accept |
| … | abc | accept |
| .* | abc | accept |
| .bc* | abc | accept |
| .bca | abca | accept |
| * | /EMPTY STRING/ | accept |
| abc | abcd | r
Solution
In answer to 1) - No. Consider the regex
That being said, this solution seems to be going in the right direction, and for this restricted language we can stop short of actually building the DFA equivalent to the expression. (Which is expensive in space to create, but will be cheaper each time it is run against a candidate string).
We acheive this by caching the result of each regex-substring for future use (the dynamic programming approach). This prevents the problematic recursive explosion in the original problem and will guarantee an O(nm) runtime, but could cost up to O(nm) space for each regex candidate pair.
*****a and the string b. Your code will have to consider all 9! (362880) cases before returning false.That being said, this solution seems to be going in the right direction, and for this restricted language we can stop short of actually building the DFA equivalent to the expression. (Which is expensive in space to create, but will be cheaper each time it is run against a candidate string).
We acheive this by caching the result of each regex-substring for future use (the dynamic programming approach). This prevents the problematic recursive explosion in the original problem and will guarantee an O(nm) runtime, but could cost up to O(nm) space for each regex candidate pair.
static Map, boolean> result_cache;
//Assume we have a pair, as in:
//http://stackoverflow.com/a/677248/1331457
public static boolean match(String regx, String candidate)
{
Pair key = Pair(regx, candidate);
if result_cache.containsKey(key){
return true;
}
boolean result = false;
// Empty regx will always return false
if (regx.isEmpty())
{
result = false;
}
else
{
if (regx.charAt(0) == '*')
{
// Last * matches everything
if (regx.length() == 1)
{
result = true;
}
else
{
result = matchStar(regx.substring(1), candidate);
}
}
// Return false if text is empty but pattern is not *
else if (candidate.isEmpty())
{
result = false;
}
else if (regx.charAt(0) == '.' || regx.charAt(0) == candidate.charAt(0))
{
// If the last regx matches the last text
if (regx.length() == 1 && candidate.length() == 1)
{
result = true;
}
// If hasn't reached the end, try to match the rest strings
else
{
result = match(regx.substring(1), candidate.substring(1));
}
}
else
{
result = false;
}
}
result_cache.put(key, result);
return result;
}
//Similarly for other methods.Code Snippets
static Map<Pair<String, String>, boolean> result_cache;
//Assume we have a pair, as in:
//http://stackoverflow.com/a/677248/1331457
public static boolean match(String regx, String candidate)
{
Pair<String, String> key = Pair(regx, candidate);
if result_cache.containsKey(key){
return true;
}
boolean result = false;
// Empty regx will always return false
if (regx.isEmpty())
{
result = false;
}
else
{
if (regx.charAt(0) == '*')
{
// Last * matches everything
if (regx.length() == 1)
{
result = true;
}
else
{
result = matchStar(regx.substring(1), candidate);
}
}
// Return false if text is empty but pattern is not *
else if (candidate.isEmpty())
{
result = false;
}
else if (regx.charAt(0) == '.' || regx.charAt(0) == candidate.charAt(0))
{
// If the last regx matches the last text
if (regx.length() == 1 && candidate.length() == 1)
{
result = true;
}
// If hasn't reached the end, try to match the rest strings
else
{
result = match(regx.substring(1), candidate.substring(1));
}
}
else
{
result = false;
}
}
result_cache.put(key, result);
return result;
}
//Similarly for other methods.Context
StackExchange Code Review Q#10776, answer score: 3
Revisions (0)
No revisions yet.