Another regex subset matcher
Problem
After reading @JavaDeveloper's recent question, I was inspired to try my hand at writing code to accomplish the same task.
The "rules" for this code (i.e., its intent) are to match a subset of regular expressions, defined as follows:

. (dot) matches any single character
* (star) matches zero or more characters
? (question mark) makes the previous character optional, e.g. colou?r matches color and colour.

Just to be clear: I'm well aware that this is not how regular expressions are normally defined, and that the fundamental capability of this subset is exactly that--a subset of what full regexes can recognize/match.

As an aside: although tagging a question with both C and C++ is normally a mistake, I believe in this case it's justified. In particular, the match function is at least intended to work as either C or C++. The test rig (enabled by defining TEST when compiling) uses features specific to C++, but match itself should not.

Without further ado, my attempt at implementing this functionality:
```
#include <string.h>
#ifndef __cplusplus
#include <stdbool.h>
#endif

bool match(char const *needle, char const *haystack) {
    for (; *needle!='\0'; ++needle) {
        switch (*needle) {
        case '.': ++haystack;
        case '?': break;
        case '*': {
            size_t max = strlen(needle);
            for (size_t i = 0; i /* ... */

#include <string>
#include <vector>

template <class F>
void assertTrue(F f, char const *needle, char const *hay_stack) {
    static const std::string names [] = { "Failure", "Success" };
    std::cout /* ... */

template <class F>
void assertFalse(F f, char const *needle, char const *hay_stack) {
    static const std::string names [] = { "Success", "Failure" };
    std::cout /* ... */

    std::vector<char const *> regexList { "abc**", "abc", "abc", "abc", "abc****", "abc*" };
    char const *str1 = "abc";

    for (auto regex : regexList)
        assertTrue(match, regex, str1);

    char const *regex = "abc****";
    std::vector<char const *> strList1 { "abcxyz", "abcx", "abc" };
    for (auto str : /* ... */
```
Solution
The regex flavour is non-standard, as you noted. Just to be clear, ? acts as a zero-or-one-of-the-previous-character modifier, while * acts more like a shell glob.

You have an access-past-the-end-of-string error when testing for glob matches in a test such as this:

assertFalse(match, "**a", "b");

The for-loop speculatively matches max substrings of the haystack, where max is based on the length of the needle. If the matches don't succeed, you could easily walk past the end of the haystack.

Your switch block is fragile: the '.' case relies on the break of the following '?' case. That's a bit too clever and dangerous for my taste, and a warning comment would be useful.
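
One way to address both points is sketched below. This is not the original code: it swaps the loop-and-switch for plain recursion, so no case depends on falling through to another, and the * loop stops at the haystack's terminator instead of using a count derived from the needle. The name match_bounded is made up for illustration, and treating a ? that has nothing optional before it as a literal character is an assumption, since the rules in the question don't cover that case.

```cpp
#ifndef __cplusplus
#include <stdbool.h>   /* bool, true, false when compiled as C */
#endif

/* Hypothetical rewrite of match(); returns true only if the whole
   haystack matches the whole needle. */
bool match_bounded(char const *needle, char const *haystack) {
    /* An exhausted pattern matches only an exhausted haystack. */
    if (*needle == '\0')
        return *haystack == '\0';

    if (*needle == '*') {
        /* Try every suffix of the haystack, including the empty one.
           The loop ends at the terminator, so h never moves past it. */
        for (char const *h = haystack; ; ++h) {
            if (match_bounded(needle + 1, h))
                return true;
            if (*h == '\0')
                return false;
        }
    }

    /* Does the current pattern character match the current haystack
       character? An exhausted haystack never matches, not even '.'. */
    bool const here = *haystack != '\0'
                   && (*needle == '.' || *needle == *haystack);

    /* needle[1] is safe to read: *needle is not the terminator. */
    if (needle[1] == '?') {
        /* Optional character: try consuming it first, then skipping it. */
        return (here && match_bounded(needle + 2, haystack + 1))
            || match_bounded(needle + 2, haystack);
    }

    return here && match_bounded(needle + 1, haystack + 1);
}
```

With this version, match_bounded("**a", "b") returns false without reading past the end of "b", and match_bounded("colou?r", "color") returns true. Trying both branches for an optional character also lets a pattern such as a?a match the single-character string a.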
Context
StackExchange Code Review Q#43637, answer score: 2