Another regex subset matcher

Submitted by: @import:stackexchange-codereview

Problem

After reading @JavaDeveloper's recent question, I was inspired to try my hand at writing code to accomplish the same task.

The "rules" for this code (i.e., its intent) are to match a subset of regular expressions, defined as follows:



  • . (dot) matches any single character
  • * (star) matches zero or more characters
  • ? (question mark) makes the previous character optional, i.e. colou?r matches both color and colour

Just to be clear: I'm well aware that this is not how regular expressions are normally defined, and that the fundamental capability of this subset is exactly that--a subset of what full regexes can recognize/match.
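To make the three rules concrete, here is a minimal reference sketch. It is not the code under review: the name matchSketch is hypothetical, and it assumes a whole-string match, a glob-style star, and that a ? with no preceding character (or directly after a *) is treated as a literal.

```cpp
// Hypothetical reference matcher (not the author's code): returns true when
// the whole of `text` matches the whole of `pattern`.
bool matchSketch(char const *pattern, char const *text) {
    if (*pattern == '\0')
        return *text == '\0';

    if (*pattern == '*') {
        // Glob-style star: try the rest of the pattern against every tail
        // of the text, including the empty tail.
        for (char const *t = text; ; ++t) {
            if (matchSketch(pattern + 1, t))
                return true;
            if (*t == '\0')
                return false;   // stop at the terminator, never beyond it
        }
    }

    // Does the current pattern character match the current text character?
    bool const here = *text != '\0' && (*pattern == '.' || *pattern == *text);

    if (pattern[1] == '?') {
        // Optional character: either consume it or skip it.
        return (here && matchSketch(pattern + 2, text + 1))
            || matchSketch(pattern + 2, text);
    }
    return here && matchSketch(pattern + 1, text + 1);
}
```

With these assumptions, matchSketch("colou?r", "color") and matchSketch("colou?r", "colour") both succeed, while matchSketch("a.c", "ac") fails because the dot needs a character to consume.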

As an aside: although tagging a question with both C and C++ is normally a mistake, I believe in this case it's justified. In particular, the match function is at least intended to work as either C or C++. The test rig (enabled by defining TEST when compiling) uses features specific to C++, but match itself should not.

Without further ado, my attempt at implementing this functionality:

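```
#include <string.h>
#ifndef __cplusplus
#include <stdbool.h>
#endif

bool match(char const *needle, char const *haystack) {
    for (; *needle != '\0'; ++needle) {
        switch (*needle) {
        case '.': ++haystack;
        case '?': break;
        case '*': {
            size_t max = strlen(needle);
            for (size_t i = 0; i /* ... rest of match and the start of the TEST section are missing here ... */

#include <iostream>
#include <string>
#include <vector>

template <class F>
void assertTrue(F f, char const *needle, char const *hay_stack) {
    static const std::string names[] = { "Failure", "Success" };
    std::cout /* ... rest of the output statement is missing here ... */
}

template <class F>
void assertFalse(F f, char const *needle, char const *hay_stack) {
    static const std::string names[] = { "Success", "Failure" };
    std::cout /* ... rest of the output statement is missing here ... */
}

/* ... start of main is missing here ... */
    std::vector<char const *> regexList { "abc**", "abc", "abc", "abc", "abc****", "abc*" };
    char const *str1 = "abc";

    for (auto regex : regexList)
        assertTrue(match, regex, str1);

    char const *regex = "abc****";
    std::vector<char const *> strList1 { "abcxyz", "abcx", "abc" };
    for (auto str : /* ... the listing is cut off here ... */
```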

Solution

The regex flavour is non-standard, as you noted. Just to be clear, ? acts as a zero-or-one-of-the-previous-character modifier, while * acts more like a shell glob.

You have an access-past-the-end-of-string error when testing for glob matches in a test such as this:

assertFalse(match, "**a", "b");


The for-loop speculatively tries max starting offsets into the haystack, where max is based on the length of the needle rather than the length of the haystack. If none of those attempts succeeds, the loop can easily walk past the end of the haystack.
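One way to avoid the overrun is to bound the loop by the haystack instead. The sketch below is only an illustration under assumptions: matchStar and restMatches are hypothetical names, and restMatches is a plain string comparison standing in for the recursive call to match.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for "match the rest of the pattern" (hypothetical): here it only
// compares the two strings exactly.
bool restMatches(char const *rest, char const *text) {
    return std::strcmp(rest, text) == 0;
}

// Hypothetical helper: handles a '*' by trying the rest of the pattern at
// every offset of the haystack, including the empty tail at the terminator.
bool matchStar(char const *afterStar, char const *haystack) {
    std::size_t const limit = std::strlen(haystack);  // bound by the haystack
    for (std::size_t i = 0; i <= limit; ++i)
        if (restMatches(afterStar, haystack + i))
            return true;
    return false;
}
```

Because i never exceeds strlen(haystack), haystack + i always points at a character of the string or at its terminator, so a failing pattern such as the "a" left over from "**a" cannot push the search past the end of "b".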

Your switch block is fragile: the '.' case relies on the break of the following '?' case. That's a bit too clever and dangerous for my taste, and a warning comment would be useful.
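A less fragile layout gives each case its own break. This is a sketch only: stepWildcard is a hypothetical helper that isolates the two cases, and like the original it assumes the caller has already checked that the haystack is not at its terminator before a '.' is applied.

```cpp
// Hypothetical helper (not from the post): returns the haystack position
// after applying a single '.' or '?' symbol from the needle.
char const *stepWildcard(char symbol, char const *haystack) {
    switch (symbol) {
    case '.':
        ++haystack;   // '.' consumes exactly one character
        break;        // explicit break: no reliance on the next case
    case '?':
        break;        // '?' consumes nothing by itself
    default:
        break;        // any other symbol is left to the caller
    }
    return haystack;
}
```

If the fallthrough is kept instead, a comment on the '.' case (or the [[fallthrough]] attribute in C++17 and later) documents that it is intentional.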

Context

StackExchange Code Review Q#43637, answer score: 2
