
Finding an exact "phrase" with a given string (as typed/in order)

Submitted by: @import:stackexchange-codereview

Problem

 $input, 'FOUND' => 1,
                'VALUEofFOUND' => $phrase[$i]];
            } else {
                $out[] = 'Not found';
            }
            $i++;
        }
        print_r($out);
    } //end function

    $this->filterExactPhrase("this is a test, foobar", "foobar");


Calling the function like this yields

Array ( [0] => Array ( [INPUT] => this is a test, foobar [FOUND] => 1 [VALUEofFOUND] => foobar ) )

Calling the function with an $input that contains foo instead of foobar yields

$this->filterExactPhrase("this is a test, foo", "foobar");


Array ( [0] => Not found )

I thought this was quite interesting to find, as I was looking for a way to locate a very specific phrase containing spaces in an extremely long log file so that I could remove it.

Solution

Basically, Alex has already made some good suggestions (especially the bit about using \b). However, your function will fail in certain cases like this one:

//looking for chars like *, +, ? and such
filterExactPhrase("some *markdown* string", "*markdown*");


The string could contain special regex chars (. \ + * ? [ ^ ] $ ( ) { } = ! | : -), or the delimiter you use:

//this is operator error (regex + markup don't mix)
filterExactPhrase('<h1>foobar</h1>', '</h1>');


Yes, it's evil, but people still seem hellbent on using regexes to consume markup, so your code should either check for that and throw an exception, or defend against it. The example above will generate an error, because the string </h1> is concatenated into the regex raw, so you end up with this:

/\b(</h1>)\b/
/\b(</ -> faulty regex
h1>)\b/ -> unknown and invalid flags
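
To make the failure concrete, here is a minimal sketch that builds the pattern the same way as the question does. The variable names are illustrative only, and the @ operator is used just to hide the warning so the return value can be inspected.

```php
<?php
// Illustrative reproduction; variable names are assumptions, not the original code.
$input  = '<h1>foobar</h1>';
$phrase = '</h1>';

// The "/" inside $phrase ends the pattern early, so everything after it
// is read as pattern modifiers, and "h" is not a valid modifier.
$numFound = @preg_match_all('/\b(' . $phrase . ')\b/', $input);

var_dump($numFound); // bool(false): the pattern could not be compiled
```

Without the @, PHP also emits a warning about an unknown modifier, which is the "invalid flags" problem described above.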


Another thing to think of is that people, once they find out the function uses a regex, will start passing regular expressions instead of a string to it. Kind of like people entering SQL wildcards in search forms (stuff like foo%).

filterExactPhrase("some string with words and 123 numbers", "[\w\d]+");


So how do you go about this? Simple: preg_quote filters the input for you and escapes whatever chars need escaping. Basically, all I'm trying to say is change this:

$numFound = preg_match_all("/\b(" . $phrase . ")\b/", $input);


to this:

$escaped = preg_quote($phrase, '/');//second param is the delimiter
$numFound = preg_match_all("/\b(" . $escaped . ")\b/", $input);


Now chars like + or * are escaped properly, and so are the delimiters.
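
As a rough illustration of what the escaping does, the sketch below runs the three problem phrases from earlier through preg_quote and checks that the resulting patterns compile. The sample list and variable names are assumptions made for the example.

```php
<?php
// Sample phrases borrowed from the examples in this answer.
$phrases = ['*markdown*', '</h1>', '[\w\d]+'];

foreach ($phrases as $phrase) {
    // Passing '/' as the second argument escapes the delimiter as well.
    $escaped = preg_quote($phrase, '/');
    $pattern = '/\b(' . $escaped . ')\b/';

    // preg_match() returns false when the pattern fails to compile.
    $compiles = @preg_match($pattern, '') !== false;
    echo $phrase, ' => ', $pattern, ($compiles ? ' (compiles)' : ' (error)'), PHP_EOL;
}
```

One caveat worth keeping in mind: \b only matches between a word character and a non-word character, so a phrase that starts or ends with punctuation (such as *markdown* surrounded by spaces) compiles fine after escaping but may still not be found.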

The other thing I'd suggest is to remove the print_r from your function/method. I realize that it's probably there for debugging purposes, but still: a function/method does one thing. In this case its job is to process a string and find exact matches of another string. Whether or not that data should be shown (displayed, echoed, printed or whatever) is not a call this method should make. It's not aware of output buffers or headers that might be set later on, so it shouldn't forcibly generate output.

For all you know, I might want to call this method, store the data somewhere, and send something entirely different to the output stream. In short: a function/method should return data, not print/echo it.

This dogma doesn't apply to methods in a renderer component or view class, of course. But it holds true for data-processing code units.
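
A minimal sketch of what a returning version could look like follows. The original function body is not fully shown, so everything here is an assumption apart from the array keys, which mirror the output in the question; the empty-array return for "not found" is my own choice, not the original behaviour.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical rewrite: returns the match data instead of printing it.
 * An empty array means the phrase was not found (an assumed convention).
 */
function filterExactPhrase(string $input, string $phrase): array
{
    if ($phrase === '') {
        return []; // nothing sensible to search for
    }

    $escaped  = preg_quote($phrase, '/');
    $numFound = preg_match_all('/\b(' . $escaped . ')\b/', $input, $matches);

    if ($numFound === false || $numFound === 0) {
        return [];
    }

    return [
        'INPUT'        => $input,
        'FOUND'        => $numFound,
        'VALUEofFOUND' => $matches[1][0],
    ];
}

// The caller, not the function, decides whether anything gets printed.
$result = filterExactPhrase('this is a test, foobar', 'foobar');
if ($result !== []) {
    print_r($result);
}
```

With this shape the same function can feed a log cleaner, a test, or a view without any of them having to deal with unexpected output.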

Code Snippets

//looking for chars like *, +, ? and such
filterExactPhrase("some *markdown* string", "*markdown*");
//this is operator error (regex + markup don't mix)
filterExactPhrase('<h1>foobar</h1>', '</h1>');
/\b(</h1>)\b/
/\b(</ -> faulty regex
h1>)\b/ -> unknown and invalid flags
filterExactPhrase("some string with words and 123 numbers", "[\w\d]+");
$numFound = preg_match_all("/\b(" . $phrase . ")\b/", $input);

Context

StackExchange Code Review Q#92939, answer score: 4
