
Clean regex matches with named matches

Submitted by: @import:stackexchange-codereview

Problem

I have a regex pattern that will match some elements from a string and give them a particular name. For example, #^(?<foo>.*)$# will match the whole string and name it foo.

My problem is that the matches also contain the "classic", numbered matches.

For example:

$pattern = '#^(?<foo>.*)$#';
$str = '123';
$matches = null;
preg_match($pattern, $str, $matches);
print_r($matches);


will print:

Array
(
    [0] => 123
    [foo] => 123
    [1] => 123
)


As all the matches will always be named, I decided to manually remove the numbered indexes from $matches in order to clean things up:

$pattern = '#^(?<foo>.*)$#';
$str = '123';
$matches = null;
if (preg_match($pattern, $str, $matches))
{
    foreach ($matches as $key => $value)
    {
        if (is_int($key))
            unset($matches[$key]);
    }
}
print_r($matches);


Which prints:

Array
(
    [foo] => 123
)


It works, but I feel that it can be improved. Is there a better way to do this, especially without the foreach loop?

In practice, $pattern and $str can be much more complicated than the example I gave and I want this to be executed as fast as possible.

Solution

If you know what the named matches are up front (i.e. if you know what the pattern looks like), you can simply use array_intersect_key to extract only the values that have a specific key from the $matches array:

$names = ['foo' => null];

$pattern = '#^(?<foo>.*)$#';//changed ^ to $ at the end ;)
$str = '123';
if (preg_match($pattern, $str, $matches))
{//or return here
    $matches = array_intersect_key($matches, $names);
}
return $matches;
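
As a self-contained sketch of the same idea, the snippet below wraps the call in a function. The helper name namedMatches, the date pattern, and the sample input are made up for illustration and are not part of the original answer; the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns only the whitelisted named groups,
// or null when the pattern does not match.
function namedMatches(string $pattern, string $subject, array $names): ?array
{
    if (!preg_match($pattern, $subject, $matches)) {
        return null;
    }
    // array_flip turns ['year', 'month'] into ['year' => 0, 'month' => 1],
    // which is all array_intersect_key needs: it only compares keys.
    return array_intersect_key($matches, array_flip($names));
}

$result = namedMatches(
    '#^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$#',
    '2024-05-17',
    ['year', 'month', 'day']
);
print_r($result);
```

The numeric keys 0 to 3 are dropped because they do not appear in the list of names; the named entries keep the order they had in $matches.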


Of course, if you are not in control of the names that will be used in the pattern, you'll have to iterate over the $matches array like you're doing now. However, I'd recommend you don't use unset on the $matches array, but rather copy the relevant values to a new one and return that array instead:

$returnValue = [];//new array
foreach ($matches as $k => $v) {
    if (!is_int($k)) {
        $returnValue[$k] = $v;
    }
}


There are a couple of reasons for this:

  • It's considered bad practice to change the array you're iterating over inside the loop. It can cause issues in certain cases, and it will definitely bite you if you decide to pick up another language



  • PHP's memory management and copy-on-write mechanisms work well with code like the loop above: the new array will be assigned a reference to the value in $matches, but once the function returns, $matches is GC'ed. The values not referenced by $returnValue will be GC'ed, the other values are then "owned" by the return array (that's not 100% accurate, but it's true enough for now)



  • It's probably the most efficient (in terms of readability and execution time) approach.
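
If the goal is simply to drop the explicit foreach, one further option (not part of the original answer) is array_filter with the ARRAY_FILTER_USE_KEY flag, which requires PHP 5.6 or later. The sketch below reuses the pattern from the question; whether it is faster than the loop is something to benchmark rather than assume.

```php
<?php
$pattern = '#^(?<foo>.*)$#';
$str = '123';

$named = [];
if (preg_match($pattern, $str, $matches)) {
    // ARRAY_FILTER_USE_KEY passes each key (not the value) to the callback.
    // is_string() is true for group names and false for the numeric indexes.
    $named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}
print_r($named);
```

Like the copy loop, this builds a new array and leaves $matches untouched.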



Time to get silly

Just for the fun of it: you can opt for an inception-style preg_match_all call on the regex you're passing to preg_match (regex matching on a regex... let's be honest, that sounds a tad absurd). It's silly, but it can be done:

$pattern = '#^(?<foo>.*)$#';//changed ^ to $ at the end ;)
$str = '123';
$names = null;
if (preg_match_all('/(?<=\?<)([^>]+)/', $pattern, $matches))
{//create an assoc array containing the match names
    $names = array_fill_keys($matches[0], null);
}

$matches = null;
if (preg_match($pattern, $str, $matches))
{
    if ($names) {
        //gets only the named keys
        $matches = array_intersect_key($matches, $names);
    }
    return $matches;
}
//throw exception, return null, or do something else here
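
One caveat with the name-extracting regex above: its lookbehind for ?< also fires on lookbehind assertions such as (?<=@), so it can pick up text that is not a group name. A slightly stricter sketch follows. The helper name extractGroupNames is hypothetical, and it only recognises the (?<name>...) and (?P<name>...) spellings; it makes no attempt to handle (?'name'...), escaped parentheses, or character classes.

```php
<?php
// Hypothetical helper: collects the names of groups written as
// (?<name>...) or (?P<name>...) and returns them as array keys.
function extractGroupNames(string $pattern): array
{
    // A group name must start with a letter or underscore,
    // so (?<= and (?<! (lookbehinds) are skipped.
    preg_match_all('/\(\?P?<([A-Za-z_]\w*)>/', $pattern, $found);
    return array_fill_keys($found[1], null);
}

$pattern = '/(?<=@)(?<domain>[^@\.]+)\.(?<extension>[a-z]{3,4})$/';
$names = extractGroupNames($pattern);

$result = [];
if (preg_match($pattern, 'user@example.com', $matches)) {
    // Keep only the entries whose keys are known group names.
    $result = array_intersect_key($matches, $names);
}
print_r($result);
```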


Now, this is not exactly the way to go, but in some cases you may be processing a string where the regex you apply varies with the circumstances. In that case, array_intersect_key is definitely worth a look, seeing as it only returns the entries of the first array whose keys exist in all of the other arguments you pass to it:

class Foo
{
    const DOMAIN_PATTERN = '/(?<=@)(?<domain>[^@\.]+)(?=\.)/';//or something
    const EXTENSION = '/\.(?<extension>[a-z]{3,4})$/';

    protected static $names = [
        'domain'     => null,
        'extension'  => null,
    ];

    protected $mode = null;

    public function setValidationOptions(array $options)
    {//based on these options, one or more specific regex's will be applied to the data
        $this->mode = $options;
        return $this;
    }
    public function validateString($string)
    {
        $regex = $this->getPatterns();//not shown: assumed to return the patterns selected by $this->mode
        $result = [];
        foreach ($regex as $pattern) {
            if (preg_match($pattern, $string, $matches)) {//pattern first, then subject
                $result = array_merge(
                    $result,
                    array_intersect_key(
                        $matches,
                        static::$names
                    )
                );
            }
        }
        return $result;
    }
}


This is just a crude example of how you could use array_intersect_key to handle regex matches with named sub-patterns
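
For completeness, here is one way the crude example could be fleshed out into something runnable. The class name EmailPartsExtractor, the body of getPatterns(), and the shape of the options array (boolean flags keyed by 'domain' and 'extension') are all assumptions made for this sketch; the original answer does not define them.

```php
<?php
class EmailPartsExtractor
{
    const DOMAIN_PATTERN = '/(?<=@)(?<domain>[^@\.]+)(?=\.)/';
    const EXTENSION = '/\.(?<extension>[a-z]{3,4})$/';

    protected static $names = ['domain' => null, 'extension' => null];

    protected $mode = [];

    public function setValidationOptions(array $options)
    {
        $this->mode = $options;
        return $this;
    }

    // Assumed implementation: one pattern per enabled option.
    protected function getPatterns()
    {
        $patterns = [];
        if (!empty($this->mode['domain'])) {
            $patterns[] = static::DOMAIN_PATTERN;
        }
        if (!empty($this->mode['extension'])) {
            $patterns[] = static::EXTENSION;
        }
        return $patterns;
    }

    public function validateString($string)
    {
        $result = [];
        foreach ($this->getPatterns() as $pattern) {
            // preg_match takes the pattern first, then the subject.
            if (preg_match($pattern, $string, $matches)) {
                $result = array_merge(
                    $result,
                    array_intersect_key($matches, static::$names)
                );
            }
        }
        return $result;
    }
}

$extractor = new EmailPartsExtractor();
$parts = $extractor
    ->setValidationOptions(['domain' => true, 'extension' => true])
    ->validateString('user@example.com');
print_r($parts);
```

Because every pattern is filtered against the same static::$names list, each one contributes only its named groups to the merged result, regardless of which patterns are active.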

Code Snippets

$names = ['foo' => null];

$pattern = '#^(?<foo>.*)$#';//changed ^ to $ at the end ;)
$str = '123';
if (preg_match($pattern, $str, $matches))
{//or return here
    $matches = array_intersect_key($matches, $names);
}
return $matches;

$returnValue = [];//new array
foreach ($matches as $k => $v) {
    if (!is_int($k)) {
        $returnValue[$k] = $v;
    }
}

$pattern = '#^(?<foo>.*)$#';//changed ^ to $ at the end ;)
$str = '123';
$names = null;
if (preg_match_all('/(?<=\?<)([^>]+)/', $pattern, $matches))
{//create an assoc array containing the match names
    $names = array_fill_keys($matches[0], null);
}

$matches = null;
if (preg_match($pattern, $str, $matches))
{
    if ($names) {
        //gets only the named keys
        $matches = array_intersect_key($matches, $names);
    }
    return $matches;
}
//throw exception, return null, or do something else here

class Foo
{
    const DOMAIN_PATTERN = '/(?<=@)(?<domain>[^@\.]+)(?=\.)/';//or something
    const EXTENSION = '/\.(?<extension>[a-z]{3,4})$/';

    protected static $names = [
        'domain'     => null,
        'extension'  => null,
    ];

    protected $mode = null;

    public function setValidationOptions(array $options)
    {//based on these options, one or more specific regex's will be applied to the data
        $this->mode = $options;
        return $this;
    }
    public function validateString($string)
    {
        $regex = $this->getPatterns();//not shown: assumed to return the patterns selected by $this->mode
        $result = [];
        foreach ($regex as $pattern) {
            if (preg_match($pattern, $string, $matches)) {//pattern first, then subject
                $result = array_merge(
                    $result,
                    array_intersect_key(
                        $matches,
                        static::$names
                    )
                );
            }
        }
        return $result;
    }
}

Context

StackExchange Code Review Q#104817, answer score: 4
