Clean regex matches with named matches
Problem
I have a regex pattern that will match some elements from a string and give them a particular name. For example,
#^(?<foo>.*)$# will match the whole string and name it foo.
My problem is that the matches also contain the "classic", numbered matches.
For example:
$pattern = '#^(?<foo>.*)$#';
$str = '123';
$matches = null;
preg_match($pattern, $str, $matches);
print_r($matches);
will print:
Array
(
    [0] => 123
    [foo] => 123
    [1] => 123
)
As all the matches will always be named, I decided to manually remove the numbered indexes from $matches in order to clean things up:
$pattern = '#^(?<foo>.*)$#';
$str = '123';
$matches = null;
if (preg_match($pattern, $str, $matches))
{
    foreach ($matches as $key => $value)
    {
        if (is_int($key))
            unset($matches[$key]);
    }
}
print_r($matches);
Which prints:
Array
(
    [foo] => 123
)
It works, but I feel that it can be improved. Is there a better way to do this, especially without the foreach loop?
In practice, $pattern and $str can be much more complicated than the example I gave and I want this to be executed as fast as possible.
Solution
If you know what the named matches are up front (i.e. if you know what the pattern looks like), you can simply use
array_intersect_key to extract only the values that have a specific key from the $matches array:
$names = ['foo' => null];
$pattern = '#^(?<foo>.*)$#';//changed ^ to $ at the end ;)
$str = '123';
if (preg_match($pattern, $str, $matches))
{//or return here
    $matches = array_intersect_key($matches, $names);
}
return $matches;
Of course, if you are not in control of the names that will be used in the pattern, you'll have to iterate over the $matches array like you're doing now. However, I'd recommend you don't use unset on the $matches array, but rather copy the relevant values to a new one and return that array instead:
$returnValue = [];//new array
foreach ($matches as $k => $v) {
    if (!is_int($k)) {
        $returnValue[$k] = $v;
    }
}
There are a couple of reasons for this:
- It's considered bad practice to change the array you're iterating over inside the loop. It can cause issues in certain cases, and it will definitely bite you if you decide to pick up another language.
- PHP's memory management and copy-on-write mechanisms work well with code like the loop above: the new array will be assigned a reference to the value in $matches, but once the function returns, $matches is GC'ed. The values not referenced by $returnValue will be GC'ed; the other values are then "owned" by the return array (that's not 100% accurate, but it's true enough for now).
- It's probably the most efficient (in terms of readability and execution time) approach.
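As a rough sketch of the copy-instead-of-unset idea, the loop can be wrapped in a small function. The function names below are made up for illustration, and the scalar type hints assume PHP 7 or later. The second variant uses array_filter with ARRAY_FILTER_USE_KEY (PHP 5.6+); that is an alternative not taken from the answer itself. It avoids the explicit foreach, but whether it is actually faster would need measuring on your own patterns.

```php
<?php
// Hypothetical helper names; neither function appears in the original answer.

/**
 * Copy only the named groups into a fresh array (the loop shown above).
 * Returns an empty array when the pattern does not match.
 */
function namedMatchesByLoop(string $pattern, string $subject): array
{
    $named = [];
    if (preg_match($pattern, $subject, $matches)) {
        foreach ($matches as $key => $value) {
            if (!is_int($key)) {
                $named[$key] = $value; // keep string (named) keys only
            }
        }
    }
    return $named;
}

/**
 * Loop-free variant: filter on the keys and keep only the string ones.
 * With ARRAY_FILTER_USE_KEY the callback receives the key, not the value.
 */
function namedMatchesByFilter(string $pattern, string $subject): array
{
    if (!preg_match($pattern, $subject, $matches)) {
        return [];
    }
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(namedMatchesByLoop('#^(?<foo>.*)$#', '123'));
print_r(namedMatchesByFilter('#^(?<foo>.*)$#', '123'));
```

Both calls leave only the foo entry. Either version keeps $matches local to the function, so the caller never sees the numbered keys.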
Time to get silly
Just for the fun of it: you can opt for an inception-style
preg_match_all call on the regex you're passing to preg_match (regex matching on a regex... let's be honest, that sounds a tad absurd). It's silly, but it can be done:
$pattern = '#^(?<foo>.*)$#';//changed ^ to $ at the end ;)
$str = '123';
$names = null;
if (preg_match_all('/(?<=\?<)([^>]+)/', $pattern, $matches))
{//create an assoc array containing the match names
    $names = array_fill_keys($matches[0], null);
}
$matches = null;
if (preg_match($pattern, $str, $matches))
{
    if ($names) {
        //gets only the named keys
        $matches = array_intersect_key($matches, $names);
    }
    return $matches;
}
//throw exception, return null, or do something else here
Now, this is not exactly the way to go, but in some cases you may be processing a string where the regex you apply to it can change depending on any number of reasons. In that case, array_intersect_key is definitely worth a look, seeing as it only returns the entries of the first array whose keys exist in all of the arguments you pass to it:
class Foo
{
    const DOMAIN_PATTERN = '/(?<=@)(?<domain>[^@\.]+)(?=\.)/';//or something
    const EXTENSION = '/\.(?<extension>[a-z]{3,4})$/';
    protected static $names = [
        'domain' => null,
        'extension' => null,
    ];
    protected $mode = null;
    public function setValidationOptions(array $options)
    {//based on these options, one or more specific regex's will be applied to the data
        $this->mode = $options;
        return $this;
    }
    public function validateString($string)
    {
        //getPatterns() is not defined here; assumed to return the patterns chosen via setValidationOptions
        $regex = $this->getPatterns();
        $result = [];
        foreach ($regex as $pattern) {
            //preg_match expects the pattern first, then the subject
            if (preg_match($pattern, $string, $matches)) {
                $result = array_merge(
                    $result,
                    array_intersect_key(
                        $matches,
                        static::$names
                    )
                );
            }
        }
        return $result;
    }
}
This is just a crude example of how you could use array_intersect_key to handle regex matches with named sub-patterns.
Code Snippets
$names = ['foo' => null];
$pattern = '#^(?<foo>.*)$#';//changed ^ to $ at the end ;)
$str = '123';
if (preg_match($pattern, $str, $matches))
{//or return here
    $matches = array_intersect_key($matches, $names);
}
return $matches;

$returnValue = [];//new array
foreach ($matches as $k => $v) {
    if (!is_int($k)) {
        $returnValue[$k] = $v;
    }
}

$pattern = '#^(?<foo>.*)$#';//changed ^ to $ at the end ;)
$str = '123';
$names = null;
if (preg_match_all('/(?<=\?<)([^>]+)/', $pattern, $matches))
{//create an assoc array containing the match names
    $names = array_fill_keys($matches[0], null);
}
$matches = null;
if (preg_match($pattern, $str, $matches))
{
    if ($names) {
        //gets only the named keys
        $matches = array_intersect_key($matches, $names);
    }
    return $matches;
}
//throw exception, return null, or do something else here

class Foo
{
    const DOMAIN_PATTERN = '/(?<=@)(?<domain>[^@\.]+)(?=\.)/';//or something
    const EXTENSION = '/\.(?<extension>[a-z]{3,4})$/';
    protected static $names = [
        'domain' => null,
        'extension' => null,
    ];
    protected $mode = null;
    public function setValidationOptions(array $options)
    {//based on these options, one or more specific regex's will be applied to the data
        $this->mode = $options;
        return $this;
    }
    public function validateString($string)
    {
        //getPatterns() is not defined here; assumed to return the patterns chosen via setValidationOptions
        $regex = $this->getPatterns();
        $result = [];
        foreach ($regex as $pattern) {
            //preg_match expects the pattern first, then the subject
            if (preg_match($pattern, $string, $matches)) {
                $result = array_merge(
                    $result,
                    array_intersect_key(
                        $matches,
                        static::$names
                    )
                );
            }
        }
        return $result;
    }
}
Context
StackExchange Code Review Q#104817, answer score: 4