Capturing optional regex segment with PHP

Submitted by: @import:stackexchange-codereview

Problem

I need to check the end of a URL for the possible existence of /news_archive or /news_archive/5 in PHP. The below snippet does exactly what I want, but I know that I could achieve this with one preg_match rather than two. How can I improve this code to treat the /5 as an optional segment and capture it if it exists?

if (preg_match('~/[0-9A-Za-z_-]+_archive/[0-9]+$~', $_SERVER['HTTP_REFERER'], $matches) || preg_match('~/[0-9A-Za-z_-]+_archive$~', $_SERVER['HTTP_REFERER'], $matches)) {
    $page_info['parent_page']['page_label'] = ltrim($matches[0], '/');
}

Solution

Consider your first pattern:

~/[0-9A-Za-z_-]+_archive/[0-9]+$~


Let's break it down:

  • / a literal slash
  • [0-9A-Za-z_-]+ one or more of 0-9, A-Z, a-z, _ or -
  • _archive the literal string _archive
  • / another literal slash
  • [0-9]+ one or more digits
  • $ the end of the string, so nothing may follow the digits
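
A quick way to confirm this breakdown is to run the first pattern against a few sample referers. This is a minimal sketch: the example.com URLs are made up for illustration, and the script only shows that the pattern requires the trailing /digits and tolerates nothing after them.

```php
<?php
// The question's first pattern: "/<name>_archive/<digits>" must end the string.
$pattern = '~/[0-9A-Za-z_-]+_archive/[0-9]+$~';

// Hypothetical sample referers, for illustration only.
$samples = [
    'http://example.com/news_archive/5',   // trailing slash + digits
    'http://example.com/news_archive',     // no page number
    'http://example.com/news_archive/5/x', // something after the digits
];

$results = [];
foreach ($samples as $url) {
    // preg_match() returns 1 on a match and 0 when there is none
    $results[$url] = preg_match($pattern, $url) === 1;
    echo $url, ' => ', $results[$url] ? 'match' : 'no match', PHP_EOL;
}
```

Only the first URL matches, which is why the question needed a second preg_match for the bare /news_archive case.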



So basically you want to make the fourth and fifth parts (the second / and the [0-9]+) optional. To be more specific, you want either both of them or neither.

Consider this:

(a[b]+)?

This means one a followed by one or more b, and the whole grouped a/b sequence is optional: it either appears in full or not at all.
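
To see this all-or-nothing behavior with a literal a and b, here is a small sketch. It anchors (a[b]+)? behind a stand-in prefix x; the x and the test strings are arbitrary choices for the demo, not part of the original answer.

```php
<?php
// ^x(a[b]+)?$ : after the prefix "x", the group "a then one or more b"
// must appear in full or not at all before the end of the string.
$pattern = '~^x(a[b]+)?$~';

$accepted = [];
foreach (['x', 'xab', 'xabbb', 'xa', 'xb'] as $s) {
    $accepted[$s] = preg_match($pattern, $s) === 1;
    echo str_pad($s, 6), $accepted[$s] ? 'match' : 'no match', PHP_EOL;
}
```

"x", "xab" and "xabbb" match; "xa" (an a without any b) and "xb" (a b without the a) do not.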

Letting a be the slash from the fourth part and b be the digits from the fifth, we're left with:

(/[0-9]+)?

Dropped into the full pattern, that gives:

~/[0-9A-Za-z_-]+_archive(/[0-9]+)?$~
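
Applied to the question, this single pattern replaces both preg_match calls. Below is a minimal sketch; archive_label() is a hypothetical helper that takes the referer as a parameter (the question reads $_SERVER['HTTP_REFERER'] directly) so it can run from the command line. Since $matches[0] is the whole match whether or not the page number is present, the question's ltrim() call carries over unchanged.

```php
<?php
// One preg_match instead of two: the trailing "/digits" is an optional group.
// archive_label() is a hypothetical helper name used for illustration.
function archive_label(string $referer): ?string
{
    if (preg_match('~/[0-9A-Za-z_-]+_archive(/[0-9]+)?$~', $referer, $matches)) {
        // $matches[0] is the whole match, e.g. "/news_archive/5"; drop the leading slash
        return ltrim($matches[0], '/');
    }
    return null; // the referer does not end in an *_archive segment
}

var_dump(archive_label('http://example.com/news_archive'));   // string(12) "news_archive"
var_dump(archive_label('http://example.com/news_archive/5')); // string(14) "news_archive/5"
var_dump(archive_label('http://example.com/about'));          // NULL
```

In the question's context the input would be $_SERVER['HTTP_REFERER'] (guarded with ?? '', because browsers do not always send a referer), and the page label would only be assigned when the result is not null.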


The (/[0-9]+)? version matches both forms, but its group captures the slash as well, like /5:

php -r 'preg_match("~/[0-9A-Za-z_-]+_archive(/[0-9]+)?$~", "/news_archive/5", $m); var_dump($m);'
array(2) {
  [0] =>
  string(15) "/news_archive/5"
  [1] =>
  string(2) "/5"
}


To capture just the number, nest another group inside the first one:

~/[0-9A-Za-z_-]+_archive(/([0-9]+))?$~


Example:

php -r 'preg_match("~/[0-9A-Za-z_-]+_archive(/([0-9]+))?$~", "/news_archive/44", $m); var_dump($m);'
array(3) {
  [0] =>
  string(16) "/news_archive/44"
  [1] =>
  string(3) "/44"
  [2] =>
  string(2) "44"
}
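
One detail worth knowing when you read the page number: if the optional group does not take part in the match (a bare /news_archive), preg_match leaves trailing unmatched groups out of $m entirely, so $m[2] is undefined rather than an empty string. A sketch that guards against this with the ?? operator (the sample paths are illustrative):

```php
<?php
// Nested-group pattern: $m[1] is "/<digits>", $m[2] is just "<digits>".
$pattern = '~/[0-9A-Za-z_-]+_archive(/([0-9]+))?$~';

$pages = [];
foreach (['/news_archive/44', '/news_archive'] as $path) {
    if (preg_match($pattern, $path, $m)) {
        // Trailing groups that did not participate are omitted from $m,
        // so for "/news_archive" only $m[0] exists; ?? yields null instead of a warning.
        $pages[$path] = $m[2] ?? null;
        echo $path, ': ', count($m), ' element(s), page = ', var_export($pages[$path], true), PHP_EOL;
    }
}
```

For /news_archive/44 the array has three elements and the page is '44'; for /news_archive it has one element and the page is null.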


You could technically make the outer group non-capturing, as in (?:/([0-9]+))?, so that only the digits are captured (in $matches[1] instead of $matches[2]), but I don't think the extra syntax is worth it just to avoid grabbing the / as well.
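
For comparison, here is a sketch of that non-capturing variant: the slash is still matched but no longer stored, so the digits move from group 2 to group 1.

```php
<?php
// (?:...) groups without capturing: the "/" is matched but not stored,
// so the page number lands in $m[1].
$pattern = '~/[0-9A-Za-z_-]+_archive(?:/([0-9]+))?$~';

preg_match($pattern, '/news_archive/44', $m);
var_dump($m); // two elements: "/news_archive/44" and "44"
```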

(By the way, sorry if you're already familiar with regex and found this excessive. I tend to take a very verbose approach to any regex-related question :).)

Context

StackExchange Code Review Q#16230, answer score: 3
