
Advanced multi-conditional split with a regex in Perl


Problem

I would like to split a string on the following conditions:

  • After each /;/
  • After each /{/ or /}/
  • After each /\w+:/ but not after /\w+:\s+\{/
  • After each /#\w.*$/

I've found that I can keep the matched pattern and split before it with

/(?=pattern)/

or split after it (if the pattern has a fixed width) with

/(?<=pattern)/

or split after it (if the pattern has a variable width) with

/pattern\K/

or drop the pattern from the result entirely with

/pattern/
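
The four forms above can be compared side by side on a small input. This is a minimal sketch; the strings 'a;b;c' and 'a;;b;c' are made up purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = 'a;b;c';    # illustrative input

# Split before each ';' (zero-width lookahead, delimiter kept at field start)
my @before = split /(?=;)/, $s;           # ('a', ';b', ';c')

# Split after each ';' (fixed-width lookbehind, delimiter kept at field end)
my @after = split /(?<=;)/, $s;           # ('a;', 'b;', 'c')

# Split after a variable-width run of ';': \K drops everything matched so far
# from the reported match, so only a zero-width split point remains
my @after_k = split /;+\K/, 'a;;b;c';     # ('a;;', 'b;', 'c')

# Plain pattern: the delimiter itself is consumed and discarded
my @removed = split /;/, $s;              # ('a', 'b', 'c')

print join('|', @$_), "\n" for \@before, \@after, \@after_k, \@removed;
```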


With all this knowledge I wrote this:

#!/usr/bin/perl
$_ = do { local $/; <DATA> };
s/\#\w.+\n\K|\n//g;

my @content = split /(?:(?<=[;{}])|(?<=:)(?!\s*\{)|#\w.*\$\K)/, $_;

print join "\n", @content;

__DATA__
carrot;
#orange
apple: {pear; { cabbage; } }
#passion
sprout: celeri;
tomato;


The output should be the following (reindented by hand):

carrot;
#orange
apple: {
pear;
{
cabbage;
}
}
#passion
sprout:
celeri;
tomato;


I am not really happy with this method for multiple reasons:

  • I cannot use an /x regex in split (like split m/re/x) to make the regex more readable
  • I need to handle the special case /^\s*#.*$/ (comment lines) separately, because I cannot remove the newline there; otherwise I get, for instance, #passionsprout:

Can I do a better job?
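
On the first point: split itself accepts the /x modifier, and it also accepts a precompiled qr//x object. A likely reason an /x version fails is that under /x an unescaped # starts a comment, which turns the rest of that pattern line (here including the closing parenthesis) into a comment. The sketch below is a restatement, not the asker's exact code: it escapes # as \#, inlines the sample with a heredoc instead of __DATA__, and leaves out the #\w.*\$\K branch, since after the substitution every comment line already ends with its own newline and \$ matches a literal dollar sign that the sample never contains.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's sample input, inlined with a heredoc instead of __DATA__
my $text = <<'END';
carrot;
#orange
apple: {pear; { cabbage; } }
#passion
sprout: celeri;
tomato;
END

# Delete every newline except the one that ends a comment line.
# Under /x an unescaped '#' starts a comment, so it is written as \#.
(my $flat = $text) =~ s/ \#\w.+\n\K | \n //xg;

# The split pattern, laid out with /x and commented
my $split_re = qr/
      (?<= [;{}] )            # after ';', '{' or '}'
    | (?<= : ) (?! \s* \{ )   # after ':' unless optional spaces and '{' follow
/x;

my @content = split $split_re, $flat;
print join("\n", @content), "\n";
```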

Solution

Think about it the other way around: instead of splitting, add a newline after every character that needs one. That keeps the regex simpler.

#!/usr/bin/perl -w

use strict;

my $content = do { local $/; <DATA> };

my $regex = qr{
  (?m)
    (^
     |
     :\s*
    )
    \{                  # open curly brace preceded by 
                        # beginning of line
                        # OR
                        # colon

  |                     # OR
    [:;\}]              # any of these characters
    (?!\s*\n)           # NOT followed by newline
}x;

$content =~ s/($regex)/$1\n/g;
print $content, "\n";
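
Traced against the question's sample data, the substitution above appears to leave { cabbage; on one line: that { is preceded by a space, so it is neither at the start of a line nor after a colon, and the second branch's class [:;\}] does not include {. A minimal variation (our own adjustment, not the answerer's code) adds { to that class. The colon branch is still tried first in the alternation, so apple: { stays together. The sample is again inlined with a heredoc so the snippet is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's sample input, inlined with a heredoc instead of __DATA__
my $content = <<'END';
carrot;
#orange
apple: {pear; { cabbage; } }
#passion
sprout: celeri;
tomato;
END

my $regex = qr/
    (?m)
    (^ | :\s*) \{   # open brace at line start or after a colon,
                    # tried first so "apple: {" stays on one line
  |
    [:;{}]          # any of these characters, i.e. the answer's class plus '{'
    (?!\s*\n)       # NOT followed by a newline
/x;

$content =~ s/($regex)/$1\n/g;
print $content;
```

Leading spaces, such as the one before cabbage;, are kept, which fits the question's note that its expected output was reindented by hand. Like the answer's version, this works character by character, so a ; or : inside a comment line would generally get a newline after it too.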


Context

StackExchange Code Review Q#94931, answer score: 4
