Advanced multi-conditional split with a regex in Perl
Problem
I would like to split a string on multiple conditions such as:
- After each /;/
- After each /{/ or /}/
- After each /\w+:/ but not after /\w+:\s+\{/
- After each /#\w.*$/
I've found that I can:
- keep the pattern and split before it with /(?=pattern)/
- split after it with /(?<=pattern)/ (if the pattern has a fixed width)
- split after it with /pattern\K/ (if the pattern has a variable width)
- or even remove the pattern during the operation with /pattern/
With all this knowledge I wrote this:
#!/usr/bin/perl
$_ = do { local $/; <DATA> };
s/\#\w.+\n\K|\n//g;
my @content = split /(?:(?<=[;{}])|(?<=:)(?!\s*\{)|#\w.*\$\K)/, $_;
print join "\n", @content;
__DATA__
carrot;
#orange
apple: {pear; { cabbage; } }
#passion
sprout: celeri;
tomato;
Where the output should be this (after manually reindenting it):
carrot;
#orange
apple: {
pear;
{
cabbage;
}
}
#passion
sprout:
celeri;
tomato;
I am not really happy with this method for multiple reasons:
- I cannot use an /x regex in split, like split m/re/x, to make the regex more readable.
- I need to handle the special case /^\s*#.*$/ separately: there I cannot remove the newline, otherwise I get, for instance, #passionsprout:
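For what it's worth, split does accept the /x modifier, as long as the pattern and its comments do not contain the delimiter. The sketch below runs the four split idioms from the question on made-up toy strings ('a;b;c' and friends are assumptions for illustration only), then splits another toy string with a commented /x pattern modeled on the question's split regex (leaving out the comment-line case).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up toy input, used only for illustration.
my $s = 'a;b;c';

# Lookahead: split *before* each ';' and keep it.
my @before = split /(?=;)/, $s;          # a | ;b | ;c

# Fixed-width lookbehind: split *after* each ';' and keep it.
my @after = split /(?<=;)/, $s;          # a; | b; | c

# \K: split after a variable-width match (one or more ';').
my @after_k = split /;+\K/, 'a;;b;c';    # a;; | b; | c

# Plain pattern: the ';' separators are removed.
my @dropped = split /;/, $s;             # a | b | c

# /x works with split too. Whitespace inside [...] still counts under /x,
# and the comments must not contain the / delimiter.
my @fields = split m/
      (?<= [;{}] )                # after ; { or }
    | (?<= : ) (?! \s* \{ )       # after ':' unless a '{' follows
/x, 'k: v;a: {b;}';               # k: | " v;" | a: { | b; | }

print join('|', @$_), "\n" for \@before, \@after, \@after_k, \@dropped, \@fields;
```

Each printed line shows the fields of one split joined with '|', so the kept or dropped separators are easy to see.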
Can I do a better job?
Solution
Think the other way around: instead of splitting, add a newline after every character that needs one. That way your regex is a bit simpler.
#!/usr/bin/perl -w
use strict;
# Slurp the whole input (the question's sample, in __DATA__ below).
my $content = do { local $/; <DATA> };
my $regex = qr{
    (?m)
    (^
    |
    :\s*
    )
    \{          # open curly brace preceded by
                # beginning of line
                # OR
                # colon
    |           # OR
    [:;\}]      # any of these characters
    (?!\s*\n)   # NOT followed by newline
}x;
$content =~ s/($regex)/$1\n/g;
print $content, "\n";
__DATA__
carrot;
#orange
apple: {pear; { cabbage; } }
#passion
sprout: celeri;
tomato;
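Since Perl 5.10, \K can replace the capture group in this substitution: everything matched before \K is kept, and the replacement text is inserted right after it. The sketch below copies the same regex (with / delimiters instead of braces), wraps the substitution in a hypothetical add_newlines helper, and feeds it the question's sample input inlined as a string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in the answer, with / delimiters instead of braces.
my $regex = qr/
    (?m)
    (^ | :\s*) \{       # open curly brace at line start or after a colon
  |                     # OR
    [:;\}]              # any of these characters
    (?!\s*\n)           # NOT followed by a newline
/x;

# Hypothetical helper: \K keeps everything matched so far,
# so "\n" is inserted right after each match without using $1.
sub add_newlines {
    my ($text) = @_;
    $text =~ s/$regex\K/\n/g;
    return $text;
}

# The question's sample input, inlined instead of read from __DATA__.
my $input = "carrot;\n#orange\napple: {pear; { cabbage; } }\n"
          . "#passion\nsprout: celeri;\ntomato;\n";

my $output = add_newlines($input);
print $output;
```

If I trace the regex correctly, the { in "pear; { cabbage;" is preceded by a space rather than a line start or a colon, so it gets no newline and " { cabbage;" stays on one line; apart from that and the indentation, the result follows the target layout.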
Context
StackExchange Code Review Q#94931, answer score: 4