Parsing text from reports
Problem
I want to parse some reports from multiple devices. The reports look like this:
```
VR Destination Mac Age Static VLAN VID Port
VR-Default 192.168.11.13 90:e2:ba:3c:95:c0 2 NO intra1 350 49
VR-Default 192.168.1.1 00:0e:a6:f7:b6:b5 0 NO main 602 1
VR-Default 192.168.1.2 00:0d:88:63:bf:d1 3 NO main 602 1
VR-Default 192.168.1.14 00:1c:f0:c7:d2:52 4 NO main 602 1
etc...
Dynamic Entries : 19 Static Entries : 0
Pending Entries : 1
In Request : 3888802 In Response : 4531
and some more data...
Rx Error : 0 Dup IP Addr : 0.0.0.0
and some more...
```
I need only the vr, destination, mac, age, static, vlan, vid and port fields.
I can parse it using the `split` function and regexes, but `split` fails if one field (e.g. Age) is empty.
perldoc says I can use `unpack`:
```
my $template = 'A13xA16xA18xA4xA7xA13xA5xA*';
for my $line ( split /\n/, $data ) {
    chomp $line;
    my ($vr, $destination, $mac, $age, $static, $vlan, $vid, $port) = unpack $template, $line;
    ...
}
```
But it dies on lines with length < 84, so I have to check the string length every time (or maybe wrap `unpack` in `eval`? Is that better?). And again I have to use regexes or `index` to find the end of the main table and skip the headers.
The code will look like:
```
#!/usr/bin/perl
use strict;
use warnings;
my $arp = <<'ARP';
VR Destination Mac Age Static VLAN VID Port
VR-Default 192.168.11.13 90:e2:ba:3c:95:c0 2 NO intra1 350 49
VR-Default 192.168.1.1 00:0e:a6:f7:b6:b5 0 NO main 602 1
VR-Default 192.168.1.2 00:0d:88:63:bf:d1 3 NO main 602 1
VR-Default 192.168.1.14 00:1c:f0:c7:d2:52 4 NO main
```
Solution
If a problem has been encountered before by someone else, chances are that there is a CPAN module for it (`DataExtract::FixedWidth`).
If you don't want to use a CPAN module, then my next choice would be to use regular expressions.
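To illustrate why fixed-width captures suit this report better than whitespace splitting, here is a minimal sketch. The sample row is made up: it assumes left-aligned, space-padded columns whose widths (14, 17, 19, 5, 8, 14, 6, then the rest) are taken from the `unpack` template in the question, and it leaves the Age column empty on purpose.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample row in the assumed layout; the 4th value (Age) is empty.
my $row = sprintf '%-14s%-17s%-19s%-5s%-8s%-14s%-6s%s',
    'VR-Default', '192.168.1.14', '00:1c:f0:c7:d2:52', '', 'NO', 'main', '602', '1';

# Whitespace splitting swallows the empty column, so later fields shift left.
my @by_split = split ' ', $row;

# Fixed-width captures keep every column in its own slot.
my @by_regex = $row =~ /^(.{14})(.{17})(.{19})(.{5})(.{8})(.{14})(.{6})(.*)/;
s/^\s+|\s+$//g for @by_regex;    # trim the padding from each captured field

printf "split: %d fields, 4th is '%s'\n", scalar @by_split, $by_split[3];
printf "regex: %d fields, 4th is '%s'\n", scalar @by_regex, $by_regex[3];
```

Run as written, `split` yields seven fields with `NO` sitting in the Age position, while the regex yields eight fields with an empty Age.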
```
use strict;

# Strips leading and trailing whitespace from all parameters
sub strip {
    for (@_) { s/^\s+//; s/\s+$//; }
    @_;
}

# Extracts data from lines of text in tabular format.
#
# First parameter is a regular expression for capturing fixed-width fields.
#
# Subsequent parameters are the lines of tabular data, the first of which holds
# the column headings. Any line that does not match the regular expression,
# as well as subsequent lines, are discarded.
#
# Returns a list (one element per input line) of hashes (keyed by column names).
sub extract_table {
    my ($fmt, $first_line) = (shift, shift);
    my (@headers) = strip($first_line =~ $fmt);
    my @table;
    for my $line (@_) {
        my (@fields) = $line =~ $fmt;
        last unless @fields;
        my %data;
        @data{@headers} = strip(@fields);
        push @table, \%data;
    }
    return @table;
}

my $fmt = qr/^(.{14})(.{17})(.{19})(.{5})(.{8})(.{14})(.{6})(.*)/;

# Take lines of input from a reasonable source (STDIN or a filename
# argument on the command line)
my @table = extract_table($fmt, <>);

use Data::Dumper;
print Dumper(\@table);
```
Note that `chomp()` is unnecessary since we're stripping whitespace characters anyway.
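One caveat: the regex above needs at least 83 characters (14+17+19+5+8+14+6) before it matches, so a data row whose trailing columns are missing would be treated as the end of the table. The same threshold is what makes `unpack` die in the question, because an `x` in the template must not step past the end of the string. A minimal sketch of one way around both issues is to right-pad each line before unpacking. It rests on assumptions not confirmed by the question: columns are left-aligned and space-padded, and a blank line separates the table from the statistics below it. `parse_rows` is a hypothetical helper name, and the sample lines are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Template from the question; each 'x' skips the one-character column gap.
my $template  = 'A13xA16xA18xA4xA7xA13xA5xA*';
# Shortest string the template can consume without dying:
# 13+1 + 16+1 + 18+1 + 4+1 + 7+1 + 13+1 + 5+1 = 83 characters.
my $min_width = 83;
my @names     = qw(vr destination mac age static vlan vid port);

# Hypothetical helper, not part of the answer above.
sub parse_rows {
    my @rows;
    for my $line (@_) {
        (my $text = $line) =~ s/\s+\z//;   # drop newline and trailing blanks
        last if $text eq '';               # ASSUMPTION: a blank line ends the table
        my %row;
        # Right-pad short lines so every 'x' stays inside the string.
        @row{@names} = unpack $template, sprintf("%-${min_width}s", $text);
        next if $row{vr} eq 'VR';          # skip (possibly repeated) header lines
        push @rows, \%row;
    }
    return @rows;
}

# Made-up sample in the assumed layout; the third line lacks VID and Port.
my $full  = '%-14s%-17s%-19s%-5s%-8s%-14s%-6s%s';
my $short = '%-14s%-17s%-19s%-5s%-8s%s';
my @lines = (
    sprintf($full,  qw(VR Destination Mac Age Static VLAN VID Port)),
    sprintf($full,  qw(VR-Default 192.168.11.13 90:e2:ba:3c:95:c0 2 NO intra1 350 49)),
    sprintf($short, qw(VR-Default 192.168.1.14 00:1c:f0:c7:d2:52 4 NO main)),
    '',
    'Dynamic Entries : 19 Static Entries : 0',
);

my @rows = parse_rows(@lines);
printf "%s -> vlan=%s vid='%s' port='%s'\n",
    @{$_}{qw(destination vlan vid port)} for @rows;
```

The `A` format only strips trailing whitespace, so if a real report right-aligns a column (Age, for instance), the unpacked values would still need a leading-whitespace trim like the `strip` function above.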
Context
StackExchange Code Review Q#33859, answer score: 2