Parsing text from reports

Submitted by: @import:stackexchange-codereview

Problem

I want to parse reports from multiple devices. The reports look like this:

VR            Destination      Mac                Age  Static  VLAN          VID   Port
VR-Default    192.168.11.13    90:e2:ba:3c:95:c0    2      NO  intra1        350   49
VR-Default    192.168.1.1      00:0e:a6:f7:b6:b5    0      NO  main          602   1
VR-Default    192.168.1.2      00:0d:88:63:bf:d1    3      NO  main          602   1
VR-Default    192.168.1.14     00:1c:f0:c7:d2:52    4      NO  main          602   1
etc...
Dynamic Entries  :          19             Static Entries            :          0
Pending Entries  :           1
In Request       :     3888802             In Response               :       4531
and some more data...
Rx Error         :           0             Dup IP Addr               :         0.0.0.0
and some more...


I need only the vr, destination, mac, age, static, vlan, vid and port fields.
I can parse the report with split and regexes, but split fails if a field (e.g. Age) is empty.
perldoc says I can use unpack:

my $template = 'A13xA16xA18xA4xA7xA13xA5xA*';    
for my $line ( split /\n/, $data ) {
   chomp $line;
   my ($vr, $destination, $mac, $age, $static, $vlan, $vid, $port) = unpack $template, $line;
...
}


But it dies on lines shorter than 84 characters, so I have to check the string length every time (or maybe wrap unpack in eval; is that better?). I also still have to use regexes or index to find the end of the main table and skip the headers.
The code will look like:

```
#!/usr/bin/perl
use strict;
use warnings;
my $arp = <<'ARP';
VR Destination Mac Age Static VLAN VID Port
VR-Default 192.168.11.13 90:e2:ba:3c:95:c0 2 NO intra1 350 49
VR-Default 192.168.1.1 00:0e:a6:f7:b6:b5 0 NO main 602 1
VR-Default 192.168.1.2 00:0d:88:63:bf:d1 3 NO main 602 1
VR-Default 192.168.1.14 00:1c:f0:c7:d2:52 4 NO main
```

Solution

If a problem has been encountered before by someone else, chances are that there is a CPAN module for that (DataExtract::FixedWidth).

If you don't want to use a CPAN module, then my next choice would be to use regular expressions.

use strict;
use warnings;

# Strips leading and trailing whitespace from all parameters
sub strip {
    for (@_) { s/^\s+//; s/\s+$//; }
    @_;
}

# Extracts data from lines of text in tabular format.
#
# First parameter is a regular expression for capturing fixed-width fields.
#
# Subsequent parameters are the lines of tabular data, the first of which holds
# the column headings.  Any line that does not match the regular expression,
# as well as subsequent lines, are discarded.
#
# Returns a list (one element per input line) of hashes (keyed by column names).
sub extract_table {
    my ($fmt, $first_line) = (shift, shift);

    my (@headers) = strip($first_line =~ $fmt);

    my @table;
    for my $line (@_) {
        my (@fields) = $line =~ $fmt;
        last unless @fields;

        my %data;
        @data{@headers} = strip(@fields);
        push @table, \%data;
    }
    return @table;
}

my $fmt = qr/^(.{14})(.{17})(.{19})(.{5})(.{8})(.{14})(.{6})(.*)/;

# Take lines of input from a reasonable source (STDIN or a filename
# argument on the command line)
my @table = extract_table($fmt, <>);

use Data::Dumper;
print Dumper(\@table);


Note that chomp() is unnecessary since we're stripping whitespace characters anyway.
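
One limitation of the code above: a row shorter than 83 characters (for example, one whose trailing columns are blank) does not match the regular expression, so it ends the table early. A possible workaround, sketched below, is to pad every line to the fixed width before matching. Because a padded line always matches, the end of the table then has to be detected some other way; this sketch assumes a data row is any line whose third field looks like a MAC address. The names split_padded, extract_table_padded and $mac_re are hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

# Same column widths as the regex above: 14 + 17 + 19 + 5 + 8 + 14 + 6 = 83
# characters precede the last column.
my $fmt   = qr/^(.{14})(.{17})(.{19})(.{5})(.{8})(.{14})(.{6})(.*)/;
my $width = 83;

# Assumption: a data row is any line whose third field looks like a MAC address.
my $mac_re = qr/^[0-9a-f]{2}(?::[0-9a-f]{2}){5}$/i;

# Hypothetical helper: pads a line to the fixed width, then splits and trims it.
sub split_padded {
    my ($line) = @_;
    $line =~ s/\s+$//;    # drop the newline and any trailing blanks
    my @fields = sprintf("%-${width}s", $line) =~ $fmt;
    s/^\s+|\s+$//g for @fields;
    return @fields;
}

# Hypothetical variant of extract_table() that tolerates short rows.
sub extract_table_padded {
    my ($header, @lines) = @_;
    my @headers = split_padded($header);

    my @table;
    for my $line (@lines) {
        my @fields = split_padded($line);
        last unless $fields[2] =~ $mac_re;    # first non-row line ends the table
        my %row;
        @row{@headers} = @fields;
        push @table, \%row;
    }
    return @table;
}

# Sample based on the report in the question; the third row has an empty Age
# and the fourth row is cut off after the VLAN column.
my @sample = split /\n/, <<'ARP';
VR            Destination      Mac                Age  Static  VLAN          VID   Port
VR-Default    192.168.11.13    90:e2:ba:3c:95:c0    2      NO  intra1        350   49
VR-Default    192.168.1.1      00:0e:a6:f7:b6:b5    0      NO  main          602   1
VR-Default    192.168.1.2      00:0d:88:63:bf:d1           NO  main          602   1
VR-Default    192.168.1.14     00:1c:f0:c7:d2:52    4      NO  main
Dynamic Entries  :          19             Static Entries            :          0
ARP

my @rows = extract_table_padded(@sample);
printf "%-15s age=[%s] port=[%s]\n", @{$_}{qw(Destination Age Port)} for @rows;
```

With this sample, all four data rows should be returned: the third with an empty Age and the fourth with empty VID and Port, while the "Dynamic Entries" summary line ends the table. If the MAC heuristic does not suit a given report, any other test that tells rows from trailing text could be swapped in.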

Context

StackExchange Code Review Q#33859, answer score: 2