Performance optimization in function for datastructure mapping

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

I want to optimize a Perl function which is frequently used in my application. The function creates a special datastructure from the results of DBI::fetchall_arrayref which looks like:

$columns = ['COLNAME_1','COLNAME_2','COLNAME_3']
$rows    = [ ['row_1_col_1', 'row_1_col_2', 'row_1_col_3'],
             ['row_2_col_1', 'row_2_col_2', 'row_2_col_3'],
             ['row_3_col_1', 'row_3_col_2', 'row_3_col_3']
];

The new data structure must contain the data in the following form (all row values for each column collected in a single arrayref):

$retval = { 
    row_count => 3,
    col_count => 3,
    COLNAME_1 => ['row_1_col_1', 'row_2_col_1', 'row_3_col_1' ],
    COLNAME_2 => ['row_1_col_2', 'row_2_col_2', 'row_3_col_2' ],
    COLNAME_3 => ['row_1_col_3', 'row_2_col_3', 'row_3_col_3' ]
}

The new data structure is a hash of arrays and is used throughout the application. I cannot change the format (it is used too widely). I wrote a function for this conversion and have already done some performance optimization after profiling my application, but it is not enough. The function now looks like this:

sub reorganize($) {
    my ($self,$columns,$rows) = @_;
    my $col_count = scalar(@$columns);
    my $row_count = scalar(@$rows);
    my $col_index = 0;
    my $row_index = 0;
    my $retval = {  # new datastructure
        row_count   => $row_count,
        col_count   => $col_count    
    };

    # iterate through all columns
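    for($col_index=0; $col_index<$col_count; $col_index++) {
        my $tmp = [];
        # iterate through all rows
        for($row_index=0; $row_index<$row_count; $row_index++) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }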
        # Assign the arrayref to the hash. The hash-key is the name of the column
        $retval->{$columns->[$col_index]} = $tmp;
    }
    return $retval;
}

My Question:

Is there a way to further optimize this function (maybe using $_[...])? I found some hints here at page 18 and 19, but I don't have any experience in using $_ in different contexts.

I have to say that the function listed above is the best I can do. There may be

Solution

The following code is about 35% faster (measured with Benchmark). The tricks:

- No anonymous array is created for $tmp up front; it is autovivified by the first assignment.

- The explicit return is removed.

- Variables are created at the place where their value is needed.

Some of the tricks gained only about 3%; the first one seemed to matter most. YMMV.
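A comparison like this can be reproduced with the core Benchmark module. The following is only a sketch under assumptions: the sample data sizes are arbitrary, and both routines are rewritten as plain functions (no $self, no prototype) under the hypothetical names reorganize_plain and faster_plain so they can be called directly. The numbers it prints depend on the machine and the data shape.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 3 columns, 1000 rows (sizes are assumptions).
my $columns = [ map { "COLNAME_$_" } 1 .. 3 ];
my $rows    = [ map { my $r = $_; [ map { "row_${r}_col_$_" } 1 .. 3 ] } 1 .. 1000 ];

# The question's approach: pre-created anonymous array, explicit return.
sub reorganize_plain {
    my ($columns, $rows) = @_;
    my $col_count = scalar(@$columns);
    my $row_count = scalar(@$rows);
    my $retval = { row_count => $row_count, col_count => $col_count };
    for (my $col_index = 0; $col_index < $col_count; $col_index++) {
        my $tmp = [];
        for (my $row_index = 0; $row_index < $row_count; $row_index++) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }
        $retval->{ $columns->[$col_index] } = $tmp;
    }
    return $retval;
}

# The answer's approach: $tmp is autovivified, no explicit return.
sub faster_plain {
    my ($columns, $rows) = @_;
    my $retval = {
        row_count => my $row_count = @$rows,
        col_count => my $col_count = @$columns,
    };
    for (my $col_index = 0; $col_index < $col_count; $col_index++) {
        my $tmp;
        for (my $row_index = 0; $row_index < $row_count; $row_index++) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }
        $retval->{ $columns->[$col_index] } = $tmp;
    }
    $retval
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    reorganize => sub { reorganize_plain($columns, $rows) },
    faster     => sub { faster_plain($columns, $rows) },
});
```

One behavioral difference to keep in mind: when $rows is empty, the autovivifying version stores undef for each column instead of an empty arrayref, because nothing is ever assigned into $tmp.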

I experimented with $_ and maps, too, but it seems the plain old C-style loop is the fastest.
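For readers unfamiliar with $_, a map-based variant might look roughly like the sketch below. This is an illustration only, not the code that was benchmarked for this answer; the name reorganize_map is hypothetical and the function takes no $self so it can be called directly.

```perl
use strict;
use warnings;

# Hypothetical map-based variant of the conversion.
sub reorganize_map {
    my ($columns, $rows) = @_;
    my %retval = (
        row_count => scalar @$rows,
        col_count => scalar @$columns,
    );
    for my $col_index (0 .. $#$columns) {
        # Inside map, $_ is each row arrayref in turn;
        # pick the value of the current column from it.
        $retval{ $columns->[$col_index] } = [ map { $_->[$col_index] } @$rows ];
    }
    return \%retval;
}

my $result = reorganize_map(
    [ 'COLNAME_1', 'COLNAME_2' ],
    [ [ 'row_1_col_1', 'row_1_col_2' ], [ 'row_2_col_1', 'row_2_col_2' ] ],
);
print join(',', @{ $result->{COLNAME_2} }), "\n";   # prints row_1_col_2,row_2_col_2
```

It is shorter than the C-style loops, but whether it is faster or slower on a given data set is something only a benchmark can tell.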

sub faster {
    my ($self, $columns, $rows) = @_;
    my $retval = {
        row_count   => my $row_count = @$rows,
        col_count   => my $col_count = @$columns,
    };
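    for (my $col_index = 0 ; $col_index < $col_count ; $col_index++) {
        my $tmp;
        for (my $row_index = 0 ; $row_index < $row_count ; $row_index++) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }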
        $retval->{$columns->[$col_index]} = $tmp;
    }
    $retval
}

Code Snippets

sub faster {
    my ($self, $columns, $rows) = @_;
    my $retval = {
        row_count   => my $row_count = @$rows,
        col_count   => my $col_count = @$columns,
    };
    for (my $col_index = 0 ; $col_index < $col_count ; $col_index++) {
        my $tmp;
        for (my $row_index = 0 ; $row_index < $row_count ; $row_index++) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }
        $retval->{$columns->[$col_index]} = $tmp;
    }
    $retval
}

Context

StackExchange Code Review Q#44059, answer score: 3
