
Reducing memory footprint when manipulating a big CSV file

Submitted by: @import:stackexchange-codereview

Problem

I have a CSV file with the following structure (fields are separated by tabs):

VariableName    DateAdded   ValueNumeric
VariableName    DateAdded   ValueNumeric
VariableName    DateAdded   ValueNumeric


Examples of lines in input data:
VariableName DateAdded ValueNumeric

St.podatkovnih_blokov   1.12.2015 0:00:21   0,2000
St.podatkovnih_blokov   1.12.2015 0:01:15   0,2000
St.podatkovnih_blokov   1.12.2015 0:02:14   0,2000
.
.
.
St.podatkovnih_blokov   31.12.2015 10:08:02 0,2000
St.podatkovnih_blokov   31.12.2015 22:31:04 0,2000
NAD_krmilnika   1.12.2015 0:00:21   1310,2000
NAD_krmilnika   1.12.2015 0:01:15   1310,2000


...
and I am converting it into a form where all variables with the same date are on the same line:

DateAdded,VariableName,VariableName,VariableName,VariableName,VariableName // this is the header; the rows below hold the values for its variables
DateAdded,ValueNumeric,ValueNumeric,ValueNumeric,ValueNumeric,ValueNumeric
DateAdded,ValueNumeric,ValueNumeric,ValueNumeric,ValueNumeric,ValueNumeric


So lines in output data look like this:

```
datum,St.podatkovnih_blokov,NAD_krmilnika,K5_en_hlajenja_MWh_last_month,K5_en_hlajenja_MWh_this_year,K5_en_hlajenja_MWh_last_year,Rezerva_1,Run_Counter,negative_active_energy_today,negative_active_energy_yesterday,negative_active_energy_this_week,negative_active_energy_last_week,negative_active_energy_this_mont,negative_active_energy_last_mont,negative_active_energy_this_year,negative_active_energy_last_year,strosek_danes_EUR,strosek_vceraj_EUR,K1_en_gretja_kWh_today,K1_en_gretja_kWh_yesterday,K1_en_gretja_MWh_this_month,K1_en_gretja_MWh_last_month,K1_en_gretja_MWh_this_year,K1_en_gretja_MWh_last_year,K1_en_hlajenja_MWh_this_year,K2_en_gretja_kWh_today,K2_en_gretja_kWh_yesterday,K2_en_gretja_MWh_this_month,K2_en_gretja_MWh_last_month,K2_en_gretja_MWh_this_year,K2_en_gretja_MWh_last_year,K2_en_hlajenja_kWh_today,K2_en_hlajenja_kWh_yesterday,K2_en_hlajenja_MWh_this_month,K2_en_hlajenja_MWh_last_month,K2_en_h
```

Solution

Never catch Exception

catch(Exception ex)
{
    System.out.println("index for lines to big: " + ex);
}


Catching just the Exception class is an anti-pattern: you don't know which exceptions are going to be caught.

In this situation, you should check whether the index is valid, and print a message if it isn't.
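
A minimal sketch of such a check, assuming the lines are held in a List<String>. The class and method names (LineLookup, lineAt) are hypothetical and not taken from the reviewed code:

```java
import java.util.List;

final class LineLookup {
    // Returns the line at the given index, or null (after printing a message)
    // when the index is out of range. No exception handling is involved.
    static String lineAt(List<String> lines, int index) {
        if (index < 0 || index >= lines.size()) {
            System.out.println("index for lines out of range: " + index);
            return null;
        }
        return lines.get(index);
    }
}
```

Only the condition you actually expect is handled this way; any other failure still surfaces as an exception instead of being silently swallowed.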

Don't do string concatenation when working with a writer

private static void write(List<String> records, Writer writer) throws IOException {
    long start = System.currentTimeMillis();
    for (String record: records) {
        writer.write(record + "\n");
    }
    writer.flush();


You are doing string concatenation when writing. This creates a temporary copy of the string in memory, only for it to be thrown away immediately. Change the code to two separate calls:

writer.write(record);
writer.write(System.lineSeparator());
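
Put together, the method might look like the sketch below. The class name RecordWriter is hypothetical, and the timing variable from the original excerpt is left out because its use is not shown:

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

final class RecordWriter {
    // Writes each record followed by a line separator, without building
    // a temporary "record + separator" string for every line.
    static void write(List<String> records, Writer writer) throws IOException {
        String separator = System.lineSeparator();
        for (String record : records) {
            writer.write(record);
            writer.write(separator);
        }
        writer.flush();
    }
}
```

Note that System.lineSeparator() is platform dependent, so on Windows the output ends lines with "\r\n" rather than the "\n" used in the original code.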


A stream should be closed in the same scope where it's opened

try
{
    FileWriter writer = new FileWriter(file);
    System.out.print("Writing raw... ");
    write(lines, writer);
} finally {
}


The writer above is never closed. By using try-with-resources, the stream is closed in the same place it is opened, and no exception can prevent it from being closed.

try (FileWriter writer = new FileWriter(file)) {
    System.out.print("Writing raw... ");
    write(lines, writer);
}


Inconsistent order of modifiers

public static void main(String[] args) throws FileNotFoundException 
static public void printLines()
static public void writeToOutputfile() throws IOException
private static void write(List<String> records, Writer writer) throws IOException {
static void buildRestOfValues()


Sometimes you write the access modifier before "static", and other times the other way around. Pick one order and stick to it. The usual Java convention, which the Java Language Specification describes as customary, is the access modifier first, then "static".

Inconsistent output to logger / output to stdout

Logger.getLogger(ScadaParse.class.getName()).log(Level.SEVERE, null, ex);
System.out.println("ex in main loop: " + ex);


Your code is inconsistent about when it writes to a logger and when it writes to System.out, which makes code reuse very hard.
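
One way to make this consistent is to send every diagnostic through a single logger. The sketch below uses java.util.logging, as the reviewed code already does; the class ParseLogging and its method parseOrDefault are hypothetical and only illustrate the pattern:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ParseLogging {
    private static final Logger LOG = Logger.getLogger(ParseLogging.class.getName());

    // Parses an integer and reports failures through the logger only,
    // never through System.out.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException ex) {
            LOG.log(Level.WARNING, "could not parse number: " + text, ex);
            return fallback;
        }
    }
}
```

A caller that reuses this code can then redirect or silence all diagnostics by configuring the logger, which is not possible with scattered System.out.println calls.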

Everything's static

By making everything static, your code cannot be properly reused in other places. You should place everything inside an object, and every setting should be passed in either via the constructor or via a design pattern like a builder or a factory.
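
As a rough illustration, settings can become constructor arguments stored in final fields. The class DelimiterConverter and its two settings are hypothetical, chosen only to show the shape, and do not reflect the structure of the reviewed program:

```java
import java.util.regex.Pattern;

final class DelimiterConverter {
    private final String inputDelimiter;
    private final String outputDelimiter;

    // Settings are passed in once; no static state is involved.
    DelimiterConverter(String inputDelimiter, String outputDelimiter) {
        this.inputDelimiter = inputDelimiter;
        this.outputDelimiter = outputDelimiter;
    }

    // Splits a line on the input delimiter and rejoins it with the output delimiter.
    String convert(String line) {
        String[] fields = line.split(Pattern.quote(inputDelimiter));
        return String.join(outputDelimiter, fields);
    }
}
```

Two converters with different settings can now coexist, for example new DelimiterConverter("\t", ",") and new DelimiterConverter("\t", ";"), which static fields would not allow.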

Variables have unnecessarily broad scopes

Your variables have an unnecessarily global scope. Limiting the scope of each variable to the place where it is used makes your code easier to follow.
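
A small sketch of the idea, using a hypothetical LineCounter class: the counter is declared inside the method that needs it instead of as a class-level field.

```java
import java.util.List;

final class LineCounter {
    // The counter lives only inside this method, so no other code can
    // read or modify it, and it is discarded when the method returns.
    static int countNonEmpty(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (!line.isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```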

Use an index-based for loop instead of an iterator if you need the index

int i = 0;
System.out.println("Printing lines");
for(String line : lines)
{
    i++;
    System.out.println("line " + i + ": " + line);
}


Why not just a simple for loop?

int length = lines.size();
for (int i = 0; i < length; i++) {
    // i + 1 keeps the numbering starting at 1, as in the original loop
    System.out.println("line " + (i + 1) + ": " + lines.get(i));
}

Context

StackExchange Code Review Q#120146, answer score: 5
