Reducing memory footprint when manipulating a big CSV file
Problem
I have a CSV file with the following structure (tab-separated):
```
VariableName DateAdded ValueNumeric
VariableName DateAdded ValueNumeric
VariableName DateAdded ValueNumeric
```

Examples of lines in the input data:

```
VariableName DateAdded ValueNumeric
St.podatkovnih_blokov 1.12.2015 0:00:21 0,2000
St.podatkovnih_blokov 1.12.2015 0:01:15 0,2000
St.podatkovnih_blokov 1.12.2015 0:02:14 0,2000
...
St.podatkovnih_blokov 31.12.2015 10:08:02 0,2000
St.podatkovnih_blokov 31.12.2015 22:31:04 0,2000
NAD_krmilnika 1.12.2015 0:00:21 1310,2000
NAD_krmilnika 1.12.2015 0:01:15 1310,2000
...
```

I am converting it to a form where all variables with the same date are on the same line. The first line is the header, and the lines below it hold the values for the variables named in the header:

```
DateAdded,VariableName,VariableName,VariableName,VariableName,VariableName
DateAdded,ValueNumeric,ValueNumeric,ValueNumeric,ValueNumeric,ValueNumeric
DateAdded,ValueNumeric,ValueNumeric,ValueNumeric,ValueNumeric,ValueNumeric
```

So the lines in the output data look like this:

```
datum,St.podatkovnih_blokov,NAD_krmilnika,K5_en_hlajenja_MWh_last_month,K5_en_hlajenja_MWh_this_year,K5_en_hlajenja_MWh_last_year,Rezerva_1,Run_Counter,negative_active_energy_today,negative_active_energy_yesterday,negative_active_energy_this_week,negative_active_energy_last_week,negative_active_energy_this_mont,negative_active_energy_last_mont,negative_active_energy_this_year,negative_active_energy_last_year,strosek_danes_EUR,strosek_vceraj_EUR,K1_en_gretja_kWh_today,K1_en_gretja_kWh_yesterday,K1_en_gretja_MWh_this_month,K1_en_gretja_MWh_last_month,K1_en_gretja_MWh_this_year,K1_en_gretja_MWh_last_year,K1_en_hlajenja_MWh_this_year,K2_en_gretja_kWh_today,K2_en_gretja_kWh_yesterday,K2_en_gretja_MWh_this_month,K2_en_gretja_MWh_last_month,K2_en_gretja_MWh_this_year,K2_en_gretja_MWh_last_year,K2_en_hlajenja_kWh_today,K2_en_hlajenja_kWh_yesterday,K2_en_hlajenja_MWh_this_month,K2_en_hlajenja_MWh_last_month,K2_en_h
```
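To make the transformation concrete, here is a minimal in-memory sketch of it. It assumes the columns are tab-separated, that rows are grouped by the complete `DateAdded` value (date and time), and that decimal commas such as `0,2000` are rewritten to dots so they don't clash with the comma delimiter of the output. The class name `PivotSketch` is hypothetical, and the sketch makes no attempt to save memory: it keeps every value in a map, which is exactly the kind of footprint the question is about.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical sketch: pivots "VariableName<TAB>DateAdded<TAB>ValueNumeric" lines
// into one comma-separated line per DateAdded value. Everything is held in memory.
class PivotSketch {
    static List<String> pivot(List<String> inputLines) {
        Set<String> variables = new LinkedHashSet<>();                 // column order = first appearance
        Map<String, Map<String, String>> rows = new LinkedHashMap<>(); // DateAdded -> (variable -> value)

        for (int i = 1; i < inputLines.size(); i++) { // index 0 is the header line
            String[] fields = inputLines.get(i).split("\t");
            if (fields.length != 3) {
                continue; // assumption: malformed lines are skipped
            }
            String variable = fields[0];
            String timestamp = fields[1];
            String value = fields[2].replace(',', '.'); // assumption: output uses '.' as decimal separator
            variables.add(variable);
            rows.computeIfAbsent(timestamp, k -> new LinkedHashMap<>()).put(variable, value);
        }

        List<String> output = new ArrayList<>();
        StringJoiner header = new StringJoiner(",");
        header.add("datum");
        variables.forEach(header::add);
        output.add(header.toString());

        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            StringJoiner line = new StringJoiner(",");
            line.add(row.getKey());
            for (String variable : variables) {
                line.add(row.getValue().getOrDefault(variable, "")); // empty cell if the variable has no value
            }
            output.add(line.toString());
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "VariableName\tDateAdded\tValueNumeric",
                "St.podatkovnih_blokov\t1.12.2015 0:00:21\t0,2000",
                "NAD_krmilnika\t1.12.2015 0:00:21\t1310,2000",
                "St.podatkovnih_blokov\t1.12.2015 0:01:15\t0,2000");
        pivot(input).forEach(System.out::println);
    }
}
```

With the four sample lines in `main`, this prints `datum,St.podatkovnih_blokov,NAD_krmilnika`, then `1.12.2015 0:00:21,0.2000,1310.2000`, then `1.12.2015 0:01:15,0.2000,` (the last cell is empty because that subset has no `NAD_krmilnika` value at 0:01:15).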
Solution
Never catch `Exception`
```java
catch(Exception ex)
{
    System.out.println("index for lines to big: " + ex);
}
```

Catching just the `Exception` class is an anti-pattern: you don't know which exceptions are going to be caught. In this situation, you should check whether the index is valid, and if it isn't, print a message.
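A minimal sketch of checking the index up front instead of catching `Exception`, assuming the lines are held in a `List<String>` (the method around the original `catch` is not shown). `IndexCheckSketch` and `lineAt` are hypothetical names, and returning an `Optional` is just one possible choice. If you do want a `catch` here, the specific exception to catch is `IndexOutOfBoundsException`, not `Exception`.

```java
import java.util.List;
import java.util.Optional;

class IndexCheckSketch {
    // Hypothetical helper: validates the index instead of relying on catch (Exception).
    // Prints a message and returns an empty Optional when the index is out of range.
    static Optional<String> lineAt(List<String> lines, int index) {
        if (index < 0 || index >= lines.size()) {
            System.out.println("index out of range for lines: " + index + " (size " + lines.size() + ")");
            return Optional.empty();
        }
        return Optional.ofNullable(lines.get(index));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("first", "second");
        System.out.println(lineAt(lines, 1).orElse("<none>")); // second
        System.out.println(lineAt(lines, 5).orElse("<none>")); // prints the message, then <none>
    }
}
```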
Don't do string concatenation when working with a writer
```java
private static void write(List<String> records, Writer writer) throws IOException {
    long start = System.currentTimeMillis();
    for (String record: records) {
        writer.write(record + "\n");
    }
    writer.flush();
    // ... (rest of the method not shown)
}
```

You are concatenating strings while writing. Each `record + "\n"` creates a new string in memory that is thrown away as soon as it has been written. Change the code to two separate calls:
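```java
writer.write(record);
writer.write(System.lineSeparator());
```

Note that `System.lineSeparator()` is `"\r\n"` on Windows; if the output must always use `\n`, call `writer.write('\n')` instead.
A stream should be closed in the same scope where it is opened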
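```java
try
{
    FileWriter writer = new FileWriter(file);
    System.out.print("Writing raw... ");
    write(lines, writer);
} finally {
}
```

By closing a stream in the same scope where it is opened, you make sure that no exception can prevent it from being closed. With try-with-resources, the writer is closed automatically when the block exits, whether normally or through an exception: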
```java
try (FileWriter writer = new FileWriter(file)) {
    System.out.print("Writing raw... ");
    write(lines, writer);
}
```

Inconsistent order of modifiers
```java
public static void main(String[] args) throws FileNotFoundException
static public void printLines()
static public void writeToOutputfile() throws IOException
private static void write(List<String> records, Writer writer) throws IOException {
static void buildRestOfValues()
```

Sometimes you write the access modifier before `static`, and other times the other way around. Picking one convention and sticking to it makes your code easier to read; the customary order, as listed in the Java Language Specification, puts the access modifier first (`public static`).
Inconsistent output to logger / output to stdout
```java
Logger.getLogger(ScadaParse.class.getName()).log(Level.SEVERE, null, ex);
System.out.println("ex in main loop: " + ex);
```

Your code is inconsistent about whether it writes to a logger or to standard output, which makes it very hard to reuse.
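A minimal sketch of sending every diagnostic through one `java.util.logging.Logger` (the API the original code already uses) instead of mixing it with `System.out`. The class `LoggingSketch` and the `parseValue` helper, which reads values written with a decimal comma such as `1310,2000`, are hypothetical and not part of the original code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingSketch {
    // One logger for the class; every diagnostic goes through it instead of System.out
    private static final Logger LOGGER = Logger.getLogger(LoggingSketch.class.getName());

    // Hypothetical helper: parses a decimal-comma value; logs a warning and returns NaN if it is not a number
    static double parseValue(String raw) {
        try {
            return Double.parseDouble(raw.replace(',', '.'));
        } catch (NumberFormatException ex) {
            LOGGER.log(Level.WARNING, "Could not parse value: " + raw, ex);
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseValue("1310,2000")); // 1310.2
        System.out.println(parseValue("n/a"));       // logs a warning, then prints NaN
    }
}
```

Because the logger is the only output channel for problems, a caller that reuses this class can redirect or silence those messages through the logging configuration without touching the code.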
Everything's static
By making everything static, you prevent your code from being properly reused elsewhere. Put everything inside an object, and pass every setting in either through the constructor or through a design pattern such as a builder or a factory.
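A minimal sketch of the constructor approach, with hypothetical names (`CsvConverterSketch`, `splitInput`, `joinOutput`) that do not come from the original code: the separators are instance fields set in the constructor rather than static fields, so differently configured converters can coexist and be tested independently. A builder or factory would simply wrap the same constructor.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical class: settings are instance fields passed in through the constructor
final class CsvConverterSketch {
    private final String inputSeparator;  // regex used to split input lines, e.g. "\t"
    private final String outputSeparator; // delimiter for output lines, e.g. ","

    CsvConverterSketch(String inputSeparator, String outputSeparator) {
        this.inputSeparator = inputSeparator;
        this.outputSeparator = outputSeparator;
    }

    // Splits one input line into its fields using the configured separator
    String[] splitInput(String line) {
        return line.split(inputSeparator);
    }

    // Joins a date and its values into one output line using the configured separator
    String joinOutput(String date, List<String> values) {
        StringJoiner joiner = new StringJoiner(outputSeparator);
        joiner.add(date);
        values.forEach(joiner::add);
        return joiner.toString();
    }

    public static void main(String[] args) {
        CsvConverterSketch converter = new CsvConverterSketch("\t", ",");
        String[] fields = converter.splitInput("NAD_krmilnika\t1.12.2015 0:00:21\t1310,2000");
        System.out.println(fields.length); // 3
        System.out.println(converter.joinOutput("1.12.2015 0:00:21", List.of("0.2", "1310.2")));
    }
}
```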
Variables have unnecessarily broad scopes
Your variables have an unnecessarily global scope. Limiting each variable's scope to the place where it is used makes your code easier to follow.
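A small hypothetical illustration, since the original variables are not shown here: the first method counts into a static field that outlives the call and can be changed by any other method, while the second keeps its counter local to the method that needs it.

```java
import java.util.List;

class ScopeSketch {
    // Broad scope (the pattern being criticized): a static field visible to every method
    static int count;

    static int countNonEmptyWithField(List<String> lines) {
        count = 0;
        for (String line : lines) {
            if (!line.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Narrow scope: the counter exists only while this method runs
    static int countNonEmpty(List<String> lines) {
        int nonEmpty = 0;
        for (String line : lines) {
            if (!line.isEmpty()) {
                nonEmpty++;
            }
        }
        return nonEmpty;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "", "b");
        System.out.println(countNonEmpty(lines)); // 2
    }
}
```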
Use an indexed for loop instead of an iterator if you need the index
```java
int i = 0;
System.out.println("Printing lines");
for(String line : lines)
{
    i++;
    System.out.println("line " + i + ": " + line);
}
```

Why not just a simple for loop?
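```java
int length = lines.size();
for (int i = 0; i < length; i++) {
    // i + 1 keeps the 1-based numbering of the original loop
    System.out.println("line " + (i + 1) + ": " + lines.get(i));
}
```

This assumes `lines` is a random-access list such as `ArrayList`; on a `LinkedList`, each `get(i)` walks the list from the start.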
Context
StackExchange Code Review Q#120146, answer score: 5