patternjavaMinor
Loading, computation and writing 500.000 rows in database
Viewed 0 times
rowscomputation000writingdatabase500loadingand
Problem
first let me explain what I am trying to achieve here :
I am supposed to compute some results, using data from several tables, filtered by a run_id.
One of these tables has about 500.000 rows per run_id, and 60 columns.
All other tables have only a few thousand rows per run_id.
I also have to write about 500.000 rows of data in a table with only 4 columns.
So, I first tried to load all these tables at once, which failed with an OutOfMemoryError because of the big table (called
So, I decided to do some batch select to load the data from EPE. However, depending on the 'business' of the database,
the whole process (load, compute, write) can take between 15 and 30 minutes.
Can I improve my code to make it faster ?
Here is my code, slightly reworked, so it might have some typos. Please let me know.
Main class :
```
@Component
public class Computation {
public static final int MAX_RESULTS = 50000;
public static final String SPECIAL = "207";
Map dfs;
Map pds;
Map les;
@Autowired
@Qualifier("curveDAO")
CurveAccess curveDAO;
@Autowired
@Qualifier("dealDAO")
DealAccess dealDAO;
@Autowired
@Qualifier("entityDAO")
EntityAccess entityDAO;
@Autowired
@Qualifier("resultDAO")
ResultAccess resultDAO;
@Autowired
@Qualifier("traceDAO")
TraceAccess traceDAO;
@Autowired
Runs runs;
public Computation() {
}
public Run initialize(Long runId) {
Run run = runs.getRun(runId);
resultDAO.purge(run);
dfs = curveDAO.getDfs(run);
pds = curveDAO.getPds(run);
ent = entityDAO.getByRun(run);
return run;
}
public Map, Curve> prepareEpe(Run run, int start, int max) {
Map, Curve> epe = new HashMap, Curve>();
epe = dealDAO.getEpe(run, start, max);
return epe;
}
public void calculateResult(Run run, Map, Curve> epes) {
List results = new ArrayList();
Curve pdSPECIAL = pds
I am supposed to compute some results, using data from several tables, filtered by a run_id.
One of these tables has about 500.000 rows per run_id, and 60 columns.
All other tables have only a few thousand rows per run_id.
I also have to write about 500.000 rows of data in a table with only 4 columns.
So, I first tried to load all these tables at once, which failed with an OutOfMemoryError because of the big table (called
EPE from now on).So, I decided to do some batch select to load the data from EPE. However, depending on the 'business' of the database,
the whole process (load, compute, write) can take between 15 and 30 minutes.
Can I improve my code to make it faster ?
Here is my code, slightly reworked, so it might have some typos. Please let me know.
Main class :
```
@Component
public class Computation {
public static final int MAX_RESULTS = 50000;
public static final String SPECIAL = "207";
Map dfs;
Map pds;
Map les;
@Autowired
@Qualifier("curveDAO")
CurveAccess curveDAO;
@Autowired
@Qualifier("dealDAO")
DealAccess dealDAO;
@Autowired
@Qualifier("entityDAO")
EntityAccess entityDAO;
@Autowired
@Qualifier("resultDAO")
ResultAccess resultDAO;
@Autowired
@Qualifier("traceDAO")
TraceAccess traceDAO;
@Autowired
Runs runs;
public Computation() {
}
public Run initialize(Long runId) {
Run run = runs.getRun(runId);
resultDAO.purge(run);
dfs = curveDAO.getDfs(run);
pds = curveDAO.getPds(run);
ent = entityDAO.getByRun(run);
return run;
}
public Map, Curve> prepareEpe(Run run, int start, int max) {
Map, Curve> epe = new HashMap, Curve>();
epe = dealDAO.getEpe(run, start, max);
return epe;
}
public void calculateResult(Run run, Map, Curve> epes) {
List results = new ArrayList();
Curve pdSPECIAL = pds
Solution
Naming, naming, naming...
Your code is very hard to read -
Break it down
Your methods are very long (
Refactor your methods to be shorter, with expressive names, that will make your code more readable, more testable, and more optimizeable.
Distribute the load
As @Dan suggested in his comments - you should consider spreading the load on several threads/CPUs/machines...
As far as I could see, your calculations are independent of each other, so you could simply arbitrarily split the load on several instances (let's say take over 9 of your friends' machines, and give each machine all the run ids ending with a specific digit...).
This should give you linear scale (up to the point where writing to the DB is your bottleneck).
Optimizations for one-time code
Perhaps I misunderstood your explanation, but it seems that this code is meant to run once. If that is the case, why do you care if it runs 15 minutes, or 8 hours?
Just run it and get on with your life!
Your code is very hard to read -
ntt, epe, lgd, SPECIAL = "207"(?!?). I understand (hope?) that some of these names are part of your business nomenclature, but, especially when asking strangers to read your code for optimization, giving meaningful names can go a long way. Asking me to figure out how to optimize a loop with result += epe discountFactor probaDown * (1 - probaDownSPECIAL) would be hard enough if the names there meant something to me...Break it down
Your methods are very long (
calculateResult is 67 lines long...) this is a code smell, which also makes your code hard to read, as well as hides the structure of your flow, and possible complexity issues.Refactor your methods to be shorter, with expressive names, that will make your code more readable, more testable, and more optimizeable.
Distribute the load
As @Dan suggested in his comments - you should consider spreading the load on several threads/CPUs/machines...
As far as I could see, your calculations are independent of each other, so you could simply arbitrarily split the load on several instances (let's say take over 9 of your friends' machines, and give each machine all the run ids ending with a specific digit...).
This should give you linear scale (up to the point where writing to the DB is your bottleneck).
Optimizations for one-time code
Perhaps I misunderstood your explanation, but it seems that this code is meant to run once. If that is the case, why do you care if it runs 15 minutes, or 8 hours?
Just run it and get on with your life!
Context
StackExchange Code Review Q#45933, answer score: 2
Revisions (0)
No revisions yet.