Inefficient hash map operation provokes OutOfMemory: Java heap space error
Problem
This program runs correctly on small files, but when run on large data sets it crashes with an `OutOfMemory: Java heap space` error. I know I can increase the size of the heap, but that seems like a poor solution.

The code executes multiple hash map operations, and I'm certain that the haphazard way in which I strung the data structures together is the root of this problem; however, I'm too inexperienced to think of a solution. I've never before had to think about runtime outside of answering questions about big-O notation!

I'm trying to create a program to learn rules by inference, i.e. 'contains'('vitamin c', 'oranges'). and 'prevents'('scurvy', 'vitamin c'). would yield the output "rule" 'prevents'('scurvy', 'oranges'). I have code which will produce that output, but then I wanted to eliminate duplicate "rules" from the input while keeping track of the number of times they were observed (as a naive confidence measure, since frequently observed rules are more likely to be true), so I implemented a hash map which stores the "rule" as a key and the number of observed instances as the value.

I've reproduced the code here; it's not that long.
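The duplicate-rule counting described above can be kept in a `HashMap` from rule text to an observation count using `Map.merge`. This is a minimal sketch with illustrative names (`RuleCounter`, `observe`, `countOf`), not the questioner's actual code:

```
import java.util.HashMap;
import java.util.Map;

class RuleCounter
{
    // Maps a rule's textual form to the number of times it was observed.
    private final Map<String, Integer> ruleCounts = new HashMap<>();

    void observe( String rule )
    {
        // merge() stores 1 on first sight, otherwise adds 1 to the old count,
        // replacing the usual get / null-check / put dance.
        ruleCounts.merge( rule, 1, Integer::sum );
    }

    int countOf( String rule )
    {
        return ruleCounts.getOrDefault( rule, 0 );
    }
}
```

Because the map holds one entry per distinct rule (plus a count) rather than one object per observation, it also keeps memory proportional to the number of unique rules.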
Machine learning component architecture:
```
private List<Sentence> sentences = new ArrayList<>();

/*
 * The following maps store the relation of a string occurring
 * as a subject or object, respectively, to the list of Sentence
 * ordinals where they occur.
 */
private Map<String, List<Integer>> subject2index = new HashMap<>();
private Map<String, List<Integer>> object2index = new HashMap<>();

/*
 * This set contains strings that occur as both,
 * subject and object. This is useful for determining strings
 * acting as an in-between connecting two relations.
 */
private Set<String> joints = new HashSet<>();

public void addSentence( Sentence s )
{
    // add Sentence to the list of all Sentences
    sentences.add( s );

    // add the Subject of the Sentence to the map mapping strings
    // occurring as a subject to the ordinal of this Sentence
    List<Integer> subind = subject2index.get( s.getSubject() );
    // (the excerpt ends here in the source)
}
```
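The `get`-then-`put` bookkeeping that `addSentence` begins above can be collapsed with `Map.computeIfAbsent`. This is a minimal sketch under assumptions, not the author's class: `SentenceIndex` and `indicesOf` are illustrative names, and a "sentence" is reduced to its subject string for brevity:

```
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SentenceIndex
{
    private final List<String> sentences = new ArrayList<>();
    private final Map<String, List<Integer>> subject2index = new HashMap<>();

    void addSentence( String subject )
    {
        sentences.add( subject );
        int ordinal = sentences.size() - 1;
        // computeIfAbsent creates the list on first use,
        // so no explicit null check or put() is needed.
        subject2index.computeIfAbsent( subject, k -> new ArrayList<>() ).add( ordinal );
    }

    List<Integer> indicesOf( String subject )
    {
        return subject2index.getOrDefault( subject, List.of() );
    }
}
```

The same one-liner would serve `object2index`, keeping each map-update site to a single expression.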
Solution
A couple of overall comments.
- String concatenation is bad, bad, bad. Sure, the compiler no longer creates tons of `String`s, but an instance of `StringBuilder` and an instance of `String` are created anyway. They get gc'ed eventually, sure, but `String.format` does the same with fewer resources, so try to use it instead, especially in often-called methods.
- You create a new `Sentence` inside the third nested loop. Some of them (#verb x #object x #subject) are used as keys and don't get collected. How many instances is that? The default maximum heap size is 1/4 of your machine's memory or 1 GB, whichever is smaller; could it be that there actually are thousands of instances cluttering the memory?
- At any rate, OOM means that you create and retain too many object instances. Do you really need them all? Maybe your algorithm could be simplified so it gets less memory-intensive?
- (Not related to OOM) in your `Sentence#equals()` implementation you don't check `other` for being null, nor do you check its fields for being null. That may one day result in an NPE.
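The last point can be sketched as a null-safe `equals()` using `java.util.Objects`. This assumes a hypothetical `Sentence` with three string fields, which may differ from the real class:

```
import java.util.Objects;

final class Sentence
{
    private final String subject;
    private final String verb;
    private final String object;

    Sentence( String subject, String verb, String object )
    {
        this.subject = subject;
        this.verb = verb;
        this.object = object;
    }

    @Override
    public boolean equals( Object other )
    {
        if ( this == other ) return true;
        // instanceof is false for null, so this guards against
        // both null and foreign types before any field access.
        if ( !( other instanceof Sentence ) ) return false;
        Sentence that = (Sentence) other;
        // Objects.equals tolerates null fields on either side.
        return Objects.equals( subject, that.subject )
            && Objects.equals( verb, that.verb )
            && Objects.equals( object, that.object );
    }

    @Override
    public int hashCode()
    {
        // Must be consistent with equals(), since Sentence is used as a map key.
        return Objects.hash( subject, verb, object );
    }
}
```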
Context
StackExchange Code Review Q#78540, answer score: 2