patternjavaMinor
Algorithm to return unique values from unsorted input
Viewed 0 times
uniquereturnunsortedinputalgorithmvaluesfrom
Problem
I need to create an efficient algorithm that returns unique values from an unsorted input. I don't know the length of the input.
As the function that will call this algorithm can abort the reading at any time, I think that using a well defined
Today, I am using a
The code below is my today's working algorithm:
```
import java.util.Iterator;
import java.util.HashSet;
import java.util.Set;
import java.util.NoSuchElementException;
import java.io.BufferedReader;
import java.io.StringReader;
import java.io.IOException;
public class UniqueValues implements Iterable {
private final Iterator iterator;
public UniqueValues(BufferedReader r) {
this.iterator = new UniqueValuesIterator(r);
}
public Iterator iterator() {
return iterator;
}
static class UniqueValuesIterator implements Iterator {
private final BufferedReader r;
private final Set values = new HashSet<>();
// When 'next' is null, need to get the next value
private String next;
public UniqueValuesIterator(BufferedReader r) {
this.r = r;
}
public boolean hasNext() {
// Good point from OldCurmudgeon
if(next != null) return true;
try {
String line;
while((line = r.readLine()) != null) {
if(values.add(line)) { // add() returns 'true' when it is not a duplicate value.
next = line;
return true;
}
}
} catch(IOException e) { }
return false;
}
public String next() {
if(next == null) {
if(! hasNext() ) throw new NoSuchEleme
As the function that will call this algorithm can abort the reading at any time, I think that using a well defined
Iterable implementation is the right way, so I will not waste extra processing power for the uneeded input.Today, I am using a
Set to keep track of the values I've already read. But I don't know if this is the most efficient algorithm, as my input length can be huge.The code below is my today's working algorithm:
```
import java.util.Iterator;
import java.util.HashSet;
import java.util.Set;
import java.util.NoSuchElementException;
import java.io.BufferedReader;
import java.io.StringReader;
import java.io.IOException;
public class UniqueValues implements Iterable {
private final Iterator iterator;
public UniqueValues(BufferedReader r) {
this.iterator = new UniqueValuesIterator(r);
}
public Iterator iterator() {
return iterator;
}
static class UniqueValuesIterator implements Iterator {
private final BufferedReader r;
private final Set values = new HashSet<>();
// When 'next' is null, need to get the next value
private String next;
public UniqueValuesIterator(BufferedReader r) {
this.r = r;
}
public boolean hasNext() {
// Good point from OldCurmudgeon
if(next != null) return true;
try {
String line;
while((line = r.readLine()) != null) {
if(values.add(line)) { // add() returns 'true' when it is not a duplicate value.
next = line;
return true;
}
}
} catch(IOException e) { }
return false;
}
public String next() {
if(next == null) {
if(! hasNext() ) throw new NoSuchEleme
Solution
There are three significant issues I would address here...
-
Ratchet Freak is right about being concerned about the Iterator/FilteredReader... but I disagree with his suggestion. I think the problem is that your code is converting a BufferedReader in to an Iterator, and making the results unique at the same time. This is a class that is performing 2 functions... and you shoould have two classes instead. One that converts the BufferedReader to an interator, and the other that enforces uniqueness.
-
If your input data really is huge, then a Set may not be the right data structure because of it's memory footprint. I have found that custom implementations of memory-efficient structures can save a lot of space. I hesitate to recommend that you change from the Set though, it is the 'logical' choice, but, if, for example, your average String value is about 16 characters, then more than half of your memory will be in the Set overhead. I have, in the past, had reason to do similar things to you, and have written a memory-efficient class that can be seen in JDOM here (you will need to make changes to that code if you want to use it because it will need to have a mechanism for a true/false
-
I have a pattern I use for Iterators that is really effective, and makes the Iterator logic much simpler/readable. I'll give you an example....
First, part 1, a Reader-to-Iterator implementation:
Note how, in this class, I use a trick for the
So, that is a single-purpose class, it converts a
Now, you need a unique
-
Ratchet Freak is right about being concerned about the Iterator/FilteredReader... but I disagree with his suggestion. I think the problem is that your code is converting a BufferedReader in to an Iterator, and making the results unique at the same time. This is a class that is performing 2 functions... and you shoould have two classes instead. One that converts the BufferedReader to an interator, and the other that enforces uniqueness.
-
If your input data really is huge, then a Set may not be the right data structure because of it's memory footprint. I have found that custom implementations of memory-efficient structures can save a lot of space. I hesitate to recommend that you change from the Set though, it is the 'logical' choice, but, if, for example, your average String value is about 16 characters, then more than half of your memory will be in the Set overhead. I have, in the past, had reason to do similar things to you, and have written a memory-efficient class that can be seen in JDOM here (you will need to make changes to that code if you want to use it because it will need to have a mechanism for a true/false
seen-it test).-
I have a pattern I use for Iterators that is really effective, and makes the Iterator logic much simpler/readable. I'll give you an example....
First, part 1, a Reader-to-Iterator implementation:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.NoSuchElementException;
@SuppressWarnings("javadoc")
public class ReaderLineIterator implements Iterator {
private final BufferedReader reader;
private String nextval;
public ReaderLineIterator(Reader reader) {
this.reader = (reader instanceof BufferedReader) ? (BufferedReader)reader :
new BufferedReader(reader);
advance();
}
private void advance() {
try {
nextval = reader.readLine();
} catch (IOException ioe) {
throw new IllegalStateException("Unable to read from reader.", ioe);
}
}
@Override
public boolean hasNext() {
return nextval != null;
}
@Override
public String next() {
if (nextval == null) {
throw new NoSuchElementException();
}
try {
return nextval;
} finally {
advance();
}
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
}Note how, in this class, I use a trick for the
nextval, where it is advanced in the finally block of the next() call. This is a pattern I like because it makes the hasNext() method very light-weight, and it is always a step-ahead of the data.So, that is a single-purpose class, it converts a
Reader to a line-at-a-time Iterator.Now, you need a unique
Iterator... which can look something like:import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;
@SuppressWarnings("javadoc")
public class UniqueIterator implements Iterator {
private final Iterator source;
private String nextval = null;
Set seenit = new HashSet();
public UniqueIterator(Iterator source) {
this.source = source;
advance();
}
private void advance() {
while (source.hasNext()) {
String nxt = source.next();
if (seenit.add(nxt)) {
// found a unique value....
nextval = nxt;
return;
}
}
// no more unique values.
nextval = null;
}
@Override
public boolean hasNext() {
return nextval != null;
}
@Override
public String next() {
if (nextval == null) {
throw new NoSuchElementException();
}
try {
return nextval;
} finally {
advance();
}
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
}Code Snippets
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.NoSuchElementException;
@SuppressWarnings("javadoc")
public class ReaderLineIterator implements Iterator<String> {
private final BufferedReader reader;
private String nextval;
public ReaderLineIterator(Reader reader) {
this.reader = (reader instanceof BufferedReader) ? (BufferedReader)reader :
new BufferedReader(reader);
advance();
}
private void advance() {
try {
nextval = reader.readLine();
} catch (IOException ioe) {
throw new IllegalStateException("Unable to read from reader.", ioe);
}
}
@Override
public boolean hasNext() {
return nextval != null;
}
@Override
public String next() {
if (nextval == null) {
throw new NoSuchElementException();
}
try {
return nextval;
} finally {
advance();
}
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
}import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;
@SuppressWarnings("javadoc")
public class UniqueIterator implements Iterator<String> {
private final Iterator<String> source;
private String nextval = null;
Set<String> seenit = new HashSet<String>();
public UniqueIterator(Iterator<String> source) {
this.source = source;
advance();
}
private void advance() {
while (source.hasNext()) {
String nxt = source.next();
if (seenit.add(nxt)) {
// found a unique value....
nextval = nxt;
return;
}
}
// no more unique values.
nextval = null;
}
@Override
public boolean hasNext() {
return nextval != null;
}
@Override
public String next() {
if (nextval == null) {
throw new NoSuchElementException();
}
try {
return nextval;
} finally {
advance();
}
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
}Context
StackExchange Code Review Q#41327, answer score: 8
Revisions (0)
No revisions yet.