
Java 8 spliterator for paged results

Submitted by: @import:stackexchange-codereview

Problem

Summary: I am using an API which returns paged results. I want to expose these results as a Java 8 Stream and have implemented a Spliterator for this purpose.

I am using the AWS S3 Java API to list objects in an S3 bucket. The API returns paged results: when I call client.listObjects(bucketName, rootKey) for the first time, I get an ObjectListing instance which may be complete or truncated, i.e. contain just one "page" of results.

If the ObjectListing is truncated, I have to request the next "page" via client.listNextBatchOfObjects(objectListing) (passing the current "page" as the marker), and so on until I get an ObjectListing which is not truncated.

I want to use Java 8 Stream APIs to work with ObjectListings. Ideally, I want to hide querying pages of ObjectListings behind some facility which would just give me a Stream. For this I've implemented a Spliterator:

```
import java.util.Objects;
import java.util.Spliterator;
import java.util.function.Consumer;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectListing;

public class ObjectListingSpliterator implements Spliterator<ObjectListing> {

    private final AmazonS3 client;
    private ObjectListing objectListing;
    private volatile boolean split = false;
    private volatile boolean currentObjectListingWasConsumed = false;

    public ObjectListingSpliterator(AmazonS3 client, ObjectListing objectListing) {
        Objects.requireNonNull(client, "client must not be null.");
        Objects.requireNonNull(objectListing, "objectListing must not be null.");
        this.client = client;
        this.objectListing = objectListing;
    }

    @Override
    public boolean tryAdvance(Consumer<? super ObjectListing> action) {
        if (!currentObjectListingWasConsumed) {
            // Hand the current page to the consumer exactly once.
            action.accept(objectListing);
            currentObjectListingWasConsumed = true;
            if (!split && objectListing.isTruncated()) {
                // More pages exist: fetch the next one from S3.
                objectListing = client.listNextBatchOfObjects(objectListing);
                // ... (the remainder of the listing is cut off in the source)
```

Solution

First of all, splitting a spliterator only makes sense if both the remainder of the current spliterator and the returned spliterator still have work pending. In your case this is (almost) never true: you operate on complete batches, and after a split the current spliterator returns at most its current batch. Thus, I'd replace the body of trySplit() with a simple return null, which the Spliterator contract permits for a spliterator that cannot be split. This also sidesteps any potential concurrency issues (which I have not looked into in depth).

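Characteristics tell the caller, for lack of a better word, the characteristics of your spliterator. :-) They are a set of flags such as ORDERED or SIZED, returned from characteristics(), which the stream framework may use to specialize or optimize the computation.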

I think for the batch approach you take, ORDERED, NONNULL, IMMUTABLE should be OK.
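The two suggestions above (no splitting, plus the three characteristics) could be sketched roughly as follows. To keep the example runnable without the AWS SDK, `Page` and `ListPage` are hypothetical stand-ins for `ObjectListing` and for the `client.listNextBatchOfObjects(...)` call; they are not part of any real API, and this is only one possible shape for the class.

```java
import java.util.List;
import java.util.Objects;
import java.util.Spliterator;
import java.util.function.Consumer;

// Hypothetical stand-in for ObjectListing plus the client call that fetches the next page.
interface Page {
    List<String> items();
    boolean isTruncated();
    Page next(); // only called when isTruncated() is true
}

// Emits one Page per tryAdvance call and never splits.
final class PageSpliterator implements Spliterator<Page> {
    private Page pending;     // page to emit next, or null if it still has to be fetched
    private Page lastEmitted; // page handed to the consumer most recently

    PageSpliterator(Page first) {
        this.pending = Objects.requireNonNull(first, "first must not be null.");
    }

    @Override
    public boolean tryAdvance(Consumer<? super Page> action) {
        if (pending == null) {
            // Fetch lazily: ask for the next page only when the stream wants one.
            if (lastEmitted == null || !lastEmitted.isTruncated()) {
                return false;
            }
            pending = lastEmitted.next();
        }
        lastEmitted = pending;
        pending = null;
        action.accept(lastEmitted);
        return true;
    }

    @Override
    public Spliterator<Page> trySplit() {
        return null; // pages must be fetched one after another, so there is nothing to split
    }

    @Override
    public long estimateSize() {
        return Long.MAX_VALUE; // the number of pages is unknown up front
    }

    @Override
    public int characteristics() {
        return ORDERED | NONNULL | IMMUTABLE;
    }
}

// Hypothetical in-memory Page, used only to exercise the spliterator.
final class ListPage implements Page {
    private final List<List<String>> pages;
    private final int index;

    ListPage(List<List<String>> pages, int index) {
        this.pages = pages;
        this.index = index;
    }

    @Override public List<String> items() { return pages.get(index); }
    @Override public boolean isTruncated() { return index + 1 < pages.size(); }
    @Override public Page next() { return new ListPage(pages, index + 1); }
}
```

With such a stand-in, `StreamSupport.stream(new PageSpliterator(first), false)` gives a sequential stream of pages, and the caller still has to `flatMap` over `items()` to reach the individual entries.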

Apart from that, for more direct utility, I'd rather take the approach not to iterate over the batches, but over their contents, i.e. create a Spliterator which gets initialized with the first ObjectListing batch and then transparently goes through the underlying collections element-wise and fetches the next batch as needed. This would eliminate the need to flatMap on the result stream and feel more natural for a streaming approach. (In fact, this sounds so useful that I'd like to have it :-))
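A minimal sketch of that element-wise idea might look like the following. `Batch` and `ListBatch` are hypothetical stand-ins (assumptions for illustration, not AWS types): with the real SDK, `items()` would presumably map to the object summaries of an `ObjectListing` and `next()` to `client.listNextBatchOfObjects(...)`. Only ORDERED is reported here, as a conservative assumption.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for one page of results and the call that fetches its successor.
interface Batch<T> {
    List<T> items();
    boolean isTruncated();
    Batch<T> next(); // only called when isTruncated() is true
}

// Walks the elements of each batch and fetches the next batch only when the current one runs out.
final class BatchElementSpliterator<T> implements Spliterator<T> {
    private Batch<T> batch;
    private Iterator<T> items;

    BatchElementSpliterator(Batch<T> first) {
        this.batch = Objects.requireNonNull(first, "first must not be null.");
        this.items = first.items().iterator();
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        // Skip exhausted (possibly empty) batches until an element is found or paging ends.
        while (!items.hasNext()) {
            if (!batch.isTruncated()) {
                return false;
            }
            batch = batch.next();
            items = batch.items().iterator();
        }
        action.accept(items.next());
        return true;
    }

    @Override
    public Spliterator<T> trySplit() {
        return null; // batches arrive strictly in sequence
    }

    @Override
    public long estimateSize() {
        return Long.MAX_VALUE; // total element count is unknown
    }

    @Override
    public int characteristics() {
        return ORDERED;
    }

    // Convenience factory: a sequential stream over all elements of all batches.
    static <T> Stream<T> stream(Batch<T> first) {
        return StreamSupport.stream(new BatchElementSpliterator<>(first), false);
    }
}

// Hypothetical in-memory Batch, used only to exercise the spliterator.
final class ListBatch<T> implements Batch<T> {
    private final List<List<T>> pages;
    private final int index;

    ListBatch(List<List<T>> pages, int index) {
        this.pages = pages;
        this.index = index;
    }

    @Override public List<T> items() { return pages.get(index); }
    @Override public boolean isTruncated() { return index + 1 < pages.size(); }
    @Override public Batch<T> next() { return new ListBatch<>(pages, index + 1); }
}
```

Because the stream is lazy, a short-circuiting operation such as `limit(n)` or `findFirst()` stops calling `tryAdvance`, so later batches are never requested.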

Context

StackExchange Code Review Q#154706, answer score: 3
