List objects in an Amazon S3 folder without also listing objects in sub-folders


Problem

I'm using the Amazon S3 Java SDK to fetch a list of files in a (simulated) sub-folder. This code is fairly standard (AWSConfiguration is a class that contains a bunch of account-specific values):

String prefix = "/images/cars/";
int prefix_size = prefix.length();
AmazonS3 s3 = new AmazonS3Client(new AWSConfiguration());
ObjectListing objectListing = s3.listObjects(new ListObjectsRequest().
    withBucketName(AWSConfiguration.BUCKET).
    withPrefix(prefix));


Now this list will include objects like /images/cars/default.png as well as /images/cars/ford/Default.png (because they both start with the same prefix). To list only the objects that are directly inside the /images/cars/ "folder" I have the following function (in a class called S3Asset):

public static boolean isInsideFolder(int root_size, String key) {
  return (key.substring(root_size).indexOf("/") == -1);
}


This looks at the full key for any / after the prefix as a clue that it is inside a sub-folder. This lets me iterate over the objects with the following code (I'm trimming the prefix for clarity):

for(S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
  if(S3Asset.isInsideFolder(prefix_size, objectSummary.getKey())) {
    System.out.println(objectSummary.getKey().substring(prefix_size));
  }
}


As near as I can tell this is the cleanest way to do this, but it has one characteristic that I don't like. If I'm looking at a root-level folder, I'm requesting the names of all files in all sub-folders only to iterate over them and learn that there is only one object in the actual root-level folder. I've considered associating a key with each object whose value is the full path of the folder, which would allow me to request objects with a predictable key instead of the prefix. The major downside is that the key would have to be generated in code, and therefore assets uploaded directly into the S3 bucket (through the management console) would not have this key.

Solution

The ListObjectsRequest javadoc lists a method called withDelimiter(String delimiter). If you add .withDelimiter("/") after the .withPrefix(prefix) call, you will receive only the objects at the same folder level as the prefix, which avoids filtering the returned ObjectListing after the full list has been sent over the wire. Keys in deeper levels are not listed individually; they are rolled up into common prefixes, available through ObjectListing.getCommonPrefixes().
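To make the effect of the delimiter concrete, here is a minimal local simulation of the grouping, not the SDK call itself. The class DelimiterListingSketch and its two methods are hypothetical names invented for this illustration; only the idea (direct keys are returned, deeper keys collapse into one prefix per "sub-folder") is taken from the S3 listing behavior described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Local simulation of how a delimiter splits a prefix listing.
// This is NOT the AWS SDK; class and method names are hypothetical.
final class DelimiterListingSketch {

    /** Keys directly under the prefix: nothing after the prefix contains the delimiter. */
    static List<String> directKeys(List<String> keys, String prefix, String delimiter) {
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)
                    && !key.substring(prefix.length()).contains(delimiter)) {
                result.add(key);
            }
        }
        return result;
    }

    /** Rolled-up "sub-folder" prefixes, each ending with the delimiter, without duplicates. */
    static List<String> commonPrefixes(List<String> keys, String prefix, String delimiter) {
        TreeSet<String> result = new TreeSet<>();
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue;
            }
            // First delimiter that appears after the prefix, if any.
            int cut = key.indexOf(delimiter, prefix.length());
            if (cut >= 0) {
                result.add(key.substring(0, cut + delimiter.length()));
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "/images/cars/default.png",
            "/images/cars/ford/Default.png",
            "/images/cars/ford/focus.png",
            "/images/boats/default.png");
        System.out.println(directKeys(keys, "/images/cars/", "/"));     // [/images/cars/default.png]
        System.out.println(commonPrefixes(keys, "/images/cars/", "/")); // [/images/cars/ford/]
    }
}
```

With the real request, the same split happens on the server, so the keys under /images/cars/ford/ never travel over the wire.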

Some notes about the code:

1, I'd extract the ListObjectsRequest instance into a local variable:

final ListObjectsRequest listObjectRequest = new ListObjectsRequest().
    withBucketName(AWSConfiguration.BUCKET).
    withPrefix(prefix);
final ObjectListing objectListing = s3.listObjects(listObjectRequest);


It's easier to read.

2, root_size should be rootSize. (According to the Java Code Conventions.)

3, I would use String.contains instead of indexOf. It's more meaningful and easier to read, since you don't have to use the -1 magic number.

4, In the last snippet I'd create a local variable for the key:

for (final S3ObjectSummary objectSummary: objectListing.getObjectSummaries()) {
    final String key = objectSummary.getKey();
    if (S3Asset.isImmediateDescendant(prefix, key)) {
        final String relativePath = getRelativePath(prefix, key);
        System.out.println(relativePath);
    }
}


5, Furthermore, I'd move the length call inside the helper method:

public static String getRelativePath(final String parent, final String child) {
    if (!child.startsWith(parent)) {
        throw new IllegalArgumentException("Invalid child '" + child
            + "' for parent '" + parent + "'");
    }
    // substring() strips only the leading parent; String.replace() would
    // also strip any later occurrence of the parent inside the key
    final int parentLen = parent.length();
    return child.substring(parentLen);
}

// static, so that the S3Asset.isImmediateDescendant(...) call above compiles
public static boolean isImmediateDescendant(final String parent, final String child) {
    if (!child.startsWith(parent)) {
        // maybe we just should return false
        throw new IllegalArgumentException("Invalid child '" + child
            + "' for parent '" + parent + "'");
    }
    final int parentLen = parent.length();
    final String childWithoutParent = child.substring(parentLen);
    // a further "/" means the key lives in a sub-folder
    return !childWithoutParent.contains("/");
}


Note the input check. (Effective Java, Second Edition, Item 38: Check parameters for validity)
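The comment inside isImmediateDescendant raises the alternative of returning false for a key outside the parent. A minimal sketch of that variant follows. The class name LenientDescendantCheck is hypothetical, and whether a mismatched key should be an error or simply "not a descendant" depends on how the caller obtains its keys.

```java
// Hypothetical variant of the helper above: returns false instead of
// throwing when the key does not start with the parent prefix.
final class LenientDescendantCheck {

    static boolean isImmediateDescendant(final String parent, final String child) {
        if (!child.startsWith(parent)) {
            return false; // not under this parent at all
        }
        // Any further "/" means the key sits in a deeper "folder".
        return !child.substring(parent.length()).contains("/");
    }

    public static void main(String[] args) {
        final String prefix = "/images/cars/";
        System.out.println(isImmediateDescendant(prefix, "/images/cars/default.png"));      // true
        System.out.println(isImmediateDescendant(prefix, "/images/cars/ford/Default.png")); // false
        System.out.println(isImmediateDescendant(prefix, "/images/boats/default.png"));     // false
    }
}
```

The throwing version surfaces programming errors early. The lenient version is convenient when the method is used as a plain filter over keys that may come from several prefixes.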

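The multiple calls to length() could look redundant and slow, but premature optimization is not a good thing (see Effective Java, Second Edition, Item 55: Optimize judiciously). If you check the source of java.lang.String in older JDKs, you will find the following (newer JDKs drop the count field and derive the length from the backing array, which is just as cheap):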

/** The count is the number of characters in the String. */
private final int count;

...

public int length() {
    return count;
}


String is immutable, so it's easy to cache its length, and the JDK does it for you.


Context

StackExchange Code Review Q#6847, answer score: 23
