HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Truncate String to target length using only whole words

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
targetwholelengthwordstruncateusingonlystring

Problem

I've got a bit of code that I'm wanting to tidy up. The aim is to take a String that consists of multiple words connected by a separator (It's produced using Collectors.joining(SEPARATOR) as below, although the stream processing is not quite so trivial), and return a portion from the front of it.

This portion should never contain partial words, but should in most cases be less than the TARGET_LENGTH number of total characters. It is safe to assume that the words themselves do not contain the SEPARATOR character.

String SEPARATOR = "-";
int TARGET_LENGTH = 20;

//Expected output: "this-is-a-sample"
String s = Arrays.asList("this", "is", "a", "sample", "string")
            .stream()
            .collect(Collectors.joining(SEPARATOR));

if (s.length() > TARGET_LENGTH) {
    if (s.contains(SEPARATOR)) {
        if (s.substring(0, TARGET_LENGTH + 1).contains(SEPARATOR)) {
            return s.substring(0, s.substring(0, TARGET_LENGTH + 1).lastIndexOf(SEPARATOR));
        } else {
            return s.substring(0, s.indexOf(SEPARATOR));
        }
    }
}
return s;


Expected output is as follows:

s                                   ||  result
"easy"                              ||  "easy"
"simple-case"                       ||  "simple-case"
"very-long-input-will-be-truncated" ||  "very-long-input-will"
"countercountermeasures"            ||  "countercountermeasures" //Longer than 20 chars
"countercountermeasures-foo-bar-baz"||  "countercountermeasures" //Longer than 20 chars
"a-b-c-d-e-f-g-veryveryverylongword"||  "a-b-c-d-e-f-g"


The provided code works, but isn't very clear and I imagine it will be horrible to maintain if any requirements change in the future. I'm looking for suggestions as to how it could be improved. I've considered adding a filter to the Stream that generates the String, with a counter keeping track of the current output length declared outside of the string-generation, but this feels like an even worse thing to do (At

Solution

Style

if (s.length() > TARGET_LENGTH) {
    if (s.contains(SEPARATOR)) {
        if (s.substring(0, TARGET_LENGTH + 1).contains(SEPARATOR)) {
            return s.substring(0, s.substring(0, TARGET_LENGTH + 1).lastIndexOf(SEPARATOR));
        } else {
            return s.substring(0, s.indexOf(SEPARATOR));
        }
    }
}


The above code is cumbersome. When you have a return statement inside an if-condition, there is no reason to have an 'else' condition.

Additionally, when you are using early-returns anyway, there's no reason to have such nested code blocks....

Alternatives

I can see multiple options to this problem. I would recommend testing them to see which performs better for you. The Regex option may in fact be faster if the pattern is reused often.

On the fly

In your own answer, you suggest on-the-fly building of the string, and ending early. This is a good option, but I would do it differently:

public static String portion(final int length, final String separator, String...words) {
    if (words.length == 0) {
        return "";
    }

    StringBuilder sb = new StringBuilder(length);
    sb.append(words[0]);
    for (int i = 1; i  length) {
            return sb.toString();
        }
        sb.append(separator).append(words[i]);
    }
    return sb.toString();
}


Regex

Yeah, yeah, the whole "use a regular expression, now you have two problems...." but, this concept is:

  • If the String is



  • If the string is >= 20 or there are no separators within 20, then take until the end of string, or the first separator, whichever is shorter.



If you have the separator - and the length 20, then this can be expressed in the regex:

^(.{0,20}(?=$|-)|.*?(?=$|-))


where:

.{0,20}(?=$|-)


is: find up to 20 chars followed either by the end-of-line, or a separator.

and:

.*?(?=$|-)


is: otherwise, if the previous test fails, take until the end-of-line, or the next separator, whichever happens first.

You can build this in to a reusable pattern (and being defensive about separators that may have special regex meaning....):

private static final Pattern buildPattern(int length, String separator) {
    String qpat = Pattern.quote(separator);
    return Pattern.compile(String.format("^(.{0,%d}(?=$|%s)|.*?(?=$|%s))", length, qpat, qpat));
}


Then, you can use this as follows:

public static String portion(Pattern pat, String s) {
    Matcher mat = pat.matcher(s);
    mat.find(); // patterns will always match, will never fail...
    return mat.group();
}


I imagine this can be used, in your code, like:

private static final Pattern TWENTY_DASH = buildPattern(TARGET_LENGTH, SEPARATOR);


and then in your remaining code:

String s = Arrays.asList("this", "is", "a", "sample", "string")
            .stream()
            .collect(Collectors.joining(SEPARATOR));

return portion(TWENTY_DASH, s);


Return-Early

Having suggested the RegEx option, I would also suggest a simplified lastIndexOf/indexOf using early-returns, instead of nested if-blocks:

public static String portion(final int length, final String separator, String text) {

    if (text.length() = 0) {
        return text.substring(0, pos);
    }

    pos = text.indexOf(separator, length);
    if (pos >= 0) {
        return text.substring(0, pos);
    }

    return text;

}

Code Snippets

if (s.length() > TARGET_LENGTH) {
    if (s.contains(SEPARATOR)) {
        if (s.substring(0, TARGET_LENGTH + 1).contains(SEPARATOR)) {
            return s.substring(0, s.substring(0, TARGET_LENGTH + 1).lastIndexOf(SEPARATOR));
        } else {
            return s.substring(0, s.indexOf(SEPARATOR));
        }
    }
}
public static String portion(final int length, final String separator, String...words) {
    if (words.length == 0) {
        return "";
    }

    StringBuilder sb = new StringBuilder(length);
    sb.append(words[0]);
    for (int i = 1; i < words.length; i++) {
        if (sb.length() + separator.length() + words[i].length() > length) {
            return sb.toString();
        }
        sb.append(separator).append(words[i]);
    }
    return sb.toString();
}
^(.{0,20}(?=$|-)|.*?(?=$|-))
.{0,20}(?=$|-)
private static final Pattern buildPattern(int length, String separator) {
    String qpat = Pattern.quote(separator);
    return Pattern.compile(String.format("^(.{0,%d}(?=$|%s)|.*?(?=$|%s))", length, qpat, qpat));
}

Context

StackExchange Code Review Q#75704, answer score: 4

Revisions (0)

No revisions yet.