
Duplicate words in a text


Problem

Here is a simplified implementation to obtain the duplicate words in a text using lambda expressions.

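import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class FindDuplicateWordsInText {

    public static Set<String> findDuplicateWordsInText(String text) {
        String[] words = text.split(" ");
        Set<String> duplicatesRemovedSet = new HashSet<>();
        // add() returns false for a word that was already seen, so the filter keeps only repeats
        Set<String> duplicatesSet = Arrays.stream(words).filter(string -> !duplicatesRemovedSet.add(string))
                .collect(Collectors.toSet());
        return duplicatesSet;
    }
}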

Solution

Your use of the boolean return value of the Set.add() call is a clever way to check for your duplicates. The concept you have is good, and I can't think of a faster way.
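To make the Set.add() point concrete, here is a minimal sketch of the return value that the filter relies on. The class name SetAddDemo and the helper isFirstOccurrence are hypothetical names used only for illustration; they are not part of the reviewed code.

```java
import java.util.HashSet;
import java.util.Set;

class SetAddDemo {

    // Set.add() returns true if the element was not yet present, false otherwise.
    static boolean isFirstOccurrence(Set<String> seen, String word) {
        return seen.add(word);
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        System.out.println(isFirstOccurrence(seen, "apple")); // true: first time seen
        System.out.println(isFirstOccurrence(seen, "pear"));  // true: first time seen
        System.out.println(isFirstOccurrence(seen, "apple")); // false: already in the set
    }
}
```

Negating that result, as the filter does, keeps exactly the words that have been seen before.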

Additionally, I like how you have used interface-based types on the left side of assignments (Set<String>) and the concrete classes on the right (new HashSet<>()). People often put the concrete type on the left too, and it's good to see that you did not.

In terms of the Java streaming API, though, I can't help but feel that you missed an opportunity to improve the process by streaming the split. The Pattern class has a splitAsStream method, which would reduce your latency on the first words because they enter the pipeline as they are matched rather than after the whole array has been built.

As an aside, words should probably be split on contiguous whitespace, not just a single space (i.e. "\\s+" instead of " ").
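The difference between the two delimiters can be sketched as follows. SplitDemo and its two helper methods are hypothetical names, and the sample text is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {

    // Splits on each single space; a double space yields an empty "word",
    // and tabs or newlines are not treated as separators at all.
    static List<String> splitOnSingleSpace(String text) {
        return Arrays.asList(text.split(" "));
    }

    // Splits on runs of any whitespace (spaces, tabs, newlines).
    static List<String> splitOnWhitespaceRuns(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "to be  or\tnot to be"; // two spaces after "be", a tab after "or"
        System.out.println(splitOnSingleSpace(text).size());    // 6, including "" and "or\tnot"
        System.out.println(splitOnWhitespaceRuns(text).size()); // 6 clean words
    }
}
```

With the single-space delimiter, the empty string and the fused "or\tnot" token would both be treated as words by the duplicate check.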

Here's your code done differently:

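import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Compiled once and reused; matches one or more whitespace characters
private static final Pattern SPACE = Pattern.compile("\\s+");

public static Set<String> findDuplicateWordsInText(String text) {
    Set<String> duplicatesRemovedSet = new HashSet<>();
    return SPACE.splitAsStream(text)
            // add() returns false for a word already seen, so only repeats pass the filter
            .filter(string -> !duplicatesRemovedSet.add(string))
            .collect(Collectors.toSet());
}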
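For completeness, here is a self-contained sketch of how the rewritten method might be called. The wrapper class DuplicateWordsDemo and the sample sentence are assumptions added for illustration. The result is copied into a TreeSet only so the printed order is predictable.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DuplicateWordsDemo {

    private static final Pattern SPACE = Pattern.compile("\\s+");

    static Set<String> findDuplicateWordsInText(String text) {
        Set<String> duplicatesRemovedSet = new HashSet<>();
        return SPACE.splitAsStream(text)
                // keep a word only if it was already in the set, i.e. it is a repeat
                .filter(string -> !duplicatesRemovedSet.add(string))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> duplicates = findDuplicateWordsInText("to be or not to be");
        // Sorted copy for stable output
        System.out.println(new TreeSet<>(duplicates)); // prints [be, to]
    }
}
```

Note that the filter mutates a plain HashSet, so this sketch assumes the stream stays sequential. Running it in parallel would need a thread-safe set or a different approach.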

Code Snippets

private static final Pattern SPACE = Pattern.compile("\\s+");

public static Set<String> findDuplicateWordsInText(String text) {
    Set<String> duplicatesRemovedSet = new HashSet<>();
    return SPACE.splitAsStream(text)
            .filter(string -> !duplicatesRemovedSet.add(string))
            .collect(Collectors.toSet());
}

Context

StackExchange Code Review Q#100722, answer score: 4
