Extracting a sentence containing a specific word from a longer text


Problem

My goal is to return the sentence in a story that contains a given word. The method should return null if the word is not in the story. I'm OK with returning only the first occurrence if there are several. Is there a more efficient or cleaner way to do this?

public static String getSentence(String text, String word) {
    String sentence = "";
    if (text.toLowerCase().contains(word)) {
        if (text.contains(".")) {  //Are there sentences terminating in a period?
            int loc = text.toLowerCase().indexOf(word);
            int a = loc;
            while (a >= 0) {
                if (text.charAt(a) == '.' || a == 0) {
                    sentence = text.substring(a,loc);
                    a = 0;
                }
                a--;
            }
            a = loc + word.length();
            while (a <= text.length()) {
                if (text.charAt(a) == '.' || a == text.length()) {
                    sentence += text.substring(loc,a+1);
                    a = text.length()+1;
                }
                a++;
            }
            return sentence;
        } else {
            return text;      //If no period, return full text
        }
    } else {
        return null;
    }
}


FYI - I'm implementing this in Android, so I don't believe I have access to Java 8.

Solution

There are a few things of interest with your solution. Firstly, it is a very literal implementation of the problem, and I worry that it is too literal. For example, are you sure that sentences end with just a period (.)? Shouldn't it be a period followed by some whitespace? Is a URL like example.com two sentences?

The second issue I have is the blind trust you have in the inputs. You happily convert the input text to lower-case (more often than necessary, actually), but you do not convert the word to lower-case. If someone passes a word that contains an upper-case letter, you'll never find it.
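To make the case problem concrete, here is a small sketch. The class name `CaseCheck`, its two helper methods, and the sample text are made up for illustration; only the one-sided `toLowerCase()` check comes from the question.

```java
public class CaseCheck {
    // Mirrors the check in the question: only the text is lower-cased.
    static boolean foundOneSided(String text, String word) {
        return text.toLowerCase().contains(word);
    }

    // Lower-cases both sides, so the comparison ignores case.
    static boolean foundBothSides(String text, String word) {
        return text.toLowerCase().contains(word.toLowerCase());
    }

    public static void main(String[] args) {
        String text = "The Quick brown fox.";
        System.out.println(foundOneSided(text, "Quick"));  // false
        System.out.println(foundBothSides(text, "Quick")); // true
    }
}
```

The one-sided check fails because the lower-cased text no longer contains a capital Q anywhere.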

I would prefer a more zen approach, using regular expressions... actually, just a split, and some Java 8 niceness.

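import java.util.regex.Pattern;

// Sentence boundary: a period followed by at least one whitespace character.
private static final Pattern END_OF_SENTENCE = Pattern.compile("\\.\\s+");

public static String getSentence(String text, String word) {
    final String lcword = word.toLowerCase();
    return END_OF_SENTENCE.splitAsStream(text)
            .filter(s -> s.toLowerCase().contains(lcword))
            .findFirst()   // first matching sentence, in text order
            .orElse(null);
}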


Why is that better? Well, it streams the text as a sequence of sentences, and then returns the first sentence that contains the word. If the text contains no sentence delimiter, the whole text is treated as a single sentence.
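As a quick illustration of how that delimiter behaves, the sketch below splits two sample strings with the same pattern. The class name `SplitDemo`, the helper `sentences`, and the sample texts are assumptions added for this example. It uses `Pattern.split` rather than streams, so it does not depend on Java 8.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Same delimiter as above: a period followed by one or more whitespace characters.
    private static final Pattern END_OF_SENTENCE = Pattern.compile("\\.\\s+");

    static String[] sentences(String text) {
        return END_OF_SENTENCE.split(text);
    }

    public static void main(String[] args) {
        // The period inside example.com is not followed by whitespace, so it does not split.
        System.out.println(Arrays.toString(sentences("Visit example.com today. Then come back.")));
        // prints [Visit example.com today, Then come back.]

        // No delimiter at all: the whole text comes back as one element.
        System.out.println(Arrays.toString(sentences("no full stop here")));
        // prints [no full stop here]
    }
}
```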

Note that the same principles can be used with a non-streaming approach. Split by sentences, then find the first match.

In an Android environment without Java 8 streams, you could do:

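import java.util.regex.Pattern;

// Sentence boundary: a period followed by at least one whitespace character.
private static final Pattern END_OF_SENTENCE = Pattern.compile("\\.\\s+");

public static String getSentence(String text, String word) {
    final String lcword = word.toLowerCase();
    // Walk the sentences in order and return the first one containing the word.
    for (String sentence : END_OF_SENTENCE.split(text)) {
        if (sentence.toLowerCase().contains(lcword)) {
            return sentence;
        }
    }
    return null;
}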


Note that the result from the above code may or may not include the terminating period. The split consumes a period only when whitespace follows it, so if the match is in the middle of the text, the period will not be included. If the match is in the last sentence of the text, and the text ends with a period with no whitespace after it, then that period is returned as part of the result.
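If a consistent result matters, one possible variation (not part of the code above) is to split on the whitespace that follows a period instead of on the period itself, using a lookbehind so the period stays attached to its sentence. The class name `KeepPeriod` and the pattern name `AFTER_PERIOD` are hypothetical, and this is only a sketch under the same simplifying assumption that sentences end with a period.

```java
import java.util.regex.Pattern;

public class KeepPeriod {
    // Assumption: match only the whitespace that comes right after a period.
    // The lookbehind (?<=\.) checks for the period without consuming it.
    private static final Pattern AFTER_PERIOD = Pattern.compile("(?<=\\.)\\s+");

    public static String getSentence(String text, String word) {
        final String lcword = word.toLowerCase();
        for (String sentence : AFTER_PERIOD.split(text)) {
            if (sentence.toLowerCase().contains(lcword)) {
                return sentence;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String story = "One fish. Two fish. Red fish.";
        System.out.println(getSentence(story, "TWO"));  // Two fish.
        System.out.println(getSentence(story, "red"));  // Red fish.
        System.out.println(getSentence(story, "blue")); // null
    }
}
```

With this variation every returned sentence keeps its period, whether it sits in the middle or at the end of the text.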

Context

StackExchange Code Review Q#90474, answer score: 8
