Extracting a sentence containing a specific word from a longer text
Problem
My goal is to return the sentence in a story that contains a given word, or null if the word is not in the story. I'm OK with returning the first occurrence if there are multiple matches. Is there a more efficient or cleaner way to do this?
public static String getSentence(String text, String word) {
    String sentence = "";
    if (text.toLowerCase().contains(word)) {
        if (text.contains(".")) { //Are there sentences terminating in a period?
            int loc = text.toLowerCase().indexOf(word);
            int a = loc;
            while (a >= 0) {
                if (text.charAt(a) == '.' || a == 0) {
                    sentence = text.substring(a,loc);
                    a = 0;
                }
                a--;
            }
            a = loc + word.length();
            while (a <= text.length()) {
                if (text.charAt(a) == '.' || a == text.length()) {
                    sentence += text.substring(loc,a+1);
                    a = text.length()+1;
                }
                a++;
            }
            return sentence;
        } else {
            return text; //If no period, return full text
        }
    } else {
        return null;
    }
}
FYI - I'm implementing this in Android, so I don't believe I have access to Java 8.
Solution
There are a few things of interest with your solution. Firstly, it is a very literal implementation of the problem to solve, and I worry that it is too literal. For example, are you sure that sentences end with just a period? Is the end of a sentence not a period followed by some whitespace? Is a URL like example.com two sentences?
The second issue I have is the blind trust you have in the inputs. You happily convert the input text to lower-case (you call text.toLowerCase() twice, which is more often than needed), but you do not convert the word to lower-case. If someone gives an upper-case word, you'll never find it.
I would prefer a more zen approach, using regular expressions... actually, just a split, and some Java 8 niceness.
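import java.util.regex.Pattern;

// A sentence ends with a period followed by one or more whitespace characters.
private static final Pattern END_OF_SENTENCE = Pattern.compile("\\.\\s+");

public static String getSentence(String text, String word) {
    final String lcword = word.toLowerCase();
    // Split lazily into sentences and return the first one containing the word.
    return END_OF_SENTENCE.splitAsStream(text)
            .filter(s -> s.toLowerCase().contains(lcword))
            .findFirst()
            .orElse(null);
}
Why is that better? Well, it streams the text in the form of sentences, and then returns the first sentence that contains the word. If the text contains no sentence terminator, the split yields the whole text as a single element, so the whole text is matched.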
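To see the stream version in action, here is a minimal runnable sketch. It wraps the getSentence method in a hypothetical class named SentenceFinder and uses findFirst() (rather than findAny()) so the result is always the first matching sentence. The sample story is invented for illustration, and splitAsStream requires Java 8.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class so the stream-based getSentence can be run as-is.
class SentenceFinder {
    // A sentence ends with a period followed by one or more whitespace characters.
    private static final Pattern END_OF_SENTENCE = Pattern.compile("\\.\\s+");

    public static String getSentence(String text, String word) {
        final String lcword = word.toLowerCase();
        return END_OF_SENTENCE.splitAsStream(text)
                .filter(s -> s.toLowerCase().contains(lcword)) // case-insensitive match
                .findFirst()                                   // first matching sentence
                .orElse(null);                                 // null when nothing matches
    }

    public static void main(String[] args) {
        String story = "The cat sat down. Visit example.com for more. The END is near";
        // Upper-case input still matches because both sides are lower-cased.
        System.out.println(getSentence(story, "CAT"));     // The cat sat down
        // The period in example.com is not followed by whitespace, so it is not a split point.
        System.out.println(getSentence(story, "example")); // Visit example.com for more
        // The last piece has no terminator and is returned unchanged.
        System.out.println(getSentence(story, "end"));     // The END is near
        // No sentence contains the word.
        System.out.println(getSentence(story, "dog"));     // null
    }
}
```

Because the filter runs on a lazily split stream, sentences after the first match are never lower-cased or compared.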
Note that the same principles can be used with a non-streaming approach. Split by sentences, then find the first match.
In an Android environment without Java 8 streams, you could do:
import java.util.regex.Pattern;

private static final Pattern END_OF_SENTENCE = Pattern.compile("\\.\\s+");

public static String getSentence(String text, String word) {
    final String lcword = word.toLowerCase();
    // Pattern.split needs no Java 8 APIs; return the first sentence containing the word.
    for (String sentence : END_OF_SENTENCE.split(text)) {
        if (sentence.toLowerCase().contains(lcword)) {
            return sentence;
        }
    }
    return null;
}
Note that the results from the above code may or may not include the terminating period. If the match is in the last sentence of the text, and the text ends with a period that is not followed by whitespace, then the period is returned as part of the result. If the match is in any earlier sentence, the period is consumed by the split and will not be included.
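The period behavior described above can be checked with a small runnable sketch. It wraps the loop-based getSentence in a hypothetical class named LoopSentenceFinder; the sample sentences are invented. Only Pattern.split is used, so it compiles without Java 8.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class for the loop-based getSentence (no streams needed).
class LoopSentenceFinder {
    private static final Pattern END_OF_SENTENCE = Pattern.compile("\\.\\s+");

    public static String getSentence(String text, String word) {
        final String lcword = word.toLowerCase();
        for (String sentence : END_OF_SENTENCE.split(text)) {
            if (sentence.toLowerCase().contains(lcword)) {
                return sentence;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "One cat. Two dogs. Three birds.";
        // Middle sentence: the ". " separator is consumed by the split.
        System.out.println(getSentence(text, "dogs"));        // Two dogs
        // Last sentence, text ends with "." and nothing after it: the period is kept.
        System.out.println(getSentence(text, "birds"));       // Three birds.
        // Trailing whitespace turns the final ". " into a separator: no period.
        System.out.println(getSentence(text + " ", "birds")); // Three birds
    }
}
```

If a consistent result matters, one option (an assumption, not something the answer proposes) is to strip a single trailing period from the returned sentence before returning it.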
Context
StackExchange Code Review Q#90474, answer score: 8