pattern, java, Moderate
Match Simple Sentence or Partial Sentence
simple, match, sentence, partial
Problem
Description
Match a Simple Sentence or a partial sentence
Suitable for matching
- People Names (to some extent)
- Product Titles (to some extent)
- Correct use of apostrophe (to some extent)
Input/Result
match("Because I'm Batman") true
match("John Cena's fans") true
match("Nice.") true
match("Mat's Mug") true
match("Good Boy") true
match("Assert True.") true
match("The students' projects") true
match("The Johnsons' house is on fire.") true
match("Tim's and Marty's ice cream") true
match("You're late.") true
match("Cat's eyes are blue.") true
match("B.A.T.") true
match("B. A. T.") true
match(" Hello World ") true
match(" K.O. ") true
match(" Js' friend ") true
match(" J A V A ") true
match("I'll look into it") true
match("I can't beleive it.") true
match("The students'") false
match("The students") false
match(".") false
match("") false
match("AA") false
match("A") false
match("B..A.T.") false
match("You''re late.") false
match("' ; delete from user
Solution
There are a number of things that could be improved here.
You're doing the right thing with the pre-compiled regular expression/pattern, but you have fallen victim to the little-known auto-format-muck-up-monster, and to what I consider the magic-value-overcompensation issue:

    private static final String SIMPLE_SENTENCE
            = "([a-zA-Z]+(\\.|\\. |'(s |re |t |m |ll )|s' | )?)+";
    private static final Pattern SIMPLE_SENTENCE_PATTERN = Pattern.compile(
            SIMPLE_SENTENCE);

- The magic-value-overcompensation issue is that you have declared a constant, to preserve a magic value, that is only used once, in another magic value.
- The auto-format-muck-up-monster is the location of the line-break in the Pattern. In this case, it is more readable with the Pattern starting on the new line.
Your code would look much simpler with just:

    private static final Pattern SIMPLE_SENTENCE_PATTERN =
            Pattern.compile("([a-zA-Z]+(\\.|\\. |'(s |re |t |m |ll )|s' | )?)+");

Now, the first thing you do in your match method (well, the second thing) is trim the string. This is a relatively expensive operation because, whenever there is whitespace to strip, it makes a copy of the data, and a new String and char[] array instance. It would be much simpler to just incorporate the white-space into the pattern: add \\s* at the beginning and end.

Similarly, you do not allow the value to start with, or end with, a single quote ', so incorporate that into your pattern too. Then you may as well make the null-check a condition on the return, ending up with the code:

    private static final Pattern SIMPLE_SENTENCE_PATTERN =
            Pattern.compile("\\s*(?!')([a-zA-Z]+(\\.|\\. |'(s |re |t |m |ll )|s' | )?)+(?!')\\s*");

    public static boolean match(String toTest) {
        return toTest != null && toTest.length() > 2
                && SIMPLE_SENTENCE_PATTERN.matcher(toTest).matches();
    }

Now, about the unit tests....
I would suggest a couple of arrays that contain passing and failing values. Something simple like:

    private static final String[] passingValues = {
        "Because I'm Batman",
        "John Cena's fans",
        "Nice.",
        "Mat's Mug",
        "Good Boy",
        ....
    };

and then the test is:

    @Test
    public void testMatching() {
        for (String val : passingValues) {
            assertTrue(match(val));
        }
    }

Adding new test cases is much simpler this way, and the DRY-factor is high.
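The array-driven approach above can also be sketched without a test framework. What follows is a minimal, self-contained sketch and not the original test class: the class name SimpleSentenceCheck, the FAILING_VALUES array, and the countMismatches helper are hypothetical names introduced here for illustration, and the arrays hold only a subset of the inputs listed in the question. In a real project the loop would more likely use JUnit's assertTrue and assertFalse, as in the test method above.

```java
import java.util.regex.Pattern;

final class SimpleSentenceCheck {

    // Pattern and match() are copied from the answer; whitespace is handled by the regex.
    private static final Pattern SIMPLE_SENTENCE_PATTERN =
            Pattern.compile("\\s*(?!')([a-zA-Z]+(\\.|\\. |'(s |re |t |m |ll )|s' | )?)+(?!')\\s*");

    // Subset of the question's inputs that are expected to match.
    static final String[] PASSING_VALUES = {
        "Because I'm Batman",
        "John Cena's fans",
        "Nice.",
        "The students' projects",
        "B. A. T.",
        " Hello World ",
        " K.O. ",
    };

    // Subset of the question's inputs that are expected NOT to match (hypothetical array name).
    static final String[] FAILING_VALUES = {
        "The students'",
        ".",
        "",
        "AA",
        "A",
        "B..A.T.",
        "You''re late.",
    };

    static boolean match(String toTest) {
        return toTest != null && toTest.length() > 2
                && SIMPLE_SENTENCE_PATTERN.matcher(toTest).matches();
    }

    // Hypothetical helper: counts the values whose match() result differs from `expected`.
    static int countMismatches(String[] values, boolean expected) {
        int mismatches = 0;
        for (String val : values) {
            if (match(val) != expected) {
                System.out.println("Unexpected result for: \"" + val + "\"");
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int mismatches = countMismatches(PASSING_VALUES, true)
                + countMismatches(FAILING_VALUES, false);
        System.out.println(mismatches == 0
                ? "all listed cases behave as expected"
                : mismatches + " mismatch(es)");
    }
}
```

Adding a case is then a one-line change to the relevant array, and both the positive and the negative expectations go through the same loop.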
Context
StackExchange Code Review Q#60247, answer score: 15