patternjavaMinor
Correcting punctuation spacing in a string
correcting, punctuation, string, spacing
Problem
I'm working on a program that corrects all punctuation spacing in a `String`.

Examples:

```java
// "Hello ,World" -> "Hello, World"
// "9 :00 A .M ." -> "9:00 A.M."
// "can 't even." -> "can't even."
```

Here's how it works:
My process is to first add a space between all punctuation and then remove any unnecessary spaces. This allows me to tokenize both punctuation and words/numbers correctly.
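As a rough illustration of the pad-and-split step described above, here is a minimal sketch. The class name `PunctuationTokenizer`, the method `tokenize`, and the choice of the regex class `\p{Punct}` to define "punctuation" are assumptions made for illustration; the original program uses its own cached split (`so.split(RegexEnum.SPACE, line)`) and may define punctuation differently.

```java
import java.util.Arrays;
import java.util.List;

final class PunctuationTokenizer {
    /**
     * Surrounds every punctuation character with spaces, collapses runs of
     * whitespace, and splits on single spaces (hypothetical helper).
     */
    public static List<String> tokenize(String line) {
        // "\\p{Punct}" is an assumed definition of punctuation (ASCII only).
        String padded = line.replaceAll("(\\p{Punct})", " $1 ");
        String normalized = padded.trim().replaceAll("\\s+", " ");
        return Arrays.asList(normalized.split(" "));
    }

    public static void main(String[] args) {
        System.out.println(tokenize(" A .M ."));      // prints [A, ., M, .]
        System.out.println(tokenize("Hello ,World")); // prints [Hello, ,, World]
    }
}
```

After this step, every word, number, and punctuation mark is its own token, so the later "fix" passes only have to decide where spaces belong.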
For example, the acronym `String s = " A .M ."` is transformed into `"A . M ."`. When split by a space, it returns the list `["A", ".", "M", "."]`. I then attempt to "fix" the remaining spacing in the `String`, so the result becomes `"A.M."`.

Here's an example of what the code looks like for correcting acronyms:
In
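`FixAcronym.java`:

```java
/**
 * Fix acronyms.
 */
private static String fix(String line, SubtitleObject so) {
    StringBuilder builder = new StringBuilder();
    String[] split = so.split(RegexEnum.SPACE, line); // same as line.split(" ") but cached
    String prevPrevPrev = null, prevPrev = null, prev = null, current = null;
    boolean addSpace;
    // NOTE: the source is truncated between "i <" and "> 0" on the loop line.
    // The loop bound and the "i > 0" guard below are assumed reconstructions.
    for (int i = 0; i < split.length; i++) {
        // ... (lost in the source: the code that updates prev*/current and sets addSpace) ...
        if (i > 0 && addSpace) {
            builder.append(' ');
        }
        builder.append(current);
    }
    return builder.toString();
}
```

In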
`StringBuilderUtil.java`:

```java
/**
 * Delete space at index within StringBuilder.
 */
public static void deleteSpaceAt(StringBuilder builder, int index) {
    assert builder.charAt(index) == ' ';
    builder.deleteCharAt(index);
}

/**
 * Delete only if space at index within StringBuilder.
 */
public static void deleteOnlyIfSpaceAt(StringBuilder builder, int index) {
    if (builder.charAt(index) == ' ') {
        builder.deleteCharAt(index);
    }
}
```

Here's the problem: Because of all the different grammatical rules, there are a lot of different "fix" methods, and each subsequent "fix" uses the same technique.
```java
line = FixTime.fix(line, this);
line = FixContractions.fix(line, this);
line = FixAcronym.fix(line, this);
// etc...
```

Ea
Solution
I would split it into two concepts and three stages.
Conceptually, split the problem into:
- String manipulation (reading and writing)
- Logical problems (what exists and what needs doing)
The stages the process might be split into:
- The first stage involves reading the string, parsing it from beginning to end, recording the logical aspects of what exists.
- The second stage involves manipulating the logical record that represents the tokens, transforming it from a representation of what exists, to a representation of what string manipulations are required.
- The third stage is executing the string manipulation.
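The three stages above could be sketched roughly as follows. Everything here is illustrative rather than the answer's actual code: the names `SpacingPipeline`, `Token`, `plan`, `needsSpace`, and `render` are hypothetical, and the three spacing rules (contractions, times, acronyms) are simplified assumptions chosen only to cover the examples in the question. In particular, every apostrophe is treated as part of a contraction, and opening quotes or brackets are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class SpacingPipeline {
    enum Kind { WORD, PUNCT }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Stage 1: read the string once and record what exists (whitespace is dropped).
    static List<Token> tokenize(String line) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < line.length() && Character.isLetterOrDigit(line.charAt(i))) i++;
                tokens.add(new Token(Kind.WORD, line.substring(start, i)));
            } else {
                tokens.add(new Token(Kind.PUNCT, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    // Stage 2: turn "what exists" into "what to do".
    // spaceBefore[i] records whether token i should be preceded by a space.
    static boolean[] plan(List<Token> t) {
        boolean[] spaceBefore = new boolean[t.size()];
        for (int i = 1; i < t.size(); i++) spaceBefore[i] = needsSpace(t, i);
        return spaceBefore;
    }

    // Assumed, simplified rules; a real rule set would be larger.
    static boolean needsSpace(List<Token> t, int i) {
        Token prev = t.get(i - 1), cur = t.get(i);
        if (cur.kind == Kind.PUNCT) return false; // punctuation attaches to the token before it
        if (prev.kind == Kind.WORD) return true;  // two words are always separated
        Token before = i >= 2 ? t.get(i - 2) : null;
        Token next = i + 1 < t.size() ? t.get(i + 1) : null;
        switch (prev.text) {
            case "'": return false;                                   // contraction: can't
            case ":": return !(isNumber(before) && isNumber(cur));    // time: 9:00
            case ".": return !(isInitial(before) && isInitial(cur)    // acronym: A.M.
                    && next != null && next.text.equals("."));
            default:  return true;
        }
    }

    static boolean isNumber(Token t) {
        return t != null && t.kind == Kind.WORD && t.text.chars().allMatch(Character::isDigit);
    }

    static boolean isInitial(Token t) {
        return t != null && t.kind == Kind.WORD && t.text.length() == 1
                && Character.isUpperCase(t.text.charAt(0));
    }

    // Stage 3: execute the string manipulation in a single pass.
    static String render(List<Token> t, boolean[] spaceBefore) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < t.size(); i++) {
            if (spaceBefore[i]) out.append(' ');
            out.append(t.get(i).text);
        }
        return out.toString();
    }

    public static String fix(String line) {
        List<Token> tokens = tokenize(line);
        return render(tokens, plan(tokens));
    }

    public static void main(String[] args) {
        System.out.println(fix("Hello ,World"));  // prints Hello, World
        System.out.println(fix("9 :00 A .M ."));  // prints 9:00 A.M.
        System.out.println(fix("can 't even."));  // prints can't even.
    }
}
```

With this shape, each grammatical rule becomes one more case in the planning stage instead of another full pass that splits and rebuilds the string, and only `tokenize` and `render` touch characters directly.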
Context
StackExchange Code Review Q#110495, answer score: 5