Correcting punctuation spacing in a string

Submitted by: @import:stackexchange-codereview

Problem

I'm working on a program that corrects all punctuation spacing in a String.

Examples:

// "Hello ,World"  ->  "Hello, World"
// "9 :00 A .M ."  ->  "9:00 A.M."
// "can 't even."  ->  "can't even."


Here's how it works:

My process is to first add a space around every punctuation mark and then remove any unnecessary spaces. This lets me tokenize punctuation and words/numbers correctly.

For example, the acronym String s = " A .M ." is transformed into "A . M .".

When split by a space, it returns the list ["A", ".", "M", "."].

I then attempt to "fix" the remaining spacing in the String, so that the result becomes "A.M.".
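The padding-and-splitting step described above could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names are hypothetical, and it treats every character matched by Java's `\p{Punct}` class as punctuation, which may be broader than what the real program uses.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the original program.
class PunctuationTokenizer {

    /** Pads each punctuation character with spaces, collapses whitespace, then splits on spaces. */
    static List<String> tokenize(String line) {
        // Surround every punctuation character with spaces: " A .M ." -> " A  . M  . "
        String padded = line.replaceAll("(\\p{Punct})", " $1 ");
        // Remove the unnecessary spaces again: " A  . M  . " -> "A . M ."
        String normalized = padded.trim().replaceAll("\\s+", " ");
        if (normalized.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(normalized.split(" "));
    }

    public static void main(String[] args) {
        System.out.println(tokenize(" A .M .")); // prints [A, ., M, .]
    }
}
```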

Here's an example of what the code looks like for correcting acronyms:

In FixAcronym.java:

/**
 * Fix acronyms.
 */
private static String fix(String line, SubtitleObject so) {
    StringBuilder builder = new StringBuilder();
    String[] split = so.split(RegexEnum.SPACE, line); // same as line.split(" ") but cached
    String prevPrevPrev = null, prevPrev = null, prev = null, current = null;
    boolean addSpace;
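    for (int i = 0; i < split.length; i++) {
        // ... (the part of the loop body that assigns current, inspects the
        // previous tokens and sets addSpace is missing from this excerpt)
        if (i > 0 && addSpace) {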
            builder.append(' ');
        }
        builder.append(current);
    }

    return builder.toString();
}


In StringBuilderUtil.java:

/**
 * Delete space at index within StringBuilder.
 */
public static void deleteSpaceAt(StringBuilder builder, int index) {
    assert builder.charAt(index) == ' ';
    builder.deleteCharAt(index);
}

/**
 * Delete only if space at index within StringBuilder.
 */
public static void deleteOnlyIfSpaceAt(StringBuilder builder, int index) {
    if (builder.charAt(index) == ' ') {
        builder.deleteCharAt(index);
    }
}


Here's the problem: Because of all the different grammatical rules, there are a lot of different "fix" methods, and each subsequent "fix" uses the same technique.

line = FixTime.fix(line, this);
line = FixContractions.fix(line, this);
line = FixAcronym.fix(line, this);
// etc...


Ea

Solution

I would split it into two concepts and three stages.

Conceptually, split the problem into:

  • String manipulation (reading and writing)
  • Logical problems (what exists and what needs doing)

The stages the process might be split into:

  • The first stage involves reading the string, parsing it from beginning to end, recording the logical aspects of what exists.
  • The second stage involves manipulating the logical record that represents the tokens, transforming it from a representation of what exists, to a representation of what string manipulations are required.
  • The third stage is executing the string manipulation.
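
As a rough illustration of the three stages, here is a minimal sketch. Everything in it is an assumption made for the example: the `Token` type, the method names, and above all the spacing rules in `planSpaces`, which cover only the three sample inputs from the question (for instance, never putting a space after a colon is far too naive for real text). The point is the separation: only `parse` and `render` touch strings, while `planSpaces` works purely on the logical record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three-stage split; not the original program.
class SpacingPipeline {

    /** Logical record of one piece of input: its text and whether it is punctuation. */
    static final class Token {
        final String text;
        final boolean punctuation;
        Token(String text, boolean punctuation) {
            this.text = text;
            this.punctuation = punctuation;
        }
    }

    // Stage 1: read the string once, from beginning to end, recording what exists.
    // Assumption: anything that is not a letter, digit or whitespace is punctuation.
    static List<Token> parse(String line) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            if (word.length() > 0) {
                tokens.add(new Token(word.toString(), false));
                word.setLength(0);
            }
            if (!Character.isWhitespace(c)) {
                tokens.add(new Token(String.valueOf(c), true));
            }
        }
        if (word.length() > 0) {
            tokens.add(new Token(word.toString(), false));
        }
        return tokens;
    }

    // Stage 2: turn "what exists" into "what is required": for each token,
    // whether a space must precede it. No string manipulation happens here.
    static boolean[] planSpaces(List<Token> tokens) {
        boolean[] spaceBefore = new boolean[tokens.size()];
        for (int i = 1; i < tokens.size(); i++) {
            Token prev = tokens.get(i - 1);
            Token cur = tokens.get(i);
            if (cur.punctuation) {
                spaceBefore[i] = false; // never a space before punctuation
            } else if (prev.text.equals("'") || prev.text.equals(":")) {
                spaceBefore[i] = false; // contractions and times (illustrative rule)
            } else if (prev.text.equals(".") && i >= 2
                    && isInitial(tokens.get(i - 2)) && isInitial(cur)) {
                spaceBefore[i] = false; // inside an acronym such as A.M.
            } else {
                spaceBefore[i] = true;
            }
        }
        return spaceBefore;
    }

    // A single upper-case letter, as found in acronyms.
    static boolean isInitial(Token t) {
        return !t.punctuation && t.text.length() == 1
                && Character.isUpperCase(t.text.charAt(0));
    }

    // Stage 3: execute the planned string manipulation.
    static String render(List<Token> tokens, boolean[] spaceBefore) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (spaceBefore[i]) {
                builder.append(' ');
            }
            builder.append(tokens.get(i).text);
        }
        return builder.toString();
    }

    static String fix(String line) {
        List<Token> tokens = parse(line);
        return render(tokens, planSpaces(tokens));
    }
}
```

With these illustrative rules, `SpacingPipeline.fix("9 :00 A .M .")` yields `"9:00 A.M."`. Adding a new grammatical rule then means adding a branch to the second stage, rather than writing another pass that splits and rebuilds the string.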

Context

StackExchange Code Review Q#110495, answer score: 5
