HiveBrain v1.2.0
Get Started
← Back to all entries
snippetjavaMinor

Undo format when format disappears

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
disappearsformatundowhen

Problem

I was posed a question as follows:


Given a sentence like "John is a bad man", lets say all the formatting disappears and you are left with one string "johnisabadman". Given a dictionary of words, design an algorithm to find the optimal way of "unconcatenating" the sequence of words. The optimality stems from the parsing which minimizes the number of unrecognized characters. In this particular example "johnisabadman" gets unconcatenated to "JOHN is a bad man".

I worked out a solution and it will be great if somebody could review it.

The algorithm works by checking if the original string is in the dictionary. If the dictionary contains the string, I return the string as it is, other wise I look at the substrings and recursively keep track of the cost of the substring in the inner class Pair. The cost of the substring is zero if it is within the dictionary or else it is the length of the substring. While bubbling up from the recursion, I compare the sum of the cost of the substring with the original string and based on the comparison I decide whether to keep the concatenation of the substrings or the original string.

```
import java.util.List;
import java.util.ArrayList;
import java.util.Set;
import java.util.HashSet;
import java.util.Map;
import java.util.HashMap;
import java.util.Arrays;

public class UndoFormat {
// Container class to hold the total number of
// character misses for the list of string.
class Pair {
private int miss;
private List strings;
Pair(int miss, List strings) {
this.miss = miss;
this.strings = strings;
}
int getMissedChars() {
return this.miss;
}
List getStrings() {
return this.strings;
}
}
private Set dict = new HashSet();
// Cache to store the intermediate computations
private Map cache = new HashMap();
public UndoFormat() {
// Seeding the dictionary for testing purposes
dic

Solution

The output is not as expected:

The formatted string is:S
The formatted string is:a
The formatted string is:M
The formatted string is:is
The formatted string is:a
The formatted string is:bad
The formatted string is:man


While it should be:

The formatted string is:SAM
The formatted string is:is
The formatted string is:a
The formatted string is:bad
The formatted string is:man


I'd rename the class to DictionarySplitter and undoFormat() to split().

Code Snippets

The formatted string is:S
The formatted string is:a
The formatted string is:M
The formatted string is:is
The formatted string is:a
The formatted string is:bad
The formatted string is:man
The formatted string is:SAM
The formatted string is:is
The formatted string is:a
The formatted string is:bad
The formatted string is:man

Context

StackExchange Code Review Q#96975, answer score: 3

Revisions (0)

No revisions yet.