snippetjavaMinor
Undo format when format disappears
Viewed 0 times
disappearsformatundowhen
Problem
I was posed a question as follows:
Given a sentence like "John is a bad man", lets say all the formatting disappears and you are left with one string "johnisabadman". Given a dictionary of words, design an algorithm to find the optimal way of "unconcatenating" the sequence of words. The optimality stems from the parsing which minimizes the number of unrecognized characters. In this particular example "johnisabadman" gets unconcatenated to "JOHN is a bad man".
I worked out a solution and it will be great if somebody could review it.
The algorithm works by checking if the original string is in the dictionary. If the dictionary contains the string, I return the string as it is, other wise I look at the substrings and recursively keep track of the cost of the substring in the inner class
```
import java.util.List;
import java.util.ArrayList;
import java.util.Set;
import java.util.HashSet;
import java.util.Map;
import java.util.HashMap;
import java.util.Arrays;
public class UndoFormat {
// Container class to hold the total number of
// character misses for the list of string.
class Pair {
private int miss;
private List strings;
Pair(int miss, List strings) {
this.miss = miss;
this.strings = strings;
}
int getMissedChars() {
return this.miss;
}
List getStrings() {
return this.strings;
}
}
private Set dict = new HashSet();
// Cache to store the intermediate computations
private Map cache = new HashMap();
public UndoFormat() {
// Seeding the dictionary for testing purposes
dic
Given a sentence like "John is a bad man", lets say all the formatting disappears and you are left with one string "johnisabadman". Given a dictionary of words, design an algorithm to find the optimal way of "unconcatenating" the sequence of words. The optimality stems from the parsing which minimizes the number of unrecognized characters. In this particular example "johnisabadman" gets unconcatenated to "JOHN is a bad man".
I worked out a solution and it will be great if somebody could review it.
The algorithm works by checking if the original string is in the dictionary. If the dictionary contains the string, I return the string as it is, other wise I look at the substrings and recursively keep track of the cost of the substring in the inner class
Pair. The cost of the substring is zero if it is within the dictionary or else it is the length of the substring. While bubbling up from the recursion, I compare the sum of the cost of the substring with the original string and based on the comparison I decide whether to keep the concatenation of the substrings or the original string.```
import java.util.List;
import java.util.ArrayList;
import java.util.Set;
import java.util.HashSet;
import java.util.Map;
import java.util.HashMap;
import java.util.Arrays;
public class UndoFormat {
// Container class to hold the total number of
// character misses for the list of string.
class Pair {
private int miss;
private List strings;
Pair(int miss, List strings) {
this.miss = miss;
this.strings = strings;
}
int getMissedChars() {
return this.miss;
}
List getStrings() {
return this.strings;
}
}
private Set dict = new HashSet();
// Cache to store the intermediate computations
private Map cache = new HashMap();
public UndoFormat() {
// Seeding the dictionary for testing purposes
dic
Solution
The output is not as expected:
While it should be:
I'd rename the class to
The formatted string is:S
The formatted string is:a
The formatted string is:M
The formatted string is:is
The formatted string is:a
The formatted string is:bad
The formatted string is:manWhile it should be:
The formatted string is:SAM
The formatted string is:is
The formatted string is:a
The formatted string is:bad
The formatted string is:manI'd rename the class to
DictionarySplitter and undoFormat() to split().Code Snippets
The formatted string is:S
The formatted string is:a
The formatted string is:M
The formatted string is:is
The formatted string is:a
The formatted string is:bad
The formatted string is:manThe formatted string is:SAM
The formatted string is:is
The formatted string is:a
The formatted string is:bad
The formatted string is:manContext
StackExchange Code Review Q#96975, answer score: 3
Revisions (0)
No revisions yet.