patternMinor
Segmenting an English string with no spaces using dynamic programming
Viewed 0 times
englishwithprogrammingsegmentingspacesusingdynamicstring
Problem
Suppose you have a function
I was thinking that we could have something like the following:
Now to determine the segmentation, we find max{Q[0], .., Q[n]} which will return some Q[i] (the first space is after this). Then, we find max{Q[i+1], .. Q[n]} which returns another Q[i] (second space is after this), etc. until max returns Q[n].
I have some questions though: is this method even correct, and does it use dynamic programming? It seems to me that it does, since we build the initial Q with subproblems to the original problem. Also, is this an optimal solution? To my understanding, the worst case would be O(n^2), which would be when max returns Q[0], then Q[1], then Q[2], etc.
quality(x) that returns the quality of a sequence of letters x. Given a string such as "howareyoutoday," what is the most efficient way to determine that the segmentation is "how are you today" (i.e. quality(how)+quality(are)+quality(you)+quality(today) is the maximum quality possible)?I was thinking that we could have something like the following:
A[0] = h, A[1] = o, ..., A[n] = y
Q[0] = quality(A[0]), Q[1] = quality(A[0]A[1]), ..., Q[n] = quality(A[0]...A[n])Now to determine the segmentation, we find max{Q[0], .., Q[n]} which will return some Q[i] (the first space is after this). Then, we find max{Q[i+1], .. Q[n]} which returns another Q[i] (second space is after this), etc. until max returns Q[n].
I have some questions though: is this method even correct, and does it use dynamic programming? It seems to me that it does, since we build the initial Q with subproblems to the original problem. Also, is this an optimal solution? To my understanding, the worst case would be O(n^2), which would be when max returns Q[0], then Q[1], then Q[2], etc.
Solution
I'm assuming your quality function measures something similar to the likelihood of the string being a word. The problem with your algorithm is that it makes a hard decision about where the first space should go before considering where the other spaces go. If you make a suboptimal decision for where the first space goes it will mess you up later on.
The dynamic programming algorithm for this problem is called the Viterbi algorithm. https://en.wikipedia.org/wiki/Viterbi_algorithm You have to keep a table of the highest quality segmentations seen so far as you process the string from left to right. When the algorithm finishes you will be left with a ranked list of the highest quality segmentations.
The dynamic programming algorithm for this problem is called the Viterbi algorithm. https://en.wikipedia.org/wiki/Viterbi_algorithm You have to keep a table of the highest quality segmentations seen so far as you process the string from left to right. When the algorithm finishes you will be left with a ranked list of the highest quality segmentations.
Context
StackExchange Computer Science Q#30572, answer score: 2
Revisions (0)
No revisions yet.