Wordgenerator algorithm using Java collection
Problem
Problem Statement
Prompt the user for the order statistic n: 1, 2, 3, etc. Read a file of tokens, building a map (Map[List[String^n]] -> List[String*]) from a list of n words to a list of the words in the text following these words: e.g., if n were 2, the map would contain a key for every pair of words in the text, and a value that is a list of all the words following the key (no matter where the pair occurs, with NO DUPLICATES allowed). Print all the associations, one per line, in any order (the n words followed by the list of words that follow them in the text).
Prompt the user for the number of random words to generate, and then prompt for the n words to start with. Build a list (List[String*]) using the words to start with to generate a random next word, then use the previous n words (dropping the oldest word and adding the new word generated) to generate another random word; repeat. Note: you might have to stop prematurely if you generate the last n words in the text, if these words occur nowhere else. That is because in this case, there is no random word to generate following them! Print the list.
In simple words:
This is the given string:
a b c b a d c b a d c a a b a a d
We pre-populate a map of the form (x, y) -> (...), like this:
[a, d] -> [c]
[a, b] -> [c, a]
[a, a] -> [b, d]
[b, c] -> [b]
[b, a] -> [d, a]
[c, b] -> [a]
[c, a] -> [a]
[d, c] -> [b, a]
For example, to generate a sequence of 5 words, we take the first 2 words as input from the user and produce the other 3 using the map above.
Using the above example:
Enter # of words to generate: 10
Enter prefix word[0]: a
Enter prefix word[1]: d
Results = [a, d, c, a, a, b, a, d, c, a, a, d]
It gives the correct output, but my questions are:
- Can this be improved?
- Have I used collection classes efficiently?
- Is my algorithm efficient?
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
Solution
Let's go through a couple of things: the concept first, then the missing functionality (you do have some of it).
Concept
An n-gram is a sequence of n words that have been found in a span of text. You can have 1-grams, 2-grams, 3-grams, ... n-grams. You identify these n-grams by finding all possible n-wide spans of text, and storing them. In the sentence 'the quick brown fox', there are three 2-grams: 'the quick', 'quick brown' and 'brown fox'. There are two 3-grams: 'the quick brown' and 'quick brown fox'.
When processing natural languages, it is often statistically convenient to weigh the likelihood of a particular word happening 'next' after an existing sequence of words.
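The n-gram extraction described above can be sketched in a few lines of Java. This is only an illustration: the class name `NGramDemo` and the method `nGrams` are hypothetical names chosen for this example, not part of the code under review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, named for this illustration only.
class NGramDemo {

    // Returns every contiguous run of n words, in order of appearance.
    static List<List<String>> nGrams(List<String> words, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        // The last valid start index is words.size() - n.
        for (int i = 0; i + n <= words.size(); i++) {
            // Copy the view so each n-gram is independent of the input list.
            result.add(new ArrayList<>(words.subList(i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the quick brown fox".split("\\s+"));
        System.out.println(nGrams(words, 2)); // [[the, quick], [quick, brown], [brown, fox]]
        System.out.println(nGrams(words, 3)); // [[the, quick, brown], [quick, brown, fox]]
    }
}
```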
That is what this problem is about. Given a span of 'n', find all the (n+1)-grams. Then, taking any sequence of n words, look for all the (n+1)-grams that start with those words. Randomly choose one to select the 'next' word. Then repeat the process until you run out of (n+1)-grams that match, or you hit the 'sentence' limit. You have just built a sentence that is statistically 'likely'. A smarter system will 'weight' the next word based on the frequency of the (n+1)-grams that were found in the text, i.e. if the original text has 'the white house' 10 times, and 'the white swan' just once, then it will 'randomly' choose 'house' 10 times as often as 'swan'.
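One possible shape for that process, for an arbitrary n, is sketched below. It is a minimal sketch under assumptions, not the reviewed code: the names `WordGenerator`, `buildFollowers` and `generate` are invented for this example, and the map stores each n-word prefix with its distinct followers, as the assignment asks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical class and method names, used for illustration only.
class WordGenerator {

    // Maps each n-word prefix to the distinct words that follow it in the text.
    static Map<List<String>, List<String>> buildFollowers(List<String> words, int n) {
        Map<List<String>, List<String>> followers = new HashMap<>();
        // Stop while there is still a word at index i + n to act as the follower.
        for (int i = 0; i + n < words.size(); i++) {
            List<String> key = new ArrayList<>(words.subList(i, i + n));
            List<String> next = followers.computeIfAbsent(key, k -> new ArrayList<>());
            String follower = words.get(i + n);
            if (!next.contains(follower)) { // the assignment forbids duplicates
                next.add(follower);
            }
        }
        return followers;
    }

    // Extends the prefix by up to 'count' random words, stopping early on a dead end.
    static List<String> generate(Map<List<String>, List<String>> followers,
                                 List<String> prefix, int count, Random random) {
        int n = prefix.size();
        List<String> result = new ArrayList<>(prefix);
        for (int i = 0; i < count; i++) {
            // The key is always the last n words generated so far.
            List<String> key = new ArrayList<>(result.subList(result.size() - n, result.size()));
            List<String> candidates = followers.get(key);
            if (candidates == null) {
                break; // no (n+1)-gram starts with these words: stop short
            }
            result.add(candidates.get(random.nextInt(candidates.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a b c b a d c b a d c a a b a a d".split(" "));
        Map<List<String>, List<String>> followers = buildFollowers(words, 2);
        System.out.println(followers.get(Arrays.asList("a", "b"))); // [c, a]
        System.out.println(generate(followers, Arrays.asList("a", "d"), 10, new Random()));
    }
}
```

Dropping the `contains` check would keep repeated followers in each list, so a uniform random pick would be weighted by frequency. That is the 'smarter system' idea, but it conflicts with the no-duplicates requirement of the assignment.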
OK, that gives you some context for the problem.
Functionality
The challenge/requirement was to take the n value as an input. You have hard-coded it as 2. In other words, you have 2-grams and 3-grams, when you are supposed to have n-grams and (n+1)-grams.
for(int listIndex = 0 ;listIndex < givenList.size() - 2; listIndex++)
That 2 should be an n, and all the logic needs to change to match.
Similarly, the x1 and x2 variables should be a list, or an array, because there could be more than 2.
You already have commented that the length of the output sentence is supposed to be user-input. You should make it user-input, as well as the first n words.
You have missed the point on generating the sentence as well. You assume that there will always be a valid/matching random word to add to the sentence until you run out of words. This is not true. You may be part way through a sentence when you discover that the last n words in your sentence do not match any available (n+1)-grams, and there is thus nothing you can add to the sentence, and you have to stop short.
Code Style
Your code should be broken out into more functions. You currently have just one, which is used to get the next random word. You should have others, for example one to read the input file and probably another that populates the map.
Your main method is very heavy-weight, and should have function-extraction applied.
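As one example of such an extraction, reading the input could live in its own small method. The sketch below assumes the file holds whitespace-separated tokens; `TokenReader`, `tokenize` and `readTokens` are hypothetical names, not taken from the code under review.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this illustration only.
class TokenReader {

    // Splits one chunk of text on whitespace, skipping empty pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Reads every line of the file and returns all tokens in reading order.
    static List<String> readTokens(Path file) throws IOException {
        List<String> tokens = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            tokens.addAll(tokenize(line));
        }
        return tokens;
    }
}
```

With the reading step isolated like this, the main method only has to prompt for input, call the helpers in order, and print the results.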
Your indentation is also all over the place, which makes things hard to spot. It took me a while just to see that randomWord was a function.
Conclusion
Your code is only partially working, and some core functionality is missing. You are a good way along to getting a working solution. Hopefully the background will help you to understand what the problem is trying to solve. Basically: based on statistics from existing texts, randomly generate a new sentence that uses those statistics to predict what the next words in the sentence will be.
Code Snippets
for(int listIndex = 0 ;listIndex < givenList.size() - 2; listIndex++)
Context
StackExchange Code Review Q#52417, answer score: 3