
Word generator algorithm using Java collections

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·


Problem

Problem Statement

Prompt the user for the order statistic n: 1, 2, 3, etc.

Read a file of tokens, building a map

(Map[List[String^n]] -> List[String*])

from a list of n words to a list of the words in the text following
these words: e.g., if n were 2, the map would contain a key for
every pair of words in the text, and a value that is a list of all the
words following the key (no matter where the pair occurs, with NO
DUPLICATES allowed). Print all the associations, one per line, in any
order (the n words followed by the list of words that follow them in
the text).

Prompt the user for the number of random words to generate, and then
prompt for the n words to start with. Build a

list (List[String*])

using the words to start with to generate a random next word, then use
the previous n words (dropping the oldest word and adding the new word
generated) to generate another random word; repeat. Note: you might
have to stop prematurely if you generate the last n words in the text,
if these words occur nowhere else. That is because in this case, there
is no random word to generate following them! Print the list.

In simple words

Given the string

a b c b a d c b a d c a a b a a d

we prepopulate a map from each pair (x, y) to the words that follow it:

[a, d] -> [c]
[a, b] -> [c, a]
[a, a] -> [b, d]
[b, c] -> [b]
[b, a] -> [d, a]
[c, b] -> [a]
[c, a] -> [a]
[d, c] -> [b, a]

We have to generate a sequence of 5 words, taking the first 2 as input from the user and producing the other 3 from the map above.

Using the example above:

Enter # of words to generate: 10
Enter prefix word[0]: a
Enter prefix word[1]: d
Results = [a, d, c, a, a, b, a, d, c, a, a, d]

It gives the correct output, but my questions are:

Can this be improved?

Have I used collection classes efficiently?

Is my algorithm efficient?

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

Solution

Let's go through a couple of things: first the concept, then the functionality that is missing (there is some).

Concept

An n-gram is a sequence of n consecutive words found in a span of text. You can have 1-grams, 2-grams, 3-grams, and so on. You identify the n-grams by finding every possible n-wide span of the text and storing it. In the sentence 'the quick brown fox' there are three 2-grams: 'the quick', 'quick brown' and 'brown fox'. There are two 3-grams: 'the quick brown' and 'quick brown fox'.
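To make the definition concrete, here is a minimal sketch of n-gram extraction over a token list. The class and method names (NGramSketch, ngrams) are made up for illustration and are not taken from your code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NGramSketch {
    /** Returns every contiguous run of n tokens, in order of appearance. */
    static List<List<String>> ngrams(List<String> tokens, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        // The last valid window starts at index tokens.size() - n.
        for (int start = 0; start + n <= tokens.size(); start++) {
            // Copy the window so the result does not depend on the subList view.
            result.add(new ArrayList<>(tokens.subList(start, start + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the quick brown fox".split(" "));
        // prints [[the, quick], [quick, brown], [brown, fox]]
        System.out.println(ngrams(tokens, 2));
        // prints [[the, quick, brown], [quick, brown, fox]]
        System.out.println(ngrams(tokens, 3));
    }
}
```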

When processing natural languages, it is often statistically convenient to weigh the likelihood of a particular word happening 'next' after an existing sequence of words.

That is what this problem is about. Given a span width n, find all the (n+1)-grams. Then, taking any n consecutive words, look for all the (n+1)-grams that start with those words, and randomly choose one of them to select the 'next' word. Repeat the process until you run out of matching (n+1)-grams or you hit the 'sentence' limit. You have just built a sentence that is statistically 'likely'. A smarter system will weight the next word by the frequency of the (n+1)-grams found in the text. For example, if the original text has 'the white house' 10 times and 'the white swan' just once, then it will choose 'house' about 10 times as often as 'swan'.
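One simple way to get that weighting, sketched here under the assumption that you keep duplicates in the follower list (which your assignment explicitly forbids, so treat this as the 'smarter system' variant only): a uniform pick from a list containing 'house' ten times and 'swan' once is automatically frequency-weighted. The names WeightedFollowers, build and pick are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class WeightedFollowers {
    /** Maps each n-word key to every word seen after it, duplicates kept. */
    static Map<List<String>, List<String>> build(List<String> tokens, int n) {
        Map<List<String>, List<String>> followers = new HashMap<>();
        // Stop while there is still a word at index i + n to record.
        for (int i = 0; i + n < tokens.size(); i++) {
            List<String> key = new ArrayList<>(tokens.subList(i, i + n));
            followers.computeIfAbsent(key, k -> new ArrayList<>())
                     .add(tokens.get(i + n));
        }
        return followers;
    }

    /**
     * Picks uniformly from the follower list; repeated entries make the
     * choice proportional to frequency. Returns null for an unknown key.
     */
    static String pick(Map<List<String>, List<String>> followers,
                       List<String> key, Random rng) {
        List<String> options = followers.get(key);
        if (options == null || options.isEmpty()) {
            return null;
        }
        return options.get(rng.nextInt(options.size()));
    }
}
```

Storing counts in a Map<String, Integer> per key would use less memory on large texts; the duplicate list is just the shortest thing that shows the idea.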

OK, that gives you some context for the problem.

Functionality

The challenge/requirement was to take the n value as an input. You have hard-coded it as 2. In other words, you have 2-grams and 3-grams, when you are supposed to have n-grams and (n+1)-grams.

for(int listIndex = 0 ;listIndex < givenList.size() - 2; listIndex++)

That 2 should be n, and the rest of the logic needs to change to match.

Similarly, the x1 and x2 variables should be a list or an array, because there could be more than two of them.
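Putting those two points together, a sketch of the map-building step with n as a parameter and a List<String> as the key might look like this. FollowerMap and build are illustrative names, and I am assuming a LinkedHashSet for the values to satisfy the 'no duplicates' requirement while keeping first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FollowerMap {
    /** Maps every run of n words to the distinct words that follow it. */
    static Map<List<String>, Set<String>> build(List<String> tokens, int n) {
        Map<List<String>, Set<String>> map = new LinkedHashMap<>();
        // Same loop as before, with the hard-coded 2 replaced by n.
        for (int i = 0; i < tokens.size() - n; i++) {
            // The key is a copy of the n-word window; lists compare by content.
            List<String> key = new ArrayList<>(tokens.subList(i, i + n));
            // A set silently ignores followers that were already recorded.
            map.computeIfAbsent(key, k -> new LinkedHashSet<>())
               .add(tokens.get(i + n));
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("a b c b a d c b a d c a a b a a d".split(" "));
        // Prints one association per line, e.g. "[a, b] -> [c, a]".
        build(tokens, 2).forEach(
            (key, next) -> System.out.println(key + " -> " + next));
    }
}
```

Because the generator needs to pick a follower by index, you would either copy each set into a list once the map is built or keep a list and check contains() before adding.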

You have already noted in a comment that the length of the output sentence is supposed to be user input. You should make it user input, along with the first n words.
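A rough sketch of the input side, assuming a java.util.Scanner over standard input and prompts modelled on the ones in your example run. InputSketch, readInt and readPrefix are hypothetical helper names, and there is no validation of bad input here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class InputSketch {
    /** Prints the prompt and reads one integer (no validation in this sketch). */
    static int readInt(Scanner in, String prompt) {
        System.out.print(prompt);
        return in.nextInt();
    }

    /** Prompts for n prefix words, one at a time. */
    static List<String> readPrefix(Scanner in, int n) {
        List<String> prefix = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            System.out.print("Enter prefix word[" + i + "]: ");
            prefix.add(in.next());
        }
        return prefix;
    }
}
```

In main you would create one Scanner on System.in, call readInt for n and for the number of words to generate, and then call readPrefix with that n.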

You have also missed a point about generating the sentence. You assume that there will always be a matching random word to add until you reach the requested length. This is not true. You may be partway through a sentence when you discover that its last n words do not match the start of any available (n+1)-gram. At that point there is nothing you can add to the sentence, and you have to stop short.
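The early stop can be sketched as a loop that looks up the last n words and breaks when the lookup finds nothing. This assumes the map values are lists so a follower can be picked by index; SentenceGenerator and generate are names I made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

class SentenceGenerator {
    /**
     * Starts from the prefix and appends up to 'count' random followers.
     * Returns early if the last n words have no recorded follower.
     */
    static List<String> generate(Map<List<String>, List<String>> followers,
                                 List<String> prefix, int count, Random rng) {
        int n = prefix.size();
        List<String> result = new ArrayList<>(prefix);
        for (int i = 0; i < count; i++) {
            // View of the last n words; used only for this lookup.
            List<String> key = result.subList(result.size() - n, result.size());
            List<String> options = followers.get(key);
            if (options == null || options.isEmpty()) {
                break; // dead end: these words are never followed by anything
            }
            result.add(options.get(rng.nextInt(options.size())));
        }
        return result;
    }
}
```

With a map that only contains [a, b] -> [c], asking for 5 words after the prefix a, b yields just [a, b, c], because [b, c] has no entry.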

Code Style

Your code should be broken out into more functions. You currently have just one, which is used to get the next random word. You should have another to read the input file, and probably another that populates the map, and so on.

Your main method is very heavy-weight, and should have function-extraction applied.

You also have indentation that is all over the place, which makes things hard to spot. It took me a while just to see that randomWord was a function.

Conclusion

Your code is only partially working, and some core functionality is missing. You are well on the way to a working solution. Hopefully the background will help you understand what the problem is asking for. In short: based on statistics from existing texts, randomly generate a new sentence that uses those statistics to predict what the next words in the sentence will be.

Context

StackExchange Code Review Q#52417, answer score: 3
