HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaModerate

Arranging words in sentences alphabetically

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
arrangingwordsalphabeticallysentences

Problem

For an online contest whose problem is here, I wrote some code.


In this problem the input will consist of a number of lines of English text consisting of the letters of the English alphabet, the punctuation marks ' (apostrophe), . (full stop), , (comma), ; (semicolon), : (colon) and white space characters (blank, newline).

Your task is print the words in the text in lexicographic order (that is, dictionary order). Each word should appear exactly once in your list. You can ignore the case (for instance, "The" and "the" are to be treated as the same word.) There should be no uppercase letters in the output.


For example, consider the following candidate for the input text:

This is a sample piece of text to illustrate this
problem.



The corresponding output would read as:

a
illustrate
is
of
piece
problem
sample
text
this
to


Input format


The first line of input contains a single integer \$N\$, indicating the number of lines in the input. This is followed by \$N\$ lines of input text.

Output format


The first line of output contains a single integer \$M\$ indicating the number of distinct words in the given text. The next \$M\$ lines list out these words in lexicographic order.

Test data


You may assume that \$N ≤ 10000\$ and that there are at most 80 characters in each line. You may also assume that there are at the most 1000 distinct words in the given text.

Example


We now illustrate the input and output formats using the above example.

Sample input

2
This is a sample piece of text to illustrate this
problem.


Sample output

10
a
illustrate
is
of
piece
problem
sample
text
this
to


It is working perfectly as far as I know. When I run my code, it passes all tests.

`import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

/**
* Created by aditya on 14-10-2014.
*/
public class Main {

public static void main(String args[]) {

Scanner

Solution

Your code here is far more complicated than it needs to be. The trick you need to follow is four-fold:

  • Use your library functionality



  • Use Regular Expressions



  • Use buffered input/output for performance



  • bend the rules on the input side.



You go to great lengths to store the data in two arrays (with bad names words and words_2), and you use the one to check for words in the other. The right structure in Java is a Set, which automatically, and efficiently does the check for you. Further, there's another specialized set, called the TreeSet which stores the data in sorted order already.

Regular Expressions are a good way to identify patterns in the data. They take some getting used to, but, because of the way they can be pre-compiled, and reused, they are fast. Note that the Scanner class looks for regular expressions.

When using input, and output, you get better performance if you use fewer operations. You should wrap the System.in in a BufferedInputStream, and perform all the output to a StringBuilder, and then dump that to System.out in one operation.

Now, about bending the rules on the input side.....

Many programming challenges are set up to work in many languages, including things like C, C++, etc. Some of those languages need to have an idea of how much space to allocate before processing the data. That's why the description says it will contain the number of lines, as well as the maximum line length. If you assume that the input will be valid, then there is no reason to process the data line by line, you can do it word by word, and just forget about the line count in the first line....

How would I do it?

try (Scanner scan = new Scanner(new BufferedInputStream(System.in));) {
        // find word boundaries...
        scan.useDelimiter(Pattern.compile("[\\s;:,.'\n]+", Pattern.MULTILINE));
        scan.nextLine(); // ignore the count
        Set words = new TreeSet<>();
        while (scan.hasNext()) {
            String word = scan.next().toLowerCase();
            words.add(word);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(words.size()).append("\n");
        for (String word : words) {
            sb.append(word).append("\n");
        }
        System.out.print(sb.toString());
        System.out.flush();
    }


On further reflection, I would split the logic in to two parts, a word-finder, and a word printer:

Set words = parseWords(System.in);
System.out.println(formatWords(words));

Code Snippets

try (Scanner scan = new Scanner(new BufferedInputStream(System.in));) {
        // find word boundaries...
        scan.useDelimiter(Pattern.compile("[\\s;:,.'\n]+", Pattern.MULTILINE));
        scan.nextLine(); // ignore the count
        Set<String> words = new TreeSet<>();
        while (scan.hasNext()) {
            String word = scan.next().toLowerCase();
            words.add(word);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(words.size()).append("\n");
        for (String word : words) {
            sb.append(word).append("\n");
        }
        System.out.print(sb.toString());
        System.out.flush();
    }
Set<String> words = parseWords(System.in);
System.out.println(formatWords(words));

Context

StackExchange Code Review Q#66850, answer score: 12

Revisions (0)

No revisions yet.