
Website Spell Checker in Java

Submitted by: @import:stackexchange-codereview

Problem

I've implemented a program that spell checks a website.
Here is the idea that I have in mind:

  • Scan all of the words in a web page into a string (using jsoup)
  • Filter out all of the HTML markup and code
  • Use a spell checking algorithm that reads from a dictionary.txt file and uses probability theory to offer suggestions

I would like to have my code reviewed and would greatly appreciate any input on how to make it more efficient or clean.

There is probably some bad practice followed as I'm new to programming so I apologize in advance if I'm doing something that is obviously wrong.

Some problems I've noticed about my code:

  • It only accepts English words
  • It prints out each suggestion on a new line, so large websites produce messy output.

Here is the code:

Class 1 (used to call on the methods, basically a neat class to look good)

import java.io.*;

public class BulkSpellChecker extends ParseCleanCheck {

    public static void main(String[] args) throws IOException {
        System.out.println("Let's get started!");

        PageScanner(); // Scan the page and clean it first
        SpellChecker(); // Spell check the cleaned page

        System.out.println("Thanks for using the spell checker!");
    }
}


Class 2

```
import java.io.*;
import java.util.*;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist;

public class ParseCleanCheck {

static Hashtable dictionary;// To store all the words of the
// dictionary
static boolean suggestWord;// To indicate whether the word is spelled
// correctly or not.

static Scanner urlInput = new Scanner(System.in);
public static String cleanString;
public static String url = "";
public static boolean correct = true;

/**
* PARSER METHOD
*/
public static void PageScanner() throws IOException {
System.out.println("Pick an english website to scan.");

// This do-while loop allo
```

Solution

Style convention

As you already do in some places, the Java convention is camelCase for method names, so PageScanner() and SpellChecker() are mildly jarring to look at.

You also use a mixture of PascalCase, snake_case and camelCase for variable names. The convention for every variable that is not a static final constant is camelCase as well (constants use UPPER_SNAKE_CASE). Standardizing on this is highly recommended.
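As a small illustration of those conventions, here is a sketch in which every name is hypothetical and not taken from the original program:

```java
import java.util.ArrayList;
import java.util.List;

// Class names: PascalCase.
class NamingExample {

    // static final constants: UPPER_SNAKE_CASE.
    static final int MAX_SUGGESTIONS = 3;

    // Method names: camelCase, ideally starting with a verb.
    static List<String> limitSuggestions(List<String> allSuggestions) {
        // Parameters and local variables: camelCase.
        List<String> limitedSuggestions = new ArrayList<>();
        for (String suggestion : allSuggestions) {
            if (limitedSuggestions.size() == MAX_SUGGESTIONS) {
                break;
            }
            limitedSuggestions.add(suggestion);
        }
        return limitedSuggestions;
    }
}
```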

Inheritance

BulkSpellChecker extends ParseCleanCheck


This looks slightly odd, especially when BulkSpellChecker is just, in your words, 'a neat class to look good'. If all it does is implement public static void main(String[] args), you can put that method in the underlying class too. Extending a class only to host a static main method that calls the parent's static methods is a poor demonstration of inheritance.
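One possible shape, as a minimal sketch: the entry point lives in the same class as the static helpers, so no subclass is needed. SpellCheckTool and cleanPage are hypothetical names, and the whitespace cleanup is only a placeholder for the real parsing step.

```java
// Hypothetical stand-in for ParseCleanCheck that also hosts main().
class SpellCheckTool {

    // Placeholder for the real cleaning step: trims and collapses whitespace.
    static String cleanPage(String rawText) {
        return rawText.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println("Let's get started!");
        System.out.println(cleanPage("  some   page text "));
        System.out.println("Thanks for using the spell checker!");
    }
}
```

If a separate launcher class is still preferred, it can call SpellCheckTool.cleanPage(...) through the class name without extending anything.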

Implementation vs interface

Almost all your Collection variables are declared by their implementations (ArrayList) instead of their interfaces (List). It is usually recommended to declare them by interface so that users of those variables only need to know they are dealing with a List. This also allows for substitution, e.g. with test doubles during testing or with thread-safe implementations if required.

In addition, since Java 7, you can rely on generic type inference (the diamond operator) to shorten the declaration:

// ArrayList<String> result = new ArrayList<String>();
List<String> result = new ArrayList<>();
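
To make the substitution point concrete, here is a minimal sketch. InterfaceExample and longWords are hypothetical names; the method only knows it receives and returns a List.

```java
import java.util.ArrayList;
import java.util.List;

class InterfaceExample {

    // Accepts and returns the List interface, so callers may pass any implementation.
    static List<String> longWords(List<String> words, int minLength) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (word.length() >= minLength) {
                result.add(word);
            }
        }
        return result;
    }
}
```

A caller can hand in an ArrayList, a LinkedList or the fixed-size list from Arrays.asList(...) without longWords changing at all.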


Hashtable

In 2017, Hashtable is pretty much a relic class and you are highly encouraged to switch over to HashMap or ConcurrentHashMap, as you already did elsewhere.
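
A rough sketch of the swap follows. The key and value types of the original dictionary are not visible in the question, so Map<String, String> is an assumption, as are the names DictionaryExample, buildDictionary and isKnown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DictionaryExample {

    // Declared as Map and backed by HashMap instead of Hashtable.
    // Keys are lowercased so lookups are case-insensitive (an assumption).
    static Map<String, String> buildDictionary(List<String> words) {
        Map<String, String> dictionary = new HashMap<>();
        for (String word : words) {
            dictionary.put(word.toLowerCase(), word);
        }
        return dictionary;
    }

    static boolean isKnown(Map<String, String> dictionary, String word) {
        return dictionary.containsKey(word.toLowerCase());
    }
}
```

If the map ever has to be shared between threads, new ConcurrentHashMap<>() can replace new HashMap<>() without touching the callers, because they only see Map.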

System.exit

A hard System.exit(int) is usually not recommended, especially when it does not sit inside the main() method (it's at least easier to follow there). If you really do encounter a serious error, propagate the exception to the callers until you can handle it safely, e.g. by prompting the user to re-enter.
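
As an illustration of propagating instead of exiting, here is a sketch under assumptions: DictionaryLoader, loadWords and tryLoad are hypothetical names, and a plain one-word-per-line file is assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class DictionaryLoader {

    // Lets the IOException propagate instead of calling System.exit here.
    static List<String> loadWords(Path dictionaryFile) throws IOException {
        return Files.readAllLines(dictionaryFile);
    }

    // The caller decides how to recover; here it reports failure via its return value.
    static boolean tryLoad(Path dictionaryFile) {
        try {
            List<String> words = loadWords(dictionaryFile);
            System.out.println("Loaded " + words.size() + " words.");
            return true;
        } catch (IOException e) {
            System.out.println("Could not read " + dictionaryFile + ". Please check the path and try again.");
            return false;
        }
    }
}
```

A caller such as main() can loop on tryLoad(...) and re-prompt for a path until it returns true, which keeps the decision to stop the program in one visible place.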

Variables naming

// This do-while loop allows the user to try again after a mistake
do {
    try {
        System.out.println("Enter a URL, starting with http://");
        // ...
        correct = false;
    } catch (Exception e) {
        System.out.println("Incorrect format for a URL. Please try again.");
    }
} while (correct);


Reading correct here is quite misleading: the name suggests you loop while the input is correct, yet the loop actually repeats while correct is true and stops once a valid URL sets it to false. One suggestion is to invert the meaning so that the flag better reflects what is being done:

boolean isDone = false;
while (!isDone) {
    try {
        System.out.println("Enter a URL, starting with http://");
        // ...
        isDone = true;
    } catch (Exception e) {
        System.out.println("Incorrect format for a URL. Please try again.");
    }
}


You can also eliminate the flag entirely. If the method returns a usable result instead of assigning static variables, you get something like:

public static String getHtmlOutput(Scanner input) {
    System.out.println("Pick an english website to scan.");
    while (true) {
        try {
            System.out.println("Enter a URL, starting with http://");
            Document doc = Jsoup.connect(input.nextLine()).get();
            return Jsoup.clean(doc.toString(), Whitelist.none());
        } catch (Exception e) {
            System.out.println("Incorrect format for a URL. Please try again.");
        }
    }
}


This version takes the Scanner (reading from System.in, or potentially another source) as a parameter and returns the output of Jsoup.clean(String, Whitelist) instead of assigning it to a static field.
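
Because the Scanner is a parameter, the retry loop can be exercised without a keyboard or a network connection. The following is a rough sketch, not the method above: UrlPrompt and readUri are hypothetical names, and java.net.URI parsing stands in for the Jsoup.connect call so the example has no dependencies.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Scanner;

class UrlPrompt {

    // Keeps reading lines until one parses as an absolute http(s) URI.
    // Throws NoSuchElementException if the input runs out first.
    static URI readUri(Scanner input) {
        while (true) {
            System.out.println("Enter a URL, starting with http://");
            String line = input.nextLine().trim();
            try {
                URI uri = new URI(line);
                if ("http".equals(uri.getScheme()) || "https".equals(uri.getScheme())) {
                    return uri;
                }
            } catch (URISyntaxException e) {
                // Fall through to the retry message below.
            }
            System.out.println("Incorrect format for a URL. Please try again.");
        }
    }
}
```

For instance, UrlPrompt.readUri(new Scanner("not a url\nhttp://example.com\n")) rejects the first line and returns the URI for the second, so the same method works with System.in in production and with a String in a test.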

try-with-resources

Since Java 7, you can rely on try-with-resources to close the underlying I/O resources automatically when the block exits, whether normally or through an exception. For example:

public static void main(String[] args) {
    String htmlOutput;
    try (Scanner scanner = new Scanner(System.in)) {
        htmlOutput = getHtmlOutput(scanner);
    }
    // ... do something with htmlOutput
}
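
The same construct suits reading the dictionary. This is a minimal sketch assuming a one-word-per-line format; WordListReader and readWords are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class WordListReader {

    // Reads one word per line, skipping blank lines. The BufferedReader (and the
    // Reader it wraps) is closed automatically when the try block exits,
    // even if readLine() throws.
    static List<String> readWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

Passing a FileReader for dictionary.txt or a StringReader in a test both work, since the method only depends on Reader. One caveat with the Scanner example above: closing a Scanner also closes the stream it wraps, so System.in cannot be read again afterwards.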


Map methods

Since Java 8, there's Map.merge(K, V, BiFunction) to simplify the following kind of operations:

// words.put((temp = m.group()), words.containsKey(temp) ? words.get(temp) + 1 : 1);
words.merge(m.group(), 1, Integer::sum);


  • Use m.group() as the key.
  • Use 1 as the value to store when the key is not yet present.
  • If the entry exists, apply the BiFunction Integer.sum(int, int) (as a method reference) to sum the existing value and the incoming value 1.
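
Put together, a word-frequency count built on merge might look like the sketch below. WordCounter and countWords are hypothetical names, and the [a-z]+ pattern is an assumption, since the original regular expression is not shown here.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCounter {

    // Assumed word pattern: runs of lowercase ASCII letters.
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    // Counts how often each word occurs in the text, ignoring case.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> words = new TreeMap<>();
        Matcher m = WORD.matcher(text.toLowerCase());
        while (m.find()) {
            // First occurrence stores 1; later ones add 1 to the stored count.
            words.merge(m.group(), 1, Integer::sum);
        }
        return words;
    }
}
```

A TreeMap is used here only so that the result iterates in alphabetical order; a HashMap works the same way with merge.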


Context

StackExchange Code Review Q#159547, answer score: 3
