Website Spell Checker in Java
Problem
I've implemented a program that spell checks a website.
Here is the idea that I have in mind:
- Scan all of the words in a web page into a string (using jsoup)
- Filter out all of the HTML markup and code
- Use a spell checking algorithm that reads from a dictionary.txt file and uses probability theory to offer suggestions
I would like to have my code reviewed and would greatly appreciate any input on how to make it more efficient or clean.
There is probably some bad practice followed as I'm new to programming so I apologize in advance if I'm doing something that is obviously wrong.
Some problems I've noticed about my code:
- It only accepts English words
- It prints out each suggestion in a new line, so large websites produce a messy output.
Here is the code:
Class 1 (used to call on the methods, basically a neat class to look good)
```
import java.io.*;

public class BulkSpellChecker extends ParseCleanCheck {
    public static void main(String[] args) throws IOException {
        System.out.println("Let's get started!");
        PageScanner(); // Scan the page and clean it first
        SpellChecker(); // Spell check the cleaned page
        System.out.println("Thanks for using the spell checker!");
    }
}
```
Class 2
```
import java.io.*;
import java.util.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist;

public class ParseCleanCheck {
    static Hashtable dictionary; // To store all the words of the dictionary
    static boolean suggestWord; // To indicate whether the word is spelled correctly or not.
    static Scanner urlInput = new Scanner(System.in);
    public static String cleanString;
    public static String url = "";
    public static boolean correct = true;

    /**
     * PARSER METHOD
     */
    public static void PageScanner() throws IOException {
        System.out.println("Pick an english website to scan.");
        // This do-while loop allo
```
Solution
Style convention
As you have used correctly in some places, Java's default convention is to use camelCase for method names, so having PageScanner() and SpellChecker() is mildly jarring to look at.
You also use a mixture of PascalCase, snake_case and camelCase for variable names, whereas the default convention for any variable that is not a static final constant is camelCase as well. Standardization is highly recommended here.
Inheritance
```
BulkSpellChecker extends ParseCleanCheck
```
This looks slightly odd, especially when BulkSpellChecker is just, in your words, 'a neat class to look good'. If all you are doing is to implement public static void main(String[] args), you can do it in the underlying classes too. Extending a class only to implement static methods is a poor demonstration of inheritance.
Implementation vs interface
Almost all your Collection classes are declared by their implementations (ArrayList) instead of their interfaces (List). It is usually recommended to use interfaces so that users of those variables only need to know they are dealing with a List. This allows for substitution too, e.g. during testing, or with thread-safe implementations if required.
In addition, since Java 7, you can rely on generic type inference (the diamond operator) to shorten the declaration as such:
```
// ArrayList<String> result = new ArrayList<String>();
List<String> result = new ArrayList<>();
```
Hashtable
In 2017, Hashtable is pretty much a relic class and you are highly encouraged to switch over to HashMap or ConcurrentHashMap, as you already did elsewhere.
System.exit
A hard System.exit(int) is usually not recommended, especially when it does not sit inside the main() method (it's at least easier to follow there). If you really do encounter a serious error, propagate the exception to the callers until you can handle it safely, e.g. by prompting the user to re-enter.
Variables naming
```
// This do-while loop allows the user to try again after a mistake
do {
    try {
        System.out.println("Enter a URL, starting with http://");
        // ...
        correct = false;
    } catch (Exception e) {
        System.out.println("Incorrect format for a URL. Please try again.");
    }
} while (correct);
```
Reading correct here is quite misleading, as it sounds like you will keep looping while the processing inside the code block is correct. One suggestion is to invert the meaning so that it better reflects what is being done here:
```
boolean isDone = false;
while (!isDone) {
    try {
        System.out.println("Enter a URL, starting with http://");
        // ...
        isDone = true;
    } catch (Exception e) {
        System.out.println("Incorrect format for a URL. Please try again.");
    }
}
```
Actually, you can also eliminate the flag entirely. By packaging the method as one that actually returns a usable output instead of assigning static variables, you will get something like:
```
public static String getHtmlOutput(Scanner input) {
    System.out.println("Pick an english website to scan.");
    while (true) {
        try {
            System.out.println("Enter a URL, starting with http://");
            Document doc = Jsoup.connect(input.nextLine()).get();
            return Jsoup.clean(doc.toString(), Whitelist.none());
        } catch (Exception e) {
            System.out.println("Incorrect format for a URL. Please try again.");
        }
    }
}
```
This showcases how the Scanner object reading from System.in (or potentially other sources) is taken in as the input, and how the method returns the output of Jsoup.clean(String, Whitelist).
try-with-resources
Since Java 7, you can rely on try-with-resources for safe and efficient handling of the underlying IO resources. For example:
```
public static void main(String[] args) {
    String htmlOutput;
    try (Scanner scanner = new Scanner(System.in)) {
        htmlOutput = getHtmlOutput(scanner);
    }
    // ... do something with htmlOutput
}
```
Map methods
Since Java 8, there's Map.merge(K, V, BiFunction) to simplify the following kind of operations:
```
// words.put((temp = m.group()), words.containsKey(temp) ? words.get(temp) + 1 : 1);
words.merge(m.group(), 1, Integer::sum);
```
- Use m.group() as the key.
- Use 1 as the default value.
- If the entry exists, apply the BiFunction Integer.sum(int, int) (as a method reference) to sum the existing value and the incoming value 1.
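To see the three steps above working together, here is a minimal, self-contained sketch of a word-frequency count built on Map.merge. The class name WordFrequency, the method countWords and the [a-z]+ pattern are assumptions made for illustration; your original regular expression and surrounding code are not shown in full, so treat this as one possible shape rather than a drop-in replacement.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFrequency {
    // Assumption for this sketch: a "word" is a run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    /** Counts how often each lower-cased word occurs in the given text. */
    public static Map<String, Integer> countWords(String text) {
        // Declared by interface (Map); TreeMap keeps the keys sorted for printing.
        Map<String, Integer> words = new TreeMap<>();
        Matcher m = WORD.matcher(text.toLowerCase());
        while (m.find()) {
            // Stores 1 for a new key, otherwise adds 1 to the existing count.
            words.merge(m.group(), 1, Integer::sum);
        }
        return words;
    }

    public static void main(String[] args) {
        // prints {cat=2, other=1, saw=1, the=2}
        System.out.println(countWords("The cat saw the other cat"));
    }
}
```

Returning the Map instead of writing to a static field also keeps this in line with the earlier advice about methods that return a usable output.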
Context
StackExchange Code Review Q#159547, answer score: 3