Soundex algorithm implementation in Java
Problem
I just started learning Java strings, and I tried to implement the Soundex algorithm.
Package to hold the String-related functions:
```java
package com.java.strings;

public class StringFunctions {

    /**
     * Removes all the spaces in a given String.
     * E.g.: "A B CDD" becomes "ABCDD"
     */
    public static String squeeze(String in) {
        String temp = "";
        StringBuilder sb = new StringBuilder(in.trim());
        int i = 0;
        while (i < sb.length()) {
            if (sb.charAt(i) == ' ') {
                // Starting with the current position, shift all
                // the characters from right to left.
                for (int j = i; j < sb.length() - 1; j++)
                    sb.setCharAt(j, sb.charAt(j + 1));
                // The length of the string is reduced by 1.
                temp = sb.substring(0, sb.length() - 1);
                sb.setLength(0);
                sb.append(temp);
            }
            // After shifting the characters from right to left, the new
            // character in the current position might be a space. If so,
            // the same position has to be processed again.
            if (sb.charAt(i) != ' ')
                i++;
        }
        return sb.toString();
    }

    /**
     * Removes continuous duplicate characters in a string.
     * E.g.: "AAABCCCDDDBB" becomes "ABCDB"
     */
    public static String removeContDupChars(String in) {
        String temp = "";
        StringBuilder sb = new StringBuilder(in);
        int i = 0;
        char prevChar;
        while (i < sb.length()) {
            prevChar = sb.charAt(i);
            for (int j = i + 1; j < sb.length(); j++) {
                // As long as there are same characters, replace all the duplicates
                // with a space.
                if (prevChar == sb.charAt(j))
                    sb.setCharAt(j, ' ');
                else
                    // Where there is a different char, break the inner loop.
                    break;
            }
            …
```
Solution
Bugs
The Soundex code for "Jackson" should be "J250". You fail to elide the "C" and the "K", and as a result, your code returns "J225" instead.
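The rule behind the "Jackson" case is that letters with the same digit collapse into a single digit when they are adjacent, and the first letter's own digit takes part in that comparison. Below is a minimal single-pass sketch of American Soundex for comparison. It assumes the common variant where the vowels (and, by assumption, Y; implementations differ here) separate equal digits while H and W do not, and it silently skips non-letters. The class name `SoundexSinglePass` is hypothetical.

```java
import java.util.Locale;

/** Hypothetical single-pass American Soundex sketch, for comparison only. */
final class SoundexSinglePass {
    private SoundexSinglePass() {}

    // Code for each letter 'A'..'Z': digits 1-6 for consonants, '0' for the vowels
    // A, E, I, O, U and (by assumption) Y, which separate equal digits,
    // and '-' for H and W, which do not separate them.
    private static final String CODES = "0123012-02245501262301-202";

    public static String encode(String name) {
        // Keep only the letters A-Z; apostrophes, hyphens, digits etc. are ignored.
        String letters = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (letters.isEmpty()) {
            throw new IllegalArgumentException("no letters in: " + name);
        }
        StringBuilder out = new StringBuilder(4).append(letters.charAt(0));
        // The first letter's own code counts as the previous code.
        char prev = code(letters.charAt(0));
        for (int i = 1; i < letters.length() && out.length() < 4; i++) {
            char c = code(letters.charAt(i));
            if (c == '-') {
                continue;               // H and W: skipped without resetting prev
            }
            if (c != '0' && c != prev) {
                out.append(c);          // a digit that differs from the adjacent one
            }
            prev = c;                   // a vowel resets prev, so equal digits around it are both kept
        }
        while (out.length() < 4) {
            out.append('0');            // pad short codes, e.g. "W" becomes "W000"
        }
        return out.toString();
    }

    private static char code(char upper) {
        return CODES.charAt(upper - 'A');
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Jackson", "Wu", "Google", "O'Brien"}) {
            System.out.println(name + " -> " + encode(name));
        }
    }
}
```

Running `main` prints `Jackson -> J250`, `Wu -> W000`, `Google -> G240` and `O'Brien -> O165`. In "Jackson", C, K and S all map to 2 and are adjacent, so only the first of them produces a digit.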
The Soundex code for "Wu" should be "W000", and the code for "Google" should be "G240". Your code crashes with a `StringIndexOutOfBoundsException` for both.
If the input contains a non-alphabetic character, then `SoundExClass.getValue()` crashes with a `NullPointerException` due to unboxing a `null` `Character`.
Organization
The `com.java.strings` package name infringes on someone else's namespace, assuming that you are not the owner of the java.com domain.
The `SoundExClass` class should be public; how else will other people call your code? But `…Class` is a pretty cumbersome Hungarian suffix. Furthermore, I think that there isn't much point in forcing your users to split what should be a simple function call into an object instantiation and a method call. I would just make it:
```java
public class Soundex {
    // Suppress default constructor
    private Soundex() {}

    public static String soundex(String name) {
        …
    }
}
```
In `SoundExClass`, the variables `map`, `vowels`, `notvowels`, and `dropChars` should not be public. You don't want any other code to be able to alter their contents. (Note that being `final` does not make them unmodifiable.)
The `getValue()` method should be private, since it's an implementation detail that nobody outside the class should be concerned about.
Implementation
`removeContDupChars()` is superfluous. There is no point in removing consecutive characters in the input string. Just map the characters into their respective digits; you will eventually elide them anyway when you get to `// If two or more letters with the same number are adjacent in, only retain the first letter.`
`squeeze()` would be a lot simpler if you took advantage of `StringBuilder.deleteCharAt()`. I think that the `squeeze()` function should operate directly on a `StringBuilder`.
The comments in `implementSoundEx()` are helpful, but what would be even better is if each small code block were its own function operating on a `StringBuilder`. That would make the functionality even clearer. Taking a cue from this Haskell solution, I would suggest rewriting `implementSoundEx()` to look more like this:
```java
public static String soundex(String s) {
    s = s.toUpperCase().trim();
    if (s.isEmpty()) throw new IllegalArgumentException();
    StringBuilder sb = new StringBuilder(s);
    digitize(sb);
    removeContDupChars(sb);
    squeeze(sb);
    // setCharAt() and setLength() return void, so these calls cannot be chained.
    sb.setCharAt(0, s.charAt(0));
    sb.append("000");
    sb.setLength(4);
    return sb.toString();
}
```
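To make the suggested `soundex()` concrete, here is a minimal sketch of helpers it could call. The class name `SoundexPipeline` and the bodies of `digitize()`, `removeContDupChars()` and `squeeze()` are assumptions, not code from the question: `digitize()` maps vowels (and Y) to a `'0'` marker and deletes H and W after the first position, and `squeeze()` deletes the `'0'` markers but leaves position 0 alone, so that `setCharAt(0, …)` can overwrite the first letter's code. Non-letters are discarded up front (also an assumption), which sidesteps the `NullPointerException` described under Bugs. The lookup table is a `private static final String`; since a `String` is immutable, no other code can alter it, unlike the contents of a `final` map.

```java
import java.util.Locale;

/** Hypothetical sketch of the suggested structure; helper bodies are assumptions. */
final class SoundexPipeline {
    private SoundexPipeline() {}  // static utility class, no instances

    // Codes for 'A'..'Z': '0' marks vowels (and Y), '-' marks H and W.
    private static final String CODES = "0123012-02245501262301-202";

    public static String soundex(String s) {
        // Assumption: non-letters are dropped here instead of crashing later.
        s = s.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) throw new IllegalArgumentException("no letters");
        StringBuilder sb = new StringBuilder(s);
        digitize(sb);
        removeContDupChars(sb);
        squeeze(sb);
        sb.setCharAt(0, s.charAt(0));  // position 0 still holds the first letter's code
        sb.append("000");
        sb.setLength(4);
        return sb.toString();
    }

    /** Replaces each letter by its code; H and W after the first position are deleted. */
    private static void digitize(StringBuilder sb) {
        int i = 0;
        while (i < sb.length()) {
            char code = CODES.charAt(sb.charAt(i) - 'A');
            if (code == '-' && i > 0) {
                sb.deleteCharAt(i);      // H/W do not separate equal codes, so drop them
            } else {
                sb.setCharAt(i, code);
                i++;
            }
        }
    }

    /** Collapses each run of equal codes to one, e.g. "2022205" becomes "20205". */
    private static void removeContDupChars(StringBuilder sb) {
        for (int i = sb.length() - 1; i > 0; i--) {
            if (sb.charAt(i) == sb.charAt(i - 1)) sb.deleteCharAt(i);
        }
    }

    /** Deletes the vowel markers, keeping position 0 for the first letter. */
    private static void squeeze(StringBuilder sb) {
        for (int i = sb.length() - 1; i > 0; i--) {
            if (sb.charAt(i) == '0') sb.deleteCharAt(i);
        }
    }
}
```

With these assumptions, `SoundexPipeline.soundex("Ashcraft")` returns `"A261"`: the H between S and C is deleted before runs are collapsed, so S and C yield a single 2. Walking the loops in `removeContDupChars()` and `squeeze()` from right to left keeps the indices still to be visited valid after each `deleteCharAt()`.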
Context
StackExchange Code Review Q#117913, answer score: 4