
Removing Asterisks and neighbors from a string

Submitted by: @import:stackexchange-codereview

Problem

I am going through the CodingBat exercises for Java. I got to this problem:


Return a version of the given string, where for every star (*) in the string the star and the chars immediately to its left and right (if any) are gone. So "ab*cd" and "ab**cd" both yield "ad".

I decided to solve this using regular expressions. Here is my code:

public String starOut(String str){

    String s = " " + str + " "; //Avoiding OOB exceptions.
    String n = ""; //Used for replacements of s.

    if (s.contains("***")) {
        n = s.replaceAll(".[*][*][*].", "");
        s = n;
    }

    if (s.contains("**")) {
        n = s.replaceAll(".[*][*].", "");
        s = n;
    }

    if (s.contains("*")) {          
        n = s.replaceAll(".[*].", "");
    }

    String theOne = n.replaceAll("\\s", ""); //Remove whitespace created by s declaration.
    return theOne;
}


My code is inefficient and repetitious, and it does not handle strings containing more than three adjacent * characters. I can't help feeling that I'm missing something obvious about regex that would give a beautifully logical solution.

What would be a good solution to ensure my code utilises regex in an efficient and sensical way? Would it be more appropriate to solve this by looping through characters in the original string?

Solution

Regular expressions are the right tool for this sort of problem, and your suspicion that your code is not great is about right. Regular expressions have the + operator, which will do what you want much more concisely: + matches 1 or more times.

Consider the simple expression:

String compact = raw.replaceAll(".?\\*+.?", "");


That replaces all something-stars-something patterns with nothing.

Note, this pattern will have the following results:

ab*cd     ad
ab***cd   ad
a**b      (empty string)
ab****    a


etc.
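As a minimal sketch, the one-liner can be wrapped in the starOut method from the question and run against the inputs listed above. The class name StarOutDemo and the extra "abcd" input (no stars at all) are assumptions added for illustration only.

```java
public class StarOutDemo {

    // Removes every run of '*' together with the character on each side, if present.
    static String starOut(String str) {
        return str.replaceAll(".?\\*+.?", "");
    }

    public static void main(String[] args) {
        // "abcd" is an extra input (not in the list above) to show that
        // a string without any '*' is returned unchanged.
        String[] inputs = {"ab*cd", "ab***cd", "a**b", "ab****", "abcd"};
        for (String input : inputs) {
            System.out.println(input + " -> \"" + starOut(input) + "\"");
        }
    }
}
```

Because the pattern is applied directly to the input, there is no need to pad the string with spaces and strip whitespace afterwards, so whitespace that is genuinely part of the input survives.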

The key features of the regex are:

  • \\*+ - the * is normally a special character in a regex. We have to escape it with \\ to make it a literal *. The + is a 1-or-more match. This means that the expression \\*+ will match at least one * character, and possibly many of them in a row.



  • .? - this is a greedy 0-or-1 match of any character (by default, any character except a line terminator). It will match at most 1 character, but if the overall pattern would otherwise fail to match, the regex engine backtracks and tries again with .? matching nothing. In other words: "if possible, match 1 character - any character (including *)".



Putting them together, you get an expression that matches any character, if there is one, before an asterisk, as well as the asterisk, and any other asterisks that follow it, and finally any other character, if there is one.
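To see what each match actually consumes, the following sketch lists the matched substrings instead of replacing them. The class name StarMatchDemo, the helper matches, and the sample inputs are assumptions made for illustration; only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StarMatchDemo {

    // Same pattern as in the replaceAll call: optional char, one or more '*', optional char.
    private static final Pattern STAR_RUN = Pattern.compile(".?\\*+.?");

    // Hypothetical helper: collects every substring the pattern matches, left to right.
    static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = STAR_RUN.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches("ab*cd"));  // [b*c]
        System.out.println(matches("ab****")); // [b****]  (trailing .? matches nothing)
        System.out.println(matches("*a"));     // [*a]     (leading .? backtracks to nothing)
        System.out.println(matches("a*b*c"));  // [a*b, *c]
    }
}
```

The "*a" case shows the backtracking described above: the leading .? first grabs the star itself, the \\*+ then has nothing to match, so the engine retries with .? matching nothing.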

See this in an ideone here: http://ideone.com/6yMSvx
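The question also asks whether looping through the characters would be more appropriate. For comparison only, here is a minimal sketch of that approach: a character is kept when it is not a star and neither neighbour is a star. The class name StarOutLoop is an assumption, and this is not the method the answer recommends. One known difference is that the loop also drops a line terminator next to a star, whereas . in the regex does not match line terminators by default.

```java
public class StarOutLoop {

    // Keeps a character only if it is not '*' and has no '*' directly beside it.
    static String starOut(String str) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            boolean isStar = str.charAt(i) == '*';
            boolean starOnLeft = i > 0 && str.charAt(i - 1) == '*';
            boolean starOnRight = i < str.length() - 1 && str.charAt(i + 1) == '*';
            if (!isStar && !starOnLeft && !starOnRight) {
                kept.append(str.charAt(i));
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(starOut("ab*cd"));   // ad
        System.out.println(starOut("ab***cd")); // ad
        System.out.println(starOut("ab****"));  // a
    }
}
```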

Context

StackExchange Code Review Q#86278, answer score: 7
