HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Find minimum window size which includes the subset of characters

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
includesthesizeminimumcharactersfindwhichwindowsubset

Problem

Given a set T of characters and a string S, find the minimum window in S which will contain all the characters in T in complexity O(n).

eg,
S = “ADOBECODEBANC”
T = “ABC”

Minimum window size is 4 - “BANC”.

public final class MinSubstringWindow {

    private MinSubstringWindow() {}

    private static void addCharCount(Map map, Character ch) {
        if (!map.containsKey(ch)) {
            map.put(ch, 1);
        } else {
            int val = map.get(ch);
            map.put(ch, val + 1);
        }
    }

    private static Map toFindMap(String subset) {
        Map toFind  = new HashMap();
        for (char ch : subset.toCharArray()) {
            addCharCount(toFind, ch);
        }
        return toFind;
    }

    public static int mindWindowSize(String str, String subset) {
        final Map toFind = toFindMap(subset);
        final Map hasFound  = new HashMap();

        char[] ch  = str.toCharArray();

        int matchCtr = 0;
        boolean matchFound = false;
        char currLeftMostChar = 0;

        int j = 0; // the trailing position of the sliding window
        int i = 0; // the leading position of the sliding window.

        int min = Integer.MAX_VALUE;

        for (i = 0; i = subset.length()) {
                if ((i - j + 1)  toFind.get(currLeftMostChar)) {
                // advance the left pointer, such the window (i-j) is as small as possible.    
                while (!toFind.containsKey(ch[j]) || hasFound.get(ch[j]) > toFind.get(ch[j])) {
                    if (hasFound.containsKey(ch[j])) {
                        int val = hasFound.get(ch[j]);
                        hasFound.put(ch[j], val - 1);
                    } 
                    j++;
                }
                currLeftMostChar = ch[j];
            }  
        }

        if (matchCtr  (i - j) ? i - j : min;
    }
}


And test section

```
public class MinSubstringWindowTest {

@Test
public void test1() {
assertEquals(3, MinSubstringWindow.mind

Solution

The first big issue that crosses my mind: Naming. In particular, commenting variables is always a sign that you did a poor job on giving them good names:

int j = 0; // the trailing position of the sliding window
int i = 0; // the leading position of the sliding window.


There is no reason to name loop variables i and j just because. If they have a meaning beyond just being there for the sake of looping, give them a name representing this. It makes the following code easier to read because the reader won't have to look up and remember their meaning.

After giving them proper names you can also get rid of declaring them outside the loop. Declaring them inside the loop makes the coder smaller, which is good not for the sake of having less lines (LoC is a terrible measurement of quality), but it reduces the overhead when trying to understand what is going on.

This applies especially to i and j, but also to many other names in your program:

  • What is ch supposed to tell me? I don't need information about the type of a variable – the compiler and IDE do a lot of the work for me here; what I, as a reader, care about is the semantics of variables.



  • Is there really any need for shortening matchCounter to matchCtr? Your IDE will help you avoid having to type too much. Of course experienced programmers have "ctr" already in their vocabulary and automatically translate it to "counter", but there is not really any reason these days to add this extra translation step.



  • I'll just leave out the rest, typically the same arguments apply as the ones I stated above.



Let's move on. What was the reasoning for making everything static? You forced yourself to do this:

addCharCount(hasFound, ch[i]);


This is called a side-effect method and unless there is no way around it, it should be avoided. It is counter-intuitive that this method actually mutates the passed map (and from experience: this can easily cause very messy code and ugly bugs).

Instead, just make your class a normal (non-static) class with state. It allows you to get rid of some method parameters as they just become class members.

What you can do is make the constructor private (as it is now) and add a static method that creates a new instance and calls the proper method on it. This ensures that the caller doesn't call the method twice on the same instance and you won't have to deal with resetting members.

Refactoring this to a proper class will also allow you to avoid this as well as the block following it:

// check if match has been completed.
if (matchCtr >= subset.length()) {
    if ((i - j + 1) < min) {
        min = i - j + 1;
    }
}


Again, comments are good for mostly one thing: they signal code smell. If you have an entire block that needs a comment to be explained, it is time to think about whether the better way to go here is to extract a method with a good name that replaces the comment, because – to me – the single most profound fact about comments is that they rot faster than anything else and even the best comments will soon start lying. In other words: comments tend to become not just useless, but dangerous.

And your method is already big enough; too big, I would say. It does too many things on the same level of abstraction, so I would try splitting it up further.

Now to this:

if (matchCtr < subset.length()) {
    throw new IllegalArgumentException("The subset is not found in the input string.");
}


Is it really an illegal argument? You can argue that it is, but I tend to think that it's not. Is an exception even the right way to go here? Is this exceptional behavior you want the caller to deal with because you can't decide what to do here?

Finally, the tests. Again, the thing that really jumps into my eyes are names. What is test and test2 supposed to tell me about the test case here? Choose proper names describing the test cases, e.g

  • testShortestWindowAtTheEndOfStringIsFound



  • testShortedWindowWithinStringIsFound






Also, at first glance I fail to see why you chose to split your test cases into two tests as you did. I would say that every single assertion here deserves a test on its own (with proper name). This way, if a test fails, the test name alone is enough to tell you what isn't working, you won't have to first find out which assertion went wrong and you can be sure that the other cases are still working.

Code Snippets

int j = 0; // the trailing position of the sliding window
int i = 0; // the leading position of the sliding window.
addCharCount(hasFound, ch[i]);
// check if match has been completed.
if (matchCtr >= subset.length()) {
    if ((i - j + 1) < min) {
        min = i - j + 1;
    }
}
if (matchCtr < subset.length()) {
    throw new IllegalArgumentException("The subset is not found in the input string.");
}

Context

StackExchange Code Review Q#48375, answer score: 4

Revisions (0)

No revisions yet.