patternjavaMinor
Median of two sorted equal sized arrays on combination
Viewed 0 times
combinationarraysequalsizedtwosortedmedian
Problem
The comprehensive description of the problem can be found here. I'm looking for code review, clever optimizations, adherence to best practices. etc.
This code is also a correction of code previously posted here.
This code is also a correction of code previously posted here.
/**
* @author javadeveloper
*/
public final class MedianOfTwoSortedArrays {
private MedianOfTwoSortedArrays() {}
private static boolean isMedian (int[] a1, int[] a2, int pos) {
if (pos == 0 || pos == (a1.length - 1)) return true;
return (a1[pos] >= a2[pos]) && (a1[pos] a2[a2Median]) {
hb = a1Median - 1;
}
}
return null;
}
public static void main(String[] args) {
int[] a1 = {1, 3, 5, 7, 9};
int[] a2 = {2, 4, 6, 8, 10};
System.out.println("Expected 5.5, Actual " + median (a1, a2));
int[] a3 = {1, 3, 5, 7, 9, 100};
int[] a4 = {2, 4, 6, 8, 10, 200};
System.out.println("Expected 6.5, Actual " + median (a3, a4));
int[] a5 = {1, 2, 3, 4};
int[] a6 = {5, 6, 7, 8};
System.out.println("Expected 4.5, Actual " + median (a5, a6));
int[] a7 = {5, 6, 7, 8};
int[] a8 = {1, 2, 3, 4};
System.out.println("Expected 4.5, Actual " + median (a7, a8));
int[] a9 = {1, 1, 10, 10};
int[] a10 = {5, 5, 5, 5};
System.out.println("Expected 5.0, Actual " + median (a9, a10));
int[] a11 = {5, 5, 5, 5};
int[] a12 = {1, 1, 10, 10};
System.out.println("Expected 5.0, Actual " + median (a11, a12));
}
}Solution
Short Review
I have spent some hours trying to understand what you do, and why. Given the mess I made of the previous posting of this question, I thought I should spend extra effort to get this one right.
I have failed. I cannot understand the full intent of your code, and, even though it appears to work, I have this nagging feeling that some well-constructed uses cases will cause the code to fail - you have too many edge-case conditions in the code which are not documented, and not contextualized enough to ascertain whether they work, or what conditions they are supposed to handle.
Bottom line is that your program is not understandable in a reasonable amount of time, and it was your responsibility to make your code understandable, not the reviewer's responsibility to decode your system. Just because you may have your solution 'straight' in your head, it does not mean that you have expressed it well in the code. You have to write and document the code so that someone who has only just seen the problem can understand it and the solution with 'reasonable' effort. In this case, I don't believe you have succeeded.
Longer Review
In a 'real' review there is no "reference implementation" to compare against. All there is, is a specification. In this case, the specification is to identify the median (middle value, or average of middle-pair if the data-set is even-sized). In addition to the expected result, the data is assumed to be in two equal-sized arrays of sorted data.
Given those specifications you can also make a couple of assumptions:
OK, based on those specifications and assumptions, we can review the code, and see how it solves the problem.
-
First thing I note is that there are two variables which have names that don't seem to relate to any feature of the problem space,
This is a bad practice, and is solved by using self-documenting variable names. This has been pointed out to you before. You need to start improving your code-style.
-
When I try to track down what
At this point, in any 'serious' review, you would be facing a lot of criticism. There are times when it is OK for code to be hard to read... but that is only when the problem is very complicated. What you have here is unnecessarily hard to read.
OK, after some study (and I literally mean some serious head-scratching, debugging, and paper-worked examples), I think I can see your algorithm.....
I have spent some hours trying to understand what you do, and why. Given the mess I made of the previous posting of this question, I thought I should spend extra effort to get this one right.
I have failed. I cannot understand the full intent of your code, and, even though it appears to work, I have this nagging feeling that some well-constructed uses cases will cause the code to fail - you have too many edge-case conditions in the code which are not documented, and not contextualized enough to ascertain whether they work, or what conditions they are supposed to handle.
Bottom line is that your program is not understandable in a reasonable amount of time, and it was your responsibility to make your code understandable, not the reviewer's responsibility to decode your system. Just because you may have your solution 'straight' in your head, it does not mean that you have expressed it well in the code. You have to write and document the code so that someone who has only just seen the problem can understand it and the solution with 'reasonable' effort. In this case, I don't believe you have succeeded.
Longer Review
In a 'real' review there is no "reference implementation" to compare against. All there is, is a specification. In this case, the specification is to identify the median (middle value, or average of middle-pair if the data-set is even-sized). In addition to the expected result, the data is assumed to be in two equal-sized arrays of sorted data.
Given those specifications you can also make a couple of assumptions:
- the ending-dataset will always have an even number of elements (
2 * xis always even for any integral valuex), which means we will always be doing some form of average.
- There will always be
n-1values smaller than the smaller of the mid-point-pair, andn-1values larger than the higher of the mid-point-pair.
OK, based on those specifications and assumptions, we can review the code, and see how it solves the problem.
-
First thing I note is that there are two variables which have names that don't seem to relate to any feature of the problem space,
lb and hb. There is no comment to indicate what these variables are, so the only choice is to inspect the code to see how it is used. This means we have to trust that the code is doing the right thing just so we can understand what the intended use of the variable is.This is a bad practice, and is solved by using self-documenting variable names. This has been pointed out to you before. You need to start improving your code-style.
-
When I try to track down what
lb and hb are supposed to be, I find that they are used as a loop condition, which does not help us in this case because the loop condition is not documented. They are also used to calculate the variables a1Median and a2Median. Unfortunately, a1Median, although it is a descriptive name, does not mean what it says it means.... it is not the Median of array a1. It starts off being the approximate midpoint of a1 (the median is the value at the midpoint, not the position of the midpoint), but then, to make things worse, the a1Median is modified in each loop! So, this descriptive variable name has the completely wrong description. This is worse than having a non-descriptive name because, now the person reading the code has to keep remembering that "a1Median is not the median of array a1"!At this point, in any 'serious' review, you would be facing a lot of criticism. There are times when it is OK for code to be hard to read... but that is only when the problem is very complicated. What you have here is unnecessarily hard to read.
OK, after some study (and I literally mean some serious head-scratching, debugging, and paper-worked examples), I think I can see your algorithm.....
- set up
lbandhb(still not sure what those are supposed to stand for - low-something and high-something ?) as pointers in to the first array.
- we will manipulate these high and low pointers to select a 'candidate' value in the first array.
- we then use the candidate point in the first array and calculate a candidate point in the second array. The second array's candidate is always going to be where there are n-1 values smaller/larger than the two candidate positions. If our candidate in array1 is
xthen the candidate in array2 will bearray2 - x - 1which would satisfy the midpoint-pair condition.
- if the values at these candidate positions are in fact the midpoints, then we can return a success.
- if we have a success, we do not actually know if the candidate values are in fact the actual pair, all we know is that one of the candidates is the midpoint. The other candidate may not be part of the solution if we have two of the same values on our side... in which case both of the candidates need to come from one array.
- if we do not have a success, we essentually do a 'bifurcation' of the data to do a binary search-style partitioning of the dat
Context
StackExchange Code Review Q#39574, answer score: 9
Revisions (0)
No revisions yet.