HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Find median of two sorted arrays

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
arraystwofindsortedmedian

Problem

Problem statement

There are two sorted arrays nums1 and nums2 of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be \$\mathcal{O}(\log (m+n))\$.

Implementation

def median(A, B):
    m, n = len(A), len(B)
    if m > n:
        A, B, m, n = B, A, n, m
    if n == 0:
        raise ValueError

    imin, imax, half_len = 0, m, (m + n + 1) / 2
    while imin  0 and i  A[i]:
            # i is too small, must increase it
            imin = i + 1
        elif i > 0 and j  B[j]:
            # i is too big, must decrease it
            imax = i - 1
        else:
            # i is perfect
            if i == 0: max_of_left = B[j-1]
            elif j == 0: max_of_left = A[i-1]
            else: max_of_left = max(A[i-1], B[j-1])

            if (m + n) % 2 == 1:
                return max_of_left

            if i == m: min_of_right = B[j]
            elif j == n: min_of_right = A[i]
            else: min_of_right = min(A[i], B[j])

            return (max_of_left + min_of_right) / 2.0


Concern

I suspect that it is safe to change this line of code

j > 0 and i  A[i]


to

i  A[i]


and also it is safe to change this line of code

i > 0 and j  B[j]


to

i > 0 and A[i-1] > B[j]


I think remove the condition check of j is safe since we already making sure size of A is no bigger than size of B.

Solution

First lines can be written as (it's just a personal taste):

A, B = sorted([A, B], key=len, reverse=True)


Also the (m + n + 1) / 2 is possibly a bug, use (m + n + 1) // 2 instead (in python / is floating point division and // is integer division)

I am still looking for further improvements

To be honest I still can't get why it's O(log(n+m)) :-\

Edit:

I can't tell by sure whether the changes will break or not, but I've tested it with 10^6 random inputs multiples times and it seems that the changes are safe:

#!/usr/bin/env python

import random

def median(A, B):
    m, n = len(A), len(B)
    if m > n:
        A, B, m, n = B, A, n, m
    if n == 0:
        raise ValueError

    imin, imax, half_len = 0, m, (m + n + 1) / 2
    while imin  A[i]:
            # i is too small, must increase it
            imin = i + 1
        elif i > 0 and A[i-1] > B[j]:
            # i is too big, must decrease it
            imax = i - 1
        else:
            # i is perfect
            if i == 0: max_of_left = B[j-1]
            elif j == 0: max_of_left = A[i-1]
            else: max_of_left = max(A[i-1], B[j-1])

            if (m + n) % 2 == 1:
                return max_of_left

            if i == m: min_of_right = B[j]
            elif j == n: min_of_right = A[i]
            else: min_of_right = min(A[i], B[j])

            return (max_of_left + min_of_right) / 2.0

def random_seq(max_len):
    l = random.randint(0, max_len)
    return list(sorted(random.choice(range(max_len)) for i in range(l)))

def median_slow(A, B):
    C = A + B
    C.sort()
    if not C: raise ValueError()
    m1 = (len(C) - 1) / 2
    m2 = len(C) / 2
    return (C[m1] + C[m2]) / 2.0

for i in range(1000000):
    A = random_seq(10)
    B = random_seq(10)
    if len(A) + len(B) > 0:
        me = median(A, B)
        ma = median_slow(A, B)
        assert me == ma

print "Validated!"


Last Edit:

m + n is larger than one so m + n + 1 is larger than two so j = half_length - 1 will always be positive. also n > m so always n + m <= 2*n so half_length = (n + m + 1) / 2 is always n or smaller but if you subtract one (j = half_length - 1) then it will be always less than n. So these checks are really redundant

Code Snippets

A, B = sorted([A, B], key=len, reverse=True)
#!/usr/bin/env python

import random

def median(A, B):
    m, n = len(A), len(B)
    if m > n:
        A, B, m, n = B, A, n, m
    if n == 0:
        raise ValueError

    imin, imax, half_len = 0, m, (m + n + 1) / 2
    while imin <= imax:
        i = (imin + imax) / 2
        j = half_len - i
        if i < m and B[j-1] > A[i]:
            # i is too small, must increase it
            imin = i + 1
        elif i > 0 and A[i-1] > B[j]:
            # i is too big, must decrease it
            imax = i - 1
        else:
            # i is perfect
            if i == 0: max_of_left = B[j-1]
            elif j == 0: max_of_left = A[i-1]
            else: max_of_left = max(A[i-1], B[j-1])

            if (m + n) % 2 == 1:
                return max_of_left

            if i == m: min_of_right = B[j]
            elif j == n: min_of_right = A[i]
            else: min_of_right = min(A[i], B[j])

            return (max_of_left + min_of_right) / 2.0

def random_seq(max_len):
    l = random.randint(0, max_len)
    return list(sorted(random.choice(range(max_len)) for i in range(l)))

def median_slow(A, B):
    C = A + B
    C.sort()
    if not C: raise ValueError()
    m1 = (len(C) - 1) / 2
    m2 = len(C) / 2
    return (C[m1] + C[m2]) / 2.0

for i in range(1000000):
    A = random_seq(10)
    B = random_seq(10)
    if len(A) + len(B) > 0:
        me = median(A, B)
        ma = median_slow(A, B)
        assert me == ma

print "Validated!"

Context

StackExchange Code Review Q#125263, answer score: 3

Revisions (0)

No revisions yet.