HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

XOR of a list of a numbers

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
xorlistnumbers

Problem

In preparing a solution for the CodeChef "Will India Win" challenge, I am trying to find out the xor of list of numbers.

The first line of input contains the number of test cases, 1 ≤ T ≤ 106.

Subsequent lines contain fifteen space-delimited numbers (each between 1 and 231), to be XORed together.

def main():
    t = int(raw_input())
    for i in range(t):

        input_list = [int(j) for j in raw_input().split()]
        res = input_list[0]
        for n in range(len(input_list)-1):
            res = res^input_list[n+1]
        print res

if __name__ == '__main__':
    main()


Here is my second code by implementing a 2D array through a dictionary:

def main():
    t = int(raw_input())
    for i in range(t):
        input_str = [ int(j) for j in raw_input().split()]
        d = {}
        for j in range(15):
            d[j] = bin(input_str[j])[2:].zfill(32)

        l = []#contains the final output
        for j in range(32):
            temp = 0
            for k in range(15):
                if d[k][j] == '1':
                    temp += 1

            if temp%2 == 0:
                l.append(0)
            else:
                l.append(1)
if  __name__ == '__main__':
    name()


But it turns out that both code are not optimal because I am getting time constraint problems. How can I optimize the code?

Solution

As the input size is 106 and you're using Python 2 the first issue in that you're creating an unnecessary list of 106 integers just to do a loop.

Instead of that you should use xrange() which yields integers lazily:

>>> %timeit for _ in range(10**6): pass
10 loops, best of 3: 24.1 ms per loop
>>> %timeit for _ in xrange(10**6): pass
f100 loops, best of 3: 8.92 ms per loop


Another slight micro-optimization that you can do here is to use itertools.repeat with None. None is a singleton, so, only a single None ever exists in memory and it is the smallest object in CPython on the other hand creating 10**6 integers is expensive(well CPython caches some of them, but still they are unnecessary here)

>>> %timeit for _ in repeat(None, 10**6): pass
100 loops, best of 3: 7 ms per loop


As we're using the functions like raw_input, int multiple times in our code it's better to cache them as local variables, because otherwise we're looking for them at least 10**6 times in global dictionary. One way to cache them is to use them as default values to function attributes:

>>> def f_simple(n):
    for _ in xrange(n):
        foo = [int('1') for _ in xrange(15)]
...         
>>> def f_cached(n, int=int):
    for _ in xrange(n):
        foo = [int('1') for _ in xrange(15)]
...         
>>> %timeit f_simple(10**6)
1 loops, best of 3: 3.7 s per loop
>>> %timeit f_cached(10**6)
1 loops, best of 3: 3.52 s per loop


Instead of starting the result value with input_list[0] we can simply start with 0 and we can also prevent creation of that simple 15 items list:

result = 0
for x in raw_input().split():
    result ^= int(x)


Currently your first solution takes around 7.44 seconds on my system and my solution takes around 5.6 seconds, not a huge improvement.:

from itertools import islice
from functools import partial
import sys

def main5(int=int):
    t = int(raw_input())
    # Take a slice of sys.stdin of size (10**6)*2
    lines = islice(sys.stdin, t*2)
    # Now lines is an iterator which is going to yield one
    # line at a time, but we're also going to read the next line(to get X)
    # with each input, so instead of doing `next(lines)` each time in
    # the loop we can create a partial function. 
    next_line = partial(next, lines)

    for line in lines:
        result = 0
        for x in line.split():
            result ^= int(x)
        if format(result, 'b').zfill(32).count('1') > int(next_line()):
            print 'YES'
        else:
            print 'NO'


Note that in the above solution we are writing to the stdout instantly, if we can store the output temporarily in a list(say 1000 items) and them write them at once then the above solution takes 5.52 seconds:

from itertools import islice
from functools import partial
import sys

def main6(int=int, len=len):
    t = int(raw_input())
    # store a reference sys.stdout.write to prevent 2 attribute lookups
    stdout_write = sys.stdout.write
    lines = islice(sys.stdin, t*2)
    next_line = partial(next, lines)
    out = []
    # Cache out.append to prevent attribute lookup
    out_append = out.append
    for line in lines:
        result = 0
        for x in line.split():
            result ^= int(x)
        if format(result, 'b').zfill(32).count('1') > int(next_line()):
            result = 'YES'
        else:
            result = 'NO'

        out_append(result)
        if len(out) == 1000:
            stdout_write('\n'.join(out))
            # empty out list; use list.clear() in Python 3
            del out[:]
    if out:
        stdout_write('\n'.join(out))

Code Snippets

>>> %timeit for _ in range(10**6): pass
10 loops, best of 3: 24.1 ms per loop
>>> %timeit for _ in xrange(10**6): pass
f100 loops, best of 3: 8.92 ms per loop
>>> %timeit for _ in repeat(None, 10**6): pass
100 loops, best of 3: 7 ms per loop
>>> def f_simple(n):
    for _ in xrange(n):
        foo = [int('1') for _ in xrange(15)]
...         
>>> def f_cached(n, int=int):
    for _ in xrange(n):
        foo = [int('1') for _ in xrange(15)]
...         
>>> %timeit f_simple(10**6)
1 loops, best of 3: 3.7 s per loop
>>> %timeit f_cached(10**6)
1 loops, best of 3: 3.52 s per loop
result = 0
for x in raw_input().split():
    result ^= int(x)
from itertools import islice
from functools import partial
import sys

def main5(int=int):
    t = int(raw_input())
    # Take a slice of sys.stdin of size (10**6)*2
    lines = islice(sys.stdin, t*2)
    # Now lines is an iterator which is going to yield one
    # line at a time, but we're also going to read the next line(to get X)
    # with each input, so instead of doing `next(lines)` each time in
    # the loop we can create a partial function. 
    next_line = partial(next, lines)

    for line in lines:
        result = 0
        for x in line.split():
            result ^= int(x)
        if format(result, 'b').zfill(32).count('1') > int(next_line()):
            print 'YES'
        else:
            print 'NO'

Context

StackExchange Code Review Q#78660, answer score: 6

Revisions (0)

No revisions yet.