HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Counting pairs of relatively prime numbers

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
countingnumbersrelativelyprimepairs

Problem

Problem from Hacker Earth:


Inverted GCD:


Given an array a of \$N\$ numbers , you have to find the number of pair of indices \$i\$ and \$j\$ that satisfy the following relation:



  • \$i



  • \$a_i > a_j\$



  • gcd\$(a_i , a_j) = 1\$





Input:


The first line of the input contains a single integer N - denoting the size of the array.

The next line contains N space separated integers denoting the array a.


Output:

Output a single integer - the answer to the problem stated above.


Constraints:

\$1 \le N \le 10^5\$

\$1 \le a_i \le 10^5\$

For this problem I've submitted the following solution in Python:

from fractions import gcd
n = int(input())
count = 0;
a = [int(x) for x in input().split()]
for i in range(n):
    for j in range(n):
        if(a[i]>a[j] and i<j and gcd(a[i] , a[j]) == 1):
            count +=1;
print(count)


I'm not sure what is the complexity of gcd() function in Python, but it clear due to two for loop that the time complexity is minimum \$O(n^2)\$.I can't understand how to reduce it. Because here we have to check these conditions between every pair of integers. I've seen such question here but there is no condition for gcd(), so can't relate these two questions. I'm newbie to Python.

Solution

tl;dr TLE beyond input size 40000 to 50000 (10 seconds for input size 100000)

Conditions (1) and (2) are transitive but independent and one-sided. This means they can only prune half of all candidates on average each, and so their combination only reduces the average candidate set to one fourth (i.e. half of a half). Such a small constant factor is not worth it to invest in (as in 'devising a special data structure'), unless overall performance is already close to a passing grade. Asymptotic performance doesn't change at all, of course. Hence it is best to take the doubling in performance that comes from index pruning (condition (2), which is essentially free) and test condition (1) as pairs are processed.

Condition (3) induces a squarish relation all on its own already, and it is not transitive. This means that there is no efficiency gain from precomputing this relation, and memoisation is likely to be completely ineffective in the face of random inputs.

The evaluation of the GCD for a pair of numbers takes time logarithmic in the smaller of the two inputs. This time can be cut by a considerable constant factor with a bit of precomputation, by computing integer-sized 'bit vectors' representing the presence of a handful of the smallest primes in the (radical) factorisation of a number.

This way a simple binary AND can prove the absence of shared small factors and thus avoid the computation of the actual GCD in about 89% of all cases (assuming factorisation by the first 32 primes, i.e. up to 131). In the remaining 11%, only the factorisation 'remainders' need to be GCDed, i.e. what remains after dividing out the small factors/radicals represented by the bit vectors. This saves some iterations of Euclid's algorithm.

This can give you an overall speed boost of an order of magnitude or more. On top of the factor 2 speed gain from index pruning mentioned by coderodde and some general efficiency tuning this might just be enough to beat the time limit...

Note: one might be tempted to extend the 'bit vectors' from 32 bits to 64 bits to utilise the bigger machine words on 64-bit machines, but this is not worth it. It doubles the up-front factorisation effort but it improves the rejection rate only from 88.7% to 90.3%, i.e. by a completely negligible 1.8%.

Here's a small table with the pass/reject rates for bit masks involving the given number of the first primes:

8 primes: pass 17,102%, rej. 82,898%                 # up to 4 extra radicals
12 primes: pass 14,872%, rej. 85,128% // boost 2,690% # up to 3 extra radicals
16 primes: pass 13,609%, rej. 86,391% // boost 1.484%
32 primes: pass 11,300%, rej. 88,700% // boost 2,673%
64 primes: pass  9,683%, rej. 90,317% // boost 1,823% # up to 1 extra radical


The 'up to X extra radicals' bit says how many different prime factors can remain in the factorisations of numbers not exceeding 100,000 after the the given number of the first small primes has been factored out.

P.S.: the bounds on the number of numbers is the same as the bound on the size of the numbers, which means that the factorisations can be pre-computed wholesale without any divisions (by borrowing the logic behind the Sieve of Eratosthenes). The partial factoring for the bit vectors takes less than a millisecond, and factoring the input remainders takes less than 10 milliseconds for 100,000 inputs. Hence the stuff can always be computed wholesale and it isn't even worth it to have different code paths for small and big input batches.

--- UPDATE ---

For getting a better handle on the problem I needed a closer fix on some numbers, and so I coded things up in C++. It turned out that computing the gcd of the remainders (after factoring out the small primes used in the 'bit vectors') is orders of magnitude too slow, even though it occurs only for 0.35% of all candidates. This necessitated replacing the partial factorisation remainders with lists of the (up to two) remaining radicals.

'Radical' Optimisation

For numbers less than 107113 = 43 47 53, factoring out the 13 smallest primes leaves at most two distinct prime factors ('radicals'). Moreover, only factors up to half the limit need be considered since the only numbers with bigger prime factors are the primes, which are coprime to all numbers but themselves and self-pairings get excluded via criteria (1) and (2) both. This means that 2 x 16 bits = 32 bits are sufficient for storing the 'list' of remaining radicals for a number.

This size reduction is beneficial for L1 cache utilisation and it would allow to stuff both radicals into a single 32-bit integer; however, it collides with an important optimisation of the intersection operation. This optimisation is the reduction of branchings to the absolute minimum of two to three comparisons, by always treating the two-element lists as full and by replacing all but one of the remaining non-comparison branches with arithmetic (poor man's CMOV). This makes things easier for the CPU's branch

Code Snippets

8 primes: pass 17,102%, rej. 82,898%                 # up to 4 extra radicals
12 primes: pass 14,872%, rej. 85,128% // boost 2,690% # up to 3 extra radicals
16 primes: pass 13,609%, rej. 86,391% // boost 1.484%
32 primes: pass 11,300%, rej. 88,700% // boost 2,673%
64 primes: pass  9,683%, rej. 90,317% // boost 1,823% # up to 1 extra radical
uint16_t sieve[1 + LIMIT];  // bit masks of partial factorisations of 1 .. LIMIT
uint16_t v[1 + LIMIT];      // v[i] = sieve[a[i]]
uint32_t f[1 + LIMIT][3];   // remaining factors (radicals) for each input (3rd is sentinel)
uint8_t skip[1 + LIMIT];    // for each a[j]: how many successors are not less
int32_t d = int32_t(f[i][0] - f[j][0]);

                    if (d > 0)
                    {
                        d = int32_t(f[i][0] - f[j][1]);

                        result += d > 0 || d && f[i][1] != f[j][1];
                    }
                    else if (d != 0)
                    {
                        d = int32_t(f[i][1] - f[j][0]);

                        result += d < 0 || d && f[i][1] != f[j][1];
                    }
long count_inversions_m1 (List<int> values)
{
    long inversions = 0;
    int n = values.Count;
    var a = values.ToArray();
    var b = new int[n];

    for (int w = 1; w < n; w <<= 1)
    {
        for (int beg = 0; beg < n; beg += w + w)
        {
            int mid = Math.Min(beg + w, n);
            int end = Math.Min(mid + w, n);

            for (int i = beg, j = mid, k = beg; k < end; ++k)
            {
                if (i < mid && (j >= end || a[i] <= a[j]))
                {
                    b[k] = a[i++];
                }
                else // a[k] > a[j] for all k in [i .. mid - 1]
                {
                    inversions += mid - i;
                    b[k] = a[j++];
                }
            }
        }

        var t = a;
        a = b;
        b = t;
    }

    return inversions;
}
inversions += mid - i;

Context

StackExchange Code Review Q#133365, answer score: 3

Revisions (0)

No revisions yet.