HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Sum of prime factors of binomial coefficients over 9000!\${}\$

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
factors9000coefficientsbinomialsumprimeover

Problem

The problem

I have lately been working Project Euler: 231:


The prime factorisation of binomial coefficients


The binomial coefficient \$ ^{10}C_3 = 120 \$.


\$ 120 = 2^3 × 3 × 5 = 2 × 2 × 2 × 3 × 5 \$, and \$ 2 + 2 + 2 + 3 + 5 = 14 \$.


So the sum of the terms in the prime
factorisation of \$^{10}C_3\$ is 14.


Find the sum of the terms in the prime factorisation of \$ ^{20000000}C_{15000000} \$.

Naive approach

Using some built in functions I was able to hack together a brute-force solution fairly quickly:

from primefac import primefac

def slow_factors(n, k):
    total = 0
    nom = range(n - k+1, n+1)
    denom = range(2, k+1)
    for num in nom:
        total += sum(primefac(num))
    for num in denom:
        total -= sum(primefac(num))
    return total


This solution uses that:

\$
^n C_k = \binom{n}{k} = \frac{n!}{k!(n-k)!} = \frac{\overbrace{n \cdot (n-1) \cdots p}^{k \text{ times}}}{k\cdot (k-1) \cdots 2}
\$

For a concrete example of my algorithm:

\$
\binom{10}{3} = \frac{10!}{3!(10-3)!} = \frac{10 \cdot 9 \cdot 8}{3 \cdot 2 \cdot 1}
\$

This is respectively nominator and denominator in my code. From here I simply calculate the prime factorization of every number in the denominator. The answer is this minus the sum of the prime factorization of the denominator. I then realized that:

\$
\binom{20000000}{15000000}
\$

Is incredibly large! Wolfram Alpha says that it is approximately \$2 \approx 10^{4884377}\$, far bigger than \$9000! \approx 10^{31681}\$. So I was in need of a smarter solution.

Some improvements

The first improvements uses that:

\$
\binom{n}{k} = \binom{n}{n-k}
\$

We compute values \$n \cdot (n-1) \cdots\$ until \$k\$. However because of the symmetry we could also compute \$n \cdot (n-1) \cdots\$ until \$n-k\$. We choose based on which gives us the fewest values to compute:

k = min(k, n - k)


It's also inefficient because it calculates the prime factorization of each number separately

Solution

The strategy in the post is to iterate over all the numbers that are multiplied together in the numerator and denominator of the binomial coefficient, and factorize each one. But this is very time-consuming, because no-one knows an efficient algorithm for factorization. We would much rather work out the factorization in some other way.

A binomial coefficient is computed using factorials: $${n \choose k} = {n! \over k!(n-k)!}$$ So let's have a look at some factorizations of small factorials: $$\eqalign{2! &= 2 \\ 3! &= 2·3 \\ 4! &= 2^3·3 \\ 5! &= 2^3·3·5 \\ 6! &= 2^4·3^2·5 }$$ It should be clear that the factorization of \$n!\$ includes every prime \$≤n\$. So instead of iterating over the numbers up to \$n\$ and trying to factorize them, we can iterate over the primes \$p ≤ n\$ and count how many times each one appears in the factorization. We can iterate over the primes efficiently using a sieve. (I'm sure you've implemented several sieves if you've got as far as problem 231, but see this answer for some suggested implementations.)

So how do we compute the number of times each prime appears in the factorization of a factorial? Let's have a look at a concrete example: how many times does 2 appear in the factorization of 10 factorial? Well, every even number contributes a 2: $$ \mathbf{10}·9·\mathbf{8}·7·\mathbf{6}·5·\mathbf{4}·3·\mathbf{2}·1 $$ Every multiple of 4 contributes an extra 2: $$ 10·9·\mathbf{8}·7·6·5·\mathbf{4}·3·2·1 $$ And every multiple of 8 contributes a third 2: $$ 10·9·\mathbf{8}·7·6·5·4·3·2·1 $$ So in the factorization of \$10!\$, the prime 2 appears $$ \eqalign{ \nu_2(10!) &= \bigg\lfloor {10 \over 2} \bigg\rfloor + \bigg\lfloor {10 \over 4} \bigg\rfloor + \bigg\lfloor {10 \over 8} \bigg\rfloor \\ &= 5 + 2 + 1 \\ &= 8 } $$ times. In general, the power of \$p\$ in the factorization of \$n!\$ is $$ \nu_p(n!) = \bigg\lfloor {n \over p} \bigg\rfloor + \bigg\lfloor {n \over p^2} \bigg\rfloor + \bigg\lfloor {n \over p^3} \bigg\rfloor + \dots $$ This gives us an effective way to compute the power of a prime in a factorial:

def power_in_factorial(p, n):
    """Return the exponent of the prime p in the factorization of n!"""
    result = 0
    while True:
        n //= p
        if not n:
            break
        result += n
    return result


Now, the sum of the factors of \$n!\$ is $$ \sum_{p \text{ prime}} p · \nu_p(n!) $$ where \$p\$ ranges over the primes \$≤n\$, and so the sum of the factors of \$n \choose k\$ is $$ \sum_{p \text{ prime}} p \left( \nu_p(n!) - \nu_p(k!) - \nu_p((n - k)!) \right) $$ That is:

def euler231(n, k):
    """Return the sum of the terms in the prime factorisation of n choose k."""
    f = power_in_factorial
    return sum(p * (f(p, n) - f(p, k) - f(p, n - k)) for p in primes(n + 1))


Using sieve8 from here to implement primes, this takes about 5 seconds.

But so long as we're using NumPy, why not vectorize the whole computation?

def sum_factorial_factors(primes, n):
    """Return the sum of terms in the prime factorization of n!"""
    p = primes
    n = np.full_like(p, n)
    result = r = np.zeros_like(p)
    while True:
        n //= p
        if n[-1] == 0:       # some primes not contributing any more?
            l = n.argmin()   # number of primes still contributing
            if l == 0:
                break
            n, r, p = n[:l], r[:l], p[:l]
        r += n
    result *= primes
    return result.sum()

def euler231(n, k):
    """Return the sum of the terms in the prime factorisation of n choose k."""
    f = sum_factorial_factors
    p = primes(n + 1)
    return f(p, n) - f(p, k) - f(p, n - k)


This takes about half a second.

Code Snippets

def power_in_factorial(p, n):
    """Return the exponent of the prime p in the factorization of n!"""
    result = 0
    while True:
        n //= p
        if not n:
            break
        result += n
    return result
def euler231(n, k):
    """Return the sum of the terms in the prime factorisation of n choose k."""
    f = power_in_factorial
    return sum(p * (f(p, n) - f(p, k) - f(p, n - k)) for p in primes(n + 1))
def sum_factorial_factors(primes, n):
    """Return the sum of terms in the prime factorization of n!"""
    p = primes
    n = np.full_like(p, n)
    result = r = np.zeros_like(p)
    while True:
        n //= p
        if n[-1] == 0:       # some primes not contributing any more?
            l = n.argmin()   # number of primes still contributing
            if l == 0:
                break
            n, r, p = n[:l], r[:l], p[:l]
        r += n
    result *= primes
    return result.sum()

def euler231(n, k):
    """Return the sum of the terms in the prime factorisation of n choose k."""
    f = sum_factorial_factors
    p = primes(n + 1)
    return f(p, n) - f(p, k) - f(p, n - k)

Context

StackExchange Code Review Q#132387, answer score: 6

Revisions (0)

No revisions yet.