HiveBrain v1.2.0

Sum of divisors summatory function with Eratosthenes' sieve

Submitted by: @import:stackexchange-cs

Problem

I ran across the following problem from an online problem bank: there are up to $10^5$ queries, each of which asks to compute the sum
$$\sum_{k = L}^{R} \sigma(k)$$
where $\sigma(k)$ is the sum of divisors of $k$. It is given that $1 \leq L \leq R \leq 5\cdot 10^6$.

My solution (described below) is based on Eratosthenes' sieve. I've implemented it in C++ and it runs in about $0.9$ seconds on average, which is too slow. I know that this problem can be solved at least twice as fast, but I don't know how.

So here is my solution (arrays are 0-based):

M = 5 * 1e6
A = array of zeroes of size M + 1
for (k = 1; k <= M; k += 1)
    for (j = k; j <= M; j += k)
        A[j] += k


I precalculate $\sigma(k)$ via an Eratosthenes-style sieve for every $k$ up to the maximum possible value. When the sieve finishes, $A[k]$ holds $\sigma(k)$. Then I replace $A[k]$ with the prefix sum $\sum_{i=1}^{k}\sigma(i)$. After this preprocessing, every query can be answered in $O(1)$ time as $A[R] - A[L-1]$.

How can I make it faster? I know two formulas:
$$ (a) ~~~~~ \sigma(p_{1}^{a_1} \cdots p_{s}^{a_s}) = \prod_{i=1}^{s} \frac{p_{i}^{a_i + 1} - 1}{p_{i} - 1}$$
$$ (b) ~~~~~ \sum_{k=1}^{n} \sigma(k) = \sum_{k=1}^{n} k \left \lfloor \frac{n}{k} \right \rfloor$$

The problem with (a) is that computing it (at least in my implementation) is slower than the sieve given above. The problem with (b) is that I don't understand how to use it to compute the prefix sums faster than in $O(n^2)$ total time.

Is there a more efficient algorithm for this problem?

(The problem bank credits the original source of the problem as 2012 Kharkiv, Winter School, Day of Sergey Kopelovich, Problem H.)

Solution

This isn't really computer science...

You create a table $d$ where you store the sum of the divisors of $k$, for $k = 1$ to $M$, where $M = 5 \cdot 10^6$. That's the time-critical part. Then you create a table $s$ where you store the sum of divisors of all $1 \leq j \leq k$, for $k = 1$ to $M$. That's easy: $s_0 = 0$, $s_{k+1} = s_k + d_{k+1}$. And then $f(L, R) = s_R - s_{L-1}$.

The first table is the problem. You handle this in $O (n \log n)$. And you only need a factor two, you say...

You will have an array A with 5 million entries, likely 4 bytes per entry = 20 megabytes. On a typical processor that you would have in your home computer, 20 megabytes doesn't fit into any cache. And your code does lots of accesses to elements of that array in quasi-random order: for each potential divisor k, you visit all numbers that are divisible by k and increase their sum of divisors by k.

Let's do that with fewer visits: when you visit j, which is divisible by k, add the two divisors k and j/k at once. But when you do that, start with $j = k^2$, adding only k (because there k = j/k, and you don't want to count that divisor twice), and then add k + j/k for each further j. You don't need to divide, because j/k will equal k+1, k+2, k+3, etc. We initialise the array for the divisor pair (1, j), that is, setting A[j] = 1 + j for j ≥ 2.

A [1] = 1
for (j = 2; j ≤ M; j += 1)
    A [j] = 1 + j

for (k = 2; k*k ≤ M; k += 1)
    j = k*k
    A [j] += k
    j += k
    s = k + (k + 1)
    while j ≤ M
        A [j] += s
        j += k
        s += 1 // s equals k + j / k


You don't save operations. However, you now access the array A in a much more regular pattern, so this will save you time because access to the items will be faster. Also, k is smaller (at most $\sqrt M$), making the number of inner-loop iterations for each k larger, which will make branch prediction work better.

For more improvement, you would find out how many array items fit into the processor cache in your computer, then perform the whole code for subranges of the array only (for example changing only A [0] to A [99999], then changing A [100000] to A [199999], and so on). That way, most memory accesses will only access cache memory, which might be substantially faster.

You are doing N lookups in a table of size M. If M is substantially larger than N, then you should probably think about approaches that don't build this table, and that may be a lot slower per lookup, but faster overall due to the small number of lookups. Even in the case here where N ≤ 100,000 and M = 5,000,000, you might for example not count divisors 1, 2, 3, 4, j/1, j/2, j/3, j/4 in the table (which makes it a bit faster to build), and handle that during the lookup.

Or you could add the sum of the divisors for odd numbers only, then calculate the sum of divisors for even numbers (if the sum of the divisors of an odd k is s, then the sum for 2k is 3s, for 4k it is 7s, for 8k it is 15s etc), which would save almost a factor 2.

PS. I measured it... making the divisor-sum sieve more cache-friendly by adding both k and j/k doubled the speed. Calculating the sum of divisors for odd k first, then deriving even k from the odd values, makes it a total of 7 times faster. Obviously all just constant factors.


Context

StackExchange Computer Science Q#55698, answer score: 6
