patternMinor
generating pi using Machin like formula
Viewed 0 times
likegeneratingmachinusingformula
Problem
Back in 2002, using this arctan formula for pi discoverd by Stoemer:
$$ \pi = 176 \arctan(1/57) + 28 \arctan(1/239) - 48 \arctan(1/682) \\ +
96 \arctan(1/12943)$$
and I assume using this series for arctan:
$$ \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \dotso $$
$\pi$ was generated to 1.2 trillion digits on a super computer, using 64 nodes of a HITACHI SR8000/MPP, in about 157 hours (it took 400 hours for another 4 term formula). Link to web site announcing the result:
https://web.archive.org/web/20150225043658/http://www.super-computing.org/pi_current.html
I assume each initial term was 480GB, which translates to 1,030,792,151,040 hexadecimal digits, corresponding to the articles stated accurate size of 1,030,700,000,000 hexadecimal digits. Since each node had 16GB of ram, this meant the numbers had to be streamed on and off hard drives. The terms decrease in size with each iteration, but only a tiny fraction of the total time will be spent with terms that fit in each node's ram.
Using the $\arctan$ series seems like it should be fairly straightforward, basically $term[i+1] = term[i] \cdot (1/z^2)$, where $z$ is 57, 239, 682, or 12943. The terms are combined (adding and subtracting), then divided by $(2i+1)$.
The terms are kept as large vectors (fixed point array), while the divisors are scalars (relatively small fixed number of bits), so the division is similar to long hand division where a multi-digit dividend is divided by a single digit divisor, with the quotients being generated by multiply + shift to speed up the process, explained next.
Dividing by a constant can be sped up using multiply by "magic" number and right shift some number of bits. For the sequence of divisors, 1, 3, 5, 7, ..., the "magic" numbers are generated dynamically (requires two actual divides), since the same divisor is being used across a large vector term.
A "magic" number $M$ is in the range:
$$\frac{2^{N+L}}{divisor} \leq M \leq
$$ \pi = 176 \arctan(1/57) + 28 \arctan(1/239) - 48 \arctan(1/682) \\ +
96 \arctan(1/12943)$$
and I assume using this series for arctan:
$$ \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \dotso $$
$\pi$ was generated to 1.2 trillion digits on a super computer, using 64 nodes of a HITACHI SR8000/MPP, in about 157 hours (it took 400 hours for another 4 term formula). Link to web site announcing the result:
https://web.archive.org/web/20150225043658/http://www.super-computing.org/pi_current.html
I assume each initial term was 480GB, which translates to 1,030,792,151,040 hexadecimal digits, corresponding to the articles stated accurate size of 1,030,700,000,000 hexadecimal digits. Since each node had 16GB of ram, this meant the numbers had to be streamed on and off hard drives. The terms decrease in size with each iteration, but only a tiny fraction of the total time will be spent with terms that fit in each node's ram.
Using the $\arctan$ series seems like it should be fairly straightforward, basically $term[i+1] = term[i] \cdot (1/z^2)$, where $z$ is 57, 239, 682, or 12943. The terms are combined (adding and subtracting), then divided by $(2i+1)$.
The terms are kept as large vectors (fixed point array), while the divisors are scalars (relatively small fixed number of bits), so the division is similar to long hand division where a multi-digit dividend is divided by a single digit divisor, with the quotients being generated by multiply + shift to speed up the process, explained next.
Dividing by a constant can be sped up using multiply by "magic" number and right shift some number of bits. For the sequence of divisors, 1, 3, 5, 7, ..., the "magic" numbers are generated dynamically (requires two actual divides), since the same divisor is being used across a large vector term.
A "magic" number $M$ is in the range:
$$\frac{2^{N+L}}{divisor} \leq M \leq
Solution
I cannot remember where I learned the trick for doing this. It doesn't seem very well publicised. It may have been from a Richard Brent paper (seems likely). Nonetheless, here goes...
There are a few principles which are helpful to remember if you're working with extremely large integers:
Large integer division is typically done using an iterative algorithm (e.g. Newton-Raphson), so you want to avoid that where possible. What this means in practice is that it is probably cheaper to calculate a numerator and denominator separately, and then conclude with a final division. This appears to be standard practice for evaluating Machin-like formulae.
Large integer multiplication is typically done using a FFT-based algorithm (e.g. Schönhage–Strassen, Fürer) for large enough integers, which has better complexity if the number of bits in each of the multiplicands is approximately the same. This suggests a divide-and-conquer algorithm for evaluating numbers which are essentially large products.
The basic idea is this. Suppose you want to calculate the series:
$$\sum_{i=0}^{n} \frac{P(i)}{Q(i)}$$
You can calculate the two half-series:
$$\frac{P_1}{Q_1} = \sum_{i=0}^{\left\lfloor n/2\right\rfloor} \frac{P(i)}{Q(i)}$$
$$\frac{P_2}{Q_2} = \sum_{i=\left\lfloor n/2\right\rfloor +1}^{n} \frac{P(i)}{Q(i)}$$
Then:
$$\frac{P}{Q} = \frac{P_1 Q_2 + P_2 Q_1}{Q_1 Q_2}$$
It should be obvious how to turn this into a recursive algorithm.
That's the general approach, but once you're thinking along these lines, there are some obvious optimisations that you can do with the arctan series in particular:
$$\arctan\left(\frac{1}{x}\right) = \frac{1}{x} - \frac{1}{3x^3} + \frac{1}{5x^5} - \frac{1}{7x^7} \cdots$$
The numerators are all one, and the denominators have a structure that you can exploit if you're careful. I'll leave this for you to discover.
EDIT Small test program in Haskell, didn't try to optimise it.
And running it:
Do remember that the 75,000 lines of code mentioned probably includes large slabs of what you'd typically find in a multiprecision integer library. In a real implementation, the final division steps would involve an iterative algorithm with increasing precision on each iteration.
By the way, another little hint as to how to speed this up: Dynamic programming.
There are a few principles which are helpful to remember if you're working with extremely large integers:
- Avoid division of a large integer by a large integer.
- Try to multiply integers of similar magnitudes.
Large integer division is typically done using an iterative algorithm (e.g. Newton-Raphson), so you want to avoid that where possible. What this means in practice is that it is probably cheaper to calculate a numerator and denominator separately, and then conclude with a final division. This appears to be standard practice for evaluating Machin-like formulae.
Large integer multiplication is typically done using a FFT-based algorithm (e.g. Schönhage–Strassen, Fürer) for large enough integers, which has better complexity if the number of bits in each of the multiplicands is approximately the same. This suggests a divide-and-conquer algorithm for evaluating numbers which are essentially large products.
The basic idea is this. Suppose you want to calculate the series:
$$\sum_{i=0}^{n} \frac{P(i)}{Q(i)}$$
You can calculate the two half-series:
$$\frac{P_1}{Q_1} = \sum_{i=0}^{\left\lfloor n/2\right\rfloor} \frac{P(i)}{Q(i)}$$
$$\frac{P_2}{Q_2} = \sum_{i=\left\lfloor n/2\right\rfloor +1}^{n} \frac{P(i)}{Q(i)}$$
Then:
$$\frac{P}{Q} = \frac{P_1 Q_2 + P_2 Q_1}{Q_1 Q_2}$$
It should be obvious how to turn this into a recursive algorithm.
That's the general approach, but once you're thinking along these lines, there are some obvious optimisations that you can do with the arctan series in particular:
$$\arctan\left(\frac{1}{x}\right) = \frac{1}{x} - \frac{1}{3x^3} + \frac{1}{5x^5} - \frac{1}{7x^7} \cdots$$
The numerators are all one, and the denominators have a structure that you can exploit if you're careful. I'll leave this for you to discover.
EDIT Small test program in Haskell, didn't try to optimise it.
module MachinLike where
import Data.Ratio
arctanSeries :: Int -> Int -> Int -> (Integer,Integer)
arctanSeries x i j
| i == j = let c = 2*i+1
in (if even i then 1 else -1,
fromIntegral c * (fromIntegral x)^c)
| otherwise = let m = (i+j) `div` 2
(p1,q1) = arctanSeries x i m
(p2,q2) = arctanSeries x (m+1) j
in (p1*q2 + p2*q1, q1*q2)
-- I'm just guessing on the appropriate number of terms in
-- each series. Proper numeric analysis is needed.
--
-- Also, in case it's not obvious, this isn't the final step
-- that you'd do if you were serious about high accuracy.
approxPi :: Int -> Double
approxPi n
= let (p1,q1) = arctanSeries 57 0 n
(p2,q2) = arctanSeries 239 0 (n `div` 4)
(p3,q3) = arctanSeries 682 0 (n `div` 11)
(p4,q4) = arctanSeries 12943 0 (n `div` 227)
in fromRational ((176 * p1)%q1)
+ fromRational ((28 * p2)%q2)
- fromRational ((48 * p3)%q3)
+ fromRational ((96 * p4)%q4)And running it:
*MachinLike> :set +s
*MachinLike> approxPi 500
3.141592653589793
(0.28 secs, 799,354,048 bytes)Do remember that the 75,000 lines of code mentioned probably includes large slabs of what you'd typically find in a multiprecision integer library. In a real implementation, the final division steps would involve an iterative algorithm with increasing precision on each iteration.
By the way, another little hint as to how to speed this up: Dynamic programming.
Code Snippets
module MachinLike where
import Data.Ratio
arctanSeries :: Int -> Int -> Int -> (Integer,Integer)
arctanSeries x i j
| i == j = let c = 2*i+1
in (if even i then 1 else -1,
fromIntegral c * (fromIntegral x)^c)
| otherwise = let m = (i+j) `div` 2
(p1,q1) = arctanSeries x i m
(p2,q2) = arctanSeries x (m+1) j
in (p1*q2 + p2*q1, q1*q2)
-- I'm just guessing on the appropriate number of terms in
-- each series. Proper numeric analysis is needed.
--
-- Also, in case it's not obvious, this isn't the final step
-- that you'd do if you were serious about high accuracy.
approxPi :: Int -> Double
approxPi n
= let (p1,q1) = arctanSeries 57 0 n
(p2,q2) = arctanSeries 239 0 (n `div` 4)
(p3,q3) = arctanSeries 682 0 (n `div` 11)
(p4,q4) = arctanSeries 12943 0 (n `div` 227)
in fromRational ((176 * p1)%q1)
+ fromRational ((28 * p2)%q2)
- fromRational ((48 * p3)%q3)
+ fromRational ((96 * p4)%q4)*MachinLike> :set +s
*MachinLike> approxPi 500
3.141592653589793
(0.28 secs, 799,354,048 bytes)Context
StackExchange Computer Science Q#68790, answer score: 2
Revisions (0)
No revisions yet.