debugMinor

Floating Point Systems - Rounding Error in Taylor series

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

numerical-algorithms cs stackoverflow error-estimation floating-point arithmetic

errorpointtaylorfloatingsystemsseriesrounding

Problem

I have a question about rounding error. I have created a function to approximate exp(x) by summing the terms of its taylor series until the sum stops changing. (That is, 1 + x + x^2/2! ...).

This function works well for seemingly any positive value of x that i have tried, but when i try for instance -30, it does not give me anything even close to what exp(-30) gives me.

I am aware of how to solve this problem, but I'm more curious as to why this happens. Does it have something to do with the alternating + and - signs when summing the series? Thank you.

Solution

Yes, that's exactly why: it's due to floating-point roundoff error, due to the alternating signs.

Suppose you have $x=10^{100}$ and $y=10^{100}-1$, and you ask your computer to subtract $x-y$. We know what the correct answer should be: it should be $1$. However, when $x$ and $y$ are represented as floating-point numbers, you don't get $1$ as the answer. Floating-point representation is not exact for large numbers, so $10^{100}$ will get rounded off to the nearest number that can be represented in the floating-point representation. As a result, $x$ won't be exactly $10^{100}$; it'll be a bit larger or a bit smaller -- and while the relative error will be very small ($< 0.1\%$), the absolute error might be very large ($x$ might be $10^{100}+\delta$ where $\delta$ is in the billions or much larger). The same is true for $y$. Thus, when you subtract $x-y$, the result won't be exactly $1$; it could be off by an absolute error of billions or much more. As a result, computing $x-y$ in floating-point arithmetic will give you answer that is wildly wrong.

In contrast, if you compute $x+y$ where $x=10^{100}$ and $y=10^{100}-1$, the answer will be reasonable. It won't be exactly correct, but it will be very close to correct.

To put it another way: suppose you compute $x+y$ and $x-y$ where $x \approx y$ and $x,y$ are very large. Then $x+y$ will be reasonably accurate, but $x-y$ will be wildly inaccurate. In particular, suppose that the closest representable floating-point number to $x$ is $\tilde{x}$, and the closest representable floating-point number to $y$ is $\tilde{y}$. We know that floating-point arithmetic has small relative error, so $\tilde{x} = x \times (1+\epsilon_x)$ where $\epsilon_x$ is small (say, $|\epsilon_x| \le 2^{-20}$), and $\tilde{y} = y \times (1+\epsilon_y)$ where $\epsilon_y$ is small. Now

$$\tilde{x} - \tilde{y} = x - y + x \epsilon_x - y \epsilon_y \approx x-y + x \times (\epsilon_x - \epsilon_y),$$

so the absolute error could be as large as $x \times (\epsilon_x- \epsilon-y)$. We know that $\epsilon_x - \epsilon_y$ will be no bigger than $2^{-19}$ or so, but if $x \approx 10^{100}$, their product will be enormous: the absolute error could be as large as $10^{100} \times 2^{-19}$, which is enormous. Since the correct answer is $x-y \approx 0$, the relative error will also be enormous.

In contrast,

$$\tilde{x} + \tilde{y} = x+y x \epsilon_x + y \epsilon_y \approx x+y + x \times (\epsilon_x + \epsilon_y),$$

and here $x \times (\epsilon_x + \epsilon_y)$ might be as large as $10^{100} \times 2^{-19}$ or so, which is enormous. So, the absolute error in this case is also large. However, the relative error is small: $10^{100} \times 2^{-19}$ is quite small compared to the correct answer, $x+y \approx 2 \times 10^{100}$. In particular, the relative error will be on the order of $2^{-19}$ when computing $x+y$ using floating-point math.

So, that's why absolute signs cause problems but the same sign doesn't cause problems.

For further explanation, you might enjoy reading the following article, especially Section 1.4:

What Every Computer Scientist Should Know About Floating-Point Arithmetic. David Goldberg. ACM Computing Surveys, 23:1, March 1991.

Separately, one other thing to know about Taylor series approximations is to have a feeling for the range where they are likely to be accurate. In practice, we always truncate the Taylor series after a finite number of terms -- the full Taylor series is infinite, so we can't sum the entire series; instead, we sum just the first $n$ terms, for some finite $n$.

If you truncate the Taylor series for some function $f(x)$ after $n$ terms, you get an approximation to $f(x)$. How good an approximation is this? There are various bounds on the error of this approximation. One bound is

$$\text{error} \le {|x-a|^{n+1} \over (n+1)!} \times \max \{f^{(n+1)}(v) : a \le v \le x\},$$

where we're using the first $n$ terms of the Taylor series approximation around $a$, and where we evaluate this at $x$ for some $x\ge a$. So, if you can upper-bound the value of the $n+1$-th derivative of $f(x)$, you can get a bound on how good an approximation the truncated Taylor series will be. You can read more about this at https://en.wikipedia.org/wiki/Taylor_series#Approximation_and_convergence and http://norsemathology.org/wiki/index.php?title=Taylor_Series_as_Approximations and https://en.wikipedia.org/wiki/Taylor%27s_theorem#Example.

In general, when you use a Taylor series approximation centered around $a$, you usually want to choose $a$ to be near $x$: the closer it is to $x$, the more accurate the approximation will be. Also, the more terms in your sum, the more accurate the approximation will be.

Once you have an upper-bound for the quality of the Taylor series approximation, you can now figure out how many terms you need to sum. Note that this analysis assumes infinite-precision numbers and ignores the issues due to floating-point round-o

Context

StackExchange Computer Science Q#64461, answer score: 4

Revisions (0)

No revisions yet.