HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Looping and branching with Algorithmic Differentiation

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
loopingdifferentiationwithbranchingandalgorithmic

Problem

Algorithmic (aka Automatic) Differentiation is a wonderful technique for numerical computation of derivatives. I understand how it relates to the fact that we know how to deal with every elementary operation in a computer program, but I am not sure to get how this applies to every computer program.

To quote from this wikipedia page:


every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations

.. which I totally agree with. However, it sounds then that the number $m$ of variables newly produced from any $n$ initial input is fixed and can be determined from static code analysis. But this is not straightforward to me since constructs like:

if x_1 > x_2:                         # branching
    perform 4 elementary operations
else:
    perform 84 elementary operations
endif


and:

while x_1 < x_2:                      # looping
    perform 2 elementary operations
endwhile


do exist in so-called « complicated » computer programs. This make the number (and the type) of elementary operations not straightforward to compute in advance. And I even suspect it is impossible to gess that in general, right?

Does AD support such branching and looping programs?

Are there extensions of AD adapted to programs that are not just intricate closed-form expressions?

How does AD deal with Turing-completeness?

Solution

AD supports arbitrary computer programs, including branches and loops, but with one caveat: the control flow of the program must not depend on the contents of variables whose derivatives are to be calculated (or variables depending on them). Here is an example:

if x = 3 then 9 else x * x


At close inspection you will recognize that the above is really just an inefficient way of implementing $x*x$. If you evaluate this program at $x = 3$, then the result is the constant $9$. But the derivative of constants is zero, which is obviously not the right answer, which should be $6$.

The reason is that AD will typically only look at the executed branch of your program. It is perfectly ok to have branches on conditions that don't involve numbers, or conditions involving number variables not part of the computational graph for derivative calculations. It's also ok to look at fixed properties of derivative values (e.g. the dimension of a vector). But as soon as you "look" at the contents of a variable that depends on one of the inputs to determine what calculation to perform next, you will "break the chain". AD does basically just that: apply the chain rule.

Users of AD should view programs as "configuring" a fixed computational graph. If running the program with different inputs for which derivatives are requested gives you different computational graphs, AD may not always give the correct result.

Code Snippets

if x = 3 then 9 else x * x

Context

StackExchange Computer Science Q#70615, answer score: 13

Revisions (0)

No revisions yet.