Verifying derivative calculations

This notebook describes the verification of derivative calculations using Taylor remainder convergence testing. A simple time-independent problem is considered, using tlm_adjoint with the Firedrake backend.

The Taylor remainder convergence testing method is described in:

P. E. Farrell, D. A. Ham, S. W. Funke, and M. E. Rognes, ‘Automated derivation of the adjoint of high-level transient finite element programs’, SIAM Journal on Scientific Computing 35(4), pp. C369–C393, 2013, doi: 10.1137/120873558

Forward problem

We consider the solution of a linear time-independent partial differential equation, followed by the calculation of the square of the \(L^2\)-norm of the solution. Non-linearity is introduced by defining the right-hand-side of the problem to be a non-linear function of the control. We assume real spaces and a real build of Firedrake throughout.

Specifically we consider the solution \(u \in V\) of

\[\forall \zeta \in V \qquad \int_\Omega u \zeta + \alpha^2 \int_\Omega \nabla u \cdot \nabla \zeta = \int_\Omega \left( \sin^2 m \right) \zeta,\]

where \(V\) is a real \(P_1\) continuous finite element space defining functions on the domain \(\Omega = \left( 0, 1 \right)^2\), with \(m \in V\). This corresponds to a discretization of the partial differential equation

\[u - \alpha^2 \nabla^2 u = \sin^2 m \quad \text{on } \left( x, y \right) \in \left( 0, 1 \right)^2,\]

subject to boundary conditions \(\nabla u \cdot \hat{n} = 0\) on the boundary, where \(\hat{n}\) is an outward unit normal.

A simple implementation in Firedrake, with \(m = e^x \sin \left( \pi x y \right)\) and \(\alpha = 0.2\), takes the form:

[1]:

from firedrake import *

mesh = UnitSquareMesh(50, 50)
X = SpatialCoordinate(mesh)
space = FunctionSpace(mesh, "Lagrange", 1)
test = TestFunction(space)
trial = TrialFunction(space)

m = Function(space, name="m")
m.interpolate(exp(X[0]) * sin(pi * X[0] * X[1]))

alpha = Constant(0.2)

u = Function(space, name="u")
solve(inner(trial, test) * dx + (alpha ** 2) * inner(grad(trial), grad(test)) * dx
      == inner(sin(m) ** 2, test) * dx, u)

J = assemble(inner(u, u) * dx)

Taylor remainder convergence testing: First order

If we have a functional \(J\) depending on a control \(m\) then we have, given some perturbation direction \(\zeta\), via Taylor expansion,

\[\begin{split}\left| J \left( m + \varepsilon \zeta \right) - J \left( m \right) \right| = O \left( \varepsilon \right), \\\end{split}\]

\[\left| J \left( m + \varepsilon \zeta \right) - J \left( m \right) - \varepsilon \frac{dJ}{dm} \zeta \right| = O \left( \varepsilon^2 \right).\]

That is, \(\zeta\) is some direction in the same space as \(m\), which we choose, and then we control the perturbation amplitude using the scalar \(\varepsilon\). The final term in the second absolute value is a directional derivative, which we can compute using the adjoint.

This leads to a methodology for verifying a derivative computed using the adjoint method:

Choose a direction \(\zeta\).
Choose a number of different values of \(\varepsilon\).
See if we have second order convergence of the second of the above, to zero.

This verifies only the directional derivative with a single direction, but if we wish we can choose a new direction and repeat the test.

We can use the taylor_test function to perform the test for us. By default this generates a pseudorandom direction and chooses a number of values of \(\varepsilon\). It then computes the quantities on the left-hand-sides of the above equations, computes the orders of convergence between consecutive pairs of values for \(\varepsilon\), and displays the results. It returns the minimum order computed for the second case, which in a successful verification should be close to two.

Let’s compute a derivative using the adjoint method, and apply a Taylor remainder convergence test:

[2]:

from firedrake import *
from tlm_adjoint.firedrake import *

import logging
import numpy as np

np.random.seed(33582866)

logger = logging.getLogger("tlm_adjoint")
logger.setLevel(logging.INFO)
root_logger = logging.getLogger()
if len(logger.handlers) == 1:
    if len(root_logger.handlers) == 1:
        root_logger.handlers.pop()
    root_logger.addHandler(logger.handlers.pop())

reset_manager()

mesh = UnitSquareMesh(50, 50)
X = SpatialCoordinate(mesh)
space = FunctionSpace(mesh, "Lagrange", 1)
test = TestFunction(space)
trial = TrialFunction(space)

m = Function(space, name="m")
m.interpolate(exp(X[0]) * sin(pi * X[0] * X[1]))

alpha = Constant(0.2)


def forward(m):
    u = Function(space, name="u")
    solve(inner(trial, test) * dx + (alpha ** 2) * inner(grad(trial), grad(test)) * dx
          == inner(sin(m) ** 2, test) * dx, u)

    J = Functional(name="J")
    J.assign(inner(u, u) * dx)
    return J


start_manager()
J = forward(m)
stop_manager()

dJ_dm = compute_gradient(J, m)

min_order = taylor_test(forward, m, J_val=J.value, dJ=dJ_dm, seed=1.0e-3)
assert min_order > 1.99

Error norms, no adjoint   = [1.83675325e-04 9.18679773e-05 4.59415781e-05 2.29726877e-05
 1.14868187e-05]
Orders,      no adjoint   = [0.99952386 0.99976165 0.99988076 0.99994036]
Error norms, with adjoint = [1.21373327e-07 3.03720082e-08 7.59658226e-09 1.89959271e-09
 4.74953966e-10]
Orders,      with adjoint = [1.9986372  1.99931991 1.99966037 1.99983053]

The key changes here are:

To define a forward function. taylor_test uses this to repeatedly rerun the forward with different values of \(\varepsilon\).
Using seed to control the considered values of \(\varepsilon\). If this is too large then the asymptotic orders of convergence may not be seen. If this is too small then the effect of roundoff or iterative solver tolerances may become too large.
Seeding the NumPy pseudorandom number generator. The pseudorandom direction is generated using numpy.random.random, and we seed the pseudorandom number generator to improve reproducibility. We could alternatively supply a direction using the dM argument.

We see the expected first and second orders of convergence.

Taylor remainder convergence testing: Second order

Including the next order term in the Taylor expansion leads to

\[\left| J \left( m + \varepsilon \zeta \right) - J \left( m \right) - \varepsilon \frac{dJ}{dm} \zeta - \frac{1}{2} \varepsilon^2 \left[ \frac{d}{dm} \left( \frac{dJ}{dm} \zeta \right) \right] \zeta \right| = O \left( \varepsilon^3 \right).\]

Let’s use this approach to test Hessian calculations using a CachedHessian:

[3]:

np.random.seed(19986557)

H = CachedHessian(J)

min_order = taylor_test(forward, m, J_val=J.value, ddJ=H, seed=1.0e-3)
assert min_order > 2.99

Error norms, no adjoint   = [2.10559794e-04 1.05304003e-04 5.26580325e-05 2.63305245e-05
 1.31656394e-05]
Orders,      no adjoint   = [0.9996697  0.99983476 0.99991736 0.99995867]
Error norms, with adjoint = [9.17836045e-11 1.14560228e-11 1.43088134e-12 1.78880799e-13
 2.25105581e-14]
Orders,      with adjoint = [3.00213019 3.00113033 2.99983359 2.99032481]

We now see the expected first and third orders of convergence, although there is a suggestion of roundoff affecting the results for the smallest \(\varepsilon\). Here the first order directional derivative is computed using a tangent-linear calculation, and the Hessian action on \(\zeta\) is computed by applying the adjoint method to the forward and tangent-linear calculations.

Taylor remainder convergence testing: Higher order

We can test the derivative of a directional derivative, if we substitute

\[J \rightarrow K = \frac{dJ}{dm} \zeta_0,\]

with some new direction \(\zeta_0\), which we choose. That is, we use

\[\begin{split}\left| K \left( m + \varepsilon \zeta \right) - K \left( m \right) \right| = O \left( \varepsilon \right), \\\end{split}\]

\[\left| K \left( m + \varepsilon \zeta \right) - K \left( m \right) - \varepsilon \frac{dK}{dm} \zeta \right| = O \left( \varepsilon^2 \right),\]

with

\[K = \frac{dJ}{dm} \zeta_0.\]

The new term

\[\frac{dK}{dm} \zeta\]

can be computed using either a higher order tangent-linear or higher-order adjoint calculation. This generalizes naturally to higher order, by replacing the functional with the directional derivative of a directional derivative.

The function taylor_test_tlm performs such verification tests, considering directional derivatives of a given order, and computing all derivatives using tangent-linear calculations. Each directional derivative requires a new direction to be chosen – by default pseudorandom directions are generated.

Let’s apply this test up to fourth order:

[4]:

np.random.seed(76149511)

for order in range(1, 5):
    min_order = taylor_test_tlm(forward, m, tlm_order=order, seed=1.0e-3)
    assert min_order > 1.99

Error norms, no tangent-linear   = [2.14011610e-04 1.07017190e-04 5.35114451e-05 2.67564355e-05
 1.33783961e-05]
Orders,      no tangent-linear   = [0.99984651 0.99992316 0.99996156 0.99998077]
Error norms, with tangent-linear = [4.55807684e-08 1.14053932e-08 2.85262116e-09 7.13314307e-10
 1.78348453e-10]
Orders,      with tangent-linear = [1.99870907 1.99935611 1.99967835 1.99983921]
Error norms, no tangent-linear   = [1.76713220e-04 8.84698206e-05 4.42631569e-05 2.21386331e-05
 1.10710793e-05]
Orders,      no tangent-linear   = [0.99815267 0.99907905 0.9995402  0.99977027]
Error norms, with tangent-linear = [4.52243872e-07 1.12911424e-07 2.82091088e-08 7.04993071e-09
 1.76218937e-09]
Orders,      with tangent-linear = [2.00190949 2.00095847 2.00048011 2.00024011]
Error norms, no tangent-linear   = [1.01396288e-03 5.06448981e-04 2.53090600e-04 1.26511730e-04
 6.32474607e-05]
Orders,      no tangent-linear   = [1.001516   1.00076302 1.00038276 1.0001917 ]
Error norms, with tangent-linear = [2.13811439e-06 5.36596213e-07 1.34407417e-07 3.36341442e-08
 8.41257242e-09]
Orders,      with tangent-linear = [1.99443026 1.99722412 1.99861429 1.99930763]

tsfc:WARNING Estimated quadrature degree 11 more than tenfold greater than any argument/coefficient degree (max 1)

Estimated quadrature degree 11 more than tenfold greater than any argument/coefficient degree (max 1)
Error norms, no tangent-linear   = [0.00575594 0.00289012 0.0014481  0.00072481 0.00036259]
Orders,      no tangent-linear   = [0.99392152 0.99697228 0.998489   0.99924522]
Error norms, with tangent-linear = [4.85843576e-05 1.21408907e-05 3.03457070e-06 7.58561041e-07
 1.89630048e-07]
Orders,      with tangent-linear = [2.00061762 2.00030993 2.00015525 2.00007769]

The function taylor_test_tlm_adjoint also performs such verification tests, but computes the highest order derivative information using the adjoint method.

Let’s apply this test up to fourth order:

[5]:

np.random.seed(74728054)

for order in range(1, 5):
    min_order = taylor_test_tlm_adjoint(forward, m, adjoint_order=order, seed=1.0e-3)
    assert min_order > 1.99

Error norms, no adjoint   = [1.96925244e-04 9.84831481e-05 4.92467146e-05 2.46246436e-05
 1.23126435e-05]
Orders,      no adjoint   = [0.99969928 0.9998494  0.99992464 0.9999623 ]
Error norms, with adjoint = [8.22009802e-08 2.05743897e-08 5.14661260e-09 1.28702985e-09
 3.21804463e-10]
Orders,      with adjoint = [1.99830596 1.99915454 1.99957767 1.99978928]
Error norms, no adjoint   = [6.06217536e-05 3.03568452e-05 1.51898950e-05 7.59781316e-06
 3.79962268e-06]
Orders,      no adjoint   = [0.99781372 0.99890997 0.99945576 0.99972807]
Error norms, with adjoint = [1.83663729e-07 4.58635021e-08 1.14593025e-08 2.86400287e-09
 7.15897848e-10]
Orders,      with adjoint = [2.00164831 2.00082729 2.00041439 2.00020729]
Error norms, no adjoint   = [1.77271516e-04 8.85216211e-05 4.42321703e-05 2.21089118e-05
 1.10526609e-05]
Orders,      no adjoint   = [1.00185896 1.00093384 1.00046801 1.00023428]
Error norms, with adjoint = [4.57678459e-07 1.14702543e-07 2.87109829e-08 7.18216294e-09
 1.79609276e-09]
Orders,      with adjoint = [1.99643702 1.99822274 1.99911243 1.99955652]

tsfc:WARNING Estimated quadrature degree 11 more than tenfold greater than any argument/coefficient degree (max 1)

Estimated quadrature degree 11 more than tenfold greater than any argument/coefficient degree (max 1)
Error norms, no adjoint   = [2.68076588e-04 1.34645957e-04 6.74748172e-05 3.37753585e-05
 1.68971655e-05]
Orders,      no adjoint   = [0.99347432 0.99674984 0.99837808 0.99918983]
Error norms, with adjoint = [2.42982799e-06 6.07250783e-07 1.51786806e-07 3.79434582e-08
 9.48545874e-09]
Orders,      with adjoint = [2.00048984 2.00024605 2.00012331 2.00006172]