{ "cells": [ { "cell_type": "markdown", "id": "40c3ebad", "metadata": {}, "source": [ "# Getting started with tlm_adjoint\n", "\n", "This notebook introduces derivative calculations using tlm_adjoint.\n", "\n", "tlm_adjoint is primarily intended for first derivative calculations using the adjoint method, and Hessian information calculations using the adjoint method applied to tangent-linear and forward calculations. However the approach used by tlm_adjoint generalizes to higher order.\n", "\n", "The approach used by tlm_adjoint for higher order differentiation is described in:\n", "\n", "- James R. Maddison, Daniel N. Goldberg, and Benjamin D. Goddard, 'Automated calculation of higher order partial differential equation constrained derivative information', SIAM Journal on Scientific Computing, 41(5), pp. C417–C445, 2019, doi: 10.1137/18M1209465\n", "\n", "## A floating point example\n", "\n", "We consider a simple floating point calculation:" ] }, { "cell_type": "code", "execution_count": null, "id": "0c79382a", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "x = 1.0\n", "y = 0.25 * np.pi\n", "z = x * np.sin(y * np.exp(y))" ] }, { "cell_type": "markdown", "id": "e82699c6", "metadata": {}, "source": [ "tlm_adjoint is designed for high-level algorithmic differentiation, and not this type of low-level floating point calculation. However it *can* still process simple floating point calculations, so to introduce the key ideas we do that here. We consider differentiating `z` with respect to `x` and `y` – being precise, we mean computing the derivative of the function used to compute `z` with respect to the variables defined by `x` and `y`.\n", "\n", "## Adding tlm_adjoint\n", "\n", "We first modify the code so that tlm_adjoint processes the calculations:" ] }, { "cell_type": "code", "execution_count": null, "id": "dfb3955a", "metadata": {}, "outputs": [], "source": [ "from tlm_adjoint import *\n", "\n", "import numpy as np\n", "\n", "reset_manager()\n", "\n", "\n", "def forward(x, y):\n", " z = x * np.sin(y * np.exp(y))\n", " return z\n", "\n", "\n", "x = Float(1.0, name=\"x\")\n", "y = Float(0.25 * np.pi, name=\"y\")\n", "\n", "start_manager()\n", "z = forward(x, y)\n", "stop_manager()" ] }, { "cell_type": "markdown", "id": "4bb98432", "metadata": {}, "source": [ "The key changes here are:\n", "\n", "- To import tlm_adjoint.\n", "- Controlling the 'manager' – an object tlm_adjoint uses to process equations. The manager is first reset using `reset_manager`. This clears any previous processing, and disables the manager. `start_manager` and `stop_manager` are then used to enable the manager just when it is needed.\n", "- Defining `x` and `y` to be of type `Float`. Calculations involving `x` and `y` are recorded by the manager. The result of the calculations – here `z` – will have the same type, and we can access its value with `float(z)`.\n", "\n", "Let's display the information recorded by tlm_adjoint:" ] }, { "cell_type": "code", "execution_count": null, "id": "ba11e403", "metadata": {}, "outputs": [], "source": [ "manager_info()" ] }, { "cell_type": "markdown", "id": "a0138110", "metadata": {}, "source": [ "The key feature here is that there are four `FloatEquation` records, corresponding to the four floating point calculations – evaluation using `np.exp` and `np.sin`, and two multiplications.\n", "\n", "## Computing derivatives using an adjoint\n", "\n", "The `compute_gradient` function can be used to differentiate `z` with respect to `x` and `y` using the adjoint method:" ] }, { "cell_type": "code", "execution_count": null, "id": "66f6f440", "metadata": {}, "outputs": [], "source": [ "dz_dx, dz_dy = compute_gradient(z, (x, y))\n", "\n", "print(f\"{float(dz_dx)=}\")\n", "print(f\"{float(dz_dy)=}\")" ] }, { "cell_type": "markdown", "id": "1796fdfd", "metadata": {}, "source": [ "For a simple check of the result, we can compare with the result from finite differencing, here using second order centered finite differencing:" ] }, { "cell_type": "code", "execution_count": null, "id": "28d0c8a0", "metadata": {}, "outputs": [], "source": [ "def dJ_dm(J, m, *, eps=1.0e-7):\n", " return (J(m + eps) - J(m - eps)) / (2.0 * eps)\n", "\n", "\n", "print(f\"dz_dx approximation = {dJ_dm(lambda x: forward(x, float(y)), float(x))}\")\n", "print(f\"dz_dy approximation = {dJ_dm(lambda y: forward(float(x), y), float(y))}\")" ] }, { "cell_type": "markdown", "id": "4ef1b47b", "metadata": {}, "source": [ "## Computing derivatives using a tangent-linear\n", "\n", "A tangent-linear computes directional derivatives with respect to a given control and with a given direction.\n", "\n", "Here we consider the forward to be a function of `x` and `y`, computing a value of `z`, denoted $z \\left( x, y \\right)$. We consider the control $m = \\left( x, y \\right)^T$, and a direction $\\zeta = \\left( 2, 3 \\right)^T$. We can then use a tangent-linear to compute the directional derivative\n", "\n", "$$\n", " \\frac{dz}{dm} \\zeta = 2 \\frac{dz}{dx} + 3 \\frac{dz}{dy},\n", "$$\n", "\n", "where vector derivatives are notated using row vectors.\n", "\n", "In most cases tlm_adjoint needs to be told what tangent-linear calculations to perform *ahead* of the forward calculations. `configure_tlm` provides this information to tlm_adjoint:" ] }, { "cell_type": "code", "execution_count": null, "id": "0220eb94", "metadata": {}, "outputs": [], "source": [ "from tlm_adjoint import *\n", "\n", "import numpy as np\n", "\n", "reset_manager()\n", "\n", "\n", "def forward(x, y):\n", " z = x * np.sin(y * np.exp(y))\n", " return z\n", "\n", "\n", "x = Float(1.0, name=\"x\")\n", "y = Float(0.25 * np.pi, name=\"y\")\n", "\n", "m = (x, y)\n", "zeta = (Float(2.0, name=\"zeta_x\"), Float(3.0, name=\"zeta_y\"))\n", "configure_tlm((m, zeta))\n", "\n", "start_manager()\n", "z = forward(x, y)\n", "stop_manager()\n", "\n", "dz_dm_zeta = var_tlm(z, (m, zeta))\n", "\n", "print(f\"{float(dz_dm_zeta)=}\")" ] }, { "cell_type": "markdown", "id": "c00043a7", "metadata": {}, "source": [ "There are three new changes:\n", "\n", "- The control $m$ and direction $\\zeta$ are defined using `m` and `zeta` respectively.\n", "- Tangent-linear calculations are configured *before* the forward calculations are performed, using `configure_tlm`. Note that here, for this first derivative calculation, the argument is a single control-direction pair.\n", "- We access the tangent-linear variable, containing the value of $\\left( dz/dm \\right) \\zeta$, using `var_tlm`, and using the same control-direction pair.\n", "\n", "In fact more has happened here – if we now display the information recorded by tlm_adjoint:" ] }, { "cell_type": "code", "execution_count": null, "id": "5b2a2150", "metadata": {}, "outputs": [], "source": [ "manager_info()" ] }, { "cell_type": "markdown", "id": "527f3b8b", "metadata": {}, "source": [ "we now see that there are *eight* `FloatEquation` records – the original four, and four new ones. The extra ones correspond to the tangent-linear calculations. tlm_adjoint has recorded the original forward calculations, and *also* recorded the tangent-linear calculations.\n", "\n", "## Computing second derivatives using an adjoint of a tangent-linear\n", "\n", "Since tlm_adjoint has recorded both forward and tangent-linear calculations, we can now compute second derivative information using an adjoint associated with a tangent-linear. Specifically we can compute a Hessian action on $\\zeta$,\n", "\n", "$$\n", " H \\zeta = \\frac{d}{dm} \\left( \\frac{dz}{dm} \\zeta \\right)^T.\n", "$$\n", "\n", "The inner directional derivative appearing here is computed using the tangent-linear method, and the outer derivative is computed by applying the adjoint method to the tangent-linear and forward calculations. In code we simply use `compute_gradient` to compute the derivative of the tangent-linear result:" ] }, { "cell_type": "code", "execution_count": null, "id": "8612e275", "metadata": {}, "outputs": [], "source": [ "from tlm_adjoint import *\n", "\n", "import numpy as np\n", "\n", "reset_manager()\n", "\n", "\n", "def forward(x, y):\n", " z = x * np.sin(y * np.exp(y))\n", " return z\n", "\n", "\n", "x = Float(1.0, name=\"x\")\n", "y = Float(0.25 * np.pi, name=\"y\")\n", "\n", "m = (x, y)\n", "zeta = (Float(2.0, name=\"zeta_x\"), Float(3.0, name=\"zeta_y\"))\n", "configure_tlm((m, zeta))\n", "\n", "start_manager()\n", "z = forward(x, y)\n", "stop_manager()\n", "\n", "dz_dm_zeta = var_tlm(z, (m, zeta))\n", "\n", "print(f\"{float(dz_dm_zeta)=}\")\n", "\n", "d2z_dm_zeta_dx, d2z_dm_zeta_dy = compute_gradient(dz_dm_zeta, m)\n", "\n", "print(f\"{float(d2z_dm_zeta_dx)=}\")\n", "print(f\"{float(d2z_dm_zeta_dy)=}\")" ] }, { "cell_type": "markdown", "id": "b4b0fac5", "metadata": {}, "source": [ "## Computing second derivatives using a tangent-linear of a tangent-linear\n", "\n", "We can also compute second derivative information using a tangent-linear associated with a tangent-linear. For example if we define $e_1 = \\left( 1, 0 \\right)^T$, then we can find the first component of the previously computed Hessian action on $\\zeta$ via\n", "\n", "$$\n", " e_1^T H \\zeta = \\left[ \\frac{d}{dm} \\left( \\frac{dz}{dm} \\zeta \\right) \\right] e_1.\n", "$$\n", "\n", "That is, here we now want to compute a directional derivative of a directional derivative, and we compute this using a tangent-linear associated with the previous tangent-linear and forward calculations.\n", "\n", "tlm_adjoint handles this case by supplying more arguments to `configure_tlm`:" ] }, { "cell_type": "code", "execution_count": null, "id": "5c02e84f", "metadata": {}, "outputs": [], "source": [ "from tlm_adjoint import *\n", "\n", "import numpy as np\n", "\n", "reset_manager()\n", "\n", "\n", "def forward(x, y):\n", " z = x * np.sin(y * np.exp(y))\n", " return z\n", "\n", "\n", "x = Float(1.0, name=\"x\")\n", "y = Float(0.25 * np.pi, name=\"y\")\n", "\n", "m = (x, y)\n", "zeta = (Float(2.0, name=\"zeta_x\"), Float(3.0, name=\"zeta_y\"))\n", "e_1 = (Float(1.0, name=\"e_1_x\"), Float(0.0, name=\"e_1_y\"))\n", "configure_tlm((m, zeta), (m, e_1))\n", "\n", "start_manager()\n", "z = forward(x, y)\n", "stop_manager()\n", "\n", "dz_dm_zeta = var_tlm(z, (m, zeta))\n", "dz_dx = var_tlm(z, (m, e_1))\n", "d2z_dm_zeta_dx = var_tlm(z, (m, zeta), (m, e_1))\n", "\n", "print(f\"{float(dz_dm_zeta)=}\")\n", "print(f\"{float(dz_dx)=}\")\n", "print(f\"{float(d2z_dm_zeta_dx)=}\")" ] }, { "cell_type": "markdown", "id": "88b3a4f1", "metadata": {}, "source": [ "The first control-direction pair supplied to `configure_tlm` indicates that we seek to compute directional derivatives of the forward with respect to the control defined by `m` with direction defined by `zeta`. The second control-direction pair indicates that we seek to compute directional deriatives of *these* directional derivatives, with respect to the control defined by `m` and with direction defined by `e_1`. As a side-effect we find that we also compute the directional derivatives of the forward with respect to the control defined by `m` with direction defined by `e_1`.\n", "\n", "We then access the tangent-linear variables using `var_tlm`, supplying two control-variable pairs to access a second order tangent-linear variable.\n", "\n", "As before, tlm_adjoint has not just performed the tangent-linear calculations – if we display the information recorded by tlm_adjoint:" ] }, { "cell_type": "code", "execution_count": null, "id": "cb6739a1", "metadata": {}, "outputs": [], "source": [ "manager_info()" ] }, { "cell_type": "markdown", "id": "1e4fde63", "metadata": {}, "source": [ "we now find that there are *sixteen* `FloatEquation` records, constituting the forward and all tangent-linear calculations.\n", "\n", "## Computing third derivatives using an adjoint of a tangent-linear of a tangent-linear\n", "\n", "We can now compute the derivative of the tangent-linear-computed second derivative by simply handing the second order tangent-linear variable to `compute_gradient`. This applies the adjoint method to the higher order tangent-linear calculations and the forward calculations, computing\n", "\n", "$$\n", " \\frac{d}{dm} \\left( e_1^T H \\zeta \\right) = \\frac{d}{dm} \\left[ \\left[ \\frac{d}{dm} \\left( \\frac{dz}{dm} \\zeta \\right) \\right] e_1 \\right].\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "c60bc4b8", "metadata": {}, "outputs": [], "source": [ "from tlm_adjoint import *\n", "\n", "import numpy as np\n", "\n", "reset_manager()\n", "\n", "\n", "def forward(x, y):\n", " z = x * np.sin(y * np.exp(y))\n", " return z\n", "\n", "\n", "x = Float(1.0, name=\"x\")\n", "y = Float(0.25 * np.pi, name=\"y\")\n", "\n", "m = (x, y)\n", "zeta = (Float(2.0, name=\"zeta_x\"), Float(3.0, name=\"zeta_y\"))\n", "e_1 = (Float(1.0, name=\"e_1_x\"), Float(0.0, name=\"e_1_y\"))\n", "configure_tlm((m, zeta), (m, e_1))\n", "\n", "start_manager()\n", "z = forward(x, y)\n", "stop_manager()\n", "\n", "dz_dm_zeta = var_tlm(z, (m, zeta))\n", "dz_dx = var_tlm(z, (m, e_1))\n", "d2z_dm_zeta_dx = var_tlm(z, (m, zeta), (m, e_1))\n", "\n", "print(f\"{float(dz_dm_zeta)=}\")\n", "print(f\"{float(dz_dx)=}\")\n", "print(f\"{float(d2z_dm_zeta_dx)=}\")\n", "\n", "d3z_dm_zeta_dx_dx, d3z_dm_zeta_dx_dy = compute_gradient(d2z_dm_zeta_dx, m)\n", "\n", "print(f\"{float(d3z_dm_zeta_dx_dx)=}\")\n", "print(f\"{float(d3z_dm_zeta_dx_dy)=}\")" ] }, { "cell_type": "markdown", "id": "b1a3e919", "metadata": {}, "source": [ "## Higher order\n", "\n", "The approach now generalizes.\n", "\n", "- Supplying further arguments to `configure_tlm` indicates directional derivatives of directional derivatives, defining a tangent-linear calculation of increasingly high order.\n", "- Supplying these arguments to `var_tlm` accesses the higher-order tangent-linear variables.\n", "- These higher order tangent-linear variables can be handed to `compute_gradient` to compute derivatives of the higher order derivatives using the adjoint method." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }