diff --git a/README.md b/README.md index 384ac69..e87b212 100644 --- a/README.md +++ b/README.md @@ -51,7 +51,7 @@ The chain rule and associativity: forward vs. reverse differentiation. The cha In part 2 (last few minutes), began setting up some example problems involving matrix functions of matrices, and "[vectorization](https://en.wikipedia.org/wiki/Vectorization_(mathematics))" of matrices to vectors and linear operators to [Kronecker products](https://en.wikipedia.org/wiki/Kronecker_product). To be continued. -**Further reading**: The terms "forward-mode" and "reverse-mode" differentiation are most prevalent in [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation), which will will cover later in this course. You can find many, many articles online about [backpropagation](https://en.wikipedia.org/wiki/Backpropagation) in neural networks. There are many other versions of this, e.g. in differential geometry the derivative linear operator (multiplying Jacobians and perturbations dx right-to-left) is called a [pushforward](https://en.wikipedia.org/wiki/Pushforward_(differential)), whereas multiplying a gradient row vector (covector) by a Jacobian left-to-right is called a [pullback](https://en.wikipedia.org/wiki/Pullback_(differential_geometry)). +**Further reading**: The terms "forward-mode" and "reverse-mode" differentiation are most prevalent in [automatic differentiation (AD)](https://en.wikipedia.org/wiki/Automatic_differentiation), which we will cover later in this course. You can find many, many articles online about [backpropagation](https://en.wikipedia.org/wiki/Backpropagation) in neural networks. There are many other versions of this, e.g. in differential geometry the derivative linear operator (multiplying Jacobians and perturbations dx right-to-left) is called a [pushforward](https://en.wikipedia.org/wiki/Pushforward_(differential)), whereas multiplying a gradient row vector (covector) by a Jacobian left-to-right is called a [pullback](https://en.wikipedia.org/wiki/Pullback_(differential_geometry)). This [video on the principles of AD in Julia](https://www.youtube.com/watch?v=UqymrMG-Qi4) by [Mohamed Tarek](https://github.com/mohamed82008) also starts with a similar left-to-right (reverse) vs right-to-left (forward) viewpoint and goes into how it translates to Julia code, and how you define custom chain-rule steps for Julia AD. ## Lecture 3 (Jan 14) @@ -90,11 +90,33 @@ Automatic differentiation, guest lecture by [Dr. Chris Rackauckas](https://chris # Lecture 6 (Jan 24) -* part 1: derivatives of eigenproblems [(html)](https://rawcdn.githack.com/mitmath/matrixcalc/69cbb521f357a9633ea4e5f77079eb399986240a/symeig.jl.html) [(julia source)](symeig.jl) +* part 1: derivatives of eigenproblems [(html)](https://rawcdn.githack.com/mitmath/matrixcalc/61a7b3e0cbebd0ccdc126fbe831d1398154e272b/symeig.jl.html) [(julia source)](symeig.jl) * part 2: second derivatives, bilinear forms, and Hessian matrices [(notes)](https://www.dropbox.com/s/tde5cow6wuais8y/Hessians.pdf?dl=0) * [video](https://mit.zoom.us/rec/share/Av05JqxIoIO9woSRjxZVh5AA4cXGeq1mkTUFyqqM5ZU2YElVbsPupJ_zrWwwemUU.8Ten6yxTyONNy8ya?startTime=1643039961000) **Further reading (part 1)**: Computing derivatives on curved surfaces ("manifolds") is closely related to [tangent spaces](https://en.wikipedia.org/wiki/Tangent_space) in differential geometry.
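For concreteness, here is a tiny numerical sketch (not from the course notes; the matrix size and variable names are arbitrary) of the first-order perturbation rule for a simple eigenvalue of a real-symmetric matrix, dλ ≈ qᵀ(dA)q, which is the Hellmann–Feynman result mentioned in the further reading below:

```julia
using LinearAlgebra

# sketch: first-order perturbation of a simple eigenvalue of a real-symmetric matrix,
# dλ ≈ qᵀ (dA) q (the Hellmann–Feynman result), checked against a tiny finite perturbation
n = 5
A  = Symmetric(randn(n, n))          # random symmetric matrix (wrapper uses the upper triangle)
dA = Symmetric(randn(n, n) * 1e-8)   # small symmetric perturbation
λ, Q = eigen(A)                      # eigenvalues (sorted) and orthonormal eigenvectors
λ2, _ = eigen(A + dA)
q = Q[:, 1]                          # eigenvector of the smallest eigenvalue
(λ2[1] - λ[1]) / (q' * dA * q)       # ≈ 1, up to errors of second order in ‖dA‖
```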
The effect of constraints can also be expressed in terms of [Lagrange multipliers](https://en.wikipedia.org/wiki/Lagrange_multiplier), which are useful in expressing optimization problems with constraints (see also chapter 5 of [Convex Optimization](https://web.stanford.edu/~boyd/cvxbook/) by Boyd and Vandenberghe). In physics, first and second derivatives of eigenvalues and first derivatives of eigenvectors are often presented as part of ["time-independent" perturbation theory](https://en.wikipedia.org/wiki/Perturbation_theory_(quantum_mechanics)#Time-independent_perturbation_theory) in quantum mechanics, or as the [Hellmann–Feynman theorem](https://en.wikipedia.org/wiki/Hellmann%E2%80%93Feynman_theorem) for the case of dλ. The derivative of an eigenvector involves *all* of the other eigenvectors, but a much simpler "vector–Jacobian product" (involving only a single eigenvector and eigenvalue) can be obtained from left-to-right differentiation of a *scalar function* of an eigenvector, as reviewed in the [18.335 notes on adjoint methods](https://github.com/mitmath/18335/blob/spring21/notes/adjoint/adjoint.pdf). -**Further reading (part 2)**: [Bilinear forms](https://en.wikipedia.org/wiki/Bilinear_form) are an important generalization of quadratic operations to arbitrary vector spaces, and we saw that the second derivative can be viewed as a [symmetric bilinear forms](https://en.wikipedia.org/wiki/Symmetric_bilinear_form). This is closely related to a [quadratic form](https://en.wikipedia.org/wiki/Quadratic_form), which is just what we get by plugging in the same vector twice, e.g. the f''(x)[δx,δx]/2 that appears in quadratic approximations for f(x+δx) is a quadratic form. The most familar multivariate version of f''(x) is the [Hessian matrix](https://en.wikipedia.org/wiki/Hessian_matrix); Khan academy has an elementary [introduction to quadratic approximation](https://www.khanacademy.org/math/multivariable-calculus/applications-of-multivariable-derivatives/quadratic-approximations/a/quadratic-approximation) using the Hessian. \ No newline at end of file +**Further reading (part 2)**: [Bilinear forms](https://en.wikipedia.org/wiki/Bilinear_form) are an important generalization of quadratic operations to arbitrary vector spaces, and we saw that the second derivative can be viewed as a [symmetric bilinear form](https://en.wikipedia.org/wiki/Symmetric_bilinear_form). This is closely related to a [quadratic form](https://en.wikipedia.org/wiki/Quadratic_form), which is just what we get by plugging in the same vector twice, e.g. the f''(x)[δx,δx]/2 that appears in quadratic approximations for f(x+δx) is a quadratic form.
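As a quick numerical illustration (a sketch, not taken from the notes; the example function is arbitrary), this quadratic approximation can be checked with ForwardDiff, representing the bilinear form f″(x) by the Hessian matrix discussed next:

```julia
using ForwardDiff

# sketch: check f(x+δx) ≈ f(x) + ∇f(x)⋅δx + f″(x)[δx,δx]/2, with the bilinear form
# f″(x) represented by the (symmetric) Hessian matrix from ForwardDiff
f(x) = sin(x[1]) * exp(x[2]) + x[1]^2 * x[2]    # an arbitrary scalar-valued example
x, δx = randn(2), randn(2) * 1e-4
g = ForwardDiff.gradient(f, x)
H = ForwardDiff.hessian(f, x)                   # H ≈ H' (symmetric)
quadratic = f(x) + g' * δx + (δx' * H * δx) / 2
abs(f(x + δx) - quadratic)                      # error is O(‖δx‖³), i.e. tiny here
```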
The most familiar multivariate version of f''(x) is the [Hessian matrix](https://en.wikipedia.org/wiki/Hessian_matrix); Khan academy has an elementary [introduction to quadratic approximation](https://www.khanacademy.org/math/multivariable-calculus/applications-of-multivariable-derivatives/quadratic-approximations/a/quadratic-approximation) using the Hessian. + +# Lecture 7 (Jan 26) + +* part 1: continued [Hessian notes](https://www.dropbox.com/s/tde5cow6wuais8y/Hessians.pdf?dl=0) from previous lecture: minima/maxima and f″ definiteness, Hessian conditioning and steepest-descent convergence +* part 2: derivatives and backpropagation on graphs and linear operators (to be posted) +* [video](https://mit.zoom.us/rec/share/DblFFU72Nary_yKfaQis0WaDoFEznD-92EPr52LHE1QBKcVWPUlmBPgApjre2uf9.oqtYrgEg73glPWx-?startTime=1643212653000) +* [pset 2 solutions](hw2sol.pdf) and computational [notebook](https://nbviewer.org/github/mitmath/matrixcalc/blob/main/hw2sol.ipynb) + +**Further reading (part 1)**: [Positive-definite](https://en.wikipedia.org/wiki/Definite_matrix) Hessian matrices, or more generally [definite quadratic forms](https://en.wikipedia.org/wiki/Definite_quadratic_form) f″, appear at extrema (f′=0) of scalar-valued functions f(x) that are local minima; there are a lot [more formal treatments](http://www.columbia.edu/~md3405/Unconstrained_Optimization.pdf) of the same idea, and conversely Khan academy has the [simple 2-variable version](https://www.khanacademy.org/math/multivariable-calculus/applications-of-multivariable-derivatives/optimizing-multivariable-functions/a/second-partial-derivative-test) where you can check the signs of the eigenvalues of the 2×2 Hessian just by looking at the determinant and a single entry (or the trace). There's a nice [stackexchange discussion](https://math.stackexchange.com/questions/2285282/relating-condition-number-of-hessian-to-the-rate-of-convergence) on why an [ill-conditioned](https://nhigham.com/2020/03/19/what-is-a-condition-number/) Hessian tends to make steepest descent converge slowly; some Toronto [course notes on the topic](https://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/slides/lec07.pdf) may also be helpful. + +**Further reading (part 2)**: See this blog post on [calculus on computational graphs](https://colah.github.io/posts/2015-08-Backprop/) for a gentle review, and these Columbia [course notes](http://www.cs.columbia.edu/~mcollins/ff2.pdf) for a more formal approach. + + +# Lecture 8 (Jan 28) + +* part 1: continued [Hessian notes](https://www.dropbox.com/s/tde5cow6wuais8y/Hessians.pdf?dl=0): using Hessians for optimization (Newton & quasi-Newton & Newton–Krylov methods), cost of Hessians +* part 2: adjoint differentiation of ODEs (guest lecture by Dr. Chris Rackauckas): [notes from 18.337](https://rawcdn.githack.com/mitmath/18337/7b0e890e1211bfa253782f7862389aeaa321e8d7/lecture11/adjoints.html) +* [video](https://mit.zoom.us/rec/share/4yJfjUXsAhykKGebU8lw1wqOb_TIwK-WISEHwEbG8WyB2bLlyYLjkZXQHonT2Tte.lLaSTRDjVxYocEmz?startTime=1643385485000) + +**Further reading (part 1)**: See e.g. these Stanford notes on [sequential quadratic optimization](https://web.stanford.edu/class/ee364b/lectures/seq_notes.pdf) using trust regions (sec. 2.2). See 18.335 [notes on BFGS quasi-Newton methods](https://github.com/mitmath/18335/blob/spring21/notes/BFGS.pdf) (also [video](https://mit.zoom.us/rec/share/naqcRgSkZ0VNeDp0ht8QmB566mPowuHJ8k0LcaAmZ7XxaCT1ch4j_O4Khzi-taXm.CXI8xFthag4RvvoC?startTime=1620241284000)).
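For concreteness, here is a bare-bones sketch (not the lecture's code; the `newton_min` name is made up, and a robust implementation would add a line search or trust region) of a pure Newton iteration that obtains the gradient and Hessian from ForwardDiff:

```julia
using ForwardDiff, LinearAlgebra

# minimal pure Newton iteration x ← x - f″(x)⁻¹ ∇f(x), no line search or trust region
function newton_min(f, x; iters=50, tol=1e-10)
    for _ = 1:iters
        g = ForwardDiff.gradient(f, x)
        norm(g) < tol && break             # stop at a (near-)stationary point
        H = ForwardDiff.hessian(f, x)      # forming the full n×n Hessian: expensive for large n
        x = x - H \ g                      # Newton step (assumes H is invertible)
    end
    return x
end

rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
newton_min(rosenbrock, [-1.2, 1.0])        # converges to the minimizer [1, 1] from this start
```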
The fact that a quadratic optimization problem in a sphere has [strong duality](https://en.wikipedia.org/wiki/Strong_duality) and hence is efficiently solvable is discussed in section 5.2.4 of the [*Convex Optimization* book](https://web.stanford.edu/~boyd/cvxbook/). There has been a lot of work on [automatic Hessian computation](https://en.wikipedia.org/wiki/Hessian_automatic_differentiation), but for large-scale problems you can ultimately only compute Hessian–vector products efficiently in general, which are equivalent to a directional derivative of the gradient, and can be used e.g. for [Newton–Krylov methods](https://en.wikipedia.org/wiki/Newton%E2%80%93Krylov_method). + +**Further reading (part 2)**: A very general reference on adjoint-method (reverse-mode/backpropagation) differentiation of ODEs (and generalizations thereof), using notation similar to that of Chris R. today, is [Cao et al (2003)](https://epubs.siam.org/doi/10.1137/S1064827501380630) ([pdf](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.455&rep=rep1&type=pdf)). See also the [adjoint sensitivity analysis](https://diffeq.sciml.ai/stable/extras/sensitivity_math/) and [automatic sensitivity analysis](https://diffeq.sciml.ai/stable/analysis/sensitivity/) sections of Chris's amazing [DifferentialEquations.jl software suite](https://diffeq.sciml.ai/stable/) for numerical solution of ODEs in Julia. There is a nice YouTube [lecture on adjoint sensitivity of ODEs](https://www.youtube.com/watch?v=k6s2G5MZv-I), again using a similar notation. diff --git a/hw2.pdf b/hw2.pdf index c72620a..c8fc977 100644 Binary files a/hw2.pdf and b/hw2.pdf differ diff --git a/hw2sol.ipynb b/hw2sol.ipynb new file mode 100644 index 0000000..878e88c --- /dev/null +++ b/hw2sol.ipynb @@ -0,0 +1,799 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "eb604661", + "metadata": {}, + "source": [ + "Similar to homework 1, let's define a relative-error function for quantifying the error from finite differences:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "48cd6341", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "relerr (generic function with 1 method)" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "using LinearAlgebra # for norm, kron, I(n) etc.\n", + "\n", + "relerr(approx, exact) = norm(approx - exact) / norm(exact)" + ] + }, + { + "cell_type": "markdown", + "id": "3ccad3f3", + "metadata": {}, + "source": [ + "Let's also define the $\\otimes$ operator (type it by `\\otimes` followed by TAB) to be the Kronecker product (the `kron` function in the `LinearAlgebra` library) so we can use math notation `A ⊗ B` instead of `kron(A, B)`:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "24b5fd38", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "kron (generic function with 21 methods)" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "const ⊗ = kron" + ] + }, + { + "cell_type": "markdown", + "id": "43df9092", + "metadata": {}, + "source": [ + "We'll also load the Zygote AD library for analytical comparisons:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "0a1fc52b", + "metadata": {}, + "outputs": [], + "source": [ + "using Zygote" + ] + }, + { + "cell_type": "markdown", + "id": "4c8e37b5", + "metadata": {}, + "source": [ + "# Problem 2\n", + "\n", + "Here, $f(A) = \\sqrt{A}$, and the homework says the Jacobian 
(acting on $\\operatorname{vec}(A)$) should be $(I\\otimes \\sqrt{A} + \\sqrt{A}^T \\otimes I)^{-1}$. Let's try it out for a random positive-definite $A$:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "63dc982c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1×1 Matrix{Float64}:\n", + " 1.3731197861011764" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n = 1 # should be big enough to be interesting\n", + "B = randn(n,n)\n", + "A = B'B # random positive definite" + ] + }, + { + "cell_type": "markdown", + "id": "993f6868", + "metadata": {}, + "source": [ + "The matrix square root is just `sqrt(A)` in Julia:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "cb14ff97", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# check that sqrt(A)² ≈ A, up to roundoff errors:\n", + "relerr(sqrt(A)^2, A)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "0b3cb733", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1×1 Matrix{Float64}:\n", + " 0.42669326873498054" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Jacobian:\n", + "J = (I(n) ⊗ sqrt(A) + sqrt(A)' ⊗ I(n))^-1" + ] + }, + { + "cell_type": "markdown", + "id": "3b89a4f5", + "metadata": {}, + "source": [ + "Now, let's check this against finite difference for a small random $dA$:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "0d188b50", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3.726301788530775e-8" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dA = randn(n,n) * 1e-8\n", + "\n", + "relerr( vec(sqrt(A+dA) - sqrt(A)), # finite-difference directional derivative\n", + " J * vec(dA) ) # vs. exact expression" + ] + }, + { + "cell_type": "markdown", + "id": "b5574a08", + "metadata": {}, + "source": [ + "Hooray, it worked!" + ] + }, + { + "cell_type": "markdown", + "id": "4cd27f68", + "metadata": {}, + "source": [ + "# Problem 3" + ] + }, + { + "cell_type": "markdown", + "id": "e3d8f59a", + "metadata": {}, + "source": [ + "Let's define our function $f(p)$ as in the problem. 
We'll pass the extra parameters $A_0$ etcetera explicitly (or we could use global variables):" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "c2b586ab", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "f (generic function with 1 method)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "function f(p, A₀, a, B₀, b, F)\n", + " A = A₀ + diagm(p)\n", + " x = A \\ a # A⁻¹a\n", + " B = B₀ + diagm(x.*x)\n", + " y = B \\ b # B⁻¹b\n", + " return y'*F*y\n", + "end" + ] + }, + { + "cell_type": "markdown", + "id": "1ba4a2c9", + "metadata": {}, + "source": [ + "Now, we'll pick some random parameters and try it out:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "2147c2b2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "49.975441720918774" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n = 5\n", + "A₀ = randn(n,n)\n", + "a = randn(5)\n", + "B₀ = randn(n,n)\n", + "b = randn(5)\n", + "F = randn(n,n); F = F + F'\n", + "p = rand(5)\n", + "f(p, A₀, a, B₀, b, F)" + ] + }, + { + "cell_type": "markdown", + "id": "2484c042", + "metadata": {}, + "source": [ + "Now, let's implement the *manual* \"adjoint\" gradient from the solutions, using the same notation:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "002b28b3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "∇f (generic function with 1 method)" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "function ∇f(p, A₀, a, B₀, b, F)\n", + " # the forward solution, copied from above\n", + " # (note that in serious computation we would want to re-use this from f)\n", + " A = A₀ + diagm(p)\n", + " x = A \\ a # A⁻¹a\n", + " B = B₀ + diagm(x.*x)\n", + " y = B \\ b # B⁻¹b\n", + " \n", + " # the reverse-mode gradient\n", + " g′ᵀ = 2F*y # step (i)\n", + " u = B' \\ -g′ᵀ # step (ii): adjoint problem 1\n", + " w = 2u .* x .* y # step (iii)\n", + " z = A' \\ -w # step (iv): adjoint problem 2\n", + " return z .* x # step (v): ∇f\n", + "end" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "5fa14097", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5-element Vector{Float64}:\n", + " 262.55155103380923\n", + " -273.8595112733488\n", + " 96.32881037878508\n", + " 217.43756038923235\n", + " -171.79342086396048" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "∇f(p, A₀, a, B₀, b, F)" + ] + }, + { + "cell_type": "markdown", + "id": "3c1b030e", + "metadata": {}, + "source": [ + "## problem 3b\n", + "\n", + "First, we'll compare it against finite differences for a random small `dp`:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "657849bd", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.428248665134681e-7" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dp = randn(n) * 1e-8\n", + "\n", + "relerr( f(p+dp, A₀, a, B₀, b, F) - f(p, A₀, a, B₀, b, F), # finite difference\n", + " ∇f(p, A₀, a, B₀, b, F)'dp) # exact directional derivative" + ] + }, + { + "cell_type": "markdown", + "id": "b5371853", + "metadata": {}, + "source": [ + "Hooray, it matches!" 
+ ] + }, + { + "cell_type": "markdown", + "id": "cae7ae8e", + "metadata": {}, + "source": [ + "## problem 3c" + ] + }, + { + "cell_type": "markdown", + "id": "f15be256", + "metadata": {}, + "source": [ + "To compute $\\nabla f$ with `Zygote`, we have to give Zygote a function of a *single* parameter vector `p` that we want to differentiate with respect to, along with the point `p` at which we want the derivative. To do that, we will define an [anonymous function](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions) with `p -> ...` that captures the other parameters (also called a [closure](https://en.wikipedia.org/wiki/Closure_(computer_programming)) in computer science):" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "2329ed32", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "([262.5515510338092, -273.8595112733488, 96.32881037878506, 217.43756038923235, -171.7934208639604],)" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Zygote.gradient(p -> f(p, A₀, a, B₀, b, F), p)" + ] + }, + { + "cell_type": "markdown", + "id": "09a8a25a", + "metadata": {}, + "source": [ + "(Zygote returns a 1-component tuple of outputs, because it can potentially differentiate with respect to multiple arguments, though here we are just asking for 1.)\n", + "\n", + "The above looks pretty good if we \"eyeball\" it compared to `∇f` above. Let's compare it quantitatively:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "c0c3d50f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.1572017312010847e-16" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "relerr(∇f(p, A₀, a, B₀, b, F), # manual ∇f\n", + " Zygote.gradient(p -> f(p, A₀, a, B₀, b, F), p)[1]) # vs AD" + ] + }, + { + "cell_type": "markdown", + "id": "33c8c7d5", + "metadata": {}, + "source": [ + "Hooray, it matches up to the limits of roundoff error (to essentially [machine precision](https://en.wikipedia.org/wiki/Machine_epsilon))!" + ] + }, + { + "cell_type": "markdown", + "id": "4cb40856", + "metadata": {}, + "source": [ + "# Problem 4c" + ] + }, + { + "cell_type": "markdown", + "id": "8834f86c", + "metadata": {}, + "source": [ + "Demonstrate numerically that $d(e^A) = \\sum_{k=0}^\\infty \\! \\! 
\\frac{1}{k!} (\\sum_{\\ell=0}^{k-1} (A^T)^{k-\\ell-1} \\otimes A^\\ell )dA$" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "36ca240e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3×3 Matrix{Float64}:\n", + " 0.989401 0.372355 0.475427\n", + " 0.94298 0.68495 0.646221\n", + " 0.794569 0.13317 0.353663" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "A = rand(3,3)" + ] + }, + { + "cell_type": "markdown", + "id": "df6620d8", + "metadata": {}, + "source": [ + "### Using ForwardDiff" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "a7dfb9f4", + "metadata": {}, + "outputs": [], + "source": [ + "using ForwardDiff" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "d3176e9d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "e (generic function with 1 method)" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "e(A) = sum(A^k/factorial(k) for k=0:20) # hmm exp doesn't work, i'll go to k=20" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "c1536121", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.2971024122201743e-15" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "relerr(e(A), exp(A)) # check that our sum matches the built-in exp(A) function" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "8b7817cb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "9×9 Matrix{Float64}:\n", + " 3.35707 0.5389 0.688955 1.50121 … 1.04456 0.123629 0.158014\n", + " 1.50121 2.82372 0.907286 0.466706 0.338304 0.92634 0.209389\n", + " 1.04456 0.265598 2.45771 0.338304 0.244335 0.0576795 0.83731\n", + " 0.5389 0.0624931 0.0798733 2.82372 0.265598 0.0290732 0.0371575\n", + " 0.170853 0.479248 0.105876 1.3384 0.079272 0.237991 0.0492994\n", + " 0.123629 0.0290732 0.434115 0.92634 … 0.0576795 0.0134119 0.216817\n", + " 0.688955 0.0798733 0.102087 0.907286 2.45771 0.434115 0.555028\n", + " 0.218367 0.612715 0.135322 0.289541 1.21467 2.02443 0.729772\n", + " 0.158014 0.0371575 0.555028 0.209389 0.83731 0.216817 1.73404" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "J_AD = ForwardDiff.jacobian(e,A)" + ] + }, + { + "cell_type": "markdown", + "id": "84969d50", + "metadata": {}, + "source": [ + "### .. 
or using Finite Differences" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "7b35a6b2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "9×9 Matrix{Float64}:\n", + " 3.35707 0.538899 0.688955 1.50121 … 1.04456 0.123629 0.158014\n", + " 1.50121 2.82372 0.907286 0.466706 0.338304 0.92634 0.209389\n", + " 1.04456 0.265598 2.45771 0.338304 0.244335 0.0576795 0.83731\n", + " 0.5389 0.062493 0.0798732 2.82372 0.265598 0.0290732 0.0371576\n", + " 0.170853 0.479248 0.105875 1.3384 0.0792719 0.237991 0.0492994\n", + " 0.123629 0.0290732 0.434115 0.92634 … 0.0576795 0.0134118 0.216817\n", + " 0.688955 0.0798732 0.102087 0.907286 2.45771 0.434115 0.555028\n", + " 0.218367 0.612715 0.135321 0.289541 1.21467 2.02443 0.729772\n", + " 0.158014 0.0371575 0.555028 0.209389 0.83731 0.216817 1.73404" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ϵ = 1e-8\n", + "J = zeros(9,0) # initialize 9x9 Jacobian with 9 rows and no columns\n", + "for j=1:3, i=1:3\n", + " dA = zeros(3,3)\n", + " dA[i,j] = ϵ # perturb the (i,j) entry only\n", + " df = exp(A+dA)-exp(A) # see the perturbed exp\n", + " J = [J vec(df)] # append this to J\n", + "end\n", + "J_FD = J/ϵ" + ] + }, + { + "cell_type": "markdown", + "id": "d6973563", + "metadata": {}, + "source": [ + "### The theoretical answer summing to 20" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "aabb597f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "9×9 Matrix{Float64}:\n", + " 3.35707 0.5389 0.688955 1.50121 … 1.04456 0.123629 0.158014\n", + " 1.50121 2.82372 0.907286 0.466706 0.338304 0.92634 0.209389\n", + " 1.04456 0.265598 2.45771 0.338304 0.244335 0.0576795 0.83731\n", + " 0.5389 0.0624931 0.0798733 2.82372 0.265598 0.0290732 0.0371575\n", + " 0.170853 0.479248 0.105876 1.3384 0.079272 0.237991 0.0492994\n", + " 0.123629 0.0290732 0.434115 0.92634 … 0.0576795 0.0134119 0.216817\n", + " 0.688955 0.0798733 0.102087 0.907286 2.45771 0.434115 0.555028\n", + " 0.218367 0.612715 0.135322 0.289541 1.21467 2.02443 0.729772\n", + " 0.158014 0.0371575 0.555028 0.209389 0.83731 0.216817 1.73404" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# written as nested sum calls:\n", + "sum(sum( (A')^(k-ℓ-1) ⊗ A^ℓ for ℓ=0:(k-1))/factorial(k) for k=1:20)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "75eb21e5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "9×9 Matrix{Float64}:\n", + " 3.35707 0.5389 0.688955 1.50121 … 1.04456 0.123629 0.158014\n", + " 1.50121 2.82372 0.907286 0.466706 0.338304 0.92634 0.209389\n", + " 1.04456 0.265598 2.45771 0.338304 0.244335 0.0576795 0.83731\n", + " 0.5389 0.0624931 0.0798733 2.82372 0.265598 0.0290732 0.0371575\n", + " 0.170853 0.479248 0.105876 1.3384 0.079272 0.237991 0.0492994\n", + " 0.123629 0.0290732 0.434115 0.92634 … 0.0576795 0.0134119 0.216817\n", + " 0.688955 0.0798733 0.102087 0.907286 2.45771 0.434115 0.555028\n", + " 0.218367 0.612715 0.135322 0.289541 1.21467 2.02443 0.729772\n", + " 0.158014 0.0371575 0.555028 0.209389 0.83731 0.216817 1.73404" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# same thing written another way:\n", + "J_20 = sum( (A')^(k-ℓ-1) ⊗ A^ℓ / factorial(k) for ℓ=0:20, k=1:20 if k>ℓ)" + ] + }, + { + "cell_type": "markdown", + "id": "bd39711b", + "metadata": {}, + "source": [ + "Looks 
good; let's check the match quantitatively:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "d3c6f007", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.3738128345184809e-15" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "relerr(J_20, J_AD) # should match to nearly machine precision" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "6c1d30c8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.293987776257493e-7" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "relerr(J_20, J_FD) # should match to ≈ 7 digits" + ] + }, + { + "cell_type": "markdown", + "id": "49ebb275", + "metadata": {}, + "source": [ + "Hooray, math works!" + ] + } + ], + "metadata": { + "@webio": { + "lastCommId": null, + "lastKernelId": null + }, + "kernelspec": { + "display_name": "Julia 1.7.1", + "language": "julia", + "name": "julia-1.7" + }, + "language_info": { + "file_extension": ".jl", + "mimetype": "application/julia", + "name": "julia", + "version": "1.7.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/hw2sol.pdf b/hw2sol.pdf new file mode 100644 index 0000000..1faf98a Binary files /dev/null and b/hw2sol.pdf differ diff --git a/hw2sol.tex b/hw2sol.tex new file mode 100644 index 0000000..cfad4b4 --- /dev/null +++ b/hw2sol.tex @@ -0,0 +1,157 @@ +\documentclass[10pt,oneside]{article} +\usepackage{amsmath} +\usepackage{graphicx} +\usepackage{subcaption} +\usepackage{amsfonts} +\usepackage{amssymb} + +\newcommand{\tr}{\operatorname{trace}} +\newcommand{\vecm}{\operatorname{vec}} +\newcommand{\diagm}{\operatorname{diagm}} +\newcommand{\dotstar}{\mathbin{.*}} + +\usepackage{polyglossia} + + + + +\usepackage[ + backend=biber +]{biblatex} +\addbibresource{biblio.bib} + + +\usepackage[ + letterpaper, + left=1cm, + right=1cm, + top=1.5cm, + bottom=1.5cm +]{geometry} + + +\usepackage[ + final, + unicode, + colorlinks=true, + citecolor=blue, + linkcolor=blue, + plainpages=false, + urlcolor=blue, + pdfpagelabels=true, + pdfsubject={Matrix Calculus}, + pdfauthor={Edelman and Johnson}, + pdftitle={Homework 2}, + pdfkeywords={MIT, IAP, Winter 2022} +]{hyperref} + +\usepackage{booktabs} + + +\usepackage{algpseudocode} +\usepackage{algorithm} +\floatname{algorithm}{Algorithm} + +\usepackage{enumitem} + +\usepackage{lastpage} +\usepackage{fancyhdr} +\fancyhf{} +\pagestyle{fancy} +\fancyhf{} +\fancyhead[L]{MIT IAP Winter 2022} +\fancyhead[C]{Matrix Calculus} +\fancyhead[R]{Profs.
Edelman and Johnson} +\fancyfoot[R]{} +% https://tex.stackexchange.com/questions/227/how-can-i-add-page-of-on-my-document +\fancyfoot[C]{\thepage\ of \pageref*{LastPage}} +%\fancyfoot[C]{ of } +\fancyfoot[L]{} +\renewcommand{\headrulewidth}{2pt} +\renewcommand{\footrulewidth}{2pt} + +\usepackage[newfloat=true]{minted} + +\author{} +\title{Homework 2} + +\date{\today} + + +\DeclareCaptionFormat{mitedFormat}{% + \textbf{#1#2}#3} +\DeclareCaptionStyle{minetdStyle}{skip=0cm,width=.85\textwidth,justification=centering, + font=footnotesize,singlelinecheck=off,format=mitedFormat,labelsep=space} +\newenvironment{mintedCode}{\captionsetup{type=listing,style=minetdStyle}}{} + +\usepackage{dirtytalk} + +\SetupFloatingEnvironment{listing}{} + +\usepackage[parfill]{parskip} + +\usepackage{csquotes} + +\begin{document} +\maketitle +\thispagestyle{fancy} + +{\bf Please submit your HW on Canvas; include a PDF printout of any code and results, clearly labeled, e.g. from a Jupyter notebook. It is due Wednesday January 26th by 11:59pm EST. } + +\subsection*{Problem 1} + +Let $f(x): \mathbb{R}^n \to \mathbb{R}^n$ be an invertible map from inputs $x \in \mathbb{R}^n$ ($n$-component column vectors) to outputs in $\mathbb{R}^n$. If $g(x)$ is the inverse function, so that $g(f(x)) = x = f(g(x))$, use the chain rule to show how the Jacobians $f'(x)$ and $g'(x)$ are related. + +\subsection*{Problem 2} + +Consider $f(A) = \sqrt{A}$ where $A$ is an $n\times n$ matrix. (Recall that the matrix square root is a matrix such that $(\sqrt{A})^2 = A$. In 18.06, we might compute this from a diagonalization of $A$ by taking the square roots of all the eigenvalues.) + +\begin{enumerate}[label=\alph*)] + \item Give a formula for the Jacobian of $f$, acting on $\vecm(A)$, in terms of Kronecker products and matrix powers only (no eigenvalues required!). (Hint: see problem 1.) + + \item Check your formula numerically against a finite-difference approximation. (To ensure that no complex numbers show up in the matrix square root, pick a random positive-definite $A = B^T B$ for some random square $B$.) +\end{enumerate} + +\subsection*{Problem 3} + +Suppose that $A(p) = A_0 + \diagm(p)$ constructs an $n\times n$ matrix $A$ from $p \in \mathbb{R}^n$, where $\diagm(p) = \begin{pmatrix} p_1 & & \\ & p_2 & \\ & & \ddots \end{pmatrix}$ is the diagonal matrix with the entries of $p$ along its diagonal, and $A_0$ is some constant matrix. We now perform the following sequence of steps: + +\begin{enumerate} + \item solve $A(p) x = a$ for $x \in \mathbb{R}^n$, where $a$ is some vector. + \item form $B(x) = B_0 + \diagm(x \dotstar x)$ where $\dotstar$ is the elementwise product and $B_0$ is some $n\times n$ matrix. + \item solve $By = b$ for $y \in \mathbb{R}^n$, where $b$ is some vector. + \item compute $f = y^T F y$ where $F = F^T$ is some symmetric $n \times n$ matrix. +\end{enumerate} + +Now, suppose that we want to optimize $f(p)$ as a function of the parameters $p$ that went into the first step. We need to compute the gradient $\nabla f$ for any kind of large-scale problem. If $n$ is huge, though, we must be careful to do this efficiently, using ``reverse-mode'' or ``adjoint'' calculations in which we apply the chain rule from left to right. + +\begin{enumerate}[label=\alph*)] + +\item Explain the sequence of steps to compute $\nabla f$. Your answer should show that $\nabla f$ can \emph{also} be computed by solving only \emph{two} $n \times n$ linear systems, similar to $f$ itself. 
+ +\item Check your answer numerically against finite differences (for some randomly chosen $A_0, B_0, a, b, F$). + +\item Check your answer against the result of a reverse-mode AD software (e.g. Zygote in Julia). + +\end{enumerate} + +\subsection*{Problem 4} + +\begin{enumerate}[label=\alph*)] + +\item Write down some $4 \times 4$ matrix that is not the +Kronecker product of two $2 \times 2$ matrices. Convince +us this is true. + +\item Prove that if $A$ ($m \times m$) and $B$ ($n \times n$) are orthogonal (i.e., +$A^TA=I_m$ and $B^TB=I_n$) then $A \otimes B$ ($mn \times mn$) is orthogonal. + +\item If $f(A) = e^A= \sum_{k=0}^\infty A^k/k!$, write down +a power series involving Kronecker products for the +Jacobian $f'(A)$. Check your answer with a numerical example, e.g.~against finite differences. + +\end{enumerate} + + + +\end{document}
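A quick numerical companion to problem 4(b) (a sketch, not part of the assignment or the posted solutions): for random orthogonal A and B, the Kronecker product A ⊗ B is orthogonal up to roundoff.

```julia
using LinearAlgebra

# sketch for problem 4(b): if AᵀA = I and BᵀB = I then (A ⊗ B)ᵀ(A ⊗ B) = I
A = Matrix(qr(randn(4, 4)).Q)   # random 4×4 orthogonal matrix (Q factor of a QR factorization)
B = Matrix(qr(randn(3, 3)).Q)   # random 3×3 orthogonal matrix
K = kron(A, B)                  # 12×12 Kronecker product
opnorm(K' * K - I)              # ≈ 1e-15: K is orthogonal up to roundoff
```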