Start with the objective function consisting of both a data misfit and model regularization: $$ \phi = \phi_d + \beta \phi_m $$ For the linear problem we are considering $$ \phi_d = \frac{1}{2}\| W_d (Gm-d^{obs})\|_2^2 = \frac{1}{2}(Gm-d^{obs})^T W_d^T W_d (Gm-d^{obs}) $$ and $$ \phi_m = \frac{1}{2} \|W_m (m-m_{ref}) \|^2_2 = \frac{1}{2}(m-m_{ref})^T W_m^T W_m (m-m_{ref}) $$
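To make the pieces concrete, here is a minimal sketch that evaluates $\phi_d$, $\phi_m$ and $\phi$ for a tiny linear problem. Everything numerical here is an assumption made for illustration only: the sizes, the random `G`, the identity weights `W_d` and `W_m`, the zero `m_ref`, and the value of `beta` are not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem setup (all values are illustrative assumptions)
n_data, n_model = 5, 3
G = rng.normal(size=(n_data, n_model))  # linear forward operator
m_true = np.array([1.0, -2.0, 0.5])     # model used to generate the data
d_obs = G @ m_true                      # noise-free synthetic observations
W_d = np.eye(n_data)                    # data weighting matrix
W_m = np.eye(n_model)                   # model weighting matrix
m_ref = np.zeros(n_model)               # reference model
beta = 0.1                              # trade-off parameter


def phi_d(m):
    """Data misfit: 0.5 * ||W_d (G m - d_obs)||_2^2."""
    weighted_residual = W_d @ (G @ m - d_obs)
    return 0.5 * weighted_residual @ weighted_residual


def phi_m(m):
    """Model regularization: 0.5 * ||W_m (m - m_ref)||_2^2."""
    weighted_difference = W_m @ (m - m_ref)
    return 0.5 * weighted_difference @ weighted_difference


def phi(m):
    """Full objective: phi_d + beta * phi_m."""
    return phi_d(m) + beta * phi_m(m)


print(phi_d(m_true), phi_m(m_true), phi(m_true))
```

With noise-free data the misfit vanishes at `m_true`, so the objective there is entirely the regularization term scaled by `beta`.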

To minimize this, we want to look at $$ \frac{d \phi}{dm} $$ and $$ \frac{d^2 \phi}{dm^2} $$

Before diving into those derivatives, we looked at $$ \frac{d}{dx} (Ax) = A \frac{dx}{dx} = A $$ since $\frac{dx}{dx} = I$ (the derivative of a vector with respect to a vector is a matrix). Next, we look at $x^TAx$. Recall that a scalar satisfies $a = a^T$, so $x^TAx = x^TA^Tx$. We apply the product rule, fixing the left-hand $x^T$ in the first term and the right-hand $x$ in the second (which we transpose so that the free $x$ sits on the right): $$ \frac{d}{dx} (x^TAx) = \frac{d}{dx} \left( (x^T)_{fix}Ax \right) + \frac{d}{dx} \left( (x_{fix})^T A^Tx \right) $$

There is also a convention that we have to be careful with. For functions

- $f: \mathbb{R}^n \to \mathbb{R}^n$ (i.e. $f(x) = A x$), the derivative is as expected ($\frac{d}{dx} Ax = A$)
- $f: \mathbb{R}^n \to \mathbb{R}$ (i.e. $f(x) = s^T x$), the derivative is transposed so we get a column vector ($\frac{d}{dx} s^T x = s$)

With that in mind, let us define $s^T = (x^T)_{fix}A$ and $q^T = (x_{fix})^T A^T$, since these are both row vectors. Then, $$ \frac{d}{dx} (x^TAx) = (s^T\frac{dx}{dx})^T + (q^T\frac{dx}{dx})^T \\ = \frac{dx}{dx}^Ts + \frac{dx}{dx}^T q \\ = \frac{dx}{dx}^T (s + q) $$ $\frac{dx}{dx} = I$ is just the identity matrix, so we can drop it $$ = s + q $$ Here, we substitute back in the definitions of $s$ and $q$, dropping the $fix$ subscript $$ = (x^TA)^T + (x^TA^T)^T $$ which simplifies to $$ = (A^T + A ) x $$ Remember that this simplifies even further if $A$ is symmetric: it becomes $2Ax$.
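A quick way to sanity-check a derivative like this is to compare it against finite differences. The sketch below does that for $x^TAx$, whose gradient under the column-vector convention is $(A^T + A)x$. The helper `finite_difference_gradient`, the random `A` and `x`, and the step size `h` are all assumptions introduced here for illustration.

```python
import numpy as np


def finite_difference_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad


rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))  # deliberately non-symmetric
x = rng.normal(size=4)


def quadratic_form(v):
    """Scalar function v^T A v."""
    return v @ A @ v


analytic = (A.T + A) @ x
numeric = finite_difference_gradient(quadratic_form, x)
general_case_agrees = np.allclose(analytic, numeric, atol=1e-6)

# Symmetric case: the gradient collapses to 2 A x
A_sym = A.T @ A
analytic_sym = 2 * A_sym @ x
numeric_sym = finite_difference_gradient(lambda v: v @ A_sym @ v, x)
symmetric_case_agrees = np.allclose(analytic_sym, numeric_sym, atol=1e-6)

print(general_case_agrees, symmetric_case_agrees)
```

Using a non-symmetric `A` matters here: with a symmetric matrix the check could not tell $(A^T + A)x$ apart from $2Ax$.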

OK, so let's now look at taking the derivative of $\phi_d$ with respect to $m$. Instead of expanding out the whole thing, let's make a simplification and do this in two steps. We define $$ r = Gm - d^{obs} $$ then $$ \phi_d = \frac{1}{2} r^T W_d^T W_d r $$ We will make one more simplification to draw parallels with the above examples, namely: $$ A = W_d^T W_d $$ Note that $A$ is symmetric. So we now want to take the derivative of $$ \phi_d = \frac{1}{2} r^T A r $$ with respect to $m$, remembering that $r = r(m)$.

Apply the same process as we did for $x^T A x$, now with the chain rule through $r(m)$: $$ \frac{d \phi_d}{dm} = \frac{1}{2}\frac{d}{dm} (r^T A r) \\ = \frac{1}{2} ((r^T A \frac{dr}{dm})^T+(r^T A^T \frac{dr}{dm})^T) \\ = \frac{1}{2} \frac{dr}{dm}^T( A^T + A) r $$ OK, now we need $$ \frac{dr}{dm} = \frac{d}{dm} (Gm - d^{obs}) = \frac{d}{dm} Gm - \frac{d}{dm}d^{obs} = \frac{d}{dm} Gm $$ since $d^{obs}$ does not depend on $m$. This is identical to the case of $\frac{d}{dx} Ax$. I will let you take it from here!
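If you want to check your work numerically as you go, here is a minimal sketch that approximates $\frac{dr}{dm}$ by finite differences and compares it with the result for $\frac{d}{dx} Ax$. The helper `finite_difference_jacobian` and the random `G`, `d_obs` and `m` are hypothetical and only serve the illustration.

```python
import numpy as np


def finite_difference_jacobian(f, m, h=1e-6):
    """Central-difference Jacobian of a vector function f at m; column j holds df/dm_j."""
    n_out = f(m).size
    jacobian = np.zeros((n_out, m.size))
    for j in range(m.size):
        step = np.zeros_like(m)
        step[j] = h
        jacobian[:, j] = (f(m + step) - f(m - step)) / (2 * h)
    return jacobian


rng = np.random.default_rng(2)
G = rng.normal(size=(5, 3))  # hypothetical forward operator
d_obs = rng.normal(size=5)   # hypothetical observed data
m = rng.normal(size=3)       # point at which we differentiate


def residual(model):
    """r(m) = G m - d_obs."""
    return G @ model - d_obs


J = finite_difference_jacobian(residual, m)
jacobian_matches_G = np.allclose(J, G, atol=1e-6)
print(jacobian_matches_G)
```

The same idea works for a scalar function: once you have your own expression for $\frac{d \phi_d}{dm}$, you can compare it against finite differences of $\phi_d$ at a random $m$.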


