Suppose we have two Cartesian coordinate systems (unprimed and primed), whose coordinate vectors are denoted $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{i}'$, $\mathbf{j}'$ respectively, and whose coordinates are related by

$$ x = 2x' $$$$ y = 2y' $$

Suppose $$ F(x,y) = x^2 e^y $$

\begin{eqnarray} \frac{\partial F}{\partial x} & = & 2 x e^y \\ \frac{\partial F}{\partial y} & = & x^2 e^y \\ \end{eqnarray}

If we substitute the primed coordinates, we obtain

$$ F'(x',y') = 4 (x')^2 e^{2 y'} $$

The partial derivative in the primed coordinate system is

\begin{eqnarray} \frac{\partial F'}{\partial x'} & = & 8 x' e^{2 y' } \\ \frac{\partial F'}{\partial y'} & = & 8 (x')^2 e^{2 y'} \\ \end{eqnarray}

We conclude \begin{eqnarray} \frac{\partial F'}{\partial x'} & = & 4 x e^{y } = 2 \frac{\partial F}{\partial x} \\ \frac{\partial F'}{\partial y'} & = & 2 x^2 e^{y} = 2 \frac{\partial F}{\partial y}\\ \end{eqnarray}
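The factor of two can be checked numerically. The following is a minimal sketch using central finite differences; the helper names `F_primed` and `partial`, the step size and the sample point are illustrative assumptions, not part of the derivation above.

```python
import math

def F(x, y):
    # F(x, y) = x^2 e^y in the unprimed coordinates
    return x**2 * math.exp(y)

def F_primed(xp, yp):
    # The same function in primed coordinates, using x = 2x', y = 2y'
    return F(2 * xp, 2 * yp)

def partial(f, args, k, h=1e-6):
    # Central finite-difference approximation of the k-th partial derivative
    lo = list(args)
    hi = list(args)
    lo[k] -= h
    hi[k] += h
    return (f(*hi) - f(*lo)) / (2 * h)

xp, yp = 0.3, -0.2        # arbitrary sample point in primed coordinates
x, y = 2 * xp, 2 * yp     # the same point in unprimed coordinates

ratio_x = partial(F_primed, (xp, yp), 0) / partial(F, (x, y), 0)
ratio_y = partial(F_primed, (xp, yp), 1) / partial(F, (x, y), 1)
print(ratio_x, ratio_y)   # both ratios should be close to 2
```

Both ratios come out close to 2, so the array of partial derivatives depends on how the coordinates are scaled.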

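But we want the gradient to be a vector, a geometrical object entirely independent of the particular choice of coordinates. The array of partial derivatives does not meet this requirement on its own: as shown above, its entries double when the coordinates are rescaled.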

Coordinate Systems

Consider the plane $E^2$ (the two dimensional Euclidean space) and a general point $\omega \in E^2$. We will first resist the temptation of denoting points in this space by a pair of numbers as $\omega = (x_1, x_2)$; rather we will merely view $\omega$ as a geometrical object. In fact, the geometrical notions of distance (metric) and angle are sufficient to describe Euclidean spaces.

Wikipedia: "One way to think of the Euclidean plane is as a set of points satisfying certain relationships, expressible in terms of distance and angle."

We will define two functions $x_1 : E^2 \rightarrow \mathcal{R}$ and $x_2 : E^2 \rightarrow \mathcal{R}$ such that for every point $\omega$ in the plane, we will have a so-called coordinate representation \begin{eqnarray} \left(x_1(\omega), x_2(\omega) \right) \end{eqnarray} These functions associate with each point $\omega$ a pair of numbers. The familiar Cartesian coordinate system is the first example. The idea of a coordinate system is so natural that $\omega$ is often identified with $(x_1, x_2)$.

We could have defined another coordinate system, such as polar coordinates, \begin{eqnarray} \left(\xi_1(\omega), \xi_2(\omega) \right) \end{eqnarray}

We can think of a collection of points



In [9]:

    
%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

N = 10
M = 10

n0 = -N/2.
m0 = -M/2.
#n0 = 1
#m0 = 1

dn = 0.1
dm = 0.1

xx0 = np.arange(n0, n0+N+dn, dn)
xx1 = np.arange(m0, m0+M+dm, dm)

P = []

for i in range(M+1):
    pt = np.c_[xx0, (i+m0)*np.ones_like(xx0)].T
    P.append(pt)

for j in range(N+1):
    pt = np.c_[(j+n0)*np.ones_like(xx1), xx1].T
    P.append(pt)

#for p in P:
#    plt.plot(p[0,:],p[1,:],'k-')

A = np.array([[1,0.5],[-1,0]])
for p in P:
    z = A.dot(p)
    plt.plot(z[0,:],z[1,:],'k-')

#for p in P:
#    plt.plot(np.sqrt(p[0,:]**2 + p[1,:]**2), np.arctan2(p[1,:], p[0,:]) ,'k-')
    
#for p in P:
#    plt.plot(p[0,:]*np.cos(2*np.pi*p[1,:]/(N)), p[0,:]*np.sin(2*np.pi*p[1,:]/(N)),'k-')

#for p in P:
#    plt.plot(np.exp(p[0,:]/M), np.exp(p[1,:]/N),'k-')

#for p in P:
#    plt.plot(1/p[0,:], 1/p[1,:],'k-')
    
ax = plt.gca()
ax.axis('equal')
ax.axis('off')
#ax.set_xlim([n0-1,n0+N+1])
#ax.set_ylim([m0-1, m0+M+1])
plt.show()

One of the key points in using tensor calculus is the fact that vectors should be considered as geometrical objects rather than a collection of numbers. This is especially difficult for persons from a computer science background as the term vector is used as a synonym for an array of numbers. The actual numbers that are needed to represent the vector are always with respect to an ambient basis; when this basis is changed, the numbers must accordingly change but the vector should be the same vector.

The directional derivative

Consider $F(x)$ along the line $x = x_0 + s v$. Here $s$ is a scalar and $v$ is a vector pointing in a given direction.

The directional derivative is $$ \frac{d F(x_0 + s v)}{d s} = \nabla_x F \cdot v $$
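The identity can be illustrated numerically in Cartesian coordinates. This is only a sketch: the scalar field (borrowed from the opening example), the base point `x0`, the direction `v` and the step `h` are assumptions chosen for illustration.

```python
import math

def F(x):
    # Example scalar field on the plane, F = x_1^2 exp(x_2)
    return x[0]**2 * math.exp(x[1])

def grad_F(x):
    # Analytic gradient of the example field in Cartesian coordinates
    return [2 * x[0] * math.exp(x[1]), x[0]**2 * math.exp(x[1])]

x0 = [1.0, 0.5]    # base point
v = [0.6, -0.8]    # direction

def along_line(s):
    # F evaluated at x0 + s v
    return F([x0[0] + s * v[0], x0[1] + s * v[1]])

h = 1e-6
lhs = (along_line(h) - along_line(-h)) / (2 * h)     # dF/ds at s = 0
rhs = sum(g * vi for g, vi in zip(grad_F(x0), v))     # grad F . v
print(lhs, rhs)
```

The two numbers agree up to the finite-difference error.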

Tensor calculus achieves invariance by establishing rules for forming expressions that evaluate to the same value in all coordinate systems.

Chain Rule, Partial Derivatives and Einstein notation

Take $F(a^1, a^2, a^3)$

$$ f(\mu^1, \mu^2) = F(A^1(\mu^1, \mu^2), A^2(\mu^1, \mu^2), A^3(\mu^1, \mu^2)) $$

Is equivalently expressed as

$$ f(\mu) = F(A(\mu)) $$

The function as a graph is shown below:

Here, we apply the convention that rounded squares are variable nodes and diamonds are functions of several variables. One key property of our graphical notation is that we distinguish between functions and their values. In many calculus texts, this distinction is not explicitly stated and is a source of confusion.

The partial derivatives $$ \frac{\partial}{\partial \mu^1} F(A^1(\mu^1, \mu^2), A^2(\mu^1, \mu^2), A^3(\mu^1, \mu^2)) = \frac{\partial{F}}{\partial{a^1}}\frac{\partial{A^1}}{\partial{\mu^1}} + \frac{\partial{F}}{\partial{a^2}}\frac{\partial{A^2}}{\partial{\mu^1}} + \frac{\partial{F}}{\partial{a^3}}\frac{\partial{A^3}}{\partial{\mu^1}} $$

This is the sum, over all paths that connect $\mu^1$ with $F$, of the products of the derivatives along each path.

This is compactly denoted by

$$ \frac{\partial{F}}{\partial \mu^\alpha} = \frac{\partial{F}}{\partial a^i} \frac{\partial{A^i}}{\partial \mu^\alpha} $$

where repeated index means summation.
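The summation convention maps directly onto a loop over the repeated index. The sketch below uses hypothetical example functions `A` (from two variables to three) and `F` (from three variables to one), chosen only for illustration, and compares the contracted expression with a finite-difference derivative of the composition.

```python
import math

def A(mu):
    # Hypothetical inner map (mu^1, mu^2) -> (a^1, a^2, a^3)
    m1, m2 = mu
    return [m1 * m2, math.sin(m1), m2**2]

def dA(mu):
    # dA(mu)[i][alpha] = dA^i / dmu^alpha
    m1, m2 = mu
    return [[m2, m1], [math.cos(m1), 0.0], [0.0, 2 * m2]]

def F(a):
    # Hypothetical outer function of (a^1, a^2, a^3)
    return a[0] * a[1] + math.exp(a[2])

def dF(a):
    # dF(a)[i] = dF / da^i
    return [a[1], a[0], math.exp(a[2])]

def f(mu):
    return F(A(mu))

mu = [0.7, -0.4]

# Chain rule: sum over the repeated index i
chain = [sum(dF(A(mu))[i] * dA(mu)[i][alpha] for i in range(3))
         for alpha in range(2)]

# Finite-difference reference
h = 1e-6
numeric = []
for alpha in range(2):
    hi = list(mu)
    lo = list(mu)
    hi[alpha] += h
    lo[alpha] -= h
    numeric.append((f(hi) - f(lo)) / (2 * h))

print(chain, numeric)
```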

The second derivative is

\begin{eqnarray} \frac{\partial^2{F}}{\partial \mu^\alpha \partial \mu^\beta} & = & \frac{\partial}{\partial \mu^\beta}\left(\frac{\partial{F}}{\partial a^i} \frac{\partial{A^i}}{\partial \mu^\alpha} \right) \\ & = & (\frac{\partial}{\partial \mu^\beta} \frac{\partial{F}}{\partial a^i} ) \frac{\partial{A^i}}{\partial \mu^\alpha} + \frac{\partial{F}}{\partial a^i} (\frac{\partial}{\partial \mu^\beta} \frac{\partial{A^i}}{\partial \mu^\alpha}) \\ & = & \frac{\partial^2{F}}{\partial a^i \partial a^j} \frac{\partial A^j}{\partial \mu^\beta} \frac{\partial{A^i}}{\partial \mu^\alpha} + \frac{\partial{F}}{\partial a^i} \frac{\partial^2{A^i}}{\partial \mu^\beta \partial \mu^\alpha} \\ \end{eqnarray}

The third derivative is \begin{eqnarray} \frac{\partial^3{F}}{\partial \mu^\alpha \partial \mu^\beta \partial \mu^\gamma} & = & \frac{\partial}{\partial \mu^\gamma}\left(\frac{\partial^2{F}}{\partial a^i \partial a^j} \frac{\partial{A^i}}{\partial \mu^\alpha} \frac{\partial A^j}{\partial \mu^\beta} + \frac{\partial{F}}{\partial a^i} \frac{\partial^2{A^i}}{\partial \mu^\beta \partial \mu^\alpha} \right) \\ & = & \frac{\partial^3{F}}{\partial a^i \partial a^j \partial a^k} \frac{\partial{A^i}}{\partial \mu^\alpha} \frac{\partial A^j}{\partial \mu^\beta}\frac{\partial A^k}{\partial \mu^\gamma} + \frac{\partial^2{F}}{\partial a^i \partial a^j} \frac{\partial^2{A^i}}{\partial \mu^\alpha \partial \mu^\gamma} \frac{\partial A^j}{\partial \mu^\beta} + \frac{\partial^2{F}}{\partial a^i \partial a^j} \frac{\partial{A^i}}{\partial \mu^\alpha} \frac{\partial^2 A^j}{\partial \mu^\beta \partial \mu^\gamma} + \\ & & \frac{\partial^2{F}}{\partial a^i \partial a^k} \frac{\partial A^k}{\partial \mu^\gamma} \frac{\partial^2{A^i}}{\partial \mu^\beta \partial \mu^\alpha} + \frac{\partial{F}}{\partial a^i} \frac{\partial^3{A^i}}{\partial \mu^\alpha \partial \mu^\beta \partial \mu^\gamma} \\ & = & \frac{\partial^3{F}}{\partial a^i \partial a^j \partial a^k} \frac{\partial{A^i}}{\partial \mu^\alpha} \frac{\partial A^j}{\partial \mu^\beta}\frac{\partial A^k}{\partial \mu^\gamma} + \frac{\partial^2{F}}{\partial a^i \partial a^j} \left( \frac{\partial^2{A^i}}{\partial \mu^\alpha \partial \mu^\gamma} \frac{\partial A^j}{\partial \mu^\beta} + \frac{\partial^2 A^i}{\partial \mu^\beta \partial \mu^\gamma} \frac{\partial{A^j}}{\partial \mu^\alpha} + \frac{\partial^2{A^i}}{\partial \mu^\beta \partial \mu^\alpha} \frac{\partial A^j}{\partial \mu^\gamma} \right) + \frac{\partial{F}}{\partial a^i} \frac{\partial^3{A^i}}{\partial \mu^\alpha \partial \mu^\beta \partial \mu^\gamma} \end{eqnarray}

Inverse Function

$\phi$ and $f$ are inverse functions.

This is written as

\begin{eqnarray} \xi & = & \phi(x) \\ x & = & f(\xi) \end{eqnarray}$$f(\phi(x)) = (f \circ \phi)(x) = x$$

The chain rule implies $$ \frac{d\phi}{dx} \frac{d f}{d\xi} = 1$$

However, it is difficult to invent a new name for each function, so we reuse the name of the variables $x$ instead of $f$ and $\xi$ instead of $\phi$.

\begin{eqnarray} \xi & = & \xi(x) \\ x & = & x(\xi) \end{eqnarray}

Now, we can write: $$x(\xi(x)) = x$$

The chain rule implies $$ \frac{d\xi}{dx} \frac{d x}{d\xi} = 1$$

This overloaded notation is simpler, but we must always keep in mind the distinction between the variable $x$ and the function $x(\cdot)$.

\begin{eqnarray} x(\xi(x)) &=& x \\ x'(\xi(x)) \xi'(x) & = & 1 \\ x''(\xi(x)) (\xi'(x))^2 + x'(\xi(x)) \xi''(x) & = & 0 \\ \end{eqnarray}

\begin{eqnarray} F(f(x,y), g(x, y)) &=& x \\ G(f(x,y), g(x, y)) &=& y \\ \end{eqnarray}

Let $X = f(x,y)$ and $Y = g(x,y)$

\begin{eqnarray} \frac{\partial F}{\partial X} \frac{\partial f}{\partial x} + \frac{\partial F}{\partial Y} \frac{\partial g}{\partial x} & = & 1 \\ \frac{\partial F}{\partial X} \frac{\partial f}{\partial y} + \frac{\partial F}{\partial Y} \frac{\partial g}{\partial y} & = & 0 \\ \end{eqnarray}\begin{eqnarray} \frac{\partial G}{\partial X} \frac{\partial f}{\partial x} + \frac{\partial G}{\partial Y} \frac{\partial g}{\partial x} & = & 0 \\ \frac{\partial G}{\partial X} \frac{\partial f}{\partial y} + \frac{\partial G}{\partial Y} \frac{\partial g}{\partial y} & = & 1 \\ \end{eqnarray}\begin{eqnarray} \left( \begin{array}{cc} \frac{\partial F}{\partial X} & \frac{\partial F}{\partial Y} \\ \frac{\partial G}{\partial X} & \frac{\partial G}{\partial Y} \end{array} \right) \left( \begin{array}{cc} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \\ \frac{\partial g}{\partial x} & \frac{\partial g}{\partial y} \end{array} \right) & = & \left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right) \end{eqnarray}
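As a concrete instance, take $f, g$ to be the Cartesian-to-polar map and $F, G$ its inverse. This choice is an illustrative assumption; the statement above is general. The sketch multiplies the two analytic Jacobian matrices, and the product should be the identity.

```python
import math

def forward_jacobian(x, y):
    # Rows: partial derivatives of f = sqrt(x^2 + y^2) and g = atan2(y, x)
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r], [-y / r2, x / r2]]

def inverse_jacobian(X, Y):
    # Rows: partial derivatives of F = X cos(Y) and G = X sin(Y)
    return [[math.cos(Y), -X * math.sin(Y)],
            [math.sin(Y), X * math.cos(Y)]]

def matmul(A, B):
    # Product of two 2x2 matrices stored as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

x, y = 1.5, -0.8
X, Y = math.hypot(x, y), math.atan2(y, x)
product = matmul(inverse_jacobian(X, Y), forward_jacobian(x, y))
print(product)
```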

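In several dimensions, differentiating $x^j(\xi(x)) = x^j$ with respect to $x^i$ gives the same identity in index notation:

$$\frac{\partial x^j(\xi(x))}{\partial x^i} = \frac{\partial x^j}{\partial \xi^k} \frac{\partial \xi^k}{\partial x^i} = \delta^j_{i}$$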

We start with coordinates $Z$.

The $i$'th (contravariant) component of the coordinates $Z$ is $Z^i$.

Define a position vector $\mathbf{R}$.

Overload the notation: $\mathbf{R}(Z)$ is a function that gives the position vector at coordinates $Z$. In $N$ dimensional space, we have $$ \mathbf{R}(Z^1, Z^2, \dots, Z^i, \dots, Z^N) $$

The derivative vector $\mathbf{R}'(h)$ of the position is an entirely geometrical concept. Take a scalar parameter $h$ and let $\mathbf{R}(h)$ be a curve.

Vectors can be (i) added, (ii) multiplied by scalars, (iii) can be said to approach a limit. $$ \lim_{h\rightarrow 0} |\mathbf{A}(h) - \mathbf{B}| = 0 $$

$$ \mathbf{R}'(\alpha) = \lim_{h\rightarrow 0 }\frac{\mathbf{R}(\alpha + h) - \mathbf{R}(\alpha)}{h} $$

Think of the unit circle in 2D, where the position vector at angle $\alpha$ is $\mathbf{R}(\alpha)$. The derivative is tangential and points in the direction of increasing $\alpha$. We can show that $\mathbf{R}'(\alpha)$ and $\mathbf{R}(\alpha)$ are orthogonal in purely geometric terms

$$ \mathbf{R}(\alpha) \cdot \mathbf{R}(\alpha) = 1 $$

Take derivatives on both sides $$ \mathbf{R}(\alpha) \cdot \mathbf{R}'(\alpha) + \mathbf{R}'(\alpha) \cdot \mathbf{R}(\alpha)= 0 $$ Hence $$ \mathbf{R}(\alpha) \cdot \mathbf{R}'(\alpha) = 0 $$

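For the second derivative, we differentiate again $$ \mathbf{R}'(\alpha) \cdot \mathbf{R}'(\alpha) + \mathbf{R}(\alpha) \cdot \mathbf{R}''(\alpha) = 0 $$

Because $\alpha$ is the angle, and hence the arc length on the unit circle, $\mathbf{R}'(\alpha) \cdot \mathbf{R}'(\alpha) = 1$, which gives

$$ \mathbf{R}(\alpha) \cdot \mathbf{R}''(\alpha) = -1 $$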
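Both dot products can be checked with finite differences. The sketch below assumes the unit circle is parametrized by its angle, $\mathbf{R}(\alpha) = \cos\alpha\, \mathbf{e}_1 + \sin\alpha\, \mathbf{e}_2$; the Cartesian components are used only to evaluate the geometric quantities numerically.

```python
import math

def R(alpha):
    # Position vector on the unit circle, in Cartesian components
    return (math.cos(alpha), math.sin(alpha))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

alpha, h = 0.9, 1e-4
Rm, R0, Rp = R(alpha - h), R(alpha), R(alpha + h)

# Central differences for the first and second derivative vectors
R1 = tuple((p - m) / (2 * h) for p, m in zip(Rp, Rm))
R2 = tuple((p - 2 * c + m) / h**2 for p, c, m in zip(Rp, R0, Rm))

r_dot_r1 = dot(R0, R1)   # expected to be close to 0
r_dot_r2 = dot(R0, R2)   # expected to be close to -1
print(r_dot_r1, r_dot_r2)
```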

Definition: Covariant basis

$$\mathbf{Z}_i = \frac{\partial \mathbf{R}(Z)}{\partial Z^i}$$

The coordinate vector $\mathbf{Z}_i$ at position $\mathbf{R}$ in space is tangential to the $i$'th coordinate curve.

An invariant vector in space can be expanded in terms of the covariant basis: $$ \mathbf{V} = V^i \mathbf{Z}_i $$ where $V^i$ are the contravariant components.

Definition: Covariant metric tensor is $Z_{i,j} = \mathbf{Z}_i \cdot \mathbf{Z}_j$.

Dot product of any two vectors at a position is $\mathbf{V}\cdot \mathbf{U} = U^i V^j Z_{i,j}$

Definition: Contravariant metric tensor $Z^{i,j}$ is given by

$Z^{i,j} Z_{j,k} = \delta^{i}_k$

Contravariant basis $$\mathbf{Z}^i = Z^{i,j} \mathbf{Z}_j $$

\begin{eqnarray} Z_{k,i} \mathbf{Z}^i &=& \mathbf{Z}_k \\ \end{eqnarray}$$\mathbf{Z}^i \cdot \mathbf{Z}_k = Z^{i,j} \mathbf{Z}_j \cdot \mathbf{Z}_k = Z^{i,j} Z_{j,k} =\delta^i_k$$\begin{eqnarray} \cos \alpha = \frac{\mathbf{Z}^1 \cdot \mathbf{Z}_1}{|\mathbf{Z}^1||\mathbf{Z}_1|} = \frac{1}{|\mathbf{Z}^1||\mathbf{Z}_1|} \geq 0 \end{eqnarray}

The length of a curve $$ L = \int_a^b \sqrt{Z_{i,j} \frac{dZ^i}{dt} \frac{dZ^j}{dt}} dt $$
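The length formula can be tried on a curve whose length is known. The sketch below assumes polar coordinates $(Z^1, Z^2) = (r, \theta)$, builds the covariant basis by finite differences of the position vector, and integrates along the circle $r = 2$; the function names, step sizes and the midpoint rule are illustrative choices.

```python
import math

def R(Z):
    # Position vector in Cartesian components for polar coordinates Z = (r, theta)
    r, th = Z
    return (r * math.cos(th), r * math.sin(th))

def covariant_basis(Z, h=1e-6):
    # Z_i = dR/dZ^i by central differences
    basis = []
    for i in range(2):
        hi = list(Z)
        lo = list(Z)
        hi[i] += h
        lo[i] -= h
        basis.append(tuple((p - m) / (2 * h) for p, m in zip(R(hi), R(lo))))
    return basis

def metric(Z):
    # Covariant metric tensor Z_ij = Z_i . Z_j
    b = covariant_basis(Z)
    return [[sum(p * q for p, q in zip(b[i], b[j])) for j in range(2)]
            for i in range(2)]

def curve_length(curve, velocity, a, b, steps=2000):
    # Midpoint rule for L = int sqrt(Z_ij dZ^i/dt dZ^j/dt) dt
    dt = (b - a) / steps
    total = 0.0
    for n in range(steps):
        t = a + (n + 0.5) * dt
        g = metric(curve(t))
        v = velocity(t)
        speed2 = sum(g[i][j] * v[i] * v[j] for i in range(2) for j in range(2))
        total += math.sqrt(speed2) * dt
    return total

# Circle of radius 2: Z(t) = (2, t), dZ/dt = (0, 1), t in [0, 2 pi]
length = curve_length(lambda t: (2.0, t), lambda t: (0.0, 1.0), 0.0, 2 * math.pi)
print(length, 4 * math.pi)
```

The result should match the circumference $4\pi$ up to numerical error.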

$\mathbf{Z}^i \cdot \mathbf{Z}^j \mathbf{Z}_j \cdot \mathbf{Z}_k = \delta^{i}_k$

Definition: Christoffel Symbol

The Christoffel symbol measures the variation of the covariant basis from point to point $$ \frac{\partial \mathbf{Z}_i }{\partial Z^j} = \Gamma^{k}_{ij} \mathbf{Z}_k $$

Each basis vector $\mathbf{Z}_i$ changes with respect to each coordinate $Z^j$, giving $N$ derivative vectors (with $N^2$ numbers) per basis vector. Each derivative vector $\partial \mathbf{Z}_i / \partial Z^j$ is decomposed on the covariant basis. The Christoffel symbol gives the components of these vectors.

As $$ \mathbf{Z}_i = \frac{\partial \mathbf{R}(Z) }{\partial Z^i} $$ we have $$ \frac{\partial \mathbf{Z}_i }{\partial Z^j} = \frac{\partial^2 \mathbf{R} }{\partial Z^i \partial Z^j} = \frac{\partial \mathbf{Z}_j }{\partial Z^i} $$ so the Christoffel symbol is symmetric in its lower indices:

$$ \Gamma^{k}_{ij} = \Gamma^{k}_{ji} $$

The Christoffel symbols for the polar coordinates $(r, \theta) \equiv (Z^1, Z^2)$.

$$ \frac{\partial \mathbf{R}(Z^1, Z^2)}{\partial Z^1} = \lim_{h\rightarrow 0} \frac{ \mathbf{R}(Z^1 + h, Z^2) - \mathbf{R}(Z^1, Z^2)}{h} $$$$ \frac{\partial \mathbf{R}(Z^1, Z^2)}{\partial Z^2} = \lim_{h\rightarrow 0} \frac{ \mathbf{R}(Z^1, Z^2+h) - \mathbf{R}(Z^1, Z^2)}{h} $$

Take the two unit vectors $\mathbf{e}_1$ and $\mathbf{e}_2$

\begin{eqnarray} \mathbf{R}(Z^1, Z^2) & = & Z^1 \cos(Z^2) \mathbf{e}_1 + Z^1 \sin(Z^2) \mathbf{e}_2 \end{eqnarray}

The covariant coordinate basis elements $\mathbf{Z}_k$ are

\begin{eqnarray} \frac{\partial \mathbf{R}(Z^1, Z^2)}{\partial Z^1} & = & \cos(Z^2) \mathbf{e}_1 + \sin(Z^2) \mathbf{e}_2 = \mathbf{Z}_1 \end{eqnarray}\begin{eqnarray} \frac{\partial \mathbf{R}(Z^1, Z^2)}{\partial Z^2} & = & -Z^1 \sin(Z^2) \mathbf{e}_1 + Z^1 \cos(Z^2) \mathbf{e}_2 = \mathbf{Z}_2 \end{eqnarray}\begin{eqnarray} \frac{\partial^2 \mathbf{R}(Z^1, Z^2)}{\partial Z^1 \partial Z^1} & = & 0 \end{eqnarray}\begin{eqnarray} \frac{\partial^2 \mathbf{R}(Z^1, Z^2)}{\partial Z^1 \partial Z^2} & = & - \sin(Z^2) \mathbf{e}_1 + \cos(Z^2) \mathbf{e}_2 \\ & = & \frac{1}{Z^1} \mathbf{Z}_2 \end{eqnarray}\begin{eqnarray} \frac{\partial^2 \mathbf{R}(Z^1, Z^2)}{\partial Z^2 \partial Z^2} & = & -Z^1 \cos(Z^2) \mathbf{e}_1 - Z^1 \sin(Z^2) \mathbf{e}_2 \\ & = & - Z^1 \mathbf{Z}_1 \end{eqnarray}\begin{eqnarray} \left(\begin{array}{cc} \Gamma^{1}_{11} & \Gamma^{1}_{12} \\ \Gamma^{1}_{21} & \Gamma^{1}_{22} \end{array}\right) & = & \left(\begin{array}{cc} 0 & 0 \\ 0 & - Z^1 \end{array}\right) \end{eqnarray}\begin{eqnarray} \left(\begin{array}{cc} \Gamma^{2}_{11} & \Gamma^{2}_{12} \\ \Gamma^{2}_{21} & \Gamma^{2}_{22} \end{array}\right) & = & \left(\begin{array}{cc} 0 & {1}/{Z^1} \\ {1}/{Z^1} & 0 \end{array}\right) \end{eqnarray}
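These values can be cross-checked numerically from the relation $\Gamma^k_{ij} = \mathbf{Z}^k \cdot \partial \mathbf{Z}_i / \partial Z^j$. The sketch below is a rough check under stated assumptions: the basis vectors are entered analytically in Cartesian components, their derivatives are taken by central differences, and the array layout `Gamma[k][i][j]` (zero-based indices) is a convention introduced here.

```python
import math

def basis(r, th):
    # Covariant basis Z_1, Z_2 of polar coordinates in Cartesian components
    return [[math.cos(th), math.sin(th)],
            [-r * math.sin(th), r * math.cos(th)]]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def christoffel(r, th, h=1e-5):
    Z = basis(r, th)
    # Covariant metric and its inverse (2x2)
    g = [[dot(Z[i], Z[j]) for j in range(2)] for i in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # Contravariant basis Z^k = Z^{km} Z_m
    Z_up = [[sum(g_inv[k][m] * Z[m][c] for m in range(2)) for c in range(2)]
            for k in range(2)]
    # Basis evaluated at shifted coordinates: index 0 shifts r, index 1 shifts theta
    plus = [basis(r + h, th), basis(r, th + h)]
    minus = [basis(r - h, th), basis(r, th - h)]
    Gamma = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    for k in range(2):
        for i in range(2):
            for j in range(2):
                dZ = [(plus[j][i][c] - minus[j][i][c]) / (2 * h) for c in range(2)]
                Gamma[k][i][j] = dot(Z_up[k], dZ)   # Gamma^k_{ij}
    return Gamma

r, th = 2.0, 0.7
Gamma = christoffel(r, th)
print(Gamma[0][1][1], -r)        # Gamma^1_22 against -Z^1
print(Gamma[1][0][1], 1.0 / r)   # Gamma^2_12 against 1/Z^1
```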

The Christoffel symbols for the scaled polar coordinates $(r, u) \equiv (Z^1, Z^2)$.

Take the two unit vectors $\mathbf{e}_1$ and $\mathbf{e}_2$

\begin{eqnarray} \mathbf{R}(Z^1, Z^2) & = & Z^1 \cos(2\pi Z^2 /N) \mathbf{e}_1 + Z^1 \sin(2\pi Z^2 /N) \mathbf{e}_2 \end{eqnarray}

The covariant coordinate basis elements $\mathbf{Z}_k$ are

\begin{eqnarray} \frac{\partial \mathbf{R}(Z^1, Z^2)}{\partial Z^1} & = & \cos(2\pi Z^2 /N) \mathbf{e}_1 + \sin(2\pi Z^2 /N) \mathbf{e}_2 = \mathbf{Z}_1 \end{eqnarray}\begin{eqnarray} \frac{\partial \mathbf{R}(Z^1, Z^2)}{\partial Z^2} & = & - \frac{2\pi}{N} Z^1 \sin(2\pi Z^2 /N) \mathbf{e}_1 + \frac{2\pi}{N} Z^1 \cos(2\pi Z^2 /N) \mathbf{e}_2 = \mathbf{Z}_2 \end{eqnarray}\begin{eqnarray} \frac{\partial^2 \mathbf{R}(Z^1, Z^2)}{\partial Z^1 \partial Z^1} & = & 0 \end{eqnarray}\begin{eqnarray} \frac{\partial^2 \mathbf{R}(Z^1, Z^2)}{\partial Z^1 \partial Z^2} & = & - \frac{2\pi}{N}\sin(2\pi Z^2 /N) \mathbf{e}_1 + \frac{2\pi}{N}\cos(2\pi Z^2 /N) \mathbf{e}_2 \\ & = & \frac{1}{Z^1} \mathbf{Z}_2 \end{eqnarray}\begin{eqnarray} \frac{\partial^2 \mathbf{R}(Z^1, Z^2)}{\partial Z^2 \partial Z^2} & = & -\left(\frac{2\pi}{N}\right)^2 Z^1 \cos(2\pi Z^2 /N) \mathbf{e}_1 - \left(\frac{2\pi}{N}\right)^2 Z^1 \sin(2\pi Z^2 /N) \mathbf{e}_2 \\ & = & - \left(\frac{2\pi}{N}\right)^2 Z^1 \mathbf{Z}_1 \end{eqnarray}\begin{eqnarray} \left(\begin{array}{cc} \Gamma^{1}_{11} & \Gamma^{1}_{12} \\ \Gamma^{1}_{21} & \Gamma^{1}_{22} \end{array}\right) & = & \left(\begin{array}{cc} 0 & 0 \\ 0 & - (2\pi/N)^2 Z^1 \end{array}\right) \end{eqnarray}\begin{eqnarray} \left(\begin{array}{cc} \Gamma^{2}_{11} & \Gamma^{2}_{12} \\ \Gamma^{2}_{21} & \Gamma^{2}_{22} \end{array}\right) & = & \left(\begin{array}{cc} 0 & {1}/{Z^1} \\ {1}/{Z^1} & 0 \end{array}\right) \end{eqnarray}

Properties of the Christoffel symbol

The Christoffel symbol $\Gamma^k_{ij}$ is the $k$'th contravariant component of the derivative of the $i$'th covariant basis vector with respect to the $j$'th coordinate, as expressed in the covariant basis $\mathbf{Z}_k$. Mathematically, this is: \begin{eqnarray} \frac{\partial \mathbf{Z}_i }{\partial Z^j} &=& \Gamma^{k}_{ij} \mathbf{Z}_k \\ \end{eqnarray} Taking the dot product with the dual basis $\mathbf{Z}^l$, we obtain \begin{eqnarray} \mathbf{Z}^l \cdot \frac{\partial \mathbf{Z}_i }{\partial Z^j} &=& \Gamma^{k}_{ij} \mathbf{Z}_k \cdot \mathbf{Z}^l = \Gamma^{k}_{ij} \delta_k^l\\ \mathbf{Z}^l \cdot \frac{\partial \mathbf{Z}_i }{\partial Z^j} &=& \Gamma^{l}_{ij} \\ \mathbf{Z}^k \cdot \frac{\partial \mathbf{Z}_i }{\partial Z^j} &=& \Gamma^{k}_{ij} \\ \end{eqnarray} The last step merely renames the index $l$ to $k$.

A natural question is how the contravariant basis $\mathbf{Z}^k$ changes with respect to the coordinates, i.e. \begin{eqnarray} \frac{\partial\mathbf{Z}^k}{\partial Z^j} = ? \end{eqnarray}

The result follows from the product rule. We observe that \begin{eqnarray} \frac{\partial (\mathbf{Z}^k \cdot \mathbf{Z}_i)}{\partial Z^j} &=& \frac{\partial\mathbf{Z}^k}{\partial Z^j} \cdot \mathbf{Z}_i + \mathbf{Z}^k \cdot \frac{\partial \mathbf{Z}_i}{\partial Z^j} \end{eqnarray} with $\mathbf{Z}^k \cdot \mathbf{Z}_i = \delta^k_i$. As the Kronecker delta does not change with $Z^j$, we have ${\partial \delta^k_i }/{\partial Z^j} = 0$ and \begin{eqnarray} 0 & = & \frac{\partial\mathbf{Z}^k}{\partial Z^j} \cdot \mathbf{Z}_i + \Gamma^{k}_{ij} \end{eqnarray}

\begin{eqnarray} \frac{\partial\mathbf{Z}^k}{\partial Z^j} = - \Gamma^{k}_{ij} \mathbf{Z}^i \end{eqnarray}

\begin{eqnarray} \frac{\partial Z_{i,j}}{\partial Z^k} & = & \frac{\partial (\mathbf{Z}_i \cdot \mathbf{Z}_j) }{\partial Z^k} = \frac{\partial \mathbf{Z}_i }{\partial Z^k} \cdot \mathbf{Z}_j + \frac{\partial \mathbf{Z}_j }{\partial Z^k} \cdot \mathbf{Z}_i \\ & = & \frac{\partial \mathbf{Z}_i }{\partial Z^k} \cdot Z_{l,j} \mathbf{Z}^l + \frac{\partial \mathbf{Z}_j }{\partial Z^k} \cdot Z_{l,i} \mathbf{Z}^l \\ & = & \Gamma^{l}_{ik} Z_{l,j} + \Gamma^{l}_{jk} Z_{l,i} \end{eqnarray}

\begin{eqnarray} \frac{\partial Z_{i,m}}{\partial Z^j} & = & \Gamma^{l}_{ij} Z_{l,m} + \Gamma^{l}_{mj} Z_{l,i} \\ \frac{\partial Z_{j,m}}{\partial Z^i} & = & \Gamma^{l}_{ji} Z_{l,m} + \Gamma^{l}_{mi} Z_{l,j} \\ \frac{\partial Z_{j,i}}{\partial Z^m} & = & \Gamma^{l}_{jm} Z_{l,i} + \Gamma^{l}_{im} Z_{l,j} \\ \end{eqnarray}\begin{eqnarray} Z^{k,m}(\frac{\partial Z_{i,m}}{\partial Z^j} + \frac{\partial Z_{j,m}}{\partial Z^i} - \frac{\partial Z_{j,i}}{\partial Z^m}) &=& \Gamma^{l}_{ij} Z_{l,m} Z^{k,m} + \Gamma^{l}_{mj} Z_{l,i}Z^{k,m} + \Gamma^{l}_{ji} Z_{l,m}Z^{k,m} + \Gamma^{l}_{mi} Z_{l,j}Z^{k,m} - \Gamma^{l}_{jm} Z_{l,i}Z^{k,m} - \Gamma^{l}_{im} Z_{l,j} Z^{k,m} \\ &=& \Gamma^{l}_{ij} \delta_{l}^k + \Gamma^{l}_{mj} Z_{l,i}Z^{k,m} + \Gamma^{l}_{ji} \delta_{l}^k + \Gamma^{l}_{mi} Z_{l,j}Z^{k,m} - \Gamma^{l}_{jm} Z_{l,i}Z^{k,m} - \Gamma^{l}_{im} Z_{l,j} Z^{k,m} \\ &=& 2 \Gamma^{k}_{ij} + \Gamma^{l}_{mj} Z_{l,i}Z^{k,m} + \Gamma^{l}_{mi} Z_{l,j}Z^{k,m} - \Gamma^{l}_{jm} Z_{l,i}Z^{k,m} - \Gamma^{l}_{im} Z_{l,j} Z^{k,m} \\ & = & 2 \Gamma^{k}_{ij} \end{eqnarray}
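The result gives a recipe for computing the Christoffel symbol from the metric alone, $\Gamma^k_{ij} = \frac{1}{2} Z^{k,m} \left( \partial Z_{i,m}/\partial Z^j + \partial Z_{j,m}/\partial Z^i - \partial Z_{j,i}/\partial Z^m \right)$. The sketch below applies it to the scaled polar coordinates, assuming the metric $Z_{1,1} = 1$, $Z_{2,2} = (2\pi Z^1/N)^2$ that follows from the basis vectors given earlier; the choice $N = 10$, the evaluation point and the finite-difference step are illustrative assumptions.

```python
import math

N = 10                     # scale of the angular coordinate (illustrative value)
c = 2 * math.pi / N

def metric(Z):
    # Covariant metric of scaled polar coordinates: diag(1, (c r)^2)
    r = Z[0]
    return [[1.0, 0.0], [0.0, (c * r) ** 2]]

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(Z, k, h=1e-6):
    # d Z_ij / d Z^k by central differences
    hi = list(Z)
    lo = list(Z)
    hi[k] += h
    lo[k] -= h
    gp, gm = metric(hi), metric(lo)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def christoffel(Z):
    g_inv = inverse_2x2(metric(Z))
    dg = [metric_derivative(Z, k) for k in range(2)]   # dg[k][i][j] = d Z_ij / d Z^k
    Gamma = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    for k in range(2):
        for i in range(2):
            for j in range(2):
                Gamma[k][i][j] = 0.5 * sum(
                    g_inv[k][m] * (dg[j][i][m] + dg[i][j][m] - dg[m][j][i])
                    for m in range(2))
    return Gamma

Z = (2.0, 3.0)
Gamma = christoffel(Z)
print(Gamma[0][1][1], -c**2 * Z[0])   # Gamma^1_22 against -(2 pi / N)^2 Z^1
print(Gamma[1][0][1], 1.0 / Z[0])     # Gamma^2_12 against 1 / Z^1
```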

Jacobians

Covariant tensor of order one

$T_i = T_{i'} J_i^{i'}$

Contravariant tensor of order one

$T^i = T^{i'} J^i_{i'}$

\begin{eqnarray} J^{i}_{i',j'} = \frac{\partial^2 Z^{i} }{\partial Z^{i'} \partial Z^{j'}} = \frac{\partial }{\partial Z^{i'}} \left( \frac{\partial Z^i}{\partial Z^{j'}} \right) \end{eqnarray}

Note that the derivative of the Jacobian is always between the primed and unprimed indices.

\begin{eqnarray} \delta^{i}_{j} & = & \frac{\partial Z^{i} }{\partial Z^{i'}} \frac{\partial Z^{i'} }{\partial Z^{j}} \\ \frac{\partial }{\partial Z^{k}} \delta^{i}_{j} & = & \frac{\partial }{\partial Z^{k}}(\frac{\partial Z^{i} }{\partial Z^{i'}} \frac{\partial Z^{i'} }{\partial Z^{j}}) \\ \frac{\partial}{\partial Z^{k}} & = & \frac{\partial Z^{j'}}{\partial Z^{k}} \frac{\partial}{\partial Z^{j'}} \\ 0 & = & (\frac{\partial}{\partial Z^{k}} \frac{\partial Z^{i} }{\partial Z^{i'} }) \frac{\partial Z^{i'} }{\partial Z^{j}} + \frac{\partial Z^{i} }{\partial Z^{i'}} \frac{\partial }{\partial Z^{k}}\frac{\partial Z^{i'} }{\partial Z^{j}} \\ 0 & = & (\frac{\partial Z^{j'}}{\partial Z^{k}} \frac{\partial}{\partial Z^{j'}} \frac{\partial Z^{i} }{\partial Z^{i'} }) \frac{\partial Z^{i'} }{\partial Z^{j}} + \frac{\partial Z^{i} }{\partial Z^{i'}} \frac{\partial }{\partial Z^{k}}\frac{\partial Z^{i'} }{\partial Z^{j}} \\ 0 & = & J^{j'}_k J^{i}_{i',j'} J^{i'}_{j} + J^{i}_{i'} J^{i'}_{k,j} \end{eqnarray}

Tensor or not a tensor?

The covariant basis $\mathbf{Z}_i$ is a tensor \begin{eqnarray} \frac{\partial \mathbf{R}(Z')}{\partial Z^{i'}} &=& \frac{\partial \mathbf{R}(Z(Z'))}{\partial Z^{i'}} = \frac{\partial \mathbf{R}}{\partial Z^i} \frac{\partial Z^i}{\partial Z^{i'}} \\ \mathbf{Z}_{i'} &=& \mathbf{Z}_{i} J^i_{i'} \end{eqnarray}

The covariant metric tensor is indeed a tensor.

The partial derivative of an invariant (scalar quantity) is a tensor:

\begin{eqnarray} \frac{\partial F}{\partial Z^{i'}} = \frac{\partial F}{\partial Z^{i}} \frac{\partial Z^i}{\partial Z^{i'}} = \frac{\partial F}{\partial Z^{i}}J^{i}_{i'} \end{eqnarray}

So, ${\partial F}/{\partial Z^{i}}$ is a tensor.
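This transformation rule is what allows an invariant gradient vector to be assembled from coordinate-dependent pieces. The sketch below returns to the opening example with $x = 2x'$, $y = 2y'$ and assumes the standard construction $\nabla F = Z^{i,j} \frac{\partial F}{\partial Z^j} \mathbf{Z}_i$, which is not stated explicitly in the text above; the helper `assemble` is a hypothetical name.

```python
import math

def grad_components_unprimed(x, y):
    # dF/dx, dF/dy for F = x^2 e^y
    return [2 * x * math.exp(y), x**2 * math.exp(y)]

def grad_components_primed(xp, yp):
    # dF'/dx', dF'/dy' for F' = 4 x'^2 e^{2 y'}
    return [8 * xp * math.exp(2 * yp), 8 * xp**2 * math.exp(2 * yp)]

def assemble(components, basis):
    # Gradient vector Z^{ij} dF/dZ^j Z_i, returned in Cartesian components
    g = [[sum(a * b for a, b in zip(basis[i], basis[j])) for j in range(2)]
         for i in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    raised = [sum(g_inv[i][j] * components[j] for j in range(2)) for i in range(2)]
    return [sum(raised[i] * basis[i][c] for i in range(2)) for c in range(2)]

x, y = 0.6, -0.4
xp, yp = x / 2, y / 2

basis_unprimed = [(1.0, 0.0), (0.0, 1.0)]   # dR/dx, dR/dy
basis_primed = [(2.0, 0.0), (0.0, 2.0)]     # dR/dx', dR/dy', since x = 2x', y = 2y'

grad_unprimed = assemble(grad_components_unprimed(x, y), basis_unprimed)
grad_primed = assemble(grad_components_primed(xp, yp), basis_primed)
print(grad_unprimed, grad_primed)
```

The partial derivatives differ by a factor of two between the two systems, yet the assembled vectors coincide.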

The second derivative is not a tensor \begin{eqnarray} \frac{\partial^2 F}{\partial Z^{i'} \partial Z^{j'}} = \frac{\partial^2 F}{\partial Z^{i} \partial Z^{j}} \frac{\partial Z^i}{\partial Z^{i'}} \frac{\partial Z^j}{\partial Z^{j'}} + \frac{\partial F}{\partial Z^{i}} \frac{\partial^2 Z^i}{\partial Z^{i'} \partial Z^{j'}} \end{eqnarray} The second term, which involves the derivative of the Jacobian, spoils the tensor transformation rule. So, ${\partial^2 F}/{\partial Z^{i} \partial Z^{j}}$ is not a tensor.

The derivative of an order one covariant tensor is not a tensor

\begin{eqnarray} T_{i'} & = & T_i \frac{\partial Z^{i}}{\partial Z^{i'}} \\ \frac{\partial T_{i'}}{\partial Z^{j'}} & = & \frac{\partial T_{i}}{\partial Z^{j}} \frac{\partial Z^{j}}{\partial Z^{j'}} \frac{\partial Z^{i}}{\partial Z^{i'}} + T_i \frac{\partial^2 Z^{i}}{\partial Z^{j'} \partial Z^{i'}} \\ & = & \frac{\partial T_{i}}{\partial Z^{j}} J^{j}_{j'} J^{i}_{i'} + T_i J^{i}_{i',j'} \end{eqnarray}

The skew-symmetric part is a tensor

\begin{eqnarray} \frac{\partial T_{i}}{\partial Z^{j}} - \frac{\partial T_{j}}{\partial Z^{i}} & = & \frac{\partial T_{i'}}{\partial Z^{j'}} J^{j'}_{j} J^{i'}_{i} + T_{i'} J^{i'}_{i,j} - \frac{\partial T_{j'}}{\partial Z^{i'}} J^{i'}_{i} J^{j'}_{j} - T_{i'} J^{i'}_{j,i} \\ & = & (\frac{\partial T_{i'}}{\partial Z^{j'}} - \frac{\partial T_{j'}}{\partial Z^{i'}}) J^{j'}_{j} J^{i'}_{i} \end{eqnarray}

The derivative of an order one contravariant tensor is not a tensor

\begin{eqnarray} T^{i} & = & T^{i'} \frac{\partial Z^{i}}{\partial Z^{i'}} \\ \frac{\partial T^{i}}{\partial Z^{j}} & = & \frac{\partial T^{i'}}{\partial Z^{j'}} \frac{\partial Z^{j'}}{\partial Z^{j}} \frac{\partial Z^{i}}{\partial Z^{i'}} + T^{i'} \frac{\partial Z^{j'}}{\partial Z^{j}} \frac{\partial^2 Z^{i}}{\partial Z^{i'} \partial Z^{j'}} \\ & = & \frac{\partial T^{i'}}{\partial Z^{j'}} J^{j'}_j J^{i}_{i'} + T^{i'} J^{j'}_j J^{i}_{i',j'} \\ \end{eqnarray}

The partial derivative of the Christoffel symbol wrt contravariant index

\begin{eqnarray} \Gamma^{k}_{i,j} & = & \mathbf{Z}^k \cdot \frac{\partial \mathbf{Z}_i }{\partial Z^j} \end{eqnarray}\begin{eqnarray} \frac{\partial \Gamma^{k}_{i,j}}{\partial Z^{k'}} & = & \frac{\partial {Z}^{u}}{\partial Z^{k'}} \frac{\partial \mathbf{Z}^k}{\partial Z^{u}} \cdot \frac{\partial \mathbf{Z}_i }{\partial Z^j} + \mathbf{Z}^k \cdot \frac{\partial {Z}^{j'} }{\partial Z^{j}} \frac{\partial^2 \mathbf{Z}_i }{\partial Z^{k'} \partial Z^{j'}} \\ & = & J^{u}_{k'} (- \Gamma^{k}_{u,r} \mathbf{Z}^r ) \Gamma^{s}_{i,j} \mathbf{Z}_s + \mathbf{Z}^k \cdot J^{j'}_j \frac{\partial^2 \mathbf{Z}_i }{\partial Z^{k'} \partial Z^{j'}} \\ & = & - J^{u}_{k'} \Gamma^{k}_{u,r} \mathbf{Z}^r \cdot \mathbf{Z}_s \Gamma^{s}_{i,j} + \mathbf{Z}^k \cdot J^{j'}_j \frac{\partial^2 \mathbf{Z}_i }{\partial Z^{k'} \partial Z^{j'}} \\ & = & - J^{u}_{k'} \Gamma^{k}_{u,r} \delta^r_s \Gamma^{s}_{i,j} + \mathbf{Z}^k \cdot J^{j'}_j \frac{\partial^2 \mathbf{Z}_i }{\partial Z^{k'} \partial Z^{j'}} \\ & = & - J^{u}_{k'} \Gamma^{k}_{u,s} \Gamma^{s}_{i,j} + \mathbf{Z}^k \cdot J^{j'}_j \frac{\partial^2 \mathbf{Z}_i }{\partial Z^{k'} \partial Z^{j'}} \\ \end{eqnarray}

The transform rule for the Christoffel symbol \begin{eqnarray} \Gamma^{k'}_{i',j'} & = & \mathbf{Z}^{k'} \cdot \frac{\partial \mathbf{Z}_{i'} }{\partial Z^{j'}} \end{eqnarray}

Since $\mathbf{Z}_{i'}$ is a covariant tensor, the rule found above for the derivative of a covariant tensor gives $$ \frac{\partial \mathbf{Z}_{i'}}{\partial Z^{j'}} = \frac{\partial \mathbf{Z}_{i}}{\partial Z^{j}} J^{j}_{j'} J^{i}_{i'} + \mathbf{Z}_i J^{i}_{i',j'} $$

\begin{eqnarray} \Gamma^{k'}_{i',j'} & = & (\mathbf{Z}^{k} J^{k'}_k ) \cdot (\frac{\partial \mathbf{Z}_{i}}{\partial Z^{j}} J^{j}_{j'} J^{i}_{i'} + \mathbf{Z}_i J^{i}_{i',j'} ) \\ & = & \mathbf{Z}^{k} \cdot \frac{\partial \mathbf{Z}_{i}}{\partial Z^{j}} J^{k'}_k J^{j}_{j'} J^{i}_{i'} + \mathbf{Z}^k \cdot \mathbf{Z}_i J^{k'}_k J^{i}_{i',j'} \\ & = & \Gamma^{k}_{i,j} J^{k'}_k J^{j}_{j'} J^{i}_{i'} + J^{k'}_k J^{k}_{i',j'} \end{eqnarray}

Variant: An object that is obtained by applying the same rule in every coordinate system. Examples are the covariant basis $\mathbf{Z}_i$, the covariant metric tensor $Z_{i,j}$, and the Christoffel symbol. The Jacobian $J^{i}_{i'}$ and its derivative $J^{i}_{i',j'}$ are not variants, as one needs two coordinate systems for their construction.

A tensor $T_i$ is a variant that transforms according to the rule $$ T_{i'} = T_i J^{i}_{i'} $$

The Covariant Derivative

Take a vector (an invariant) and calculate its partial derivative with respect to the $k$'th coordinate $Z^k$ \begin{eqnarray} \frac{\partial \mathbf{V}}{\partial Z^{k}} & = & \frac{\partial (V^i\mathbf{Z}_i) }{\partial Z^{k}} = \frac{\partial V^i}{\partial Z^{k}} \mathbf{Z}_i + V^i \frac{\partial \mathbf{Z}_i}{\partial Z^{k}} = \frac{\partial V^i}{\partial Z^{k}} \mathbf{Z}_i + V^i \Gamma_{ik}^j \mathbf{Z}_j \end{eqnarray} Changing names of the dummy indices that are contracted over, we find \begin{eqnarray} \frac{\partial \mathbf{V}}{\partial Z^{k}} & = & \frac{\partial V^i}{\partial Z^{k}} \mathbf{Z}_i + V^i \Gamma_{ik}^j \mathbf{Z}_j = \frac{\partial V^i}{\partial Z^{k}} \mathbf{Z}_i + V^j \Gamma_{jk}^i \mathbf{Z}_i = \left( \frac{\partial V^i}{\partial Z^{k}} + \Gamma_{jk}^i V^j \right) \mathbf{Z}_i \end{eqnarray}

The covariant derivative is

\begin{eqnarray} \nabla_k T^i \equiv \frac{\partial T^i}{\partial Z^{k}} + \Gamma_{rk}^i T^r \end{eqnarray}
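A useful sanity check is that the covariant derivative of a constant vector field vanishes even though its components vary from point to point. The sketch below assumes polar coordinates with the Christoffel symbols $\Gamma^1_{22} = -Z^1$, $\Gamma^2_{12} = \Gamma^2_{21} = 1/Z^1$ and takes the field $\mathbf{V} = \mathbf{e}_1$, whose contravariant components are $V^1 = \cos\theta$, $V^2 = -\sin\theta / r$; the function names and array layouts are illustrative.

```python
import math

def V(Z):
    # Contravariant polar components of the constant Cartesian field e_1
    r, th = Z
    return [math.cos(th), -math.sin(th) / r]

def christoffel(Z):
    # G[i][j][k] = Gamma^i_{jk} for polar coordinates (zero-based indices)
    r = Z[0]
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def covariant_derivative(Z, h=1e-6):
    # nabla[k][i] = dV^i/dZ^k + Gamma^i_{rk} V^r
    G, v = christoffel(Z), V(Z)
    nabla = []
    for k in range(2):
        hi = list(Z)
        lo = list(Z)
        hi[k] += h
        lo[k] -= h
        dV = [(p - m) / (2 * h) for p, m in zip(V(hi), V(lo))]
        nabla.append([dV[i] + sum(G[i][r][k] * v[r] for r in range(2))
                      for i in range(2)])
    return nabla

nabla = covariant_derivative((2.0, 0.7))
print(nabla)   # all four entries should be close to zero
```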

Do the same derivation using the contravariant basis

\begin{eqnarray} \frac{\partial \mathbf{V}}{\partial Z^{k}} & = & \frac{\partial (V_i \mathbf{Z}^i) }{\partial Z^{k}} = \frac{\partial V_i}{\partial Z^{k}} \mathbf{Z}^i + V_i \frac{\partial \mathbf{Z}^i}{\partial Z^{k}} \end{eqnarray}

From the properties of the Christoffel symbol: \begin{eqnarray} \frac{\partial \mathbf{Z}^i}{\partial Z^{k}} = -\Gamma^{i}_{jk} \mathbf{Z}^{j} \end{eqnarray} and exchange dummy indices

\begin{eqnarray} \frac{\partial \mathbf{V}}{\partial Z^{k}} & = & \frac{\partial V_i}{\partial Z^{k}} \mathbf{Z}^i - V_j \Gamma^{j}_{ik} \mathbf{Z}^{i} = \left(\frac{\partial V_i}{\partial Z^{k}} - V_j \Gamma^{j}_{ik} \right) \mathbf{Z}^{i} \end{eqnarray}

Similarly we define

\begin{eqnarray} \nabla_k T_i \equiv \frac{\partial T_i}{\partial Z^{k}} - \Gamma_{ik}^m T_m \end{eqnarray}

This implies that

\begin{eqnarray} \nabla_k T^i_j \equiv \frac{\partial T^i_j}{\partial Z^{k}} + \Gamma_{rk}^i T^r_j - \Gamma_{kj}^m T^i_m \end{eqnarray}

Metrinilic property

\begin{eqnarray} \nabla_k \mathbf{Z}_i = \frac{\partial \mathbf{Z}_i}{\partial Z^{k}} - \Gamma_{ik}^m \mathbf{Z}_m = \Gamma_{ik}^m \mathbf{Z}_m - \Gamma_{ik}^m \mathbf{Z}_m = 0 \end{eqnarray}

$\cos(\arccos(x)) = x$

$-\sin(\arccos(x))\arccos'(x) = 1$

$\sin^2(\arccos(x)) + \cos^2(\arccos(x)) = 1$

$\sin^2(\arccos(x)) = 1 - x^2$

Since $\arccos(x) \in [0, \pi]$, we have $\sin(\arccos(x)) = \sqrt{1-x^2} \geq 0$, so

$ \arccos'(x) = -\frac{1}{\sqrt{1-x^2}} $
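The sign can be confirmed numerically. In the sketch below, a central finite difference of `math.acos` is compared against $-1/\sqrt{1-x^2}$; the sample points and the step size are arbitrary illustrative choices.

```python
import math

def arccos_prime_numeric(x, h=1e-6):
    # Central finite-difference approximation of the derivative of arccos
    return (math.acos(x + h) - math.acos(x - h)) / (2 * h)

xs = [-0.8, -0.3, 0.0, 0.4, 0.9]
numeric = [arccos_prime_numeric(x) for x in xs]
negative_branch = [-1.0 / math.sqrt(1.0 - x * x) for x in xs]

for x, n, a in zip(xs, numeric, negative_branch):
    print(x, n, a)
```

At every sample point the numerical derivative matches the negative branch.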

Diffeomorphism

Amari

$\newcommand{\E}[1]{\left\langle{#1}\right\rangle}$

All the $p(x; \theta)$ have the same support
$\ell(x; \theta) = \log p(x; \theta)$
For $i=1\dots n$, the derivatives as functions in $x$ are linearly independent $$ \frac{\partial \ell(x; \theta)}{\partial \theta^i} $$
The moments of ${\partial \ell(x; \theta)}/{\partial \theta^i}$ exist.

The tangent space $T_\theta$

Coordinate curves

\begin{eqnarray} \theta_1(t) & = & (\theta^1_0+t, \theta^2_0, \dots, \theta^n_0) \end{eqnarray}

Tangent $C_1$ \begin{eqnarray} C_1 f & = & \frac{d}{dt} f(\theta_1(t)) = \frac{\partial f(\theta_1(t))}{\partial \theta^i} \frac{\partial \theta^i}{\partial t} \end{eqnarray} is denoted as $\partial_1$

The covariant basis is denoted by $\partial_i$, so any vector in the tangent space is \begin{eqnarray} A & = & A^i \partial_i \end{eqnarray} As random variables

\begin{eqnarray} T_\theta^{(1)} = \left\{ A(x) : A(x) = A^i \partial_i \ell(x; \theta) \right\} \end{eqnarray}\begin{eqnarray} 1 &=& \int p(x;\theta) dP(x) \\ 0 & = & \frac{\partial}{\partial \theta^i} \int p(x;\theta) dP(x) = \int \frac{\partial}{\partial \theta^i} p(x;\theta) dP(x) \\ & = & \int p(x;\theta) \partial_i \ell(x; \theta) dP(x) = \E{\partial_i \ell(x; \theta)} \end{eqnarray}

Jacobians

\begin{eqnarray} J^\alpha_i & = & \frac{\partial \xi^\alpha}{\partial \theta^i} \end{eqnarray}\begin{eqnarray} \bar{J}^i_\alpha & = & \frac{\partial \theta^i}{\partial \xi^\alpha} \end{eqnarray}\begin{eqnarray} \partial_\alpha & = & \bar{J}^i_\alpha \partial_i \end{eqnarray}

Metric tensor

\begin{eqnarray} g_{i,j} & = & \E{ \partial_i \ell(x; \theta) \, \partial_j \ell(x; \theta)} \end{eqnarray}

Standard Normal

\begin{eqnarray} \theta_1 & = & \mu \\ \theta_2 & = & \sigma \end{eqnarray}\begin{eqnarray} \xi_1 & = & \mu \\ \xi_2 & = & \mu^2 + \sigma^2 \end{eqnarray}
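For the normal family in the $(\mu, \sigma)$ parametrization, the zero-mean property of the score and the metric can be checked by numerical integration. The sketch below makes several assumptions that go beyond the text: the base measure $dP(x)$ is taken to be Lebesgue measure, the metric is read as the Fisher information $\E{\partial_i \ell \, \partial_j \ell}$, and the integration range, step count and parameter values are arbitrary.

```python
import math

def scores(x, mu, sigma):
    # Partial derivatives of log N(x; mu, sigma^2) with respect to mu and sigma
    z = x - mu
    return [z / sigma**2, -1.0 / sigma + z**2 / sigma**3]

def density(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def expectations(mu, sigma, steps=20000, width=12.0):
    # Midpoint rule on [mu - width*sigma, mu + width*sigma]
    a = mu - width * sigma
    dx = 2 * width * sigma / steps
    mean = [0.0, 0.0]
    g = [[0.0, 0.0], [0.0, 0.0]]
    for n in range(steps):
        x = a + (n + 0.5) * dx
        w = density(x, mu, sigma) * dx
        s = scores(x, mu, sigma)
        for i in range(2):
            mean[i] += w * s[i]
            for j in range(2):
                g[i][j] += w * s[i] * s[j]
    return mean, g

mu, sigma = 0.5, 1.5
score_mean, g = expectations(mu, sigma)
print(score_mean)   # both entries should be close to 0
print(g)            # should be close to diag(1/sigma^2, 2/sigma^2)
```

Under these assumptions the metric is diagonal in $(\mu, \sigma)$, with entries $1/\sigma^2$ and $2/\sigma^2$.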



In [2]:

    
from IPython.display import HTML, display
from IPython.display import IFrame

display(HTML('<img src="../feed_forward2.png" width="250">'))

IFrame("../latex_figures/feed_forward2.pdf", width=350, height=100)



In [10]:

    
import numpy as np
x_1 = np.array([0,1.2,5])
x_2 = np.array([0,-3,5])

for x,y in zip(x_1,x_2):
    print(np.sin(2*x)*np.cos(x*y))

    
# Partial derivatives of f(x_1, x_2) = sin(2*x_1)*cos(x_1*x_2)
dfdx1 = -np.sin(2*x_1)*np.sin(x_1*x_2)*x_2 + 2*np.cos(2*x_1)*np.cos(x_1*x_2)
dfdx2 = -np.sin(2*x_1)*np.sin(x_1*x_2)*x_1

print(dfdx1)

print(dfdx2)

for i in range(3):
    print(dfdx1[i], dfdx2[i])

0.0
-0.605727292083
-0.539235254827
[ 2.          2.21924684 -2.02339085]
[-0.         -0.35868752 -0.36001073]
2.0 -0.0
2.21924683939 -0.358687519304
-2.02339084853 -0.360010730582