This notebook contains a short guide on using the solvers w/o backprop or computing gradients. Some issues of interest include:

- How to define and solve SDEs with this codebase
- How to run things on a GPU
- How to gain control over the randomness and enforce deterministic behavior with fixed seeds (e.g. when testing)
- The subtlety of noise type in SDEs

The other file in the `examples`

folder (`latent_sde.py`

) contains the use case where gradients need to be taken to fit parameters.

```
In [1]:
```import torch
from torch import nn
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
sys.path.append(module_path)
%matplotlib inline
import matplotlib.pyplot as plt
from torchsde import sdeint, BrownianPath, BrownianTree

Just like how each ordinary differential equation (ODE) is governed by a vector field, a stochastic differential equation (SDE) is governed by two vector fields, which are called the **drift** and **diffusion** functions:
$$dx(t) = \underbrace{f(x(t), t, \theta_f)}_{\text{drift}} dt + \underbrace{g(x(t), t, \theta_g)}_{\text{diffusion}} dW(t).$$
The output of $f$ is of the same size as the $d$-dimensional state, whereas the output of $g$ may be a matrix of size $(d, m)$.

Here, $W(t)$ is the Brownian motion (aka Wiener process), and it may be $m$ dimensional. It is a stochastic process, and each random draw produces a function of time.

To implement an SDE, we create a class with the functions `f`

and `g`

:

```
In [2]:
```class SDE(nn.Module):
def __init__(self):
super().__init__()
self.theta = nn.Parameter(torch.tensor(0.1), requires_grad=False) # Scalar parameter.
self.noise_type = "diagonal"
self.sde_type = "ito"
def f(self, t, y):
return torch.sin(t) + self.theta * y
def g(self, t, y):
return 0.3 * torch.sigmoid(torch.cos(t) * torch.exp(-y))

The functions `f`

and `g`

are arbitrarily chosen for demonstration purposes. The attributes `noise_type`

and `sde_type`

are used in the solver to determine the particular numerical method being used and must be included. We use `diagonal`

here, meaning the output of `g`

should be a vector with the same shape as input `y`

, and it is an element-wise function.
Note that for any other noise type, we expect the output of `g`

to be a matrix, and batch matrix-vector product is performed under the hood.
The requirement of element-wise function is a rather technical condition to ensure the high-order solvers attain their theoretically derived efficiency.

All solvers in the codebase are based on Itô stochastic integrals, so we use `ito`

for the `sde_type`

attribute. The library also has a base class `SDEIto`

, which can be inherited from and imported directly, and saves the extra line of setting the `sde_type`

attribute. As a side note, our adjoint computation internally computes a Stratonovich correction term and performs the reverse pass with it. We plan to add solvers based on stratonovich SDEs in the future.

Now we instantiate an object of the SDE class and call the function `sdeint`

on it.

```
In [3]:
```batch_size, d, T = 3, 1, 100
sde = SDE()
ts = torch.linspace(0, 1, T)
y0 = torch.zeros(batch_size, 1).fill_(0.1) # (batch_size, d)
with torch.no_grad():
ys = sdeint(sde, y0, ts, method='srk') # (T, batch_size, d) = (100, 3, 1).
plt.figure()
for i in range(batch_size):
plt.plot(ts, ys[:, i].squeeze(), marker='x', label=f'sample {i}')
plt.xlabel('$t$')
plt.ylabel('$y_t$')
plt.legend()
plt.show()

```
```

`method='srk'`

means the strong order 1.5 Stochastic Runge-Kutta (SRK) method is used. Other possible methods include the strong order 0.5 `euler`

and strong order 1.0 `milstein`

, both of which are of slightly lower order.

We stress that the drift and diffusion functions don't necessarily need to be defined as the `f`

and `g`

methods of the class. They can be methods with any name, so long as we provide these names to the solver when they differ from the default `f`

and `g`

. The following is an example where the function `h`

is used as the drift.

```
In [4]:
```class SDENewName(nn.Module):
def __init__(self):
super().__init__()
self.theta = nn.Parameter(torch.tensor(0.1), requires_grad=False) # Scalar parameter.
self.noise_type = "diagonal"
self.sde_type = "ito"
def h(self, t, y):
return torch.sin(t) + self.theta * y
def g(self, t, y):
return 0.3 * torch.sigmoid(torch.cos(t) * torch.exp(-y))
sde_new_name = SDENewName()
with torch.no_grad():
ys = sdeint(sde_new_name, y0, ts, method='srk', names={'drift': 'h'}) # Supply a dictionary to the argument `names`.
plt.figure()
for i in range(batch_size):
plt.plot(ts, ys[:, i].squeeze(), marker='x', label=f'sample {i}')
plt.xlabel('$t$')
plt.ylabel('$y_t$')
plt.legend()
plt.show()

```
```

```
In [5]:
```if torch.cuda.is_available():
gpu = torch.device('cuda')
sde = SDE().to(gpu)
ts = ts.to(gpu)
y0 = y0.to(gpu)
with torch.no_grad():
ys = sdeint(sde, y0, ts, method='srk') # (100, 3, 1).
plt.figure()
for i in range(batch_size):
plt.plot(ts.cpu(), ys[:, i].squeeze().cpu(), marker='x', label=f'sample {i}')
plt.xlabel('$t$')
plt.ylabel('$y_t$')
plt.legend()
plt.show()

```
```

To gain control over the randomness, we draw Brownian motion samples by instantiating objects of classes `BrownianPath`

or `BrownianTree`

. `BrownianPath`

has fast query, but stores all previous queries, so is costly in memory. `BrownianTree`

only stores objects in a fixed size cache, but has slower query, since everything else is reconstructed on the fly based on random seed splitting. Repeated queries on the same Brownian motion object gives deterministic results. Here, we use `BrownianPath`

as an example.

```
In [6]:
```ts = torch.linspace(0, 1, T)
bm = BrownianPath(t0=0.0, w0=torch.zeros(batch_size, d))
bm_queries = torch.stack([bm(t) for t in ts], dim=0)
plt.figure()
plt.title('Query')
for i in range(batch_size):
plt.plot(ts, bm_queries[:, i].squeeze(), marker='x', label=f'sample {i}')
plt.xlabel('$t$')
plt.ylabel('$W_t$')
plt.legend()
plt.show()
bm_queries2 = torch.stack([bm(t) for t in ts], dim=0)
plt.figure()
plt.title('Query again (samples should be same as before)')
for i in range(batch_size):
plt.plot(ts, bm_queries2[:, i].squeeze(), marker='x', label=f'sample {i}')
plt.xlabel('$t$')
plt.ylabel('$W_t$')
plt.legend()
plt.show()
assert torch.allclose(bm_queries, bm_queries2)

```
```

`w0`

on the GPU or using the `to`

method of the `bm`

object:

```
In [7]:
```if torch.cuda.is_available():
# Approach 1:
bm = BrownianPath(t0=0.0, w0=torch.zeros(batch_size, d).to(gpu)) # Runs on GPU.
print(bm(0.5))
# Approach 2:
bm = BrownianPath(t0=0.0, w0=torch.zeros(batch_size, d)) # Runs on CPU.
bm.to(gpu) # Runs on GPU.
print(bm(0.5))

```
```

We can also feed this fixed Brownian motion sample into the solver to get deterministic behavior:

```
In [8]:
```sde = SDE()
ts = torch.linspace(0, 1, T)
y0 = torch.zeros(batch_size, 1).fill_(0.1) # (batch_size, d)
with torch.no_grad():
ys = sdeint(sde, y0, ts, method='srk', bm=bm)
plt.figure()
plt.title('Solve SDE')
for i in range(batch_size):
plt.plot(ts, ys[:, i].squeeze(), marker='x', label=f'sample {i}')
plt.xlabel('$t$')
plt.ylabel('$y_t$')
plt.legend()
plt.show()
with torch.no_grad():
ys = sdeint(sde, y0, ts, method='srk', bm=bm)
plt.figure()
plt.title('Solve SDE again (samples should be same as before)')
for i in range(batch_size):
plt.plot(ts, ys[:, i].squeeze(), marker='x', label=f'sample {i}')
plt.xlabel('$t$')
plt.ylabel('$y_t$')
plt.legend()
plt.show()
# Use a new BM sample, we expect different sample paths.
bm = BrownianPath(t0=0.0, w0=torch.zeros(batch_size, d))
with torch.no_grad():
ys = sdeint(sde, y0, ts, method='srk', bm=bm)
plt.figure()
plt.title('Solve SDE (expect different sample paths)')
for i in range(batch_size):
plt.plot(ts, ys[:, i].squeeze(), marker='x', label=f'sample {i}')
plt.xlabel('$t$')
plt.ylabel('$y_t$')
plt.legend()
plt.show()

```
```

The supported noise types of this codebase are "diagonal", "additive", "scalar", and "general". The following is a simple summary of each type:

- "diagonal": The diffusion function is an elementwise function, with the output being the same dimension as the state (both $d$-dimensional). There are $d$ independent Brownian motions, each responsible for the noise of only a single state dimension.
- "additive": The diffusion function is a constant w.r.t. the state, i.e. the derivative of the diffusion function w.r.t. the state is 0. The output of the diffusion function is of size $(d, m)$, and the system has $m$ independent Brownian motions. The integral involving the Brownian motion can be loosely interpreted as integrating a sequence of matrix-vector products.
- "scalar": The diffusion function has output shape $(d, 1)$, and a single Brownian motion is shared across all state dimensions.
- "general": The diffusion function has output shape $(d, m)$, and the system has $m$ independent Brownian motions.

It is tempting to use the noise type configuration "general" for all problems. However, since there's little known structure for these SDEs, high-order solvers are not possible, and the current codebase only supports the `euler`

method. All three methods (`euler`

, `milstein`

, and `srk`

) are supported for all remaining noise types.

Lastly, for modeling problems, our limited experience have found "diagonal" to be a good setting, where flexibility of models and tractability of numerical integration is rather well-balanced.