```
In [1]:
```

```
example_list = [1, 2, 3]
average = (example_list[0] + example_list[1] + example_list[2]) / 3
# Shortcut using Python's built-in functions:
average2 = sum(example_list) / len(example_list)
print(average)
print(average2)
```

If you are a fan of mathematical formulae, here is an alternative using Greek symbols:

\begin{equation} \frac{1}{n}\sum_{i=0}^{n-1}x_i = \frac{x_{0}+x_{1}+x_{2}+\dots+x_{n-1}}{n} \end{equation}

where we have a list of $n$ numbers $x_i\in\mathbb{R}$, indexed by the integers $i$ with $0 \le i < n$.
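As a quick check, here is the formula applied to the three-element list from the code cell above, taking $x_0 = 1$, $x_1 = 2$, $x_2 = 3$ and $n = 3$. It gives the same value that the cell prints:

\begin{equation} \frac{1}{3}\sum_{i=0}^{2}x_i = \frac{1+2+3}{3} = \frac{6}{3} = 2 \end{equation}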

Now that we have defined what an average is, let's use it to smooth out our climate data timeseries in order to improve its 'readability'. Before this, let's talk a bit about timeseries so we have a clear idea of what they are.

Let's say that every day of the week I use a pedometer to track the number of steps I take, as an experiment to monitor how much I walk on a daily average:

```
Day          Number of Steps
----------------------------
Monday       132
Tuesday      250
Wednesday    101
Thursday     230
Friday       396
Saturday     444
Sunday       60
```

This table is information that changes over time: measurements that were taken one day after the other. More formally, a timeseries is a list of measurements or data points that were taken sequentially at a given time interval. In our specific example the time interval is a day. Is this data 100% reliable? Is it possible that the pedometer was on low battery on Sunday, or does it just reflect that Sundays are my sleepy day?
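To make this concrete, here is a minimal sketch of how the pedometer table could be stored as a timeseries in Python and averaged with the shortcut from earlier. The names `pedometer_series`, `steps` and `daily_average` are only chosen for this illustration.

```python
# The pedometer table as a timeseries: one (day, steps) pair per day,
# stored in the order the measurements were taken.
pedometer_series = [
    ("Monday", 132),
    ("Tuesday", 250),
    ("Wednesday", 101),
    ("Thursday", 230),
    ("Friday", 396),
    ("Saturday", 444),
    ("Sunday", 60),
]

# Keep only the measurements, then average them as before
steps = [count for _, count in pedometer_series]
daily_average = sum(steps) / len(steps)
print(round(daily_average, 1))  # 230.4
```

Keeping the pairs in chronological order matters: the order of the list is what makes it a timeseries rather than just a collection of numbers.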

Just like the example above, we have an equivalent timeseries that records climate-related measurements such as annual temperature, snow cover and average sunshine. In this case the measurements are yearly (our time interval is a year).

Why would we want to smooth our data? Because of noise in our measurements. It is possible that in a particular year the average temperature was difficult to measure, which produces an outlier that makes our timeseries harder to interpret. We smooth in order to make patterns easier to find, because sometimes too much detail just makes things confusing.

Let's go step by step through how smoothing can be carried out, using the pedometer timeseries as an example:

(1) We need to decide on a window size. In our pedometer example, this is how many days, counting the current one, we are going to take into account when averaging. Let's say $window$ $size$ is $2$:

`window size = 2`

(2) Start with the most recent data point in time (in our case that is Sunday) and take $window$ $size$ $-1$ steps backwards in time (our window size is 2, so this is 1 step). Collect all the measurements within this range, including the one on Sunday:

```
Day          Number of Steps
----------------------------
Saturday     444
Sunday       60
```

(3) Take the average of the measurements collected in that window:

`(444 + 60) / 2 = 252`

(4) This now becomes the measurement for Sunday, so we update our table:

```
Day          Number of Steps
----------------------------
Monday       132
Tuesday      250
Wednesday    101
Thursday     230
Friday       396
Saturday     444
Sunday       252
```

(5) Now we take one step backwards to Saturday and repeat steps 2, 3 and 4 until we can no longer carry out step 2 (building the list to average by taking $window$ $size$ $-1$ steps backwards). This happens when we reach Monday, which has no earlier day, so we won't be able to compute a moving average value for it.

On our pedometer data this is what the result of the moving average looks like:

```
Day          Number of Steps
----------------------------
Tuesday      191
Wednesday    175.5
Thursday     165.5
Friday       313
Saturday     420
Sunday       252
```

As we can see, the numbers now change a lot less drastically from one day to the next.
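One way to put a number on "less drastically" is to compare the largest day-to-day jump before and after smoothing. The sketch below types in the smoothed values by hand, with Saturday at (396 + 444) / 2 = 420 and Sunday at (444 + 60) / 2 = 252. The helper `largest_jump` is a name made up for this illustration.

```python
# Original daily step counts, Monday to Sunday
original = [132, 250, 101, 230, 396, 444, 60]
# Moving averages with window size 2, Tuesday to Sunday
smoothed = [191, 175.5, 165.5, 313, 420, 252]


def largest_jump(values):
    """Return the largest absolute change between two consecutive values."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


print(largest_jump(original))  # 384
print(largest_jump(smoothed))  # 168
```

The biggest jump in the raw data (Saturday to Sunday) is more than halved after smoothing.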

- What would be the number of elements left in the new table if our $window$ $size$ is 3?
- In the template below, try to replicate in Python the exercise we just did (Difficult).

```
In [2]:
```

```
# Our table as a list where the index is the day:
# 0 is Monday and 6 is Sunday
number_of_steps = [132, 250, 101, 230, 396, 444, 60]
window_size = 2
averaged_number_of_steps = list()
# Change this loop a bit to get our moving average working:
for i, step_count in enumerate(number_of_steps):
    # Makes sure that it only calculates back as far as Tuesday
    if i - window_size + 1 >= 0:
        # This would be steps (2) and (3)
        average = 0
        # The new values get updated
        # Note they are being inserted in the wrong way
        averaged_number_of_steps.append(average)
# reversed() deals with them being inserted in the wrong way
print(list(reversed(averaged_number_of_steps)))
```
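If you get stuck, here is one possible solution sketch to compare against once you have tried the exercise yourself. It is not the only way to do it: the function name `moving_average` is made up for this illustration, and it walks forwards from the first day that has a full window rather than backwards from Sunday, so no reversing is needed at the end.

```python
def moving_average(values, window_size):
    """Trailing moving average.

    Each output value averages one measurement together with the
    window_size - 1 measurements that come before it.
    """
    if window_size < 1 or window_size > len(values):
        raise ValueError("window_size must be between 1 and len(values)")
    averaged = []
    # Start at the first index that has enough earlier values
    for i in range(window_size - 1, len(values)):
        # Step (2): collect the measurements in the window
        window = values[i - window_size + 1 : i + 1]
        # Step (3): average them
        averaged.append(sum(window) / window_size)
    return averaged


number_of_steps = [132, 250, 101, 230, 396, 444, 60]
print(moving_average(number_of_steps, 2))
# [191.0, 175.5, 165.5, 313.0, 420.0, 252.0]
```

The printed values run from Tuesday to Sunday and match the smoothed table from the worked example.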

Now let's have a look at our climate data and see what a moving average can do for it:

```
In [3]:
```

```
# Our home-brewed data analysis library for dds
from dds_lab.climdat import ClimPlots
# Importing locations of the data
from dds_lab.datasets import climate
# Plotting facilities
from bokeh.plotting import show, output_notebook
output_notebook()
```