F4B101A / TP4 / Analyse de séries temporelles et modèles ARMA

D'après le notebook IPython par P. Tandeo et T. Guilment.
Données : concentration hebdomadaire de CO2 mesurée à Hawaii entre 1974 et 2016 (ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_weekly_mlo.txt).


In [45]:
using DataFrames
using DSP
using Gadfly
using PyCall
@pyimport numpy as np

In [46]:
Y = readdlm("CO2.txt")[:,1]; # CO2 concentration (ppm)
time = linspace(1974.38, 2016.753, length(Y)); # Year (decimal)

xmin = minimum(time);
xmax = maximum(time);

df_raw = DataFrame(x=time, y=Y, ymin=Y, ymax=Y, f="raw data");

In [47]:
plot(df_raw, x=:x, y=:y, color=:f, Geom.line,
Coord.cartesian(xmin=xmin, xmax=xmax),
Guide.xlabel("Time"),
Guide.ylabel("CO2 concentration (ppm)"))


Out[47]:
Time 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 2020 2030 2040 2050 2060 2070 2080 1932 1934 1936 1938 1940 1942 1944 1946 1948 1950 1952 1954 1956 1958 1960 1962 1964 1966 1968 1970 1972 1974 1976 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022 2024 2026 2028 2030 2032 2034 2036 2038 2040 2042 2044 2046 2048 2050 2052 2054 2056 2058 2060 1900 1950 2000 2050 2100 1930 1935 1940 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 2060 raw data f 200 225 250 275 300 325 350 375 400 425 450 475 500 525 550 225 230 235 240 245 250 255 260 265 270 275 280 285 290 295 300 305 310 315 320 325 330 335 340 345 350 355 360 365 370 375 380 385 390 395 400 405 410 415 420 425 430 435 440 445 450 455 460 465 470 475 480 485 490 495 500 505 510 515 520 525 200 300 400 500 600 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 450 460 470 480 490 500 510 520 530 CO2 concentration (ppm)

In [48]:
A = hcat(time.^2, time, ones(length(time)));
coeffs = A\Y;
T(X) = coeffs[1]*(X.^2) + coeffs[2]*X + coeffs[3];

df_tendancy = DataFrame(x=time, y=T(time), f="tendancy");


WARNING: Method definition T(Any) in module Main at In[20]:3 overwritten at In[48]:3.

In [49]:
plot(vcat(df_raw, df_tendancy), x=:x, y=:y, color=:f, Geom.line,
Coord.cartesian(xmin=xmin, xmax=xmax),
Guide.xlabel("Time"),
Guide.ylabel("CO2 concentration (ppm)"))


Out[49]:
Time 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 2020 2030 2040 2050 2060 2070 2080 1932 1934 1936 1938 1940 1942 1944 1946 1948 1950 1952 1954 1956 1958 1960 1962 1964 1966 1968 1970 1972 1974 1976 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022 2024 2026 2028 2030 2032 2034 2036 2038 2040 2042 2044 2046 2048 2050 2052 2054 2056 2058 2060 1900 1950 2000 2050 2100 1930 1935 1940 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 2060 raw data tendancy f 200 225 250 275 300 325 350 375 400 425 450 475 500 525 550 225 230 235 240 245 250 255 260 265 270 275 280 285 290 295 300 305 310 315 320 325 330 335 340 345 350 355 360 365 370 375 380 385 390 395 400 405 410 415 420 425 430 435 440 445 450 455 460 465 470 475 480 485 490 495 500 505 510 515 520 525 200 300 400 500 600 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 450 460 470 480 490 500 510 520 530 CO2 concentration (ppm)

In [50]:
plot(x=time, y=Y-T(time), Geom.line,
Guide.xlabel("Time"),
Guide.ylabel("Value"),
Guide.title("Seasonality only"))


Out[50]:
Time 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 2020 2030 2040 2050 2060 2070 2080 1920 1922 1924 1926 1928 1930 1932 1934 1936 1938 1940 1942 1944 1946 1948 1950 1952 1954 1956 1958 1960 1962 1964 1966 1968 1970 1972 1974 1976 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022 2024 2026 2028 2030 2032 2034 2036 2038 2040 2042 2044 2046 2048 2050 2052 2054 2056 2058 2060 2062 2064 2066 2068 2070 1900 1950 2000 2050 2100 1920 1925 1930 1935 1940 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 2060 2065 2070 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 -30 -29 -28 -27 -26 -25 -24 -23 -22 -21 -20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 -40 -20 0 20 40 -30 -28 -26 -24 -22 -20 -18 -16 -14 -12 -10 -8 -6 -4 -2 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 Value Seasonality only

In [51]:
p = periodogram(Y-T(time), fs=52);

In [52]:
plot(x=p.freq, y=p.power, Geom.line,
Coord.cartesian(xmin=0, xmax=5),
Scale.y_log10,
Guide.xlabel("Frequency [Hz]"),
Guide.ylabel("Power Spectral Density [V^2/Hz]"),
Guide.title("Periodogram"))


Out[52]:
Frequency [Hz] -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.8 7.0 7.2 7.4 7.6 7.8 8.0 8.2 8.4 8.6 8.8 9.0 9.2 9.4 9.6 9.8 10.0 -5 0 5 10 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10-50 10-45 10-40 10-35 10-30 10-25 10-20 10-15 10-10 10-5 100 105 1010 1015 1020 1025 1030 1035 10-45 10-44 10-43 10-42 10-41 10-40 10-39 10-38 10-37 10-36 10-35 10-34 10-33 10-32 10-31 10-30 10-29 10-28 10-27 10-26 10-25 10-24 10-23 10-22 10-21 10-20 10-19 10-18 10-17 10-16 10-15 10-14 10-13 10-12 10-11 10-10 10-9 10-8 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 101 102 103 104 105 106 107 108 109 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 10-50 100 1050 10-46 10-44 10-42 10-40 10-38 10-36 10-34 10-32 10-30 10-28 10-26 10-24 10-22 10-20 10-18 10-16 10-14 10-12 10-10 10-8 10-6 10-4 10-2 100 102 104 106 108 1010 1012 1014 1016 1018 1020 1022 1024 1026 1028 1030 Power Spectral Density [V^2/Hz] Periodogram
$$S(t) = \beta_1 cos(2\pi\omega_1 t) + \beta_2 sin(2\pi\omega_1 t) + \beta_3 cos(2\pi\omega_2 t) + \beta_4 sin(2\pi\omega_2 t)$$

In [53]:
w1 = 1;
w2 = 2;

cosin(w,X) = cos(2*pi*w*X);
sinus(w,X) = sin(2*pi*w*X);

A = hcat(cosin(w1,time), sinus(w1,time), cosin(w2,time), sinus(w2,time));
coeffs_S = A\(Y-T(time));
S(X) = coeffs[1]*cosin(w1,X) + coeffs_S[2]*sinus(w1,X) + coeffs_S[3]*cosin(w2,X) + coeffs_S[4]*sinus(w2,X);

df_seasonality = DataFrame(x=time, y=T(time)+S(time), f="+ seasonality");


WARNING: Method definition cosin(Any, Any) in module Main at In[26]:4 overwritten at In[53]:4.
WARNING: Method definition sinus(Any, Any) in module Main at In[26]:5 overwritten at In[53]:5.
WARNING: Method definition S(Any) in module Main at In[26]:9 overwritten at In[53]:9.

In [54]:
plot(vcat(df_raw, df_tendancy, df_seasonality), x=:x, y=:y, color=:f, Geom.line,
Coord.cartesian(xmin=xmin, xmax=xmax),
Guide.xlabel("Time"),
Guide.ylabel("CO2 concentration (ppm)"))


Out[54]:
Time 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 2020 2030 2040 2050 2060 2070 2080 1932 1934 1936 1938 1940 1942 1944 1946 1948 1950 1952 1954 1956 1958 1960 1962 1964 1966 1968 1970 1972 1974 1976 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022 2024 2026 2028 2030 2032 2034 2036 2038 2040 2042 2044 2046 2048 2050 2052 2054 2056 2058 2060 1900 1950 2000 2050 2100 1930 1935 1940 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 2060 raw data tendancy + seasonality f 200 225 250 275 300 325 350 375 400 425 450 475 500 525 550 225 230 235 240 245 250 255 260 265 270 275 280 285 290 295 300 305 310 315 320 325 330 335 340 345 350 355 360 365 370 375 380 385 390 395 400 405 410 415 420 425 430 435 440 445 450 455 460 465 470 475 480 485 490 495 500 505 510 515 520 525 200 300 400 500 600 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 450 460 470 480 490 500 510 520 530 CO2 concentration (ppm)

In [55]:
Z = Y-T(time)-S(time);

In [56]:
plot(x=time, y=Z, Geom.line,
Coord.cartesian(xmin=xmin, xmax=xmax),
Guide.xlabel("Time"),
Guide.ylabel("CO2 concentration (ppm)"),
Guide.title("Residuals"))


Out[56]:
Time 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 2020 2030 2040 2050 2060 2070 2080 1932 1934 1936 1938 1940 1942 1944 1946 1948 1950 1952 1954 1956 1958 1960 1962 1964 1966 1968 1970 1972 1974 1976 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022 2024 2026 2028 2030 2032 2034 2036 2038 2040 2042 2044 2046 2048 2050 2052 2054 2056 2058 2060 1900 1950 2000 2050 2100 1930 1935 1940 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 2060 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 -11.0 -10.5 -10.0 -9.5 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 -20 -10 0 10 20 -11.0 -10.5 -10.0 -9.5 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 CO2 concentration (ppm) Residuals

In [57]:
Z1 = Z[2:length(Z)];   # Z(t)
Z2 = Z[1:length(Z)-1]; # Z(t-1);

In [58]:
plot(x=Z2, y=Z1,
Guide.xlabel("Z(t-1)"),
Guide.ylabel("Z(t)"),
Guide.title("Autocorrelation"))


Out[58]:
Z(t-1) -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 -11.0 -10.5 -10.0 -9.5 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 -20 -10 0 10 20 -11.0 -10.5 -10.0 -9.5 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 -11.0 -10.5 -10.0 -9.5 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 -20 -10 0 10 20 -11.0 -10.5 -10.0 -9.5 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 Z(t) Autocorrelation

In [59]:
# fit AR(1)
reg_AR_1 = np.corrcoef(Z1, Z2);
phi = reg_AR_1[2];

# define the final residuals epsilon (stationary)
epsilon = Z2-(phi*Z1);

# compute var[Z(t+h)]
sigma2 = var(epsilon)
function compute_var_Z(sigma2, phi, h)
    res = 0
    for k = 1:h
        res += sigma2*phi^(2*k)
    end
    return res
end;


WARNING: Method definition compute_var_Z(Any, Any, Any) in module Main at In[44]:11 overwritten at In[59]:11.

In [60]:
# mean forecasts
n_future = 10*52; # 10 years with 52 weeks
time_future = linspace(2016.753, 2026.753, n_future);
Y_future = T(time_future)+S(time_future);

# error forecasts
var_Z = Array{Float64}(0);
for h = 1:n_future
    var_Z = vcat(var_Z, compute_var_Z(sigma2, phi, h+1))
end

# 95% confidence = 1.96
conf = 1.96*sqrt(var_Z);

df_future = DataFrame(
x=time_future, y=Y_future,
ymin=Y_future - conf,
ymax=Y_future + conf,
f="forecast"
);

In [61]:
plot(vcat(df_raw, df_future), x=:x, y=:y, ymin=:ymin, ymax=:ymax, color=:f, Geom.line, Geom.ribbon,
Coord.cartesian(xmin=2006.753, xmax=2026.753, ymin=380, ymax=430),
Guide.xlabel("Time"),
Guide.ylabel("CO2 concentration (ppm)"))


Out[61]:
Time 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 2060 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 1980 2000 2020 2040 2060 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022 2024 2026 2028 2030 2032 2034 2036 2038 2040 2042 2044 2046 2048 raw data forecast f 320 330 340 350 360 370 380 390 400 410 420 430 440 450 460 470 480 490 330 332 334 336 338 340 342 344 346 348 350 352 354 356 358 360 362 364 366 368 370 372 374 376 378 380 382 384 386 388 390 392 394 396 398 400 402 404 406 408 410 412 414 416 418 420 422 424 426 428 430 432 434 436 438 440 442 444 446 448 450 452 454 456 458 460 462 464 466 468 470 472 474 476 478 480 300 350 400 450 500 330 335 340 345 350 355 360 365 370 375 380 385 390 395 400 405 410 415 420 425 430 435 440 445 450 455 460 465 470 475 480 CO2 concentration (ppm)

In [ ]: