Quick and incomplete Extreme Value Theory introduction

Extreme Value Theory (EVT) is unique among statistical disciplines in that it develops techniques and models for describing the unusual rather than the usual, i.e., it focuses on the tail of the distribution.

By definition, extreme values are scarce, so estimates are often required for levels of a process far greater than any already observed. This implies an extrapolation from a small set of observed levels to unobserved levels. Extreme Value Theory provides models that make such extrapolation possible.

Fields of interest

Applications of extreme value theory include predicting the probability distribution of:

  • Several engineering design processes:

    • Hydraulics engineering (extreme floods,...)
    • Structural engineering (earthquakes, wind speed,...)
  • Meteorology

    • Extreme temperatures, rainfall, hurricanes,...
  • Ocean engineering

    • The size of freak waves (wave height)
  • Environmental sciences

    • Large wildfires
    • Environmental loads on structures
  • Insurance industry

    • The amounts of large insurance losses, portfolio adjustment,...
  • Financial industry

    • Equity risks
    • Day to day market risk
    • Stock market crashes
  • Material sciences

    • Corrosion analysis
    • Strength of materials
  • Telecommunications

    • Traffic prediction
  • Biology

    • Mutational events during evolution
    • Memory cell failure
  • ...

A brief history of Extreme Value Theory

One of the earliest books on the statistics of extreme values is E.J. Gumbel (1958). Research into extreme values as a subject in its own right began between 1920 and 1940, when work by E.L. Dodd, M. Fréchet, E.J. Gumbel, R. von Mises and L.H.C. Tippett investigated the asymptotic distribution of the largest order statistic. This led to the main theoretical result: the Extremal Types Theorem (also known as the Fisher–Tippett–Gnedenko theorem, the Fisher–Tippett theorem or the extreme value theorem), which was developed in stages by Fisher, Tippett and von Mises, and eventually proved in general by B. Gnedenko in 1943.

Until 1950, development was largely theoretical. In 1958, Gumbel started applying the theory to problems in engineering. In the 1970s, L. de Haan, A.A. Balkema and J. Pickands generalised the theoretical results (the second theorem in extreme value theory), giving a better basis for statistical models.

Since the 1980s, methods for the application of Extreme Value Theory have become much more widespread.

General approaches to estimate extreme values

There are two primary approaches to analyzing extremes of a dataset:

  • The first and more classical approach reduces the data considerably by taking maxima of long blocks of data, e.g., annual maxima. The generalized extreme value (GEV) distribution function has theoretical justification for fitting to block maxima of data.

  • The second approach is to analyze excesses over a high threshold. For this second approach the generalized Pareto (GP) distribution function has similar justification for fitting to excesses over a high threshold.
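The two data reductions can be sketched in a few lines. This is a minimal illustration with simulated data, assuming daily observations arranged by year and a 99% empirical quantile as the (hypothetical) threshold; the document itself does not prescribe either choice.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical example: 50 "years" of 365 daily observations each.
daily = rng.standard_exponential((50, 365))

# Approach 1: block maxima -- one maximum per block (here, per year),
# to which the GEV distribution is fitted.
block_maxima = daily.max(axis=1)  # shape (50,)

# Approach 2: peaks over threshold -- excesses above a high threshold,
# to which the GP distribution is fitted.
threshold = np.quantile(daily, 0.99)  # an illustrative choice of threshold
excesses = daily[daily > threshold] - threshold
```

Note the trade-off: block maxima discard all but one value per block, while the threshold approach keeps every sufficiently extreme observation, at the cost of having to choose the threshold.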

Block-Maxima + Generalised Extreme Value (GEV) and Gumbel distribution

The generalized extreme value (GEV) family of distribution functions has theoretical support for fitting to block maximum data, provided the blocks are sufficiently large, and is given by:

$$G(z;\mu, \sigma, \xi) = \exp\left\{-\left[1+\xi\left(\frac{z - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}$$

defined on the set $\{z : 1 + \xi(z - \mu)/\sigma > 0\}$.

The parameters $\mu$ ($-\infty < \mu < \infty$), $\sigma$ ($\sigma > 0$) and $\xi$ ($-\infty < \xi < \infty$) are location, scale and shape parameters, respectively. The value of the shape parameter $\xi$ distinguishes the three types of extreme value distribution in the Extremal Types Theorem.

  • $\xi = 0$ (interpreted as the limit $\xi \to 0$) corresponds to the Gumbel distribution (type I). This special case can be formulated as
$$G(z;\mu, \sigma) = \exp\left\{-\exp\left[-\left(\frac{z - \mu}{\sigma}\right)\right]\right\}$$
  • $\xi > 0$ corresponds to the Fréchet (type II) and

  • $\xi < 0$ corresponds to the Weibull (type III)

distributions, respectively. In practice, when we estimate the shape parameter $\xi$, its standard error accounts for our uncertainty in choosing between the three models.
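A GEV fit to block maxima can be sketched as follows. This uses `scipy.stats.genextreme` (an assumed library choice, not prescribed by the text); beware that scipy parameterises the shape as $c = -\xi$ relative to the convention used above.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
# Hypothetical data: 100 annual maxima drawn from a Gumbel distribution,
# i.e. the xi = 0 special case of the GEV.
annual_maxima = rng.gumbel(loc=30.0, scale=5.0, size=100)

# Maximum-likelihood fit of the three GEV parameters.
c, loc, scale = genextreme.fit(annual_maxima)
xi = -c  # convert scipy's shape c to the EVT convention xi = -c

# 100-year return level: the level exceeded with probability 1/100 per block.
return_level_100 = genextreme.ppf(1 - 1/100, c, loc=loc, scale=scale)
```

With Gumbel-generated data the fitted $\xi$ should come out close to zero, illustrating how the estimated shape (and its standard error) arbitrates between the three types.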

Peak-Over-Threshold (POT) + Generalised Pareto (GP) distribution

TODO

References used to prepare this section