Flight delays (capstone project)

Summary of data wrangling process

1. Original files (raw data)

1.1 - CSV with data of the flights: monthly-basis departures and arrivals

These raw data files are available in CSV format at website of brazilian Federal Agency for Civil Aviation (ANAC): http://www.anac.gov.br/assuntos/dados-e-estatisticas/historico-de-voos. Last access: nov, 2017.

These files has theses information:

1.2 - Local airports (list with all the ~600 brazilian public airports)

A XLSX spreadsheet is available, also, at ANAC's website: Link: http://www.anac.gov.br/assuntos/setor-regulado/aerodromos/cadastro-de-aerodromos/aerodromos-cadastrados/aerodromospublicos-12.xls. Last access on nov, 15th, 2017 ("aerodromospublicos-12.xls")

I opened this file with MS Excel and saved it as CSV. The columns of this file:

1.3 - Weather data (historical data)

I could not find any local dataset with easy-to-retrieve and open format weather historical data. So I grabbed the weather historical data from Weather Underground website.

A example with 2016 weather data from Campinas, SP, Brazil (my city): https://www.wunderground.com/history/airport/SBKP/2016/1/1/CustomHistory.html?dayend=31&monthend=12&yearend=2016&req_city=&req_state=&req_statename=&reqdb.zip=&reqdb.magic=&reqdb.wmo=

I Ctrl-C the data section and Ctrl-V on a Google spreadsheet. I moved the last column that shows the "Weather Events" (Rain, Thunderstorms etc.) and these data are presented like this:

1.4 - List of codes (two letters) used when there was a flight delay (departure)

A XLS file with two-letters justification code that is used by the aviation crew to "explain" the delay of a flight:

Link from ANAC's website: http://www.anac.gov.br/assuntos/dados-e-estatisticas/vra/glossario_de_justificativas.xls. Last access: nov, 20th, 2017.

These codes are used in the CSV with data of the flights (see section 1.1, column "Delay Code").

2. Adjusting some data

In order to start the analysis it was necessary to adjust some data from these files.

In the flights dataset, for instance, we'd found date/time in a not ISO format (yyyy/mm/dd HH:MM:SS);
The analysis will consider only the local flights: airport_A and airport_B in Brazil. It is necessary, then, to identify which airports (A or B) are in Brazil. A new column 'local_flight' will be added to flights dataset with values '1' (local flight) or '0' (A or B are overseas). This information will come as a result of a merge of the flight dataset with brazilian_airport dataset: if both 'airpot_A' and 'airport_B' are in brazilian_airport, 'local_flight' = '1', otherwise '0'.
There was relevant info in portuguese, so, for clarification of this analysis, I tried to translate some of these information
I translated to english the file header (columns' names)

2.1 - Flights

Adjust the date format to ISO, status of the flight, identify local (brazilian) airports

In order to demonstrate this I create a sample file from original dataset with 20 records: "arfsample-date.csv".



In [1]:

    
from datetime import datetime

# Pandas and NumPy
import pandas as pd
import numpy as np



In [2]:

    
# Read the CSV with flights records (separation = ";")
flights = pd.read_csv('data/arfsample-date.csv', sep = ';', dtype = str)



In [3]:

    
flights.head()









    Out[3]:







  
    
      
      airline
      flight-code
      auth-code
      line-code
      airport-A
      airport-B
      departure-est
      departure-real
      arrival-est
      arrival-real
      flight-status
      delay-code
    
  
  
    
      0
      AAL
      213
      0
      I
      KMIA
      SBBR
      02/01/2017 01:20
      02/01/2017 01:20
      02/01/2017 09:00
      02/01/2017 09:00
      REALIZADO
      NaN
    
    
      1
      AAL
      213
      0
      I
      KMIA
      SBBR
      24/01/2017 01:20
      24/01/2017 01:20
      24/01/2017 09:00
      24/01/2017 09:00
      REALIZADO
      NaN
    
    
      2
      TAM
      8113
      0
      I
      MMMX
      SBGR
      04/05/2017 20:25
      NaN
      05/05/2017 06:00
      NaN
      CANCELADO
      XN
    
    
      3
      TAM
      8113
      0
      I
      MMMX
      SBGR
      31/05/2017 20:25
      31/05/2017 20:25
      01/06/2017 06:00
      01/06/2017 06:00
      REALIZADO
      NaN
    
    
      4
      TAM
      8113
      0
      I
      MMMX
      SBGR
      16/05/2017 20:25
      16/05/2017 20:25
      17/05/2017 06:00
      17/05/2017 06:00
      REALIZADO
      NaN



In [4]:

    
# Lambda function
# 1 - Used to adjust date columns to ISO format
# flights['column-with-date'] = flights[['column-with-date']].apply(lambda row: \
#                                                      row.str.replace("(?P<day>\d{2})/\
#                                                                       (?P<month>\d{2})/\
#                                                                       (?P<year>\d{4}) \
#                                                                       (?P<HOUR>\d{2}):\
#                                                                       (?P<MIN>\d{2})", \
#                                                                      "\g<year>/\g<month>/\g<day> \g<HOUR>:\g<MIN>:00"), axis=1)
#
# 2- Used to translate to english the 'flight-status': 
# - 'REALIZADO'-> 'ACCOMPLISHED'
# - 'CANCELADO'-> 'CANCELED'
# ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
flights['departure-est'] = flights[['departure-est']].apply(lambda row: row.str.replace("(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4}) (?P<HOUR>\d{2}):(?P<MIN>\d{2})", "\g<year>/\g<month>/\g<day> \g<HOUR>:\g<MIN>:00"), axis=1)
flights['departure-real'] = flights[['departure-real']].apply(lambda row: row.str.replace("(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4}) (?P<HOUR>\d{2}):(?P<MIN>\d{2})", "\g<year>/\g<month>/\g<day> \g<HOUR>:\g<MIN>:00"), axis=1)
flights['arrival-est'] = flights[['arrival-est']].apply(lambda row: row.str.replace("(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4}) (?P<HOUR>\d{2}):(?P<MIN>\d{2})", "\g<year>/\g<month>/\g<day> \g<HOUR>:\g<MIN>:00"), axis=1)
flights['arrival-real'] = flights[['arrival-real']].apply(lambda row: row.str.replace("(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4}) (?P<HOUR>\d{2}):(?P<MIN>\d{2})", "\g<year>/\g<month>/\g<day> \g<HOUR>:\g<MIN>:00"), axis=1)



In [5]:

    
flights.dtypes









    Out[5]:





airline           object
flight-code       object
auth-code         object
line-code         object
airport-A         object
airport-B         object
departure-est     object
departure-real    object
arrival-est       object
arrival-real      object
flight-status     object
delay-code        object
dtype: object

Now, it is time to covert dates from 'object' to 'date' format



In [6]:

    
# Departure and Arrival columns: from 'object' to 'date' format
flights['departure-est'] = pd.to_datetime(flights['departure-est'],errors='ignore')
flights['departure-real'] = pd.to_datetime(flights['departure-real'],errors='ignore')
flights['arrival-est'] = pd.to_datetime(flights['arrival-est'],errors='ignore')
flights['arrival-real'] = pd.to_datetime(flights['arrival-real'],errors='ignore')



In [7]:

    
flights.dtypes









    Out[7]:





airline                   object
flight-code               object
auth-code                 object
line-code                 object
airport-A                 object
airport-B                 object
departure-est     datetime64[ns]
departure-real    datetime64[ns]
arrival-est       datetime64[ns]
arrival-real      datetime64[ns]
flight-status             object
delay-code                object
dtype: object

Status of the flight: The content of 'status_flight' will be translated to english:



In [8]:

    
# translate the flight status from portuguese to english
flights['flight-status'] = flights[['flight-status']].apply(lambda row: row.str.replace("REALIZADO", "ACCOMPLISHED"), axis=1)
flights['flight-status'] = flights[['flight-status']].apply(lambda row: row.str.replace("CANCELADO", "CANCELED"), axis=1)

The result, so far:



In [9]:

    
flights[['departure-est', 'departure-real', 'arrival-est', 'arrival-real', 'flight-status']].head()









    Out[9]:







  
    
      
      departure-est
      departure-real
      arrival-est
      arrival-real
      flight-status
    
  
  
    
      0
      2017-01-02 01:20:00
      2017-01-02 01:20:00
      2017-01-02 09:00:00
      2017-01-02 09:00:00
      ACCOMPLISHED
    
    
      1
      2017-01-24 01:20:00
      2017-01-24 01:20:00
      2017-01-24 09:00:00
      2017-01-24 09:00:00
      ACCOMPLISHED
    
    
      2
      2017-05-04 20:25:00
      NaT
      2017-05-05 06:00:00
      NaT
      CANCELED
    
    
      3
      2017-05-31 20:25:00
      2017-05-31 20:25:00
      2017-06-01 06:00:00
      2017-06-01 06:00:00
      ACCOMPLISHED
    
    
      4
      2017-05-16 20:25:00
      2017-05-16 20:25:00
      2017-05-17 06:00:00
      2017-05-17 06:00:00
      ACCOMPLISHED

Some EDA (tests)



In [32]:

    
flights.groupby(['airport-A']).size().reset_index(name='number_departures')









    Out[32]:







  
    
      
      airport-A
      number_departures
    
  
  
    
      0
      KMIA
      2
    
    
      1
      MMMX
      5
    
    
      2
      SBFZ
      2
    
    
      3
      SBGR
      5
    
    
      4
      SBSP
      6



In [ ]:

2.2 - Merge Flights with local airports (list with all the ~600 brazilian public airports) and identify the local flights

The local airports dataset is not a sample. This CSV file contains all 589 brazilian public airports. There are other airports in Brazil but, in this analysis, only the public airports will be considered.



In [10]:

    
# Airports dataset: all brazilian public airports (updated until october, 2017)
airports = pd.read_csv('data/brazilianPublicAirports-out2017.csv', sep = ';')



In [11]:

    
airports.head()









    Out[11]:







  
    
      
      airport
      airportname
      city
      state
      latitude
      longitude
      altitude
      operation
    
  
  
    
      0
      SBAA
      CONCEIÇÃO DO ARAGUAIA
      CONCEIÇÃO DO ARAGUAIA
      PA
      8° 20' 55'' S
      49° 18' 11'' W
      199 m
      VFR Diurno/Noturno e IFR Diurno
    
    
      1
      SBAE
      BAURU/AREALVA
      BAURU E AREALVA
      SP
      22° 9' 28'' S
      49° 4' 6'' W
      594 m
      VFR Diurno/Noturno e IFR Diurno/Noturno
    
    
      2
      SBAM
      AMAPÁ
      AMAPÁ
      AP
      2° 4' 22'' N
      50° 51' 45'' W
      14 m
      VFR Diurno e IFR Diurno
    
    
      3
      SBAQ
      BARTOLOMEU DE GUSMÃO
      ARARAQUARA
      SP
      21° 48' 16'' S
      48° 8' 25'' W
      711 m
      VFR Diurno/Noturno e IFR Diurno/Noturno
    
    
      4
      SBAR
      SANTA MARIA
      ARACAJU
      SE
      10° 59' 7'' S
      37° 4' 24'' W
      7 m
      VFR Diurno/Noturno e IFR Diurno/Noturno



In [12]:

    
# Merge "flights" dataset with "airports" in order to identify 
#   local flights (origin and destination are in Brazil)
flights = pd.merge(flights, airports, left_on="airport-A", right_on="airport", how='left')
flights = pd.merge(flights, airports, left_on="airport-B", right_on="airport", how='left')



In [13]:

    
flights.head()









    Out[13]:







  
    
      
      airline
      flight-code
      auth-code
      line-code
      airport-A
      airport-B
      departure-est
      departure-real
      arrival-est
      arrival-real
      ...
      altitude_x
      operation_x
      airport_y
      airportname_y
      city_y
      state_y
      latitude_y
      longitude_y
      altitude_y
      operation_y
    
  
  
    
      0
      AAL
      213
      0
      I
      KMIA
      SBBR
      2017-01-02 01:20:00
      2017-01-02 01:20:00
      2017-01-02 09:00:00
      2017-01-02 09:00:00
      ...
      NaN
      NaN
      SBBR
      PRESIDENTE JUSCELINO KUBITSCHEK
      BRASÍLIA
      DF
      15° 52' 16'' S
      47° 55' 7'' W
      1066 m
      VFR Diurno/Noturno e IFR Diurno/Noturno
    
    
      1
      AAL
      213
      0
      I
      KMIA
      SBBR
      2017-01-24 01:20:00
      2017-01-24 01:20:00
      2017-01-24 09:00:00
      2017-01-24 09:00:00
      ...
      NaN
      NaN
      SBBR
      PRESIDENTE JUSCELINO KUBITSCHEK
      BRASÍLIA
      DF
      15° 52' 16'' S
      47° 55' 7'' W
      1066 m
      VFR Diurno/Noturno e IFR Diurno/Noturno
    
    
      2
      TAM
      8113
      0
      I
      MMMX
      SBGR
      2017-05-04 20:25:00
      NaT
      2017-05-05 06:00:00
      NaT
      ...
      NaN
      NaN
      SBGR
      GUARULHOS - GOVERNADOR ANDRÉ FRANCO MONTORO
      GUARULHOS
      SP
      23° 26' 8'' S
      46° 28' 23'' W
      750 m
      VFR Diurno/Noturno e IFR Diurno/Noturno
    
    
      3
      TAM
      8113
      0
      I
      MMMX
      SBGR
      2017-05-31 20:25:00
      2017-05-31 20:25:00
      2017-06-01 06:00:00
      2017-06-01 06:00:00
      ...
      NaN
      NaN
      SBGR
      GUARULHOS - GOVERNADOR ANDRÉ FRANCO MONTORO
      GUARULHOS
      SP
      23° 26' 8'' S
      46° 28' 23'' W
      750 m
      VFR Diurno/Noturno e IFR Diurno/Noturno
    
    
      4
      TAM
      8113
      0
      I
      MMMX
      SBGR
      2017-05-16 20:25:00
      2017-05-16 20:25:00
      2017-05-17 06:00:00
      2017-05-17 06:00:00
      ...
      NaN
      NaN
      SBGR
      GUARULHOS - GOVERNADOR ANDRÉ FRANCO MONTORO
      GUARULHOS
      SP
      23° 26' 8'' S
      46° 28' 23'' W
      750 m
      VFR Diurno/Noturno e IFR Diurno/Noturno
    
  

5 rows × 28 columns

--- Here I got stuck ---:

1) After the merge of 'flights' and 'airports' datasets,

2) I created a new columun: 'local_flight';

3) I had tried to define a value (int or boolean) to this column, based on the content of columns 'airport_x' (departure) and 'airport_y' (arrival): if there is information in both columns, it means that we have a local flight record (A and B are brazilian airports)



In [14]:

    
# Create a new column 'local_flight' with '0' value as default
flights["local_flight"] = 0



In [15]:

    
flights.airport_x = flights.airport_x.fillna('')
flights.airport_y = flights.airport_y.fillna('')



In [16]:

    
flights[["airport_x", "airport_y", "local_flight"]]









    Out[16]:







  
    
      
      airport_x
      airport_y
      local_flight
    
  
  
    
      0
      
      SBBR
      0
    
    
      1
      
      SBBR
      0
    
    
      2
      
      SBGR
      0
    
    
      3
      
      SBGR
      0
    
    
      4
      
      SBGR
      0
    
    
      5
      
      SBGR
      0
    
    
      6
      
      SBGR
      0
    
    
      7
      SBSP
      SBPA
      0
    
    
      8
      SBSP
      SBPA
      0
    
    
      9
      SBSP
      SBPA
      0
    
    
      10
      SBSP
      SBPA
      0
    
    
      11
      SBSP
      SBPA
      0
    
    
      12
      SBSP
      SBPA
      0
    
    
      13
      SBGR
      
      0
    
    
      14
      SBGR
      
      0
    
    
      15
      SBGR
      
      0
    
    
      16
      SBGR
      
      0
    
    
      17
      SBGR
      
      0
    
    
      18
      SBFZ
      
      0
    
    
      19
      SBFZ
      
      0



In [17]:

    
for index,row in flights.iterrows():
    if ((row['airport_x'] != '') & (row['airport_y'] != '')):
        flights['local_flight'] = 1



In [18]:

    
flights[['airport_x', 'airport_y', 'local_flight']]









    Out[18]:







  
    
      
      airport_x
      airport_y
      local_flight
    
  
  
    
      0
      
      SBBR
      1
    
    
      1
      
      SBBR
      1
    
    
      2
      
      SBGR
      1
    
    
      3
      
      SBGR
      1
    
    
      4
      
      SBGR
      1
    
    
      5
      
      SBGR
      1
    
    
      6
      
      SBGR
      1
    
    
      7
      SBSP
      SBPA
      1
    
    
      8
      SBSP
      SBPA
      1
    
    
      9
      SBSP
      SBPA
      1
    
    
      10
      SBSP
      SBPA
      1
    
    
      11
      SBSP
      SBPA
      1
    
    
      12
      SBSP
      SBPA
      1
    
    
      13
      SBGR
      
      1
    
    
      14
      SBGR
      
      1
    
    
      15
      SBGR
      
      1
    
    
      16
      SBGR
      
      1
    
    
      17
      SBGR
      
      1
    
    
      18
      SBFZ
      
      1
    
    
      19
      SBFZ
      
      1

Moving on...

2.3 - The Weather data from https://www.wunderground.com/history

From this website I captured a sample data from local airport (Campinas, SP, Brazil) during the year of 2016.

The website presents data like this (see https://goo.gl/SqKfmP):

I performed the following steps in Google Spreadsheet:

I copied (Ctr+C) this data and I created a Google spreadsheet (Ctr+V) with this data (see section 1.3 above);
I "moved" the column that shows weather special events (thunderstorms, rain etc.) and I positioned just before de "Temperature" column;
In order to handle the date information I add the correct year and month to each row.

The result is this CSV (see https://docs.google.com/spreadsheets/d/1-AYWwt4coeJHhq4RSKwYTIXdov4MHxraQb7-ayPCaK4/edit?usp=sharing):



In [19]:

    
# Weather sample: load the CSV with weather historical data (from Campinas, SP, Brazil, 2016)
weather = pd.read_csv('data/DataScience-Intensive-weatherAtCampinasAirport-2016-Campinas_Airport_2016Weather.csv', \
                      sep = ',', dtype = str)



In [20]:

    
weather.head()









    Out[20]:







  
    
      
      year
      month
      day
      weather-event
      tempC-high
      tempC-avg
      tempC-low
      dewpoint-high
      dewpoint-avg
      dewpoint-low
      ...
      sealevelpress-high
      sealevelpress-avg
      sealevelpress-low
      visibilty-km-high
      visibilty-km-avg
      visibilty-km-low
      wind-km-h-high
      wind-km-h-avg
      precipitation-mm-high
      precipitation-mm-sum
    
  
  
    
      0
      2016
      1
      1
      Rain , Thunderstorm
      30
      26
      21
      22
      21
      20
      ...
      1016
      1014
      1012
      10
      10
      8
      21
      10
      -
      0.00
    
    
      1
      2016
      1
      2
      Rain , Thunderstorm
      28
      24
      22
      23
      21
      20
      ...
      1017
      1015
      1013
      10
      9
      3
      24
      10
      45
      0.00
    
    
      2
      2016
      1
      3
      Rain
      26
      23
      22
      22
      20
      19
      ...
      1017
      1015
      1013
      10
      10
      8
      26
      14
      -
      0.00
    
    
      3
      2016
      1
      4
      NaN
      28
      24
      20
      19
      18
      17
      ...
      1017
      1016
      1013
      10
      10
      10
      32
      19
      -
      0.00
    
    
      4
      2016
      1
      5
      NaN
      31
      24
      19
      19
      17
      14
      ...
      1020
      1017
      1014
      10
      10
      10
      29
      16
      -
      0.00
    
  

5 rows × 23 columns



In [21]:

    
weather.dtypes









    Out[21]:





year                     object
month                    object
day                      object
weather-event            object
tempC-high               object
tempC-avg                object
tempC-low                object
dewpoint-high            object
dewpoint-avg             object
dewpoint-low             object
humidity-perc-high       object
humidity-perc-avg        object
humidity-perc-low        object
sealevelpress-high       object
sealevelpress-avg        object
sealevelpress-low        object
visibilty-km-high        object
visibilty-km-avg         object
visibilty-km-low         object
wind-km-h-high           object
wind-km-h-avg            object
precipitation-mm-high    object
precipitation-mm-sum     object
dtype: object



In [22]:

    
weather["date"] = weather["year"].map(str) + "-" + weather["month"].map(str) + "-" + weather["day"].map(str)



In [23]:

    
weather["date"] = pd.to_datetime(weather['date'],errors='ignore')



In [24]:

    
weather.dtypes









    Out[24]:





year                             object
month                            object
day                              object
weather-event                    object
tempC-high                       object
tempC-avg                        object
tempC-low                        object
dewpoint-high                    object
dewpoint-avg                     object
dewpoint-low                     object
humidity-perc-high               object
humidity-perc-avg                object
humidity-perc-low                object
sealevelpress-high               object
sealevelpress-avg                object
sealevelpress-low                object
visibilty-km-high                object
visibilty-km-avg                 object
visibilty-km-low                 object
wind-km-h-high                   object
wind-km-h-avg                    object
precipitation-mm-high            object
precipitation-mm-sum             object
date                     datetime64[ns]
dtype: object



In [ ]:

	airline	flight-code	line-code	airport-A	airport-B	departure-est	departure-real	arrival-est	arrival-real	flight-status	delay-code
0	AAL	213	I	KMIA	SBBR	02/01/2017 01:20	02/01/2017 01:20	02/01/2017 09:00	02/01/2017 09:00	REALIZADO	NaN
1	AAL	213	I	KMIA	SBBR	24/01/2017 01:20	24/01/2017 01:20	24/01/2017 09:00	24/01/2017 09:00	REALIZADO	NaN
2	TAM	8113	I	MMMX	SBGR	04/05/2017 20:25	NaN	05/05/2017 06:00	NaN	CANCELADO	XN
3	TAM	8113	I	MMMX	SBGR	31/05/2017 20:25	31/05/2017 20:25	01/06/2017 06:00	01/06/2017 06:00	REALIZADO	NaN
4	TAM	8113	I	MMMX	SBGR	16/05/2017 20:25	16/05/2017 20:25	17/05/2017 06:00	17/05/2017 06:00	REALIZADO	NaN

	departure-est	departure-real	arrival-est	arrival-real	flight-status
0	2017-01-02 01:20:00	2017-01-02 01:20:00	2017-01-02 09:00:00	2017-01-02 09:00:00	ACCOMPLISHED
1	2017-01-24 01:20:00	2017-01-24 01:20:00	2017-01-24 09:00:00	2017-01-24 09:00:00	ACCOMPLISHED
2	2017-05-04 20:25:00	NaT	2017-05-05 06:00:00	NaT	CANCELED
3	2017-05-31 20:25:00	2017-05-31 20:25:00	2017-06-01 06:00:00	2017-06-01 06:00:00	ACCOMPLISHED
4	2017-05-16 20:25:00	2017-05-16 20:25:00	2017-05-17 06:00:00	2017-05-17 06:00:00	ACCOMPLISHED

	airport	airportname	city	state	latitude	longitude	altitude	operation
0	SBAA	CONCEIÇÃO DO ARAGUAIA	CONCEIÇÃO DO ARAGUAIA	PA	8° 20' 55'' S	49° 18' 11'' W	199 m	VFR Diurno/Noturno e IFR Diurno
1	SBAE	BAURU/AREALVA	BAURU E AREALVA	SP	22° 9' 28'' S	49° 4' 6'' W	594 m	VFR Diurno/Noturno e IFR Diurno/Noturno
2	SBAM	AMAPÁ	AMAPÁ	AP	2° 4' 22'' N	50° 51' 45'' W	14 m	VFR Diurno e IFR Diurno
3	SBAQ	BARTOLOMEU DE GUSMÃO	ARARAQUARA	SP	21° 48' 16'' S	48° 8' 25'' W	711 m	VFR Diurno/Noturno e IFR Diurno/Noturno
4	SBAR	SANTA MARIA	ARACAJU	SE	10° 59' 7'' S	37° 4' 24'' W	7 m	VFR Diurno/Noturno e IFR Diurno/Noturno

	year	month	day	weather-event	tempC-high	tempC-avg	tempC-low	dewpoint-high	dewpoint-avg	dewpoint-low	...	sealevelpress-high	sealevelpress-avg	sealevelpress-low	visibilty-km-high	visibilty-km-avg	visibilty-km-low	wind-km-h-high	wind-km-h-avg	precipitation-mm-high
0	2016	1	1	Rain , Thunderstorm	30	26	21	22	21	20	...	1016	1014	1012	10	10	8	21	10	-
1	2016	1	2	Rain , Thunderstorm	28	24	22	23	21	20	...	1017	1015	1013	10	9	3	24	10	45
2	2016	1	3	Rain	26	23	22	22	20	19	...	1017	1015	1013	10	10	8	26	14	-
3	2016	1	4	NaN	28	24	20	19	18	17	...	1017	1016	1013	10	10	10	32	19	-
4	2016	1	5	NaN	31	24	19	19	17	14	...	1020	1017	1014	10	10	10	29	16	-