This kernel gives a starting point for exploring the contents of the Gliding Data dataset.
The dataset includes metadata and calculated phases for over 100,000 gliding flights from 2016 to 2019, mostly in France but also Belgium, Switzerland and other countries. In total there are more than 6 million flight phases recorded.
Gliding is a leisure aviation activity and sport where pilots fly unpowered aircraft known as gliders or sailplanes.
The principle of gliding is to climb using some sort of lift and convert the altitude gained into distance. Repeating this process allows for very long distance flights, often above 500km or even 1000km - the current free distance world record being just over 3000km in Patagonia. This is the same process birds follow to reduce the amount of effort required to travel great distances.
The most used sources of lift include:
- thermals (columns of rising warm air)
- ridge lift (wind deflected upwards by a slope)
- wave lift (standing waves forming downwind of mountain ranges)
with thermalling being by far the most common, and this dataset reflects that.
When flying, pilots often carry some sort of GPS recording device which generates a track of points collected every few seconds. Each point contains fields for latitude, longitude and altitude (both pressure and GPS), among several others, and tracks are usually stored in IGC format. These tracks are often uploaded to online competitions, where they are shared with other pilots and can be easily visualized.
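For reference, IGC files are plain text and each GPS fix is stored as a "B record". The sketch below is purely illustrative (the dataset itself was produced with goigc, not with this code) and shows how such a record maps to a timestamped point with latitude, longitude and the two altitude values.
In [ ]:
def parse_b_record(line):
    """Parse a single IGC B (fix) record into point fields."""
    # layout: B HHMMSS DDMMmmm(N/S) DDDMMmmm(E/W) (A/V) PPPPP GGGGG
    lat = int(line[7:9]) + int(line[9:14]) / 1000 / 60      # DDMMmmm -> decimal degrees
    if line[14] == "S":
        lat = -lat
    lng = int(line[15:18]) + int(line[18:23]) / 1000 / 60   # DDDMMmmm -> decimal degrees
    if line[23] == "W":
        lng = -lng
    return {
        "time": line[1:7],                 # HHMMSS (UTC)
        "lat": lat,
        "lng": lng,
        "pressure_alt": int(line[25:30]),  # metres
        "gps_alt": int(line[30:35]),       # metres
    }

# example B record with made-up values
parse_b_record("B1101355206343N00006198WA0058700558")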
The data available in this dataset was scraped from the online competition Netcoupe. The scraping, parsing and analysis were done using goigc, an open source flight parser and analyser.
Let's start by loading the different files in the dataset and taking a peek at the records.
The available files include:
- flight_websource.csv: flight metadata submitted by the pilot to the online competition
- flight_track.csv: flight metadata extracted from the IGC flight recording itself
- phases.csv: the individual flight phases calculated from each track
In [ ]:
import datetime
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import random
import os
for dirname, _, filenames in os.walk('/tmp/gliding-data'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
In [ ]:
flight_websource = pd.read_csv("/tmp/gliding-data/flight_websource.csv")
flight_track = pd.read_csv("/tmp/gliding-data/flight_track.csv")
# phases.csv is large, so randomly skip ~50% of its rows (keeping the header) to limit memory usage
flight_phases = pd.read_csv("/tmp/gliding-data/phases.csv", skiprows=lambda i: i>0 and random.random() > 0.5)
In [ ]:
flight_websource.head(1)
In [ ]:
flight_track.head(1)
The most useful information comes from the websource file, as this information is provided directly by the pilot when submitting the flight to the online competition. Fields like Country or Region provide useful statistics on how popular the sport is in different areas. As an example, what are the most popular regions in terms of total number of flights? What about the most popular Takeoff location?
In [ ]:
flight_websource.groupby(['Country', 'Region'])['Region'].value_counts().sort_values(ascending=False).head(3)
In [ ]:
flight_websource.groupby(['Country', 'Year', 'Takeoff'])['Takeoff'].value_counts().sort_values(ascending=False).head(3)
The three regions above match the Alps area, an expected result given that this is a gliding Mecca. The second result shows Vinon, a big club also in the Southern Alps, as the most popular takeoff location in 2016. It is interesting to notice, though, that more recently a club near Montpellier took over as the one with the most activity in terms of number of flights.
Gliding is a seasonal activity peaking in the summer months, and, being a leisure sport, it is practised mostly on weekends, as the plots below show.
In [ ]:
flight_websource['DayOfWeek'] = flight_websource.apply(lambda r: datetime.datetime.strptime(r['Date'], "%Y-%m-%dT%H:%M:%SZ").strftime("%A"), axis=1)
flight_websource.groupby(['DayOfWeek'])['DayOfWeek'].count().plot.bar()
In [ ]:
flight_websource['Month'] = flight_websource.apply(lambda r: datetime.datetime.strptime(r['Date'], "%Y-%m-%dT%H:%M:%SZ").strftime("%m"), axis=1)
flight_websource.groupby(['Month'])['Month'].count().plot.bar()
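The apply-based conversion above works, but it is slow on 100k rows; a vectorized alternative producing the same two columns (a sketch) is:
In [ ]:
# vectorized date parsing instead of a per-row apply
dates = pd.to_datetime(flight_websource['Date'], format="%Y-%m-%dT%H:%M:%SZ")
flight_websource['DayOfWeek'] = dates.dt.day_name()
flight_websource['Month'] = dates.dt.strftime("%m")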
We can get additional flight metadata from the flight_track file, but you can expect this data to be less reliable, as the metadata stored in the flight recorder is often not updated before a flight. It is still very useful to know more about what type of recorder was used, its calibration settings, etc. This file is also the source of the flight tracks used to generate the data in the phases file.
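As an example, one could count how many flights were recorded with each recorder model. A minimal sketch is below; the column name is a guess (the full flight_track schema isn't printed here), so check flight_track.columns first.
In [ ]:
# hypothetical column name for the recorder model; adjust after
# inspecting flight_track.columns
recorder_col = 'FlightRecorder'
if recorder_col in flight_track.columns:
    print(flight_track[recorder_col].value_counts().head(10))
else:
    print(list(flight_track.columns))  # see which columns are actually available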
In some cases we want to handle columns from both flight metadata files, so it's useful to join the two sets. We can rely on the ID for this purpose.
In [ ]:
flight_all = pd.merge(flight_websource, flight_track, how='left', on='ID')
flight_all.head(1)
In addition to the flight metadata provided in the files above, by analysing the GPS flight tracks we can generate a lot more interesting data.
Here we take a look at flight phases, calculated using the goigc tool. As described earlier, to travel further glider pilots use thermals to gain altitude and then convert that altitude into distance. In the phases file we have a record of each individual phase detected for each of the 100k flights, and we'll focus on:
- circling phases (thermalling), used to gain altitude
- cruising phases (gliding), used to convert that altitude into distance
These are indicated by the integer field Type below. Each phase has a set of additional fields with relevant statistics for each phase type: while circling, the average climb rate (vario) and duration are interesting; while cruising, the distance covered and LD (glide ratio) matter more.
In [ ]:
flight_phases.head(1)
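Before mapping anything, it helps to get a feel for how the different phase types compare. A minimal sketch (it only assumes the Type and AvgVario columns, both of which are used below):
In [ ]:
# distribution of the average climb rate for each phase type
flight_phases.groupby('Type')['AvgVario'].describe()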
As a quick example of what is possible with this kind of data, let's try to map all circling phases as a HeatMap.
First we need to do some preparation of the data: convert coordinates from radians to degrees, filter out unreasonable values (here we keep average climb rates between 2 m/s and 10 m/s, as higher values are usually errors in the recording device), and convert the date to the expected format and desired grouping. In this case we're grouping all thermal phases by week.
In [ ]:
phases = pd.merge(flight_phases, flight_websource[['TrackID', 'Distance', 'Speed']], on='TrackID')
phases['Lat'] = np.rad2deg(phases['CentroidLatitude'])
phases['Lng'] = np.rad2deg(phases['CentroidLongitude'])
# keep only circling phases (Type == 5) with plausible average climb rates
phases_copy = phases[(phases.Type == 5) & (phases.AvgVario > 2) & (phases.AvgVario < 10)].copy()
phases_copy.head(2)
#phases_copy['AM'] = phases_copy.apply(lambda r: datetime.datetime.strptime(r['StartTime'], "%Y-%m-%dT%H:%M:%SZ").strftime("%p"), axis=1)
#phases_copy['Day'] = phases_copy.apply(lambda r: datetime.datetime.strptime(r['StartTime'], "%Y-%m-%dT%H:%M:%SZ").strftime("%j"), axis=1)
#phases_copy['Week'] = phases_copy.apply(lambda r: datetime.datetime.strptime(r['StartTime'], "%Y-%m-%dT%H:%M:%SZ").strftime("%W"), axis=1)
#phases_copy['Month'] = phases_copy.apply(lambda r: r['StartTime'][5:7], axis=1)
#phases_copy['Year'] = phases_copy.apply(lambda r: r['StartTime'][0:4], axis=1)
#phases_copy['YearMonth'] = phases_copy.apply(lambda r: r['StartTime'][0:7], axis=1)
#phases_copy['YearMonthDay'] = phases_copy.apply(lambda r: r['StartTime'][0:10], axis=1)
# use the corresponding function above to update the grouping to something other than week
phases_copy['Group'] = phases_copy.apply(lambda r: datetime.datetime.strptime(r['StartTime'], "%Y-%m-%dT%H:%M:%SZ").strftime("%W"), axis=1)
phases_copy.head(1)
In [ ]:
# This is a workaround for this known issue:
# https://github.com/python-visualization/folium/issues/812#issuecomment-582213307
!pip install git+https://github.com/python-visualization/branca
!pip install git+https://github.com/sknzl/folium@update-css-url-to-https
In [ ]:
import folium
from folium import plugins
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, HeatMapWithTime, MarkerCluster
# folium.__version__ # should be '0.10.1+8.g4ea1307'
# folium.branca.__version__ # should be '0.4.0+4.g6ac241a'
In [ ]:
# we use a smaller sample to improve the visualization
# a better alternative is to group entries by CellID, an example of this will be added later
phases_single = phases_copy.sample(frac=0.01, random_state=1)
m_5 = folium.Map(location=[47.06318, 5.41938], tiles='stamen terrain', zoom_start=7)
HeatMap(
    phases_single[['Lat','Lng','AvgVario']], gradient={0.5: 'blue', 0.7: 'yellow', 1: 'red'},
    min_opacity=5, max_val=phases_single.AvgVario.max(), radius=4, max_zoom=7, blur=4).add_to(m_5)
m_5
In [ ]:
m_5 = folium.Map(location=[47.06318, 5.41938], tiles='stamen terrain', zoom_start=7)
groups = phases_copy.Group.sort_values().unique()
data = []
for g in groups:
    data.append(phases_copy.loc[phases_copy.Group==g, ['Group','Lat','Lng','AvgVario']].groupby(['Lat','Lng']).sum().reset_index().values.tolist())
HeatMapWithTime(
    data,
    index=list(groups),
    gradient={0.1: 'blue', 0.3: 'yellow', 0.8: 'red'},
    auto_play=True, scale_radius=False, display_index=True, radius=4, min_speed=1, max_speed=6, speed_step=1,
    min_opacity=1, max_opacity=phases_copy.AvgVario.max(), use_local_extrema=True).add_to(m_5)
m_5