In [1]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive

In [2]:
# Create a folder named .kaggle on Colab
!mkdir -p ~/.kaggle

# Copy kaggle.json into the .kaggle folder and restrict its permissions
!cp /content/drive/'My Drive'/Kaggle/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

!ls /root/.kaggle


kaggle.json

In [3]:
!pip install kaggle


Requirement already satisfied: kaggle in /usr/local/lib/python3.6/dist-packages (1.5.2)
Requirement already satisfied: urllib3<1.23.0,>=1.15 in /usr/local/lib/python3.6/dist-packages (from kaggle) (1.22)
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.6/dist-packages (from kaggle) (1.11.0)
Requirement already satisfied: certifi in /usr/local/lib/python3.6/dist-packages (from kaggle) (2018.11.29)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.6/dist-packages (from kaggle) (2.5.3)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from kaggle) (2.18.4)
Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (from kaggle) (4.28.1)
Requirement already satisfied: python-slugify in /usr/local/lib/python3.6/dist-packages (from kaggle) (2.0.1)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->kaggle) (2.6)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->kaggle) (3.0.4)
Requirement already satisfied: Unidecode>=0.04.16 in /usr/local/lib/python3.6/dist-packages (from python-slugify->kaggle) (1.0.23)

In [4]:
!kaggle competitions list


ref                                            deadline             category            reward  teamCount  userHasEntered  
---------------------------------------------  -------------------  ---------------  ---------  ---------  --------------  
digit-recognizer                               2030-01-01 00:00:00  Getting Started  Knowledge       2562           False  
titanic                                        2030-01-01 00:00:00  Getting Started  Knowledge       9991            True  
house-prices-advanced-regression-techniques    2030-01-01 00:00:00  Getting Started  Knowledge       4120            True  
imagenet-object-localization-challenge         2029-12-31 07:00:00  Research         Knowledge         34           False  
competitive-data-science-predict-future-sales  2019-12-31 23:59:00  Playground           Kudos       2330           False  
two-sigma-financial-news                       2019-07-15 23:59:00  Featured          $100,000       2927           False  
LANL-Earthquake-Prediction                     2019-06-03 23:59:00  Research           $50,000       1125            True  
tmdb-box-office-prediction                     2019-05-30 23:59:00  Playground       Knowledge        176           False  
dont-overfit-ii                                2019-05-07 23:59:00  Playground            Swag        515           False  
gendered-pronoun-resolution                    2019-04-22 23:59:00  Research           $25,000        181           False  
santander-customer-transaction-prediction      2019-04-10 23:59:00  Featured           $65,000        352           False  
womens-machine-learning-competition-2019       2019-04-09 23:59:00  Featured           $25,000          7           False  
mens-machine-learning-competition-2019         2019-04-08 23:59:00  Featured           $25,000          8           False  
histopathologic-cancer-detection               2019-03-30 23:59:00  Playground       Knowledge        615           False  
petfinder-adoption-prediction                  2019-03-28 23:59:00  Featured           $25,000       1136           False  
vsb-power-line-fault-detection                 2019-03-21 23:59:00  Featured           $25,000        926           False  
microsoft-malware-prediction                   2019-03-13 23:59:00  Research           $25,000       1700           False  
humpback-whale-identification                  2019-02-28 23:59:00  Featured           $25,000       1887           False  
elo-merchant-category-recommendation           2019-02-26 23:59:00  Featured           $50,000       3898           False  
ga-customer-revenue-prediction                 2019-02-15 23:59:00  Featured           $45,000       1104           False  

In [5]:
!kaggle competitions download -c reducing-commercial-aviation-fatalities


Downloading sample_submission.csv.zip to /content
100% 41.2M/41.2M [00:00<00:00, 104MB/s] 
Downloading test.csv.zip to /content
100% 1.65G/1.65G [00:15<00:00, 117MB/s] 
Downloading train.csv.zip to /content
100% 429M/429M [00:56<00:00, 7.97MB/s]

In [6]:
!ls | grep '\.zip$' | xargs -I{} unzip {}


Archive:  sample_submission.csv.zip
  inflating: sample_submission.csv   
Archive:  test.csv.zip
  inflating: test.csv                
Archive:  train.csv.zip
  inflating: train.csv               

In [0]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

pd.options.display.max_rows = 999
pd.options.display.max_columns = 999

In [0]:
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

In [0]:
train.info()

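To do

  • Infer each pilot's state from the dataset described below.
  • A multi-class classification problem over the event labels: baseline (A), SS (B), CA (C), and DA (D).
  • Submissions are scored with multi-class log loss.
  • The sensor data are noisy, so the noise has to be handled automatically.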
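Multi-class log loss is the average, over rows, of the negative log of the probability predicted for the true class: -(1/N) Σ log p(true class). A minimal NumPy sketch, assuming labels are integer indices in the order A, B, C, D and predictions are an (n_rows, 4) probability array; the clipping bound `eps` and the row renormalization are assumptions, not taken from the competition's scorer:

```python
import numpy as np


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: integer class indices, shape (n,)
    y_prob: predicted probabilities, shape (n, n_classes)
    eps: clipping bound so that log(0) never occurs (assumed convention)
    """
    y_true = np.asarray(y_true, dtype=int)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    # Renormalize each row so it sums to 1 after clipping (assumption)
    p = p / p.sum(axis=1, keepdims=True)
    # Probability assigned to the true class in each row
    true_class_prob = p[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(true_class_prob)))


# Classes ordered as A (baseline), B (SS), C (CA), D (DA)
y_true = [0, 2, 3]
uniform = np.full((3, 4), 0.25)
print(multiclass_log_loss(y_true, uniform))  # log(4) ≈ 1.386
```

A uniform guess over four classes scores log(4), which is a handy baseline to beat.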

crew

  • Apparently a number assigned to each crew pairing (if I remember correctly).

experiment

  • experiment - One of CA, DA, SS or LOFT. The first 3 comprise the training set. The latter the test set.
  • CA (channelized attention): the pilot is focused.
  • DA (diverted attention): the pilot's focus is disrupted.
  • SS (startle/surprise): the pilot is startled or surprised.

seat

  • Indicates whether the pilot sits in the left or the right seat.
  • Probably distinguishes the captain from the first officer.

eeg*

  • Electroencephalogram, i.e. brain-wave signals.

ecg

  • 3-point Electrocardiogram signal. The sensor had a resolution/bit of .012215 µV and a range of -100mV to +100mV. The data are provided in microvolts.
  • Apparently the electrocardiogram (heart) waveform.

r

  • Respiration, a measure of the rise and fall of the chest. The sensor had a resolution/bit of .2384186 µV and a range of -2.0V to +2.0V. The data are provided in microvolts.
  • A respiration sensor; values are in microvolts.

gsr

  • Galvanic Skin Response, a measure of electrodermal activity. The sensor had a resolution/bit of .2384186 µV and a range of -2.0V to +2.0V. The data are provided in microvolts.
  • Galvanic skin response: relies on the change in electrical potential of the skin under stress.

event

  • A = baseline, B = SS, C = CA, D = DA
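Mapping the `event` codes to readable names makes it easy to check which events occur in which experiment. A small pandas sketch; the toy rows are hypothetical, and only the column names `experiment` and `event` and the code meanings come from the list above:

```python
import pandas as pd

# Meaning of each event code, as listed above
EVENT_NAMES = {"A": "baseline", "B": "SS", "C": "CA", "D": "DA"}

# Hypothetical toy rows standing in for `train`
toy = pd.DataFrame({
    "experiment": ["CA", "CA", "DA", "DA", "SS", "SS"],
    "event": ["A", "C", "A", "D", "A", "B"],
})

toy["event_name"] = toy["event"].map(EVENT_NAMES)
# Count how often each event occurs within each experiment
counts = pd.crosstab(toy["experiment"], toy["event_name"])
print(counts)
```

On the real data the same idea would be `pd.crosstab(train["experiment"], train["event"].map(EVENT_NAMES))`.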

In [0]:
train[(train["crew"] == 1) & (train["seat"] == 0)].describe()

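Basic strategy

Think per pilot

  • Physiological sensor data vary a lot between individuals, so noise handling should be done per individual rather than over the whole dataset.
  • For example, replace outliers with the mean of the values just before and after them.
  • To work per pilot, identify each pilot by the (crew, seat) pair, as in the filter above.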
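The per-pilot outlier handling described above can be sketched with a pandas `groupby`. This is a minimal illustration under assumptions: a pilot is the (crew, seat) pair, each (crew, seat, experiment) group is one continuous recording ordered by `time`, and an outlier is any value whose z-score within its pilot exceeds a hypothetical threshold of 4. Linear interpolation over a single masked point gives exactly the mean of the values before and after it:

```python
import numpy as np
import pandas as pd


def fill_outliers_per_pilot(df, col, z_thresh=4.0):
    """Replace outliers in `col` with interpolated neighbor values, per pilot.

    z_thresh is an illustrative assumption, not a tuned value.
    """
    out = df.sort_values(["crew", "seat", "experiment", "time"]).copy()
    pilot = out.groupby(["crew", "seat"])[col]
    # z-score relative to each pilot's own mean and standard deviation
    z = (out[col] - pilot.transform("mean")) / pilot.transform("std")
    out[col] = out[col].mask(z.abs() > z_thresh)
    # Linear interpolation within each continuous recording; an isolated
    # outlier becomes the mean of the values just before and after it
    out[col] = out.groupby(["crew", "seat", "experiment"])[col].transform(
        lambda s: s.interpolate(limit_direction="both")
    )
    return out


# Hypothetical toy series for one pilot: a smooth signal with one spike at index 25
toy = pd.DataFrame({
    "crew": 1, "seat": 0, "experiment": "CA",
    "time": np.arange(50),
    "r": np.sin(np.arange(50) / 5),
})
toy.loc[25, "r"] = 100.0
fixed = fill_outliers_per_pilot(toy, "r")
print(fixed.loc[24:26, "r"])
```

Because the outlier itself inflates the mean and standard deviation, a median/MAD-based score would be more robust against large spikes; the plain z-score is used here for brevity.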

Signals

  • Open question: do the signals follow a normal distribution? They seem to, but this has not been checked.
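One way to test the normality question, pilot by pilot, is the D'Agostino-Pearson test in `scipy.stats.normaltest`. A minimal sketch on synthetic samples (hypothetical data, not drawn from `train`):

```python
import numpy as np
from scipy import stats


def normality_pvalue(values):
    """D'Agostino-Pearson test; a small p-value is evidence against normality."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # drop missing readings
    _, p = stats.normaltest(values)
    return float(p)


rng = np.random.default_rng(0)
print(normality_pvalue(rng.normal(size=5000)))       # typically not small
print(normality_pvalue(rng.exponential(size=5000)))  # essentially 0
```

With millions of rows per pilot, such a test will flag even tiny departures from normality, so a Q-Q plot (`scipy.stats.probplot`) is a useful complement.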

Tentative direction

  • Find out which columns are associated with the event class.
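A quick first screen of which columns relate to the event class is a one-way ANOVA F statistic per column, which compares the spread of the class means against the within-class spread. This is a swapped-in technique for illustration, not the author's method, and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy import stats


def anova_by_event(df, feature_cols, target="event"):
    """One-way ANOVA F statistic of each feature across the target classes.

    A larger F means the class means differ more relative to the
    within-class spread. Only differences in means are captured.
    """
    groups = [g for _, g in df.groupby(target)]
    rows = []
    for col in feature_cols:
        f, p = stats.f_oneway(*[g[col].to_numpy() for g in groups])
        rows.append({"feature": col, "F": f, "p": p})
    return pd.DataFrame(rows).sort_values("F", ascending=False).reset_index(drop=True)


# Hypothetical toy data: "signal" shifts with the event class, "noise" does not
rng = np.random.default_rng(0)
event = rng.choice(list("ABCD"), size=400)
shift = pd.Series(event).map({"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}).to_numpy()
toy = pd.DataFrame({
    "event": event,
    "signal": shift + rng.normal(size=400),
    "noise": rng.normal(size=400),
})
result = anova_by_event(toy, ["signal", "noise"])
print(result)
```

On `train`, `feature_cols` could be `ecg`, `r`, `gsr` and the eeg* columns, ideally computed per pilot. With this many rows nearly every p-value will be tiny, so ranking by F is more informative.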

In [0]:
train.head()

In [0]:
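from scipy import signal

# Assumed subset (`subset` is not defined elsewhere in this notebook): one pilot, one experiment, in time order
subset = train[(train["crew"] == 1) & (train["seat"] == 0) & (train["experiment"] == "CA")].sort_values("time")

# 8th-order Butterworth low-pass filter; the cutoff 0.05 is relative to the Nyquist frequency
b, a = signal.butter(8, 0.05)
# Zero-phase (forward-backward) filtering of the respiration signal r
y = signal.filtfilt(b, a, subset["r"], padlen=150)
plt.plot(y[3000:4024])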
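To see what Butterworth low-pass filtering with forward-backward application does, here is a self-contained sketch on a synthetic signal (hypothetical data: a slow sine plus white noise). It uses order 8 and cutoff 0.05, but swaps in the second-order-sections form (`output="sos"` with `sosfiltfilt`):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
n = 4096
t = np.arange(n)
# Hypothetical test signal: a slow oscillation (period 512 samples) plus white noise
clean = np.sin(2 * np.pi * t / 512)
noisy = clean + rng.normal(scale=0.5, size=n)

# 8th-order Butterworth low-pass, cutoff 0.05 of Nyquist, in second-order sections
sos = signal.butter(8, 0.05, output="sos")
# Forward-backward filtering: zero phase shift, so peaks stay aligned in time
smoothed = signal.sosfiltfilt(sos, noisy)


def rmse(a, b):
    """Root-mean-square difference between two equal-length arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


print(rmse(noisy, clean), rmse(smoothed, clean))
```

SciPy's documentation recommends the `sos` form for general filtering because the `(b, a)` form can suffer from numerical problems at higher orders.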

In [0]: