Data Science Tutorial 01 @ Data Science Society

那須野薫(Kaoru Nasuno)/ 東京大学(The University of Tokyo)

データサイエンスの基礎的なスキルを身につける為のチュートリアルです。
KaggleのコンペティションであるRECRUIT Challenge, Coupon Purchase Predictionのデータセットを題材として、
データサイエンスの基礎的なスキルに触れ,理解の土台を養うことを目的とします。
(高い予測精度を出すことが目的ではないです)
まだ、書きかけでして、要望に合わせて誤りの修正や加筆をしていく予定です。何かお気づきの点があればご連絡頂けますと幸いです。

対象データ

RECRUIT Challenge, Coupon Purchase Predictionのデータセット。
ユーザ登録や利用規約に同意してダウンロードしてください。
https://www.kaggle.com/c/coupon-purchase-prediction/data

進め方

まずは、全てのコードをコピー&ペーストして、エラーなく動作することを確認しましょう。
この段階でエラーが出る場合には環境が整っていないか、パラメタの設定ができていない等、
プログラムの理解とはあまり関係のない箇所が原因である可能性が高いです。
動作確認が終わったら、ひとつずつ書き写してみて、それぞれどのように動作するかを理解していくという方法をお勧めします。

目次

  1. 下準備
  2. データベースへのデータの展開
  3. モデリング対象の明確化
  4. 機械学習による予測モデルの構築・精度検証
  5. データの概観把握・予測モデルの改善

1, 2 について進めていきます。 3. 以降については、Lecture 02以降を参照ください。

dependencies

macユーザ:

brew update;
pip install ipython;
pip install ipython[notebook];
brew install mariadb;
pip install MySQL-python;
pip install scikit-learn;

mysqlが起動していない場合は、下記のコマンドでmysqlのプロセスを立ち上げましょう。

mysqld_safe;

MySQLクライアンの一つであるSequel Pro( http://www.sequelpro.com/ )もinstall してください。

1. 下準備

データベースの作成

このチュートリアルではMySQL(MariaDB)というリレーショナルデータベースを利用します。
ここでは、利用するデータベース名をcoupon_purchaseとし、データベースを作成していない人は下記のコマンドをターミナルで実行してください。

echo 'CREATE DATABASE coupon_purchase; ' |mysql -uroot

rootユーザのパスワードを設定している方は

echo 'CREATE DATABASE coupon_purchase; ' |mysql -uroot -pyourpassword

としてください。
ローカル環境下で実行している場合には、sequel proで下記のような設定で
でデータベースにアクセスできるようになっているはずです。
(MySQLのパスワードを設定していない場合には、パスワード欄は空白)

以下は、ipython notebook上で実行してください。
ipython notebook は下記のコマンドをターミナルで実行することで起動できます。

ipython notebook;

起動すると、ブラウザ上でipython notebookが起動します。
New >> python2(or New Notebook)をクリックすることで、新しいpythonのノートブックを作成できます。

パラメタの設定

MySQLのユーザ名やパスワードなどのパラメタを指定してください。
多くの場合はuserpasswdを変更すれば動くと思います。
また、ダウンロードし、解凍した9つのcsvファイルが置いてあるディレクトリのパスを設定してください。
(coupon_area_test.csv, coupon_list_test.csv, prefecture_locations.csv, coupon_area_train.csv, coupon_list_train.csv, sample_submission.csv, coupon_detail_train.csv, coupon_visit_train.csv user_list.csv)


In [1]:
# TODO: You Must Change the setting bellow
MYSQL = {
    'user': 'root',
    'passwd': '',
    'db': 'coupon_purchase',
    'host': '127.0.0.1',
    'port': 3306,
    'local_infile': True,
    'charset': 'utf8',
}
DATA_DIR = '/home/nasuno/recruit_kaggle_datasets'  #  ディレクトリの名前に日本語(マルチバイト文字)は使わないでください。
OUTPUTS_DIR = '/home/nasuno/recruit_kaggle/outputs'  # 予測結果などを保存するディレクトリ。

モジュールのimportや変数の初期化

次に、このチュートリアルで利用するモジュールのimportや一部の変数の初期化を行います。

%matplotlib inline

は ipython notebookに特有のマジックコマンドというものです。
pythonの文法と異なりますが、matplotlibという画像を描画するライブラリの出力結果がブラウザ上に表示されるように設定するものです。
(ここでは、おまじない程度に考えてください。)


In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import MySQLdb
import numpy

from sklearn.utils import shuffle
from sklearn.cross_validation import train_test_split
from sklearn.metrics import f1_score, accuracy_score
from sklearn.linear_model import LogisticRegression
from datetime import datetime, timedelta
from itertools import product

# Random Seed
rng = numpy.random.RandomState(1234)

dbcon = MySQLdb.connect(**MYSQL)
dbcur = dbcon.cursor()

2. データベースへのデータの格納

データベースとは

データベースとは、色々なデータの目的ベースでの管理や、効率的なデータ参照/検索を可能にするものです。
データベースの中には複数のテーブルがあります。
テーブルはちょうどスプレッドシートのようになっていて、それぞれの列に名前があり、1行が1つのデータとなるイメージです。

データの格納

データの格納の流れは大まかに、

  1. テーブルの作成
  2. テーブルへのインサート
  3. errorやwarningの確認

の3つのステップとなります。

kaggleのページにテーブルの定義が書いてあるので、ここでは、その通りに作成します。
まずは、user_listのテーブル作成クエリと実行です。
MySQLのCREATE TABLE構文については、http://dev.mysql.com/doc/refman/5.6/ja/create-table.html を参照ください。


In [3]:
dbcur.execute('''DROP TABLE IF EXISTS user_list;''')  # チュートリアルの便宜上、一度削除します。
query = '''
CREATE TABLE IF NOT EXISTS user_list (
    reg_date DATETIME,
    sex_id VARCHAR(1),
    age INT,
    withdraw_date DATETIME,
    pref_name VARCHAR(15),
    user_id_hash VARCHAR(32),
    PRIMARY KEY(user_id_hash),
    INDEX(reg_date),
    INDEX(sex_id),
    INDEX(age),
    INDEX(withdraw_date),
    INDEX(pref_name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
'''
dbcur.execute(query)


Out[3]:
0L

次に、データのインサートです。
csvファイルなど、dumpされたファイルからMySQLにインサートする場合にはLOAD DATA INFILE構文を利用します。
LOAD DATA INFILE構文については、 http://dev.mysql.com/doc/refman/5.6/ja/load-data.html を参照ください。


In [4]:
csv_path = DATA_DIR + '/user_list.csv'
query = '''
LOAD DATA LOCAL INFILE "''' + csv_path + '''"
INTO TABLE user_list
CHARACTER SET utf8
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(reg_date, sex_id, age,@withdraw_date, pref_name, user_id_hash)
SET
    withdraw_date = IF(CHAR_LENGTH(@withdraw_date) != 19 , '9999-12-31 23:59:59', STR_TO_DATE(@withdraw_date, "%Y-%m-%d %H:%i:%s"))
;
'''
dbcur.execute(query)


Out[4]:
22873L

テーブルの作成に利用したCREATE TABLE文には、
テーブルの型の定義ではなく、インデックスと呼ばれるものの定義も含まれています。
インデックスとはデータの検索を高速化するものです。

PRIMARY KEY

テーブル内でuniqueで、かつ、検索するカラムに付与する。 例えば、user_listテーブルのuser_id_hashは当該テーブルで、ユニークであり,かつ、ユーザの検索によく用いるため、PRIMARY KEYを付与しておいた方が良い。

INDEX

テーブル内でuniqueではないが、検索するカラムに付与する。例えば、ユーザを性別や年齢に応じて検索・集計して、割合を見たい場合には、sex_idやageなどのカラムに付与しておいた方が良い。

TODO

MYSQL関数などの説明の加筆。

Exercise

下記の他のファイルについても同様にテーブルを作成し、データをインサートしてください。

  • prefecture_locations.csv
  • coupon_area_train.csv, coupon_area_test.csv
  • coupon_detail_train.csv
  • coupon_visit_train.csv
  • coupon_list_train.csv, coupon_list_test.csv

実装例

  • prefecture_locations.csv

In [5]:
### prefecture_locations
csv_path = DATA_DIR + '/prefecture_locations.csv' 
dbcur.execute('''DROP TABLE IF EXISTS prefecture_locations;''')
dbcur.execute('''
CREATE TABLE IF NOT EXISTS prefecture_locations (
    pref_name VARCHAR(15),
    PRIMARY KEY(pref_name),
    prefectual_office VARCHAR(15),
    latitude DOUBLE,
    longitude DOUBLE
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
''')

dbcur.execute('''
LOAD DATA LOCAL INFILE "''' + csv_path + '''"
INTO TABLE prefecture_locations
CHARACTER SET utf8
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(pref_name, prefectual_office, latitude, longitude)
;
''')


/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 1
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 2
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 3
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 4
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 5
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 6
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 7
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 8
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 9
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 10
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 11
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 12
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 13
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 14
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 15
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 16
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 17
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 18
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 19
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 20
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 21
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 22
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 23
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 24
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 25
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 26
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 27
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 28
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 29
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 30
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 31
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 32
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 33
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 34
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 35
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 36
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 37
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 38
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 39
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 40
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 41
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 42
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 43
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 44
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 45
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 46
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:23: Warning: Data truncated for column 'longitude' at row 47
Out[5]:
47L

実行すると、それぞれのレコードでWarningが発生しますが、
データベースに展開されたレコードのlongitudeを確認すると正しく展開されているため、ここではWarningは無視します。
(確認しておりませんが、行末の改行コードがWarningの原因かもしれません、、、)

  • coupon_area_train.csv, coupon_area_test.csv

In [6]:
### coupon_area_train
csv_path = DATA_DIR + '/coupon_area_train.csv' 
dbcur.execute('''DROP TABLE IF EXISTS coupon_area_train;''')
dbcur.execute('''
CREATE TABLE IF NOT EXISTS coupon_area_train (
    small_area_name  VARCHAR(32),
    pref_name VARCHAR(15),    
    coupon_id_hash VARCHAR(32),
    INDEX(coupon_id_hash),
     INDEX(pref_name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
''')

dbcur.execute('''
LOAD DATA LOCAL INFILE "''' + csv_path + '''"
INTO TABLE coupon_area_train
CHARACTER SET utf8
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(small_area_name,pref_name,coupon_id_hash)
;
''')


Out[6]:
138185L

In [7]:
### coupon_area_test
csv_path = DATA_DIR + '/coupon_area_test.csv' 
dbcur.execute('''DROP TABLE IF EXISTS coupon_area_test;''')
dbcur.execute('''
CREATE TABLE IF NOT EXISTS coupon_area_test (
    small_area_name  VARCHAR(32),
    pref_name VARCHAR(15),    
    coupon_id_hash VARCHAR(32),
    INDEX(coupon_id_hash),
    INDEX(pref_name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
''')

dbcur.execute('''
LOAD DATA LOCAL INFILE "''' + csv_path + '''"
INTO TABLE coupon_area_test
CHARACTER SET utf8
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(small_area_name,pref_name,coupon_id_hash)
;
''')


Out[7]:
2165L
  • coupon_detail_train.csv

In [8]:
### coupon_detail_train
csv_path = DATA_DIR + '/coupon_detail_train.csv' 
dbcur.execute('''DROP TABLE IF EXISTS coupon_detail_train;''')
dbcur.execute('''
CREATE TABLE IF NOT EXISTS coupon_detail_train (
    item_count INT,
    i_date DATETIME,
    small_area_name VARCHAR(32),
    purchaseid_hash VARCHAR(32),
    user_id_hash VARCHAR(32),
    coupon_id_hash VARCHAR(32),
    INDEX(coupon_id_hash)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
''')

dbcur.execute('''
LOAD DATA LOCAL INFILE "''' + csv_path + '''"
INTO TABLE coupon_detail_train
CHARACTER SET utf8
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(item_count, i_date, small_area_name, purchaseid_hash, user_id_hash, coupon_id_hash)
;
''')


Out[8]:
168996L
  • coupon_visit_train.csv
    このファイルはレコード数が一番多く、インサートが完了するまで少し時間がかかります。

In [9]:
### coupon_visit_train
csv_path = DATA_DIR + '/coupon_visit_train.csv' 
dbcur.execute('''DROP TABLE IF EXISTS coupon_visit_train;''')
dbcur.execute('''
CREATE TABLE IF NOT EXISTS coupon_visit_train (
    purchase_flg INT,
    i_date DATETIME,
    page_serial INT,
    referrer_hash VARCHAR(128),
    view_coupon_id_hash VARCHAR(128),
    user_id_hash VARCHAR(32),
    session_id_hash VARCHAR(128),
    purchaseid_hash VARCHAR(32),
    INDEX(user_id_hash, i_date),
    INDEX(i_date, user_id_hash),
    INDEX(view_coupon_id_hash),
    INDEX(purchaseid_hash),
    INDEX(purchase_flg)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
''')

dbcur.execute('''
LOAD DATA LOCAL INFILE "''' + csv_path + '''"
INTO TABLE coupon_visit_train
CHARACTER SET utf8
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(purchase_flg,i_date,page_serial,referrer_hash,view_coupon_id_hash,user_id_hash,session_id_hash,purchaseid_hash)
;
''')


Out[9]:
2833180L
  • coupon_list_train.csv, coupon_list_test.csv
    2つともWarningが出ますが、 日時の値が正しくないために発生しているだけなので、無視します。
    下記のクエリの
    SET validperiod=IF(@validperiod = 'NA', Null, @validperiod)
    
    を外すと、おそらくNULLが入ってほしい箇所に0が入ってしまうため、ここでは、NULLがはいるように変換します。

In [10]:
### coupon_list_train
csv_path = DATA_DIR + '/coupon_list_train.csv' 
dbcur.execute('''DROP TABLE IF EXISTS coupon_list_train;''')
dbcur.execute('''
CREATE TABLE IF NOT EXISTS coupon_list_train (
    capsule_text VARCHAR(20),
    genre_name VARCHAR(50),
    price_rate INT,
    catalog_price INT,
    discount_price INT,
    dispfrom DATETIME,
    dispend DATETIME,
    dispperiod INT,
    validfrom DATE,
    validend DATE,
    validperiod INT,
    usable_date_mon VARCHAR(7),
    usable_date_tue VARCHAR(7),
    usable_date_wed VARCHAR(7),
    usable_date_thu VARCHAR(7),
    usable_date_fri VARCHAR(7),
    usable_date_sat VARCHAR(7),
    usable_date_sun VARCHAR(7),
    usable_date_holiday VARCHAR(7),
    usable_date_before_holiday VARCHAR(7),
    large_area_name VARCHAR(30),
    ken_name VARCHAR(8),
    small_area_name VARCHAR(30),
    coupon_id_hash VARCHAR(32),
    PRIMARY KEY(coupon_id_hash),
    INDEX(ken_name),
    INDEX(genre_name)
    
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
''')

dbcur.execute('''
LOAD DATA LOCAL INFILE "''' + csv_path + '''"
INTO TABLE coupon_list_train
CHARACTER SET utf8
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(capsule_text,genre_name,price_rate,catalog_price,discount_price,dispfrom,dispend,dispperiod,validfrom,validend,@validperiod,usable_date_mon,usable_date_tue,usable_date_wed,usable_date_thu,usable_date_fri,usable_date_sat,usable_date_sun,usable_date_holiday,usable_date_before_holiday,large_area_name,ken_name,small_area_name,coupon_id_hash)
SET validperiod=IF(@validperiod = 'NA', Null, @validperiod)
;
''')


/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 328
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 328
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 347
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 347
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 349
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 349
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 350
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 350
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 378
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 378
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 387
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 387
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 394
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 394
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 406
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 406
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 410
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 410
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 414
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 414
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 415
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 415
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 418
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 418
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 419
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 419
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 426
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 426
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 446
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 446
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 447
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 447
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 448
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 448
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 452
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 452
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 454
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 454
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 455
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 455
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 466
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 466
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 475
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 475
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 476
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 476
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 479
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 479
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 485
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 485
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 488
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 488
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 489
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 489
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 491
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 491
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 492
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 492
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 495
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 495
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 496
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 496
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validfrom' at row 497
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:47: Warning: Data truncated for column 'validend' at row 497
Out[10]:
19413L

In [11]:
### coupon_list_test
csv_path = DATA_DIR + '/coupon_list_test.csv' 
dbcur.execute('''DROP TABLE IF EXISTS coupon_list_test;''')
dbcur.execute('''
CREATE TABLE IF NOT EXISTS coupon_list_test (
    capsule_text VARCHAR(20),
    genre_name VARCHAR(50),
    price_rate INT,
    catalog_price INT,
    discount_price INT,
    dispfrom DATETIME,
    dispend DATETIME,
    dispperiod INT,
    validfrom DATE,
    validend DATE,
    validperiod INT,
    usable_date_mon VARCHAR(7),
    usable_date_tue VARCHAR(7),
    usable_date_wed VARCHAR(7),
    usable_date_thu VARCHAR(7),
    usable_date_fri VARCHAR(7),
    usable_date_sat VARCHAR(7),
    usable_date_sun VARCHAR(7),
    usable_date_holiday VARCHAR(7),
    usable_date_before_holiday VARCHAR(7),
    large_area_name VARCHAR(30),
    ken_name VARCHAR(8),
    small_area_name VARCHAR(30),
    coupon_id_hash VARCHAR(32),
    PRIMARY KEY(coupon_id_hash),
    INDEX(ken_name),
    INDEX(genre_name)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
''')

dbcur.execute('''
LOAD DATA LOCAL INFILE "''' + csv_path + '''"
INTO TABLE coupon_list_test
CHARACTER SET utf8
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(capsule_text,genre_name,price_rate,catalog_price,discount_price,dispfrom,dispend,dispperiod,validfrom,validend,@validperiod,usable_date_mon,usable_date_tue,usable_date_wed,usable_date_thu,usable_date_fri,usable_date_sat,usable_date_sun,usable_date_holiday,usable_date_before_holiday,large_area_name,ken_name,small_area_name,coupon_id_hash)
SET validperiod=IF(@validperiod = 'NA', Null, @validperiod)
;
''')


/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 24
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 24
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 25
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 25
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 26
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 26
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 29
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 29
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 30
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 30
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 31
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 31
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 34
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 34
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 41
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 41
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 43
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 43
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 44
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 44
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 45
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 45
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 53
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 53
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 54
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 54
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 58
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 58
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 59
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 59
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 60
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 60
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 61
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 61
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 62
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 62
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 63
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 63
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 66
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 66
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 67
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 67
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 68
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 68
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 69
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 69
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 72
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 72
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 74
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 74
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 75
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 75
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 76
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 76
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 84
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 84
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 90
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 90
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 91
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 91
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 108
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 108
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validfrom' at row 109
/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:46: Warning: Data truncated for column 'validend' at row 109
Out[11]:
310L

謝辞

フィードバックをくれた方ありがとうございます!
Kaggleやリクルートの方ありがとうございます!


In [ ]: