C:\Hadoop\binC:\Hadoop\bin下面HADOOP_HOME环境变量,指向C:\HadoopC:\temp\hive文件夹c:\hadoop\bin\winutils chmod 777 \temp\hiveC:\SPARK目录下,其它操作系统的放到home目录SPARK_HOME,指向C:\SPARKc:\spark\bin\spark-shell看看是否安装成功conda --versionconda listconda update conda输入jupyter notebook
In [1]:
    
%matplotlib inline
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 3*np.pi, 500)
plt.plot(x, np.sin(x**2))
plt.title('Sine wave')
    
    Out[1]:
Text(0.5,1,'Sine wave')
    
 
In [2]:
    
import numpy as np
import pandas as pd
    
In [3]:
    
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s[4] # 6.0
    
    Out[3]:
6.0
In [4]:
    
df = pd.DataFrame({'data': ['2016-01-01', '2016-01-02', '2016-01-03'], 'qty': [20, 30, 40]})
df
    
    Out[4]:
  
    
       
      data 
      qty 
     
  
  
    
      0 
      2016-01-01 
      20 
     
    
      1 
      2016-01-02 
      30 
     
    
      2 
      2016-01-03 
      40 
     
  
更大的数据应当从文件里获取
In [5]:
    
rain = pd.read_csv('data/rainfall/rainfall.csv')
rain
    
    Out[5]:
  
    
       
      City 
      Month 
      Rainfall 
     
  
  
    
      0 
      San Francisco 
      Jan 
      10.0 
     
    
      1 
      Seattle 
      Jan 
      30.0 
     
    
      2 
      Los Angeles 
      Jan 
      2.0 
     
    
      3 
      Seattle 
      Feb 
      20.0 
     
    
      4 
      San Francisco 
      Feb 
      4.0 
     
    
      5 
      Los Angeles 
      Feb 
      0.0 
     
    
      6 
      Seattle 
      Mar 
      22.0 
     
    
      7 
      San Francisco 
      Mar 
      4.0 
     
    
      8 
      Los Angeles 
      Mar 
      NaN 
     
    
      9 
      Seattle 
      Apr 
      NaN 
     
    
      10 
      San Francisco 
      Apr 
      5.0 
     
    
      11 
      Los Angeles 
      Apr 
      4.0 
     
  
In [6]:
    
# 加载一列
rain['City']
    
    Out[6]:
0     San Francisco
1           Seattle
2       Los Angeles
3           Seattle
4     San Francisco
5       Los Angeles
6           Seattle
7     San Francisco
8       Los Angeles
9           Seattle
10    San Francisco
11      Los Angeles
Name: City, dtype: object
In [7]:
    
# 加载一行(第二行)
rain.loc[[1]]
    
    Out[7]:
  
    
       
      City 
      Month 
      Rainfall 
     
  
  
    
      1 
      Seattle 
      Jan 
      30.0 
     
  
In [8]:
    
# 第一行和第二行
rain.loc[0:1]
    
    Out[8]:
  
    
       
      City 
      Month 
      Rainfall 
     
  
  
    
      0 
      San Francisco 
      Jan 
      10.0 
     
    
      1 
      Seattle 
      Jan 
      30.0 
     
  
In [10]:
    
# 查找所有降雨量小于10的数据
rain[rain['Rainfall'] < 10]
    
    Out[10]:
  
    
       
      City 
      Month 
      Rainfall 
     
  
  
    
      2 
      Los Angeles 
      Jan 
      2.0 
     
    
      4 
      San Francisco 
      Feb 
      4.0 
     
    
      5 
      Los Angeles 
      Feb 
      0.0 
     
    
      7 
      San Francisco 
      Mar 
      4.0 
     
    
      10 
      San Francisco 
      Apr 
      5.0 
     
    
      11 
      Los Angeles 
      Apr 
      4.0 
     
  
查找4月份的降雨
In [11]:
    
rain[rain['Month'] == 'Apr']
    
    Out[11]:
  
    
       
      City 
      Month 
      Rainfall 
     
  
  
    
      9 
      Seattle 
      Apr 
      NaN 
     
    
      10 
      San Francisco 
      Apr 
      5.0 
     
    
      11 
      Los Angeles 
      Apr 
      4.0 
     
  
查找Los Angeles的数据
In [12]:
    
rain[rain['City'] == 'Los Angeles']
    
    Out[12]:
  
    
       
      City 
      Month 
      Rainfall 
     
  
  
    
      2 
      Los Angeles 
      Jan 
      2.0 
     
    
      5 
      Los Angeles 
      Feb 
      0.0 
     
    
      8 
      Los Angeles 
      Mar 
      NaN 
     
    
      11 
      Los Angeles 
      Apr 
      4.0 
     
  
In [13]:
    
rain = rain.set_index(rain['City'] + rain['Month'])
    
注意,当我们修改dataframe时,其实是在创建一个副本,因此要把这个值再赋值给原有的dataframe
In [14]:
    
rain.loc['San FranciscoApr']
    
    Out[14]:
City        San Francisco
Month                 Apr
Rainfall                5
Name: San FranciscoApr, dtype: object
In [15]:
    
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
    
In [16]:
    
df = pd.read_csv('data/nycflights13/flights.csv.gz')
    
In [17]:
    
df
    
    Out[17]:
  
    
       
      year 
      month 
      day 
      dep_time 
      sched_dep_time 
      dep_delay 
      arr_time 
      sched_arr_time 
      arr_delay 
      carrier 
      flight 
      tailnum 
      origin 
      dest 
      air_time 
      distance 
      hour 
      minute 
      time_hour 
     
  
  
    
      0 
      2013 
      1 
      1 
      517.0 
      515 
      2.0 
      830.0 
      819 
      11.0 
      UA 
      1545 
      N14228 
      EWR 
      IAH 
      227.0 
      1400 
      5 
      15 
      2013-01-01 05:00:00 
     
    
      1 
      2013 
      1 
      1 
      533.0 
      529 
      4.0 
      850.0 
      830 
      20.0 
      UA 
      1714 
      N24211 
      LGA 
      IAH 
      227.0 
      1416 
      5 
      29 
      2013-01-01 05:00:00 
     
    
      2 
      2013 
      1 
      1 
      542.0 
      540 
      2.0 
      923.0 
      850 
      33.0 
      AA 
      1141 
      N619AA 
      JFK 
      MIA 
      160.0 
      1089 
      5 
      40 
      2013-01-01 05:00:00 
     
    
      3 
      2013 
      1 
      1 
      544.0 
      545 
      -1.0 
      1004.0 
      1022 
      -18.0 
      B6 
      725 
      N804JB 
      JFK 
      BQN 
      183.0 
      1576 
      5 
      45 
      2013-01-01 05:00:00 
     
    
      4 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      461 
      N668DN 
      LGA 
      ATL 
      116.0 
      762 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      5 
      2013 
      1 
      1 
      554.0 
      558 
      -4.0 
      740.0 
      728 
      12.0 
      UA 
      1696 
      N39463 
      EWR 
      ORD 
      150.0 
      719 
      5 
      58 
      2013-01-01 05:00:00 
     
    
      6 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      507 
      N516JB 
      EWR 
      FLL 
      158.0 
      1065 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      7 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      5708 
      N829AS 
      LGA 
      IAD 
      53.0 
      229 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      8 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      79 
      N593JB 
      JFK 
      MCO 
      140.0 
      944 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      9 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      301 
      N3ALAA 
      LGA 
      ORD 
      138.0 
      733 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      10 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      49 
      N793JB 
      JFK 
      PBI 
      149.0 
      1028 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      11 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      71 
      N657JB 
      JFK 
      TPA 
      158.0 
      1005 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      12 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      194 
      N29129 
      JFK 
      LAX 
      345.0 
      2475 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      13 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      923.0 
      937 
      -14.0 
      UA 
      1124 
      N53441 
      EWR 
      SFO 
      361.0 
      2565 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      14 
      2013 
      1 
      1 
      559.0 
      600 
      -1.0 
      941.0 
      910 
      31.0 
      AA 
      707 
      N3DUAA 
      LGA 
      DFW 
      257.0 
      1389 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      15 
      2013 
      1 
      1 
      559.0 
      559 
      0.0 
      702.0 
      706 
      -4.0 
      B6 
      1806 
      N708JB 
      JFK 
      BOS 
      44.0 
      187 
      5 
      59 
      2013-01-01 05:00:00 
     
    
      16 
      2013 
      1 
      1 
      559.0 
      600 
      -1.0 
      854.0 
      902 
      -8.0 
      UA 
      1187 
      N76515 
      EWR 
      LAS 
      337.0 
      2227 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      17 
      2013 
      1 
      1 
      600.0 
      600 
      0.0 
      851.0 
      858 
      -7.0 
      B6 
      371 
      N595JB 
      LGA 
      FLL 
      152.0 
      1076 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      18 
      2013 
      1 
      1 
      600.0 
      600 
      0.0 
      837.0 
      825 
      12.0 
      MQ 
      4650 
      N542MQ 
      LGA 
      ATL 
      134.0 
      762 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      19 
      2013 
      1 
      1 
      601.0 
      600 
      1.0 
      844.0 
      850 
      -6.0 
      B6 
      343 
      N644JB 
      EWR 
      PBI 
      147.0 
      1023 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      20 
      2013 
      1 
      1 
      602.0 
      610 
      -8.0 
      812.0 
      820 
      -8.0 
      DL 
      1919 
      N971DL 
      LGA 
      MSP 
      170.0 
      1020 
      6 
      10 
      2013-01-01 06:00:00 
     
    
      21 
      2013 
      1 
      1 
      602.0 
      605 
      -3.0 
      821.0 
      805 
      16.0 
      MQ 
      4401 
      N730MQ 
      LGA 
      DTW 
      105.0 
      502 
      6 
      5 
      2013-01-01 06:00:00 
     
    
      22 
      2013 
      1 
      1 
      606.0 
      610 
      -4.0 
      858.0 
      910 
      -12.0 
      AA 
      1895 
      N633AA 
      EWR 
      MIA 
      152.0 
      1085 
      6 
      10 
      2013-01-01 06:00:00 
     
    
      23 
      2013 
      1 
      1 
      606.0 
      610 
      -4.0 
      837.0 
      845 
      -8.0 
      DL 
      1743 
      N3739P 
      JFK 
      ATL 
      128.0 
      760 
      6 
      10 
      2013-01-01 06:00:00 
     
    
      24 
      2013 
      1 
      1 
      607.0 
      607 
      0.0 
      858.0 
      915 
      -17.0 
      UA 
      1077 
      N53442 
      EWR 
      MIA 
      157.0 
      1085 
      6 
      7 
      2013-01-01 06:00:00 
     
    
      25 
      2013 
      1 
      1 
      608.0 
      600 
      8.0 
      807.0 
      735 
      32.0 
      MQ 
      3768 
      N9EAMQ 
      EWR 
      ORD 
      139.0 
      719 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      26 
      2013 
      1 
      1 
      611.0 
      600 
      11.0 
      945.0 
      931 
      14.0 
      UA 
      303 
      N532UA 
      JFK 
      SFO 
      366.0 
      2586 
      6 
      0 
      2013-01-01 06:00:00 
     
    
      27 
      2013 
      1 
      1 
      613.0 
      610 
      3.0 
      925.0 
      921 
      4.0 
      B6 
      135 
      N635JB 
      JFK 
      RSW 
      175.0 
      1074 
      6 
      10 
      2013-01-01 06:00:00 
     
    
      28 
      2013 
      1 
      1 
      615.0 
      615 
      0.0 
      1039.0 
      1100 
      -21.0 
      B6 
      709 
      N794JB 
      JFK 
      SJU 
      182.0 
      1598 
      6 
      15 
      2013-01-01 06:00:00 
     
    
      29 
      2013 
      1 
      1 
      615.0 
      615 
      0.0 
      833.0 
      842 
      -9.0 
      DL 
      575 
      N326NB 
      EWR 
      ATL 
      120.0 
      746 
      6 
      15 
      2013-01-01 06:00:00 
     
    
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
     
    
      336746 
      2013 
      9 
      30 
      2123.0 
      2125 
      -2.0 
      2223.0 
      2247 
      -24.0 
      EV 
      5489 
      N712EV 
      LGA 
      CHO 
      45.0 
      305 
      21 
      25 
      2013-09-30 21:00:00 
     
    
      336747 
      2013 
      9 
      30 
      2127.0 
      2129 
      -2.0 
      2314.0 
      2323 
      -9.0 
      EV 
      3833 
      N16546 
      EWR 
      CLT 
      72.0 
      529 
      21 
      29 
      2013-09-30 21:00:00 
     
    
      336748 
      2013 
      9 
      30 
      2128.0 
      2130 
      -2.0 
      2328.0 
      2359 
      -31.0 
      B6 
      97 
      N807JB 
      JFK 
      DEN 
      213.0 
      1626 
      21 
      30 
      2013-09-30 21:00:00 
     
    
      336749 
      2013 
      9 
      30 
      2129.0 
      2059 
      30.0 
      2230.0 
      2232 
      -2.0 
      EV 
      5048 
      N751EV 
      LGA 
      RIC 
      45.0 
      292 
      20 
      59 
      2013-09-30 20:00:00 
     
    
      336750 
      2013 
      9 
      30 
      2131.0 
      2140 
      -9.0 
      2225.0 
      2255 
      -30.0 
      MQ 
      3621 
      N807MQ 
      JFK 
      DCA 
      36.0 
      213 
      21 
      40 
      2013-09-30 21:00:00 
     
    
      336751 
      2013 
      9 
      30 
      2140.0 
      2140 
      0.0 
      10.0 
      40 
      -30.0 
      AA 
      185 
      N335AA 
      JFK 
      LAX 
      298.0 
      2475 
      21 
      40 
      2013-09-30 21:00:00 
     
    
      336752 
      2013 
      9 
      30 
      2142.0 
      2129 
      13.0 
      2250.0 
      2239 
      11.0 
      EV 
      4509 
      N12957 
      EWR 
      PWM 
      47.0 
      284 
      21 
      29 
      2013-09-30 21:00:00 
     
    
      336753 
      2013 
      9 
      30 
      2145.0 
      2145 
      0.0 
      115.0 
      140 
      -25.0 
      B6 
      1103 
      N633JB 
      JFK 
      SJU 
      192.0 
      1598 
      21 
      45 
      2013-09-30 21:00:00 
     
    
      336754 
      2013 
      9 
      30 
      2147.0 
      2137 
      10.0 
      30.0 
      27 
      3.0 
      B6 
      1371 
      N627JB 
      LGA 
      FLL 
      139.0 
      1076 
      21 
      37 
      2013-09-30 21:00:00 
     
    
      336755 
      2013 
      9 
      30 
      2149.0 
      2156 
      -7.0 
      2245.0 
      2308 
      -23.0 
      UA 
      523 
      N813UA 
      EWR 
      BOS 
      37.0 
      200 
      21 
      56 
      2013-09-30 21:00:00 
     
    
      336756 
      2013 
      9 
      30 
      2150.0 
      2159 
      -9.0 
      2250.0 
      2306 
      -16.0 
      EV 
      3842 
      N10575 
      EWR 
      MHT 
      39.0 
      209 
      21 
      59 
      2013-09-30 21:00:00 
     
    
      336757 
      2013 
      9 
      30 
      2159.0 
      1845 
      194.0 
      2344.0 
      2030 
      194.0 
      9E 
      3320 
      N906XJ 
      JFK 
      BUF 
      50.0 
      301 
      18 
      45 
      2013-09-30 18:00:00 
     
    
      336758 
      2013 
      9 
      30 
      2203.0 
      2205 
      -2.0 
      2339.0 
      2331 
      8.0 
      EV 
      5311 
      N722EV 
      LGA 
      BGR 
      61.0 
      378 
      22 
      5 
      2013-09-30 22:00:00 
     
    
      336759 
      2013 
      9 
      30 
      2207.0 
      2140 
      27.0 
      2257.0 
      2250 
      7.0 
      MQ 
      3660 
      N532MQ 
      LGA 
      BNA 
      97.0 
      764 
      21 
      40 
      2013-09-30 21:00:00 
     
    
      336760 
      2013 
      9 
      30 
      2211.0 
      2059 
      72.0 
      2339.0 
      2242 
      57.0 
      EV 
      4672 
      N12145 
      EWR 
      STL 
      120.0 
      872 
      20 
      59 
      2013-09-30 20:00:00 
     
    
      336761 
      2013 
      9 
      30 
      2231.0 
      2245 
      -14.0 
      2335.0 
      2356 
      -21.0 
      B6 
      108 
      N193JB 
      JFK 
      PWM 
      48.0 
      273 
      22 
      45 
      2013-09-30 22:00:00 
     
    
      336762 
      2013 
      9 
      30 
      2233.0 
      2113 
      80.0 
      112.0 
      30 
      42.0 
      UA 
      471 
      N578UA 
      EWR 
      SFO 
      318.0 
      2565 
      21 
      13 
      2013-09-30 21:00:00 
     
    
      336763 
      2013 
      9 
      30 
      2235.0 
      2001 
      154.0 
      59.0 
      2249 
      130.0 
      B6 
      1083 
      N804JB 
      JFK 
      MCO 
      123.0 
      944 
      20 
      1 
      2013-09-30 20:00:00 
     
    
      336764 
      2013 
      9 
      30 
      2237.0 
      2245 
      -8.0 
      2345.0 
      2353 
      -8.0 
      B6 
      234 
      N318JB 
      JFK 
      BTV 
      43.0 
      266 
      22 
      45 
      2013-09-30 22:00:00 
     
    
      336765 
      2013 
      9 
      30 
      2240.0 
      2245 
      -5.0 
      2334.0 
      2351 
      -17.0 
      B6 
      1816 
      N354JB 
      JFK 
      SYR 
      41.0 
      209 
      22 
      45 
      2013-09-30 22:00:00 
     
    
      336766 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      2002 
      N281JB 
      JFK 
      BUF 
      52.0 
      301 
      22 
      50 
      2013-09-30 22:00:00 
     
    
      336767 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      486 
      N346JB 
      JFK 
      ROC 
      47.0 
      264 
      22 
      46 
      2013-09-30 22:00:00 
     
    
      336768 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      718 
      N565JB 
      JFK 
      BOS 
      33.0 
      187 
      22 
      55 
      2013-09-30 22:00:00 
     
    
      336769 
      2013 
      9 
      30 
      2349.0 
      2359 
      -10.0 
      325.0 
      350 
      -25.0 
      B6 
      745 
      N516JB 
      JFK 
      PSE 
      196.0 
      1617 
      23 
      59 
      2013-09-30 23:00:00 
     
    
      336770 
      2013 
      9 
      30 
      NaN 
      1842 
      NaN 
      NaN 
      2019 
      NaN 
      EV 
      5274 
      N740EV 
      LGA 
      BNA 
      NaN 
      764 
      18 
      42 
      2013-09-30 18:00:00 
     
    
      336771 
      2013 
      9 
      30 
      NaN 
      1455 
      NaN 
      NaN 
      1634 
      NaN 
      9E 
      3393 
      NaN 
      JFK 
      DCA 
      NaN 
      213 
      14 
      55 
      2013-09-30 14:00:00 
     
    
      336772 
      2013 
      9 
      30 
      NaN 
      2200 
      NaN 
      NaN 
      2312 
      NaN 
      9E 
      3525 
      NaN 
      LGA 
      SYR 
      NaN 
      198 
      22 
      0 
      2013-09-30 22:00:00 
     
    
      336773 
      2013 
      9 
      30 
      NaN 
      1210 
      NaN 
      NaN 
      1330 
      NaN 
      MQ 
      3461 
      N535MQ 
      LGA 
      BNA 
      NaN 
      764 
      12 
      10 
      2013-09-30 12:00:00 
     
    
      336774 
      2013 
      9 
      30 
      NaN 
      1159 
      NaN 
      NaN 
      1344 
      NaN 
      MQ 
      3572 
      N511MQ 
      LGA 
      CLE 
      NaN 
      419 
      11 
      59 
      2013-09-30 11:00:00 
     
    
      336775 
      2013 
      9 
      30 
      NaN 
      840 
      NaN 
      NaN 
      1020 
      NaN 
      MQ 
      3531 
      N839MQ 
      LGA 
      RDU 
      NaN 
      431 
      8 
      40 
      2013-09-30 08:00:00 
     
  
336776 rows × 19 columns
这里我们主要关注统计数据和可视化。我们来看一下按月统计的晚点时间的均值。
In [18]:
    
mean_delay_by_month = df.groupby(['month'])['arr_delay'].mean()
mean_delay_by_month
    
    Out[18]:
month
1      6.129972
2      5.613019
3      5.807577
4     11.176063
5      3.521509
6     16.481330
7     16.711307
8      6.040652
9     -4.018364
10    -0.167063
11     0.461347
12    14.870355
Name: arr_delay, dtype: float64
In [19]:
    
mean_month_plt = mean_delay_by_month.plot(kind='bar', title='Mean Delay By Month')
mean_month_plt
    
    Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x259b18d6ac8>
    
 
注意,这里9、10月均值会有负值。
In [20]:
    
mean_delay_by_month_ord = df[(df.dest == 'ORD')].groupby(['month'])['arr_delay'].mean()
print("Flights to Chicago (ORD)")
print(mean_delay_by_month_ord)
mean_month_plt_ord = mean_delay_by_month_ord.plot(kind='bar', title="Mean Delay By Month (Chicago)")
mean_month_plt_ord
    
    
Flights to Chicago (ORD)
month
1      7.287694
2      3.680794
3     -2.702473
4     19.179352
5      7.938280
6     13.299376
7      8.405514
8      4.256851
9     -4.745370
10    -1.597090
11     2.071058
12    16.462817
Name: arr_delay, dtype: float64
    Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x259b19e7f98>
    
 
In [21]:
    
# 再看看Los Angeles进行比较一下
mean_delay_by_month_lax = df[(df.dest == 'LAX')].groupby(['month'])['arr_delay'].mean()
print("Flights to Chicago (LAX)")
print(mean_delay_by_month_lax)
mean_month_plt_lax = mean_delay_by_month_lax.plot(kind='bar', title="Mean Delay By Month (Los Angeles)")
mean_month_plt_lax
    
    
Flights to Chicago (LAX)
month
1     -4.160312
2     -7.601824
3     -5.280928
4      3.085153
5     -7.150657
6     13.007027
7      8.191432
8      1.028667
9     -8.719044
10    -1.205694
11    -0.103290
12    10.724460
Name: arr_delay, dtype: float64
    Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x259b19d8550>
    
 
从上面的图表中我们可以直观的看到一些特征。现在我们再来看看每个航空公司晚点的情况,并进行一些可视化。
In [22]:
    
# 看看是否不同的航空公司对晚点会有不同的影响
df[['carrier', 'arr_delay']].groupby('carrier').mean().plot(kind='bar', figsize=(12, 8))
plt.xticks(rotation=0)
plt.xlabel('Carrier')
plt.ylabel('Average Delay in Min')
plt.title('Average Arrival Delay by Carrier in 2008, All airports')
df[['carrier', 'dep_delay']].groupby('carrier').mean().plot(kind='bar', figsize=(12, 8))
plt.xticks(rotation=0)
plt.xlabel('Carrier')
plt.ylabel('Average Delay in Min')
plt.title('Average Departure Delay by Carrier in 2008, All airports')
    
    Out[22]:
Text(0.5,1,'Average Departure Delay by Carrier in 2008, All airports')
    
 
    
 
从上面的图表里我们可以看到F9(Front Airlines)几乎是最经常晚点的,而夏威夷(HA)在这方面表现最好。
我们有多个数据集,天气、机场的。现在我们来看一下如何把两个表连接在一起
In [25]:
    
weather = pd.read_csv('data/nycflights13/weather.csv.gz')
weather
    
    Out[25]:
  
    
       
      origin 
      year 
      month 
      day 
      hour 
      temp 
      dewp 
      humid 
      wind_dir 
      wind_speed 
      wind_gust 
      precip 
      pressure 
      visib 
      time_hour 
     
  
  
    
      0 
      EWR 
      2013 
      1 
      1 
      0 
      37.04 
      21.92 
      53.97 
      230.0 
      10.35702 
      11.918651 
      0.00 
      1013.9 
      10.00 
      2012-12-31 19:00:00 
     
    
      1 
      EWR 
      2013 
      1 
      1 
      1 
      37.04 
      21.92 
      53.97 
      230.0 
      13.80936 
      15.891535 
      0.00 
      1013.0 
      10.00 
      2012-12-31 20:00:00 
     
    
      2 
      EWR 
      2013 
      1 
      1 
      2 
      37.94 
      21.92 
      52.09 
      230.0 
      12.65858 
      14.567241 
      0.00 
      1012.6 
      10.00 
      2012-12-31 21:00:00 
     
    
      3 
      EWR 
      2013 
      1 
      1 
      3 
      37.94 
      23.00 
      54.51 
      230.0 
      13.80936 
      15.891535 
      0.00 
      1012.7 
      10.00 
      2012-12-31 22:00:00 
     
    
      4 
      EWR 
      2013 
      1 
      1 
      4 
      37.94 
      24.08 
      57.04 
      240.0 
      14.96014 
      17.215830 
      0.00 
      1012.8 
      10.00 
      2012-12-31 23:00:00 
     
    
      5 
      EWR 
      2013 
      1 
      1 
      6 
      39.02 
      26.06 
      59.37 
      270.0 
      10.35702 
      11.918651 
      0.00 
      1012.0 
      10.00 
      2013-01-01 01:00:00 
     
    
      6 
      EWR 
      2013 
      1 
      1 
      7 
      39.02 
      26.96 
      61.63 
      250.0 
      8.05546 
      9.270062 
      0.00 
      1012.3 
      10.00 
      2013-01-01 02:00:00 
     
    
      7 
      EWR 
      2013 
      1 
      1 
      8 
      39.02 
      28.04 
      64.43 
      240.0 
      11.50780 
      13.242946 
      0.00 
      1012.5 
      10.00 
      2013-01-01 03:00:00 
     
    
      8 
      EWR 
      2013 
      1 
      1 
      9 
      39.92 
      28.04 
      62.21 
      250.0 
      12.65858 
      14.567241 
      0.00 
      1012.2 
      10.00 
      2013-01-01 04:00:00 
     
    
      9 
      EWR 
      2013 
      1 
      1 
      10 
      39.02 
      28.04 
      64.43 
      260.0 
      12.65858 
      14.567241 
      0.00 
      1011.9 
      10.00 
      2013-01-01 05:00:00 
     
    
      10 
      EWR 
      2013 
      1 
      1 
      11 
      37.94 
      28.04 
      67.21 
      240.0 
      11.50780 
      13.242946 
      0.00 
      1012.4 
      10.00 
      2013-01-01 06:00:00 
     
    
      11 
      EWR 
      2013 
      1 
      1 
      12 
      39.02 
      28.04 
      64.43 
      240.0 
      14.96014 
      17.215830 
      0.00 
      1012.2 
      10.00 
      2013-01-01 07:00:00 
     
    
      12 
      EWR 
      2013 
      1 
      1 
      13 
      39.92 
      28.04 
      62.21 
      250.0 
      10.35702 
      11.918651 
      0.00 
      1012.2 
      10.00 
      2013-01-01 08:00:00 
     
    
      13 
      EWR 
      2013 
      1 
      1 
      14 
      39.92 
      28.04 
      62.21 
      260.0 
      14.96014 
      17.215830 
      0.00 
      1012.7 
      10.00 
      2013-01-01 09:00:00 
     
    
      14 
      EWR 
      2013 
      1 
      1 
      15 
      41.00 
      28.04 
      59.65 
      260.0 
      13.80936 
      15.891535 
      0.00 
      1012.4 
      10.00 
      2013-01-01 10:00:00 
     
    
      15 
      EWR 
      2013 
      1 
      1 
      16 
      41.00 
      26.96 
      57.06 
      260.0 
      14.96014 
      17.215830 
      0.00 
      1011.4 
      10.00 
      2013-01-01 11:00:00 
     
    
      16 
      EWR 
      2013 
      1 
      1 
      17 
      39.20 
      28.40 
      64.93 
      270.0 
      16.11092 
      18.540125 
      0.00 
      NaN 
      10.00 
      2013-01-01 12:00:00 
     
    
      17 
      EWR 
      2013 
      1 
      1 
      18 
      39.20 
      28.40 
      64.93 
      330.0 
      14.96014 
      17.215830 
      0.00 
      NaN 
      10.00 
      2013-01-01 13:00:00 
     
    
      18 
      EWR 
      2013 
      1 
      1 
      19 
      39.02 
      24.08 
      54.68 
      280.0 
      13.80936 
      15.891535 
      0.00 
      1010.8 
      10.00 
      2013-01-01 14:00:00 
     
    
      19 
      EWR 
      2013 
      1 
      1 
      20 
      37.94 
      24.08 
      57.04 
      290.0 
      9.20624 
      10.594357 
      0.00 
      1011.9 
      10.00 
      2013-01-01 15:00:00 
     
    
      20 
      EWR 
      2013 
      1 
      1 
      21 
      37.04 
      19.94 
      49.62 
      300.0 
      13.80936 
      15.891535 
      0.00 
      1012.1 
      10.00 
      2013-01-01 16:00:00 
     
    
      21 
      EWR 
      2013 
      1 
      1 
      22 
      35.96 
      19.04 
      49.83 
      330.0 
      11.50780 
      13.242946 
      0.00 
      1013.2 
      10.00 
      2013-01-01 17:00:00 
     
    
      22 
      EWR 
      2013 
      1 
      1 
      23 
      33.98 
      15.08 
      45.43 
      310.0 
      12.65858 
      14.567241 
      0.00 
      1014.1 
      10.00 
      2013-01-01 18:00:00 
     
    
      23 
      EWR 
      2013 
      1 
      2 
      0 
      33.08 
      12.92 
      42.84 
      320.0 
      10.35702 
      11.918651 
      0.00 
      1014.4 
      10.00 
      2013-01-01 19:00:00 
     
    
      24 
      EWR 
      2013 
      1 
      2 
      1 
      32.00 
      15.08 
      49.19 
      310.0 
      14.96014 
      17.215830 
      0.00 
      1015.2 
      10.00 
      2013-01-01 20:00:00 
     
    
      25 
      EWR 
      2013 
      1 
      2 
      2 
      30.02 
      12.92 
      48.48 
      320.0 
      18.41248 
      21.188714 
      0.00 
      1016.0 
      10.00 
      2013-01-01 21:00:00 
     
    
      26 
      EWR 
      2013 
      1 
      2 
      3 
      28.94 
      12.02 
      48.69 
      320.0 
      18.41248 
      21.188714 
      0.00 
      1016.5 
      10.00 
      2013-01-01 22:00:00 
     
    
      27 
      EWR 
      2013 
      1 
      2 
      4 
      28.04 
      10.94 
      48.15 
      310.0 
      16.11092 
      18.540125 
      0.00 
      1016.4 
      10.00 
      2013-01-01 23:00:00 
     
    
      28 
      EWR 
      2013 
      1 
      2 
      5 
      26.96 
      10.94 
      50.34 
      310.0 
      14.96014 
      17.215830 
      0.00 
      1016.3 
      10.00 
      2013-01-02 00:00:00 
     
    
      29 
      EWR 
      2013 
      1 
      2 
      6 
      26.06 
      10.94 
      52.25 
      330.0 
      12.65858 
      14.567241 
      0.00 
      1016.3 
      10.00 
      2013-01-02 01:00:00 
     
    
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
     
    
      26100 
      LGA 
      2013 
      12 
      29 
      18 
      42.80 
      37.40 
      81.07 
      70.0 
      9.20624 
      10.594357 
      0.06 
      NaN 
      2.50 
      2013-12-29 13:00:00 
     
    
      26101 
      LGA 
      2013 
      12 
      29 
      19 
      39.20 
      37.40 
      93.19 
      40.0 
      16.11092 
      18.540125 
      0.06 
      NaN 
      1.75 
      2013-12-29 14:00:00 
     
    
      26102 
      LGA 
      2013 
      12 
      29 
      20 
      41.00 
      39.02 
      92.59 
      40.0 
      13.80936 
      15.891535 
      0.37 
      999.9 
      1.50 
      2013-12-29 15:00:00 
     
    
      26103 
      LGA 
      2013 
      12 
      29 
      21 
      41.00 
      37.94 
      88.76 
      350.0 
      8.05546 
      9.270062 
      0.28 
      998.7 
      1.50 
      2013-12-29 16:00:00 
     
    
      26104 
      LGA 
      2013 
      12 
      29 
      22 
      41.00 
      39.20 
      93.24 
      50.0 
      20.71404 
      23.837303 
      0.03 
      NaN 
      2.00 
      2013-12-29 17:00:00 
     
    
      26105 
      LGA 
      2013 
      12 
      29 
      23 
      42.08 
      39.02 
      88.81 
      330.0 
      14.96014 
      17.215830 
      0.00 
      997.2 
      3.00 
      2013-12-29 18:00:00 
     
    
      26106 
      LGA 
      2013 
      12 
      30 
      0 
      42.80 
      37.40 
      81.07 
      320.0 
      17.26170 
      19.864419 
      0.00 
      NaN 
      8.00 
      2013-12-29 19:00:00 
     
    
      26107 
      LGA 
      2013 
      12 
      30 
      1 
      41.00 
      37.40 
      86.89 
      310.0 
      19.56326 
      22.513008 
      0.00 
      NaN 
      6.00 
      2013-12-29 20:00:00 
     
    
      26108 
      LGA 
      2013 
      12 
      30 
      2 
      42.80 
      37.40 
      81.07 
      320.0 
      16.11092 
      18.540125 
      0.00 
      NaN 
      10.00 
      2013-12-29 21:00:00 
     
    
      26109 
      LGA 
      2013 
      12 
      30 
      3 
      42.98 
      37.04 
      79.38 
      300.0 
      9.20624 
      10.594357 
      0.00 
      1003.8 
      10.00 
      2013-12-29 22:00:00 
     
    
      26110 
      LGA 
      2013 
      12 
      30 
      4 
      42.98 
      35.06 
      73.39 
      310.0 
      17.26170 
      19.864419 
      0.00 
      1005.1 
      10.00 
      2013-12-29 23:00:00 
     
    
      26111 
      LGA 
      2013 
      12 
      30 
      5 
      42.08 
      33.98 
      72.78 
      320.0 
      11.50780 
      13.242946 
      0.00 
      1005.9 
      10.00 
      2013-12-30 00:00:00 
     
    
      26112 
      LGA 
      2013 
      12 
      30 
      6 
      42.08 
      33.98 
      72.78 
      250.0 
      9.20624 
      10.594357 
      0.00 
      1007.6 
      10.00 
      2013-12-30 01:00:00 
     
    
      26113 
      LGA 
      2013 
      12 
      30 
      7 
      41.00 
      33.98 
      75.88 
      240.0 
      8.05546 
      9.270062 
      0.00 
      1008.3 
      10.00 
      2013-12-30 02:00:00 
     
    
      26114 
      LGA 
      2013 
      12 
      30 
      8 
      42.98 
      33.98 
      70.30 
      270.0 
      9.20624 
      10.594357 
      0.00 
      1008.2 
      10.00 
      2013-12-30 03:00:00 
     
    
      26115 
      LGA 
      2013 
      12 
      30 
      9 
      41.00 
      33.08 
      73.19 
      0.0 
      0.00000 
      0.000000 
      0.00 
      1008.9 
      10.00 
      2013-12-30 04:00:00 
     
    
      26116 
      LGA 
      2013 
      12 
      30 
      10 
      42.98 
      33.08 
      67.81 
      250.0 
      10.35702 
      11.918651 
      0.00 
      1009.2 
      10.00 
      2013-12-30 05:00:00 
     
    
      26117 
      LGA 
      2013 
      12 
      30 
      11 
      42.98 
      33.98 
      70.30 
      230.0 
      6.90468 
      7.945768 
      0.00 
      1010.8 
      10.00 
      2013-12-30 06:00:00 
     
    
      26118 
      LGA 
      2013 
      12 
      30 
      12 
      44.06 
      35.06 
      70.42 
      240.0 
      11.50780 
      13.242946 
      0.00 
      1011.9 
      10.00 
      2013-12-30 07:00:00 
     
    
      26119 
      LGA 
      2013 
      12 
      30 
      13 
      44.06 
      33.98 
      67.45 
      260.0 
      11.50780 
      13.242946 
      0.00 
      1012.9 
      10.00 
      2013-12-30 08:00:00 
     
    
      26120 
      LGA 
      2013 
      12 
      30 
      14 
      44.06 
      33.08 
      65.07 
      260.0 
      13.80936 
      15.891535 
      0.00 
      1013.7 
      10.00 
      2013-12-30 09:00:00 
     
    
      26121 
      LGA 
      2013 
      12 
      30 
      15 
      42.80 
      33.80 
      70.28 
      330.0 
      16.11092 
      18.540125 
      0.00 
      NaN 
      10.00 
      2013-12-30 10:00:00 
     
    
      26122 
      LGA 
      2013 
      12 
      30 
      16 
      41.00 
      28.40 
      60.54 
      340.0 
      13.80936 
      15.891535 
      0.00 
      NaN 
      10.00 
      2013-12-30 11:00:00 
     
    
      26123 
      LGA 
      2013 
      12 
      30 
      17 
      37.94 
      23.00 
      54.51 
      330.0 
      21.86482 
      25.161598 
      0.00 
      1015.7 
      10.00 
      2013-12-30 12:00:00 
     
    
      26124 
      LGA 
      2013 
      12 
      30 
      18 
      37.04 
      21.92 
      53.97 
      340.0 
      17.26170 
      19.864419 
      0.00 
      1016.5 
      10.00 
      2013-12-30 13:00:00 
     
    
      26125 
      LGA 
      2013 
      12 
      30 
      19 
      35.96 
      19.94 
      51.78 
      340.0 
      13.80936 
      15.891535 
      0.00 
      1017.1 
      10.00 
      2013-12-30 14:00:00 
     
    
      26126 
      LGA 
      2013 
      12 
      30 
      20 
      33.98 
      17.06 
      49.51 
      330.0 
      17.26170 
      19.864419 
      0.00 
      1018.8 
      10.00 
      2013-12-30 15:00:00 
     
    
      26127 
      LGA 
      2013 
      12 
      30 
      21 
      32.00 
      15.08 
      49.19 
      340.0 
      14.96014 
      17.215830 
      0.00 
      1019.5 
      10.00 
      2013-12-30 16:00:00 
     
    
      26128 
      LGA 
      2013 
      12 
      30 
      22 
      30.92 
      12.92 
      46.74 
      320.0 
      17.26170 
      19.864419 
      0.00 
      1019.9 
      10.00 
      2013-12-30 17:00:00 
     
    
      26129 
      LGA 
      2013 
      12 
      30 
      23 
      28.94 
      10.94 
      46.41 
      330.0 
      18.41248 
      21.188714 
      0.00 
      1020.9 
      10.00 
      2013-12-30 18:00:00 
     
  
26130 rows × 15 columns
In [26]:
    
df_withweather = pd.merge(df, weather, how='left', on=['year', 'month', 'day', 'hour'])
df_withweather
    
    Out[26]:
  
    
       
      year 
      month 
      day 
      dep_time 
      sched_dep_time 
      dep_delay 
      arr_time 
      sched_arr_time 
      arr_delay 
      carrier 
      ... 
      temp 
      dewp 
      humid 
      wind_dir 
      wind_speed 
      wind_gust 
      precip 
      pressure 
      visib 
      time_hour_y 
     
  
  
    
      0 
      2013 
      1 
      1 
      517.0 
      515 
      2.0 
      830.0 
      819 
      11.0 
      UA 
      ... 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
     
    
      1 
      2013 
      1 
      1 
      533.0 
      529 
      4.0 
      850.0 
      830 
      20.0 
      UA 
      ... 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
     
    
      2 
      2013 
      1 
      1 
      542.0 
      540 
      2.0 
      923.0 
      850 
      33.0 
      AA 
      ... 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
     
    
      3 
      2013 
      1 
      1 
      544.0 
      545 
      -1.0 
      1004.0 
      1022 
      -18.0 
      B6 
      ... 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
     
    
      4 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      39.02 
      26.06 
      59.37 
      270.0 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
      2013-01-01 01:00:00 
     
    
      5 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      39.02 
      26.06 
      59.37 
      260.0 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
      2013-01-01 01:00:00 
     
    
      6 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      39.92 
      26.06 
      57.33 
      260.0 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
      2013-01-01 01:00:00 
     
    
      7 
      2013 
      1 
      1 
      554.0 
      558 
      -4.0 
      740.0 
      728 
      12.0 
      UA 
      ... 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
     
    
      8 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      39.02 
      26.06 
      59.37 
      270.0 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
      2013-01-01 01:00:00 
     
    
      9 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      39.02 
      26.06 
      59.37 
      260.0 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
      2013-01-01 01:00:00 
     
    
      10 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      39.92 
      26.06 
      57.33 
      260.0 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
      2013-01-01 01:00:00 
     
    
      11 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      39.02 
      26.06 
      59.37 
      270.0 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
      2013-01-01 01:00:00 
     
    
      12 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      39.02 
      26.06 
      59.37 
      260.0 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
      2013-01-01 01:00:00 
     
    
      13 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      39.92 
      26.06 
      57.33 
      260.0 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
      2013-01-01 01:00:00 
     
    
      14 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      39.02 
      26.06 
      59.37 
      270.0 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
      2013-01-01 01:00:00 
     
    
      15 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      39.02 
      26.06 
      59.37 
      260.0 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
      2013-01-01 01:00:00 
     
    
      16 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      39.92 
      26.06 
      57.33 
      260.0 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
      2013-01-01 01:00:00 
     
    
      17 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      39.02 
      26.06 
      59.37 
      270.0 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
      2013-01-01 01:00:00 
     
    
      18 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      39.02 
      26.06 
      59.37 
      260.0 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
      2013-01-01 01:00:00 
     
    
      19 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      39.92 
      26.06 
      57.33 
      260.0 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
      2013-01-01 01:00:00 
     
    
      20 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      39.02 
      26.06 
      59.37 
      270.0 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
      2013-01-01 01:00:00 
     
    
      21 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      39.02 
      26.06 
      59.37 
      260.0 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
      2013-01-01 01:00:00 
     
    
      22 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      39.92 
      26.06 
      57.33 
      260.0 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
      2013-01-01 01:00:00 
     
    
      23 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      39.02 
      26.06 
      59.37 
      270.0 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
      2013-01-01 01:00:00 
     
    
      24 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      39.02 
      26.06 
      59.37 
      260.0 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
      2013-01-01 01:00:00 
     
    
      25 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      39.92 
      26.06 
      57.33 
      260.0 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
      2013-01-01 01:00:00 
     
    
      26 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      39.02 
      26.06 
      59.37 
      270.0 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
      2013-01-01 01:00:00 
     
    
      27 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      39.02 
      26.06 
      59.37 
      260.0 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
      2013-01-01 01:00:00 
     
    
      28 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      39.92 
      26.06 
      57.33 
      260.0 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
      2013-01-01 01:00:00 
     
    
      29 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      923.0 
      937 
      -14.0 
      UA 
      ... 
      39.02 
      26.06 
      59.37 
      270.0 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
      2013-01-01 01:00:00 
     
    
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
     
    
      1007561 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      68.00 
      53.06 
      58.80 
      150.0 
      5.75390 
      6.621473 
      0.0 
      1015.4 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007562 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      64.04 
      55.94 
      74.94 
      180.0 
      6.90468 
      7.945768 
      0.0 
      1016.0 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007563 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      66.92 
      51.98 
      58.65 
      160.0 
      9.20624 
      10.594357 
      0.0 
      1015.4 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007564 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      68.00 
      53.06 
      58.80 
      150.0 
      5.75390 
      6.621473 
      0.0 
      1015.4 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007565 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      64.04 
      55.94 
      74.94 
      180.0 
      6.90468 
      7.945768 
      0.0 
      1016.0 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007566 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      66.92 
      51.98 
      58.65 
      160.0 
      9.20624 
      10.594357 
      0.0 
      1015.4 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007567 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      68.00 
      53.06 
      58.80 
      150.0 
      5.75390 
      6.621473 
      0.0 
      1015.4 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007568 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      64.04 
      55.94 
      74.94 
      180.0 
      6.90468 
      7.945768 
      0.0 
      1016.0 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007569 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      66.92 
      51.98 
      58.65 
      160.0 
      9.20624 
      10.594357 
      0.0 
      1015.4 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007570 
      2013 
      9 
      30 
      2349.0 
      2359 
      -10.0 
      325.0 
      350 
      -25.0 
      B6 
      ... 
      66.02 
      53.96 
      65.07 
      160.0 
      4.60312 
      5.297178 
      0.0 
      1015.6 
      10.0 
      2013-09-30 18:00:00 
     
    
      1007571 
      2013 
      9 
      30 
      2349.0 
      2359 
      -10.0 
      325.0 
      350 
      -25.0 
      B6 
      ... 
      62.06 
      55.94 
      80.34 
      190.0 
      5.75390 
      6.621473 
      0.0 
      1016.0 
      10.0 
      2013-09-30 18:00:00 
     
    
      1007572 
      2013 
      9 
      30 
      2349.0 
      2359 
      -10.0 
      325.0 
      350 
      -25.0 
      B6 
      ... 
      64.94 
      53.96 
      67.57 
      200.0 
      9.20624 
      10.594357 
      0.0 
      1015.5 
      10.0 
      2013-09-30 18:00:00 
     
    
      1007573 
      2013 
      9 
      30 
      NaN 
      1842 
      NaN 
      NaN 
      2019 
      NaN 
      EV 
      ... 
      71.96 
      46.94 
      40.90 
      90.0 
      5.75390 
      6.621473 
      0.0 
      1015.7 
      10.0 
      2013-09-30 13:00:00 
     
    
      1007574 
      2013 
      9 
      30 
      NaN 
      1842 
      NaN 
      NaN 
      2019 
      NaN 
      EV 
      ... 
      68.00 
      55.04 
      63.21 
      190.0 
      11.50780 
      13.242946 
      0.0 
      1016.6 
      10.0 
      2013-09-30 13:00:00 
     
    
      1007575 
      2013 
      9 
      30 
      NaN 
      1842 
      NaN 
      NaN 
      2019 
      NaN 
      EV 
      ... 
      71.96 
      46.94 
      40.90 
      NaN 
      3.45234 
      3.972884 
      0.0 
      1015.8 
      10.0 
      2013-09-30 13:00:00 
     
    
      1007576 
      2013 
      9 
      30 
      NaN 
      1455 
      NaN 
      NaN 
      1634 
      NaN 
      9E 
      ... 
      66.92 
      50.00 
      54.51 
      50.0 
      3.45234 
      3.972884 
      0.0 
      1018.3 
      10.0 
      2013-09-30 09:00:00 
     
    
      1007577 
      2013 
      9 
      30 
      NaN 
      1455 
      NaN 
      NaN 
      1634 
      NaN 
      9E 
      ... 
      68.00 
      48.92 
      50.44 
      20.0 
      5.75390 
      6.621473 
      0.0 
      1018.4 
      10.0 
      2013-09-30 09:00:00 
     
    
      1007578 
      2013 
      9 
      30 
      NaN 
      1455 
      NaN 
      NaN 
      1634 
      NaN 
      9E 
      ... 
      64.94 
      50.00 
      58.39 
      NaN 
      3.45234 
      3.972884 
      0.0 
      1018.2 
      10.0 
      2013-09-30 09:00:00 
     
    
      1007579 
      2013 
      9 
      30 
      NaN 
      2200 
      NaN 
      NaN 
      2312 
      NaN 
      9E 
      ... 
      68.00 
      53.06 
      58.80 
      150.0 
      5.75390 
      6.621473 
      0.0 
      1015.4 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007580 
      2013 
      9 
      30 
      NaN 
      2200 
      NaN 
      NaN 
      2312 
      NaN 
      9E 
      ... 
      64.04 
      55.94 
      74.94 
      180.0 
      6.90468 
      7.945768 
      0.0 
      1016.0 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007581 
      2013 
      9 
      30 
      NaN 
      2200 
      NaN 
      NaN 
      2312 
      NaN 
      9E 
      ... 
      66.92 
      51.98 
      58.65 
      160.0 
      9.20624 
      10.594357 
      0.0 
      1015.4 
      10.0 
      2013-09-30 17:00:00 
     
    
      1007582 
      2013 
      9 
      30 
      NaN 
      1210 
      NaN 
      NaN 
      1330 
      NaN 
      MQ 
      ... 
      59.00 
      53.96 
      83.34 
      10.0 
      6.90468 
      7.945768 
      0.0 
      1018.8 
      10.0 
      2013-09-30 07:00:00 
     
    
      1007583 
      2013 
      9 
      30 
      NaN 
      1210 
      NaN 
      NaN 
      1330 
      NaN 
      MQ 
      ... 
      62.06 
      53.96 
      74.75 
      40.0 
      5.75390 
      6.621473 
      0.0 
      1018.8 
      10.0 
      2013-09-30 07:00:00 
     
    
      1007584 
      2013 
      9 
      30 
      NaN 
      1210 
      NaN 
      NaN 
      1330 
      NaN 
      MQ 
      ... 
      60.98 
      51.08 
      69.86 
      NaN 
      5.75390 
      6.621473 
      0.0 
      1018.6 
      10.0 
      2013-09-30 07:00:00 
     
    
      1007585 
      2013 
      9 
      30 
      NaN 
      1159 
      NaN 
      NaN 
      1344 
      NaN 
      MQ 
      ... 
      55.94 
      53.96 
      93.05 
      20.0 
      5.75390 
      6.621473 
      0.0 
      1018.7 
      10.0 
      2013-09-30 06:00:00 
     
    
      1007586 
      2013 
      9 
      30 
      NaN 
      1159 
      NaN 
      NaN 
      1344 
      NaN 
      MQ 
      ... 
      57.92 
      53.96 
      86.63 
      360.0 
      6.90468 
      7.945768 
      0.0 
      1018.9 
      10.0 
      2013-09-30 06:00:00 
     
    
      1007587 
      2013 
      9 
      30 
      NaN 
      1159 
      NaN 
      NaN 
      1344 
      NaN 
      MQ 
      ... 
      60.08 
      51.98 
      74.56 
      360.0 
      4.60312 
      5.297178 
      0.0 
      1018.5 
      10.0 
      2013-09-30 06:00:00 
     
    
      1007588 
      2013 
      9 
      30 
      NaN 
      840 
      NaN 
      NaN 
      1020 
      NaN 
      MQ 
      ... 
      53.96 
      51.08 
      89.95 
      350.0 
      4.60312 
      5.297178 
      0.0 
      1018.1 
      10.0 
      2013-09-30 03:00:00 
     
    
      1007589 
      2013 
      9 
      30 
      NaN 
      840 
      NaN 
      NaN 
      1020 
      NaN 
      MQ 
      ... 
      53.06 
      50.00 
      89.31 
      30.0 
      3.45234 
      3.972884 
      0.0 
      1018.1 
      10.0 
      2013-09-30 03:00:00 
     
    
      1007590 
      2013 
      9 
      30 
      NaN 
      840 
      NaN 
      NaN 
      1020 
      NaN 
      MQ 
      ... 
      59.00 
      53.06 
      80.64 
      10.0 
      3.45234 
      3.972884 
      0.0 
      1017.8 
      10.0 
      2013-09-30 03:00:00 
     
  
1007591 rows × 30 columns
In [27]:
    
airports = pd.read_csv('data/nycflights13/airports.csv.gz')
airports
    
    Out[27]:
  
    
       
      faa 
      name 
      lat 
      lon 
      alt 
      tz 
      dst 
      tzone 
     
  
  
    
      0 
      04G 
      Lansdowne Airport 
      41.130472 
      -80.619583 
      1044 
      -5 
      A 
      America/New_York 
     
    
      1 
      06A 
      Moton Field Municipal Airport 
      32.460572 
      -85.680028 
      264 
      -6 
      A 
      America/Chicago 
     
    
      2 
      06C 
      Schaumburg Regional 
      41.989341 
      -88.101243 
      801 
      -6 
      A 
      America/Chicago 
     
    
      3 
      06N 
      Randall Airport 
      41.431912 
      -74.391561 
      523 
      -5 
      A 
      America/New_York 
     
    
      4 
      09J 
      Jekyll Island Airport 
      31.074472 
      -81.427778 
      11 
      -5 
      A 
      America/New_York 
     
    
      5 
      0A9 
      Elizabethton Municipal Airport 
      36.371222 
      -82.173417 
      1593 
      -5 
      A 
      America/New_York 
     
    
      6 
      0G6 
      Williams County Airport 
      41.467306 
      -84.506778 
      730 
      -5 
      A 
      America/New_York 
     
    
      7 
      0G7 
      Finger Lakes Regional Airport 
      42.883565 
      -76.781232 
      492 
      -5 
      A 
      America/New_York 
     
    
      8 
      0P2 
      Shoestring Aviation Airfield 
      39.794824 
      -76.647191 
      1000 
      -5 
      U 
      America/New_York 
     
    
      9 
      0S9 
      Jefferson County Intl 
      48.053809 
      -122.810644 
      108 
      -8 
      A 
      America/Los_Angeles 
     
    
      10 
      0W3 
      Harford County Airport 
      39.566838 
      -76.202403 
      409 
      -5 
      A 
      America/New_York 
     
    
      11 
      10C 
      Galt Field Airport 
      42.402889 
      -88.375111 
      875 
      -6 
      U 
      America/Chicago 
     
    
      12 
      17G 
      Port Bucyrus-Crawford County Airport 
      40.781556 
      -82.974806 
      1003 
      -5 
      A 
      America/New_York 
     
    
      13 
      19A 
      Jackson County Airport 
      34.175864 
      -83.561597 
      951 
      -5 
      U 
      America/New_York 
     
    
      14 
      1A3 
      Martin Campbell Field Airport 
      35.015806 
      -84.346833 
      1789 
      -5 
      A 
      America/New_York 
     
    
      15 
      1B9 
      Mansfield Municipal 
      42.000133 
      -71.196771 
      122 
      -5 
      A 
      America/New_York 
     
    
      16 
      1C9 
      Frazier Lake Airpark 
      54.013333 
      -124.768333 
      152 
      -8 
      A 
      America/Vancouver 
     
    
      17 
      1CS 
      Clow International Airport 
      41.695974 
      -88.129231 
      670 
      -6 
      U 
      America/Chicago 
     
    
      18 
      1G3 
      Kent State Airport 
      41.151389 
      -81.415111 
      1134 
      -5 
      A 
      America/New_York 
     
    
      19 
      1G4 
      Grand Canyon West Airport 
      35.899904 
      -113.815674 
      4813 
      -7 
      A 
      America/Phoenix 
     
    
      20 
      1H2 
      Effingham Memorial Airport 
      39.070000 
      -88.534000 
      585 
      -6 
      A 
      America/Chicago 
     
    
      21 
      1OH 
      Fortman Airport 
      40.555325 
      -84.386619 
      885 
      -5 
      U 
      America/New_York 
     
    
      22 
      1RL 
      Point Roberts Airpark 
      48.979722 
      -123.078889 
      10 
      -8 
      A 
      America/Los_Angeles 
     
    
      23 
      23M 
      Clarke CO 
      32.051700 
      -88.443400 
      320 
      -6 
      A 
      America/Chicago 
     
    
      24 
      24C 
      Lowell City Airport 
      42.953920 
      -85.343906 
      681 
      -5 
      A 
      America/New_York 
     
    
      25 
      24J 
      Suwannee County Airport 
      30.300125 
      -83.024694 
      104 
      -5 
      A 
      America/New_York 
     
    
      26 
      25D 
      Forest Lake Airport 
      45.247746 
      -92.994385 
      925 
      -6 
      A 
      America/Chicago 
     
    
      27 
      29D 
      Grove City Airport 
      41.146028 
      -80.167750 
      1371 
      -5 
      A 
      America/New_York 
     
    
      28 
      2A0 
      Mark Anton Airport 
      35.486250 
      -84.931083 
      718 
      -5 
      A 
      America/New_York 
     
    
      29 
      2B2 
      Plum Island Airport 
      42.795361 
      -70.839444 
      11 
      -5 
      A 
      America/New_York 
     
    
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
     
    
      1428 
      X59 
      Valkaria Municipal 
      27.960861 
      -80.558333 
      26 
      -5 
      A 
      America/New_York 
     
    
      1429 
      XFL 
      Flagler County Airport 
      29.282100 
      -81.121200 
      33 
      -5 
      A 
      America/New_York 
     
    
      1430 
      XNA 
      NW Arkansas Regional 
      36.281869 
      -94.306811 
      1287 
      -6 
      A 
      America/Chicago 
     
    
      1431 
      XZK 
      Amherst Amtrak Station AMM 
      42.375000 
      -72.511389 
      258 
      -5 
      A 
      America/New_York 
     
    
      1432 
      Y51 
      Municipal Airport 
      43.579360 
      -90.896474 
      1292 
      -6 
      A 
      America/Chicago 
     
    
      1433 
      Y72 
      Bloyer Field 
      43.976222 
      -90.480611 
      966 
      -6 
      A 
      America/Chicago 
     
    
      1434 
      YAK 
      Yakutat 
      59.301200 
      -139.393700 
      33 
      -9 
      A 
      \N 
     
    
      1435 
      YIP 
      Willow Run 
      42.237928 
      -83.530408 
      716 
      -5 
      A 
      America/New_York 
     
    
      1436 
      YKM 
      Yakima Air Terminal McAllister Field 
      46.568200 
      -120.544000 
      1095 
      -8 
      A 
      America/Los_Angeles 
     
    
      1437 
      YKN 
      Chan Gurney 
      42.871100 
      -97.396900 
      1200 
      -6 
      A 
      America/Chicago 
     
    
      1438 
      YNG 
      Youngstown Warren Rgnl 
      41.260736 
      -80.679097 
      1196 
      -5 
      A 
      America/New_York 
     
    
      1439 
      YUM 
      Yuma Mcas Yuma Intl 
      32.656578 
      -114.605980 
      216 
      -7 
      N 
      America/Phoenix 
     
    
      1440 
      Z84 
      Clear 
      64.301203 
      -149.120144 
      552 
      -9 
      A 
      America/Anchorage 
     
    
      1441 
      ZBP 
      Penn Station 
      39.307222 
      -76.615556 
      66 
      -5 
      A 
      America/New_York 
     
    
      1442 
      ZFV 
      Philadelphia 30th St Station 
      39.955700 
      -75.182000 
      0 
      -5 
      A 
      America/New_York 
     
    
      1443 
      ZPH 
      Municipal Airport 
      28.228056 
      -82.155833 
      90 
      -5 
      A 
      America/New_York 
     
    
      1444 
      ZRA 
      Atlantic City Rail Terminal 
      39.366500 
      -74.442000 
      8 
      -5 
      A 
      America/New_York 
     
    
      1445 
      ZRD 
      Train Station 
      37.534300 
      -77.429450 
      26 
      -5 
      A 
      America/New_York 
     
    
      1446 
      ZRP 
      Newark Penn Station 
      40.734722 
      -74.164167 
      0 
      -5 
      A 
      America/New_York 
     
    
      1447 
      ZRT 
      Hartford Union Station 
      41.768880 
      -72.681500 
      0 
      -5 
      A 
      America/New_York 
     
    
      1448 
      ZRZ 
      New Carrollton Rail Station 
      38.948000 
      -76.871900 
      39 
      -5 
      A 
      America/New_York 
     
    
      1449 
      ZSF 
      Springfield Amtrak Station 
      42.106000 
      -72.593054 
      65 
      -5 
      A 
      America/New_York 
     
    
      1450 
      ZSY 
      Scottsdale Airport 
      33.622889 
      -111.910528 
      1519 
      -7 
      A 
      America/Phoenix 
     
    
      1451 
      ZTF 
      Stamford Amtrak Station 
      41.046937 
      -73.541493 
      0 
      -5 
      A 
      America/New_York 
     
    
      1452 
      ZTY 
      Boston Back Bay Station 
      42.347800 
      -71.075000 
      20 
      -5 
      A 
      America/New_York 
     
    
      1453 
      ZUN 
      Black Rock 
      35.083228 
      -108.791778 
      6454 
      -7 
      A 
      America/Denver 
     
    
      1454 
      ZVE 
      New Haven Rail Station 
      41.298669 
      -72.925992 
      7 
      -5 
      A 
      America/New_York 
     
    
      1455 
      ZWI 
      Wilmington Amtrak Station 
      39.736667 
      -75.551667 
      0 
      -5 
      A 
      America/New_York 
     
    
      1456 
      ZWU 
      Washington Union Station 
      38.897460 
      -77.006430 
      76 
      -5 
      A 
      America/New_York 
     
    
      1457 
      ZYP 
      Penn Station 
      40.750500 
      -73.993500 
      35 
      -5 
      A 
      America/New_York 
     
  
1458 rows × 8 columns
In [28]:
    
df_withairport = pd.merge(df_withweather, airports, how='left', left_on='dest', right_on='faa')
df_withairport
    
    Out[28]:
  
    
       
      year 
      month 
      day 
      dep_time 
      sched_dep_time 
      dep_delay 
      arr_time 
      sched_arr_time 
      arr_delay 
      carrier 
      ... 
      visib 
      time_hour_y 
      faa 
      name 
      lat 
      lon 
      alt 
      tz 
      dst 
      tzone 
     
  
  
    
      0 
      2013 
      1 
      1 
      517.0 
      515 
      2.0 
      830.0 
      819 
      11.0 
      UA 
      ... 
      NaN 
      NaN 
      IAH 
      George Bush Intercontinental 
      29.984433 
      -95.341442 
      97.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1 
      2013 
      1 
      1 
      533.0 
      529 
      4.0 
      850.0 
      830 
      20.0 
      UA 
      ... 
      NaN 
      NaN 
      IAH 
      George Bush Intercontinental 
      29.984433 
      -95.341442 
      97.0 
      -6.0 
      A 
      America/Chicago 
     
    
      2 
      2013 
      1 
      1 
      542.0 
      540 
      2.0 
      923.0 
      850 
      33.0 
      AA 
      ... 
      NaN 
      NaN 
      MIA 
      Miami Intl 
      25.793250 
      -80.290556 
      8.0 
      -5.0 
      A 
      America/New_York 
     
    
      3 
      2013 
      1 
      1 
      544.0 
      545 
      -1.0 
      1004.0 
      1022 
      -18.0 
      B6 
      ... 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
     
    
      4 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ATL 
      Hartsfield Jackson Atlanta Intl 
      33.636719 
      -84.428067 
      1026.0 
      -5.0 
      A 
      America/New_York 
     
    
      5 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ATL 
      Hartsfield Jackson Atlanta Intl 
      33.636719 
      -84.428067 
      1026.0 
      -5.0 
      A 
      America/New_York 
     
    
      6 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ATL 
      Hartsfield Jackson Atlanta Intl 
      33.636719 
      -84.428067 
      1026.0 
      -5.0 
      A 
      America/New_York 
     
    
      7 
      2013 
      1 
      1 
      554.0 
      558 
      -4.0 
      740.0 
      728 
      12.0 
      UA 
      ... 
      NaN 
      NaN 
      ORD 
      Chicago Ohare Intl 
      41.978603 
      -87.904842 
      668.0 
      -6.0 
      A 
      America/Chicago 
     
    
      8 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      FLL 
      Fort Lauderdale Hollywood Intl 
      26.072583 
      -80.152750 
      9.0 
      -5.0 
      A 
      America/New_York 
     
    
      9 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      FLL 
      Fort Lauderdale Hollywood Intl 
      26.072583 
      -80.152750 
      9.0 
      -5.0 
      A 
      America/New_York 
     
    
      10 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      FLL 
      Fort Lauderdale Hollywood Intl 
      26.072583 
      -80.152750 
      9.0 
      -5.0 
      A 
      America/New_York 
     
    
      11 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      10.0 
      2013-01-01 01:00:00 
      IAD 
      Washington Dulles Intl 
      38.944533 
      -77.455811 
      313.0 
      -5.0 
      A 
      America/New_York 
     
    
      12 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      10.0 
      2013-01-01 01:00:00 
      IAD 
      Washington Dulles Intl 
      38.944533 
      -77.455811 
      313.0 
      -5.0 
      A 
      America/New_York 
     
    
      13 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      10.0 
      2013-01-01 01:00:00 
      IAD 
      Washington Dulles Intl 
      38.944533 
      -77.455811 
      313.0 
      -5.0 
      A 
      America/New_York 
     
    
      14 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      15 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      16 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      17 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ORD 
      Chicago Ohare Intl 
      41.978603 
      -87.904842 
      668.0 
      -6.0 
      A 
      America/Chicago 
     
    
      18 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ORD 
      Chicago Ohare Intl 
      41.978603 
      -87.904842 
      668.0 
      -6.0 
      A 
      America/Chicago 
     
    
      19 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ORD 
      Chicago Ohare Intl 
      41.978603 
      -87.904842 
      668.0 
      -6.0 
      A 
      America/Chicago 
     
    
      20 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      PBI 
      Palm Beach Intl 
      26.683161 
      -80.095589 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      21 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      PBI 
      Palm Beach Intl 
      26.683161 
      -80.095589 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      22 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      PBI 
      Palm Beach Intl 
      26.683161 
      -80.095589 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      23 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      TPA 
      Tampa Intl 
      27.975472 
      -82.533250 
      26.0 
      -5.0 
      A 
      America/New_York 
     
    
      24 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      TPA 
      Tampa Intl 
      27.975472 
      -82.533250 
      26.0 
      -5.0 
      A 
      America/New_York 
     
    
      25 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      TPA 
      Tampa Intl 
      27.975472 
      -82.533250 
      26.0 
      -5.0 
      A 
      America/New_York 
     
    
      26 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      LAX 
      Los Angeles Intl 
      33.942536 
      -118.408075 
      126.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      27 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      LAX 
      Los Angeles Intl 
      33.942536 
      -118.408075 
      126.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      28 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      LAX 
      Los Angeles Intl 
      33.942536 
      -118.408075 
      126.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      29 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      923.0 
      937 
      -14.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
     
    
      1007561 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BUF 
      Buffalo Niagara Intl 
      42.940525 
      -78.732167 
      724.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007562 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BUF 
      Buffalo Niagara Intl 
      42.940525 
      -78.732167 
      724.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007563 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BUF 
      Buffalo Niagara Intl 
      42.940525 
      -78.732167 
      724.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007564 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      ROC 
      Greater Rochester Intl 
      43.118866 
      -77.672389 
      559.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007565 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      ROC 
      Greater Rochester Intl 
      43.118866 
      -77.672389 
      559.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007566 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      ROC 
      Greater Rochester Intl 
      43.118866 
      -77.672389 
      559.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007567 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BOS 
      General Edward Lawrence Logan Intl 
      42.364347 
      -71.005181 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007568 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BOS 
      General Edward Lawrence Logan Intl 
      42.364347 
      -71.005181 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007569 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BOS 
      General Edward Lawrence Logan Intl 
      42.364347 
      -71.005181 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007570 
      2013 
      9 
      30 
      2349.0 
      2359 
      -10.0 
      325.0 
      350 
      -25.0 
      B6 
      ... 
      10.0 
      2013-09-30 18:00:00 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
     
    
      1007571 
      2013 
      9 
      30 
      2349.0 
      2359 
      -10.0 
      325.0 
      350 
      -25.0 
      B6 
      ... 
      10.0 
      2013-09-30 18:00:00 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
     
    
      1007572 
      2013 
      9 
      30 
      2349.0 
      2359 
      -10.0 
      325.0 
      350 
      -25.0 
      B6 
      ... 
      10.0 
      2013-09-30 18:00:00 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
      NaN 
     
    
      1007573 
      2013 
      9 
      30 
      NaN 
      1842 
      NaN 
      NaN 
      2019 
      NaN 
      EV 
      ... 
      10.0 
      2013-09-30 13:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007574 
      2013 
      9 
      30 
      NaN 
      1842 
      NaN 
      NaN 
      2019 
      NaN 
      EV 
      ... 
      10.0 
      2013-09-30 13:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007575 
      2013 
      9 
      30 
      NaN 
      1842 
      NaN 
      NaN 
      2019 
      NaN 
      EV 
      ... 
      10.0 
      2013-09-30 13:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007576 
      2013 
      9 
      30 
      NaN 
      1455 
      NaN 
      NaN 
      1634 
      NaN 
      9E 
      ... 
      10.0 
      2013-09-30 09:00:00 
      DCA 
      Ronald Reagan Washington Natl 
      38.852083 
      -77.037722 
      15.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007577 
      2013 
      9 
      30 
      NaN 
      1455 
      NaN 
      NaN 
      1634 
      NaN 
      9E 
      ... 
      10.0 
      2013-09-30 09:00:00 
      DCA 
      Ronald Reagan Washington Natl 
      38.852083 
      -77.037722 
      15.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007578 
      2013 
      9 
      30 
      NaN 
      1455 
      NaN 
      NaN 
      1634 
      NaN 
      9E 
      ... 
      10.0 
      2013-09-30 09:00:00 
      DCA 
      Ronald Reagan Washington Natl 
      38.852083 
      -77.037722 
      15.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007579 
      2013 
      9 
      30 
      NaN 
      2200 
      NaN 
      NaN 
      2312 
      NaN 
      9E 
      ... 
      10.0 
      2013-09-30 17:00:00 
      SYR 
      Syracuse Hancock Intl 
      43.111187 
      -76.106311 
      421.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007580 
      2013 
      9 
      30 
      NaN 
      2200 
      NaN 
      NaN 
      2312 
      NaN 
      9E 
      ... 
      10.0 
      2013-09-30 17:00:00 
      SYR 
      Syracuse Hancock Intl 
      43.111187 
      -76.106311 
      421.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007581 
      2013 
      9 
      30 
      NaN 
      2200 
      NaN 
      NaN 
      2312 
      NaN 
      9E 
      ... 
      10.0 
      2013-09-30 17:00:00 
      SYR 
      Syracuse Hancock Intl 
      43.111187 
      -76.106311 
      421.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007582 
      2013 
      9 
      30 
      NaN 
      1210 
      NaN 
      NaN 
      1330 
      NaN 
      MQ 
      ... 
      10.0 
      2013-09-30 07:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007583 
      2013 
      9 
      30 
      NaN 
      1210 
      NaN 
      NaN 
      1330 
      NaN 
      MQ 
      ... 
      10.0 
      2013-09-30 07:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007584 
      2013 
      9 
      30 
      NaN 
      1210 
      NaN 
      NaN 
      1330 
      NaN 
      MQ 
      ... 
      10.0 
      2013-09-30 07:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007585 
      2013 
      9 
      30 
      NaN 
      1159 
      NaN 
      NaN 
      1344 
      NaN 
      MQ 
      ... 
      10.0 
      2013-09-30 06:00:00 
      CLE 
      Cleveland Hopkins Intl 
      41.411689 
      -81.849794 
      791.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007586 
      2013 
      9 
      30 
      NaN 
      1159 
      NaN 
      NaN 
      1344 
      NaN 
      MQ 
      ... 
      10.0 
      2013-09-30 06:00:00 
      CLE 
      Cleveland Hopkins Intl 
      41.411689 
      -81.849794 
      791.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007587 
      2013 
      9 
      30 
      NaN 
      1159 
      NaN 
      NaN 
      1344 
      NaN 
      MQ 
      ... 
      10.0 
      2013-09-30 06:00:00 
      CLE 
      Cleveland Hopkins Intl 
      41.411689 
      -81.849794 
      791.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007588 
      2013 
      9 
      30 
      NaN 
      840 
      NaN 
      NaN 
      1020 
      NaN 
      MQ 
      ... 
      10.0 
      2013-09-30 03:00:00 
      RDU 
      Raleigh Durham Intl 
      35.877639 
      -78.787472 
      435.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007589 
      2013 
      9 
      30 
      NaN 
      840 
      NaN 
      NaN 
      1020 
      NaN 
      MQ 
      ... 
      10.0 
      2013-09-30 03:00:00 
      RDU 
      Raleigh Durham Intl 
      35.877639 
      -78.787472 
      435.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007590 
      2013 
      9 
      30 
      NaN 
      840 
      NaN 
      NaN 
      1020 
      NaN 
      MQ 
      ... 
      10.0 
      2013-09-30 03:00:00 
      RDU 
      Raleigh Durham Intl 
      35.877639 
      -78.787472 
      435.0 
      -5.0 
      A 
      America/New_York 
     
  
1007591 rows × 38 columns
In [29]:
    
import numpy as np
a = np.array([1, 2, 3])
a
    
    Out[29]:
array([1, 2, 3])
注意这里我们传的是列表,而不是np.array(1, 2, 3)。
现在我们创建一个arange
In [30]:
    
np.arange(10)
    
    Out[30]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [31]:
    
# 给序列乘以一个系数
np.arange(10) * np.pi
    
    Out[31]:
array([  0.        ,   3.14159265,   6.28318531,   9.42477796,
        12.56637061,  15.70796327,  18.84955592,  21.99114858,
        25.13274123,  28.27433388])
我们也可以使用shape方法从一维数组创建多维数组
In [32]:
    
a = np.array([1, 2, 3, 4, 5, 6])
a.shape = (2, 3)
a
    
    Out[32]:
array([[1, 2, 3],
       [4, 5, 6]])
In [35]:
    
np.matrix('1 2; 3 4')
    
    Out[35]:
matrix([[1, 2],
        [3, 4]])
In [40]:
    
#矩阵乘
a1 = np.matrix('1 2; 3 4')
a2 = np.matrix('3 4; 5 7')
a1 * a2
    
    Out[40]:
matrix([[13, 18],
        [29, 40]])
In [41]:
    
#array转换为矩阵
mat_a = np.mat(a1)
mat_a
    
    Out[41]:
matrix([[1, 2],
        [3, 4]])
In [42]:
    
import numpy, scipy.sparse
n = 100000
x = (numpy.random.rand(n) * 2).astype(int).astype(float) #50%稀疏矩阵
x_csr = scipy.sparse.csr_matrix(x)
x_dok = scipy.sparse.dok_matrix(x.reshape(x_csr.shape))
x_dok
    
    Out[42]:
<1x100000 sparse matrix of type '<class 'numpy.float64'>'
	with 50077 stored elements in Dictionary Of Keys format>
In [43]:
    
import csv
with open('data/array/array.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = []
    for row in csvreader:
        row = [float(x) for x in row]
        data.append(row)
data
    
    Out[43]:
[[2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 6.0], [7.0, 9.0, 9.0, 10.0]]
In [44]:
    
import numpy as np
import scipy as sp
a = np.array([[3, 2, 0], [1, -1, 0], [0, 5, 1]])
b = np.array([2, 4, -1])
x = np.linalg.solve(a, b)
x
    
    Out[44]:
array([ 2., -2.,  9.])
In [45]:
    
#检查结果是否正确
np.dot(a, x) == b
    
    Out[45]:
array([ True,  True,  True], dtype=bool)
前面我们介绍了pandas和numpy、scipy。现在我们来介绍python机器库Scikit。首先需要先知道机器学习的两种:
Scikit-learn有一下特性:
这里我们还是看nycflights13的数据集。
In [47]:
    
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
flights = pd.read_csv('data/nycflights13/flights.csv.gz')
weather = pd.read_csv('data/nycflights13/weather.csv.gz')
airports = pd.read_csv('data/nycflights13/airports.csv.gz')
df_withweather = pd.merge(flights, weather, how='left', on=['year', 'month', 'day', 'hour'])
df = pd.merge(df_withweather, airports, how='left', left_on='dest', right_on='faa')
df = df.dropna()
df
    
    Out[47]:
  
    
       
      year 
      month 
      day 
      dep_time 
      sched_dep_time 
      dep_delay 
      arr_time 
      sched_arr_time 
      arr_delay 
      carrier 
      ... 
      visib 
      time_hour_y 
      faa 
      name 
      lat 
      lon 
      alt 
      tz 
      dst 
      tzone 
     
  
  
    
      4 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ATL 
      Hartsfield Jackson Atlanta Intl 
      33.636719 
      -84.428067 
      1026.0 
      -5.0 
      A 
      America/New_York 
     
    
      5 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ATL 
      Hartsfield Jackson Atlanta Intl 
      33.636719 
      -84.428067 
      1026.0 
      -5.0 
      A 
      America/New_York 
     
    
      6 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ATL 
      Hartsfield Jackson Atlanta Intl 
      33.636719 
      -84.428067 
      1026.0 
      -5.0 
      A 
      America/New_York 
     
    
      8 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      FLL 
      Fort Lauderdale Hollywood Intl 
      26.072583 
      -80.152750 
      9.0 
      -5.0 
      A 
      America/New_York 
     
    
      9 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      FLL 
      Fort Lauderdale Hollywood Intl 
      26.072583 
      -80.152750 
      9.0 
      -5.0 
      A 
      America/New_York 
     
    
      10 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      FLL 
      Fort Lauderdale Hollywood Intl 
      26.072583 
      -80.152750 
      9.0 
      -5.0 
      A 
      America/New_York 
     
    
      11 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      10.0 
      2013-01-01 01:00:00 
      IAD 
      Washington Dulles Intl 
      38.944533 
      -77.455811 
      313.0 
      -5.0 
      A 
      America/New_York 
     
    
      12 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      10.0 
      2013-01-01 01:00:00 
      IAD 
      Washington Dulles Intl 
      38.944533 
      -77.455811 
      313.0 
      -5.0 
      A 
      America/New_York 
     
    
      13 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      10.0 
      2013-01-01 01:00:00 
      IAD 
      Washington Dulles Intl 
      38.944533 
      -77.455811 
      313.0 
      -5.0 
      A 
      America/New_York 
     
    
      14 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      15 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      16 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      17 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ORD 
      Chicago Ohare Intl 
      41.978603 
      -87.904842 
      668.0 
      -6.0 
      A 
      America/Chicago 
     
    
      18 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ORD 
      Chicago Ohare Intl 
      41.978603 
      -87.904842 
      668.0 
      -6.0 
      A 
      America/Chicago 
     
    
      19 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ORD 
      Chicago Ohare Intl 
      41.978603 
      -87.904842 
      668.0 
      -6.0 
      A 
      America/Chicago 
     
    
      20 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      PBI 
      Palm Beach Intl 
      26.683161 
      -80.095589 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      21 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      PBI 
      Palm Beach Intl 
      26.683161 
      -80.095589 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      22 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      PBI 
      Palm Beach Intl 
      26.683161 
      -80.095589 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      23 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      TPA 
      Tampa Intl 
      27.975472 
      -82.533250 
      26.0 
      -5.0 
      A 
      America/New_York 
     
    
      24 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      TPA 
      Tampa Intl 
      27.975472 
      -82.533250 
      26.0 
      -5.0 
      A 
      America/New_York 
     
    
      25 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      TPA 
      Tampa Intl 
      27.975472 
      -82.533250 
      26.0 
      -5.0 
      A 
      America/New_York 
     
    
      26 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      LAX 
      Los Angeles Intl 
      33.942536 
      -118.408075 
      126.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      27 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      LAX 
      Los Angeles Intl 
      33.942536 
      -118.408075 
      126.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      28 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      LAX 
      Los Angeles Intl 
      33.942536 
      -118.408075 
      126.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      29 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      923.0 
      937 
      -14.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      30 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      923.0 
      937 
      -14.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      31 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      923.0 
      937 
      -14.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      32 
      2013 
      1 
      1 
      559.0 
      600 
      -1.0 
      941.0 
      910 
      31.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      DFW 
      Dallas Fort Worth Intl 
      32.896828 
      -97.037997 
      607.0 
      -6.0 
      A 
      America/Chicago 
     
    
      33 
      2013 
      1 
      1 
      559.0 
      600 
      -1.0 
      941.0 
      910 
      31.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      DFW 
      Dallas Fort Worth Intl 
      32.896828 
      -97.037997 
      607.0 
      -6.0 
      A 
      America/Chicago 
     
    
      34 
      2013 
      1 
      1 
      559.0 
      600 
      -1.0 
      941.0 
      910 
      31.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      DFW 
      Dallas Fort Worth Intl 
      32.896828 
      -97.037997 
      607.0 
      -6.0 
      A 
      America/Chicago 
     
    
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
     
    
      1007540 
      2013 
      9 
      30 
      2207.0 
      2140 
      27.0 
      2257.0 
      2250 
      7.0 
      MQ 
      ... 
      10.0 
      2013-09-30 16:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007541 
      2013 
      9 
      30 
      2207.0 
      2140 
      27.0 
      2257.0 
      2250 
      7.0 
      MQ 
      ... 
      10.0 
      2013-09-30 16:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007542 
      2013 
      9 
      30 
      2207.0 
      2140 
      27.0 
      2257.0 
      2250 
      7.0 
      MQ 
      ... 
      10.0 
      2013-09-30 16:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007543 
      2013 
      9 
      30 
      2211.0 
      2059 
      72.0 
      2339.0 
      2242 
      57.0 
      EV 
      ... 
      10.0 
      2013-09-30 15:00:00 
      STL 
      Lambert St Louis Intl 
      38.748697 
      -90.370028 
      618.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007544 
      2013 
      9 
      30 
      2211.0 
      2059 
      72.0 
      2339.0 
      2242 
      57.0 
      EV 
      ... 
      10.0 
      2013-09-30 15:00:00 
      STL 
      Lambert St Louis Intl 
      38.748697 
      -90.370028 
      618.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007545 
      2013 
      9 
      30 
      2211.0 
      2059 
      72.0 
      2339.0 
      2242 
      57.0 
      EV 
      ... 
      10.0 
      2013-09-30 15:00:00 
      STL 
      Lambert St Louis Intl 
      38.748697 
      -90.370028 
      618.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007546 
      2013 
      9 
      30 
      2231.0 
      2245 
      -14.0 
      2335.0 
      2356 
      -21.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      PWM 
      Portland Intl Jetport 
      43.646161 
      -70.309281 
      77.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007547 
      2013 
      9 
      30 
      2231.0 
      2245 
      -14.0 
      2335.0 
      2356 
      -21.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      PWM 
      Portland Intl Jetport 
      43.646161 
      -70.309281 
      77.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007548 
      2013 
      9 
      30 
      2231.0 
      2245 
      -14.0 
      2335.0 
      2356 
      -21.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      PWM 
      Portland Intl Jetport 
      43.646161 
      -70.309281 
      77.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007549 
      2013 
      9 
      30 
      2233.0 
      2113 
      80.0 
      112.0 
      30 
      42.0 
      UA 
      ... 
      10.0 
      2013-09-30 16:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      1007550 
      2013 
      9 
      30 
      2233.0 
      2113 
      80.0 
      112.0 
      30 
      42.0 
      UA 
      ... 
      10.0 
      2013-09-30 16:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      1007551 
      2013 
      9 
      30 
      2233.0 
      2113 
      80.0 
      112.0 
      30 
      42.0 
      UA 
      ... 
      10.0 
      2013-09-30 16:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      1007552 
      2013 
      9 
      30 
      2235.0 
      2001 
      154.0 
      59.0 
      2249 
      130.0 
      B6 
      ... 
      10.0 
      2013-09-30 15:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007553 
      2013 
      9 
      30 
      2235.0 
      2001 
      154.0 
      59.0 
      2249 
      130.0 
      B6 
      ... 
      10.0 
      2013-09-30 15:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007554 
      2013 
      9 
      30 
      2235.0 
      2001 
      154.0 
      59.0 
      2249 
      130.0 
      B6 
      ... 
      10.0 
      2013-09-30 15:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007555 
      2013 
      9 
      30 
      2237.0 
      2245 
      -8.0 
      2345.0 
      2353 
      -8.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BTV 
      Burlington Intl 
      44.471861 
      -73.153278 
      335.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007556 
      2013 
      9 
      30 
      2237.0 
      2245 
      -8.0 
      2345.0 
      2353 
      -8.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BTV 
      Burlington Intl 
      44.471861 
      -73.153278 
      335.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007557 
      2013 
      9 
      30 
      2237.0 
      2245 
      -8.0 
      2345.0 
      2353 
      -8.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BTV 
      Burlington Intl 
      44.471861 
      -73.153278 
      335.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007558 
      2013 
      9 
      30 
      2240.0 
      2245 
      -5.0 
      2334.0 
      2351 
      -17.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      SYR 
      Syracuse Hancock Intl 
      43.111187 
      -76.106311 
      421.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007559 
      2013 
      9 
      30 
      2240.0 
      2245 
      -5.0 
      2334.0 
      2351 
      -17.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      SYR 
      Syracuse Hancock Intl 
      43.111187 
      -76.106311 
      421.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007560 
      2013 
      9 
      30 
      2240.0 
      2245 
      -5.0 
      2334.0 
      2351 
      -17.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      SYR 
      Syracuse Hancock Intl 
      43.111187 
      -76.106311 
      421.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007561 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BUF 
      Buffalo Niagara Intl 
      42.940525 
      -78.732167 
      724.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007562 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BUF 
      Buffalo Niagara Intl 
      42.940525 
      -78.732167 
      724.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007563 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BUF 
      Buffalo Niagara Intl 
      42.940525 
      -78.732167 
      724.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007564 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      ROC 
      Greater Rochester Intl 
      43.118866 
      -77.672389 
      559.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007565 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      ROC 
      Greater Rochester Intl 
      43.118866 
      -77.672389 
      559.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007566 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      ROC 
      Greater Rochester Intl 
      43.118866 
      -77.672389 
      559.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007567 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BOS 
      General Edward Lawrence Logan Intl 
      42.364347 
      -71.005181 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007568 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BOS 
      General Edward Lawrence Logan Intl 
      42.364347 
      -71.005181 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007569 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BOS 
      General Edward Lawrence Logan Intl 
      42.364347 
      -71.005181 
      19.0 
      -5.0 
      A 
      America/New_York 
     
  
838182 rows × 38 columns
In [48]:
    
pred = 'dep_delay'
features = ['month', 'day', 'dep_time', 'arr_time', 'carrier', 'dest', 'air_time',
           'distance', 'lat', 'lon', 'alt', 'dewp', 'humid', 'wind_speed', 'wind_gust',
           'precip', 'pressure', 'visib']
features_v = df[features]
pred_v = df[pred]
pd.options.mode.chained_assignment = None #default='warn'
# 因为航空公司不是一个数字,我们把它转化为数字哑变量
features_v['carrier'] = pd.factorize(features_v['carrier'])[0]
# dest也不是一个数字,我们也把它转为数字
features_v['dest'] = pd.factorize(features_v['dest'])[0]
    
In [49]:
    
features_v
    
    Out[49]:
  
    
       
      month 
      day 
      dep_time 
      arr_time 
      carrier 
      dest 
      air_time 
      distance 
      lat 
      lon 
      alt 
      dewp 
      humid 
      wind_speed 
      wind_gust 
      precip 
      pressure 
      visib 
     
  
  
    
      4 
      1 
      1 
      554.0 
      812.0 
      0 
      0 
      116.0 
      762 
      33.636719 
      -84.428067 
      1026.0 
      26.06 
      59.37 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
     
    
      5 
      1 
      1 
      554.0 
      812.0 
      0 
      0 
      116.0 
      762 
      33.636719 
      -84.428067 
      1026.0 
      26.06 
      59.37 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
     
    
      6 
      1 
      1 
      554.0 
      812.0 
      0 
      0 
      116.0 
      762 
      33.636719 
      -84.428067 
      1026.0 
      26.06 
      57.33 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
     
    
      8 
      1 
      1 
      555.0 
      913.0 
      1 
      1 
      158.0 
      1065 
      26.072583 
      -80.152750 
      9.0 
      26.06 
      59.37 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
     
    
      9 
      1 
      1 
      555.0 
      913.0 
      1 
      1 
      158.0 
      1065 
      26.072583 
      -80.152750 
      9.0 
      26.06 
      59.37 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
     
    
      10 
      1 
      1 
      555.0 
      913.0 
      1 
      1 
      158.0 
      1065 
      26.072583 
      -80.152750 
      9.0 
      26.06 
      57.33 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
     
    
      11 
      1 
      1 
      557.0 
      709.0 
      2 
      2 
      53.0 
      229 
      38.944533 
      -77.455811 
      313.0 
      26.06 
      59.37 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
     
    
      12 
      1 
      1 
      557.0 
      709.0 
      2 
      2 
      53.0 
      229 
      38.944533 
      -77.455811 
      313.0 
      26.06 
      59.37 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
     
    
      13 
      1 
      1 
      557.0 
      709.0 
      2 
      2 
      53.0 
      229 
      38.944533 
      -77.455811 
      313.0 
      26.06 
      57.33 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
     
    
      14 
      1 
      1 
      557.0 
      838.0 
      1 
      3 
      140.0 
      944 
      28.429394 
      -81.308994 
      96.0 
      26.06 
      59.37 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
     
    
      15 
      1 
      1 
      557.0 
      838.0 
      1 
      3 
      140.0 
      944 
      28.429394 
      -81.308994 
      96.0 
      26.06 
      59.37 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
     
    
      16 
      1 
      1 
      557.0 
      838.0 
      1 
      3 
      140.0 
      944 
      28.429394 
      -81.308994 
      96.0 
      26.06 
      57.33 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
     
    
      17 
      1 
      1 
      558.0 
      753.0 
      3 
      4 
      138.0 
      733 
      41.978603 
      -87.904842 
      668.0 
      26.06 
      59.37 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
     
    
      18 
      1 
      1 
      558.0 
      753.0 
      3 
      4 
      138.0 
      733 
      41.978603 
      -87.904842 
      668.0 
      26.06 
      59.37 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
     
    
      19 
      1 
      1 
      558.0 
      753.0 
      3 
      4 
      138.0 
      733 
      41.978603 
      -87.904842 
      668.0 
      26.06 
      57.33 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
     
    
      20 
      1 
      1 
      558.0 
      849.0 
      1 
      5 
      149.0 
      1028 
      26.683161 
      -80.095589 
      19.0 
      26.06 
      59.37 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
     
    
      21 
      1 
      1 
      558.0 
      849.0 
      1 
      5 
      149.0 
      1028 
      26.683161 
      -80.095589 
      19.0 
      26.06 
      59.37 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
     
    
      22 
      1 
      1 
      558.0 
      849.0 
      1 
      5 
      149.0 
      1028 
      26.683161 
      -80.095589 
      19.0 
      26.06 
      57.33 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
     
    
      23 
      1 
      1 
      558.0 
      853.0 
      1 
      6 
      158.0 
      1005 
      27.975472 
      -82.533250 
      26.0 
      26.06 
      59.37 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
     
    
      24 
      1 
      1 
      558.0 
      853.0 
      1 
      6 
      158.0 
      1005 
      27.975472 
      -82.533250 
      26.0 
      26.06 
      59.37 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
     
    
      25 
      1 
      1 
      558.0 
      853.0 
      1 
      6 
      158.0 
      1005 
      27.975472 
      -82.533250 
      26.0 
      26.06 
      57.33 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
     
    
      26 
      1 
      1 
      558.0 
      924.0 
      4 
      7 
      345.0 
      2475 
      33.942536 
      -118.408075 
      126.0 
      26.06 
      59.37 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
     
    
      27 
      1 
      1 
      558.0 
      924.0 
      4 
      7 
      345.0 
      2475 
      33.942536 
      -118.408075 
      126.0 
      26.06 
      59.37 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
     
    
      28 
      1 
      1 
      558.0 
      924.0 
      4 
      7 
      345.0 
      2475 
      33.942536 
      -118.408075 
      126.0 
      26.06 
      57.33 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
     
    
      29 
      1 
      1 
      558.0 
      923.0 
      4 
      8 
      361.0 
      2565 
      37.618972 
      -122.374889 
      13.0 
      26.06 
      59.37 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
     
    
      30 
      1 
      1 
      558.0 
      923.0 
      4 
      8 
      361.0 
      2565 
      37.618972 
      -122.374889 
      13.0 
      26.06 
      59.37 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
     
    
      31 
      1 
      1 
      558.0 
      923.0 
      4 
      8 
      361.0 
      2565 
      37.618972 
      -122.374889 
      13.0 
      26.06 
      57.33 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
     
    
      32 
      1 
      1 
      559.0 
      941.0 
      3 
      9 
      257.0 
      1389 
      32.896828 
      -97.037997 
      607.0 
      26.06 
      59.37 
      10.35702 
      11.918651 
      0.0 
      1012.0 
      10.0 
     
    
      33 
      1 
      1 
      559.0 
      941.0 
      3 
      9 
      257.0 
      1389 
      32.896828 
      -97.037997 
      607.0 
      26.06 
      59.37 
      12.65858 
      14.567241 
      0.0 
      1012.6 
      10.0 
     
    
      34 
      1 
      1 
      559.0 
      941.0 
      3 
      9 
      257.0 
      1389 
      32.896828 
      -97.037997 
      607.0 
      26.06 
      57.33 
      13.80936 
      15.891535 
      0.0 
      1011.9 
      10.0 
     
    
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
     
    
      1007540 
      9 
      30 
      2207.0 
      2257.0 
      5 
      45 
      97.0 
      764 
      36.124472 
      -86.678194 
      599.0 
      53.06 
      54.94 
      5.75390 
      6.621473 
      0.0 
      1015.0 
      10.0 
     
    
      1007541 
      9 
      30 
      2207.0 
      2257.0 
      5 
      45 
      97.0 
      764 
      36.124472 
      -86.678194 
      599.0 
      55.04 
      67.69 
      6.90468 
      7.945768 
      0.0 
      1015.6 
      10.0 
     
    
      1007542 
      9 
      30 
      2207.0 
      2257.0 
      5 
      45 
      97.0 
      764 
      36.124472 
      -86.678194 
      599.0 
      51.08 
      52.67 
      6.90468 
      7.945768 
      0.0 
      1015.0 
      10.0 
     
    
      1007543 
      9 
      30 
      2211.0 
      2339.0 
      2 
      40 
      120.0 
      872 
      38.748697 
      -90.370028 
      618.0 
      51.98 
      49.36 
      6.90468 
      7.945768 
      0.0 
      1015.2 
      10.0 
     
    
      1007544 
      9 
      30 
      2211.0 
      2339.0 
      2 
      40 
      120.0 
      872 
      38.748697 
      -90.370028 
      618.0 
      53.06 
      58.80 
      8.05546 
      9.270062 
      0.0 
      1015.8 
      10.0 
     
    
      1007545 
      9 
      30 
      2211.0 
      2339.0 
      2 
      40 
      120.0 
      872 
      38.748697 
      -90.370028 
      618.0 
      51.08 
      46.04 
      6.90468 
      7.945768 
      0.0 
      1015.1 
      10.0 
     
    
      1007546 
      9 
      30 
      2231.0 
      2335.0 
      1 
      51 
      48.0 
      273 
      43.646161 
      -70.309281 
      77.0 
      53.06 
      58.80 
      5.75390 
      6.621473 
      0.0 
      1015.4 
      10.0 
     
    
      1007547 
      9 
      30 
      2231.0 
      2335.0 
      1 
      51 
      48.0 
      273 
      43.646161 
      -70.309281 
      77.0 
      55.94 
      74.94 
      6.90468 
      7.945768 
      0.0 
      1016.0 
      10.0 
     
    
      1007548 
      9 
      30 
      2231.0 
      2335.0 
      1 
      51 
      48.0 
      273 
      43.646161 
      -70.309281 
      77.0 
      51.98 
      58.65 
      9.20624 
      10.594357 
      0.0 
      1015.4 
      10.0 
     
    
      1007549 
      9 
      30 
      2233.0 
      112.0 
      4 
      8 
      318.0 
      2565 
      37.618972 
      -122.374889 
      13.0 
      53.06 
      54.94 
      5.75390 
      6.621473 
      0.0 
      1015.0 
      10.0 
     
    
      1007550 
      9 
      30 
      2233.0 
      112.0 
      4 
      8 
      318.0 
      2565 
      37.618972 
      -122.374889 
      13.0 
      55.04 
      67.69 
      6.90468 
      7.945768 
      0.0 
      1015.6 
      10.0 
     
    
      1007551 
      9 
      30 
      2233.0 
      112.0 
      4 
      8 
      318.0 
      2565 
      37.618972 
      -122.374889 
      13.0 
      51.08 
      52.67 
      6.90468 
      7.945768 
      0.0 
      1015.0 
      10.0 
     
    
      1007552 
      9 
      30 
      2235.0 
      59.0 
      1 
      3 
      123.0 
      944 
      28.429394 
      -81.308994 
      96.0 
      51.98 
      49.36 
      6.90468 
      7.945768 
      0.0 
      1015.2 
      10.0 
     
    
      1007553 
      9 
      30 
      2235.0 
      59.0 
      1 
      3 
      123.0 
      944 
      28.429394 
      -81.308994 
      96.0 
      53.06 
      58.80 
      8.05546 
      9.270062 
      0.0 
      1015.8 
      10.0 
     
    
      1007554 
      9 
      30 
      2235.0 
      59.0 
      1 
      3 
      123.0 
      944 
      28.429394 
      -81.308994 
      96.0 
      51.08 
      46.04 
      6.90468 
      7.945768 
      0.0 
      1015.1 
      10.0 
     
    
      1007555 
      9 
      30 
      2237.0 
      2345.0 
      1 
      47 
      43.0 
      266 
      44.471861 
      -73.153278 
      335.0 
      53.06 
      58.80 
      5.75390 
      6.621473 
      0.0 
      1015.4 
      10.0 
     
    
      1007556 
      9 
      30 
      2237.0 
      2345.0 
      1 
      47 
      43.0 
      266 
      44.471861 
      -73.153278 
      335.0 
      55.94 
      74.94 
      6.90468 
      7.945768 
      0.0 
      1016.0 
      10.0 
     
    
      1007557 
      9 
      30 
      2237.0 
      2345.0 
      1 
      47 
      43.0 
      266 
      44.471861 
      -73.153278 
      335.0 
      51.98 
      58.65 
      9.20624 
      10.594357 
      0.0 
      1015.4 
      10.0 
     
    
      1007558 
      9 
      30 
      2240.0 
      2334.0 
      1 
      29 
      41.0 
      209 
      43.111187 
      -76.106311 
      421.0 
      53.06 
      58.80 
      5.75390 
      6.621473 
      0.0 
      1015.4 
      10.0 
     
    
      1007559 
      9 
      30 
      2240.0 
      2334.0 
      1 
      29 
      41.0 
      209 
      43.111187 
      -76.106311 
      421.0 
      55.94 
      74.94 
      6.90468 
      7.945768 
      0.0 
      1016.0 
      10.0 
     
    
      1007560 
      9 
      30 
      2240.0 
      2334.0 
      1 
      29 
      41.0 
      209 
      43.111187 
      -76.106311 
      421.0 
      51.98 
      58.65 
      9.20624 
      10.594357 
      0.0 
      1015.4 
      10.0 
     
    
      1007561 
      9 
      30 
      2240.0 
      2347.0 
      1 
      20 
      52.0 
      301 
      42.940525 
      -78.732167 
      724.0 
      53.06 
      58.80 
      5.75390 
      6.621473 
      0.0 
      1015.4 
      10.0 
     
    
      1007562 
      9 
      30 
      2240.0 
      2347.0 
      1 
      20 
      52.0 
      301 
      42.940525 
      -78.732167 
      724.0 
      55.94 
      74.94 
      6.90468 
      7.945768 
      0.0 
      1016.0 
      10.0 
     
    
      1007563 
      9 
      30 
      2240.0 
      2347.0 
      1 
      20 
      52.0 
      301 
      42.940525 
      -78.732167 
      724.0 
      51.98 
      58.65 
      9.20624 
      10.594357 
      0.0 
      1015.4 
      10.0 
     
    
      1007564 
      9 
      30 
      2241.0 
      2345.0 
      1 
      28 
      47.0 
      264 
      43.118866 
      -77.672389 
      559.0 
      53.06 
      58.80 
      5.75390 
      6.621473 
      0.0 
      1015.4 
      10.0 
     
    
      1007565 
      9 
      30 
      2241.0 
      2345.0 
      1 
      28 
      47.0 
      264 
      43.118866 
      -77.672389 
      559.0 
      55.94 
      74.94 
      6.90468 
      7.945768 
      0.0 
      1016.0 
      10.0 
     
    
      1007566 
      9 
      30 
      2241.0 
      2345.0 
      1 
      28 
      47.0 
      264 
      43.118866 
      -77.672389 
      559.0 
      51.98 
      58.65 
      9.20624 
      10.594357 
      0.0 
      1015.4 
      10.0 
     
    
      1007567 
      9 
      30 
      2307.0 
      2359.0 
      1 
      19 
      33.0 
      187 
      42.364347 
      -71.005181 
      19.0 
      53.06 
      58.80 
      5.75390 
      6.621473 
      0.0 
      1015.4 
      10.0 
     
    
      1007568 
      9 
      30 
      2307.0 
      2359.0 
      1 
      19 
      33.0 
      187 
      42.364347 
      -71.005181 
      19.0 
      55.94 
      74.94 
      6.90468 
      7.945768 
      0.0 
      1016.0 
      10.0 
     
    
      1007569 
      9 
      30 
      2307.0 
      2359.0 
      1 
      19 
      33.0 
      187 
      42.364347 
      -71.005181 
      19.0 
      51.98 
      58.65 
      9.20624 
      10.594357 
      0.0 
      1015.4 
      10.0 
     
  
838182 rows × 18 columns
In [50]:
    
# 因为各个特征的维度各不相同,我们需要做标准化
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features_v)
    
In [51]:
    
scaled_features
    
    Out[51]:
array([[-1.62498171, -1.69693092, -1.62908082, ..., -0.11420569,
        -0.79438586,  0.29213712],
       [-1.62498171, -1.69693092, -1.62908082, ..., -0.11420569,
        -0.71389571,  0.29213712],
       [-1.62498171, -1.69693092, -1.62908082, ..., -0.11420569,
        -0.80780089,  0.29213712],
       ..., 
       [ 0.71764151,  1.6218685 ,  1.98106144, ..., -0.11420569,
        -0.33827498,  0.29213712],
       [ 0.71764151,  1.6218685 ,  1.98106144, ..., -0.11420569,
        -0.25778483,  0.29213712],
       [ 0.71764151,  1.6218685 ,  1.98106144, ..., -0.11420569,
        -0.33827498,  0.29213712]])
In [53]:
    
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_r = pca.fit(scaled_features).transform(scaled_features)
    
In [54]:
    
X_r
    
    Out[54]:
array([[-0.32039321, -0.88647489],
       [-0.32303154, -0.72462068],
       [-0.32447915, -0.59863037],
       ..., 
       [-2.11022832,  0.78923842],
       [-2.10108117,  0.41150146],
       [-2.11397223,  1.04517555]])
In [57]:
    
import matplotlib.pyplot as plt
print('explained variance ratio (first two components): %s' 
      % str(pca.explained_variance_ratio_))
plt.figure()
lw = 2
plt.scatter(X_r[:,0], X_r[:,1], alpha=.8, lw=lw)
plt.title('PCA of flights dataset')
    
    
explained variance ratio (first two components): [ 0.16961823  0.14269873]
    Out[57]:
Text(0.5,1,'PCA of flights dataset')
    
 
In [61]:
    
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import sklearn
from sklearn import linear_model, cross_validation, metrics, svm, ensemble
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_fscore_support, accuracy_score
from sklearn.cross_validation import train_test_split, cross_val_score, ShuffleSplit
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler, OneHotEncoder
    
In [62]:
    
flights = pd.read_csv('data/nycflights13/flights.csv.gz')
weather = pd.read_csv('data/nycflights13/weather.csv.gz')
airports = pd.read_csv('data/nycflights13/airports.csv.gz')
df_withweather = pd.merge(flights, weather, how='left', on=['year', 'month', 'day', 'hour'])
df = pd.merge(df_withweather, airports, how='left', left_on='dest', right_on='faa')
df = df.dropna()
    
In [63]:
    
df
    
    Out[63]:
  
    
       
      year 
      month 
      day 
      dep_time 
      sched_dep_time 
      dep_delay 
      arr_time 
      sched_arr_time 
      arr_delay 
      carrier 
      ... 
      visib 
      time_hour_y 
      faa 
      name 
      lat 
      lon 
      alt 
      tz 
      dst 
      tzone 
     
  
  
    
      4 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ATL 
      Hartsfield Jackson Atlanta Intl 
      33.636719 
      -84.428067 
      1026.0 
      -5.0 
      A 
      America/New_York 
     
    
      5 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ATL 
      Hartsfield Jackson Atlanta Intl 
      33.636719 
      -84.428067 
      1026.0 
      -5.0 
      A 
      America/New_York 
     
    
      6 
      2013 
      1 
      1 
      554.0 
      600 
      -6.0 
      812.0 
      837 
      -25.0 
      DL 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ATL 
      Hartsfield Jackson Atlanta Intl 
      33.636719 
      -84.428067 
      1026.0 
      -5.0 
      A 
      America/New_York 
     
    
      8 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      FLL 
      Fort Lauderdale Hollywood Intl 
      26.072583 
      -80.152750 
      9.0 
      -5.0 
      A 
      America/New_York 
     
    
      9 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      FLL 
      Fort Lauderdale Hollywood Intl 
      26.072583 
      -80.152750 
      9.0 
      -5.0 
      A 
      America/New_York 
     
    
      10 
      2013 
      1 
      1 
      555.0 
      600 
      -5.0 
      913.0 
      854 
      19.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      FLL 
      Fort Lauderdale Hollywood Intl 
      26.072583 
      -80.152750 
      9.0 
      -5.0 
      A 
      America/New_York 
     
    
      11 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      10.0 
      2013-01-01 01:00:00 
      IAD 
      Washington Dulles Intl 
      38.944533 
      -77.455811 
      313.0 
      -5.0 
      A 
      America/New_York 
     
    
      12 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      10.0 
      2013-01-01 01:00:00 
      IAD 
      Washington Dulles Intl 
      38.944533 
      -77.455811 
      313.0 
      -5.0 
      A 
      America/New_York 
     
    
      13 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      709.0 
      723 
      -14.0 
      EV 
      ... 
      10.0 
      2013-01-01 01:00:00 
      IAD 
      Washington Dulles Intl 
      38.944533 
      -77.455811 
      313.0 
      -5.0 
      A 
      America/New_York 
     
    
      14 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      15 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      16 
      2013 
      1 
      1 
      557.0 
      600 
      -3.0 
      838.0 
      846 
      -8.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      17 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ORD 
      Chicago Ohare Intl 
      41.978603 
      -87.904842 
      668.0 
      -6.0 
      A 
      America/Chicago 
     
    
      18 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ORD 
      Chicago Ohare Intl 
      41.978603 
      -87.904842 
      668.0 
      -6.0 
      A 
      America/Chicago 
     
    
      19 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      753.0 
      745 
      8.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      ORD 
      Chicago Ohare Intl 
      41.978603 
      -87.904842 
      668.0 
      -6.0 
      A 
      America/Chicago 
     
    
      20 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      PBI 
      Palm Beach Intl 
      26.683161 
      -80.095589 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      21 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      PBI 
      Palm Beach Intl 
      26.683161 
      -80.095589 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      22 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      849.0 
      851 
      -2.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      PBI 
      Palm Beach Intl 
      26.683161 
      -80.095589 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      23 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      TPA 
      Tampa Intl 
      27.975472 
      -82.533250 
      26.0 
      -5.0 
      A 
      America/New_York 
     
    
      24 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      TPA 
      Tampa Intl 
      27.975472 
      -82.533250 
      26.0 
      -5.0 
      A 
      America/New_York 
     
    
      25 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      853.0 
      856 
      -3.0 
      B6 
      ... 
      10.0 
      2013-01-01 01:00:00 
      TPA 
      Tampa Intl 
      27.975472 
      -82.533250 
      26.0 
      -5.0 
      A 
      America/New_York 
     
    
      26 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      LAX 
      Los Angeles Intl 
      33.942536 
      -118.408075 
      126.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      27 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      LAX 
      Los Angeles Intl 
      33.942536 
      -118.408075 
      126.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      28 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      924.0 
      917 
      7.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      LAX 
      Los Angeles Intl 
      33.942536 
      -118.408075 
      126.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      29 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      923.0 
      937 
      -14.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      30 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      923.0 
      937 
      -14.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      31 
      2013 
      1 
      1 
      558.0 
      600 
      -2.0 
      923.0 
      937 
      -14.0 
      UA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      32 
      2013 
      1 
      1 
      559.0 
      600 
      -1.0 
      941.0 
      910 
      31.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      DFW 
      Dallas Fort Worth Intl 
      32.896828 
      -97.037997 
      607.0 
      -6.0 
      A 
      America/Chicago 
     
    
      33 
      2013 
      1 
      1 
      559.0 
      600 
      -1.0 
      941.0 
      910 
      31.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      DFW 
      Dallas Fort Worth Intl 
      32.896828 
      -97.037997 
      607.0 
      -6.0 
      A 
      America/Chicago 
     
    
      34 
      2013 
      1 
      1 
      559.0 
      600 
      -1.0 
      941.0 
      910 
      31.0 
      AA 
      ... 
      10.0 
      2013-01-01 01:00:00 
      DFW 
      Dallas Fort Worth Intl 
      32.896828 
      -97.037997 
      607.0 
      -6.0 
      A 
      America/Chicago 
     
    
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
      ... 
     
    
      1007540 
      2013 
      9 
      30 
      2207.0 
      2140 
      27.0 
      2257.0 
      2250 
      7.0 
      MQ 
      ... 
      10.0 
      2013-09-30 16:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007541 
      2013 
      9 
      30 
      2207.0 
      2140 
      27.0 
      2257.0 
      2250 
      7.0 
      MQ 
      ... 
      10.0 
      2013-09-30 16:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007542 
      2013 
      9 
      30 
      2207.0 
      2140 
      27.0 
      2257.0 
      2250 
      7.0 
      MQ 
      ... 
      10.0 
      2013-09-30 16:00:00 
      BNA 
      Nashville Intl 
      36.124472 
      -86.678194 
      599.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007543 
      2013 
      9 
      30 
      2211.0 
      2059 
      72.0 
      2339.0 
      2242 
      57.0 
      EV 
      ... 
      10.0 
      2013-09-30 15:00:00 
      STL 
      Lambert St Louis Intl 
      38.748697 
      -90.370028 
      618.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007544 
      2013 
      9 
      30 
      2211.0 
      2059 
      72.0 
      2339.0 
      2242 
      57.0 
      EV 
      ... 
      10.0 
      2013-09-30 15:00:00 
      STL 
      Lambert St Louis Intl 
      38.748697 
      -90.370028 
      618.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007545 
      2013 
      9 
      30 
      2211.0 
      2059 
      72.0 
      2339.0 
      2242 
      57.0 
      EV 
      ... 
      10.0 
      2013-09-30 15:00:00 
      STL 
      Lambert St Louis Intl 
      38.748697 
      -90.370028 
      618.0 
      -6.0 
      A 
      America/Chicago 
     
    
      1007546 
      2013 
      9 
      30 
      2231.0 
      2245 
      -14.0 
      2335.0 
      2356 
      -21.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      PWM 
      Portland Intl Jetport 
      43.646161 
      -70.309281 
      77.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007547 
      2013 
      9 
      30 
      2231.0 
      2245 
      -14.0 
      2335.0 
      2356 
      -21.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      PWM 
      Portland Intl Jetport 
      43.646161 
      -70.309281 
      77.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007548 
      2013 
      9 
      30 
      2231.0 
      2245 
      -14.0 
      2335.0 
      2356 
      -21.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      PWM 
      Portland Intl Jetport 
      43.646161 
      -70.309281 
      77.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007549 
      2013 
      9 
      30 
      2233.0 
      2113 
      80.0 
      112.0 
      30 
      42.0 
      UA 
      ... 
      10.0 
      2013-09-30 16:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      1007550 
      2013 
      9 
      30 
      2233.0 
      2113 
      80.0 
      112.0 
      30 
      42.0 
      UA 
      ... 
      10.0 
      2013-09-30 16:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      1007551 
      2013 
      9 
      30 
      2233.0 
      2113 
      80.0 
      112.0 
      30 
      42.0 
      UA 
      ... 
      10.0 
      2013-09-30 16:00:00 
      SFO 
      San Francisco Intl 
      37.618972 
      -122.374889 
      13.0 
      -8.0 
      A 
      America/Los_Angeles 
     
    
      1007552 
      2013 
      9 
      30 
      2235.0 
      2001 
      154.0 
      59.0 
      2249 
      130.0 
      B6 
      ... 
      10.0 
      2013-09-30 15:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007553 
      2013 
      9 
      30 
      2235.0 
      2001 
      154.0 
      59.0 
      2249 
      130.0 
      B6 
      ... 
      10.0 
      2013-09-30 15:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007554 
      2013 
      9 
      30 
      2235.0 
      2001 
      154.0 
      59.0 
      2249 
      130.0 
      B6 
      ... 
      10.0 
      2013-09-30 15:00:00 
      MCO 
      Orlando Intl 
      28.429394 
      -81.308994 
      96.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007555 
      2013 
      9 
      30 
      2237.0 
      2245 
      -8.0 
      2345.0 
      2353 
      -8.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BTV 
      Burlington Intl 
      44.471861 
      -73.153278 
      335.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007556 
      2013 
      9 
      30 
      2237.0 
      2245 
      -8.0 
      2345.0 
      2353 
      -8.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BTV 
      Burlington Intl 
      44.471861 
      -73.153278 
      335.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007557 
      2013 
      9 
      30 
      2237.0 
      2245 
      -8.0 
      2345.0 
      2353 
      -8.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BTV 
      Burlington Intl 
      44.471861 
      -73.153278 
      335.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007558 
      2013 
      9 
      30 
      2240.0 
      2245 
      -5.0 
      2334.0 
      2351 
      -17.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      SYR 
      Syracuse Hancock Intl 
      43.111187 
      -76.106311 
      421.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007559 
      2013 
      9 
      30 
      2240.0 
      2245 
      -5.0 
      2334.0 
      2351 
      -17.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      SYR 
      Syracuse Hancock Intl 
      43.111187 
      -76.106311 
      421.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007560 
      2013 
      9 
      30 
      2240.0 
      2245 
      -5.0 
      2334.0 
      2351 
      -17.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      SYR 
      Syracuse Hancock Intl 
      43.111187 
      -76.106311 
      421.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007561 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BUF 
      Buffalo Niagara Intl 
      42.940525 
      -78.732167 
      724.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007562 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BUF 
      Buffalo Niagara Intl 
      42.940525 
      -78.732167 
      724.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007563 
      2013 
      9 
      30 
      2240.0 
      2250 
      -10.0 
      2347.0 
      7 
      -20.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BUF 
      Buffalo Niagara Intl 
      42.940525 
      -78.732167 
      724.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007564 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      ROC 
      Greater Rochester Intl 
      43.118866 
      -77.672389 
      559.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007565 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      ROC 
      Greater Rochester Intl 
      43.118866 
      -77.672389 
      559.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007566 
      2013 
      9 
      30 
      2241.0 
      2246 
      -5.0 
      2345.0 
      1 
      -16.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      ROC 
      Greater Rochester Intl 
      43.118866 
      -77.672389 
      559.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007567 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BOS 
      General Edward Lawrence Logan Intl 
      42.364347 
      -71.005181 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007568 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BOS 
      General Edward Lawrence Logan Intl 
      42.364347 
      -71.005181 
      19.0 
      -5.0 
      A 
      America/New_York 
     
    
      1007569 
      2013 
      9 
      30 
      2307.0 
      2255 
      12.0 
      2359.0 
      2358 
      1.0 
      B6 
      ... 
      10.0 
      2013-09-30 17:00:00 
      BOS 
      General Edward Lawrence Logan Intl 
      42.364347 
      -71.005181 
      19.0 
      -5.0 
      A 
      America/New_York 
     
  
838182 rows × 38 columns
In [64]:
    
pred = 'dep_delay'
features = ['month', 'day', 'dep_time', 'arr_time', 'carrier', 'dest', 'air_time',
           'distance', 'lat', 'lon', 'alt', 'dewp', 'humid', 'wind_speed', 'wind_gust',
           'precip', 'pressure', 'visib']
features_v = df[features]
pred_v = df[pred]
how_late_is_late = 15.0
pd.options.mode.chained_assignment = None #default='warn'
# 因为航空公司不是一个数字,我们把它转化为数字哑变量
features_v['carrier'] = pd.factorize(features_v['carrier'])[0]
# dest也不是一个数字,我们也把它转为数字
features_v['dest'] = pd.factorize(features_v['dest'])[0]
scaler = StandardScaler()
scaled_features_v = scaler.fit_transform(features_v)
features_train, features_test, pred_train, pred_test = train_test_split(
    scaled_features_v, pred_v, test_size=0.30, random_state=0)
    
In [66]:
    
# 使用logistic回归来执行分类
clf_lr = sklearn.linear_model.LogisticRegression(penalty='l2',
                                                class_weight='balanced')
logistic_fit = clf_lr.fit(features_train, np.where(pred_train >= how_late_is_late, 1, 0))
predictions = clf_lr.predict(features_test)
    
In [67]:
    
# summary Report
# Confusion Matrix
cm_lr = confusion_matrix(np.where(pred_test >= how_late_is_late, 1, 0),
                        predictions)
print("Confusion Matrix")
print(pd.DataFrame(cm_lr))
# 获取精确值
report_lr = precision_recall_fscore_support(
    list(np.where(pred_test >= how_late_is_late, 1, 0)),
    list(predictions), average='binary')
#打印精度值
print("\nprecision = %0.2f, recall = %0.2f, F1 = %0.2f, accuracy = %0.2f"
     % (report_lr[0], report_lr[1], report_lr[2],
       accuracy_score(list(np.where(pred_test >= how_late_is_late, 1, 0)),
                     list(predictions))))
    
    
Confusion Matrix
        0      1
0  135181  64481
1   17910  33883
precision = 0.34, recall = 0.65, F1 = 0.45, accuracy = 0.67
In [68]:
    
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import sklearn
from sklearn.cluster import KMeans
from sklearn import linear_model, cross_validation, cluster
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_fscore_support, accuracy_score
from sklearn.cross_validation import train_test_split, cross_val_score, ShuffleSplit
from sklearn.preprocessing import StandardScaler, OneHotEncoder
flights = pd.read_csv('data/nycflights13/flights.csv.gz')
weather = pd.read_csv('data/nycflights13/weather.csv.gz')
airports = pd.read_csv('data/nycflights13/airports.csv.gz')
df_withweather = pd.merge(flights, weather, how='left', on=['year', 'month', 'day', 'hour'])
df = pd.merge(df_withweather, airports, how='left', left_on='dest', right_on='faa')
df = df.dropna()
    
In [69]:
    
pred = 'dep_delay'
features = ['month', 'day', 'dep_time', 'arr_time', 'carrier', 'dest', 'air_time',
           'distance', 'lat', 'lon', 'alt', 'dewp', 'humid', 'wind_speed', 'wind_gust',
           'precip', 'pressure', 'visib']
features_v = df[features]
pred_v = df[pred]
how_late_is_late = 15.0
pd.options.mode.chained_assignment = None #default='warn'
# 因为航空公司不是一个数字,我们把它转化为数字哑变量
features_v['carrier'] = pd.factorize(features_v['carrier'])[0]
# dest也不是一个数字,我们也把它转为数字
features_v['dest'] = pd.factorize(features_v['dest'])[0]
scaler = StandardScaler()
scaled_features_v = scaler.fit_transform(features_v)
features_train, features_test, pred_train, pred_test = train_test_split(
    scaled_features_v, pred_v, test_size=0.30, random_state=0)
    
In [71]:
    
cluster = sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', random_state=None, verbose=0)
cluster.fit(features_train)
    
    Out[71]:
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=8, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)
In [72]:
    
# 预测测试数据
result = cluster.predict(features_test)
    
In [73]:
    
result
    
    Out[73]:
array([7, 0, 7, ..., 7, 5, 0])
In [78]:
    
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
reduced_data = PCA(n_components=2).fit_transform(features_train)
kmeans = KMeans(init='k-means++', n_clusters=8, n_init=10)
kmeans.fit(reduced_data)
# mesh的步长
h = .02
x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1
y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
plt.figure(1)
plt.clf()
plt.imshow(z, interpolation='nearest',
          extend=(xx.min(), xx.max(), yy.min(), yy.max()),
          cmap=plt.cm.Paired
          #aspect='auto' 
          # origin='lower'
          )
plt.plot(reduced_data[:, 0], reduced_data[:, 1], 'k.', markersize=2)
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1],
           marker='x', s=169, linewidths=3,
           color='w', zorder=10)
plt.title('K-Means clustering on the dataset (PCA-reduced data)\n'
         'Centroids are marked with white cross')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()
    
    
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-78-605c50db6054> in <module>()
     20 plt.imshow(z, interpolation='nearest',
     21           extend=(xx.min(), xx.max(), yy.min(), yy.max()),
---> 22           cmap=plt.cm.Paired
     23           #aspect='auto'
     24           # origin='lower'
d:\app\Anaconda3\lib\site-packages\matplotlib\pyplot.py in imshow(X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, shape, filternorm, filterrad, imlim, resample, url, hold, data, **kwargs)
   3078                         filternorm=filternorm, filterrad=filterrad,
   3079                         imlim=imlim, resample=resample, url=url, data=data,
-> 3080                         **kwargs)
   3081     finally:
   3082         ax._hold = washold
d:\app\Anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
   1708                     warnings.warn(msg % (label_namer, func.__name__),
   1709                                   RuntimeWarning, stacklevel=2)
-> 1710             return func(ax, *args, **kwargs)
   1711         pre_doc = inner.__doc__
   1712         if pre_doc is None:
d:\app\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in imshow(self, X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, shape, filternorm, filterrad, imlim, resample, url, **kwargs)
   5190         im = mimage.AxesImage(self, cmap, norm, interpolation, origin, extent,
   5191                               filternorm=filternorm, filterrad=filterrad,
-> 5192                               resample=resample, **kwargs)
   5193 
   5194         im.set_data(X)
d:\app\Anaconda3\lib\site-packages\matplotlib\image.py in __init__(self, ax, cmap, norm, interpolation, origin, extent, filternorm, filterrad, resample, **kwargs)
    755             filterrad=filterrad,
    756             resample=resample,
--> 757             **kwargs
    758         )
    759 
d:\app\Anaconda3\lib\site-packages\matplotlib\image.py in __init__(self, ax, cmap, norm, interpolation, origin, filternorm, filterrad, resample, **kwargs)
    233         self._imcache = None
    234 
--> 235         self.update(kwargs)
    236 
    237     def __getstate__(self):
d:\app\Anaconda3\lib\site-packages\matplotlib\artist.py in update(self, props)
    845         try:
    846             ret = [_update_property(self, k, v)
--> 847                    for k, v in props.items()]
    848         finally:
    849             self.eventson = store
d:\app\Anaconda3\lib\site-packages\matplotlib\artist.py in <listcomp>(.0)
    845         try:
    846             ret = [_update_property(self, k, v)
--> 847                    for k, v in props.items()]
    848         finally:
    849             self.eventson = store
d:\app\Anaconda3\lib\site-packages\matplotlib\artist.py in _update_property(self, k, v)
    838                 func = getattr(self, 'set_' + k, None)
    839                 if not callable(func):
--> 840                     raise AttributeError('Unknown property %s' % k)
    841                 return func(v)
    842 
AttributeError: Unknown property extend
    
 
扩展我们的算法:有时我们需要处理大量数据,并且采样已经无效,这个时候可以通过把数据分到多个机器来处理。
Spark是一个用来并行进行大数据处理的API。它将数据切割到集群来处理。在开发阶段,我们可以只在本地运行。
我们使用PySpark Shell来连接到集群。
运行下面路径的pyspark,会启动PySpark Shell
~/spark/bin/pyspark (Max/Linux)
C:\spark\bin\pyspark (Windows)
此时,可以在Shell中运行文件加载:
lines = sc.textFile("README.md")
lines.first() # 加载第一行
可以在http://localhost:4040查看PySpark运行的Job
大多数情况下,我们希望能够在Jupyter Notebook中运行PySpark,为此,我们需要设置环境变量:
PYSPARK_PYTHON=python3
PYSPARK_DRIVER_PYTHON="jupyter"
PYSPARK_DRIVER_PYTHON_OPTS="notebook"
然后运行~/spark/bin/pyspark,最后一个命令会启动一个jupyter server,样子跟我们用的一样。
In [1]:
    
lines = sc.text('README.md')
lines.take(5)
    
    
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-2f627e434df0> in <module>()
----> 1 lines = sc.text('README.md')
      2 lines.take(5)
NameError: name 'sc' is not defined
我们看看http://localhost:4040 可以查看运行的Job
In [2]:
    
linesWithSpark = lines.filter(lambda line: 'spark' in line)
linesWithSpark.count()
    
    
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-a9d79c569bce> in <module>()
----> 1 linesWithSpark = lines.filter(lambda line: 'spark' in line)
      2 linesWithSpark.count()
NameError: name 'lines' is not defined
Spark的基本类型是RDD(resilient distributed dataset),它是基本分布式数据类型。RDD有两类操作,第一个是变换(transformation),返回值仍然是RDD,另外一种是动作(action),用来计算结果。Spark的操作是Lazy的,也就是说只有在执行action时才会真正的开始处理。
In [ ]:
    
    
Content source: liulixiang1988/documents
Similar notebooks: