Checking the actual data

Check whether the estimates reproduce the actual data well.

The quantity being simulated is each candidate's vote share in each municipality.

In the actual data, columns 3-6 of data hold the vote counts for each candidate (in the order clark, dean, edwards, kerry).
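A minimal sketch of how municipality-level vote shares are formed from such a vote matrix, written in Julia 1.x syntax (the notebook itself runs on Julia 0.5). The counts and the denominator `rtot` are made-up toy values standing in for Votes and RTOT; the denominator is floored at the observed vote total, the same guard the first plotting cell applies with `max`.

```julia
# Toy vote counts: rows = municipalities, columns = clark, dean, edwards, kerry
votes = [10.0 20.0 30.0 40.0;
          5.0  5.0  5.0  5.0]

# Hypothetical denominator standing in for RTOT; the second value (15) is
# deliberately below that municipality's observed total of 20 votes
rtot = [200.0, 15.0]

# Floor the denominator at the observed vote total so no share exceeds 1
# (the notebook writes this as max(RTot_s, sum(..., 2)) in Julia 0.5)
denom = max.(rtot, vec(sum(votes, dims=2)))

# Divide each row by its denominator; the vector broadcasts across the 4 columns
shares = votes ./ denom
```

Without the floor, the second municipality would get shares of 5/15 per candidate and a total above 1.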

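There are three ways of presenting the data:

1. For each state, plot the municipalities in six scatter plots, one per pair of candidates, with the two candidates' vote shares on the axes.
2. For each state, aggregate the municipality results and plot each candidate's vote share as a time series.
3. For each state, plot the municipalities in a scatter plot whose vertical axis is turnout (votes summed over all candidates), with marker size showing a second indicator. Three versions are drawn: x-axis = white/non-white ratio, x-axis = overba/other ratio, and x-axis = share in the lowest income bracket (20,000 or below), each with marker size = kerry's vote share. Each version puts 7 × 5 = 35 panels into a single image.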
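The six panels of the first view are all unordered pairs of the four candidates. A small sketch (Julia 1.x) that enumerates them with the same comprehension the plotting cell uses; `cand_names` is a hypothetical name chosen here to avoid shadowing `Base.names`.

```julia
cand_names = ["clark", "dean", "edwards", "kerry"]

# All pairs (i, j) with i < j: 4 choose 2 = 6 pairs
combination = [(i, i + j) for i in 1:3 for j in 1:(4 - i)]

# Panel titles in the same "cand1 vs cand2" form as the plotting cell
labels = ["$(cand_names[a]) vs $(cand_names[b])" for (a, b) in combination]
```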

Here we plot the actual data in these three ways.



In [1]:

    
using Optim
using StatsFuns
using DataFrames
using Gadfly
using PyPlot

# Extend the inverse beta CDF (betainvcdf) to arrays, applied elementwise
import StatsFuns.betainvcdf
betainvcdf(alpha::Number, beta::Number, x::Array) = reshape([betainvcdf(alpha, beta, i) for i in reshape(x, 1, size(x, 1)*size(x, 2)) ], size(x, 1), size(x, 2))

# Extend max so a number can be compared with an array of numeric strings (e.g. Array{String,1})
# The values arrive in a generic Array, so the second argument is typed Any;
# each element is parsed as Int before taking the max
import Base.max
max(number::Real, comparison::Any) = [max(number, parse(Int, i)) for i in comparison] 

# Extend normpdf to matrices, applied elementwise
import StatsFuns.normpdf
normpdf(array::Array{Float64, 2}) = reshape([normpdf(i) for i in reshape(array, 1, size(array, 1)*size(array, 2))], size(array, 1), size(array, 2))

# Builds Cand
include("bayes.jl");
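Since Julia 0.6, the dot syntax applies any scalar function elementwise and keeps the array's shape, so with StatsFuns loaded, `betainvcdf.(alpha, beta, x)` and `normpdf.(array)` would do what the reshape-based extensions above do, without adding methods to another package's functions. The sketch below checks the equivalence with a hypothetical stand-in function `f`, so it runs without StatsFuns.

```julia
# Hypothetical stand-in for a scalar function such as betainvcdf(alpha, beta, p)
f(p) = p^2 + 1

x = [0.1 0.2 0.3;
     0.4 0.5 0.6]

# Pattern from the setup cell: flatten to a 1×n row, apply f, reshape back
y_reshape = reshape([f(i) for i in reshape(x, 1, size(x, 1) * size(x, 2))],
                    size(x, 1), size(x, 2))

# Julia 0.6+ dot syntax: elementwise call that keeps the input's shape
y_dot = f.(x)
```

Both give the same 2×3 matrix; the dot form also works unchanged for vectors and higher-dimensional arrays, where the reshape pattern returns an n×1 matrix or fails.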

Plot 1: pairwise vote-share scatter plots for each state



In [10]:

    
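# Plots use Votes_s, the vote shares within state S
plt = PyPlot
names = ["clark", "dean", "edwards", "kerry"]
DeltaO  = 0.6891
DeltaMO = 0.5366
# All 6 candidate pairs (i, j) with i < j
combination = [(i, i+j) for i in 1:3 for j in 1:(4-i)]

# The denominator does not depend on S, so compute it once, outside the loop
# (at top level it is then a global, so later cells can also use RTOT)
DELTA = DeltaO*Open+DeltaMO*MOpen
RTOT = RDemHat.*(1+DELTA)-VOther

for S in 1:size(Cand,1)

    # Municipality rows of state S run from Cand[S, 14] to Cand[S, 15]
    RTot_s = RTOT[Cand[S, 14]:Cand[S, 15], :]
    # Floor the denominator at the observed vote total so no share exceeds 1
    RTot_s = max(RTot_s, sum(Votes[Cand[S, 14]:Cand[S, 15], :], 2))
    Votes_s = Votes[Cand[S, 14]:Cand[S, 15], :]./(RTot_s*ones(1,4))

    num_rows, num_cols = 2, 3
    fig, axes = subplots(num_rows, num_cols, figsize=(12, 8))
    axes = vec(axes)

    # For each pair: cand1 on the x-axis, cand2 on the y-axis
    for (n,c) in enumerate(combination)
        cand1 = names[c[1]]
        cand2 = names[c[2]]
        ax = axes[n]
        ax[:scatter](Votes_s[:, c[1]], Votes_s[:, c[2]], s = 3)
        ax[:set_title]("$cand1 vs $cand2")
        ax[:set_xlabel](cand1)
        ax[:set_ylabel](cand2)
    end
    # Save once per state (not once per panel) and close the figure to free memory
    savefig("state_number_$S")
    plt.close(fig)
end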
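The cells in this notebook pick out one state's municipalities with the row range `Cand[S, 14]:Cand[S, 15]`. A minimal sketch of that pattern on a hypothetical toy matrix (Julia 1.x), assuming, as the code implies, that column 14 holds the first and column 15 the last municipality row of each state and that each state's rows form one contiguous block.

```julia
# Hypothetical Cand-like matrix with 3 states; only columns 14 and 15 are
# filled, assumed to hold the first and last municipality row of each state
cand = zeros(Int, 3, 15)
cand[:, 14] = [1, 4, 6]
cand[:, 15] = [3, 5, 9]

# One made-up value per municipality (9 municipalities in total)
muni_values = collect(1.0:9.0)

# Row range of state S, as in Votes[Cand[S, 14]:Cand[S, 15], :]
state_rows(C, S) = C[S, 14]:C[S, 15]

# Aggregate each state's municipalities
state_totals = [sum(muni_values[state_rows(cand, S)]) for S in 1:size(cand, 1)]
```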

Plot 2: state-level vote shares as a time series



In [35]:

    
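# State-level vote share: votes summed over the state's municipalities / summed RTOT
# RTOT is recomputed here so this cell does not depend on cell execution order
DELTA = DeltaO*Open+DeltaMO*MOpen
RTOT = RDemHat.*(1+DELTA)-VOther

shares = zeros(size(Cand,1), 4)

for S in 1:size(Cand,1)
    shares[S,:] = sum(Votes[Cand[S, 14]:Cand[S, 15], :], 1)./ sum(RTOT[Cand[S, 14]:Cand[S, 15], :])
end

# One line per candidate; the x-axis is the state index S (row order of Cand)
fig, ax = subplots()
for i in 1:4
    ax[:plot](shares[:, i], linewidth=2, alpha=0.6, label=names[i])
end
ax[:set_xlabel]("state index S")
ax[:set_ylabel]("vote share")
ax[:legend]()
savefig("vote_share_states")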
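Dividing summed votes by summed RTOT makes the state-level share a weighted average of the municipality shares, with weights proportional to each municipality's RTOT, rather than a plain mean. A toy check in Julia 1.x with made-up numbers:

```julia
# Toy data: 2 municipalities (rows) × 2 candidates (columns)
votes = [10.0 30.0;
         20.0 20.0]
rtot = [100.0, 50.0]   # hypothetical denominators standing in for RTOT

# State-level share as in the cell above: summed votes / summed denominator
state_share = vec(sum(votes, dims=1)) ./ sum(rtot)

# The same quantity as a weighted average of municipality shares
muni_share = votes ./ rtot      # each row divided by its own denominator
weights = rtot ./ sum(rtot)     # weights proportional to the denominator
weighted = vec(sum(muni_share .* weights, dims=1))
```

Here the unweighted mean of the first candidate's municipality shares would be 0.25 rather than 0.2, so large municipalities dominate each state's line.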

Plot 3: turnout against demographics, with marker size showing kerry's vote share



In [12]:

    
# Racial composition shares
# Columns in the order (white, black, other = asian + indian + other)
dFXRace

    Out[12]:

3020×3 Array{Float64,2}:
 0.810272  0.173388   0.0163399
 0.882093  0.102615   0.0152917
 0.517332  0.46799    0.0146778
 0.769208  0.223692   0.0070991
 0.952964  0.0111547  0.0358813
 0.265601  0.725804   0.0085954
 0.58388   0.412795   0.0033251
 0.800173  0.184122   0.015705 
 0.611017  0.384757   0.0042266
 0.937479  0.0539249  0.0085961
 0.873385  0.109045   0.0175699
 0.555612  0.438388   0.0060002
 0.562471  0.430986   0.0065437
 ⋮                             
 0.414302  0.509073   0.0583763
 0.621227  0.309961   0.0562546
 0.877533  0.0755726  0.0309398
 0.343456  0.621993   0.0225511
 0.886961  0.065876   0.0321495
 0.47327   0.459132   0.0445664
 0.610868  0.263145   0.10883  
 0.723443  0.205992   0.0535477
 0.48229   0.46916    0.0287374
 0.66593   0.313042   0.008218 
 0.317782  0.657613   0.0169072
 0.748245  0.190217   0.0424863
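The race plot divides the white share by the sum of the other two columns. Because each row of dFXRace sums to 1, that denominator equals 1 minus the white share, so the x-axis is the white odds w/(1 - w). A sketch in Julia 1.x using the first two printed rows; the printout is rounded, so the two forms agree only to about five significant digits.

```julia
# First two rows of dFXRace as printed above: (white, black, other)
race = [0.810272 0.173388 0.0163399;
        0.882093 0.102615 0.0152917]

# x-axis of the race plot: white / (black + other)
whiterate = race[:, 1] ./ vec(sum(race[:, 2:3], dims=2))

# Rows sum to 1, so the denominator is 1 - white, i.e. the odds w / (1 - w)
whiterate_odds = race[:, 1] ./ (1 .- race[:, 1])
```

The odds grow without bound as w approaches 1, so nearly all-white municipalities will sit far to the right of each panel.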



In [17]:

    
# Education level shares
# Columns in the order (overba, underba, hs, other)
dFXEduc

    Out[17]:

3020×4 Array{Real,2}:
 0.180217   0.268694  0.338251  0.212838
 0.230663   0.29349   0.296094  0.179752
 0.109441   0.213008  0.32409   0.353461
 0.0710487  0.203545  0.357312  0.368095
 0.0959884  0.248383  0.360097  0.295531
 0.0774108  0.175033  0.352312  0.395244
 0.10409    0.228518  0.344955  0.322438
 0.152199   0.264487  0.322313  0.261001
 0.0954811  0.225211  0.320978  0.35833 
 0.0972363  0.189361  0.348588  0.364814
 0.0993746  0.204888  0.357656  0.338082
 0.0958463  0.206453  0.347242  0.350459
 0.120721   0.21088   0.376624  0.291775
 ⋮                                      
 0.0863374  0.194184  0.394842  0.324636
 0.192754   0.238743  0.319843  0.24866 
 0.214133   0.271058  0.326819  0.18799 
 0.16714    0.231805  0.37292   0.228135
 0.235483   0.270963  0.307874  0.18568 
 0.364959   0.291038  0.227864  0.11614 
 0.125606   0.233087  0.392332  0.248974
 0.199241   0.286545  0.313294  0.200921
 0.173905   0.309734  0.333852  0.182509
 0.129095   0.275985  0.350947  0.243973
 0.112691   0.212865  0.457087  0.217357
 0.269478   0.302081  0.287428  0.141013
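dFXEduc (and dFXIncm below) print as Array{Real,2}, an abstract element type, so the elements are stored as references to boxed values and arithmetic on them cannot be specialized. A sketch in Julia 1.x of converting once to Float64 before computing the overba ratio used in the education plot, using the first two printed rows; converting the real dFXEduc this way is a suggestion, not something the notebook does.

```julia
# First two rows of dFXEduc as printed above, kept as an abstractly typed
# Real matrix to mirror the Array{Real,2} printout
educ = Real[0.180217 0.268694 0.338251 0.212838;
            0.230663 0.29349  0.296094 0.179752]

# Convert once to a concrete element type before doing arithmetic
educ64 = Float64.(educ)

# x-axis of the education plot: overba / (underba + hs + other)
educrate = educ64[:, 1] ./ vec(sum(educ64[:, 2:4], dims=2))
```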



In [18]:

    
# The rows sum to 1, so this too gives the share of each income bracket
# Brackets in the order [20000; 35000; 72500; 120000]
dFXIncm

    Out[18]:

3020×4 Array{Real,2}:
 0.28694   0.245429  0.391685  0.0759455
 0.289652  0.269438  0.345979  0.094931 
 0.49837   0.22431   0.226994  0.050326 
 0.412163  0.264256  0.288365  0.035216 
 0.335143  0.287631  0.327103  0.0501228
 0.573504  0.230403  0.170298  0.0257951
 0.503935  0.250179  0.198068  0.0478179
 0.399053  0.268797  0.272874  0.0592772
 0.430113  0.280303  0.256377  0.0332074
 0.406256  0.291521  0.265795  0.0364272
 0.371185  0.283235  0.305959  0.0396202
 0.50331   0.224149  0.223045  0.0494956
 0.465877  0.21724   0.267155  0.0497274
 ⋮                                      
 0.456798  0.256552  0.238227  0.0484234
 0.301206  0.237067  0.328543  0.133184 
 0.289636  0.239351  0.338315  0.132698 
 0.39833   0.24144   0.273866  0.0863643
 0.295791  0.23883   0.351934  0.113445 
 0.249336  0.223871  0.34975   0.177043 
 0.310825  0.249744  0.330599  0.108833 
 0.293871  0.229433  0.344183  0.132513 
 0.327208  0.236643  0.342856  0.0932933
 0.390385  0.244424  0.294482  0.0707088
 0.509523  0.178897  0.253723  0.0578573
 0.225625  0.207086  0.369235  0.198054
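The comment above states that each row of dFXIncm sums to 1. A minimal sketch (Julia 1.x) of checking that on the first two printed rows, with a loose tolerance because the printout is rounded, and of extracting the lowest-bracket column used as the x-axis of the income plot:

```julia
# First two rows of dFXIncm as printed above
# (brackets in the order 20000, 35000, 72500, 120000)
incm = [0.28694  0.245429 0.391685 0.0759455;
        0.289652 0.269438 0.345979 0.094931]

# Each row should sum to 1; the printout is rounded, so use a loose tolerance
row_sums = vec(sum(incm, dims=2))
is_share_matrix = all(abs.(row_sums .- 1) .< 1e-5)

# x-axis of the income plot: share in the lowest (20000) bracket
lowincome = incm[:, 1]
```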



In [13]:

    
# Race
plt = PyPlot
names = ["clark", "dean", "edwards", "kerry"]
DeltaO  = 0.6891
DeltaMO = 0.5366
DELTA = DeltaO*Open+DeltaMO*MOpen
RTOT = RDemHat.*(1+DELTA)-VOther

voterate = sum(Votes, 2)./RTOT                          # y-axis: turnout (all candidates' votes / RTOT)
whiterate = dFXRace[:, 1]./sum(dFXRace[:,2:3],2)        # x-axis: white / (black + other)
kerry = Votes[:, 4]./RTOT                               # marker size: kerry's vote share

num_rows, num_cols = 7, 5
fig, axes = subplots(num_rows, num_cols, figsize=(12, 20))
axes = vec(axes)   # flatten to a vector of 35 axes (column-major order)

for S in 1:size(Cand,1)

    # Municipality rows of state S
    voterate_s = voterate[Cand[S, 14]:Cand[S, 15], :]
    whiterate_s = whiterate[Cand[S, 14]:Cand[S, 15], :]
    kerry_s = kerry[Cand[S, 14]:Cand[S, 15], :]

    ax = axes[S]
    ax[:scatter](whiterate_s, voterate_s, s = 50*kerry_s, alpha = 0.5)
    ax[:set_title]("state_number_$S")

end
# PyPlot passes matplotlib's pyplot functions (here tight_layout) straight through
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
savefig("race_plot")

WARNING: the no-op `transpose` fallback is deprecated, and no more specific `transpose` method for Ptr{PyCall.PyObject_struct} exists. Consider `permutedims(x, [2, 1])` or writing a specific `transpose(x::Ptr{PyCall.PyObject_struct})` method if appropriate.
 in depwarn(::String, ::Symbol) at ./deprecated.jl:64
 in transpose(::Ptr{PyCall.PyObject_struct}) at ./deprecated.jl:770
 in transpose_f!(::Base.#transpose, ::Array{Ptr{PyCall.PyObject_struct},2}, ::Array{Ptr{PyCall.PyObject_struct},2}) at ./arraymath.jl:369
 in transpose(::Array{Ptr{PyCall.PyObject_struct},2}) at ./arraymath.jl:407
 in copy(::PyCall.PyArray{Ptr{PyCall.PyObject_struct},2}) at /Users/susu/.julia/v0.5/PyCall/src/numpy.jl:337
 in convert(::Type{Array{Ptr{PyCall.PyObject_struct},N}}, ::PyCall.PyObject) at /Users/susu/.julia/v0.5/PyCall/src/numpy.jl:453
 in convert(::Type{Array{PyCall.PyObject,N}}, ::PyCall.PyObject) at /Users/susu/.julia/v0.5/PyCall/src/numpy.jl:484
 in (::PyCall.##8#9{DataType,PyCall.PyObject})(::Int64) at /Users/susu/.julia/v0.5/PyCall/src/conversions.jl:180
 in ntuple(::PyCall.##8#9{DataType,PyCall.PyObject}, ::Int64) at ./tuple.jl:65
 in convert(::Type{Tuple{PyPlot.Figure,Array{PyCall.PyObject,N}}}, ::PyCall.PyObject) at /Users/susu/.julia/v0.5/PyCall/src/conversions.jl:180
 in convert(::Type{PyCall.PyAny}, ::PyCall.PyObject) at /Users/susu/.julia/v0.5/PyCall/src/conversions.jl:806
 in #pycall#66(::Array{Any,1}, ::Function, ::PyCall.PyObject, ::Type{PyCall.PyAny}, ::Int64, ::Vararg{Int64,N}) at /Users/susu/.julia/v0.5/PyCall/src/PyCall.jl:568
 in (::PyCall.#kw##pycall)(::Array{Any,1}, ::PyCall.#pycall, ::PyCall.PyObject, ::Type{PyCall.PyAny}, ::Int64, ::Vararg{Int64,N}) at ./<missing>:0
 in #subplots#115(::Array{Any,1}, ::Function, ::Int64, ::Vararg{Int64,N}) at /Users/susu/.julia/v0.5/PyPlot/src/PyPlot.jl:399
 in (::PyPlot.#kw##subplots)(::Array{Any,1}, ::PyPlot.#subplots, ::Int64, ::Vararg{Int64,N}) at ./<missing>:0
 in include_string(::String, ::String) at ./loading.jl:441
 in execute_request(::ZMQ.Socket, ::IJulia.Msg) at /Users/susu/.julia/v0.5/IJulia/src/execute_request.jl:169
 in eventloop(::ZMQ.Socket) at /Users/susu/.julia/v0.5/IJulia/src/eventloop.jl:8
 in (::IJulia.##9#15)() at ./task.jl:360
while loading In[13], in expression starting on line 14
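Which panel a state lands in follows from `vec(axes)`: Julia flattens matrices column by column. Assuming PyCall returns the axes as a 7×5 Julia matrix indexed [row, col] (the transpose in the warning above is PyCall converting numpy's row-major layout), state S is drawn at the position computed below, so states 1 to 7 fill the first column top to bottom. A sketch in Julia 1.x:

```julia
num_rows, num_cols = 7, 5

# vec() flattens a matrix column by column, so the S-th entry of vec(axes)
# sits at this (row, column) of the 7×5 grid
panel_position(S) = Tuple(CartesianIndices((num_rows, num_cols))[S])
```

If row-by-row order is preferred, flattening `vec(permutedims(axes, [2, 1]))` instead would place states 1 to 5 across the first row.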



In [43]:

    
# Education
plt = PyPlot
names = ["clark", "dean", "edwards", "kerry"]
DeltaO  = 0.6891
DeltaMO = 0.5366
DELTA = DeltaO*Open+DeltaMO*MOpen
RTOT = RDemHat.*(1+DELTA)-VOther

voterate = sum(Votes, 2)./RTOT                          # y-axis: turnout (all candidates' votes / RTOT)
educrate = dFXEduc[:, 1]./sum(dFXEduc[:,2:4],2)         # x-axis: overba / (underba + hs + other)
kerry = Votes[:, 4]./RTOT                               # marker size: kerry's vote share

num_rows, num_cols = 7, 5
fig, axes = subplots(num_rows, num_cols, figsize=(12, 20))
axes = vec(axes)   # flatten to a vector of 35 axes (column-major order)

for S in 1:size(Cand,1)

    # Municipality rows of state S
    voterate_s = voterate[Cand[S, 14]:Cand[S, 15], :]
    educrate_s = educrate[Cand[S, 14]:Cand[S, 15], :]
    kerry_s = kerry[Cand[S, 14]:Cand[S, 15], :]

    ax = axes[S]
    ax[:scatter](educrate_s, voterate_s, s = 50*kerry_s, alpha = 0.5)
    ax[:set_title]("state_number_$S")

end
# PyPlot passes matplotlib's pyplot functions (here tight_layout) straight through
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
savefig("education_plot")



In [44]:

    
# Income
plt = PyPlot
names = ["clark", "dean", "edwards", "kerry"]
DeltaO  = 0.6891
DeltaMO = 0.5366
DELTA = DeltaO*Open+DeltaMO*MOpen
RTOT = RDemHat.*(1+DELTA)-VOther

voterate = sum(Votes, 2)./RTOT      # y-axis: turnout (all candidates' votes / RTOT)
lowincome = dFXIncm[:, 1]           # x-axis: share in the lowest income bracket (20000)
kerry = Votes[:, 4]./RTOT           # marker size: kerry's vote share

num_rows, num_cols = 7, 5
fig, axes = subplots(num_rows, num_cols, figsize=(12, 20))
axes = vec(axes)   # flatten to a vector of 35 axes (column-major order)

for S in 1:size(Cand,1)

    # Municipality rows of state S
    voterate_s = voterate[Cand[S, 14]:Cand[S, 15], :]
    lowincome_s = lowincome[Cand[S, 14]:Cand[S, 15], :]
    kerry_s = kerry[Cand[S, 14]:Cand[S, 15], :]

    ax = axes[S]
    ax[:scatter](lowincome_s, voterate_s, s = 50*kerry_s, alpha = 0.5)
    ax[:set_title]("state_number_$S")

end
# PyPlot passes matplotlib's pyplot functions (here tight_layout) straight through
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
savefig("lowincome_plot")
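The three bubble-plot cells differ only in the x-axis variable and the output file name. A sketch of factoring the per-state data preparation into one function, in Julia 1.x and with plotting left out so it runs stand-alone; `state_panels` and the toy inputs are hypothetical, and columns 14 and 15 of the Cand-like matrix are assumed to give each state's first and last municipality row. Each returned tuple holds the x values, the turnout values and the marker areas (50 times kerry's share, as in the cells above) for one state.

```julia
# Hypothetical helper: per-state (x, turnout, marker area) for one bubble plot
function state_panels(x, voterate, kerry, cand)
    map(1:size(cand, 1)) do S
        rows = cand[S, 14]:cand[S, 15]          # municipality rows of state S
        (x[rows], voterate[rows], 50 .* kerry[rows])
    end
end

# Toy data: 2 states, 5 municipalities
cand = zeros(Int, 2, 15)
cand[:, 14] = [1, 3]
cand[:, 15] = [2, 5]
x        = [0.1, 0.2, 0.3, 0.4, 0.5]
voterate = [0.5, 0.6, 0.7, 0.8, 0.9]
kerry    = [0.2, 0.4, 0.1, 0.3, 0.5]

panels = state_panels(x, voterate, kerry, cand)
```

Each plotting cell would then loop over `panels` and pass the three components to `scatter`, so only the x-axis vector and file name change between race, education and income.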