In [1]:
import pandas as pd
import getpass, os
os.environ['PSQL_USER']='dengueadmin'
os.environ['PSQL_HOST']='localhost'
os.environ['PSQL_DB']='dengue'
os.environ['PSQL_PASSWORD']=getpass.getpass("Enter the database password: ")
In [2]:
os.chdir('..')
from infodenguepredict.data.infodengue import get_temperature_data, get_alerta_table, get_tweet_data
%pylab inline
In [3]:
A = get_alerta_table(3304557)  # 3304557 = Rio de Janeiro geocode; (3303500) is an alternative geocode
T = get_temperature_data(3304557)  # (3303500)
Tw = get_tweet_data(3304557)  # (3303500)
Let's look at the tables.
In [4]:
A.head()
Out[4]:
In [5]:
T = T[~T.index.duplicated()]  # drop rows with duplicated timestamps before resampling
T.to_csv('temperature_rio.csv', header=True, sep=',')
T.head()
Out[5]:
In [6]:
Tw = Tw[~Tw.index.duplicated()]  # drop rows with duplicated timestamps
Tw.head()
Out[6]:
Let's try to join the tables by date. To align them, we must downsample each one to a weekly frequency.
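The 'W-SUN' rule groups observations into weeks ending on Sunday, which presumably matches the epidemiological-week index of the alerta table. The mean is the natural weekly aggregate for a continuous variable like temperature, while the sum is natural for counts like tweets. A minimal sketch with made-up daily values:
daily = pd.Series(range(14), index=pd.date_range('2017-01-02', periods=14, freq='D'))  # synthetic data
daily.resample('W-SUN').mean()  # weekly average, as used for temperature below
daily.resample('W-SUN').sum()   # weekly total, as used for tweet counts below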
In [7]:
T.resample('W-SUN').mean().tail()
Out[7]:
In [8]:
Full = A.join(T.resample('W-SUN').mean()).join(Tw.resample('W-SUN').sum())
Full.head()
Out[8]:
Note that for the oldest dates the missing temperature and tweet values were filled with NaN. We can drop those dates, leaving a table with no missing data, but we then lose more than two years of observations.
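Before dropping anything, it can be worth quantifying the loss; a quick sketch using the Full table built above:
Full.isnull().sum()  # number of missing values per column
Full[Full.isnull().any(axis=1)].index.max()  # most recent date that still has a gap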
In [9]:
Short = Full.dropna()
Short.head()
Out[9]:
In [10]:
Short[['casos_est', 'temp_min', 'umid_min', 'numero']].plot(subplots=True, figsize=(15,10),grid=True);
In [12]:
from infodenguepredict.models import sarimax,GAS,GASX
import statsmodels.api as sm
In [ ]:
fig, axes = plt.subplots(1, 2, figsize=(15, 4))
fig = sm.graphics.tsa.plot_acf(Full['casos'].iloc[1:], lags=52, ax=axes[0])
fig = sm.graphics.tsa.plot_pacf(Full['casos'].iloc[1:], lags=52, ax=axes[1])
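With weekly data, 52 lags cover a full year, so any annual seasonality in the case counts should show up as significant autocorrelation around lag 52.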
In [13]:
# Short.casos = Short.casos.apply(np.log)
model_1 = sarimax.build_model(Full, 'casos', [])
In [14]:
fit_1 = model_1.fit()
In [15]:
fit_1.summary()
Out[15]:
In [16]:
def plot_pred(fit):
    plt.figure(figsize=(10, 7))
    # In-sample one-step-ahead predictions
    predict = fit.get_prediction(start='2017-01-01', dynamic=False)
    predict_ci = predict.conf_int()
    Full.casos.plot(style='o', label='obs')
    predict.predicted_mean.plot(style='r--', label='In sample')
    plt.fill_between(predict_ci.index, predict_ci.iloc[:, 0], predict_ci.iloc[:, 1], color='r', alpha=0.1)
    # Out-of-sample forecast
    forecast = fit.get_prediction(start='2017-03-05', end='2017-06-21', dynamic=False)
    forecast_ci = forecast.conf_int()
    forecast.predicted_mean.plot(style='b--', label='Out of Sample')
    plt.fill_between(forecast_ci.index, forecast_ci.iloc[:, 0], forecast_ci.iloc[:, 1], color='b', alpha=0.1)
    plt.legend(loc=0)

plot_pred(fit_1)
In [17]:
model_2 = GAS.build_model(Full, ar=2, sc=6, target='casos')
fit_2 = model_2.fit()
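GAS stands for Generalized Autoregressive Score models. Assuming the models module wraps pyflux (as the ar/sc signature suggests), ar sets the number of autoregressive lags and sc the number of score-function lags included in the model.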
In [18]:
fit_2.summary()
In [19]:
model_2.plot_fit()
plt.savefig('GAS_in_sample.png')
Full.casos.plot(style='ko')
model_2.plot_predict(h=10, past_values=52)
In [20]:
model_2.plot_z(figsize=(15,5))
In [38]:
plt.figure()
ax = plt.gca()
# Hold out everything from 2015 on, so the forecast is genuinely out of sample
train = Full.loc[Full.index < '2015-01-01']
model_3 = GAS.build_model(train, ar=2, sc=6, target='casos')
fit_3 = model_3.fit()
Full.casos.plot(style='ko', ax=ax, figsize=(15,10))
model_3.plot_predict(h=10, past_values=20, ax=ax, intervals=True, figsize=(15,10))
In [34]:
model_4 = GASX.build_model(Full.dropna(), ar=4, sc=6, formula='casos~1+temp_min')
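The formula follows patsy-style syntax: casos is regressed on an intercept (the 1) and on temp_min as an exogenous covariate on top of the GAS dynamics, which is why the rows with missing temperature must be dropped first.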
In [35]:
fit_4 = model_4.fit()
In [36]:
fit_4.summary()
In [37]:
model_4.plot_fit()
In [26]:
model_4.plot_predict(h=10, past_values=15)
In [46]:
rio = get_alerta_table(state='RJ')
In [47]:
rio.head()
Out[47]:
Let's keep only the columns we want to use
In [48]:
for col in ['casos_est_min', 'casos_est_max', 'Localidade_id', 'versao_modelo', 'municipio_nome']:
del rio[col]
In [49]:
rio.head()
Out[49]:
In [50]:
riopiv = rio.pivot(columns='municipio_geocodigo')  # keep the date index; spread each column per municipality
In [51]:
riopiv.head()
Out[51]:
In [52]:
riopiv['SE'].head()
Out[52]:
Now we have a multi-level column index. It may be preferable to flatten it.
In [53]:
riopiv.columns = ['{}_{}'.format(*col).strip() for col in riopiv.columns.values]
riopiv.head()
Out[53]:
In [54]:
riopiv.shape
Out[54]: