US Chronic Disease Indicators


Timothy Helton


The original dataset for this project is provided by the U.S. Centers for Disease Control and Prevention (CDC).

License: Open Database License (ODbL)



NOTE:
This notebook uses code from the k2datascience.chronic_disease module. To execute all the cells, do one of the following:

  • Install the k2datascience package into the active Python environment.
  • Add k2datascience/k2datascience to the PYTHONPATH environment variable.
  • Create a link to the chronic_disease.py file in the same directory as this notebook.
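
As an alternative to setting the environment variable, the search path can also be extended from inside the notebook before the import cell runs. The following is a minimal sketch, not part of the original notebook; the helper name add_to_sys_path and the relative path are hypothetical and must be adapted to wherever the package actually lives.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Prepend `directory` to sys.path once and return its resolved path."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Hypothetical location: replace with the directory that holds the package.
package_parent = add_to_sys_path(Path("..") / "k2datascience")
```

The change only lasts for the current kernel session, so it has to run again after a kernel restart.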


Imports


In [1]:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from k2datascience import chronic_disease

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
%matplotlib inline

Load Data

Create CDI Instance


In [2]:
cdi = chronic_disease.CDI()
for column in chronic_disease.data_dtype:
    print(column)


YearStart
YearEnd
LocationAbbr
LocationDesc
DataSource
Topic
Question
Response
DataValueUnit
DataValueTypeID
DataValueType
DataValue
DataValueAlt
DataValueFootnoteSymbol
DatavalueFootnote
LowConfidenceLimit
HighConfidenceLimit
StratificationCategory1
Stratification1
StratificationCategory2
Stratification2
StratificationCategory3
Stratification3
GeoLocation
TopicID
QuestionID
ResponseID
LocationID
StratificationCategoryID1
StratificationID1
StratificationCategoryID2
StratificationID2
StratificationCategoryID3
StratificationID3

In [3]:
cdi.load_data()
cdi.data.shape
cdi.data.iloc[0]  # first row; .ix was removed in pandas 1.0


Out[3]:
(237961, 34)
Out[3]:
yr_start                                               2013
yr_end                                                 2013
loc_abbr                                                 CA
loc_desc                                         California
data_src                                              YRBSS
topic                                               alcohol
question                            Alcohol use among youth
response                                                NaN
data_unit                                                 %
data_type_id                                        CrdPrev
data_type                                  Crude Prevalence
data_value                                              NaN
data_value_alt                                          NaN
footnote_symbol                                           -
footnote                                  No data available
low_conf                                                NaN
high_conf                                               NaN
strat_cat_1                                         Overall
strat_1                                             Overall
strat_cat_2                                             NaN
strat_2                                                 NaN
strat_cat_3                                             NaN
strat_3                                                 NaN
geo_loc            (37.63864012300047, -120.99999953799971)
topic_id                                                ALC
question_id                                          ALC1_1
response_id                                             NaN
loc_id                                                   06
strat_cat_1_id                                      OVERALL
strat_1_id                                              OVR
strat_cat_2_id                                          NaN
strat_2_id                                              NaN
strat_cat_3_id                                          NaN
strat_3_id                                              NaN
Name: 0, dtype: object
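
The record above uses short column names (yr_start, loc_id, ...) instead of the raw CDC headers, and loc_id keeps its leading zero ('06'), which suggests the file is read with explicit dtypes and the columns are then renamed. The implementation of load_data is not shown in this notebook, so the following is only a sketch of how such a step might look with pandas. The inline CSV, the four-column subset, and the names dtypes and short_names are illustrative assumptions; the short names are inferred from the output above.

```python
import io

import pandas as pd

# Two-row stand-in for the CDC file; the values are illustrative only.
raw_csv = io.StringIO(
    "YearStart,LocationAbbr,LocationID,DataValue\n"
    "2013,CA,06,\n"
    "2013,TX,48,17.5\n"
)

# Assumed subset of a column-to-dtype mapping; reading LocationID as str
# preserves the leading zero in codes such as "06".
dtypes = {
    "YearStart": int,
    "LocationAbbr": str,
    "LocationID": str,
    "DataValue": float,
}

# Assumed mapping from raw headers to the short names seen in the output.
short_names = {
    "YearStart": "yr_start",
    "LocationAbbr": "loc_abbr",
    "LocationID": "loc_id",
    "DataValue": "data_value",
}

df = pd.read_csv(raw_csv, dtype=dtypes).rename(columns=short_names)
first_row = df.iloc[0]
```

Without the str dtype, pandas would parse LocationID as an integer and California's code would become 6.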

Diseases in Data


In [4]:
cdi.get_diseases()
cdi.diseases
cdi.plot_diseases()


Out[4]:
topic
alcohol                                            15436
arthritis                                          12620
asthma                                             18371
cancer                                              7001
cardiovascular disease                             45383
chronic kidney disease                              8193
chronic obstructive pulmonary disease              32704
diabetes                                           37226
disability                                            55
immunization                                        1850
mental health                                       2610
nutrition, physical activity, and weight status    14364
older adults                                        3289
oral health                                         5213
overarching conditions                             21569
reproductive health                                  815
tobacco                                            11262
Name: topic, dtype: int64
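
The table above is a count of records per topic, ordered alphabetically. How get_diseases computes it is not shown here; the following is a minimal sketch of one way to produce the same kind of table with pandas, using a small made-up frame in place of cdi.data.

```python
import pandas as pd

# Illustrative records only; the real data has 237961 rows.
records = pd.DataFrame(
    {"topic": ["asthma", "diabetes", "asthma", "alcohol", "diabetes", "asthma"]}
)

# Number of rows per topic; groupby sorts the group keys alphabetically.
topic_counts = records.groupby("topic").size()
```

A Series like topic_counts can be passed directly to a bar plot, which is presumably similar to what plot_diseases draws.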