Motivation

The recent release of data on payments made by drug and medical device companies to physicians and teaching hospitals provides a unique opportunity to develop insights into trends in the health care industry. For example, what classes of drugs are heavily promoted? In which states are companies investing the most in promoting their products? How does industry spending on health care providers relate to sales within a region? These are just some of the questions that I believe can be answered from these disclosures.

Data in this analysis

The analysis outlined here uses the General Payments data for payments made to physicians and teaching hospitals, provided through the Open Payments website hosted by CMS and covering the period January 2014 to December 2014. The analysis also includes data on total retail sales of prescription drugs in 2014 provided by the Henry J. Kaiser Family Foundation, and tables on state population and physician numbers per state for 2014 published by the Federation of State Medical Boards. The following is a summary of the tables.

General Payments Data:

10.78 million total number of records
2.52 billion USD in total value of disclosures
607,000 physicians receiving payments
1,442 companies that made payments
1,122 teaching hospitals receiving payments
5.3 GB file size

Drug Sales Data:

259 billion USD in total value of sales
51 total number of records

Physician Data on Demographics:

318 million total US population
916,264 licensed physicians
51 states covered (50 states plus DC)

The tables are contained in a MySQL database.

Analysis



In [21]:

    
import MySQLdb
import pandas as pd
import csv
import numpy as np
import scipy as sp
from sqlalchemy import create_engine
from IPython.display import display
import datetime as dt
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', 50)
%matplotlib inline

import plotly.plotly as py
from plotly.graph_objs import Bar, Scatter, Marker, Layout
from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot

import colorlover as cl #colorscale
from IPython.display import HTML



In [22]:

    
init_notebook_mode() #use plotly offline



In [23]:

    
# Connect to database
conn = MySQLdb.connect(host="localhost", user="****", passwd="*****", 
                       unix_socket= "/var/run/mysqld/mysqld.sock", db="pharma")
cursor = conn.cursor()



In [24]:

    
'''Create a data frame from a sql query where we select, for each state,
the total dollar amount spent by industry in the state, the highest single
payment, and a physician specialty. Note that Physician_Specialty is neither
aggregated nor part of the GROUP BY, so MySQL returns the specialty from an
arbitrary row in each state, not necessarily the row with the highest payment.
Since we are only interested in payments made to providers in the 50 states and DC
(i.e., excluding territories and military bases), we make that requirement
explicit in the sql query'''
pay_df = pd.read_sql_query('SELECT DISTINCT Recipient_State, '
                           'ROUND(SUM(Total_Amount_of_Payment_USDollars)/1E6, 1) AS `Total_Dollar_Millions`, '
                           'MAX(Total_Amount_of_Payment_USDollars) AS `Max_Dollar`, '
                           'Physician_Specialty '
                           'FROM GeneralPay '
                           'WHERE Recipient_State IS NOT NULL '
                           'AND Recipient_Country = "United States" ' 
                           'AND Recipient_State IN ("AL", "AK", "AZ", "AR", "CA", "CO", '
                           '"CT", "DE", "DC", "FL", "GA", '
                           '"HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", '
                           '"ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", '
                           '"NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", '
                           '"OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", '
                           '"TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY") '
                           'GROUP BY Recipient_State '
                           'ORDER BY SUM(Total_Amount_of_Payment_USDollars)', conn)
pay_df = pay_df.sort_values(by= 'Recipient_State', ascending = True )
pay_df.head(4)









    Out[24]:

   Recipient_State  Total_Dollar_Millions  Max_Dollar                                Physician_Specialty
0               AK                    0.7      159062        Dental Providers/ Dentist/ General Practice
25              AL                   19.5      327285  Dental Providers/ Dentist/ Oral and Maxillofac...
15              AR                    7.9      471456  Dental Providers/ Dentist/ Oral and Maxillofac...
42              AZ                   78.8    28361143  Dental Providers/ Dentist/ Oral and Maxillofac...

Note: Here I am only showing 4 rows of the total 51 generated
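
The query above selects Physician_Specialty without aggregating it, so the specialty shown for a state is not guaranteed to belong to that state's largest payment. One way to tie the two together is to compute the per-state totals and maxima in a subquery and join back to the rows that match the maximum. The following is a minimal sketch of that idea, using an in-memory SQLite database and made-up rows (the specialties and amounts are hypothetical, not taken from the CMS data); the same SQL pattern should carry over to MySQL, although I have not run it against the full table.

```python
import sqlite3

# In-memory stand-in for the GeneralPay table (only the columns needed here)
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE GeneralPay ("
    "Recipient_State TEXT, Physician_Specialty TEXT, "
    "Total_Amount_of_Payment_USDollars REAL)"
)

# Hypothetical rows, for illustration only
rows = [
    ("AK", "Specialty A", 100.0),
    ("AK", "Specialty B", 900.0),
    ("AL", "Specialty C", 500.0),
    ("AL", "Specialty A", 50.0),
]
conn_demo.executemany("INSERT INTO GeneralPay VALUES (?, ?, ?)", rows)

# Subquery t: per-state total and maximum payment.
# Join back to GeneralPay to pick the row(s) that carry the maximum.
query = """
SELECT g.Recipient_State,
       t.Total_Dollar,
       t.Max_Dollar,
       g.Physician_Specialty
FROM GeneralPay AS g
JOIN (
    SELECT Recipient_State,
           SUM(Total_Amount_of_Payment_USDollars) AS Total_Dollar,
           MAX(Total_Amount_of_Payment_USDollars) AS Max_Dollar
    FROM GeneralPay
    GROUP BY Recipient_State
) AS t
  ON g.Recipient_State = t.Recipient_State
 AND g.Total_Amount_of_Payment_USDollars = t.Max_Dollar
ORDER BY g.Recipient_State
"""
result = conn_demo.execute(query).fetchall()
for row in result:
    print(row)
```

If two payments in a state tie for the maximum, this pattern returns one row per tied payment, so a real analysis would need to decide how to handle ties.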

Data visualization

We can take this data and create a choropleth map in which states that received the highest payments from industry have a darker shade and vice versa, giving a visual representation of the states in which industry is spending most of its money. I create the plot using plotly, a handy online graphing and visualization tool based on the D3.js JavaScript visualization library.
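
A plotly colorscale is a list of [position, color] pairs, with positions running from 0 to 1. The notebook builds this list with the colorlover package; the sketch below shows the same structure built by hand with plain linear interpolation between two RGB colors. The helper names and the two endpoint colors are my own choices for illustration, and the interpolation is not necessarily what colorlover does internally.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples for t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def build_colorscale(start, end, n_bins):
    """Return n_bins [position, 'rgb(r,g,b)'] pairs with positions from 0 to 1.

    n_bins must be at least 2 so that both endpoints are present.
    """
    scale = []
    for i in range(n_bins):
        t = i / (n_bins - 1)
        r, g, b = lerp_color(start, end, t)
        scale.append([t, "rgb({},{},{})".format(r, g, b)])
    return scale


# Hypothetical light-to-dark orange endpoints, chosen only for illustration
scheme_demo = build_colorscale((254, 237, 222), (166, 54, 3), 20)
print(scheme_demo[0], scheme_demo[-1])
```

A list like scheme_demo could be passed as the colorscale argument in place of the colorlover-based scheme.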



In [25]:

    
import plotly.graph_objs as go

color = cl.scales['6']['seq']['Oranges'] #select a sequential color scale
colorbns = cl.interp( color, 20 ) #interpolate the color scale to 20 bins
colorbns = cl.to_rgb( colorbns ) #convert back to RGB

bin_array = np.linspace(0, 1, len(colorbns)).tolist() #evenly spaced positions from 0 to 1

#list of [position, color] pairs defining the plotly colorscale
scheme = [[pos, col] for pos, col in zip(bin_array, colorbns)]


data = [
        go.Choropleth(
            colorscale = scheme,
            autocolorscale = False,
            locations = pay_df['Recipient_State'],
            z = pay_df['Total_Dollar_Millions'], #value-to-color mapping
            locationmode = 'USA-states',
            text= pay_df['Recipient_State'],
            marker = dict(
                line = dict (
                    color = 'rgb(255,255,255)',
                    width = 2
                )
            ),
            colorbar = dict(          
                title = '<b>Industry spending in US$ (millions)</b>'
            ),            
             #color is picked from z value
            showscale = True, #Shows the colorbar (I will set attributes later)
        ),

]



layout = dict(           
              title="Health Care Industry Spending<br> on Physicians and Teaching Hospitals, 2014", 
              geo = dict(
                         scope = "usa",
                         projection = dict( type= "albers usa" ),#The Albers USA projection
                         showcoastlines = False,
                         showlakes = False,
              ),
)
    

figure = dict(data=data, layout=layout)
iplot(figure) #plot offline

What we find is that pharmaceutical companies and medical device manufacturers spent over 0.5 billion USD promoting their products to health care providers in the state of California. This is more than twice the amount for the second highest state, New York. While one can be tempted to draw conclusions from this map about the state of the health care market in different states, the plot does not convey all the information. For one, California is the most populous state, so you would expect the total dollar amount spent in that market to be correspondingly bigger than in other states. Also, the plot does not tell us how many physicians there are per unit of population in each state, something that would influence the total industry spending in a particular state. Finally, in this analysis I have not excluded large teaching hospitals, whose presence in some states can skew the data.

A more descriptive plot would have the total spending in each state normalized by the number of physicians per 100,000 of the population, excluding the teaching hospitals in the state. To do this, it is necessary to join the table of payments made to health care providers to tables on the number of physicians in each state and the total population of each state at the time this report was compiled.

Normalization of pharmaceutical and medical device spending per state to the number of physicians in the state, and the total population of the state.



In [26]:

    
'''From the General Pay table, select for each state the total dollar amount spent
 as well as the maximum single payment made to a health care provider, for the
 50 states and DC. Here, we exclude payments made to any other entities,
 i.e., teaching hospitals, and only select payments made to physicians'''

payphys_df = pd.read_sql_query('SELECT DISTINCT Recipient_State AS State_Code, '
                               'SUM(Total_Amount_of_Payment_USDollars) AS Total_Dollar, '
                               'MAX(Total_Amount_of_Payment_USDollars) AS Max_Dollar '                                              
                               'FROM GeneralPay '
                               'WHERE Recipient_State IS NOT NULL '
                               'AND Recipient_Country = "United States" '
                               'AND Covered_Recipient_Type = "Covered Recipient Physician" '
                               'AND Recipient_State IN ("AL", "AK", "AZ", "AR", "CA", "CO", '
                               '"CT", "DE", "DC", "FL", "GA", '
                               '"HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", '
                               '"ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", '
                               '"NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", '
                               '"OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", '
                               '"TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY") '
                               'GROUP BY Recipient_State '
                               'ORDER BY Recipient_State', conn)

#force the DF to index from 1 to match indexing in mysql (makes life easier when doing JOIN)
payphys_df.index = payphys_df.index + 1 
payphys_df.head()









    Out[26]:

  State_Code  Total_Dollar  Max_Dollar
1         AK        712426      159062
2         AL      17805617      327285
3         AR       6634120      135005
4         AZ      76578934    28361143
5         CA     295743014    41414329

This table shows, for each state, the total dollar amount spent by industry as well as the highest dollar amount paid to a physician in that state in a single payment. To be able to use this data with population and physician data from other tables, I write the dataframe to the MySQL database.



In [27]:

    
#write dataframe to MySQL database. Will then JOIN this table to several others
#payphys_df.to_sql(con=conn, name='PhysicianPay', if_exists='replace', flavor='mysql')

To extract further insight from the data, I JOIN the table on physician pay in each state, to a table on statistics on number of physicians in a state as well as the total population, and a table on drug sales in each state.



In [28]:

    
'''JOIN TABLES on Physician pay, the number of physicians in the state and the state's
total population, and the drug sales, in total dollar amounts, in each state'''

payandsales_df = pd.read_sql_query('SELECT StateName.State, PhysicianPay.State_Code, '
                                   'ROUND(PhysicianPay.Total_Dollar/1E6, 1) '
                                   'AS Total_Dollar_Millions, PhysicianPay.Max_Dollar, '
                                   'ROUND(DrugSales.Total_Drug_Sales/1E9, 1) AS Drug_Sales_Billions, '
                                   'PhysicianCount.Total_Pop, PhysicianCount.Physician_per_100000, '
                                   'ROUND(PhysicianPay.Total_Dollar/Physician_per_100000, 1) '
                                   'AS Pay_per_PhysicianPer100000 '
                                   'FROM (StateName '
                                       'JOIN PhysicianPay '
                                           'ON StateName.State_Code = PhysicianPay.State_Code) '
                                                'JOIN DrugSales '
                                                   'ON (StateName.State = DrugSales.State) '
                                                       'JOIN PhysicianCount '
                                                           'ON (StateName.State = PhysicianCount.State)', conn)
                                            
payandsales_df.head(4)









    Out[28]:

      State State_Code  Total_Dollar_Millions  Max_Dollar  Drug_Sales_Billions  Total_Pop  Physician_per_100000  Pay_per_PhysicianPer100000
0   Alabama         AL                   17.8      327285                  5.2    4849377                   331                     53793.4
1    Alaska         AK                    0.7      159062                  0.5     736732                   514                      1386.0
2   Arizona         AZ                   76.6    28361143                  4.3    6731484                   370                    206970.1
3  Arkansas         AR                    6.6      135005                  2.6    2966369                   321                     20667.0
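
To make the normalization concrete: Pay_per_PhysicianPer100000 is the state's total dollars divided by its physician density (physicians per 100,000 residents), which is not the same thing as dollars per physician. The sketch below reproduces that column for Alabama and Alaska from the figures in the tables above and, as an alternative I am adding purely for comparison, estimates dollars per physician by first converting the density into a head count. The function names are mine, and the head-count estimate assumes the density and population figures refer to the same year.

```python
# Values copied from the Alabama and Alaska rows of the tables above
states = {
    "AL": {"total_dollar": 17805617, "total_pop": 4849377, "phys_per_100k": 331},
    "AK": {"total_dollar": 712426, "total_pop": 736732, "phys_per_100k": 514},
}


def pay_per_density(s):
    """Dollars divided by physician density (the quantity computed in the query)."""
    return s["total_dollar"] / s["phys_per_100k"]


def pay_per_physician(s):
    """Dollars divided by an estimated physician head count (alternative measure)."""
    est_physicians = s["phys_per_100k"] * s["total_pop"] / 100000
    return s["total_dollar"] / est_physicians


for code, s in states.items():
    print(code, round(pay_per_density(s), 1), round(pay_per_physician(s), 2))
```

Because the density-based measure still scales with the size of the state, large states remain at the top of the ranking; the per-physician estimate removes that dependence on population.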

Some quick statistics on the data

With the additional features from joining the three tables, we can run some quick statistics to get a feel for the data



In [29]:

    
print(payandsales_df.describe())









    



       Total_Dollar_Millions       Max_Dollar  Drug_Sales_Billions  \
count              51.000000        51.000000            51.000000   
mean               39.378431   3372566.431373             5.080392   
std                53.690019   7918220.577706             5.606105   
min                 0.700000     50527.000000             0.500000   
25%                 5.850000    201414.500000             1.300000   
50%                18.300000    472946.000000             3.400000   
75%                50.150000   1635467.500000             5.850000   
max               295.700000  41414329.000000            29.700000   

       Pay_per_PhysicianPer100000  
count                   51.000000  
mean                104865.892157  
std                 149286.528199  
min                   1386.000000  
25%                  13410.450000  
50%                  51353.100000  
75%                 121030.950000  
max                 799305.400000

One thing I was curious about is how payments to physicians by drug companies and medical device manufacturers relate to drug sales. Would we see a correlation, an anticorrelation or no correlation at all? My null hypothesis is that there is no correlation between the two. To find out, I look at the Pearson correlation coefficient for the different columns.

Pearson Correlation Coefficients



In [30]:

    
print(payandsales_df.corr())









    



                            Total_Dollar_Millions  Max_Dollar  \
Total_Dollar_Millions                    1.000000    0.782909   
Max_Dollar                               0.782909    1.000000   
Drug_Sales_Billions                      0.967341    0.650062   
Pay_per_PhysicianPer100000               0.981374    0.748141   

                            Drug_Sales_Billions  Pay_per_PhysicianPer100000  
Total_Dollar_Millions                  0.967341                    0.981374  
Max_Dollar                             0.650062                    0.748141  
Drug_Sales_Billions                    1.000000                    0.952157  
Pay_per_PhysicianPer100000             0.952157                    1.000000

The total amount of dollars spent by industry (Total_Dollar_Millions) is strongly correlated with the drug sales (Drug_Sales_Billions), with a correlation coefficient of 0.967. To accept or reject the null hypothesis we can run a t-test.

Running a t-test to determine the statistical significance



In [31]:

    
from scipy import stats
a = payandsales_df['Total_Dollar_Millions']
b = payandsales_df['Drug_Sales_Billions']
# Note: ttest_ind is a two-sample test for a difference in means between a and b
stats.ttest_ind(a, b, axis=0, equal_var=True)









    Out[31]:





Ttest_indResult(statistic=4.5373903110244127, pvalue=1.5878870617766979e-05)

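This gives a p-value of 1.6e-05. Note, however, that ttest_ind compares the means of the two columns (which are also in different units, millions versus billions of USD) rather than testing the correlation itself, so this p-value alone does not establish the significance of the correlation. The appropriate check is a test on the correlation coefficient, such as the one returned by scipy.stats.pearsonr. With r = 0.967 across 51 states, that test also rejects the null hypothesis, so the payments to physicians and the sales of retail prescription drugs are very highly correlated, and this correlation is statistically significant.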
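
For reference, the significance of a Pearson correlation can be checked directly from r and the sample size n, using the statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below is a plain-Python illustration with helper names of my own; it plugs in the r = 0.967341 reported above and n = 51, and is not the calculation that was run in the notebook.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_t_statistic(r, n):
    """t statistic for the null hypothesis of no correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# r between payments and drug sales as reported above, over 51 states
t_stat = correlation_t_statistic(0.967341, 51)
print(round(t_stat, 1))
```

With 49 degrees of freedom, the two-sided 5% critical value of t is about 2.0, so a statistic of this size (roughly 26.7) rejects the null hypothesis of no correlation by a wide margin.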

Re-plotting the total payments to health care providers, with the exclusion of teaching hospitals, and having normalized the payments to the number of physicians in each state per 100,000 of the population.



In [32]:

    
import plotly.graph_objs as go

color = cl.scales['6']['seq']['Reds'] #select a sequential color scale
colorbns = cl.interp( color, 20 ) #interpolate the color scale to 20 bins
colorbns = cl.to_rgb( colorbns ) #convert back to RGB

bin_array = np.linspace(0, 1, len(colorbns)).tolist() #evenly spaced positions from 0 to 1

#list of [position, color] pairs defining the plotly colorscale
scheme = [[pos, col] for pos, col in zip(bin_array, colorbns)]


data = [
        go.Choropleth(
            colorscale = scheme,
            autocolorscale = False,
            locations = payandsales_df['State_Code'],
            z = payandsales_df['Pay_per_PhysicianPer100000'],
            locationmode = 'USA-states',
            text= payandsales_df['State_Code'],
            marker = dict(
                line = dict (
                    color = 'rgb(255,255,255)',
                    width = 2
                )
            ),
            colorbar = dict(          
                title = '<b>Industry spending in US$</b>'
            ),            
             #color is picked from z value
            showscale = True,
        ),

]



layout = dict(           
              title="Health Care Industry Spending on Physicians<br> "
                    "Normalized to Number of Physicians per 100000, 2014", 
              geo = dict(
                         scope = "usa",
                         projection = dict( type= "albers usa" ),
                         showcoastlines = False,
                         showlakes = False,
              ),
)
    

figure = dict(data=data, layout=layout)
iplot(figure) #plot offline



In [33]:

    
payperphy_df = payandsales_df.sort_values('Pay_per_PhysicianPer100000', axis=0, ascending=True, inplace=False)
import plotly.graph_objs as go
data = [
    go.Bar(
        x = payperphy_df['State'],
        y = payperphy_df['Pay_per_PhysicianPer100000'],
   
        hoverinfo = "all",
        marker=dict(
            color= '#04BCE0',
            line=dict(
                color='#013A45',
                width=1.5,
            )
        ),
    )
]

layout = dict(
    title = "Drug Companies Spending in US$ <br> Scaled to Number of Physicians Per 100000, 2014", 
    xaxis = dict(
        #title = 'Manufacturer',
        tickangle = 40
    ),
    yaxis = dict(
        title = 'USD per (Physicians per 100,000 Population)'
    ),

)
    

figure = dict(data=data, layout=layout)
iplot(figure) #plot offline

With the data normalized to the number of physicians per 100,000, we find that California still saw the highest spending by industry in 2014, with the state of Texas coming in at second place.

Exploring the Data Further

What were the top marketed drugs in 2014?



In [34]:

    
'''Here, we select for each drug promoted to care providers
and rank them by the total dollar amount spent by their respective
manufacturers in promoting them to physicians and teaching hospitals.'''
pd.set_option('display.max_rows', 2000)
df = pd.read_sql_query('SELECT DISTINCT UCASE(Name_of_Associated_Covered_Drug_or_Biological1) AS Drug, '
                       'SUM(Total_Amount_of_Payment_USDollars) AS Total_Dollar, '
                       'Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name AS Manufacturer '
                       'FROM GeneralPay '
                       'WHERE Name_of_Associated_Covered_Drug_or_Biological1 IS NOT NULL '
                       'AND Name_of_Associated_Covered_Drug_or_Biological2 IS NULL '
                       'AND Name_of_Associated_Covered_Device_or_Medical_Supply1 IS NULL '
                       'AND Name_of_Associated_Covered_Drug_or_Biological1 NOT IN '
                       '("No Product Discussed", "Non-Product", "AccurusVitEquip", "NONCOVERED PRODUCT", '
                       '"NON BRAND", "None", "Product Candidate", "Spine Instrumentation", '
                       '"Business Development Activity", "OTHER PRODUCT", '
                       '"Specimen Indentification", "NO PRODUCT", "NOT APPLICABLE", "ENT Medical Devices", '
                       '"NA", "BioSurgery - Non Prod Related", "Non-Covered-Product", '
                       '"Non Franchise - Non Prod Related", '
                       '"RefractiveEquipment", "No Associated Product", "Product Development", '
                       '"Non Franchise - Research & Develop") '
                       'AND Name_of_Associated_Covered_Drug_or_Biological1 NOT LIKE "%TAP%" '
                       'GROUP BY Name_of_Associated_Covered_Drug_or_Biological1 '
                       'ORDER BY Total_Dollar DESC LIMIT 20', conn)

df = df.sort_values('Total_Dollar', axis=0, ascending=True, inplace=False,kind='quicksort',na_position='last')
df.head(4)









    Out[34]:

                 Drug  Total_Dollar                    Manufacturer
19  BOTOX THERAPEUTIC       8673624                   Allergan Inc.
18           COPAXONE       8734827  Teva Pharmaceuticals USA, Inc.
17             BELVIQ       9064101                      Eisai Inc.
16            PERJETA       9262371                 Genentech, Inc.
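
A long hand-written exclusion list inside a SQL string is easy to get wrong, because Python silently joins adjacent string literals when a comma is missing. One way to sidestep that is to keep the excluded names in a Python list and pass them as query parameters. The sketch below illustrates the pattern with an in-memory SQLite table and made-up rows; the table name Payments and its columns are hypothetical, and SQLite uses ? placeholders where MySQLdb uses %s. pandas' read_sql_query accepts a params argument for the same purpose.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Payments (Drug TEXT, Amount REAL)")

# Hypothetical rows, including placeholder "product" names to be excluded
db.executemany(
    "INSERT INTO Payments VALUES (?, ?)",
    [
        ("Drug A", 10.0),
        ("Drug B", 20.0),
        ("NA", 100.0),
        ("None", 50.0),
        ("Drug A", 15.0),
    ],
)

# Names to exclude, kept as data rather than embedded in the SQL text
excluded = ["NA", "None", "No Product Discussed"]
placeholders = ", ".join("?" for _ in excluded)

sql = (
    "SELECT UPPER(Drug) AS Drug, SUM(Amount) AS Total_Dollar "
    "FROM Payments "
    "WHERE Drug NOT IN ({}) "
    "GROUP BY UPPER(Drug) "
    "ORDER BY Total_Dollar DESC "
    "LIMIT 20"
).format(placeholders)

top_drugs = db.execute(sql, excluded).fetchall()
print(top_drugs)
```

Only the placeholder marks are formatted into the SQL text; the excluded values themselves travel separately, so quoting and comma mistakes in the list cannot change the query.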

Ranking the top 20 marketed drugs in 2014 by dollar value spent on health care providers



In [35]:

    
import plotly.graph_objs as go
data = [
    go.Bar(
        x = df['Drug'],
        y = df['Total_Dollar'],
   
        hoverinfo = "all",
        marker=dict(
            color= '#F05513',
            line=dict(
                color='#662002',
                width=1.5,
            )
        ),
    )
]

layout = dict(
    title = "Top 20 Most Promoted Drugs By Dollar Value, 2014", 
    xaxis = dict(
        #title = 'Drug Name',
    ),
    yaxis = dict(
        title = 'Total Amount in USD'
    ),

)
    

figure = dict(data=data, layout=layout)
iplot(figure) #plot offline

Of these drugs, five (Humira, Rituxan, Avastin, Herceptin, and Copaxone) were also among the top 20 best-selling drugs of 2014, with combined sales of over 39 billion dollars. Interestingly, the three most heavily promoted drugs, Rituxan, Avastin and Herceptin, are cancer drugs marketed by Roche's pharmaceutical division (Rituxan is also co-marketed by Biogen Idec), with total combined sales in 2014 of 22.4 billion USD.

What about the drug and medical device manufacturers? How do the different companies rank in terms of total dollar amount spent on physicians and teaching hospitals while promoting their products?



In [36]:

    
'''Selection of drug and device manufacturers ranked by 
total dollar amount spent on physicians and teaching hospitals'''
pd.set_option('display.max_rows', 2000)
df = pd.read_sql_query('SELECT DISTINCT ' 
                       'Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name AS Manufacturer, '
                       'SUM(Total_Amount_of_Payment_USDollars) AS Total_Dollar '
                       'FROM GeneralPay '
                       'WHERE Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name IS NOT NULL '
                       'GROUP BY Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name '
                       'ORDER BY Total_Dollar DESC LIMIT 20', conn)

df = df.sort_values('Total_Dollar', axis=0, ascending=True, inplace=False,kind='quicksort',na_position='last')
df.head(4)









    Out[36]:

                Manufacturer  Total_Dollar
19  Intuitive Surgical, Inc.      28024245
18       Gilead Sciences Inc      31393905
17       Medtronic USA, Inc.      34539664
16     GlaxoSmithKline, LLC.      36038864



In [37]:

    
import plotly.graph_objs as go
data = [
    go.Bar(
        x = df['Manufacturer'],
        y = df['Total_Dollar'],
   
        hoverinfo = "all",
        marker=dict(
            color= '#04BCE0',
            line=dict(
                color='#013A45',
                width=1.5,
            )
        ),
    )
]

layout = dict(
    title = "Top 20 Drug and Medical Device Manufacturers<br> "
            "By Dollars Spent on Physicians and Teaching Hospitals, 2014", 
    xaxis = dict(
        #title = 'Manufacturer',
    ),
    yaxis = dict(
        title = 'Total Amount in USD'
    ),

)
    

figure = dict(data=data, layout=layout)
iplot(figure) #plot offline

Unsurprisingly, Genentech, Inc., which is now a subsidiary of Roche, spent almost three times as much on health care providers as the next highest spender, Topera, Inc. Topera, Inc. is a recent startup that has developed technology to map the electrical signals of the heart.

How about retail sales of prescription drugs in each state?



In [38]:

    
pd.set_option('display.max_rows', 2000)
sales_df = pd.read_sql_query('SELECT DISTINCT State, ROUND(Total_Drug_Sales/1e9, 2) AS Total_Sales_Billion '
                             'FROM DrugSales '
                             'ORDER BY Total_Sales_Billion ASC',conn) 
sales_df.head()









    Out[38]:

                  State  Total_Sales_Billion
0               Wyoming                 0.45
1                Alaska                 0.47
2  District of Columbia                 0.61
3               Montana                 0.68
4          North Dakota                 0.69



In [39]:

    
import plotly.graph_objs as go
data = [
    go.Bar(
        x = sales_df['State'],
        y = sales_df['Total_Sales_Billion'],
   
        hoverinfo = "all",
        marker=dict(
            color= '#04BCE0',
            line=dict(
                color='#013A45',
                width=1.5,
            )
        ),
    )
]

layout = dict(
    title = "Retail Sales for Prescription Drugs<br>Filled at Pharmacies in US$ (Billions), 2014", 
    xaxis = dict(
        #title = 'Manufacturer',
    tickangle = 40
    ),
    yaxis = dict(
        title = 'Total Amount in USD (Billions)'
    ),

)
    

figure = dict(data=data, layout=layout)
iplot(figure) #plot offline

Here again, California, the most populous state, saw the highest sales of prescription drugs in total dollar amount. A better way to look at these numbers, though, would be to scale them by the population of each state so that we can get an idea of the spending on prescription drugs per capita.

Spending in retail prescription drugs per capita in each state



In [40]:

    
salespercap_df = pd.read_sql_query('SELECT DrugSales.State, '
                                   'ROUND(DrugSales.Total_Drug_Sales/PhysicianCount.Total_Pop, 1) '
                                   'AS SalesPerCapita '
                                   'FROM PhysicianCount '
                                   'JOIN DrugSales '
                                   'ON PhysicianCount.State = DrugSales.State '
                                   'ORDER BY SalesPerCapita', conn)
salespercap_df.head()









    Out[40]:

        State  SalesPerCapita
0  New Mexico           469.0
1        Utah           603.1
2    Colorado           613.7
3     Arizona           635.6
4      Alaska           636.7



In [41]:

    
import plotly.graph_objs as go
data = [
    go.Bar(
        x = salespercap_df['State'],
        y = salespercap_df['SalesPerCapita'],
   
        hoverinfo = "all",
        marker=dict(
            color= '#037b93',
            line=dict(
                color='#013b46',
                width=1.5,
            )
        ),
    )
]

layout = dict(
    title = "Retail Sales for Prescription Drugs<br>Filled at Pharmacies Per Capita in US$, 2014", 
    xaxis = dict(
        #title = 'Manufacturer',
    tickangle = 40
    ),
    yaxis = dict(
        title = 'Total Amount in USD per Capita'
    ),

)
    

figure = dict(data=data, layout=layout)
iplot(figure) #plot offline

Here we now find that the sales of prescription drugs per capita were highest in West Virginia, meaning that in West Virginia, on average, each resident spent about 1,287 USD on prescription medicine in 2014. That is 11% more than the next highest state, Maine, and about 2.7 times the figure for New Mexico, the lowest state at 469 USD (put the other way, New Mexico's figure is about 64% lower than West Virginia's). These numbers could be indicative of the cost of prescription drugs in each state. The disparity in spending between states is interesting, as shown by the following statistics on the data, where we see a standard deviation of USD 163.



In [42]:

    
#Statistics on the distribution of drug sales in the different states
print(salespercap_df.describe())









    



       SalesPerCapita
count       51.000000
mean       831.947059
std        163.151829
min        469.000000
25%        725.700000
50%        813.100000
75%        915.300000
max       1287.300000
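
When comparing per-capita figures between states, the percentage depends on which state is used as the baseline. The short sketch below works through both directions for the maximum and minimum values in the summary above (1287.3 and 469.0); the function names are my own.

```python
def pct_higher(a, b):
    """How much higher a is than b, as a percentage of b."""
    return (a - b) / b * 100


def pct_lower(b, a):
    """How much lower b is than a, as a percentage of a."""
    return (a - b) / a * 100


highest = 1287.3  # max SalesPerCapita from the summary above
lowest = 469.0    # min SalesPerCapita from the summary above

print(round(pct_higher(highest, lowest)), round(pct_lower(lowest, highest)))
```

So the highest state is about 174% above the lowest, while the lowest is about 64% below the highest; both statements describe the same gap.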

Teaching Hospitals

A ranking of the spending by industry on teaching hospitals



In [43]:

    
pd.set_option('display.max_rows', 2000)
teaching_df = pd.read_sql_query('SELECT DISTINCT UCASE(Teaching_Hospital_Name) AS Teaching_Hospital, '
                                'SUM(Total_Amount_of_Payment_USDollars) AS Total_Dollar, '
                                'MAX(Total_Amount_of_Payment_USDollars) AS Max_Dollar ' 
                                'FROM GeneralPay '
                                'WHERE Teaching_Hospital_Name IS NOT NULL '
                                'AND Covered_Recipient_Type = "Covered Recipient Teaching Hospital" '
                                'GROUP BY Teaching_Hospital_Name '
                                'ORDER BY Total_Dollar DESC LIMIT 20', conn)

teaching_df = teaching_df.sort_values('Total_Dollar', axis=0, ascending=False,
                                      inplace=False,kind='quicksort',na_position='last')
teaching_df.head()









    Out[43]:

                      Teaching_Hospital  Total_Dollar  Max_Dollar
0  CITY OF HOPE NATIONAL MEDICAL CENTER     251199365    11646800
1        MASSACHUSETTS GENERAL HOSPITAL      33799060     5029420
2       THE UNITY HOSPITAL OF ROCHESTER      20958502    12196417
3          DENVER HEALTH MEDICAL CENTER      12929518      250000
4             CLEVELAND CLINIC HOSPITAL      12876299     1218462



In [44]:

    
pd.set_option('display.max_rows', 2000)
max_df = pd.read_sql_query('SELECT DISTINCT Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name '
                           'AS Manufacturer, '
                           'Teaching_Hospital_Name AS Teaching_Hospital, '
                           'MAX(Total_Amount_of_Payment_USDollars) AS Max_Dollar '
                           'FROM GeneralPay '
                           'WHERE Teaching_Hospital_Name = "CITY OF HOPE NATIONAL MEDICAL CENTER" '
                           'GROUP BY Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name '
                           'ORDER BY Max_Dollar DESC', conn)
max_df.head()









    Out[44]:

              Manufacturer                     Teaching_Hospital  Max_Dollar
0          Genentech, Inc.  CITY OF HOPE NATIONAL MEDICAL CENTER    11646800
1      Abbott Laboratories  CITY OF HOPE NATIONAL MEDICAL CENTER       34809
2    NeuWave Medical, Inc.  CITY OF HOPE NATIONAL MEDICAL CENTER       30720
3  MRI Interventions, Inc.  City of Hope National Medical Center       25803
4     ACCURAY INCORPORATED  CITY OF HOPE NATIONAL MEDICAL CENTER       22286



In [45]:

    
import plotly.graph_objs as go
data = [
    go.Bar(
        x = teaching_df['Teaching_Hospital'],
        y = teaching_df['Total_Dollar'],
   
        hoverinfo = "all",
        marker=dict(
            color= '#A830E6',
            line=dict(
                color='#240236',
                width=1.5,
            )
        ),
    )
]

layout = dict(
    title = "Ranking of Teaching Hospitals by Amount of Dollars Received"
             "<br>From Drug and Medical Device Manufacturers, 2014", 
    xaxis = dict(
        #title = 'Manufacturer',
    ),
    yaxis = dict(
        title = 'Total Amount in USD'
    ),

)
    

figure = dict(data=data, layout=layout)
iplot(figure) #plot offline

The rankings of the different teaching hospitals show that the City of Hope National Medical Center, a major cancer treatment center, obtained more than USD 250 million from the pharmaceutical and medical device industry, with the highest single payment made being a USD 11.6 million contribution by Genentech, Inc.
