Initial Analysis For Nischay's Data


In [1]:
import sys
sys.path.append("../..")

In [9]:
import devahp
import all_user_nish_excel as nishinput
import numpy as np
import pandas as pd
from ahptree import AhpTree
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot
import plotly.graph_objs as go

# Enable inline Plotly rendering in the notebook
init_notebook_mode()


Let's read in Nish's data


In [3]:
nish_data = nishinput.from_nish_excel("nish_data.xlsx")


['bleeding', 'sore throat', 'gp', 'qol', 'breath']
[[3, 0, 1], [4, 2, 3], [5, 4, 0], [6, 1, 3], [7, 1, 2], [8, 4, 1], [9, 2, 0], [10, 2, 4], [11, 0, 3], [12, 3, 4]]
['L1', 'L2', 'L4', 'L5', 'L6', 'L7', 'L8', 'L9', 'L10', 'T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 'T8', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11', 'B12', 'B13', 'B14', 'B15', 'B16', 'B17', 'B18', 'B19', 'B20', 'B21', 'B22', 'F1', 'F2', 'F3', 'F4', 'F5', 'F6', 'F7', 'F8', 'F9', 'F10', 'F11', 'F12', 'F13', 'F14', 'F15', 'F16', 'P01', 'P02', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8', 'P9', 'P10', 'P11', 'P12', 'P13', 'P14', 'P15', 'M01', 'M02', 'M03', 'M04', 'M05', 'M06', 'M07', 'M08', 'M09', 'M10', 'M11', 'M12', 'M13', 'M14', 'M15', 'M16', 'M17', 'M18', 'M19', 'M20', 'M21', 'M22', 'M23', 'M24', 'M25', 'M26', 'M27', 'M28', 'M29', 'M30', 'O1', 'O2', 'O3', 'O4', 'I1', 'I2', 'I3', 'I4', 'I5', 'I6', 'I7', 'I8', 'I9', 'I10', 'I11', 'I12', 'I13', 'K1', 'K2', 'K3', 'K4', 'K5', 'K6', 'K7', 'K8', 'K9', 'K10', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7', 'D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'D8', 'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10', 'A11', 'A12', 'A13', 'A14', 'A15', 'A16', 'A17', 'A18', 'A19', 'A20', 'A21', 'A22', 'A23', 'A24', 'A25', 'A26', 'A27', 'A28', 'G01', 'G02', 'G03', 'G04', 'G05', 'G06']
[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180]
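The second printed list appears to hold one triple per pairwise question. Comparing it with the spreadsheet headers ("Bleeding v sore throat", "GP V QoL", ...) suggests each triple is [spreadsheet column, index of first criterion, index of second criterion]. Under that assumption, a reciprocal comparison matrix for one user could be assembled roughly as follows. The helper name `build_pairwise` and the answer values are made up for illustration; this is not the code inside `all_user_nish_excel`.

```python
import numpy as np

criteria = ['bleeding', 'sore throat', 'gp', 'qol', 'breath']
# Triples copied from the printed output: [spreadsheet column, i, j] (interpretation assumed)
triples = [[3, 0, 1], [4, 2, 3], [5, 4, 0], [6, 1, 3], [7, 1, 2],
           [8, 4, 1], [9, 2, 0], [10, 2, 4], [11, 0, 3], [12, 3, 4]]


def build_pairwise(answers, triples, n):
    """Fill a reciprocal n x n matrix from one answer per triple (hypothetical helper)."""
    mat = np.ones((n, n))
    for value, (_col, i, j) in zip(answers, triples):
        mat[i, j] = value        # criterion i compared against criterion j
        mat[j, i] = 1.0 / value  # the reverse comparison is the reciprocal
    return mat


# Made-up answers, one per triple, for illustration only
answers = [3, 1/5, 1, 1/3, 5, 1/7, 9, 1, 1/9, 3]
pairwise = build_pairwise(answers, triples, len(criteria))
print(np.round(pairwise, 3))
```

The ten triples cover every unordered pair of the five criteria exactly once, so the matrix is fully determined by one user's ten answers.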

Now set up the AHP tree


In [7]:
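# Build the tree from the pairwise comparison data, then add the two alternatives
ahp = AhpTree(pw=nish_data)
ahp.add_alt("do")
ahp.add_alt("not")
# One score per alternative for each criterion, presumably in the order added: [do, not]
ahp.get_node(["bleeding"]).set_alt_scores([0.9, 0.1])
ahp.get_node(["sore throat"]).set_alt_scores([1.0, 0.1])
ahp.get_node(["gp"]).set_alt_scores([1.0, 5.0])
ahp.get_node(["qol"]).set_alt_scores([2.0, 4.0])
ahp.get_node(["breath"]).set_alt_scores([1.0, 1.0])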

Let's calculate the do/not scores for user L1


In [8]:
ahp.synthesize(user='L1')


Out[8]:
array([ 0.167075  ,  0.07915487])

This means user 'L1' prefers 'do' over 'not' by a factor of about 2.1 (0.167 / 0.079)
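The factor quoted here is simply the ratio of the two synthesized scores. A minimal check, using the values from Out[8] and assuming the array order is [do, not]:

```python
import numpy as np

# Values copied from Out[8] for user 'L1'; order assumed to be [do, not]
scores = np.array([0.167075, 0.07915487])

ratio = scores[0] / scores[1]    # how many times 'do' outscores 'not'
shares = scores / scores.sum()   # the same preference expressed as shares summing to 1

print(f"do/not ratio: {ratio:.2f}")
print(f"normalized shares: do={shares[0]:.3f}, not={shares[1]:.3f}")
```

Normalizing to shares can make users easier to compare, since the raw scores do not sum to 1.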

Next we will want to break the results down by demographics, so I will read the raw data into a pandas DataFrame to help identify user groups


In [15]:
# Recent pandas versions spell this keyword sheet_name (older ones accepted sheetname)
demographics = pd.read_excel('nish_data.xlsx', sheet_name='Weight', skiprows=3)
demographics.head()


Out[15]:
  | HOSPITAL | Pat ID | Bleeding v sore throat | GP V QoL | breath v bleeding | Sore throat V QoL | Sore throat v GP | breath v sore throat | GP v Bleeding | GP v Breath | ... | matrix weight and pref total row B | matrix weight and pref total row C | matrix weight and pref total row D | matrix weight and pref total row E | Consistency index | Random index | Consistency ratio | Tonsillectomy | Watchful waiting | Treatment
0 | POOLE | L1 | 8 | 0.111 | 0.11 | 0.11 | 6 | 0.11 | 0.11 | 8 | ... | 6.869181 | 5.284844 | 9.257423 | 5.393433 | 0.522593 | 1.12 | 0.466600 | 0.373443 | 0.626557 | 1
1 | POOLE | L2 | 1 | 0.142857 | 9 | 0.111111 | 0.111111 | 9 | 9 | 0.111111 | ... | 5.219058 | 5.349557 | 8.861565 | 8.806957 | 0.422810 | 1.12 | 0.377509 | 0.575567 | 0.424433 | 1
2 | POOLE | L4 | 0.111111 | 0.111111 | 9 | 9 | 9 | 0.333333 | 9 | 3 | ... | 8.330220 | 6.436680 | 12.014965 | 6.039831 | 0.740721 | 1.12 | 0.661358 | 0.691808 | 0.308192 | 1
3 | POOLE | L5 | 0.142857 | 0.142857 | 5 | 0.142857 | 7 | 0.142857 | 7 | 7 | ... | 8.380267 | 6.494774 | 8.003413 | 5.195272 | 0.421394 | 1.12 | 0.376245 | 0.644661 | 0.355339 | 1
4 | POOLE | L6 | 0.142857 | 0.142857 | 0.142857 | 1 | 5 | 0.142857 | 1 | 5 | ... | 5.831693 | 5.467970 | 5.755880 | 5.062369 | 0.124396 | 1.12 | 0.111068 | 0.638011 | 0.361989 | 1

5 rows × 33 columns
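The 'Consistency index', 'Random index' and 'Consistency ratio' columns are consistent with the usual AHP definitions: CI = (lambda_max - n) / (n - 1) and CR = CI / RI, with RI = 1.12 for n = 5 criteria. For the first row, 0.522593 / 1.12 is about 0.4666, which matches the table. The sketch below computes both quantities from a pairwise matrix using the principal eigenvalue. How the spreadsheet itself estimated lambda_max is not shown here, so the eigenvalue route and the function name `consistency_ratio` are assumptions for illustration.

```python
import numpy as np


def consistency_ratio(pairwise, random_index=1.12):
    """Return (CI, CR) for a reciprocal pairwise matrix (hypothetical helper).

    random_index defaults to 1.12, the value the spreadsheet uses for n = 5.
    """
    n = pairwise.shape[0]
    lambda_max = np.linalg.eigvals(pairwise).real.max()  # principal eigenvalue
    ci = (lambda_max - n) / (n - 1)
    return ci, ci / random_index


# A perfectly consistent matrix built from made-up weights: entry (i, j) = w_i / w_j
weights = np.array([0.40, 0.25, 0.15, 0.12, 0.08])
consistent = weights[:, None] / weights[None, :]
ci, cr = consistency_ratio(consistent)
print(f"consistent matrix:  CI={ci:.6f}, CR={cr:.6f}")

# Distort one judgment (and its reciprocal) to introduce inconsistency
distorted = consistent.copy()
distorted[0, 1] *= 4
distorted[1, 0] /= 4
ci2, cr2 = consistency_ratio(distorted)
print(f"distorted matrix:   CI={ci2:.6f}, CR={cr2:.6f}")
```

A fully consistent matrix gives a CI of essentially zero, and any distortion of a single judgment pushes it above zero.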

Let's look at all users with treatment = 2


In [29]:
# The column label used here is 'Treatment ' with a trailing space
treatment2users = demographics['Pat ID'][demographics['Treatment '] == 2]
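The selection in In [29] works through a boolean mask: the comparison yields one True/False per row, and indexing with it keeps only the True rows. The toy frame below stands in for the spreadsheet and uses a few patient IDs whose treatment codes can be read off this notebook's outputs. Stripping whitespace from the headers is an optional extra step, not something the original analysis does.

```python
import pandas as pd

# Toy stand-in for the spreadsheet, keeping the trailing space in the header
toy = pd.DataFrame({
    'Pat ID': ['L1', 'L10', 'T1', 'B4'],
    'Treatment ': [1, 2, 2, 2],
})

mask = toy['Treatment '] == 2        # boolean Series, one entry per row
selected = toy.loc[mask, 'Pat ID']   # same rows as toy['Pat ID'][mask]
print(list(selected))

# Optional: strip stray whitespace from the headers so 'Treatment' works too
clean = toy.rename(columns=str.strip)
selected_clean = clean.loc[clean['Treatment'] == 2, 'Pat ID']
print(list(selected_clean))
```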

In [39]:
# Synthesize the scores for every treatment 2 user
treatment2usersResults = [ahp.synthesize(user) for user in treatment2users]
# Turn the results into a DataFrame to make them pretty
dos = [result[0] for result in treatment2usersResults]
donts = [result[1] for result in treatment2usersResults]
treatment2df = pd.DataFrame({'Do': dos, 'Do Not': donts}, index=treatment2users)
treatment2df.head()


Out[39]:
              Do    Do Not
Pat ID
L10     0.162798  0.082053
T1      0.140000  0.117436
T3      0.146801  0.130695
T5      0.172501  0.148933
T6      0.183555  0.140195

I'm interested in whether any treatment 2 users preferred 'Do Not'


In [42]:
preferDoNot = treatment2df['Do'] < treatment2df['Do Not']
treatment2df.loc[preferDoNot,:]


Out[42]:
              Do    Do Not
Pat ID
B4      0.128169  0.176778
B19     0.145913  0.165310
F1      0.136939  0.162148
F8      0.132631  0.139826
M09     0.160723  0.169342
O4      0.134298  0.139666
I6      0.108913  0.185541
D1      0.128677  0.186021

Okay, there are a few (eight users). Let's look at the other way around


In [43]:
preferDo = treatment2df['Do'] >= treatment2df['Do Not']
treatment2df.loc[preferDo,:]


Out[43]:
              Do    Do Not
Pat ID
L10     0.162798  0.082053
T1      0.140000  0.117436
T3      0.146801  0.130695
T5      0.172501  0.148933
T6      0.183555  0.140195
B2      0.167537  0.078107
B6      0.149351  0.115477
B7      0.170723  0.169547
B21     0.174946  0.109518
B22     0.168632  0.076368
P13     0.161826  0.148234
M05     0.145944  0.120917
M26     0.136920  0.134281
H4      0.180639  0.089174
D8      0.155432  0.115748
A3      0.147801  0.123338
A4      0.150619  0.107785
A8      0.168745  0.099578
A28     0.158382  0.082531
G01     0.148000  0.128444
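The two masks used in In [42] and In [43] split the users into complementary groups, so it can be handy to tally them and to rank users by how decisive their preference is. The sketch below does this on four rows copied from the outputs in this notebook. The variable names are illustrative, and the margin column is an addition rather than part of the original analysis.

```python
import pandas as pd

# Four rows copied from Out[42] and Out[43]
sample = pd.DataFrame(
    {'Do': [0.128169, 0.108913, 0.162798, 0.170723],
     'Do Not': [0.176778, 0.185541, 0.082053, 0.169547]},
    index=pd.Index(['B4', 'I6', 'L10', 'B7'], name='Pat ID'),
)

prefers_do_not = sample['Do'] < sample['Do Not']
prefers_do = ~prefers_do_not   # complement; ties land in 'Do', matching the >= test
                               # (equivalent only when there are no missing values)

# Count users per group
counts = prefers_do_not.map({True: 'Do Not', False: 'Do'}).value_counts()
print(counts)

# Signed margin: negative means 'Do Not' wins; values near zero are close calls
margin = (sample['Do'] - sample['Do Not']).sort_values()
print(margin)
```

Sorting by margin makes near-ties such as B7 stand out from clear preferences such as L10 or I6.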
