The datasets used in this project are available from 'The Cancer Genome Atlas' (TCGA) consortium. Clinical and RNA-seq gene count data sets can be downloaded via a provided data portal, or using an R package called TCGA2STAT. The R script used to download the data sets for this project and write files locally in feather format are available in the project repository here. Datasets are also stored in this project's repository here. The R script is not necessary for dataset download, but can be run locally to obtain the most up to date clinical information from TCGA.
In [1]:
%run -i Dataset_cleanup/arrange_DF.py
There is missing clinical data in many of the features, including what will become the outcome label ('pathologyNstage' - metastasis state). The series 'y_all' is the full list of pathologyNstage, where 'n1' represents metastasis, and 'n0' represents no metastasis observed to date. Some observations have no metastasis state recorded and are represented by NaN in y_all. These are removed for the trimmed 'y' series.
In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
In [3]:
D = (('n0', y_all[y_all =='n0'].shape[0]),
('n1', y_all[y_all =='n1'].shape[0]),
('missing' , y_all[y_all.isnull()].shape[0]))
exp_fig = plt.figure(figsize=(5 , 5))
A = exp_fig.add_subplot(1,1,1)
ind = A.bar(range(0,3), height= [D[0][1], D[1][1], D[2][1]], align='center', color = 'grey')
A.set_xticks(range(0,3))
A.set_ylabel('Frequency', fontsize=16)
A.set_xlabel('Metastasis state', fontsize=16)
A.set_ylim(0,400)
A.set_xticklabels([D[0][0], D[1][0], D[2][0]]) #Must be a better way to do this as tuples (D, in this case) are not ordered?
ep = plt.show
exp_fig.savefig('Figures/LabelCount.png')
In [4]:
clinical['gleasonscore'] = pd.to_numeric(clinical['gleasonscore'], errors= 'coerce')
clinical['gleasonscore'].isnull().any() #checks if there are any values in gleason score that are null; False means all are numeric
Out[4]:
In [5]:
GS_hist = plt.figure(figsize=(5,5))
A = GS_hist.add_subplot(1,1,1)
sub = (('n1', ))
subs = [clinical.loc[y_all[y_all == 'n0'].index]['gleasonscore'],
clinical.loc[y_all[y_all == 'n1'].index]['gleasonscore'],
clinical.loc[y_all[y_all.isnull()].index]['gleasonscore']]
bins = [5.99, 6.99, 7.99, 8.99, 9.99, 10.99]
A.hist(subs, bins, label =['n0', 'n1', 'unknown'], color = ['blue','red','grey'], stacked = True)
A.set_ylim(0,275)
A.set_ylabel('Count', fontsize=16)
A.set_xlabel('Gleason Score', fontsize =16)
A.legend()
plt.show
GS_hist.savefig('Figures/GleasonHist.png')
It appears that no metastases are recorded in those specimens graded at a Gleason score of 6. This represents an opportunity to replace the missing data label with the most likely pathology state, n0, to more efficiently use the small data set.
In [6]:
"""Define the indices where gleasonscore == 6 and pathologyNstage is null"""
set(y_all[y_all.isnull()].index).intersection(list(clinical[clinical['gleasonscore'] == 6].index))
Out[6]:
In [7]:
"""Set all samples with Gleason grade of 6 and label is missing to 'no'. """
y_all.loc[set(y_all[y_all.isnull()].index).intersection(list(clinical[clinical['gleasonscore'] == 6].index))] = 'n0'
In [8]:
GS_hist = plt.figure(figsize=(5,5))
A = GS_hist.add_subplot(1,1,1)
sub = (('n1', ))
subs = [clinical.loc[y_all[y_all == 'n0'].index]['gleasonscore'],
clinical.loc[y_all[y_all == 'n1'].index]['gleasonscore'],
clinical.loc[y_all[y_all.isnull()].index]['gleasonscore']]
bins = [5.99, 6.99, 7.99, 8.99, 9.99, 10.99]
A.hist(subs, bins, label =['n0', 'n1', 'unknown'], color = ['blue','red','grey'], stacked = True)
A.set_ylim(0,275)
A.set_ylabel('Count', fontsize=16)
A.set_xlabel('Gleason Score', fontsize =16)
A.legend()
plt.show
GS_hist.savefig('Figures/GleasonHist2.png')
The starting dataset, 'X_all', includes the transformed transcript per million (TPM) estimates for all RNA-seq profiles. However some of the observations in this set do not have corresponding y_labels, as the clinical data set contains missing information.
Therefore X_all was trimmed to include only those observations where a finite y label exists, to yield 'X'.
In [9]:
print("Total observations in original dataset:",clinical.shape[0])
not_labeled = y_all[y_all.isnull()]
y = y_all[y_all.notnull()]
print("\nObservations with metastasis label:",y.shape[0])
print("Unlabeled observations (removed:)",not_labeled.shape[0])
In [10]:
"""Limit X to only observations where a target label is present."""
X = X_all.loc[set(y.index).intersection(X_all.index)] #Only observations that also have a known metastasis state are kept.
y = y.loc[set(X.index).intersection(y.index)]
print("X dimensions:",X.shape,"\ny dimensions:",y.shape)
In [11]:
X_no_y = X_all.loc[list(not_labeled.index)]
print("Dimensions of unlabeled dataset:",X_no_y.shape)
In [12]:
"""Check to determine if any clinical information is missing"""
X.isnull().values.any() #Are any values in the entire gene count data set null?
Out[12]:
In [13]:
gleason = clinical['gleasonscore']
#print(gleason.shape)
age = pd.to_numeric(clinical['yearstobirth'], errors = 'coerce')
age.fillna(value = np.mean(age), inplace=True)
psa = pd.to_numeric(clinical['psavalue'], errors= 'coerce')
psa.fillna(value = np.mean(psa), inplace =True)
#gleason = gleason.loc[y.index]
clinicalDF_all = pd.DataFrame({'gleason': gleason,
'age':age ,
'psa' : psa,
'y' : y_all}, index=X_all.index)
clinicalDF = clinicalDF_all.loc[y.index, :]
In [14]:
clinicalDF.drop(['y'], axis=1, inplace = True)
In [15]:
clinicalDF = clinicalDF.reindex(y.index)
In [16]:
sm = pd.scatter_matrix(clinicalDF,
alpha=0.25,
figsize= (10,10),
diagonal = 'kde',
c = y.replace({'n1': 'red','n0':'blue'}),
s = 125)
plt.savefig("Figures/ClinScatterMatrix.png")
In [17]:
"""Pilot experiment to see if any genes are activated/expressed differently in the 'n0' and 'n1 classes"""
from sklearn.feature_selection import SelectKBest, f_classif
K_selector = SelectKBest(f_classif, k = 'all') # using k=all here and will filter based on F-stat later.
K = K_selector.fit(X, y)
K_df = pd.DataFrame({'Gene':X.columns, 'F_score':K.scores_})
K_df.dropna(inplace=True)
In [18]:
"""Figure shows the same data twice, just with a linear and a log y-axis for clarity"""
F_fig = plt.figure(figsize=(15,5))
A = F_fig.add_subplot(1,2,1)
A.hist(K_df['F_score'], bins=range(0,45,1), facecolor='green')
#A.set_yscale('log')
A.set_ylabel('Count', fontsize=16)
A.set_xlabel("F-statistic", fontsize=16)
F_fig.suptitle("Distribution of F-statistics for metatastasis state group comparison of gene expression", fontsize = 18)
B = F_fig.add_subplot(1,2,2)
B.hist(K_df['F_score'], bins=range(0,45,1), facecolor='green')
B.set_yscale('log')
B.set_ylabel('Count', fontsize=16)
B.set_xlabel("F-statistic", fontsize=16)
#B.set_title("Distribution of F-statistics for metatastasis state group comparison of gene expression", fontsize = 18)
plt.show()
F_fig.savefig("Figures/FDist.png")
In [19]:
from sklearn.metrics import matthews_corrcoef, fbeta_score, classification_report, log_loss
from sklearn.model_selection import cross_val_score
In [20]:
"""Scorers needed throughout code:"""
from sklearn.metrics import make_scorer
LL_scorer = make_scorer(log_loss, greater_is_better=False, needs_proba=True, needs_threshold=False)
MCC_scorer = make_scorer(matthews_corrcoef, greater_is_better=True, needs_proba=False, needs_threshold=False)
fbeta_scorer = make_scorer(fbeta_score, greater_is_better=True, needs_proba=False, needs_threshold=False, pos_label='n1', beta = 2)
In [21]:
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
In [22]:
seed = 123
folds = 4
In [23]:
import matplotlib.patches as mpatches
blue = mpatches.Patch(color ='blue', label ='n0')
red = mpatches.Patch(color='red', label = 'n1')
In [24]:
from sklearn.model_selection import train_test_split
train_k = round(len(y)*0.70)
test_k = len(y) - train_k
In [25]:
"""Split samples into train and test sets, stratified by Gleason score"""
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size= test_k,
train_size = train_k,
random_state = seed,
stratify = clinicalDF['gleason'])
In [26]:
"""Push the train and test indices to split the clinical dataframe in the same way"""
clinicalDF_train = clinicalDF.loc[X_train.index,:]
clinicalDF_test = clinicalDF.loc[X_test.index,:]
In [27]:
clf_LR_clin = LogisticRegression(penalty='l2',
dual=False,
tol=0.0001,
C=1,
fit_intercept=True,
intercept_scaling=1,
class_weight='balanced',
random_state=123,
solver='liblinear',
max_iter=100,
multi_class='ovr',
verbose=0,
warm_start=False,
n_jobs=1)
clf_LR_clin.fit(clinicalDF_train, y_train) #use training set for model learning
Out[27]:
In [28]:
clf_LR_clin = LogisticRegression(penalty='l2',
dual=False,
tol=0.0001,
C=1,
fit_intercept=True,
intercept_scaling=1,
class_weight='balanced',
random_state=123,
solver='liblinear',
max_iter=100,
multi_class='ovr',
verbose=0,
warm_start=False,
n_jobs=1)
clf_LR_clin.fit(clinicalDF_train, y_train) #use training set for model learning
bench_fig = plt.figure(figsize=(15,5))
A= bench_fig.add_subplot(1,2,1)
B = bench_fig.add_subplot(1,2,2)
A.scatter(clinicalDF_train.loc[:,'gleason'],
clf_LR_clin.predict_proba(clinicalDF_train)[:,1],
color = y_train.replace({'n1':'red', 'n0': 'blue', 'NaN':'grey'}),
alpha = 0.3,
s = 25)
A.set_ylabel('Benchmark model probability of metastasis')
A.set_ylim(0,1)
B.set_ylim(0,1)
A.set_xlabel('Gleason')
plt.suptitle('Prognosis of metastasis risk based on features available at presentation', fontsize=20)
A.set_title('Training Set')
B.set_title('Validation Set')
B.scatter(clinicalDF_test.loc[:,'gleason'],
clf_LR_clin.predict_proba(clinicalDF_test)[:,1],
color = y_test.replace({'n1':'red', 'n0': 'blue', 'NaN':'grey'}),
alpha = 0.5,
s = 25)
B.set_xlabel('Gleason')
B.set_ylabel('Benchmark model probability of metastasis')
bench_fig.legend(labels = ['n0', 'n1'], handles = [blue, red], loc = 5)
bench_fig.suptitle("Benchmark Model", fontsize=20)
plt.show
bench_fig.savefig('Figures/benchmark.png')
In [29]:
"""The coefficents for each feature in the benchmark model"""
print(pd.DataFrame({'features':clinicalDF.columns, 'coefficient': clf_LR_clin.coef_[0]}))
In [30]:
"Benchmark Performance on Train dataset"
print(clf_LR_clin)
print('\nNull F beta: ', fbeta_score(y_train, clf_LR_clin.predict(clinicalDF_train), pos_label='n1',beta=2))
print('\nMCC: ',matthews_corrcoef(y_train, clf_LR_clin.predict(clinicalDF_train)),"\n")
print(classification_report(y_train, clf_LR_clin.predict(clinicalDF_train), labels = ['n0','n1']))
print('\nLogLoss: ', log_loss(y_train.replace({'n0':0, 'n1':1}),
clf_LR_clin.predict_proba(clinicalDF_train)[:,1]))
training_stat = pd.DataFrame({'F2': fbeta_score(y_train, clf_LR_clin.predict(clinicalDF_train), pos_label='n1',beta=2),
'MCC': matthews_corrcoef(y_train, clf_LR_clin.predict(clinicalDF_train)),
'LogLoss':log_loss(y_train.replace({'n0':0, 'n1':1}),
clf_LR_clin.predict_proba(clinicalDF_train)[:,1])}, index = ['clinical_only'])
In [31]:
"Benchmark Performance on Test dataset"
print(clf_LR_clin)
print('\nNull F beta: ', fbeta_score(y_test, clf_LR_clin.predict(clinicalDF_test), pos_label='n1',beta=2))
print('\nMCC: ',matthews_corrcoef(y_test, clf_LR_clin.predict(clinicalDF_test)),"\n")
print(classification_report(y_test, clf_LR_clin.predict(clinicalDF_test), labels = ['n0','n1']))
print('\nLogLoss: ', log_loss(y_test.replace({'n0':0, 'n1':1}),
clf_LR_clin.predict_proba(clinicalDF_test)[:,1]))
metric_stat = pd.DataFrame({'F2': fbeta_score(y_test, clf_LR_clin.predict(clinicalDF_test), pos_label='n1',beta=2),
'MCC': matthews_corrcoef(y_test, clf_LR_clin.predict(clinicalDF_test)),
'LogLoss':log_loss(y_test.replace({'n0':0, 'n1':1}),
clf_LR_clin.predict_proba(clinicalDF_test)[:,1])}, index = ['clinical_only'])
In [32]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
In [33]:
clf_RF = RandomForestClassifier(n_estimators=1000,
criterion='gini',
max_depth=None,
min_samples_split=2,
min_samples_leaf=1,
min_weight_fraction_leaf=0.0,
max_features= 'auto',
max_leaf_nodes=None,
bootstrap=True,
oob_score=False,
n_jobs=1,
random_state=seed,
verbose=0,
warm_start=False,
class_weight='balanced')
In [34]:
clf_RF.fit(X, y)
Gini_DF = pd.DataFrame({'Gini' :clf_RF.feature_importances_}, index=X.columns).sort_values(by = ['Gini'], axis = 0, ascending = False)
print(Gini_DF.head())
In [35]:
"""Set k number of genes to retain"""
k = 75
genes = Gini_DF.index[0:k] # gene list from first instance
for x in [5, 54, 543, 5432, 54321] :
clf_RF = RandomForestClassifier(n_estimators=1000,
criterion='gini',
max_depth=None,
min_samples_split=2,
min_samples_leaf=1,
min_weight_fraction_leaf=0.0,
max_features= 'auto',
max_leaf_nodes=None,
bootstrap=True,
oob_score=False,
n_jobs=1,
random_state=x,
verbose=0,
warm_start=False,
class_weight='balanced')
clf_RF.fit(X, y)
Gini_DF = pd.DataFrame({'Gini' :clf_RF.featureimportances}, index=X.columns).sort_values(by = ['Gini'], axis = 0, ascending = False)
genes = set(genes).intersection(Gini_DF.index[0:k])
print('Number of Consistent (Stable) Genes:',len(genes))
In [36]:
"""The genes most commonly determined in the slow iterative process above are listed below. By commenting out the cycle above
and just covering up the `genes` variable here, the code runs faster. For confirmation, one may turn the code type of the
cell above to 'code' instead of 'markdown' and run the process to check the subset below."""
genes = ['HIST1H3J',
'EDN3',
'FKBP15',
'FOXS1',
'GNE',
'HMGB2',
'MAP6D1',
'SNAPIN',
'SPAG1',
'UBE2D2',
'ZNRF3',
'DPM1',
'VPS36']
In [37]:
"""Subset X to include only the k best features from the Gini Importance table"""
X_k = X.loc[:,genes]
In [38]:
fig = plt.figure(figsize=(12,8))
for i in range(0,len(X_k.columns),1) :
ax = fig.add_subplot(np.ceil(len(X_k.columns)/5),5,i+1)
df0 = list(X_k.ix[y.loc[y=='n0'].index,i])
df1 = list(X_k.ix[y.loc[y=='n1'].index,i])
data = [df0,df1]
A = ax.boxplot(data, positions = [1,2], vert = False, widths = 0.5, patch_artist=True)
plt.setp(A['boxes'][1], color='red')
plt.setp(A['boxes'][0], color='blue')
plt.setp(A['medians'], color = 'black')
plt.setp(A['whiskers'], color = 'black')
plt.setp(A['fliers'], color = 'black', marker = '.', markersize = 3)
ax.yaxis.set_major_formatter(plt.NullFormatter())
ax.set_xlabel(X_k.columns[i], fontsize=12)
ax.set_xscale('log')
fig.tight_layout()
fig.legend(labels = ['n0', 'n1'], handles= [blue,red], ncol = 2, loc = 'lower center', bbox_to_anchor = (.945, 1.05))
fig.text(-.005, 0.5, 'Metastasis state', ha='center', va='center', fontsize=24,rotation='vertical')
fig.text(.5, -.025, 'Transcripts Per Million (TPM)', ha='center', va='center', fontsize=24)
fig.savefig('Figures/boxplots.png', dpi = 300, bbox_inches='tight')
plt.show()
In [39]:
scaler = StandardScaler() #mean = 0, variance = 1.
Xs = pd.DataFrame(scaler.fit_transform(X_k), columns = X_k.columns, index = X_k.index)
In [40]:
components = 3
In [41]:
pca = PCA(n_components = components, whiten=False)
pca.fit(Xs)
Out[41]:
In [42]:
Xpc = pd.DataFrame(pca.transform(Xs), columns = range(0,components,1), index = X.index)
In [43]:
%run -i 'support files/renders_PCA.py' #modified from MLE Nanodegree Project 3
pca_results(Xs, pca)
plt.savefig('Figures/pcaEV.png', dpi =500)
In [44]:
sm = pd.scatter_matrix(Xpc,
alpha=0.5,
figsize= (10,10),
diagonal = 'kde',
c = y.replace({'n1': 'red','n0':'blue'}),
s = 50)
plt.savefig("Figures/pcaScatter.png", dpi = 300)
Notice how the first principle component (labeled, 0) seems to distinguish the non-metastatic (blue) and metastatic samples (red).
In [45]:
"""Cover up X_train and X_test with the new feature set of 3 PCs"""
X_train, X_test = Xpc.loc[X_train.index, :], Xpc.loc[X_test.index, :]
In [46]:
clf_LR = LogisticRegression(penalty='l2',
dual=False,
tol=0.0001,
C=1,
fit_intercept=True,
intercept_scaling=1,
class_weight='balanced',
random_state=123,
solver='liblinear',
max_iter=100,
multi_class='ovr',
verbose=0,
warm_start=False,
n_jobs=1)
clf_LR.fit(X_train, y_train)
Out[46]:
In [47]:
%run -i 'Support Files/capstone_graphs.py'
fig.savefig('Figures/PC3.png')
In [48]:
run_name = '3_PCs'
%run -i 'Support Files/capstone_performance.py'
In [49]:
clf_LR = LogisticRegressionCV(Cs=10,
fit_intercept=True,
cv=folds,
dual=False,
penalty='l2',
scoring='log_loss',
solver='liblinear',
tol=0.0001,
max_iter=100,
class_weight='balanced',
n_jobs=1,
verbose=0,
refit=True,
intercept_scaling=1.0,
multi_class='ovr',
random_state=seed)
clf_LR.fit(X_train, y_train)
Out[49]:
In [50]:
print("The optimal C term: ",clf_LR.C_)
In [51]:
%run -i 'Support Files/capstone_graphs.py'
fig.savefig('Figures/optPC3.png')
In [52]:
run_name = '3_PCs_optimized'
%run -i 'Support Files/capstone_performance.py'
In [53]:
X_train.loc[:,'gleason'] = clinicalDF.loc[X_train.index,'gleason']
X_test.loc[:,'gleason'] = clinicalDF.loc[X_test.index, 'gleason']
In [54]:
clf_LR = LogisticRegressionCV(Cs=10,
fit_intercept=True,
cv=folds,
dual=False,
penalty='l2',
scoring='log_loss',
solver='liblinear',
tol=0.0001,
max_iter=100,
class_weight='balanced',
n_jobs=1,
verbose=0,
refit=True,
intercept_scaling=1.0,
multi_class='ovr',
random_state=seed)
clf_LR.fit(X_train, y_train)
Out[54]:
In [55]:
%run -i 'Support Files/capstone_graphs.py'
plt.savefig('Figures/PC3Gleason.png')
In [56]:
sm = pd.scatter_matrix(X_train,
alpha=0.5,
figsize= (12,12),
diagonal = 'kde',
c = y_train.replace({'n1': 'red','n0':'blue'}),
s = 50)
plt.savefig('Figures/FeatureSM.png')
In [57]:
run_name = 'Final_Model'
%run -i 'Support Files/capstone_performance.py'
In [58]:
"""The coefficient of each feature incorporated into the final model"""
pd.DataFrame({'Component': X_train.columns, 'coef': clf_LR.coef_[0]})
Out[58]:
In [59]:
%run -i 'Support Files/final_graph.py'
fig.savefig('Figures/FF.png')
In [60]:
def risk_function(barcode, DF, clinDF, genes, scaler, fit_pca, fit_clf) :
self = DF.loc[barcode,genes].reshape(1,-1)
self = pd.DataFrame(scaler.transform(self))
self = pd.DataFrame(fit_pca.transform(self), index = [barcode])
self['gleason'] = clinDF.loc[barcode,'gleasonscore']
return(fit_clf.predict_proba(self)[0,1])
In [61]:
"""Put the non-labeled samples through the risk function"""
risk = []
for x in not_labeled.index :
risk.append(risk_function(barcode = x,
DF = X_all,
clinDF = clinical,
genes = genes,
fit_pca = pca,
scaler = scaler,
fit_clf = clf_LR))
model_risk = pd.Series(risk, index=not_labeled.index)
In [62]:
clinicalDF_all.drop(['y'], axis=1, inplace = True)
In [63]:
bench_risk = pd.Series(clf_LR_clin.predict_proba(clinicalDF_all.loc[not_labeled.index,:])[:,1])
In [64]:
n0_question = X_all.loc[y[y!='n1'].index, :]
In [65]:
"""Put the non-metastatic samples through the risk function as a functional test"""
risk = []
for x in n0_question.index :
risk.append(risk_function(barcode = x,
DF = n0_question,
clinDF = clinical,
genes = genes,
fit_pca = pca,
scaler = scaler,
fit_clf = clf_LR))
n0_risk = pd.Series(risk, index=n0_question.index)
In [66]:
%run -i 'Support Files/free-form.py'
fig.savefig('Figures/missing.png')
In [67]:
%run -i 'Support Files/free-form2.py'
fig.savefig('Figures/N0Analysis.png')
It is important to remember that the dataset utilized in this project is longitudinal and active, and indeed some cases listed as 'n0' will eventually be diagnosed as 'n1'. Even though the model was trained on such false-negative labeled cases, some of the 'n0' class are still predicted as highly likely to metastasize. If the dataset were linked somehow to patient records (for ethical purposes, no such link exists), then these patients should be monitored with extreme care for signs of metastasis based on the predictions made by the project model.
In [68]:
n0_risk[n0_risk > 0.75].sort_values(ascending = False)
Out[68]:
In [69]:
benign = feather.read_dataframe('feather_files/benign.feather')
In [70]:
benign.set_index(['benign_index'], inplace = True)
In [71]:
benign.shape
Out[71]:
In [72]:
benign_tpm = transformation(benign)
In [73]:
"""Put the benign samples through the risk function"""
risk = []
for x in benign.index :
risk.append(risk_function(barcode = x,
DF = benign,
clinDF = clinical,
genes = genes,
fit_pca = pca,
scaler = scaler,
fit_clf = clf_LR))
benign_risk = pd.Series(risk, index=benign.index)
In [74]:
fig = plt.figure(figsize=(8,4))
A = fig.add_subplot(1,1,1)
A.hist(benign_risk.sort_values(ascending=False),
bins=12,
facecolor='blue')
#A.set_yscale('log')
A.set_ylabel('Count', fontsize=16)
A.set_xlim(0,1)
A.set_xlabel("Model Probability of Metastasis", fontsize=16)
fig.suptitle("Distribution of metastasis probability for benign samples", fontsize = 14)
plt.show()
fig.savefig("Figures/Sensitivity.png")
Sensitivity Analysis experiment shows that specimens from benign regions of the prostate (even when borrowing Gleason scores from an orthogonal malignancy) do not register as high risk for metastasis.
In [ ]: