UNC HiSeq mRNAseq gene expression (RSEM)

The goal of this notebook is to introduce you to the mRNAseq gene expression BigQuery table.

This table contains all available TCGA Level-3 gene expression data produced by UNC's RNAseqV2 pipeline using the Illumina HiSeq platform, as of July 2016. The most recent archive (e.g. unc.edu_BRCA.IlluminaHiSeq_RNASeqV2.Level_3.1.11.0) for each of the 33 tumor types was downloaded from the DCC, and data were extracted from all files matching the pattern %.rsem.genes.normalized_results. Each of these raw “RSEM genes normalized results” files has two columns: gene_id and normalized_count. The gene_id string contains two parts, the gene symbol and the Entrez gene ID, separated by |, e.g. TP53|7157. During ETL, the gene_id string is split: the gene symbol is stored in the original_gene_symbol field, and the Entrez gene ID is stored in the gene_id field. In addition, the Entrez ID is used to look up the current HGNC approved gene symbol, which is stored in the HGNC_gene_symbol field.
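As a rough illustration of the split described above, the following sketch separates a raw gene_id string into its two parts. The function name split_gene_id is hypothetical and is not taken from the actual ETL code; only the TP53|7157 example comes from the text.

```python
def split_gene_id(raw_gene_id):
    """Split a raw 'SYMBOL|ENTREZ_ID' string into (symbol, entrez_id).

    Hypothetical helper: it mirrors the split described in the text,
    not the actual ETL implementation.
    """
    # Split on the last '|' so the numeric ID is always the final part.
    symbol, entrez_id = raw_gene_id.rsplit("|", 1)
    return symbol, int(entrez_id)


# The symbol would go to original_gene_symbol, the ID to gene_id.
original_gene_symbol, gene_id = split_gene_id("TP53|7157")
print(original_gene_symbol, gene_id)
```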

In order to work with BigQuery, you need to import the python bigquery module (gcp.bigquery) and you need to know the name(s) of the table(s) you are going to be working with:



In [1]:

    
import gcp.bigquery as bq
mRNAseq_BQtable = bq.Table('isb-cgc:tcga_201607_beta.mRNA_UNC_HiSeq_RSEM')

From now on, we will refer to this table using this variable ($mRNAseq_BQtable), but we could just as well explicitly give the table name each time.

Let's start by taking a look at the table schema:



In [2]:

    
%bigquery schema --table $mRNAseq_BQtable









    Out[2]:

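Now let's count up the number of unique patients, samples and aliquots mentioned in this table. We will do this by defining a very simple parameterized query. The second argument to COUNT(DISTINCT ...) is the threshold below which BigQuery legacy SQL guarantees an exact count; above it the result is a statistical approximation, and the default threshold is only 1000, so we raise it to 25000. (Note that when using a variable for the table name in the FROM clause, you should not also use the square brackets that you usually would if you were specifying the table name as a string.)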



In [3]:

    
%%sql --module count_unique

DEFINE QUERY q1
SELECT COUNT (DISTINCT $f, 25000) AS n
FROM $t



In [4]:

    
fieldList = ['ParticipantBarcode', 'SampleBarcode', 'AliquotBarcode']
for aField in fieldList:
  field = mRNAseq_BQtable.schema[aField]
  rdf = bq.Query(count_unique.q1,t=mRNAseq_BQtable,f=field).results().to_dataframe()
  print(" There are %6d unique values in the field %s. " % (rdf.iloc[0]['n'], aField))









    



 There are   9530 unique values in the field ParticipantBarcode. 
 There are  10289 unique values in the field SampleBarcode. 
 There are  10291 unique values in the field AliquotBarcode.
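The counts grow from participants to samples to aliquots because one participant can contribute several samples, and one sample can yield several aliquots. A minimal local sketch of the same distinct-count logic, using made-up placeholder barcodes rather than real TCGA identifiers:

```python
# Hypothetical rows: (ParticipantBarcode, SampleBarcode, AliquotBarcode).
# The barcode values are invented placeholders, not real TCGA barcodes.
rows = [
    ("P1", "P1-01A", "P1-01A-a1"),
    ("P1", "P1-01A", "P1-01A-a2"),  # second aliquot of the same sample
    ("P1", "P1-11A", "P1-11A-a1"),  # second sample from the same participant
    ("P2", "P2-01A", "P2-01A-a1"),
]
fields = ["ParticipantBarcode", "SampleBarcode", "AliquotBarcode"]

# Equivalent of COUNT(DISTINCT field), computed exactly with a set per column.
unique_counts = {
    name: len({row[i] for row in rows}) for i, name in enumerate(fields)
}

for name in fields:
    print(" There are %6d unique values in the field %s. " % (unique_counts[name], name))
```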

We can do the same thing to look at how many unique gene symbols and gene ids exist in the table:



In [5]:

    
fieldList = ['original_gene_symbol', 'HGNC_gene_symbol', 'gene_id']
for aField in fieldList:
  field = mRNAseq_BQtable.schema[aField]
  rdf = bq.Query(count_unique.q1,t=mRNAseq_BQtable,f=field).results().to_dataframe()
  print(" There are %6d unique values in the field %s. " % (rdf.iloc[0]['n'], aField))









    



 There are  20501 unique values in the field original_gene_symbol. 
 There are  20182 unique values in the field HGNC_gene_symbol. 
 There are  20531 unique values in the field gene_id.

Based on the counts, we can see that there are a few instances where the original gene symbol (from the underlying TCGA data file), the HGNC gene symbol, or the gene id (also from the original TCGA data file) is missing. For the majority of genes, however, all three values are available, and for the most part the original gene symbol and the HGNC gene symbol that was added during ETL match. This next query generates the complete list of genes for which none of the identifiers is null and where the original gene symbol and the HGNC gene symbol match. This list has over 18,000 genes in it.



In [6]:

    
%%sql

SELECT
  HGNC_gene_symbol,
  original_gene_symbol,
  gene_id
FROM
  $mRNAseq_BQtable
WHERE
  ( original_gene_symbol IS NOT NULL
    AND HGNC_gene_symbol IS NOT NULL
    AND original_gene_symbol=HGNC_gene_symbol
    AND gene_id IS NOT NULL )
GROUP BY
  original_gene_symbol,
  HGNC_gene_symbol,
  gene_id
ORDER BY
  HGNC_gene_symbol









    Out[6]:





    HGNC_gene_symbol original_gene_symbol gene_id
A1BG A1BG 1
A1CF A1CF 29974
A2M A2M 2
A2ML1 A2ML1 144568
A4GALT A4GALT 53947
A4GNT A4GNT 51146
AAAS AAAS 8086
AACS AACS 65985
AADAC AADAC 13
AADACL2 AADACL2 344752
AADACL3 AADACL3 126767
AADACL4 AADACL4 343066
AADAT AADAT 51166
AAGAB AAGAB 79719
AAK1 AAK1 22848
AAMP AAMP 14
AANAT AANAT 15
AARS AARS 16
AARS2 AARS2 57505
AARSD1 AARSD1 80755
AASDH AASDH 132949
AASDHPPT AASDHPPT 60496
AASS AASS 10157
AATF AATF 26574
AATK AATK 9625
    
(rows: 18153, time: 25.5s,     4GB processed, job: job_q9uLYjnsTKOT4eFAyWM1uu2VVPw)

We might also want to know how often the gene symbols do not agree:



In [7]:

    
%%sql

SELECT
  HGNC_gene_symbol,
  original_gene_symbol,
  gene_id
FROM
  $mRNAseq_BQtable
WHERE
  ( original_gene_symbol IS NOT NULL
    AND HGNC_gene_symbol IS NOT NULL
    AND original_gene_symbol!=HGNC_gene_symbol
    AND gene_id IS NOT NULL )
GROUP BY
  original_gene_symbol,
  HGNC_gene_symbol,
  gene_id
ORDER BY
  HGNC_gene_symbol









    Out[7]:





    HGNC_gene_symbol original_gene_symbol gene_id
A1BG-AS1 NCRNA00181 503538
A2M-AS1 LOC144571 144571
AACSP1 AACSL 729522
AADACP1 LOC201651 201651
AAED1 C9orf21 195827
AAMDC C11orf67 28971
AAR2 C20orf4 25980
AARD C8orf85 441376
AATBC LOC284837 284837
AATK-AS1 LOC388428 388428
ABHD11-AS1 WBSCR26 171022
ABHD16A BAT5 7920
ABHD16B C20orf135 140701
ABHD17A FAM108A1 81926
ABHD17B FAM108B1 51104
ABHD17C FAM108C1 58489
ABHD18 C4orf29 80167
ABRACL C6orf115 58527
ACKR1 DARC 2532
ACKR2 CCBP2 1238
ACKR3 CXCR7 57007
ACKR4 CCRL1 51554
ACSM6 C10orf129 142827
ACTG1P4 LOC648740 648740
ACTL10 C20orf134 170487
    
(rows: 2013, time: 6.1s,     4GB processed, job: job_Qogec9zEdrWGOVbaaDX_G3EioLM)
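The two WHERE clauses above split the gene list into symbols that match and symbols that were renamed, leaving out any row with a missing identifier. The sketch below applies the same three-way logic locally. The first three rows are taken from the outputs above; the last row and the function name classify are made up for illustration.

```python
genes = [
    # (HGNC_gene_symbol, original_gene_symbol, gene_id)
    ("A1BG", "A1BG", 1),
    ("A1BG-AS1", "NCRNA00181", 503538),
    ("ACKR1", "DARC", 2532),
    (None, "HYPOTHETICAL1", 999999999),  # invented row with no HGNC symbol
]


def classify(hgnc_symbol, original_symbol, gene_id):
    """Return 'incomplete', 'match' or 'renamed' for one gene record."""
    if hgnc_symbol is None or original_symbol is None or gene_id is None:
        return "incomplete"  # excluded by the IS NOT NULL conditions
    return "match" if hgnc_symbol == original_symbol else "renamed"


labels = [classify(*gene) for gene in genes]
for gene, label in zip(genes, labels):
    print(gene, label)
```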

BigQuery is not just a "look-up" service -- you can also use it to perform calculations. In this next query, we take a look at the mean, standard deviation, and coefficient of variation for the expression of EGFR, within each tumor-type, as well as the number of primary tumor samples that went into each summary statistic.



In [8]:

    
%%sql

SELECT
  Study,
  n,
  exp_mean,
  exp_sigma,
  (exp_sigma/exp_mean) AS exp_cv
FROM (
  SELECT
    Study,
    AVG(LOG2(normalized_count+1)) AS exp_mean,
    STDDEV_POP(LOG2(normalized_count+1)) AS exp_sigma,
    COUNT(AliquotBarcode) AS n
  FROM
    $mRNAseq_BQtable
  WHERE
    ( SampleTypeLetterCode="TP"
      AND HGNC_gene_symbol="EGFR" )
  GROUP BY
    Study )
ORDER BY
  exp_sigma DESC









    Out[8]:





    Study n exp_mean exp_sigma exp_cv
GBM 156 11.8977935121 2.37898710407 0.199951957617
SKCM 104 5.41189297221 2.3134143602 0.427468608873
BRCA 1095 7.3316523775 2.0186060124 0.275327567165
PCPG 179 5.21363387261 1.93511911652 0.371165134301
BLCA 408 9.26258869139 1.84003491022 0.19865233916
CESC 304 9.615363578 1.74670970515 0.181658206783
THYM 120 8.61408054406 1.74340125417 0.202389708948
LGG 516 11.1471994947 1.72222308803 0.154498274553
ESCA 184 11.6700063461 1.63953898439 0.140491696042
SARC 259 8.58641554296 1.59104767886 0.185298238933
UCEC 176 7.44476326481 1.5896445786 0.213525201818
LUSC 502 10.9294654726 1.54971809843 0.141792670677
TGCT 150 6.98336711449 1.50763881878 0.215889955958
KICH 66 9.48002047141 1.49846283983 0.158065359073
LUAD 515 9.89511374203 1.49825691249 0.151413814086
HNSC 520 11.3907589542 1.48880096929 0.130702526081
DLBC 48 4.41936166206 1.47405528888 0.333544842355
CHOL 36 9.19645454465 1.46185657708 0.158958712836
UCS 57 8.09853127117 1.44858302758 0.178869844306
ACC 79 7.7442559231 1.44367693285 0.186419063004
MESO 87 9.55197752348 1.43369934675 0.1500945059
STAD 415 10.2198608533 1.39368443653 0.136370196869
LIHC 371 9.60580817296 1.3719289882 0.142822859201
OV 305 8.53829302419 1.19815155796 0.140326825814
KIRP 290 9.36140534142 1.19513023402 0.127665685913
    
(rows: 32, time: 2.3s,    11GB processed, job: job_uEvr7cSfInKOsx2ykx_7KH24Uuo)
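For readers who want to check the arithmetic, here is a minimal local sketch of the same summary statistics using only the Python standard library. The input counts are invented; statistics.pstdev is the population standard deviation, which corresponds to STDDEV_POP in the query.

```python
import math
import statistics


def summarize_expression(normalized_counts):
    """Return (n, exp_mean, exp_sigma, exp_cv) for log2(count + 1) values."""
    log_values = [math.log2(count + 1) for count in normalized_counts]
    exp_mean = statistics.mean(log_values)
    exp_sigma = statistics.pstdev(log_values)  # population std, like STDDEV_POP
    return len(log_values), exp_mean, exp_sigma, exp_sigma / exp_mean


# Invented counts chosen so that log2(count + 1) is 0, 1, 2, 3.
n, exp_mean, exp_sigma, exp_cv = summarize_expression([0, 1, 3, 7])
print(n, exp_mean, exp_sigma, exp_cv)
```

The +1 inside the logarithm keeps samples with a normalized count of zero from producing an undefined value.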

We can also easily move the gene-symbol out of the WHERE clause and into the SELECT and GROUP BY clauses and have BigQuery do this same calculation over all genes and all tumor types. This time we will use the --module option to define the query and then call it in the next cell from python.



In [9]:

    
%%sql --module highVar

SELECT
  Study,
  HGNC_gene_symbol,
  n,
  exp_mean,
  exp_sigma,
  (exp_sigma/exp_mean) AS exp_cv
FROM (
  SELECT
    Study,
    HGNC_gene_symbol,
    AVG(LOG2(normalized_count+1)) AS exp_mean,
    STDDEV_POP(LOG2(normalized_count+1)) AS exp_sigma,
    COUNT(AliquotBarcode) AS n
  FROM
    $t
  WHERE
    ( SampleTypeLetterCode="TP" )
  GROUP BY
    Study,
    HGNC_gene_symbol )
ORDER BY
  exp_sigma DESC

Once we have defined a query, we can put it into a python object and print out the SQL statement to make sure it looks as expected:



In [10]:

    
q = bq.Query(highVar,t=mRNAseq_BQtable)
print(q.sql)









    



SELECT
  Study,
  HGNC_gene_symbol,
  n,
  exp_mean,
  exp_sigma,
  (exp_sigma/exp_mean) AS exp_cv
FROM (
  SELECT
    Study,
    HGNC_gene_symbol,
    AVG(LOG2(normalized_count+1)) AS exp_mean,
    STDDEV_POP(LOG2(normalized_count+1)) AS exp_sigma,
    COUNT(AliquotBarcode) AS n
  FROM
    [isb-cgc:tcga_201607_beta.mRNA_UNC_HiSeq_RSEM]
  WHERE
    ( SampleTypeLetterCode="TP" )
  GROUP BY
    Study,
    HGNC_gene_symbol )
ORDER BY
  exp_sigma DESC
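Note how $t in the module has been replaced by the bracketed table name. As an analogy only (this is not how gcp.bigquery is implemented), the standard library's string.Template performs the same kind of $name substitution:

```python
from string import Template

# A fragment of the query text with a $t placeholder, as in the module above.
query_template = Template("SELECT Study\nFROM\n  $t")

# Assumption for this sketch: we supply the bracketed legacy-SQL table name ourselves.
sql = query_template.substitute(t="[isb-cgc:tcga_201607_beta.mRNA_UNC_HiSeq_RSEM]")
print(sql)
```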

And then we can run it and save the results in another python object:



In [11]:

    
r = bq.Query(highVar,t=mRNAseq_BQtable).results()



In [12]:

    
#r.to_dataframe()

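Since the result of the previous query is quite large (over 600,000 rows representing ~20,000 genes x ~30 tumor types), we might want to feed it into one or more subsequent queries that refine it further. For example, we can keep only the gene/tumor-type pairs with a mean log2 expression above 6, at least 200 samples, and a coefficient of variation above 0.5: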



In [13]:

    
%%sql --module hv_genes

SELECT *
FROM ( $hv_result )
HAVING
  ( exp_mean > 6.
    AND n >= 200
    AND exp_cv > 0.5 )
ORDER BY
  exp_cv DESC



In [14]:

    
bq.Query(hv_genes,hv_result=r).results().to_dataframe()









    Out[14]:






  
    
      
     Study  HGNC_gene_symbol     n  exp_mean  exp_sigma    exp_cv
0    SARC   RPS4Y1             259  6.061873   5.548601  0.915328
1    COAD   XIST               285  6.248359   5.470363  0.875488
2    COAD   RPS4Y1             285  6.121045   5.155333  0.842231
3    BLCA   GSTM1              408  6.010392   5.006242  0.832931
4    KIRC   XIST               533  6.407005   5.271564  0.822781
5    BRCA   CPB1              1095  6.559260   5.168064  0.787904
6    LGG    RPS4Y1             516  6.704050   5.271334  0.786291
7    LGG    XIST               516  7.173960   5.539642  0.772187
8    HNSC   ACTC1              520  6.066815   4.653893  0.767106
9    BLCA   KRT6C              408  6.000740   4.567136  0.761095
10   LIHC   GSTM1              371  6.074415   4.605980  0.758259
11   SARC   PLA2G2A            259  6.002323   4.550597  0.758139
12   STAD   REG3A              415  6.314804   4.661015  0.738109
13   LUSC   MAGEA6             502  6.056004   4.452028  0.735143
14   SARC   XIST               259  7.028725   5.046253  0.717947
15   STAD   FABP1              415  6.015140   4.304719  0.715647
16   KIRC   FGB                533  6.385475   4.557211  0.713684
17   HNSC   ACTA1              520  6.272165   4.441155  0.708074
18   BLCA   KRT6B              408  6.821403   4.818030  0.706311
19   STAD   RPS4Y1             415  6.695497   4.713168  0.703931
20   CESC   SPRR2E             304  6.072856   4.266463  0.702546
21   LIHC   FXYD2              371  6.026990   4.227431  0.701417
22   BLCA   CLCA4              408  6.096983   4.248395  0.696803
23   LIHC   EPCAM              371  6.018766   4.176006  0.693831
24   BLCA   KRT4               408  6.182480   4.287769  0.693536
25   LIHC   CYP1A2             371  6.324730   4.386093  0.693483
26   LUSC   MAGEA9B            502  6.100760   4.227037  0.692871
27   LUAD   XIST               515  7.728633   5.342959  0.691320
28   KIRC   KDM5D              533  6.093867   4.209498  0.690776
29   SARC   CHRDL2             259  6.360666   4.375406  0.687885
...  ...    ...                ...       ...        ...       ...
296  STAD   TM4SF4             415  6.620646   3.355416  0.506811
297  LIHC   HGFAC              371  8.064111   4.084009  0.506443
298  STAD   CDX1               415  7.238605   3.665532  0.506387
299  KIRP   HBA1               290  6.039078   3.055942  0.506028
300  PRAD   CST1               497  6.162322   3.117676  0.505926
301  OV     GAL3ST3            305  6.111476   3.089889  0.505588
302  COAD   ALDH1L1            285  6.285763   3.170998  0.504473
303  BRCA   KLHDC7A           1095  6.339753   3.197413  0.504343
304  LUSC   SBSN               502  6.835398   3.444914  0.503982
305  BRCA   KRT81             1095  6.511374   3.280551  0.503819
306  SARC   TPSAB1             259  6.814440   3.433117  0.503800
307  KIRP   ALDOB              290  6.118838   3.080089  0.503378
308  SARC   SERPINA3           259  6.266452   3.154244  0.503354
309  BRCA   ELF5              1095  6.514652   3.279154  0.503351
310  LIHC   SCGN               371  6.090981   3.063063  0.502885
311  BLCA   CYP4F22            408  6.981163   3.508778  0.502606
312  PRAD   ORM2               497  6.222926   3.123566  0.501945
313  CESC   MMP10              304  6.789319   3.405935  0.501661
314  LUAD   B3GNT6             515  6.068778   3.042877  0.501399
315  STAD   CYP2B6             415  6.206014   3.110105  0.501144
316  SARC   SMOC1              259  7.165206   3.589628  0.500980
317  SARC   HR                 259  6.015107   3.013293  0.500954
318  SARC   ABCA8              259  6.951896   3.482219  0.500902
319  CESC   SPRR1B             304  9.031974   4.524126  0.500901
320  KIRC   SLC5A8             533  7.190818   3.598825  0.500475
321  OV     ZNHIT2             305  6.444236   3.224612  0.500387
322  BLCA   DES                408  8.655334   4.330614  0.500341
323  LIHC   PLPP2              371  6.643719   3.323184  0.500199
324  THCA   DCSTAMP            505  8.456327   4.228831  0.500079
325  BLCA   CEACAM6            408  7.919889   3.960153  0.500026

326 rows × 6 columns
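The thresholds in the hv_genes query can also be applied locally once results are in memory. The sketch below uses four rows copied from the outputs in this notebook (two from the table above and two EGFR rows from the earlier per-study summary, with values rounded to six decimal places); the list-of-dicts layout and variable names are illustrative assumptions only.

```python
results = [
    {"Study": "SARC", "HGNC_gene_symbol": "RPS4Y1", "n": 259,
     "exp_mean": 6.061873, "exp_sigma": 5.548601, "exp_cv": 0.915328},
    {"Study": "BLCA", "HGNC_gene_symbol": "CEACAM6", "n": 408,
     "exp_mean": 7.919889, "exp_sigma": 3.960153, "exp_cv": 0.500026},
    {"Study": "GBM", "HGNC_gene_symbol": "EGFR", "n": 156,
     "exp_mean": 11.897794, "exp_sigma": 2.378987, "exp_cv": 0.199952},
    {"Study": "BRCA", "HGNC_gene_symbol": "EGFR", "n": 1095,
     "exp_mean": 7.331652, "exp_sigma": 2.018606, "exp_cv": 0.275328},
]

# Same three conditions as the HAVING clause in hv_genes.
selected = [
    row for row in results
    if row["exp_mean"] > 6.0 and row["n"] >= 200 and row["exp_cv"] > 0.5
]

# Same ordering as ORDER BY exp_cv DESC.
selected.sort(key=lambda row: row["exp_cv"], reverse=True)

for row in selected:
    print(row["Study"], row["HGNC_gene_symbol"], row["exp_cv"])
```

Both EGFR rows are dropped because their coefficient of variation is below 0.5, and the GBM row also falls short of the 200-sample minimum.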


