Enrichment Analysis

Get basic structure of data

One problem with using the notebook is that it is a little harder to look at datasets: you need to explicitly print things out. This section is where I keep those printouts, just for reference.

CEGS Data Sets


In [1]:
libname CEGS '/home/jfear/mclab/cegs_ase_paper/sas_data/';

proc datasets library=cegs nodetails;
quit;


Out[1]:
SAS Output

The SAS System

The DATASETS Procedure

Directory Information

Directory
Libref CEGS
Engine V9
Physical Name /home/jfear/mclab/cegs_ase_paper/sas_data
Filename /home/jfear/mclab/cegs_ase_paper/sas_data
Inode Number 6
Access Permission rwx------
Owner Name jfear
File Size (bytes) 0

Library Members

# Name Member Type File Size Last Modified
1 AI_REG_FIT_CIS DATA 131072 01/21/2016 14:36:23
2 AI_REG_FIT_CIS2 DATA 131072 01/21/2016 14:35:33
3 AI_REG_FIT_FULL DATA 327680 01/21/2016 11:04:45
4 AI_REG_FIT_FULL2 DATA 131072 01/21/2016 11:20:57
5 AI_REG_FIT_NOINT DATA 327680 01/21/2016 11:04:45
6 AI_REG_FIT_NOINT2 DATA 131072 01/21/2016 11:20:57
7 AI_REG_PARMS_CIS DATA 589824 01/21/2016 14:35:33
8 AI_REG_PARMS_CT DATA 851968 01/21/2016 11:04:18
9 AI_REG_PARMS_FULL DATA 1114112 01/21/2016 11:04:18
10 AI_SWITCH DATA 196608 01/19/2016 22:08:07
11 ALL_CALLS_SBS DATA 37355520 10/27/2015 11:23:09
12 ALN_SUM_UPDATED_LINE_LINE DATA 73728 03/07/2014 13:43:16
13 ALN_SUM_UPDATED_LINE_LINE_1TO10 DATA 24576 01/27/2014 10:01:19
14 ALN_SUM_UPDATED_LINE_LINE_INCOM DATA 40960 03/19/2014 12:42:22
15 ALN_SUM_UPDATED_LINE_W1118 DATA 73728 03/07/2014 13:43:17
16 ALN_SUM_UPDATED_LINE_W1118_1TO10 DATA 24576 01/27/2014 10:01:19
17 ALN_SUM_UPDATED_LINE_W1118_INCOM DATA 40960 03/19/2014 12:42:22
18 ASE_BAYES_FUSION_MEANS_SD DATA 2892800 08/27/2014 09:41:50
19 ASE_COUNTS_FOR_BAYESIAN DATA 1300832256 06/28/2014 17:09:11
20 ASE_COUNTS_FOR_BAYESIAN_BOTH_SBS DATA 956424192 07/01/2014 18:28:39
21 ASE_COUNTS_FOR_BAYESIAN_SBS DATA 692838400 06/28/2014 17:03:41
22 ASE_COUNTS_SUM_TABLE_LINE_MV_REP DATA 16384 01/07/2014 18:10:15
23 ASE_DESIGN_FILE DATA 131072 08/09/2015 17:33:14
24 ASE_QSIM_LINE DATA 172621824 04/08/2015 11:50:13
25 ASE_QSIM_TESTER DATA 172621824 08/30/2015 18:28:26
26 BASE_COUNT_DESIGN_R101 DATA 131072 10/21/2014 10:57:58
27 BAYESIAN_FLAG_ANALYZE DATA 172621824 04/15/2015 09:54:11
28 BAYESIAN_FUSION_LIST DATA 589824 06/29/2014 11:55:17
29 CIS_EST_V13 DATA 17432576 04/06/2016 09:55:26
30 CLEAN_ASE_SBS DATA 5242880 08/13/2015 15:02:25
31 CLEAN_ASE_STACK DATA 25755648 02/10/2016 17:03:39
32 COMPARE_CT_VAR DATA 196608 01/25/2016 17:00:45
33 COUNTS_FOR_BAYESIAN_BOTH_STACK DATA 1517944832 04/15/2015 09:33:02
34 COUNTS_FOR_EXTENDED_BAYESIAN DATA 1901658112 04/15/2015 09:18:23
35 COUNT_AI_MATED_BY_FUSION DATA 196608 10/27/2015 11:19:46
36 COUNT_AI_MATED_BY_LINE DATA 131072 10/27/2015 11:24:11
37 COUNT_TRANS_MATED_BY_FUSION DATA 196608 10/27/2015 11:11:50
38 DATA_FOR_BAYES_BMC_PG_MODEL DATA 1722220544 04/15/2015 10:30:50
39 DATA_FOR_BAYES_BOTH_EXTENDED_SBS DATA 1513881600 04/15/2015 10:41:10
40 DATA_FOR_BAYES_EXTENDED_SBS DATA 1513881600 04/15/2015 10:48:52
41 DISCORDANT_GENOTYPES DATA 131072 08/13/2015 15:48:07
42 EMP_BAYESIAN_INPUT DATA 2346450944 04/17/2015 08:25:31
43 EMP_BAYESIAN_RESULTS DATA 264372224 03/28/2015 14:22:20
44 EMP_BAYESIAN_RESULTS_W_FLAGS DATA 326893568 03/28/2015 14:43:11
45 EXON_DROP_LIST_100_GENOME DATA 131072 08/31/2015 12:12:31
46 EXTEND_VS_BOTH_05_09 DATA 98763776 09/25/2014 12:28:15
47 FB551_100_GENOME_FLAG_LINE_BIAS DATA 52494336 08/30/2015 14:45:07
48 FDR_FLAG_CIS DATA 589824 11/18/2015 15:57:21
49 FDR_MODEL_AI_CT DATA 786432 01/20/2016 17:18:36
50 FLAG_EXONIC_REGIONS_100_GENOME DATA 3145728 08/30/2015 14:45:21
51 FLAG_THETA_BIAS DATA 1538260992 11/05/2014 17:44:44
52 GENES_QITH_SPICE_AI DATA 131072 01/19/2016 22:16:38
53 GENOTYPE_LIST DATA 131072 10/13/2014 13:10:20
54 INTERACTION_EXPR_ANOVA_RESULTS DATA 6488064 11/18/2015 13:48:55
55 LUIS_PG_RESULTS DATA 18415616 11/27/2014 19:08:04
56 MATED_EXPR_ANOVA_RESULTS DATA 2818048 11/03/2015 11:33:54
57 MERGED_LINE_ASE DATA 30770176 08/06/2014 15:25:31
58 MGTFSIG DATA 327680 04/29/2016 08:32:15
59 MODEL_AI_CT DATA 1179648 01/17/2016 19:30:16
60 ORIGINAL_VS_BOTH_05 DATA 109495296 09/05/2014 15:58:57
61 OUTPUT_05_ALL_LINE_PERCENT DATA 134300672 09/05/2014 12:25:16
62 OUTPUT_BAYES_05 DATA 16286720 08/27/2014 09:27:27
63 OUTPUT_BAYES_BOTH_05 DATA 16286720 08/25/2014 11:43:38
64 OUTPUT_BOTH_05_ALL_LINE_PERCENT DATA 154960896 09/05/2014 12:32:08
65 PARMS_CVT DATA 327680 01/21/2016 18:31:32
66 PHENO_TEST1 DATA 90112 11/22/2013 13:04:12
67 POOL_2015_GENE_LISTS DATA 196608 11/01/2015 15:47:27
68 QSIM_BAYESIAN_RESULTS DATA 140115968 08/30/2015 18:38:25
69 QSIM_EMP_THETA_W_FLAG DATA 310902784 08/31/2015 12:12:20
70 R101_FB557_MISS DATA 139919360 08/28/2015 12:37:36
71 R101_MISSPECIFICATION DATA 140050432 08/28/2015 13:49:58
72 R2_CT_MODELS DATA 196608 01/25/2016 16:51:11
73 R332_MISSPECIFICATION DATA 140050432 08/28/2015 13:49:45
74 R361_MISSPECIFICATION DATA 140050432 08/28/2015 13:49:39
75 R365_MISSPECIFICATION DATA 140050432 08/28/2015 13:49:52
76 RITA_RESULTS_W_FLAGS DATA 4390912 10/15/2015 13:14:59
77 SEX_DET_GENO_AI DATA 131072 08/13/2015 15:04:14
78 SEX_DET_ON_OFF DATA 131072 08/13/2015 15:03:21
79 SIM_100_RESULTS DATA 48046080 09/11/2014 09:32:25
80 SIM_100_RESULTS_TRANS DATA 56102912 09/11/2014 09:31:46
81 STANDARD_MATING_STACK DATA 6422528 10/27/2015 14:32:21
82 STANDARD_VIRGIN_STACK DATA 6422528 10/27/2015 14:32:58
83 TESTS_ANNO DATA 36700160 11/18/2015 16:00:22
84 VIRGIN_EXPR_ANOVA_RESULTS DATA 3014656 11/03/2015 11:53:06
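
Because the notebook has no dataset browser, it can help to list a dataset's variables and sizes without printing any rows. The following is a minimal sketch, assuming the CEGS libref from In [1] is still assigned; `r2_ct_models` is used only as an example member.

```sas
/* Variable names, types, and lengths in creation order, without printing rows */
proc contents data=CEGS.r2_ct_models varnum;
run;

/* Row and column counts for every member of the CEGS library */
proc sql;
    select memname, nobs, nvar
    from dictionary.tables
    where libname = 'CEGS'      /* librefs are stored in upper case */
    order by memname;
quit;
```

PROC CONTENTS answers "what columns are in here?", while the DICTIONARY.TABLES query gives a quick size check before deciding how many observations to print.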

In [2]:
*proc print data=CEGS.r2_ct_models (obs=10); run;


Out[2]:
SAS Output


In [3]:
*proc print data=CEGS.CIS_EST_V13 (obs=10); run;


Out[3]:
SAS Output

Useful Dmel Data Sets


In [4]:
libname DMEL '/home/jfear/mclab/useful_dmel_data/flybase551/sasdata';

proc datasets library=DMEL nodetails;


Out[4]:
SAS Output

The SAS System

The DATASETS Procedure

Directory Information

Directory
Libref DMEL
Engine V9
Physical Name /home/jfear/mclab/useful_dmel_data/flybase551/sasdata
Filename /home/jfear/mclab/useful_dmel_data/flybase551/sasdata
Inode Number 462
Access Permission rwx------
Owner Name jfear
File Size (bytes) 0

Library Members

# Name Member Type File Size Last Modified
1 EXON2FBTR DATA 82911232 06/15/2013 12:44:36
2 EXON2FBTR_STACK DATA 26861568 06/09/2013 21:19:06
3 EXON2SYMBOL DATA 38150144 06/10/2013 08:53:56
4 EXONSIZES DATA 5881856 06/17/2013 11:47:36
5 FB551_SI_FUSIONS DATA 22880256 03/09/2016 16:38:49
6 FB551_SI_FUSIONS_UNIQUE DATA 463994880 03/22/2016 15:25:47
7 FB551_SI_FUSIONS_UNIQUE_FLAGGED DATA 463994880 04/06/2016 09:56:22
8 FB551_SI_FUSION_2_GENE_ID DATA 78110720 03/28/2016 14:57:19
9 FBGN2COORD DATA 942080 06/15/2013 12:45:54
10 FBGN2FBTR DATA 7053312 06/15/2013 12:44:50
11 FBGN2OLDFBGN DATA 4784128 05/08/2015 10:39:48
12 FLAG_AMBIG_NON DATA 16384 12/13/2013 12:40:50
13 FLAG_AMBIG_NON_REDUNDANT_FUSIONS DATA 2564096 03/04/2014 14:33:50
14 FUSIONS2GO DATA 730734592 10/16/2015 11:47:57
15 GENES2GO_NODUPS DATA 112197632 10/22/2015 13:46:59
16 GENES_BED_FILE DATA 1048576 03/01/2016 16:15:07
17 SD2SI_FUSIONS DATA 10878976 01/27/2015 12:01:29
18 SD_FUSIONS_TABLE DATA 8847360 01/27/2015 12:01:06
19 SEGMENTID2FUSIONID DATA 2269184 09/13/2013 14:54:23
20 SI_FUSIONS_TABLE DATA 8847360 01/27/2015 12:01:06
21 SYMBOL2CG DATA 70950912 06/15/2013 12:45:47
22 SYMBOL2COORD DATA 1613824 06/15/2013 12:45:54
23 SYMBOL2FBGN DATA 12541952 07/14/2015 15:03:26
24 TRANSCRIPT_LENGTH DATA 1048576 09/11/2015 14:08:36

In [5]:
proc print data=DMEL.FB551_SI_FUSIONS_UNIQUE_FLAGGED(obs=10); run;


Out[5]:
SAS Output

The SAS System

The PRINT Procedure

Data Set DMEL.FB551_SI_FUSIONS_UNIQUE_FLAGGED

Obs fusion_id Genes_per_fusion Exons_per_fusion FBgns_per_fusion FBpps_per_fusion FBtrs_per_fusion chrom start end symbol_cat Exon_Gene_ID_cat exon_ID_cat Exon_Name_cat FBtrs_per_exon_cat FBgn_cat FBpp_cat FBtr_cat max_fbtr_per_gene_symbol min_fbtr_per_gene_symbol_cat max_fbtr_per_gene_symbol_cat Constitutive Common Alternative most_three_prime_exon
1 F10001_SI 2 2 2 2 2 2L 19042380 19043883 Catsup|Ttc19 FBgn0002022|FBgn0032744 FBgn0002022:2|FBgn0032744:1 Catsup:2|Ttc19:1 1|1 FBgn0002022|FBgn0032744 FBpp0080687|FBpp0080720 FBtr0081143|FBtr0081178 . 2 2 0 1 0 1
2 F10005_SI 1 3 1 3 3 2L 19045355 19047047 Acn FBgn0263198 FBgn0263198:1|FBgn0263198:2|FBgn0263198:3 Acn:1|Acn:2|Acn:3 1|2|1 FBgn0263198 FBpp0080688|FBpp0080689|FBpp0307803 FBtr0081144|FBtr0081145|FBtr0336842 3 3 3 0 1 0 0
3 F10009_SI 1 3 1 3 3 2L 19047914 19048817 Acn FBgn0263198 FBgn0263198:7|FBgn0263198:8|FBgn0263198:9 Acn:7|Acn:8|Acn:9 1|1|1 FBgn0263198 FBpp0080688|FBpp0080689|FBpp0307803 FBtr0081144|FBtr0081145|FBtr0336842 3 3 3 0 1 0 1
4 F10012_SI 2 2 2 2 2 2L 19049422 19051034 CG10470|l(2)37Bb FBgn0032746|FBgn0002021 FBgn0002021:1|FBgn0032746:3 l(2)37Bb:1|CG10470:3 1|1 FBgn0002021|FBgn0032746 FBpp0080690|FBpp0080719 FBtr0081146|FBtr0081177 . 2 2 0 1 0 1
5 F10014_SI 1 2 1 2 2 2L 19051790 19052217 Rpn3 FBgn0261396 FBgn0261396:1|FBgn0261396:2 Rpn3:1|Rpn3:2 1|1 FBgn0261396 FBpp0291495|FBpp0307804 FBtr0302289|FBtr0336843 2 2 2 0 1 0 0
6 F10018_SI 1 2 1 2 2 2L 19056252 19058821 CG10492 FBgn0032748 FBgn0032748:3|FBgn0032748:4 CG10492:3|CG10492:4 1|1 FBgn0032748 FBpp0080692|FBpp0111956 FBtr0081148|FBtr0113043 2 2 2 0 1 0 1
7 F10019_SI 1 2 1 2 2 2L 19059129 19059850 Phlpp FBgn0032749 FBgn0032749:1|FBgn0032749:2 Phlpp:1|Phlpp:2 1|1 FBgn0032749 FBpp0080693|FBpp0307805 FBtr0081149|FBtr0336844 2 2 2 0 1 0 0
8 F10032_SI 2 6 2 5 5 2L 19068203 19069850 CG10702|CG17343 FBgn0032752|FBgn0032751 FBgn0032751:2|FBgn0032752:10|FBgn0032752:6|FBgn0032752:7|FBgn0032752:8|FBgn0032752:9 CG17343:2|CG10702:10|CG10702:6|CG10702:7|CG10702:8|CG10702:9 1|2|2|1|1|1 FBgn0032751|FBgn0032752 FBpp0080695|FBpp0080716|FBpp0080717|FBpp0080718|FBpp0292333 FBtr0081151|FBtr0081174|FBtr0081175|FBtr0081176|FBtr0303241 . 5 5 0 1 0 1
9 F10049_SI 2 2 2 2 2 2L 19102044 19102104 CG17344|CG43731 FBgn0032755|FBgn0263982 FBgn0032755:1|FBgn0263982:1 CG17344:1|CG43731:1 1|1 FBgn0032755|FBgn0263982 FBpp0080696|FBpp0303052 FBtr0081152|FBtr0330018 . 2 2 0 1 0 0
10 F10050_SI 2 2 2 2 2 2L 19102168 19102909 CG17344|CG43731 FBgn0032755|FBgn0263982 FBgn0032755:2|FBgn0263982:2 CG17344:2|CG43731:2 1|1 FBgn0032755|FBgn0263982 FBpp0080696|FBpp0303052 FBtr0081152|FBtr0330018 . 2 2 0 1 0 1

Looking at models

Trying to figure out what the models Lauren made look like. I am going to move forward with the interaction model.
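
For reference, the interaction ("full") model can be written as below. This is a sketch based on the PROC REG call and the parameter tables printed in this section; the symbols β and ε are notation introduced here, and `int` is the product term `c_i*t_i_1a`.

```latex
\[
\mathtt{q5\_mean\_theta}
  = \beta_0
  + \beta_1\,\mathtt{c\_i}
  + \beta_2\,\mathtt{T\_i\_1a}
  + \beta_3\,(\mathtt{c\_i} \times \mathtt{T\_i\_1a})
  + \varepsilon
\]
```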

CEGS.ai_reg_parms_full


In [6]:
proc print data=CEGS.ai_reg_fit_full2 (obs=10); run;


Out[6]:
SAS Output

The SAS System

The PRINT Procedure

Data Set CEGS.AI_REG_FIT_FULL2

Obs fusion_id mating_status R2_full
1 F10005_SI M 0.9841
2 F10005_SI V 0.9636
3 F10060_SI M 0.8104
4 F10060_SI V 0.8600
5 F10136_SI M 0.9046
6 F10136_SI V 0.9369
7 F10268_SI M 0.9757
8 F10268_SI V 0.9740
9 F10317_SI M 0.8816
10 F10317_SI V 0.9078

In [7]:
proc print data=CEGS.ai_reg_parms_full (obs=10); run;


Out[7]:
SAS Output

The SAS System

The PRINT Procedure

Data Set CEGS.AI_REG_PARMS_FULL

Obs fusion_id mating_status Model Dependent Variable DF Estimate StdErr tValue Probt sign_parm
1 F10005_SI M MODEL1 q5_mean_theta Intercept 1 0.50743 0.00215 236.30 <.0001 +
2 F10005_SI M MODEL1 q5_mean_theta c_i 1 -1.86431 0.05181 -35.98 <.0001 -
3 F10005_SI M MODEL1 q5_mean_theta T_i_1a 1 -0.00173 0.02146 -0.08 0.9364 -
4 F10005_SI M MODEL1 q5_mean_theta int 1 2.13998 0.35341 6.06 <.0001 +
5 F10005_SI V MODEL1 q5_mean_theta Intercept 1 0.50404 0.00245 205.98 <.0001 +
6 F10005_SI V MODEL1 q5_mean_theta c_i 1 -1.88200 0.09409 -20.00 <.0001 -
7 F10005_SI V MODEL1 q5_mean_theta T_i_1a 1 0.07370 0.02406 3.06 0.0050 +
8 F10005_SI V MODEL1 q5_mean_theta int 1 4.42754 0.58473 7.57 <.0001 +
9 F10060_SI M MODEL1 q5_mean_theta Intercept 1 0.49708 0.01189 41.81 <.0001 +
10 F10060_SI M MODEL1 q5_mean_theta c_i 1 -1.79861 0.19850 -9.06 <.0001 -

In [8]:
proc print data=CEGS.ai_reg_fit_full (obs=10); run;


Out[8]:
SAS Output

The SAS System

The PRINT Procedure

Data Set CEGS.AI_REG_FIT_FULL

Obs fusion_id mating_status Model Dependent Label1 cValue1 nValue1 Label2 cValue2 nValue2
1 F10005_SI M MODEL1 q5_mean_theta Dependent Mean 0.50210 0.502100 Adj R-Sq 0.9841 0.984129
2 F10005_SI V MODEL1 q5_mean_theta Dependent Mean 0.50273 0.502733 Adj R-Sq 0.9636 0.963552
3 F10060_SI M MODEL1 q5_mean_theta Dependent Mean 0.49234 0.492345 Adj R-Sq 0.8104 0.810425
4 F10060_SI V MODEL1 q5_mean_theta Dependent Mean 0.47579 0.475793 Adj R-Sq 0.8600 0.859964
5 F10136_SI M MODEL1 q5_mean_theta Dependent Mean 0.53008 0.530083 Adj R-Sq 0.9046 0.904646
6 F10136_SI V MODEL1 q5_mean_theta Dependent Mean 0.53413 0.534125 Adj R-Sq 0.9369 0.936881
7 F10268_SI M MODEL1 q5_mean_theta Dependent Mean 0.48586 0.485857 Adj R-Sq 0.9757 0.975698
8 F10268_SI V MODEL1 q5_mean_theta Dependent Mean 0.49090 0.490905 Adj R-Sq 0.9740 0.973953
9 F10317_SI M MODEL1 q5_mean_theta Dependent Mean 0.09414 0.094139 Adj R-Sq 0.8816 0.881597
10 F10317_SI V MODEL1 q5_mean_theta Dependent Mean 0.09178 0.091778 Adj R-Sq 0.9078 0.907815

In [9]:
Data cis_int;
set cegs.cis_est_v13  (obs=10);
int= c_i*t_i_1a;
run;

proc sort data=cis_int;
by fusion_id mating_status;
run;

*full model;

proc reg data=cis_int ;
by fusion_id  mating_status;
model q5_mean_theta=c_i t_i_1a int;
ods output ParameterEstimates=parms_full fitstatistics=fit_full;
run;


Out[9]:
SAS Output

The SAS System

The REG Procedure

Model: MODEL1

Dependent Variable: q5_mean_theta

fusion_id=F10005_SI mating_status=M

Number of Observations

Number of Observations Read 10
Number of Observations Used 10

Analysis of Variance

Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 0.06772 0.02257 574.40 <.0001
Error 6 0.00023578 0.00003930
Corrected Total 9 0.06795

Fit Statistics

Root MSE 0.00627 R-Square 0.9965
Dependent Mean 0.46950 Adj R-Sq 0.9948
Coeff Var 1.33519    

Parameter Estimates

Variable DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept 1 0.50812 0.00267 190.40 <.0001
c_i 1 -1.81063 0.19727 -9.18 <.0001
T_i_1a 1 0.05454 0.03547 1.54 0.1751
int 1 1.72876 1.03292 1.67 0.1452

[Figures: fit diagnostics and residual plots for q5_mean_theta (Model: MODEL1)]


In [10]:
proc print data=CEGS.r2_ct_models (obs=10); run;


Out[10]:
SAS Output

The SAS System

The PRINT Procedure

Data Set CEGS.R2_CT_MODELS

Obs fusion_id mating_status R2_full R2_noint R2_cis R2_diff_int R2_diff_trans
1 F10005_SI M 0.9841 0.9632 0.9637 0.0209 -0.0005
2 F10005_SI V 0.9636 0.8875 0.8487 0.0761 0.0388
3 F10060_SI M 0.8104 0.7341 0.6474 0.0763 0.0867
4 F10060_SI V 0.8600 0.8272 0.7507 0.0328 0.0765
5 F10136_SI M 0.9046 0.9046 0.8673 0.0000 0.0373
6 F10136_SI V 0.9369 0.9313 0.9166 0.0056 0.0147
7 F10268_SI M 0.9757 0.9726 0.9727 0.0031 -0.0001
8 F10268_SI V 0.9740 0.9549 0.9538 0.0191 0.0011
9 F10317_SI M 0.8816 0.8812 0.0884 0.0004 0.7928
10 F10317_SI V 0.9078 0.8579 0.1466 0.0499 0.7113
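
The printed rows suggest that the two difference columns are simple subtractions: R2_diff_int = R2_full - R2_noint and R2_diff_trans = R2_noint - R2_cis (in the first row, 0.9841 - 0.9632 = 0.0209 and 0.9632 - 0.9637 = -0.0005). This is inferred from the ten rows shown, not documented by the dataset itself. A hedged sketch for checking it across the whole table follows; the names `r2_check`, `calc_diff_int`, `calc_diff_trans`, and `flag_mismatch` are hypothetical.

```sas
data r2_check;
    set CEGS.r2_ct_models;
    /* Differences implied by the printed rows (an assumption to be tested) */
    calc_diff_int   = R2_full  - R2_noint;
    calc_diff_trans = R2_noint - R2_cis;
    /* 1 when a stored difference disagrees with the recomputed one beyond rounding */
    flag_mismatch = (abs(calc_diff_int   - R2_diff_int)   > 0.001
                  or abs(calc_diff_trans - R2_diff_trans) > 0.001);
run;

/* If the assumption holds, every row falls in flag_mismatch = 0 */
proc freq data=r2_check;
    tables flag_mismatch;
run;
```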

In [11]:
proc univariate data=CEGS.r2_ct_models (obs=2) normal plot;
var R2_full R2_noint R2_diff_int R2_diff_trans r2_cis;
run;


Out[11]:
SAS Output

The SAS System

The UNIVARIATE Procedure

Variable: R2_full

The UNIVARIATE Procedure

R2_full

Moments

Moments
N 2 Sum Weights 2
Mean 0.97385 Sum Observations 1.9477
Std Deviation 0.01449569 Variance 0.00021012
Skewness . Kurtosis .
Uncorrected SS 1.89697777 Corrected SS 0.00021012
Coeff Variation 1.48849299 Std Error Mean 0.01025

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.973850 Std Deviation 0.01450
Median 0.973850 Variance 0.0002101
Mode . Range 0.02050
    Interquartile Range 0.02050

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 95.00976 Pr > |t| 0.0067
Sign M 1 Pr >= |M| 0.5000
Signed Rank S 1.5 Pr >= |S| 0.5000

Tests For Normality

Tests for Normality
Test Statistic p Value
Shapiro-Wilk W 1 Pr < W 1.0000
Kolmogorov-Smirnov D 0.26025 Pr > D >0.1500
Cramer-von Mises W-Sq 0.041877 Pr > W-Sq >0.2500
Anderson-Darling A-Sq 0.250482 Pr > A-Sq 0.2332

Quantiles

Quantiles (Definition 5)
Quantile Estimate
100% Max 0.98410
99% 0.98410
95% 0.98410
90% 0.98410
75% Q3 0.98410
50% Median 0.97385
25% Q1 0.96360
10% 0.96360
5% 0.96360
1% 0.96360
0% Min 0.96360

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.9636 2 0.9636 2
0.9841 1 0.9841 1

Plots for R2_full


The SAS System

The UNIVARIATE Procedure

Variable: R2_noint

R2_noint

Moments

Moments
N 2 Sum Weights 2
Mean 0.92535 Sum Observations 1.8507
Std Deviation 0.05352798 Variance 0.00286524
Skewness . Kurtosis .
Uncorrected SS 1.71541049 Corrected SS 0.00286524
Coeff Variation 5.78462023 Std Error Mean 0.03785

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.925350 Std Deviation 0.05353
Median 0.925350 Variance 0.00287
Mode . Range 0.07570
    Interquartile Range 0.07570

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 24.44782 Pr > |t| 0.0260
Sign M 1 Pr >= |M| 0.5000
Signed Rank S 1.5 Pr >= |S| 0.5000

Tests For Normality

Tests for Normality
Test Statistic p Value
Shapiro-Wilk W 1 Pr < W 1.0000
Kolmogorov-Smirnov D 0.26025 Pr > D >0.1500
Cramer-von Mises W-Sq 0.041877 Pr > W-Sq >0.2500
Anderson-Darling A-Sq 0.250482 Pr > A-Sq 0.2332

Quantiles

Quantiles (Definition 5)
Quantile Estimate
100% Max 0.96320
99% 0.96320
95% 0.96320
90% 0.96320
75% Q3 0.96320
50% Median 0.92535
25% Q1 0.88750
10% 0.88750
5% 0.88750
1% 0.88750
0% Min 0.88750

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.8875 2 0.8875 2
0.9632 1 0.9632 1

Plots for R2_noint


The SAS System

The UNIVARIATE Procedure

Variable: R2_diff_int

R2_diff_int

Moments

Moments
N 2 Sum Weights 2
Mean 0.0485 Sum Observations 0.097
Std Deviation 0.03903229 Variance 0.00152352
Skewness . Kurtosis .
Uncorrected SS 0.00622802 Corrected SS 0.00152352
Coeff Variation 80.4789574 Std Error Mean 0.0276

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.048500 Std Deviation 0.03903
Median 0.048500 Variance 0.00152
Mode . Range 0.05520
    Interquartile Range 0.05520

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 1.757246 Pr > |t| 0.3294
Sign M 1 Pr >= |M| 0.5000
Signed Rank S 1.5 Pr >= |S| 0.5000

Tests For Normality

Tests for Normality
Test Statistic p Value
Shapiro-Wilk W 1 Pr < W 1.0000
Kolmogorov-Smirnov D 0.26025 Pr > D >0.1500
Cramer-von Mises W-Sq 0.041877 Pr > W-Sq >0.2500
Anderson-Darling A-Sq 0.250482 Pr > A-Sq 0.2332

Quantiles

Quantiles (Definition 5)
Quantile Estimate
100% Max 0.0761
99% 0.0761
95% 0.0761
90% 0.0761
75% Q3 0.0761
50% Median 0.0485
25% Q1 0.0209
10% 0.0209
5% 0.0209
1% 0.0209
0% Min 0.0209

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.0209 1 0.0209 1
0.0761 2 0.0761 2

Plots for R2_diff_int


The SAS System

The UNIVARIATE Procedure

Variable: R2_diff_trans

R2_diff_trans

Moments

Moments
N 2 Sum Weights 2
Mean 0.01915 Sum Observations 0.0383
Std Deviation 0.0277893 Variance 0.00077225
Skewness . Kurtosis .
Uncorrected SS 0.00150569 Corrected SS 0.00077225
Coeff Variation 145.11382 Std Error Mean 0.01965

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.019150 Std Deviation 0.02779
Median 0.019150 Variance 0.0007722
Mode . Range 0.03930
    Interquartile Range 0.03930

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 0.974555 Pr > |t| 0.5082
Sign M 0 Pr >= |M| 1.0000
Signed Rank S 0.5 Pr >= |S| 1.0000

Tests For Normality

Tests for Normality
Test Statistic p Value
Shapiro-Wilk W 1 Pr < W 1.0000
Kolmogorov-Smirnov D 0.26025 Pr > D >0.1500
Cramer-von Mises W-Sq 0.041877 Pr > W-Sq >0.2500
Anderson-Darling A-Sq 0.250482 Pr > A-Sq 0.2332

Quantiles

Quantiles (Definition 5)
Quantile Estimate
100% Max 0.03880
99% 0.03880
95% 0.03880
90% 0.03880
75% Q3 0.03880
50% Median 0.01915
25% Q1 -0.00050
10% -0.00050
5% -0.00050
1% -0.00050
0% Min -0.00050

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
-0.0005 1 -0.0005 1
0.0388 2 0.0388 2

Plots for R2_diff_trans


The SAS System

The UNIVARIATE Procedure

Variable: R2_cis

R2_cis

Moments

Moments
N 2 Sum Weights 2
Mean 0.9062 Sum Observations 1.8124
Std Deviation 0.08131728 Variance 0.0066125
Skewness . Kurtosis .
Uncorrected SS 1.64900938 Corrected SS 0.0066125
Coeff Variation 8.97343631 Std Error Mean 0.0575

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.906200 Std Deviation 0.08132
Median 0.906200 Variance 0.00661
Mode . Range 0.11500
    Interquartile Range 0.11500

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 15.76 Pr > |t| 0.0403
Sign M 1 Pr >= |M| 0.5000
Signed Rank S 1.5 Pr >= |S| 0.5000

Tests For Normality

Tests for Normality
Test Statistic p Value
Shapiro-Wilk W 1 Pr < W 1.0000
Kolmogorov-Smirnov D 0.26025 Pr > D >0.1500
Cramer-von Mises W-Sq 0.041877 Pr > W-Sq >0.2500
Anderson-Darling A-Sq 0.250482 Pr > A-Sq 0.2332

Quantiles

Quantiles (Definition 5)
Quantile Estimate
100% Max 0.9637
99% 0.9637
95% 0.9637
90% 0.9637
75% Q3 0.9637
50% Median 0.9062
25% Q1 0.8487
10% 0.8487
5% 0.8487
1% 0.8487
0% Min 0.8487

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.8487 2 0.8487 2
0.9637 1 0.9637 1

Plots for R2_cis

Create data set

Now that I have selected a model, I need to make a dataset with significance flags. While Lauren made a dataset for comparing model fit, I need to make my own significance flags for the coefficients using the probt values. I am going to go ahead and merge mated and virgin side-by-side.

WORK.merge_sig


In [12]:
*proc print data=CEGS.ai_reg_parms_full (obs=10); run;


Out[12]:

100 *proc print data=CEGS.ai_reg_parms_full (obs=10); run;

Mated Param Flags


In [13]:
data sig_cis;
    set CEGS.ai_reg_parms_full;
    where Variable eq 'c_i' and mating_status eq 'M';
    if probt le 0.05 then flag_sig_cis_m = 1;
    else flag_sig_cis_m = 0;
    keep fusion_id flag_sig_cis_m;
    run;

data sig_trans;
    set CEGS.ai_reg_parms_full;
    where Variable eq 'T_i_1a' and mating_status eq 'M';
    if probt le 0.05 then flag_sig_trans_m = 1;
    else flag_sig_trans_m = 0;
    keep fusion_id flag_sig_trans_m;
    run;
   
data sig_int;
    set CEGS.ai_reg_parms_full;
    where Variable eq 'int' and mating_status eq 'M';
    if probt le 0.05 then flag_sig_int_m = 1;
    else flag_sig_int_m = 0;
    keep fusion_id flag_sig_int_m;
    run;
    
data merge_sig_m;
    merge sig_cis sig_trans sig_int;
    by fusion_id;
    run;


Out[13]:

106 data sig_cis;
107 set CEGS.ai_reg_parms_full;
108 where Variable eq 'c_i' and mating_status eq 'M';
109 if probt le 0.05 then flag_sig_cis_m = 1;
110 else flag_sig_cis_m = 0;
111 keep fusion_id flag_sig_cis_m;
112 run;
NOTE: There were 880 observations read from the data set CEGS.AI_REG_PARMS_FULL.
WHERE (Variable='c_i') and (mating_status='M');
NOTE: The data set WORK.SIG_CIS has 880 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.13 seconds
cpu time 0.01 seconds

113
114 data sig_trans;
115 set CEGS.ai_reg_parms_full;
116 where Variable eq 'T_i_1a' and mating_status eq 'M';
117 if probt le 0.05 then flag_sig_trans_m = 1;
118 else flag_sig_trans_m = 0;
119 keep fusion_id flag_sig_trans_m;
120 run;
NOTE: There were 880 observations read from the data set CEGS.AI_REG_PARMS_FULL.
WHERE (Variable='T_i_1a') and (mating_status='M');
NOTE: The data set WORK.SIG_TRANS has 880 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.13 seconds
cpu time 0.00 seconds

121
122 data sig_int;
123 set CEGS.ai_reg_parms_full;
124 where Variable eq 'int' and mating_status eq 'M';
125 if probt le 0.05 then flag_sig_int_m = 1;
126 else flag_sig_int_m = 0;
127 keep fusion_id flag_sig_int_m;
128 run;
NOTE: There were 880 observations read from the data set CEGS.AI_REG_PARMS_FULL.
WHERE (Variable='int') and (mating_status='M');
NOTE: The data set WORK.SIG_INT has 880 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
cpu time 0.00 seconds

129
130 data merge_sig_m;
131 merge sig_cis sig_trans sig_int;
132 by fusion_id;
133 run;
NOTE: There were 880 observations read from the data set WORK.SIG_CIS.
NOTE: There were 880 observations read from the data set WORK.SIG_TRANS.
NOTE: There were 880 observations read from the data set WORK.SIG_INT.
NOTE: The data set WORK.MERGE_SIG_M has 880 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds


Virgin Param Flags


In [14]:
data sig_cis;
    set CEGS.ai_reg_parms_full;
    where Variable eq 'c_i' and mating_status eq 'V';
    if probt le 0.05 then flag_sig_cis_v = 1;
    else flag_sig_cis_v = 0;
    keep fusion_id flag_sig_cis_v;
    run;

data sig_trans;
    set CEGS.ai_reg_parms_full;
    where Variable eq 'T_i_1a' and mating_status eq 'V';
    if probt le 0.05 then flag_sig_trans_v = 1;
    else flag_sig_trans_v = 0;
    keep fusion_id flag_sig_trans_v;
    run;
   
data sig_int;
    set CEGS.ai_reg_parms_full;
    where Variable eq 'int' and mating_status eq 'V';
    if probt le 0.05 then flag_sig_int_v = 1;
    else flag_sig_int_v = 0;
    keep fusion_id flag_sig_int_v;
    run;
    
data merge_sig_v;
    merge sig_cis sig_trans sig_int;
    by fusion_id;
    run;


Out[14]:

140 data sig_cis;
141 set CEGS.ai_reg_parms_full;
142 where Variable eq 'c_i' and mating_status eq 'V';
143 if probt le 0.05 then flag_sig_cis_v = 1;
144 else flag_sig_cis_v = 0;
145 keep fusion_id flag_sig_cis_v;
146 run;
NOTE: There were 880 observations read from the data set CEGS.AI_REG_PARMS_FULL.
WHERE (Variable='c_i') and (mating_status='V');
NOTE: The data set WORK.SIG_CIS has 880 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.13 seconds
cpu time 0.00 seconds

147
148 data sig_trans;
149 set CEGS.ai_reg_parms_full;
150 where Variable eq 'T_i_1a' and mating_status eq 'V';
151 if probt le 0.05 then flag_sig_trans_v = 1;
152 else flag_sig_trans_v = 0;
153 keep fusion_id flag_sig_trans_v;
154 run;
NOTE: There were 880 observations read from the data set CEGS.AI_REG_PARMS_FULL.
WHERE (Variable='T_i_1a') and (mating_status='V');
NOTE: The data set WORK.SIG_TRANS has 880 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
cpu time 0.01 seconds

155
156 data sig_int;
157 set CEGS.ai_reg_parms_full;
158 where Variable eq 'int' and mating_status eq 'V';
159 if probt le 0.05 then flag_sig_int_v = 1;
160 else flag_sig_int_v = 0;
161 keep fusion_id flag_sig_int_v;
162 run;
NOTE: There were 880 observations read from the data set CEGS.AI_REG_PARMS_FULL.
WHERE (Variable='int') and (mating_status='V');
NOTE: The data set WORK.SIG_INT has 880 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.13 seconds
cpu time 0.00 seconds

163
164 data merge_sig_v;
165 merge sig_cis sig_trans sig_int;
166 by fusion_id;
167 run;
NOTE: There were 880 observations read from the data set WORK.SIG_CIS.
NOTE: There were 880 observations read from the data set WORK.SIG_TRANS.
NOTE: There were 880 observations read from the data set WORK.SIG_INT.
NOTE: The data set WORK.MERGE_SIG_V has 880 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds


Merge Mated and Virgin Param Flags


In [15]:
data merge_sig;
    merge merge_sig_m merge_sig_v;
    by fusion_id;
    run;
    
proc sort data=merge_sig; by fusion_id; run;


Out[15]:

173 data merge_sig;
174 merge merge_sig_m merge_sig_v;
175 by fusion_id;
176 run;
NOTE: There were 880 observations read from the data set WORK.MERGE_SIG_M.
NOTE: There were 880 observations read from the data set WORK.MERGE_SIG_V.
NOTE: The data set WORK.MERGE_SIG has 880 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

177
178 proc sort data=merge_sig; by fusion_id; run;
NOTE: There were 880 observations read from the data set WORK.MERGE_SIG.
NOTE: The data set WORK.MERGE_SIG has 880 observations and 7 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds


In [16]:
proc print data=merge_sig (obs=10); run;


Out[16]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.MERGE_SIG

Obs fusion_id flag_sig_cis_m flag_sig_trans_m flag_sig_int_m flag_sig_cis_v flag_sig_trans_v flag_sig_int_v
1 F10005_SI 1 0 1 1 1 1
2 F10060_SI 1 1 1 1 1 1
3 F10136_SI 1 1 0 1 0 0
4 F10268_SI 1 0 1 1 0 1
5 F10317_SI 1 1 0 1 1 1
6 F10466_SI 1 1 0 1 1 0
7 F10806_SI 1 1 1 1 1 1
8 F1101_SI 1 1 1 1 0 1
9 F11767_SI 1 1 1 1 0 1
10 F11773_SI 1 0 1 1 0 1
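
The flag-building cells above repeat the same DATA step six times. As an alternative, the steps can be sketched as one long-format pass followed by a transpose. This is a minimal sketch, not the method used in this notebook: `flags_long`, `term`, `flag_name`, `flag_sig`, and `merge_sig_alt` are hypothetical names. It also differs in one deliberate way: in SAS a missing numeric value compares as smaller than any number, so `probt le 0.05` on its own sets the flag to 1 for a missing p-value, whereas the sketch leaves such rows at 0.

```sas
data flags_long;
    set CEGS.ai_reg_parms_full;
    where Variable in ('c_i', 'T_i_1a', 'int');
    length term $5 flag_name $32;
    /* Map each regression term to the short label used in the flag names */
    if Variable = 'c_i' then term = 'cis';
    else if Variable = 'T_i_1a' then term = 'trans';
    else term = 'int';
    /* 1 if the p-value is non-missing and at most 0.05, else 0 */
    flag_sig = (. < probt <= 0.05);
    /* e.g. flag_sig_cis_m, flag_sig_trans_v */
    flag_name = cats('flag_sig_', term, '_', lowcase(mating_status));
    keep fusion_id flag_name flag_sig;
run;

proc sort data=flags_long;
    by fusion_id;
run;

/* One row per fusion_id, one column per flag */
proc transpose data=flags_long out=merge_sig_alt (drop=_name_);
    by fusion_id;
    id flag_name;
    var flag_sig;
run;
```

PROC TRANSPOSE stops with an error if a fusion has two rows for the same term and mating status, which doubles as a check that each combination appears only once.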

Freqs of Param Flags to check counts

Mated Freqs of cis, trans, int significance


In [17]:
proc freq data=merge_sig;
tables flag_sig_cis_m;
run;

proc freq data=merge_sig;
tables flag_sig_trans_m;
run;

proc freq data=merge_sig;
tables flag_sig_int_m;
run;


Out[17]:
SAS Output

The SAS System

The FREQ Procedure

Table flag_sig_cis_m

One-Way Frequencies

flag_sig_cis_m Frequency Percent Cumulative Frequency Cumulative Percent
0 41 4.66 41 4.66
1 839 95.34 880 100.00

Table flag_sig_trans_m

One-Way Frequencies

flag_sig_trans_m Frequency Percent Cumulative Frequency Cumulative Percent
0 385 43.75 385 43.75
1 495 56.25 880 100.00

Table flag_sig_int_m

One-Way Frequencies

flag_sig_int_m Frequency Percent Cumulative Frequency Cumulative Percent
0 168 19.09 168 19.09
1 712 80.91 880 100.00

Virgin Freqs of cis, trans, int significance


In [18]:
proc freq data=merge_sig;
tables flag_sig_cis_v;
run;

proc freq data=merge_sig;
tables flag_sig_trans_v;
run;

proc freq data=merge_sig;
tables flag_sig_int_v;
run;


Out[18]:
SAS Output

The SAS System

The FREQ Procedure

Table flag_sig_cis_v

One-Way Frequencies

flag_sig_cis_v Frequency Percent Cumulative Frequency Cumulative Percent
0 40 4.55 40 4.55
1 840 95.45 880 100.00

Table flag_sig_trans_v

One-Way Frequencies

flag_sig_trans_v Frequency Percent Cumulative Frequency Cumulative Percent
0 356 40.45 356 40.45
1 524 59.55 880 100.00

Table flag_sig_int_v

One-Way Frequencies

flag_sig_int_v Frequency Percent Cumulative Frequency Cumulative Percent
0 164 18.64 164 18.64
1 716 81.36 880 100.00
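
The six one-way tables above could also be requested from a single PROC FREQ call. The sketch below assumes WORK.merge_sig still exists in the session; the mated-by-virgin cross-tabulation at the end is an addition for comparison, not something the original cells ran.

```sas
proc freq data=merge_sig;
    /* One-way frequencies for all six flags in one pass */
    tables flag_sig_cis_m flag_sig_trans_m flag_sig_int_m
           flag_sig_cis_v flag_sig_trans_v flag_sig_int_v;
    /* Two-way table: do the same fusions show a trans effect in both conditions? */
    tables flag_sig_trans_m * flag_sig_trans_v / norow nocol;
run;
```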

Calculate Percentages of Genotypes with Flags

I need to summarize across the population somehow. The easiest way is to calculate the percentage of lines that are significant (P < 0.05) for the cis, trans, and interaction terms.
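Because each flag is coded 0/1, the percentage of flagged lines within a fusion is just 100 times the mean of the flag. A minimal sketch of that idea with PROC SQL, assuming a made-up `toy` data set (the line names, fusion IDs, and the `flag_ai` column are hypothetical and only for illustration):

```sas
/* Toy data: hypothetical lines, fusions, and flags (illustration only) */
data toy;
    input line $ fusion_id $ flag_ai;
    datalines;
r101 F1_SI 1
r109 F1_SI 0
r136 F1_SI 0
r101 F2_SI 1
r109 F2_SI 1
;
run;

/* Percent of lines flagged per fusion: 100 * mean of a 0/1 flag */
proc sql;
    create table toy_pct as
    select fusion_id,
           100 * mean(flag_ai) as pct_ai
    from toy
    group by fusion_id;
quit;

proc print data=toy_pct; run;
```

One difference to keep in mind: `mean()` leaves missing flags out of the denominator, whereas dividing a sum by `_FREQ_` counts every row, so the two approaches agree only when no flags are missing.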

WORK.pct_ai

Mated Percent AI


In [19]:
/* Mated */
data mated;
    set CEGS.clean_ase_sbs;
    keep line fusion_id flag_ai_combined_m;
    run;

proc means data=mated noprint;
    by fusion_id;
    output out=sum sum(flag_ai_combined_m)=sum_ai;
    run;

data m_freq_ai;
    set sum;
    if _FREQ_ gt 0 then m_pct_ai = sum_ai / _FREQ_ * 100;
    else m_pct_ai = 0;
    keep fusion_id m_pct_ai;
    run;


Out[19]:

220  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
221
222 /* Mated */
223 data mated;
224 set CEGS.clean_ase_sbs;
225 keep line fusion_id flag_ai_combined_m;
226 run;
NOTE: There were 79967 observations read from the data set CEGS.CLEAN_ASE_SBS.
NOTE: The data set WORK.MATED has 79967 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.61 seconds
cpu time 0.03 seconds

227
228 proc means data=mated noprint;
229 by fusion_id;
230 output out=sum sum(flag_ai_combined_m)=sum_ai;
231 run;
NOTE: There were 79967 observations read from the data set WORK.MATED.
NOTE: The data set WORK.SUM has 5391 observations and 4 variables.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds

232
233 data m_freq_ai;
234 set sum;
235 if _FREQ_ gt 0 then m_pct_ai = sum_ai / _FREQ_ * 100;
236 else m_pct_ai = 0;
237 keep fusion_id m_pct_ai;
238 run;
NOTE: There were 5391 observations read from the data set WORK.SUM.
NOTE: The data set WORK.M_FREQ_AI has 5391 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

239 ods html5 close;ods listing;

240

Virgin Percent AI


In [20]:
/* Virgin */
data virgin;
    set CEGS.clean_ase_sbs;
    keep line fusion_id flag_ai_combined_v;
    run;

proc means data=virgin noprint;
    by fusion_id;
    output out=sum sum(flag_ai_combined_v)=sum_ai;
    run;

data v_freq_ai;
    set sum;
    if _FREQ_ gt 0 then v_pct_ai = sum_ai / _FREQ_ * 100;
    else v_pct_ai = 0;
    keep fusion_id v_pct_ai;
    run;


Out[20]:

242  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
243
244 /* Virgin */
245 data virgin;
246 set CEGS.clean_ase_sbs;
247 keep line fusion_id flag_ai_combined_v;
248 run;
NOTE: There were 79967 observations read from the data set CEGS.CLEAN_ASE_SBS.
NOTE: The data set WORK.VIRGIN has 79967 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.54 seconds
cpu time 0.03 seconds

249
250 proc means data=virgin noprint;
251 by fusion_id;
252 output out=sum sum(flag_ai_combined_v)=sum_ai;
253 run;
NOTE: There were 79967 observations read from the data set WORK.VIRGIN.
NOTE: The data set WORK.SUM has 5391 observations and 4 variables.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds

254
255 data v_freq_ai;
256 set sum;
257 if _FREQ_ gt 0 then v_pct_ai = sum_ai / _FREQ_ * 100;
258 else v_pct_ai = 0;
259 keep fusion_id v_pct_ai;
260 run;
NOTE: There were 5391 observations read from the data set WORK.SUM.
NOTE: The data set WORK.V_FREQ_AI has 5391 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

261 ods html5 close;ods listing;

262

Merge Mated and Virgin Percent AI


In [21]:
/* Merge */
data pct_ai;
    merge m_freq_ai v_freq_ai;
    by fusion_id;
    run;

proc print data=pct_ai (obs=10); run;


Out[21]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.PCT_AI

Obs fusion_id m_pct_ai v_pct_ai
1 F10001_SI 0.0000 20.0000
2 F10005_SI 16.6667 16.6667
3 F10009_SI 5.5556 5.5556
4 F10059_SI 7.1429 7.1429
5 F10060_SI 31.0345 31.0345
6 F10072_SI 25.0000 25.0000
7 F10136_SI 29.1667 16.6667
8 F10137_SI 18.1818 22.7273
9 F10147_SI 0.0000 0.0000
10 F10253_SI 21.4286 21.4286

Merge pct_ai and merge_sig

Merge my estimate of AI with my flags for the different model parameters.


In [22]:
data pct_ai_model;
    merge pct_ai merge_sig;
    by fusion_id;
    if m_pct_ai eq '.' then m_pct_ai = 0;
    if v_pct_ai eq '.' then v_pct_ai = 0;
    
    if flag_sig_cis_m eq '.' then flag_sig_cis_m = 0;
    if flag_sig_trans_m eq '.' then flag_sig_trans_m = 0;
    if flag_sig_int_m eq '.' then flag_sig_int_m = 0;
    
    if flag_sig_cis_v eq '.' then flag_sig_cis_v = 0;
    if flag_sig_trans_v eq '.' then flag_sig_trans_v = 0;
    if flag_sig_int_v eq '.' then flag_sig_int_v = 0;
    run;


Out[22]:

276  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
277
278 data pct_ai_model;
279 merge pct_ai merge_sig;
280 by fusion_id;
281 if m_pct_ai eq '.' then m_pct_ai = 0;
282 if v_pct_ai eq '.' then v_pct_ai = 0;
283
284 if flag_sig_cis_m eq '.' then flag_sig_cis_m = 0;
285 if flag_sig_trans_m eq '.' then flag_sig_trans_m = 0;
286 if flag_sig_int_m eq '.' then flag_sig_int_m = 0;
287
288 if flag_sig_cis_v eq '.' then flag_sig_cis_v = 0;
289 if flag_sig_trans_v eq '.' then flag_sig_trans_v = 0;
290 if flag_sig_int_v eq '.' then flag_sig_int_v = 0;
291 run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
281:20 282:20 284:26 285:28 286:26 288:26 289:28 290:26
NOTE: There were 5391 observations read from the data set WORK.PCT_AI.
NOTE: There were 880 observations read from the data set WORK.MERGE_SIG.
NOTE: The data set WORK.PCT_AI_MODEL has 5391 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds

292 ods html5 close;ods listing;

293
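The NOTE in the log above about character values being converted to numeric comes from comparing numeric variables with the quoted string '.'. A sketch of the same zero-fill using the `missing()` function and an array, which avoids the conversion; `toy_flags` and its values are invented for illustration:

```sas
/* Toy data: a '.' in list input is read as a numeric missing value */
data toy_flags;
    input fusion_id $ m_pct_ai flag_sig_cis_m;
    datalines;
F1_SI 20 1
F2_SI 5 .
F3_SI . .
;
run;

/* Replace numeric missing values with 0 in every listed variable */
data toy_filled;
    set toy_flags;
    array nums {*} m_pct_ai flag_sig_cis_m;
    do i = 1 to dim(nums);
        if missing(nums{i}) then nums{i} = 0;
    end;
    drop i;
run;

proc print data=toy_filled; run;
```

Adding more variables to the `array` statement extends the fill without repeating the `if` line for each one.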

In [23]:
proc print data=pct_ai_model(obs=10); run;


Out[23]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.PCT_AI_MODEL

Obs fusion_id m_pct_ai v_pct_ai flag_sig_cis_m flag_sig_trans_m flag_sig_int_m flag_sig_cis_v flag_sig_trans_v flag_sig_int_v
1 F10001_SI 0.0000 20.0000 0 0 0 0 0 0
2 F10005_SI 16.6667 16.6667 1 0 1 1 1 1
3 F10009_SI 5.5556 5.5556 0 0 0 0 0 0
4 F10059_SI 7.1429 7.1429 0 0 0 0 0 0
5 F10060_SI 31.0345 31.0345 1 1 1 1 1 1
6 F10072_SI 25.0000 25.0000 0 0 0 0 0 0
7 F10136_SI 29.1667 16.6667 1 1 0 1 0 0
8 F10137_SI 18.1818 22.7273 0 0 0 0 0 0
9 F10147_SI 0.0000 0.0000 0 0 0 0 0 0
10 F10253_SI 21.4286 21.4286 0 0 0 0 0 0

Summarize to Gene

In order to do any kind of enrichment test, I need to summarize to the gene level. For simplicity, I am ignoring the 428 (of 5391) fusions in my dataset that are multi-gene fusions. For each gene, I take the mean across its fusions.

WORK.means


In [24]:
data genes;
    set DMEL.FB551_SI_FUSIONS_UNIQUE_FLAGGED;
    keep fusion_id FBgn_cat symbol_cat genes_per_fusion;
    run;
    
proc sort data=genes;
    by fusion_id;
    run;    

proc print data=genes(obs=10);run;


Out[24]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.GENES

Obs fusion_id Genes_per_fusion symbol_cat FBgn_cat
1 F10001_SI 2 Catsup|Ttc19 FBgn0002022|FBgn0032744
2 F10005_SI 1 Acn FBgn0263198
3 F10009_SI 1 Acn FBgn0263198
4 F10012_SI 2 CG10470|l(2)37Bb FBgn0002021|FBgn0032746
5 F10014_SI 1 Rpn3 FBgn0261396
6 F10018_SI 1 CG10492 FBgn0032748
7 F10019_SI 1 Phlpp FBgn0032749
8 F10032_SI 2 CG10702|CG17343 FBgn0032751|FBgn0032752
9 F10049_SI 2 CG17344|CG43731 FBgn0032755|FBgn0263982
10 F10050_SI 2 CG17344|CG43731 FBgn0032755|FBgn0263982

In [25]:
data mg;
    merge pct_ai_model(in=in1) genes(in=in2);
    by fusion_id;
    if in1;
    run;


Out[25]:

316  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
317
318 data mg;
319 merge pct_ai_model(in=in1) genes(in=in2);
320 by fusion_id;
321 if in1;
322 run;
NOTE: There were 5391 observations read from the data set WORK.PCT_AI_MODEL.
NOTE: There were 63706 observations read from the data set WORK.GENES.
NOTE: The data set WORK.MG has 5391 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 0.10 seconds
cpu time 0.11 seconds

323 ods html5 close;ods listing;

324

In [26]:
proc print data=mg(obs=10); run;


Out[26]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.MG

Obs fusion_id m_pct_ai v_pct_ai flag_sig_cis_m flag_sig_trans_m flag_sig_int_m flag_sig_cis_v flag_sig_trans_v flag_sig_int_v Genes_per_fusion symbol_cat FBgn_cat
1 F10001_SI 0.0000 20.0000 0 0 0 0 0 0 2 Catsup|Ttc19 FBgn0002022|FBgn0032744
2 F10005_SI 16.6667 16.6667 1 0 1 1 1 1 1 Acn FBgn0263198
3 F10009_SI 5.5556 5.5556 0 0 0 0 0 0 1 Acn FBgn0263198
4 F10059_SI 7.1429 7.1429 0 0 0 0 0 0 2 CG10561|Ddc FBgn0000422|FBgn0002036
5 F10060_SI 31.0345 31.0345 1 1 1 1 1 1 1 Ddc FBgn0000422
6 F10072_SI 25.0000 25.0000 0 0 0 0 0 0 1 Aats-asn FBgn0086443
7 F10136_SI 29.1667 16.6667 1 1 0 1 0 0 1 fon FBgn0032773
8 F10137_SI 18.1818 22.7273 0 0 0 0 0 0 1 fon FBgn0032773
9 F10147_SI 0.0000 0.0000 0 0 0 0 0 0 1 CG17549 FBgn0032774
10 F10253_SI 21.4286 21.4286 0 0 0 0 0 0 1 CG10186 FBgn0032797

In [27]:
data noMulitGene;
    set mg;
    where genes_per_fusion eq 1;
    run;


Out[27]:

332  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
333
334 data noMulitGene;
335 set mg;
336 where genes_per_fusion eq 1;
337 run;
NOTE: There were 4963 observations read from the data set WORK.MG.
WHERE genes_per_fusion=1;
NOTE: The data set WORK.NOMULITGENE has 4963 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

338 ods html5 close;ods listing;

339

In [28]:
proc print data=noMulitGene (obs=10); run;


Out[28]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.NOMULITGENE

Obs fusion_id m_pct_ai v_pct_ai flag_sig_cis_m flag_sig_trans_m flag_sig_int_m flag_sig_cis_v flag_sig_trans_v flag_sig_int_v Genes_per_fusion symbol_cat FBgn_cat
1 F10005_SI 16.6667 16.6667 1 0 1 1 1 1 1 Acn FBgn0263198
2 F10009_SI 5.5556 5.5556 0 0 0 0 0 0 1 Acn FBgn0263198
3 F10060_SI 31.0345 31.0345 1 1 1 1 1 1 1 Ddc FBgn0000422
4 F10072_SI 25.0000 25.0000 0 0 0 0 0 0 1 Aats-asn FBgn0086443
5 F10136_SI 29.1667 16.6667 1 1 0 1 0 0 1 fon FBgn0032773
6 F10137_SI 18.1818 22.7273 0 0 0 0 0 0 1 fon FBgn0032773
7 F10147_SI 0.0000 0.0000 0 0 0 0 0 0 1 CG17549 FBgn0032774
8 F10253_SI 21.4286 21.4286 0 0 0 0 0 0 1 CG10186 FBgn0032797
9 F10259_SI 10.0000 0.0000 0 0 0 0 0 0 1 CG10186 FBgn0032797
10 F10268_SI 9.5238 21.4286 1 0 1 1 0 1 1 CG10186 FBgn0032797

In [29]:
proc sort data=noMulitGene;
    by FBgn_cat;
    run;

proc means data=noMulitGene noprint;
    by  FBgn_cat;
    output out=means 
    mean(m_pct_ai)=m_pct_ai_bar 
    mean(v_pct_ai)=v_pct_ai_bar

    sum(flag_sig_cis_m)=flag_sig_cis_m_sum
    sum(flag_sig_trans_m)=flag_sig_trans_m_sum
    sum(flag_sig_int_m)=flag_sig_int_m_sum

    sum(flag_sig_cis_v)=flag_sig_cis_v_sum
    sum(flag_sig_trans_v)=flag_sig_trans_v_sum
    sum(flag_sig_int_v)=flag_sig_int_v_sum
    ;

    run;
    
data means;
    set means;
    if flag_sig_cis_m_sum > 0 then flag_sig_cis_m_sum = 1;
    if flag_sig_trans_m_sum > 0 then flag_sig_trans_m_sum = 1;
    if flag_sig_int_m_sum > 0 then flag_sig_int_m_sum = 1;
    if flag_sig_cis_v_sum > 0 then flag_sig_cis_v_sum = 1;
    if flag_sig_trans_v_sum > 0 then flag_sig_trans_v_sum = 1;
    if flag_sig_int_v_sum > 0 then flag_sig_int_v_sum = 1;
    run;


Out[29]:

347  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
348
349 proc sort data=noMulitGene;
350 by FBgn_cat;
351 run;
NOTE: There were 4963 observations read from the data set WORK.NOMULITGENE.
NOTE: The data set WORK.NOMULITGENE has 4963 observations and 12 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.03 seconds
cpu time 0.04 seconds

352
353 proc means data=noMulitGene noprint;
354 by FBgn_cat;
355 output out=means
356 mean(m_pct_ai)=m_pct_ai_bar
357 mean(v_pct_ai)=v_pct_ai_bar
358
359 sum(flag_sig_cis_m)=flag_sig_cis_m_sum
360 sum(flag_sig_trans_m)=flag_sig_trans_m_sum
361 sum(flag_sig_int_m)=flag_sig_int_m_sum
362
363 sum(flag_sig_cis_v)=flag_sig_cis_v_sum
364 sum(flag_sig_trans_v)=flag_sig_trans_v_sum
365 sum(flag_sig_int_v)=flag_sig_int_v_sum
366 ;
367
368 run;
NOTE: There were 4963 observations read from the data set WORK.NOMULITGENE.
NOTE: The data set WORK.MEANS has 2291 observations and 11 variables.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds

369
370 data means;
371 set means;
372 if flag_sig_cis_m_sum > 0 then flag_sig_cis_m_sum = 1;
373 if flag_sig_trans_m_sum > 0 then flag_sig_trans_m_sum = 1;
374 if flag_sig_int_m_sum > 0 then flag_sig_int_m_sum = 1;
375 if flag_sig_cis_v_sum > 0 then flag_sig_cis_v_sum = 1;
376 if flag_sig_trans_v_sum > 0 then flag_sig_trans_v_sum = 1;
377 if flag_sig_int_v_sum > 0 then flag_sig_int_v_sum = 1;
378 run;
NOTE: There were 2291 observations read from the data set WORK.MEANS.
NOTE: The data set WORK.MEANS has 2291 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

379 ods html5 close;ods listing;

380
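Summing a 0/1 flag and then resetting any positive sum to 1 amounts to asking whether at least one fusion in the gene is flagged, which is the maximum of the flag. A hedged sketch of that shortcut on invented data (`toy_fus`, `pct_ai`, and `flag_sig` are placeholders, not the real variables):

```sas
/* Toy data: two fusions for the first gene, one for the second */
data toy_fus;
    input FBgn_cat :$11. pct_ai flag_sig;
    datalines;
FBgn0000001 10 0
FBgn0000001 30 1
FBgn0000002 0 0
;
run;

/* CLASS does not require sorted input; NWAY keeps only the per-gene rows */
proc means data=toy_fus noprint nway;
    class FBgn_cat;
    output out=toy_gene (drop=_TYPE_ _FREQ_)
        mean(pct_ai)=pct_ai_bar
        max(flag_sig)=flag_sig_any;
run;

proc print data=toy_gene; run;
```

Here the first gene gets a mean of 20 and a flag of 1 because one of its two fusions is flagged; the second gets 0 for both.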

In [30]:
proc print data=means (obs=10); run;


Out[30]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.MEANS

Obs FBgn_cat _TYPE_ _FREQ_ m_pct_ai_bar v_pct_ai_bar flag_sig_cis_m_sum flag_sig_trans_m_sum flag_sig_int_m_sum flag_sig_cis_v_sum flag_sig_trans_v_sum flag_sig_int_v_sum
1 FBgn0000024 0 2 43.6508 49.2063 0 0 0 0 0 0
2 FBgn0000038 0 4 15.9926 14.8581 1 1 1 1 1 1
3 FBgn0000042 0 1 17.8571 25.0000 1 1 1 1 1 1
4 FBgn0000044 0 1 12.5000 50.0000 0 0 0 0 0 0
5 FBgn0000046 0 1 10.0000 15.0000 0 0 0 0 0 0
6 FBgn0000052 0 1 33.3333 22.2222 0 0 0 0 0 0
7 FBgn0000053 0 5 11.5098 12.6863 0 0 0 0 0 0
8 FBgn0000064 0 2 51.6327 50.2381 1 1 1 1 1 1
9 FBgn0000108 0 3 13.9971 16.5570 1 1 1 1 0 1
10 FBgn0000114 0 1 0.0000 0.0000 0 0 0 0 0 0

In [31]:
proc print data=noMulitGene(where=(FBgn_cat eq 'FBgn0000064')); run;


Out[31]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.NOMULITGENE

Obs fusion_id m_pct_ai v_pct_ai flag_sig_cis_m flag_sig_trans_m flag_sig_int_m flag_sig_cis_v flag_sig_trans_v flag_sig_int_v Genes_per_fusion symbol_cat FBgn_cat
16 F60648_SI 40.0000 43.3333 1 0 1 1 0 1 1 Ald FBgn0000064
17 F60652_SI 63.2653 57.1429 1 1 1 1 1 1 1 Ald FBgn0000064

Import Transcription Factor Gene List

Clone the GitHub repository that I created with some transcription factor gene lists.

WORK.TF2


In [32]:
%%shell
cd /home/jfear/devel
git clone https://github.com/Oliver-Lab/genelists.git


fatal: destination path 'genelists' already exists and is not an empty directory.
/home/jfear/devel
fatal: destination path 'genelists' already exists and is not an empty directory.

Import gene list


In [33]:
proc import datafile='!HOME/devel/genelists/transcription_factors/Rhee_2014/genesList' out=tf dbms=csv replace;
getnames=no;
run;


Out[33]:

394  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
395
396 proc import datafile='!HOME/devel/genelists/transcription_factors/Rhee_2014/genesList' out=tf dbms=csv replace;
397 getnames=no;
398 run;
399 /**********************************************************************
400 * PRODUCT: SAS
401 * VERSION: 9.4
402 * CREATOR: External File Interface
403 * DATE: 29APR16
404 * DESC: Generated SAS Datastep Code
405 * TEMPLATE SOURCE: (None Specified.)
406 ***********************************************************************/
407 data WORK.TF ;
408 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
409 infile '!HOME/devel/genelists/transcription_factors/Rhee_2014/genesList' delimiter = ',' MISSOVER DSD lrecl=32767 ;
410 informat VAR1 $11. ;
411 format VAR1 $11. ;
412 input
413 VAR1 $
414 ;
415 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
416 run;
NOTE: The infile '!HOME/devel/genelists/transcription_factors/Rhee_2014/genesList' is:
Filename=/home/jfear/devel/genelists/transcription_factors/Rhee_2014/genesList,
Owner Name=jfear,Group Name=jfear,
Access Permission=-rw-rw-r--,
Last Modified=Fri Apr 29 07:42:57 2016,
File Size (bytes)=13013

NOTE: 1001 records were read from the infile '!HOME/devel/genelists/transcription_factors/Rhee_2014/genesList'.
The minimum record length was 11.
The maximum record length was 11.
NOTE: The data set WORK.TF has 1001 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

1001 rows created in WORK.TF from !HOME/devel/genelists/transcription_factors/Rhee_2014/genesList.



NOTE: WORK.TF data set was successfully created.
NOTE: The data set WORK.TF has 1001 observations and 1 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.06 seconds
cpu time 0.06 seconds

417 ods html5 close;ods listing;

418

In [34]:
proc print data=tf (obs=10); run;


Out[34]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.TF

Obs VAR1
1 FBgn0000014
2 FBgn0000015
3 FBgn0000018
4 FBgn0000022
5 FBgn0000028
6 FBgn0000054
7 FBgn0000061
8 FBgn0000097
9 FBgn0000099
10 FBgn0000137

Make sure FBgns match current annotation

I am not sure which FlyBase version these FBgn numbers come from. I want to make sure that they match the current annotation, so I will try merging them to the full gene list.


In [35]:
proc sort data=tf;
    by VAR1;
    run;


Out[35]:

426  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
427
428 proc sort data=tf;
429 by VAR1;
430 run;
NOTE: There were 1001 observations read from the data set WORK.TF.
NOTE: The data set WORK.TF has 1001 observations and 1 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

431 ods html5 close;ods listing;

432

In [36]:
data FBgns;
    set DMEL.FBGN2COORD;
    keep primary_fbgn;
    run;

proc sort data=FBgns nodups;
    by primary_FBgn;
    run;


Out[36]:

434  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
435
436 data FBgns;
437 set DMEL.FBGN2COORD;
438 keep primary_fbgn;
439 run;
NOTE: There were 16379 observations read from the data set DMEL.FBGN2COORD.
NOTE: The data set WORK.FBGNS has 16379 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.15 seconds
cpu time 0.01 seconds

440
441 proc sort data=FBgns nodups;
442 by primary_FBgn;
443 run;
NOTE: There were 16379 observations read from the data set WORK.FBGNS.
NOTE: 0 duplicate observations were deleted.
NOTE: The data set WORK.FBGNS has 16379 observations and 1 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

444 ods html5 close;ods listing;

445

In [37]:
data mgFbgn_Test;
    merge FBgns (in=in1) tf (in=in2 rename=(VAR1=primary_fbgn));
    by primary_fbgn;
    if in2 and not in1;
run;
proc print data=mgFbgn_Test; run;


Out[37]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.MGFBGN_TEST

Obs primary_fbgn
1 FBgn0014467
2 FBgn0083919

There are 2 FBgns that are not in my annotation.

FBgn0014467 comes up as FBgn0265784 (Dmel\CrebB)

FBgn0083919 comes up as FBgn0265991 (Dmel\Zasp52)

Check whether these are in my big FBgn list.


In [38]:
proc print data=FBgns (where=(primary_fbgn eq 'FBgn0265784' or primary_fbgn eq 'FBgn0265991')); run;


Out[38]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.FBGNS

Obs primary_fbgn
16168 FBgn0265784
16366 FBgn0265991

Yes, they are present, so I can just rename FBgn0014467 and FBgn0083919.


In [39]:
data tf2;
    rename VAR1 = FBgn_cat;
    set tf;
    if VAR1 eq 'FBgn0014467' then VAR1 = 'FBgn0265784';     
    if VAR1 eq 'FBgn0083919' then VAR1 = 'FBgn0265991';
    run;


Out[39]:

464  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
465
466 data tf2;
467 rename VAR1 = FBgn_cat;
468 set tf;
469 if VAR1 eq 'FBgn0014467' then VAR1 = 'FBgn0265784';
470 if VAR1 eq 'FBgn0083919' then VAR1 = 'FBgn0265991';
471 run;
NOTE: There were 1001 observations read from the data set WORK.TF.
NOTE: The data set WORK.TF2 has 1001 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

472 ods html5 close;ods listing;

473

In [40]:
proc sort data=tf2;
by FBgn_cat;
run;

data mgFbgn_Test;
    merge FBgns (in=in1) tf2 (in=in2 rename=(FBgn_cat=primary_fbgn));
    by primary_fbgn;
    if in2 and not in1;
run;


Out[40]:

475  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
476
477 proc sort data=tf2;
478 by FBgn_cat;
479 run;
NOTE: There were 1001 observations read from the data set WORK.TF2.
NOTE: The data set WORK.TF2 has 1001 observations and 1 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

480
481 data mgFbgn_Test;
482 merge FBgns (in=in1) tf2 (in=in2 rename=(FBgn_cat=primary_fbgn));
483 by primary_fbgn;
484 if in2 and not in1;
485 run;
NOTE: There were 16379 observations read from the data set WORK.FBGNS.
NOTE: There were 1001 observations read from the data set WORK.TF2.
NOTE: The data set WORK.MGFBGN_TEST has 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

486 ods html5 close;ods listing;

487

Merge TFs to Summarized dataset

CEGS.mgTFsig


In [41]:
data CEGS.mgTFsig;
    merge TF2 (in=in1) means (in=in2);
    by FBgn_cat;
    if in1 then flag_tf = 1;
    if in2 and not in1 then flag_tf = 0;
    if in2;
    run;


Out[41]:

489  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
490
491 data CEGS.mgTFsig;
492 merge TF2 (in=in1) means (in=in2);
493 by FBgn_cat;
494 if in1 then flag_tf = 1;
495 if in2 and not in1 then flag_tf = 0;
496 if in2;
497 run;
WARNING: Multiple lengths were specified for the BY variable FBgn_cat by input data sets. This might cause unexpected results.
NOTE: There were 1001 observations read from the data set WORK.TF2.
NOTE: There were 2291 observations read from the data set WORK.MEANS.
NOTE: The data set CEGS.MGTFSIG has 2293 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 0.20 seconds
cpu time 0.01 seconds

498 ods html5 close;ods listing;

499
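The log above reports 2293 observations in CEGS.MGTFSIG even though WORK.MEANS has 2291 and the `if in2` subset keeps only genes present in MEANS. A likely explanation is that a couple of FBgn IDs occur more than once in TF2 (possibly as a result of the two renames), since a match-merge emits as many rows per BY group as the longest input. A sketch for spotting and dropping such repeats, run on a hypothetical stand-in for TF2:

```sas
/* Hypothetical stand-in for WORK.TF2 with one repeated ID */
data toy_tf;
    length FBgn_cat $11;
    input FBgn_cat $;
    datalines;
FBgn0000014
FBgn0000015
FBgn0000015
;
run;

/* List IDs that occur more than once */
proc sql;
    select FBgn_cat, count(*) as n
    from toy_tf
    group by FBgn_cat
    having count(*) > 1;
quit;

/* Keep one row per ID before merging */
proc sort data=toy_tf out=toy_tf_uniq nodupkey;
    by FBgn_cat;
run;
```

The multiple-lengths WARNING can usually be avoided by placing a LENGTH statement for FBgn_cat before the MERGE statement, so that the longer of the two lengths is used.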

In [42]:
proc print data=CEGs.mgTFsig(obs=10); run;


Out[42]:
SAS Output

The SAS System

The PRINT Procedure

Data Set CEGS.MGTFSIG

Obs FBgn_cat _TYPE_ _FREQ_ m_pct_ai_bar v_pct_ai_bar flag_sig_cis_m_sum flag_sig_trans_m_sum flag_sig_int_m_sum flag_sig_cis_v_sum flag_sig_trans_v_sum flag_sig_int_v_sum flag_tf
1 FBgn0000024 0 2 43.6508 49.2063 0 0 0 0 0 0 0
2 FBgn0000038 0 4 15.9926 14.8581 1 1 1 1 1 1 0
3 FBgn0000042 0 1 17.8571 25.0000 1 1 1 1 1 1 0
4 FBgn0000044 0 1 12.5000 50.0000 0 0 0 0 0 0 0
5 FBgn0000046 0 1 10.0000 15.0000 0 0 0 0 0 0 0
6 FBgn0000052 0 1 33.3333 22.2222 0 0 0 0 0 0 0
7 FBgn0000053 0 5 11.5098 12.6863 0 0 0 0 0 0 0
8 FBgn0000064 0 2 51.6327 50.2381 1 1 1 1 1 1 0
9 FBgn0000108 0 3 13.9971 16.5570 1 1 1 1 0 1 0
10 FBgn0000114 0 1 0.0000 0.0000 0 0 0 0 0 0 0

Enrichments


In [43]:
proc freq data=CEGs.mgTFsig;
    table flag_tf*flag_sig_cis_m_sum / chisq;
    run;


Out[43]:
SAS Output

The SAS System

The FREQ Procedure

Table flag_tf * flag_sig_cis_m_sum

Cross-Tabular Freq Table

Table of flag_tf by flag_sig_cis_m_sum
Cell contents: Frequency (Percent, Row Pct, Col Pct); totals: Frequency (Percent)

flag_tf  flag_sig_cis_m_sum=0  flag_sig_cis_m_sum=1  Total
0  1635 (71.30, 75.24, 94.29)  538 (23.46, 24.76, 96.24)  2173 (94.77)
1  99 (4.32, 82.50, 5.71)  21 (0.92, 17.50, 3.76)  120 (5.23)
Total  1734 (75.62)  559 (24.38)  2293 (100.00)

Statistics for Table of flag_tf by flag_sig_cis_m_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 3.2499 0.0714
Likelihood Ratio Chi-Square 1 3.4839 0.0620
Continuity Adj. Chi-Square 1 2.8681 0.0904
Mantel-Haenszel Chi-Square 1 3.2485 0.0715
Phi Coefficient   -0.0376  
Contingency Coefficient   0.0376  
Cramer's V   -0.0376  

Fisher's Exact Test
Cell (1,1) Frequency (F) 1635
Left-sided Pr <= F 0.0418
Right-sided Pr >= F 0.9753
   
Table Probability (P) 0.0171
Two-sided Pr <= P 0.0802

Sample Size = 2293
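
As a check on the output above, the Pearson chi-square for a 2x2 table can be computed directly from the four cell counts as N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)). The data step below is a sketch that plugs in the counts from Out[43]; the variable names are arbitrary:

```sas
/* Recompute the Pearson chi-square for flag_tf by flag_sig_cis_m_sum */
data _null_;
    a = 1635; b = 538;   /* flag_tf = 0: not significant, significant */
    c = 99;   d = 21;    /* flag_tf = 1: not significant, significant */
    n = a + b + c + d;
    chisq = n * (a*d - b*c)**2 / ((a+b) * (c+d) * (a+c) * (b+d));
    p = 1 - probchi(chisq, 1);        /* upper tail, 1 degree of freedom */
    odds_ratio = (a*d) / (b*c);       /* odds of the flag in TFs vs. other genes */
    put chisq= 8.4 p= 8.4 odds_ratio= 8.4;
run;
```

The statistic works out to about 3.25, in line with the Chi-Square row. The odds ratio is below 1, which agrees with the negative phi coefficient: transcription factors carry the cis flag somewhat less often than other genes (17.50% vs. 24.76%), so any departure here points toward depletion rather than enrichment.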


In [44]:
proc freq data=CEGs.mgTFsig;
    table flag_tf*flag_sig_cis_v_sum / chisq;
    run;


Out[44]:
SAS Output

The SAS System

The FREQ Procedure

Table flag_tf * flag_sig_cis_v_sum

Cross-Tabular Freq Table

Table of flag_tf by flag_sig_cis_v_sum
Cell contents: Frequency (Percent, Row Pct, Col Pct); totals: Frequency (Percent)

flag_tf  flag_sig_cis_v_sum=0  flag_sig_cis_v_sum=1  Total
0  1632 (71.17, 75.10, 94.34)  541 (23.59, 24.90, 96.09)  2173 (94.77)
1  98 (4.27, 81.67, 5.66)  22 (0.96, 18.33, 3.91)  120 (5.23)
Total  1730 (75.45)  563 (24.55)  2293 (100.00)

Statistics for Table of flag_tf by flag_sig_cis_v_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 2.6443 0.1039
Likelihood Ratio Chi-Square 1 2.8112 0.0936
Continuity Adj. Chi-Square 1 2.3019 0.1292
Mantel-Haenszel Chi-Square 1 2.6432 0.1040
Phi Coefficient   -0.0340  
Contingency Coefficient   0.0339  
Cramer's V   -0.0340  

Fisher's Exact Test
Cell (1,1) Frequency (F) 1632
Left-sided Pr <= F 0.0615
Right-sided Pr >= F 0.9620
   
Table Probability (P) 0.0235
Two-sided Pr <= P 0.1265

Sample Size = 2293


In [45]:
proc freq data=CEGs.mgTFsig;
    table flag_tf*flag_sig_trans_m_sum / chisq;
    run;


Out[45]:
SAS Output

The SAS System

The FREQ Procedure

Table flag_tf * flag_sig_trans_m_sum

Cross-Tabular Freq Table

Table of flag_tf by flag_sig_trans_m_sum
Cell contents: Frequency (Percent, Row Pct, Col Pct); totals: Frequency (Percent)

flag_tf  flag_sig_trans_m_sum=0  flag_sig_trans_m_sum=1  Total
0  1817 (79.24, 83.62, 94.49)  356 (15.53, 16.38, 96.22)  2173 (94.77)
1  106 (4.62, 88.33, 5.51)  14 (0.61, 11.67, 3.78)  120 (5.23)
Total  1923 (83.86)  370 (16.14)  2293 (100.00)

Statistics for Table of flag_tf by flag_sig_trans_m_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 1.8692 0.1716
Likelihood Ratio Chi-Square 1 2.0239 0.1548
Continuity Adj. Chi-Square 1 1.5369 0.2151
Mantel-Haenszel Chi-Square 1 1.8684 0.1717
Phi Coefficient   -0.0286  
Contingency Coefficient   0.0285  
Cramer's V   -0.0286  

Fisher's Exact Test
Cell (1,1) Frequency (F) 1817
Left-sided Pr <= F 0.1044
Right-sided Pr >= F 0.9374
   
Table Probability (P) 0.0418
Two-sided Pr <= P 0.2022

Sample Size = 2293


In [46]:
proc freq data=CEGs.mgTFsig;
    table flag_tf*flag_sig_trans_v_sum / chisq;
    run;


Out[46]:
SAS Output

The SAS System

The FREQ Procedure

Table flag_tf * flag_sig_trans_v_sum

Cross-Tabular Freq Table

Table of flag_tf by flag_sig_trans_v_sum
Cell contents: Frequency (Percent, Row Pct, Col Pct); totals: Frequency (Percent)

flag_tf  flag_sig_trans_v_sum=0  flag_sig_trans_v_sum=1  Total
0  1810 (78.94, 83.29, 94.62)  363 (15.83, 16.71, 95.53)  2173 (94.77)
1  103 (4.49, 85.83, 5.38)  17 (0.74, 14.17, 4.47)  120 (5.23)
Total  1913 (83.43)  380 (16.57)  2293 (100.00)

Statistics for Table of flag_tf by flag_sig_trans_v_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 0.5300 0.4666
Likelihood Ratio Chi-Square 1 0.5510 0.4579
Continuity Adj. Chi-Square 1 0.3623 0.5472
Mantel-Haenszel Chi-Square 1 0.5297 0.4667
Phi Coefficient   -0.0152  
Contingency Coefficient   0.0152  
Cramer's V   -0.0152  

Fisher's Exact Test
Cell (1,1) Frequency (F) 1810
Left-sided Pr <= F 0.2792
Right-sided Pr >= F 0.8016
   
Table Probability (P) 0.0808
Two-sided Pr <= P 0.5296

Sample Size = 2293


In [47]:
proc freq data=CEGs.mgTFsig;
    table flag_tf*flag_sig_int_m_sum / chisq;
    run;


Out[47]:
SAS Output

The SAS System

The FREQ Procedure

Table flag_tf * flag_sig_int_m_sum

Cross-Tabular Freq Table

Table of flag_tf by flag_sig_int_m_sum
Cell contents: Frequency (Percent, Row Pct, Col Pct); totals: Frequency (Percent)

flag_tf  flag_sig_int_m_sum=0  flag_sig_int_m_sum=1  Total
0  1699 (74.10, 78.19, 94.39)  474 (20.67, 21.81, 96.15)  2173 (94.77)
1  101 (4.40, 84.17, 5.61)  19 (0.83, 15.83, 3.85)  120 (5.23)
Total  1800 (78.50)  493 (21.50)  2293 (100.00)

Statistics for Table of flag_tf by flag_sig_int_m_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 2.4094 0.1206
Likelihood Ratio Chi-Square 1 2.5797 0.1082
Continuity Adj. Chi-Square 1 2.0681 0.1504
Mantel-Haenszel Chi-Square 1 2.4083 0.1207
Phi Coefficient   -0.0324  
Contingency Coefficient   0.0324  
Cramer's V   -0.0324  

Fisher's Exact Test
Cell (1,1) Frequency (F) 1699
Left-sided Pr <= F 0.0718
Right-sided Pr >= F 0.9561
   
Table Probability (P) 0.0279
Two-sided Pr <= P 0.1377

Sample Size = 2293


In [48]:
proc freq data=CEGs.mgTFsig;
    table flag_tf*flag_sig_int_v_sum / chisq;
    run;


Out[48]:
SAS Output

The SAS System

The FREQ Procedure

Table flag_tf * flag_sig_int_v_sum

Cross-Tabular Freq Table

Table of flag_tf by flag_sig_int_v_sum
Cell contents: Frequency (Percent, Row Pct, Col Pct); totals: Frequency (Percent)

flag_tf  flag_sig_int_v_sum=0  flag_sig_int_v_sum=1  Total
0  1697 (74.01, 78.09, 94.44)  476 (20.76, 21.91, 95.97)  2173 (94.77)
1  100 (4.36, 83.33, 5.56)  20 (0.87, 16.67, 4.03)  120 (5.23)
Total  1797 (78.37)  496 (21.63)  2293 (100.00)

Statistics for Table of flag_tf by flag_sig_int_v_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 1.8409 0.1748
Likelihood Ratio Chi-Square 1 1.9515 0.1624
Continuity Adj. Chi-Square 1 1.5449 0.2139
Mantel-Haenszel Chi-Square 1 1.8401 0.1749
Phi Coefficient   -0.0283  
Contingency Coefficient   0.0283  
Cramer's V   -0.0283  

Fisher's Exact Test
Cell (1,1) Frequency (F) 1697
Left-sided Pr <= F 0.1047
Right-sided Pr >= F 0.9328
   
Table Probability (P) 0.0375
Two-sided Pr <= P 0.2098

Sample Size = 2293
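
The six cells above differ only in the flag being tested. PROC FREQ accepts a parenthesized list in a TABLES request, so all of the crosstabs can be requested in one step. A sketch on invented data (`toy_sig` and its columns are placeholders, not the real variables):

```sas
/* Toy data: one TF indicator and three 0/1 significance flags */
data toy_sig;
    input flag_tf flag_cis flag_trans flag_int;
    datalines;
0 1 0 1
0 0 0 0
0 1 1 1
0 0 0 0
1 0 0 0
1 1 0 1
1 0 0 0
1 0 1 0
;
run;

/* flag_tf is crossed with each variable in the parentheses in turn */
proc freq data=toy_sig;
    tables flag_tf*(flag_cis flag_trans flag_int) / chisq;
run;
```

With the real data, the same request would list the six `flag_sig_*_sum` variables inside the parentheses.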

Mated AI in >= 50% of lines


In [49]:
data mm;
    set CEGS.mgTFsig;
    if m_pct_ai_bar ge 50;
    run;

proc freq data=mm;
    table flag_tf*flag_sig_cis_m_sum / chisq;
    run;
    
proc freq data=mm;
    table flag_tf*flag_sig_trans_m_sum / chisq;
    run;
    
proc freq data=mm;
    table flag_tf*flag_sig_int_m_sum / chisq;
    run;


Out[49]:
SAS Output

The SAS System

The FREQ Procedure

Table flag_tf * flag_sig_cis_m_sum

Cross-Tabular Freq Table

Table of flag_tf by flag_sig_cis_m_sum
Cell contents: Frequency (Percent, Row Pct, Col Pct); totals: Frequency (Percent)

flag_tf  flag_sig_cis_m_sum=0  flag_sig_cis_m_sum=1  Total
0  46 (41.44, 42.20, 97.87)  63 (56.76, 57.80, 98.44)  109 (98.20)
1  1 (0.90, 50.00, 2.13)  1 (0.90, 50.00, 1.56)  2 (1.80)
Total  47 (42.34)  64 (57.66)  111 (100.00)

Statistics for Table of flag_tf by flag_sig_cis_m_sum

Chi-Square Tests

Statistic DF Value Prob
WARNING: 50% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square 1 0.0489 0.8250
Likelihood Ratio Chi-Square 1 0.0484 0.8260
Continuity Adj. Chi-Square 1 0.0000 1.0000
Mantel-Haenszel Chi-Square 1 0.0485 0.8257
Phi Coefficient   -0.0210  
Contingency Coefficient   0.0210  
Cramer's V   -0.0210  

Fisher's Exact Test
Cell (1,1) Frequency (F) 46
Left-sided Pr <= F 0.6698
Right-sided Pr >= F 0.8229
   
Table Probability (P) 0.4927
Two-sided Pr <= P 1.0000

Sample Size = 111


The SAS System

The FREQ Procedure

Table flag_tf * flag_sig_trans_m_sum

Cross-Tabular Freq Table

Table of flag_tf by flag_sig_trans_m_sum
Cell contents: Frequency (Percent, Row Pct, Col Pct); totals: Frequency (Percent)

flag_tf  flag_sig_trans_m_sum=0  flag_sig_trans_m_sum=1  Total
0  63 (56.76, 57.80, 98.44)  46 (41.44, 42.20, 97.87)  109 (98.20)
1  1 (0.90, 50.00, 1.56)  1 (0.90, 50.00, 2.13)  2 (1.80)
Total  64 (57.66)  47 (42.34)  111 (100.00)

Statistics for Table of flag_tf by flag_sig_trans_m_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 0.0489 0.8250
Likelihood Ratio Chi-Square 1 0.0484 0.8260
Continuity Adj. Chi-Square 1 0.0000 1.0000
Mantel-Haenszel Chi-Square 1 0.0485 0.8257
Phi Coefficient   0.0210
Contingency Coefficient   0.0210
Cramer's V   0.0210
WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
Cell (1,1) Frequency (F) 63
Left-sided Pr <= F 0.8229
Right-sided Pr >= F 0.6698
   
Table Probability (P) 0.4927
Two-sided Pr <= P 1.0000

Sample Size = 111


The SAS System

The FREQ Procedure

Table of flag_tf by flag_sig_int_m_sum

flag_tf Statistic flag_sig_int_m_sum=0 flag_sig_int_m_sum=1 Total
0 Frequency 60 49 109
0 Percent 54.05 44.14 98.20
0 Row Pct 55.05 44.95
0 Col Pct 98.36 98.00
1 Frequency 1 1 2
1 Percent 0.90 0.90 1.80
1 Row Pct 50.00 50.00
1 Col Pct 1.64 2.00
Total Frequency 61 50 111
Total Percent 54.95 45.05 100.00

Statistics for Table of flag_tf by flag_sig_int_m_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 0.0202 0.8870
Likelihood Ratio Chi-Square 1 0.0201 0.8873
Continuity Adj. Chi-Square 1 0.0000 1.0000
Mantel-Haenszel Chi-Square 1 0.0200 0.8875
Phi Coefficient   0.0135
Contingency Coefficient   0.0135
Cramer's V   0.0135
WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
Cell (1,1) Frequency (F) 60
Left-sided Pr <= F 0.7993
Right-sided Pr >= F 0.7002
   
Table Probability (P) 0.4996
Two-sided Pr <= P 1.0000

Sample Size = 111
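
Because every table above triggers the expected-count warning, the Fisher's exact p-values are the ones worth keeping. A minimal sketch, assuming the mm data set created above: the three table requests can be combined into a single PROC FREQ call, and the ODS table FishersExact can be captured in a data set for side-by-side comparison. The data set name mm_fisher is hypothetical and not part of the original notebook.

```sas
/* Capture the Fisher's exact results for every table request.
   mm_fisher is a hypothetical output data set name. */
ods output FishersExact=mm_fisher;

/* One call, three 2x2 tables: flag_tf crossed with each significance flag */
proc freq data=mm;
    tables flag_tf*(flag_sig_cis_m_sum flag_sig_trans_m_sum flag_sig_int_m_sum) / chisq;
    run;

/* Stacked results, one block of rows per table */
proc print data=mm_fisher;
    run;
```

The same pattern should apply to the v_ flags by swapping in the corresponding data set and variable names.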


Virgin AI in >= 50%


In [50]:
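/* Keep only rows where v_pct_ai_bar is at least 50 */
data vv;
    set CEGS.mgTFsig;
    if v_pct_ai_bar ge 50;
    run;

/* flag_tf by the cis significance flag */
proc freq data=vv;
    table flag_tf*flag_sig_cis_v_sum / chisq;
    run;

/* flag_tf by the trans significance flag */
proc freq data=vv;
    table flag_tf*flag_sig_trans_v_sum / chisq;
    run;

/* flag_tf by the interaction significance flag */
proc freq data=vv;
    table flag_tf*flag_sig_int_v_sum / chisq;
    run;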


Out[50]:
SAS Output

The SAS System

The FREQ Procedure

Table of flag_tf by flag_sig_cis_v_sum

flag_tf Statistic flag_sig_cis_v_sum=0 flag_sig_cis_v_sum=1 Total
0 Frequency 47 70 117
0 Percent 39.83 59.32 99.15
0 Row Pct 40.17 59.83
0 Col Pct 100.00 98.59
1 Frequency 0 1 1
1 Percent 0.00 0.85 0.85
1 Row Pct 0.00 100.00
1 Col Pct 0.00 1.41
Total Frequency 47 71 118
Total Percent 39.83 60.17 100.00

Statistics for Table of flag_tf by flag_sig_cis_v_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 0.6676 0.4139
Likelihood Ratio Chi-Square 1 1.0217 0.3121
Continuity Adj. Chi-Square 1 0.0000 1.0000
Mantel-Haenszel Chi-Square 1 0.6620 0.4159
Phi Coefficient   0.0752
Contingency Coefficient   0.0750
Cramer's V   0.0752
WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
Cell (1,1) Frequency (F) 47
Left-sided Pr <= F 1.0000
Right-sided Pr >= F 0.6017
   
Table Probability (P) 0.6017
Two-sided Pr <= P 1.0000

Sample Size = 118


The SAS System

The FREQ Procedure

Table of flag_tf by flag_sig_trans_v_sum

flag_tf Statistic flag_sig_trans_v_sum=0 flag_sig_trans_v_sum=1 Total
0 Frequency 65 52 117
0 Percent 55.08 44.07 99.15
0 Row Pct 55.56 44.44
0 Col Pct 100.00 98.11
1 Frequency 0 1 1
1 Percent 0.00 0.85 0.85
1 Row Pct 0.00 100.00
1 Col Pct 0.00 1.89
Total Frequency 65 53 118
Total Percent 55.08 44.92 100.00

Statistics for Table of flag_tf by flag_sig_trans_v_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 1.2369 0.2661
Likelihood Ratio Chi-Square 1 1.6113 0.2043
Continuity Adj. Chi-Square 1 0.0105 0.9182
Mantel-Haenszel Chi-Square 1 1.2264 0.2681
Phi Coefficient   0.1024
Contingency Coefficient   0.1019
Cramer's V   0.1024
WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
Cell (1,1) Frequency (F) 65
Left-sided Pr <= F 1.0000
Right-sided Pr >= F 0.4492
   
Table Probability (P) 0.4492
Two-sided Pr <= P 0.4492

Sample Size = 118


The SAS System

The FREQ Procedure

Table of flag_tf by flag_sig_int_v_sum

flag_tf Statistic flag_sig_int_v_sum=0 flag_sig_int_v_sum=1 Total
0 Frequency 67 50 117
0 Percent 56.78 42.37 99.15
0 Row Pct 57.26 42.74
0 Col Pct 98.53 100.00
1 Frequency 1 0 1
1 Percent 0.85 0.00 0.85
1 Row Pct 100.00 0.00
1 Col Pct 1.47 0.00
Total Frequency 68 50 118
Total Percent 57.63 42.37 100.00

Statistics for Table of flag_tf by flag_sig_int_v_sum

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 0.7416 0.3892
Likelihood Ratio Chi-Square 1 1.1086 0.2924
Continuity Adj. Chi-Square 1 0.0000 1.0000
Mantel-Haenszel Chi-Square 1 0.7353 0.3912
Phi Coefficient   -0.0793
Contingency Coefficient   0.0790
Cramer's V   -0.0793
WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
Cell (1,1) Frequency (F) 67
Left-sided Pr <= F 0.5763
Right-sided Pr >= F 1.0000
   
Table Probability (P) 0.5763
Two-sided Pr <= P 1.0000

Sample Size = 118

