In [1]:
list_pkgs <- c("plyr", "dplyr", "tidyr", "readr", "car", "ggplot2", "cowplot")
new_pkgs <- list_pkgs[!(list_pkgs %in% installed.packages()[,"Package"])]
if(length(new_pkgs) > 0){ install.packages(new_pkgs) }

library(plyr)
library(dplyr)
library(tidyr)
library(readr)
library(car)
library(ggplot2)
library(cowplot)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:plyr’:

    arrange, count, desc, failwith, id, mutate, rename, summarise,
    summarize

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Attaching package: ‘car’

The following object is masked from ‘package:dplyr’:

    recode


Attaching package: ‘cowplot’

The following object is masked from ‘package:ggplot2’:

    ggsave

Outline

Introduce the basic concepts of QTL analysis - we will start with mapping in an F2 population and later consider issues that arise in more complex populations.

We should start by looking at the structure of data sets and distribution of traits.

We will use data from an F2 data to do basic analysis. Typical F2 populations are formed by crosses between inbred lines, which typically differ in some trait(s) of interest. We will start with the simplest ANOVA model to look at whether there are differences in the phenotypes of the different genotypes at a locus (we will use marker 16 and trait 2a for this first analysis).

We will next adapt this analysis to run it as a regression model, including creating index variables to separate additive from dominance effects.

This basic model can be adapted as an iterative analysis that walks across markers. We can use this to test all markers for effects on trait 2a and then run the analysis for trait 3a. We will discuss use of LPR or LOD as a measure of significance.

For trait 2a we see a single locus, but because of LD the adjacent markers are also significant. For trait 3a we are left with a confusing pattern - how could we identify the most likely positions of QTL?

How could we include marker co-factors if we want to identify positions of QTL?

If we run this model on a very large number of markers, how would we be able to identify significant loci? We could correct for multiple testing, but the tests are not independent. We could use a permutation test to establish the empirical significance threshold. We can run a simple permutational model for trait 1 to establish the significance of a single test. How can we establish the threshold for a set of tests?

We could also use this permutation approach simulate a true positive as a way of looking at power.

Introduction to QTL mapping

We'll begin this tutorial with a simple unstructured F2 population. This is a population constructed by crossing a pair of inbred lines to produce a 'hybrid' F1 population and randomly crossing these F1 individuals to produce a F2. The F2 has a single generation of recombination between the genomes of the two parental lines and all individuals are equally related because they all had genetically identical parents.

Let's first load the data we need and make some plots to see if our traits were measured and imported correctly.

Reading and plotting data


In [2]:
f2_data = read_csv("./F2 geno pheno with QTL effect.csv")
f2_data


IDSexSireDamLSBLSWtrait1trait2trait3G1G24G25G26G27G28G29G30G31Trait2aTrait3a
55 M 0 13 9 9 2.7840078 84.136 30.702 1 0 0 0 0 0 0 0 0 87.0846630.70200
56 M 0 13 9 9 2.9961462 91.376 37.442 -1 0 0 0 -1 -1 -1 0 1 89.2909834.34656
57 M 0 13 9 9 4.4964274 97.516 44.382 0 0 1 1 1 0 0 0 0 100.4646644.38200
58 F 0 13 9 9 4.1677393 86.236 27.362 0 -1 -1 -1 -1 -1 -1 -1 -1 84.1509825.81428
59 M 0 13 9 9 8.3322971 90.316 47.872 -1 -1 -1 -1 -1 -1 -1 0 0 88.2309844.77656
60 F 0 13 9 9 6.0586791 74.856 31.912 1 0 0 0 0 0 0 0 0 77.8046631.91200
61 F 0 13 9 9 5.2987302 91.926 34.582 -1 -1 -1 -1 -1 -1 -1 -1 -1 89.8409831.48656
62 M 0 13 9 9 5.0645314 95.696 45.092 1 0 0 0 0 -1 -1 -1 -1 98.6446645.09200
63 F 0 13 9 9 4.7603125 86.126 25.982 0 0 0 0 0 0 0 0 0 89.0746625.98200
64 F 0 13 11 11 1.0888584 91.126 32.162 1 0 0 0 0 0 0 0 0 94.0746632.16200
65 F 0 13 11 11 1.1838771 97.186 28.232 0 1 1 0 0 0 0 0 0 100.1346629.77972
66 F 0 13 11 11 0.7492970 93.536 37.602 0 1 1 1 1 1 1 1 1 95.6210240.69744
67 F 0 13 11 11 0.7780688 97.186 31.422 1 1 1 1 1 1 1 1 1 99.2710234.51744
69 M 0 13 11 11 1.6001284 97.476 36.482 -1 1 1 1 1 1 1 1 1 100.4246636.48200
70 F 0 13 11 11 1.4331547 97.166 31.212 1 0 -1 -1 -1 -1 -1 -1 -1 99.2510234.30744
71 M 0 13 11 11 3.8746510100.406 41.802 0 0 0 0 0 0 0 0 0 103.3546641.80200
72 M 0 13 11 11 2.9011491 90.076 35.232 0 -1 -1 -1 -1 -1 -1 -1 -1 93.0246633.68428
73 M 0 13 11 11 3.0200725 98.550 41.972 0 0 0 0 0 0 0 1 1 100.6350243.51972
74 M 0 13 11 11 2.2471589103.306 39.032 -1 0 0 0 0 0 0 0 0 106.2546637.48428
82 M 15 10 9 2 4.3885895 88.168 39.690 0 0 0 0 0 0 0 0 0 91.1166641.23772
83 M 15 10 9 2 4.5828357 93.558 46.960 0 0 0 0 0 0 0 0 0 95.6430250.05544
84 M 16 12 7 5 5.2427527 86.053 43.940 -1 -1 -1 -1 -1 -1 -1 -1 -1 83.9679840.84456
85 M 16 12 7 5 1.6977471 83.253 33.890 0 0 0 0 0 0 0 0 0 86.2016633.89000
86 M 16 12 7 5 7.2132029 96.193 47.840 1 0 0 0 0 0 0 0 0 99.1416647.84000
87 M 16 12 7 5 1.7171386 86.183 34.860 0 0 0 0 0 0 0 1 1 89.1316634.86000
90 M 16 12 7 5 5.2721871 92.243 47.600 1 0 0 0 -1 -1 -1 -1 -1 95.1916647.60000
91 F 4 17 12 12 2.0840940 77.051 30.942 0 1 1 1 1 1 1 1 1 79.1360232.48972
92 M 4 17 12 12 2.4946120 92.851 30.812 0 1 1 1 1 1 1 1 1 94.9360233.90744
93 M 4 17 12 12 3.6485070 97.441 40.372 0 0 0 0 0 0 0 0 0 100.3896640.37200
94 M 4 17 12 12 5.6060066 84.361 46.422 0 0 0 0 -1 -1 -1 -1 -1 87.3096646.42200
1193 F 45 35 5 12 2.240721 94.663 25.160 0 1 0 0 0 0 0 0 0 96.7480228.25544
1194 M 45 35 5 12 2.730726 97.223 40.190 -1 1 0 0 0 0 0 0 0 99.3080241.73772
1295 M 42 36 4 8 4.839992 94.128 44.674 -1 1 1 0 0 0 0 0 0 97.0766644.67400
1296 F 42 36 4 8 2.208632 93.078 28.184 -1 0 0 0 1 1 1 1 0 96.0266625.08856
1297 M 42 36 4 8 3.000926 96.698 44.394 0 1 0 0 0 0 0 0 0 98.7830247.48944
1298 F 42 36 4 8 2.381743 92.958 26.764 -1 1 1 1 1 1 1 1 0 90.8729825.21628
1299 F 41 9 4 8 1.460508 90.350 25.972 -1 0 0 0 1 1 1 1 0 88.2649824.42428
1300 F 41 9 4 8 1.686151 86.210 27.102 -1 1 1 1 1 1 1 1 0 89.1586627.10200
1301 F 41 9 4 8 2.936640 97.250 38.302 0 1 0 0 0 0 0 0 0 99.3350239.84972
1302 M 41 9 4 8 1.764011 90.584 34.292 1 1 1 1 1 1 1 1 1 92.6690237.38744
1437 F 41 9 6 10 2.521450 86.280 29.212 0 0 0 0 0 0 0 0 0 89.2286629.21200
1438 F 41 9 6 10 2.589012 98.960 32.812 0 1 1 0 0 0 0 0 0 101.0450235.90744
1439 F 41 9 6 10 3.250871 89.600 31.602 1 1 1 1 1 1 1 1 1 92.5486633.14972
1440 M 41 9 6 10 2.589012 96.320 40.652 0 1 1 0 0 0 0 0 0 99.2686642.19972
1441 M 41 9 6 10 3.177307 97.050 39.212 1 1 1 0 0 0 0 -1 -1 99.1350242.30744
1442 F 41 9 6 10 2.344552 85.050 31.082 -1 1 1 0 0 0 0 0 0 87.9986632.62972
1443 M 31 6 13 10 6.239614 97.306 46.882 1 1 1 0 0 0 0 0 0 99.3910249.97744
1445 M 31 6 13 10 6.145780 97.196 47.312 0 0 0 0 0 0 0 0 0 100.1446647.31200
1446 M 31 6 13 10 4.465913 98.426 43.812 1 0 0 1 0 0 0 -1 -1 101.3746645.35972
1447 F 31 6 13 12 2.712283 82.736 26.992 0 0 0 0 -1 -1 -1 -1 -1 85.6846626.99200
1448 M 31 6 13 12 2.309748 100.126 40.762 0 0 0 0 0 0 0 0 0 103.0746640.76200
1449 F 31 6 13 12 3.089857 104.106 30.632 1 0 0 0 0 0 0 0 0 107.0546632.17972
1451 M 31 6 13 12 5.148974 88.666 40.412 1 1 0 0 1 1 1 0 0 90.7510243.50744
1452 M 31 6 13 12 3.359891 106.176 44.112 1 1 1 1 1 1 1 0 0 108.2610247.20744
1453 F 31 6 13 12 1.523216 87.186 27.452 1 1 1 1 1 1 1 1 1 89.2710230.54744
1454 M 31 6 13 12 4.689586 94.156 41.692 0 0 0 0 0 -1 -1 -1 -1 92.0709838.59656
1455 F 31 6 13 12 2.159765 88.106 32.652 1 0 1 1 1 1 1 0 0 91.0546634.19972
1456 F 42 36 3 12 4.038872 97.268 31.814 1 0 0 0 0 1 1 0 0 100.2166633.36172
1457 M 42 36 3 12 2.187336 100.938 38.404 1 1 1 1 1 1 1 0 0 103.0230241.49944
1458 F 42 36 3 12 2.559654 88.328 28.804 0 -1 -1 -1 -1 -1 -1 -1 -1 86.2429825.70856

In [3]:
f2_data %>% 
    gather(trait, value, Trait2a:Trait3a) %>%
    ggplot(aes(trait, value)) + geom_violin() + geom_jitter(width = 0.1, alpha = 0.1)


Simple single marker models

Anova model

The simplest way of mapping a QTL is using an Anova. We divide genotypes at a particular marker position into 3 classes (the two homozygotes and the heterozygote) and look for diferences among the classes. Later we'll see how to dissect this further. This model is comparing the different genotypes states of the G16 marker with respect to the Trait2a phenotype. We are also controling for differences in Sex, litter size at birth (LSB) and litter size at weaning (LSW).

The lm() function in R can be used for this test.


In [4]:
anova_fit_f2_marker_16 = lm(Trait2a ~ Sex + LSB + LSW + as.factor(G16), data = f2_data)
Anova(anova_fit_f2_marker_16)


Sum SqDfF valuePr(>F)
Sex 1404.73117 1 36.4635275 3.064203e-09
LSB 20.72801 1 0.5380504 4.635905e-01
LSW 126.60330 1 3.2863248 7.046730e-02
as.factor(G16) 3294.99693 2 42.7651971 7.470714e-18
Residuals18992.47044 493 NA NA

In [5]:
coef(anova_fit_f2_marker_16)


(Intercept)
84.6884849551607
SexM
3.39866821160226
LSB
-0.138667893466863
LSW
0.403972112197251
as.factor(G16)0
6.40439809518905
as.factor(G16)1
5.52682718918684

Regression models

Another way of doing this is using a regression model, in which we code the different genotypes numericaly (-1 for one homozygote, 0 for the heterozygotes, and +1 for the other homozygote). The coeficient associated with this encoding is related to the additive effect of the marker.


In [6]:
reg_fit_f2_marker_16_A = lm(Trait2a ~ Sex + LSB + LSW + G16, data = f2_data)
summary(reg_fit_f2_marker_16_A)


Call:
lm(formula = Trait2a ~ Sex + LSB + LSW + G16, data = f2_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-61.765  -3.641   0.339   4.085  23.938 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  89.2698     1.7463  51.121  < 2e-16 ***
SexM          3.5658     0.5854   6.091 2.26e-09 ***
LSB          -0.0885     0.1967  -0.450    0.653    
LSW           0.3557     0.2319   1.534    0.126    
G16           2.6529     0.4213   6.296 6.73e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.463 on 494 degrees of freedom
  (6 observations deleted due to missingness)
Multiple R-squared:  0.137,	Adjusted R-squared:   0.13 
F-statistic:  19.6 on 4 and 494 DF,  p-value: 5.486e-15

We can also use a coding of 1 for the heterozygotes and 0 for the homozygotes to extract a coefficient related to the dominance effect of the marker.


In [7]:
f2_data$G16_D = ifelse(f2_data$G16, 0, 1) # This is the dominance coding
reg_fit_f2_marker_16_AD = lm(Trait2a ~ Sex + LSB + LSW + G16 + G16_D, data = f2_data)
summary(reg_fit_f2_marker_16_AD)


Call:
lm(formula = Trait2a ~ Sex + LSB + LSW + G16 + G16_D, data = f2_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-63.397  -3.424   0.465   3.742  22.142 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  87.4519     1.7001  51.438  < 2e-16 ***
SexM          3.3987     0.5628   6.039 3.06e-09 ***
LSB          -0.1387     0.1890  -0.734   0.4636    
LSW           0.4040     0.2228   1.813   0.0705 .  
G16           2.7634     0.4050   6.823 2.62e-11 ***
G16_D         3.6410     0.5582   6.523 1.71e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.207 on 493 degrees of freedom
  (6 observations deleted due to missingness)
Multiple R-squared:  0.2055,	Adjusted R-squared:  0.1975 
F-statistic: 25.51 on 5 and 493 DF,  p-value: < 2.2e-16

Mapping loci

If we have a set of markers we can run a series of single markers models along the chromossome to locate the position of the QTL in that cromossome. Let's create dominance collumns and run the regression models for each marker


In [8]:
n_markers = 31
marker_fits_Trait2a = list()
for(i in seq(n_markers)){
    current_marker = paste0('G', i)
    f2_data[[paste0(current_marker, '_D')]] = ifelse(f2_data[[current_marker]], 0, 1)
    model_formula = paste0("Trait2a ~ Sex + LSB + LSW + ", current_marker, "+", current_marker, "_D")
    marker_fits_Trait2a[[i]] = lm(as.formula(model_formula), data = f2_data)
}

Let's look at one of the model fits to see if everything is ok


In [9]:
summary(marker_fits_Trait2a[[3]])


Call:
lm(formula = as.formula(model_formula), data = f2_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-61.818  -3.749   0.048   4.309  24.783 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  89.2347     1.8463  48.332  < 2e-16 ***
SexM          3.6262     0.6058   5.986 4.14e-09 ***
LSB          -0.1426     0.2034  -0.701   0.4835    
LSW           0.4282     0.2398   1.786   0.0748 .  
G3            1.0665     0.4310   2.474   0.0137 *  
G3_D         -0.2116     0.5992  -0.353   0.7242    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.682 on 493 degrees of freedom
  (6 observations deleted due to missingness)
Multiple R-squared:  0.07934,	Adjusted R-squared:   0.07 
F-statistic: 8.497 on 5 and 493 DF,  p-value: 9.933e-08

We can use this set of fits to see how the log significance changes along the cromossome. The regression model allows us to separate the additive and dominance effects.

First the additive signficance


In [10]:
ldply(marker_fits_Trait2a, function(x) summary(x)$coefficients[5, 'Pr(>|t|)']) %>%
    ggplot(aes(1:n_markers, -log10(V1))) + geom_line() + labs(x = "marker", y = "LPR") +
        scale_x_continuous(breaks = seq(1, n_markers, by = 2))


Now the dominance significance


In [11]:
ldply(marker_fits_Trait2a, function(x) summary(x)$coefficients[6, 'Pr(>|t|)'])    %>%
    ggplot(aes(1:n_markers, -log10(V1))) + geom_line() + labs(x = "marker", y = "LPR") +
        scale_x_continuous(breaks = seq(1, n_markers, by = 2))


We could also do both graphs at once using some faceting.


In [12]:
# Additive and dominance significance ploted together
ldply(marker_fits_Trait2a, function(x) summary(x)$coefficients[c(5, 6), 'Pr(>|t|)']) %>% 
    rename(additive = G1, dominance = G1_D) %>%
    mutate(marker = 1:n_markers) %>%
    gather(variable, value, additive:dominance) %>%    
ggplot(aes(marker, -log10(value)), group = variable) + 
    geom_line() + labs(x = "marker", y = "LPR") +
        scale_x_continuous(breaks = seq(1, n_markers, by = 2)) + facet_wrap(~variable)


More than one effect


In [13]:
n_markers = 31
marker_fits_Trait3a = list()
for(i in seq(n_markers)){
    current_marker = paste0('G', i)
    model_formula = paste0("Trait3a ~ Sex + LSB + LSW + ", current_marker, "+", current_marker, "_D")
    marker_fits_Trait3a[[i]] = lm(as.formula(model_formula), data = f2_data)
}

In [14]:
ldply(marker_fits_Trait3a, function(x) summary(x)$coefficients[c(5, 6), 'Pr(>|t|)']) %>% 
    rename(additive = G1, dominance = G1_D) %>%
    mutate(marker = 1:n_markers) %>%
    gather(variable, value, additive:dominance) %>%    
ggplot(aes(marker, -log10(value)), group = variable) + 
    geom_line() + labs(x = "marker", y = "LPR") +
        scale_x_continuous(breaks = seq(1, n_markers, by = 2)) + facet_wrap(~variable)


Including more than one marker in a model

We can take LD into account in the mapping by including flanking markers. This provides more localised estimates of the QTLs.


In [15]:
fit_f2_m13         = lm(Trait2a ~ Sex + LSB + LSW + G13 + G13_D                            , data = f2_data)
fit_f2_m13_fl11_15 = lm(Trait2a ~ Sex + LSB + LSW + G13 + G13_D + G11 + G11_D + G15 + G15_D, data = f2_data)
summary(fit_f2_m13)$coefficients
summary(fit_f2_m13_fl11_15)$coefficients


EstimateStd. Errort valuePr(>|t|)
(Intercept)87.6405709 1.7584448 49.8398183 1.242938e-194
SexM 3.5589252 0.5771751 6.1661101 1.458395e-09
LSB-0.1477513 0.1939037 -0.7619831 4.464344e-01
LSW 0.4308590 0.2289043 1.8822668 6.038852e-02
G13 2.4783795 0.4101400 6.0427646 2.989783e-09
G13_D 2.7697281 0.5730483 4.8333237 1.796375e-06
EstimateStd. Errort valuePr(>|t|)
(Intercept)87.54050836 1.7178712 50.95871522 9.556054e-198
SexM 3.47284543 0.5628614 6.16998338 1.433808e-09
LSB-0.15554123 0.1899232 -0.81896904 4.132030e-01
LSW 0.42661370 0.2237477 1.90667265 5.714880e-02
G13-0.02139369 1.3164233 -0.01625138 9.870405e-01
G13_D-0.15242632 1.3276670 -0.11480764 9.086447e-01
G11-0.20063827 0.9288979 -0.21599605 8.290809e-01
G11_D-1.36413615 1.0484438 -1.30110570 1.938350e-01
G15 2.84037161 0.9999768 2.84043763 4.693044e-03
G15_D 4.83151406 1.0118631 4.77486922 2.379731e-06

Now let's redo all the individual markers, but including 2 flanking markers in the models. The markers at the edge of the chromossome only get one flanking marker.


In [16]:
n_markers = 31
interval = 6 # Play around with this value. 
fl_marker_fits_Trait3a = list()
for(i in seq(n_markers)){
    current_marker = paste0('G', i)
    f2_data[[paste0(current_marker, '_D')]] = ifelse(f2_data[[current_marker]], 0, 1)
    model_formula = paste0("Trait3a ~ Sex + LSB + LSW + ", current_marker, "+", current_marker, "_D")
    if((i - interval) >= 1)
        model_formula = paste0(model_formula, "+ G", paste0(i - interval), "+ G", paste0(i - interval), "_D")
    if((i + interval) <= n_markers)
        model_formula = paste0(model_formula, "+ G", paste0(i + interval), "+ G", paste0(i + interval), "_D")
    fl_marker_fits_Trait3a[[i]] = lm(as.formula(model_formula), data = f2_data)
}

In [17]:
ldply(fl_marker_fits_Trait3a, function(x) summary(x)$coefficients[c(5, 6), 'Pr(>|t|)']) %>% 
    rename(additive = G1, dominance = G1_D) %>%
    mutate(marker = 1:n_markers) %>%
    gather(variable, value, additive:dominance) %>%    
ggplot(aes(marker, -log(value)), group = variable) + 
    geom_line() + labs(x = "marker", y = "Log") +
        scale_x_continuous(breaks = seq(1, n_markers, by = 2)) + facet_wrap(~variable)


Type I error rate

Let's run a null model, with no QTLs, and see how many false postives we get. The expectation is the significance treashold we choose (5% maybe?).


In [18]:
simulated_genotypes = read_csv("sim genotypes.csv")
f2_data_null = inner_join(select(f2_data, ID:trait3), simulated_genotypes, by = "ID")
f2_data_null


IDSexSireDamLSBLSWtrait1trait2trait3G1G991G992G993G994G995G996G997G998G999G1000
55 M 0 13 9 9 2.7840078 84.136 30.702 1 -1 0 0 -1 0 -1 -1 -1 -1 1
56 M 0 13 9 9 2.9961462 91.376 37.442 -1 -1 0 -1 1 0 1 -1 1 0 0
57 M 0 13 9 9 4.4964274 97.516 44.382 0 0 0 -1 0 -1 1 1 1 1 -1
58 F 0 13 9 9 4.1677393 86.236 27.362 0 -1 0 0 1 -1 -1 1 0 -1 0
59 M 0 13 9 9 8.3322971 90.316 47.872 -1 1 1 0 1 0 0 0 0 -1 -1
60 F 0 13 9 9 6.0586791 74.856 31.912 1 1 0 -1 0 0 0 0 -1 0 0
61 F 0 13 9 9 5.2987302 91.926 34.582 -1 -1 1 -1 0 1 0 0 0 1 -1
62 M 0 13 9 9 5.0645314 95.696 45.092 1 0 -1 0 1 0 0 -1 1 0 0
63 F 0 13 9 9 4.7603125 86.126 25.982 0 1 0 1 0 0 0 -1 0 0 0
64 F 0 13 11 11 1.0888584 91.126 32.162 1 0 1 0 0 0 0 0 0 0 1
65 F 0 13 11 11 1.1838771 97.186 28.232 0 1 1 -1 0 0 0 1 1 0 -1
66 F 0 13 11 11 0.7492970 93.536 37.602 0 0 -1 1 0 0 1 0 -1 0 -1
67 F 0 13 11 11 0.7780688 97.186 31.422 1 0 0 0 1 0 -1 1 0 -1 1
69 M 0 13 11 11 1.6001284 97.476 36.482 -1 -1 1 -1 0 0 0 1 0 1 1
70 F 0 13 11 11 1.4331547 97.166 31.212 1 0 0 0 1 1 1 1 1 -1 0
71 M 0 13 11 11 3.8746510100.406 41.802 0 1 0 0 1 1 1 0 0 0 0
72 M 0 13 11 11 2.9011491 90.076 35.232 0 1 -1 0 0 0 1 1 1 1 0
73 M 0 13 11 11 3.0200725 98.550 41.972 0 0 -1 0 0 0 1 0 0 1 0
74 M 0 13 11 11 2.2471589103.306 39.032 -1 1 0 0 -1 0 0 0 1 1 1
82 M 15 10 9 2 4.3885895 88.168 39.690 0 0 0 0 1 0 1 1 1 0 1
83 M 15 10 9 2 4.5828357 93.558 46.960 0 1 0 -1 1 1 1 -1 -1 0 -1
84 M 16 12 7 5 5.2427527 86.053 43.940 -1 1 -1 1 0 1 0 0 1 1 0
85 M 16 12 7 5 1.6977471 83.253 33.890 0 -1 1 0 1 0 0 0 0 0 -1
86 M 16 12 7 5 7.2132029 96.193 47.840 1 0 -1 -1 1 0 -1 0 0 -1 -1
87 M 16 12 7 5 1.7171386 86.183 34.860 0 0 1 1 -1 1 0 1 0 1 1
90 M 16 12 7 5 5.2721871 92.243 47.600 1 0 1 0 0 0 0 0 0 0 -1
91 F 4 17 12 12 2.0840940 77.051 30.942 0 0 0 1 1 0 -1 0 0 0 0
92 M 4 17 12 12 2.4946120 92.851 30.812 0 0 0 1 0 0 0 1 -1 1 0
93 M 4 17 12 12 3.6485070 97.441 40.372 0 0 1 0 0 0 -1 -1 -1 0 0
94 M 4 17 12 12 5.6060066 84.361 46.422 0 -1 -1 1 0 1 0 0 -1 -1 -1
1193 F 45 35 5 12 2.240721 94.663 25.160 1 0 0 1 1 0 0 0 0 1 0
1194 M 45 35 5 12 2.730726 97.223 40.190 -1 -1 -1 0 -1 -1 1 1 1 0 0
1295 M 42 36 4 8 4.839992 94.128 44.674 0 -1 0 -1 1 -1 -1 -1 0 1 0
1296 F 42 36 4 8 2.208632 93.078 28.184 1 1 0 0 0 1 0 -1 1 1 -1
1297 M 42 36 4 8 3.000926 96.698 44.394 -1 0 -1 0 0 1 0 1 0 1 1
1298 F 42 36 4 8 2.381743 92.958 26.764 -1 0 0 0 -1 -1 0 0 0 -1 -1
1299 F 41 9 4 8 1.460508 90.350 25.972 0 -1 0 -1 -1 -1 -1 -1 0 1 1
1300 F 41 9 4 8 1.686151 86.210 27.102 1 0 -1 -1 1 1 0 0 -1 -1 1
1301 F 41 9 4 8 2.936640 97.250 38.302 1 1 -1 -1 1 0 1 -1 0 0 0
1302 M 41 9 4 8 1.764011 90.584 34.292 0 0 0 0 1 0 0 -1 0 0 1
1437 F 41 9 6 10 2.521450 86.280 29.212 1 0 -1 1 1 0 1 1 -1 0 0
1438 F 41 9 6 10 2.589012 98.960 32.812 0 0 -1 1 1 -1 0 -1 0 0 -1
1439 F 41 9 6 10 3.250871 89.600 31.602 0 -1 0 -1 -1 0 1 0 -1 0 0
1440 M 41 9 6 10 2.589012 96.320 40.652 0 1 -1 0 -1 -1 -1 1 0 1 0
1441 M 41 9 6 10 3.177307 97.050 39.212 1 1 -1 0 1 1 -1 0 1 1 1
1442 F 41 9 6 10 2.344552 85.050 31.082 -1 -1 1 -1 0 -1 -1 -1 -1 1 1
1443 M 31 6 13 10 6.239614 97.306 46.882 0 0 0 0 0 -1 -1 -1 0 1 1
1445 M 31 6 13 10 6.145780 97.196 47.312 1 -1 0 0 0 0 0 0 1 0 -1
1446 M 31 6 13 10 4.465913 98.426 43.812 0 -1 0 0 0 -1 1 0 0 0 -1
1447 F 31 6 13 12 2.712283 82.736 26.992 0 -1 0 1 -1 1 0 0 0 1 0
1448 M 31 6 13 12 2.309748100.126 40.762 1 -1 -1 1 -1 -1 0 -1 1 1 -1
1449 F 31 6 13 12 3.089857104.106 30.632 -1 0 1 -1 0 1 0 0 0 1 1
1451 M 31 6 13 12 5.148974 88.666 40.412 0 0 0 1 -1 -1 1 0 1 0 1
1452 M 31 6 13 12 3.359891106.176 44.112 0 -1 -1 1 0 0 0 1 0 1 -1
1453 F 31 6 13 12 1.523216 87.186 27.452 0 -1 0 0 0 1 -1 0 0 0 -1
1454 M 31 6 13 12 4.689586 94.156 41.692 0 1 -1 -1 -1 -1 0 0 1 -1 -1
1455 F 31 6 13 12 2.159765 88.106 32.652 -1 0 -1 1 0 -1 0 0 1 1 0
1456 F 42 36 3 12 4.038872 97.268 31.814 1 1 0 -1 1 -1 1 0 0 0 -1
1457 M 42 36 3 12 2.187336100.938 38.404 0 0 -1 0 0 0 0 1 0 1 1
1458 F 42 36 3 12 2.559654 88.328 28.804 -1 -1 -1 0 0 1 0 -1 0 0 0

In [19]:
null_markers = 675
marker_fits_trait1 = list()
for(i in seq(null_markers)){
    current_marker = paste0('G', i)
    f2_data_null[[paste0(current_marker, '_D')]] = ifelse(f2_data_null[[current_marker]], 0, 1)
    model_formula = paste0("trait1 ~ Sex + LSB + LSW + ", current_marker, "+", current_marker, "_D")
    marker_fits_trait1[[i]] = lm(as.formula(model_formula), data = f2_data_null)
}

In [20]:
table(laply(marker_fits_trait1, function(x) summary(x)$coefficients[5, 'Pr(>|t|)']) < 0.05) / null_markers


     FALSE       TRUE 
0.95555556 0.04444444