Session 3: Simulating recombination

In this module you will learn about recombination by simulating crossover events on chromosomes during meiosis. We will focus on common experimental designs used when mapping traits or studying recombination and linkage: F2 crosses (an F1 hybrid crossed to another F1 hybrid) and F1 backcrosses (an F1 hybrid crossed to one of its parental strains).

Learning objectives:

By the end of this lesson you should:

  1. Understand the process by which recombination mixes parental genotypes among offspring.
  2. Be able to calculate genetic distance in units of centimorgans between genetic markers.
  3. Be able to describe how linkage between genetic markers relates to recombination rates.

In [1]:
# pip install toyplot
import numpy as np
import pandas as pd
import toyplot

A genetic crossing experiment

A student plans to conduct a set of mating crosses for a genetic experiment. They will cross two homozygous strains of lab mice (Mus musculus) to produce F1 offspring, and then perform two types of additional crosses using these hybrids. The first is to backcross the F1 with a parental strain (f1xbackcross) and the second is to produce F2 offspring by crossing two F1 individuals to each other.

We can model this crossing experiment with computational simulations. Here we will keep track of chromosomes in diploid individuals as they are inherited in F1s and their offspring, and we will keep track of genotyped markers on the chromosomes that can be used to tell whether a chromosomal segment came from one parental strain or the other. Our simulations will include the process of recombination, such that markers from the two parental strains can become combined onto the same chromosome during the experiment.

[Figure placeholder: diagram of the experimental design (F1 x parent and F1 x F1).]

Part I. Simulating chromosomes

1.0: Generate homozygous chromosomes for each parental strain

We are interested in tracking the inheritance of alleles from parents to offspring, and we are assuming that the lab strains we are working with are highly inbred. Therefore, we start by generating chromosomes where one parent has an "A" allele, and the other has a "B" allele, at every marker. The function below returns a DataFrame storing N markers sequenced on two copies of a homozygous chromosome (see example below).


In [27]:
def get_parental_chromosomes(allele, nmarkers):
    "Return dataframe with N markers on two homozygous chromosomes"
    chromosome = pd.DataFrame({
        "chrom": np.concatenate((np.repeat(1, nmarkers), np.repeat(2, nmarkers))),
        "marker": np.concatenate((np.arange(1, nmarkers+1), np.arange(1, nmarkers+1))),
        "allele": allele,
    })
    return chromosome

You can see in the dataframe below that there are 5 markers (genomic positions where we measured the genotype), labeled 1-5. At each marker there are two copies (e.g., chrom 1 marker 1, and chrom 2 marker 1). The "chrom" here could represent different chromatids if we are looking at sister chromatids (identical copies), or chromosomes if we are looking at the two homologous chromosomes in an individual. Thus this dataframe can be used to represent any object where we wish to model two copies of each marker.


In [28]:
# simulate chromosome with allele A at 5 markers
get_parental_chromosomes("A", 5)


Out[28]:
chrom marker allele
0 1 1 A
1 1 2 A
2 1 3 A
3 1 4 A
4 1 5 A
5 2 1 A
6 2 2 A
7 2 3 A
8 2 4 A
9 2 5 A
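
A quick way to confirm the layout described above is to count the markers on each chromosome copy. The sketch below repeats the function so that it runs on its own; the `groupby` checks and the variable names are illustrative additions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def get_parental_chromosomes(allele, nmarkers):
    "Return dataframe with N markers on two homozygous chromosomes"
    return pd.DataFrame({
        "chrom": np.concatenate((np.repeat(1, nmarkers), np.repeat(2, nmarkers))),
        "marker": np.concatenate((np.arange(1, nmarkers + 1), np.arange(1, nmarkers + 1))),
        "allele": allele,
    })

chroms = get_parental_chromosomes("A", 5)

# number of genotyped markers on each chromosome copy
markers_per_copy = chroms.groupby("chrom").marker.count()

# alleles observed on each copy (a homozygous strain has only one)
alleles_per_copy = chroms.groupby("chrom").allele.unique()

# select only the first copy with a boolean mask
copy1 = chroms[chroms.chrom == 1]
```

The same boolean-mask pattern (`chroms.chrom == 1`) is what the later functions use to pull out individual chromosomes or chromatids.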

Let's store a variable for each parental strain's chromosomes.


In [29]:
strain1 = get_parental_chromosomes("A", 10)
strain2 = get_parental_chromosomes("B", 10)

1.1: Plot the chromosome as a simple scatterplot

We can represent the chromosome using a scatterplot in which each genotyped position (1-10) is used as a coordinate on the y-axis, and the chrom number (1 or 2) is used as a coordinate on the x-axis. We represent each data point with a square marker. This function includes a conditional statement that will also color each marker according to its genotype (e.g., allele is A or B). Finally, we include a bit of code to make it look nice by styling the axes.

We are actually going to use a more complicated function for plotting chromosomes for the rest of the notebook (shown a little further down) but it is just an extension of this simpler function.


In [30]:
def draw_chromosome(chroms, label=None):
    "Example function to draw a single chromosome as a scatterplot"

    # get colors from allele values in dataframe
    colors = ['#EE3A8C' if i=="A" else '#1C86EE' for i in chroms["allele"]]
    
    # create a canvas and coordinate axes
    canvas = toyplot.Canvas(width=125, height=250)
    axes = canvas.cartesian(xlabel=label)
    
    # plot scatterplot onto axes
    mark = axes.scatterplot(
        chroms["chrom"],
        chroms["marker"], 
        marker="s", 
        color=colors,
        size=12)

    # style the plot
    axes.y.show = False
    axes.x.ticks.show = True
    axes.x.spine.show = False
    axes.x.ticks.locator = toyplot.locator.Integer()
    return canvas, axes, mark

In [31]:
# draw the two homologous chromosomes in a strain1 individual
draw_chromosome(strain1, label="Chromosomes");


[Plot: chromosomes 1 and 2 of a strain1 individual; x-axis label "Chromosomes"]

1.2: A generalized function for chromosome plots

The code for this function is very similar to the one above, but it can take multiple chromosomes as input and will plot them onto a grid. It can also plot chromatids for when the chromosomes have replicated, as we will see in a bit. It is not important for now that you understand how this plotting function works. For this notebook you only need to interpret the plots, but the code may be of interest.


In [55]:
def draw_chromosomes(data, title=None):
    "Draw multiple chromosomes side-by-side on a shared x-axis"
    
    # data should be a list or tuple of multiple chromosomes
    if not isinstance(data, (list, tuple)):
        data = (data,)

    # calculate dimensions of the canvas based on number of subplots
    nmarkers = int(data[0][data[0].chrom == 1].shape[0])
    nplots = len(data)
    nrows = np.ceil(nplots / 10).astype(int)
    ncols = np.min((10, nplots))
    width = (100 * ncols if 4 not in data[0].chrom.values else 200)
    height = 25 * nmarkers * nrows
    canvas = toyplot.Canvas(width=width, height=height)      
    
    # add a global title
    if title:
        canvas.text(
            canvas.width / 2, 25,
            "<b>{}</b>".format(title), 
            style={"font-size":"14px"},
        )
    
    # plot each set of chromosomes as a scatterplot on its own axes
    for idx in range(nplots):
        
        # create a new axis
        ax = canvas.cartesian(
            grid=(nrows, ncols, idx), 
            margin=(60, 40, 40, 40),
        )

        # select input data
        chroms = data[idx]
        
        # get colors
        colors = ['#EE3A8C' if i=="A" else '#1C86EE' for i in chroms["allele"]]
        
        # draw scatterplot
        mark = ax.scatterplot(
            chroms["chrom"],
            chroms["marker"], 
            marker="s", 
            color=colors,
            size=12)
   
        # style the axes
        ax.y.show = False
        ax.x.ticks.show = True
        ax.x.spine.show = False
        ax.x.ticks.locator = toyplot.locator.Integer()

    return canvas, ax, mark
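
If you are curious how the canvas size is chosen, the layout arithmetic can be reproduced without any plotting. This is only a sketch: `canvas_dimensions` and its `has_chromatids` flag are hypothetical names introduced here, standing in for the check of whether chromatid 4 is present in the first dataframe.

```python
import math

def canvas_dimensions(nplots, nmarkers, has_chromatids=False):
    "Hypothetical helper mirroring the layout arithmetic in draw_chromosomes"
    nrows = math.ceil(nplots / 10)      # at most 10 plots per row
    ncols = min(10, nplots)
    width = 200 if has_chromatids else 100 * ncols
    height = 25 * nmarkers * nrows
    return nrows, ncols, width, height

single = canvas_dimensions(1, 10)                           # one individual
chromatids = canvas_dimensions(1, 10, has_chromatids=True)  # four chromatids
twenty = canvas_dimensions(20, 10)                          # 20 individuals
```

With 20 individuals and 10 markers this gives a grid of 2 rows by 10 columns on a 1000 x 500 canvas.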

The plotting function uses two colors to represent alleles from the two parental strains

In reality, we expect that the chromosomes of two mouse lab strains would be highly similar throughout most of their genomes. Here we are representing by red or blue the parental origin of each chromosomal segment. When we refer to markers, we are referring to a SNP or allele that is unique to one strain or the other. You can think of these 10 markers as 10 SNPs equally spread across the chromosome. To keep track of which chromosomal segments the offspring inherit from each parent we will keep track of the genotyped markers at each position.


In [56]:
# generate a chromosome for an individual from each strain
strain1 = get_parental_chromosomes("A", 10)
strain2 = get_parental_chromosomes("B", 10)

# draw both chromosomes
draw_chromosomes((strain1, strain2), title="Chroms in two parent strains");


[Plot: "Chroms in two parent strains"; two panels, each showing chromosomes 1 and 2]

Part II: Meiosis

2.1: Chromosomes replicate to form sister chromatids

A diploid individual has two homologous chromosomes. Within a meiotic cell the chromosomes are separated and each is replicated to produce an identical sister chromatid, leading to four chromatids. At this point no genetic material has yet been exchanged between chromatids.


In [165]:
def get_chromatids(chroms):
    "returns four chromatids formed during Meiosis I in a meiotic cell."
    nc1 = chroms[chroms.chrom == 1].copy()
    nc2 = nc1.copy()
    nc3 = chroms[chroms.chrom == 2].copy()
    nc4 = nc3.copy()
    nc2.chrom = 2
    nc3.chrom = 3
    nc4.chrom = 4
    return pd.concat([nc1, nc2, nc3, nc4], ignore_index=True)

In the case of a pure strain 1 individual...

The sister chromatids are all the same: they all have the "A" allele at every marker.


In [166]:
# get chromatids (replicated chromosomes) for pure parent strain 
ctids = get_chromatids(strain1)
draw_chromosomes(ctids, "chromatids in strain1 parent");


[Plot: "chromatids in strain1 parent"; chromatids 1-4]

In the strain 2 parent ...

The sister chromatids are also all the same: they are all "B" at every marker.


In [167]:
# get chromatids (replicated chromosomes) for the pure strain 2 parent
ctids = get_chromatids(strain2)
draw_chromosomes(ctids, "chromatids in strain 2 parent");


[Plot: "chromatids in strain 2 parent"; chromatids 1-4]

The dataframe looks similar for chromatids

We now use the column labeled "chrom" to represent the chromatids from each homologous chromosome. Chromatids 1-2 are from chromosome 1 and chromatids 3-4 are from chromosome 2.


In [168]:
# the chromatids plotted above
ctids


Out[168]:
chrom marker allele
0 1 1 B
1 1 2 B
2 1 3 B
3 1 4 B
4 1 5 B
5 1 6 B
6 1 7 B
7 1 8 B
8 1 9 B
9 1 10 B
10 2 1 B
11 2 2 B
12 2 3 B
13 2 4 B
14 2 5 B
15 2 6 B
16 2 7 B
17 2 8 B
18 2 9 B
19 2 10 B
20 3 1 B
21 3 2 B
22 3 3 B
23 3 4 B
24 3 5 B
25 3 6 B
26 3 7 B
27 3 8 B
28 3 9 B
29 3 10 B
30 4 1 B
31 4 2 B
32 4 3 B
33 4 4 B
34 4 5 B
35 4 6 B
36 4 7 B
37 4 8 B
38 4 9 B
39 4 10 B
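
The labeling convention (chromatids 1-2 from chromosome 1, chromatids 3-4 from chromosome 2) can be written as a small piece of integer arithmetic. The names `homolog_of` and `sisters` are introduced here only for illustration.

```python
chromatids = [1, 2, 3, 4]

# chromatids 1-2 map to homolog 1, chromatids 3-4 map to homolog 2
homolog_of = {c: (c + 1) // 2 for c in chromatids}

# group sister chromatids by their homolog of origin
sisters = {}
for ctid, homolog in homolog_of.items():
    sisters.setdefault(homolog, []).append(ctid)
```

Keeping this mapping in mind helps when reading the four-column plots: columns 1 and 2 start out as identical sisters, as do columns 3 and 4.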

Part 3: The mating process

In the absence of recombination, a parent would pass on only one copy of each chromosome intact, either the one inherited from its mother or the one from its father. Because recombination swaps genetic material between chromosomes, however, parents are instead able to pass on bits of both of their parents' genomes on a single recombined chromosome. Let's first look at what it would be like if there were no recombination. Let's start by writing a function to perform the mating crosses (combine haploid gametes into a diploid zygote).

The function below combines many functions we have described previously so that we can perform our crossing experiment. We want to now be able to input the four gametic products from two individuals and randomly sample one gamete from each to combine into a new diploid offspring.


In [210]:
def mating(sperm, eggs):
    "Randomly samples gametes and returns a new diploid offspring"

    # sample two random numbers to select winning gamete in each
    sdx, edx = np.random.choice(np.arange(1, 5), 2)
    
    # get winners
    luckysperm = sperm[sperm.chrom == sdx].copy()
    luckyegg = eggs[eggs.chrom == edx].copy()
    
    # relabel chromatid of male as 2 and female as 1
    luckyegg.chrom =  1
    luckysperm.chrom = 2
    
    # combine and return as a single df
    return pd.concat([luckyegg, luckysperm], ignore_index=True)
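
Because `mating()` draws random numbers, every run of the notebook gives different offspring. If you want repeatable results while debugging, one option is to set NumPy's global seed before sampling. The sketch below isolates just the sampling step; `sample_gamete_indices` is a hypothetical helper and the seed value 123 is arbitrary.

```python
import numpy as np

def sample_gamete_indices():
    "Draw one chromatid label (1-4) for each parent, as in mating()"
    return np.random.choice(np.arange(1, 5), 2)

np.random.seed(123)   # arbitrary seed chosen for illustration
first = sample_gamete_indices()

np.random.seed(123)   # resetting the seed replays the same draws
second = sample_gamete_indices()
```

The two draws are sampled independently, so the sperm and egg can carry the same chromatid label; that is fine because the labels refer to chromatids in two different parents.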

3.1: Let's use our mating function to generate an F1 hybrid

Before, we took a shortcut when modeling F1 hybrids by not sampling gametes from the two parental strains: because the strains are completely homozygous, recombination has no effect on mixing their genomes. Here we write out the complete code to get chromatids from each pure parent (two pure strain chromosomes each), and produce an F1 by mating one gamete from each. As you can see in the drawing, the F1 looks as we would expect, with one chromosome from each parent.


In [211]:
# get the four chromatids from a strain 1 parent
ctids1 = get_chromatids(strain1)

# get the four chromatids from a strain 2 parent
ctids2 = get_chromatids(strain2)

# simulate an F1 by sampling one chromatid from each parent
f1 = mating(ctids1, ctids2)

# draw the F1 chromosomes
draw_chromosomes(f1, "F1 hybrid");


[Plot: "F1 hybrid"; chromosomes 1 and 2]

Part 4: Crossovers swap genetic material

We can simulate the process of recombination using our dataframe representation of chromatids by swapping alleles between them. The function below performs this task after randomly sampling a location at which the crossover event occurs, and it randomly samples which chromatids will cross over (if we model interference, then only one pair of chromatids crosses over in each meiosis).


In [173]:
def meiosis(ctids, crossover=True, interference=False, weights=None):
    """
    Swaps alleles between non-sister chromatids in the input dataframe of
    four chromatids (1-2 from one homolog, 3-4 from the other). If crossover
    then recombinant chromatids are returned. If interference then crossover
    only occurs between one pair of chromatids. Rate of recombination is
    constant by default, but can be weighted with an input list of length
    nmarkers - 1.
    """   
    # make a copy that we will modify and return
    ntids = ctids.copy()
    
    # space between markers where crossovers can occur
    nmarkers = ctids[ctids.chrom == 1].shape[0]
    intervals = nmarkers - 1

    # get recombination rates if modeling crossovers
    rates = np.ones(intervals) / intervals
    if weights is not None and crossover:
        # check the input before using it to compute rates
        assert len(weights) == intervals, "weights should be len {}".format(intervals)
        rates = np.array(weights) / sum(weights)
        
    # grouping of chromatids on chromosomes
    cts1 = [1, 2]
    cts2 = [3, 4]

    # sample which chromatids to cross first
    c1 = cts1.pop(np.random.binomial(1, 0.5))
    c2 = cts2.pop(np.random.binomial(1, 0.5))

    # model crossovers
    while 1:
        if crossover:
            # sample a crossover interval using rates from weights
            breakpoint = np.random.choice(np.arange(1, nmarkers), p=rates)

            # select parts of the original dataframes to swap 
            c1x = (ctids.chrom == c1) & (ctids.marker > breakpoint)
            c2x = (ctids.chrom == c2) & (ctids.marker > breakpoint)
       
            # swap material   
            ntids.loc[c1x, "allele"] = ctids.loc[c2x, "allele"].values
            ntids.loc[c2x, "allele"] = ctids.loc[c1x, "allele"].values

        # end loop or sample next pair of chromatids
        if interference or (not cts1):
            break
        else:
            c1 = cts1.pop()
            c2 = cts2.pop()
            
    return ntids
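
The core of the function is the swap of everything past the breakpoint. Stripped of the dataframe bookkeeping, the same idea can be sketched with two plain Python lists. This is a simplified illustration, not the notebook's implementation; `single_crossover` is a hypothetical name.

```python
def single_crossover(tid_a, tid_b, breakpoint_marker):
    """
    Swap alleles at markers greater than breakpoint_marker between two
    chromatids given as lists (marker 1 is stored at index 0).
    """
    new_a = tid_a[:breakpoint_marker] + tid_b[breakpoint_marker:]
    new_b = tid_b[:breakpoint_marker] + tid_a[breakpoint_marker:]
    return new_a, new_b

tid_a = ["A"] * 10
tid_b = ["B"] * 10

# crossover in the interval between markers 2 and 3
rec_a, rec_b = single_crossover(tid_a, tid_b, 2)
```

As in `meiosis()`, markers up to and including the breakpoint stay in place and markers after it are exchanged, so the two products are mirror images of each other.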

4.1: Crossovers swap genetic material between chromatids

Let's model the process of recombination in the F1 hybrid genome. Here the effect of crossovers will be easy to observe because we should see alleles from each of the parent genomes become combined onto individual chromatids. In the cell directly below we first show the chromatids that are produced in the F1 before crossovers occur (each chromosome is replicated). In the next cell down we include crossover events and now you can see the recombined genomes.


In [212]:
ctids = get_chromatids(f1)
ntids = meiosis(ctids, crossover=False)
draw_chromosomes(ntids, title="F1 chromatids w/o crossover");


[Plot: "F1 chromatids w/o crossover"; chromatids 1-4]

In [213]:
ctids = get_chromatids(f1)
ntids = meiosis(ctids, crossover=True)
draw_chromosomes(ntids, title="F1 chromatids w crossover");


[Plot: "F1 chromatids w crossover"; chromatids 1-4]

4.2: Non-random crossovers

The meiosis function also includes an argument called weights so that you can change the recombination rate along intervals between the genetic markers. For example, to make recombination 10X more likely between markers 2 and 3 than in any other single interval, we would write a weights list like the one below. There is still randomness in the simulations, but more often than not the crossover event will occur between markers 2 and 3.


In [214]:
weights = [1, 10, 1, 1, 1, 1, 1, 1, 1]
ntids = meiosis(ctids, crossover=True, weights=weights)
draw_chromosomes(ntids, title="non-random recombination");


[Plot: "non-random recombination"; chromatids 1-4]
[4.2] Action: Try modifying the weights values in the list in the above cell. For example, change the 10 to 100, and rerun the code cell several times. You should see that recombination occurs almost always in the specified window when the weight is set high enough.
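
To put numbers on this, we can normalize the weights by hand in the same way the meiosis function does (each weight divided by the sum of all weights). The helper name `interval_probabilities` is introduced here for illustration only.

```python
from fractions import Fraction

def interval_probabilities(weights):
    "Normalize crossover weights so that they sum to one"
    total = sum(weights)
    return [Fraction(w, total) for w in weights]

probs_10x = interval_probabilities([1, 10, 1, 1, 1, 1, 1, 1, 1])
probs_100x = interval_probabilities([1, 100, 1, 1, 1, 1, 1, 1, 1])

# probability that a crossover lands between markers 2 and 3
p10 = float(probs_10x[1])     # 10/18, about 0.56
p100 = float(probs_100x[1])   # 100/108, about 0.93
```

So a weight of 10 places a little over half of all crossovers in the chosen interval, while a weight of 100 places more than nine in ten there.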

4.3: Chromatid interference

In this case only one pair of non-sister chromatids (one chromatid from each homologous chromosome) will exchange genetic material in a crossover, as opposed to both pairs.


In [215]:
ntids = meiosis(ctids, crossover=True, interference=True)
draw_chromosomes(ntids, title="chromatid interference");


[Plot: "chromatid interference"; chromatids 1-4]

4.4: View the new recombined chromatids in the dataframe

You can see that the allele column now contains a mixture of genotypes from the two parental strains. The chrom column now represents the chromatid number (1-4). The first two chromatids came from one parent, and the latter two came from the other, prior to the crossing over event exchanging genetic material between them.

As a reminder of how we got here, the cell below includes the series of function calls that we have developed so far. We first generate chromosomes for each pure parental strain using get_parental_chromosomes(), pass each through get_chromatids() and meiosis() to produce gametes, and combine one gamete from each parent with mating() to create an F1. We then call get_chromatids() and meiosis() on the F1 to simulate crossover events in an F1 meiotic cell and produce recombined chromatids. Finally, we view the resulting dataframe representation of the chromatids.


In [216]:
# generate two chromosomes (maternal is "A", paternal is "B")
strain1 = get_parental_chromosomes("A", 10)
strain2 = get_parental_chromosomes("B", 10)

# get gametes from two parental chromosomes
sperm = meiosis(get_chromatids(strain1))
eggs = meiosis(get_chromatids(strain2))

# create the F1
f1 = mating(eggs, sperm)

# get gametes from the F1
fsperm = meiosis(get_chromatids(f1))

# show the dataframe
fsperm


Out[216]:
chrom marker allele
0 1 1 A
1 1 2 A
2 1 3 B
3 1 4 B
4 1 5 B
5 1 6 B
6 1 7 B
7 1 8 B
8 1 9 B
9 1 10 B
10 2 1 A
11 2 2 B
12 2 3 B
13 2 4 B
14 2 5 B
15 2 6 B
16 2 7 B
17 2 8 B
18 2 9 B
19 2 10 B
20 3 1 B
21 3 2 B
22 3 3 A
23 3 4 A
24 3 5 A
25 3 6 A
26 3 7 A
27 3 8 A
28 3 9 A
29 3 10 A
30 4 1 B
31 4 2 A
32 4 3 A
33 4 4 A
34 4 5 A
35 4 6 A
36 4 7 A
37 4 8 A
38 4 9 A
39 4 10 A
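
A long table like this is hard to scan. One option is to reshape it so that each chromatid becomes a column, which makes the breakpoints easy to spot. The sketch below uses a small hand-made dataframe (4 chromatids, 3 markers) rather than the simulated one, so the allele values are made up for illustration.

```python
import pandas as pd

# a small hand-made example: 4 chromatids x 3 markers in long format
long_df = pd.DataFrame({
    "chrom": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "marker": [1, 2, 3] * 4,
    "allele": list("AAB" "ABB" "BBA" "BAA"),
})

# one row per marker, one column per chromatid
wide_df = long_df.pivot(index="marker", columns="chrom", values="allele")

# join each column into a string such as "AAB" for quick reading
haplotypes = {int(c): "".join(wide_df[c]) for c in wide_df.columns}
```

The same `pivot` call should work on the simulated chromatid dataframes, since they have the same three columns.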

4.5: The importance of genetic markers (genetic variation)

In the first plot below we show the meiotic products of a pure parental strain individual. As you can see, there is no variation at the genotyped markers and therefore we cannot tell where the crossovers occurred. By contrast, the next plot shows the meiotic products of an F1 offspring between the two parental strains, and here you can clearly see where crossovers occurred between chromatids.

It is because of the differences at genotyped markers between the two strains that we are able to observe where crossovers occurred in the second case but not in the first.


In [217]:
# generate two chromosomes (maternal is "A", paternal is "B")
strain1 = get_parental_chromosomes("A", 10)
strain2 = get_parental_chromosomes("B", 10)

# get chromatids of pure parental strain
gametes = meiosis(get_chromatids(strain1))
draw_chromosomes(gametes, "strain1 parent gametes");

# get chromatids from an F1 (chromosome from each parent)
gametes = meiosis(get_chromatids(f1))
draw_chromosomes(gametes, "F1 gametes");


[Plot: "strain1 parent gametes"; chromatids 1-4]
[Plot: "F1 gametes"; chromatids 1-4]
[4.5] Question: If there are very few differences between two strains it may be difficult to map crossover events with great accuracy. Why don't we always use very highly divergent crosses when trying to map recombination? What is the limit on the process? Answer in the cell below using Markdown.

Response:

Part 5: Crossing experiments

5.1: Produce an F1 x parent back-cross

As you can see, we are continuing to build on the previously defined functions. The cell below takes the F1 and the strain 1 parent generated above, gets the four meiotic products of the F1 with crossover events (get_chromatids() followed by meiosis()), gets the four meiotic products of a pure "A" parent in the same way, and finally performs a mating cross with these gametes to produce an F1xbackcross offspring (mating()).


In [378]:
# get meiotic products of F1
eggs = meiosis(get_chromatids(f1))

# get meiotic products of strain1 parent
sperm = meiosis(get_chromatids(strain1))

# create an F1xparent cross
f1back = mating(sperm, eggs)

# draw the f1backcross chromosomes
draw_chromosomes(f1back);


[Plot: chromosomes 1 and 2 of the F1 x parent backcross offspring]

We can write a function to do this as well, which will make it easier to reuse:


In [264]:
def f1xparent_cross(**kwargs):
    "returns a single offspring of a f1 x parent cross"

    # generate two chromosomes (maternal is "A", paternal is "B")
    s1 = get_parental_chromosomes("A", 10)
    s2 = get_parental_chromosomes("B", 10)

    # generate gametes from pure parents
    s1gam = meiosis(get_chromatids(s1))
    s2gam = meiosis(get_chromatids(s2))

    # make F1 from parent cross
    f1 = mating(s1gam, s2gam)

    # generate gametes from F1
    f1gam = meiosis(get_chromatids(f1))
        
    # make F1-parent backcross
    f1p = mating(f1gam, s1gam)
    return f1p

A list-comprehension statement is used below to call the function f1xparent_cross() 20 times and store the resulting 20 offspring in a list called f1ps. Then we draw the homologous chromosomes in the 20 offspring.


In [265]:
# produce 20 f1 x parent backcrosses
f1ps = [f1xparent_cross() for i in range(20)]

# draw chromosome of F1xparents
draw_chromosomes(f1ps, title="F1 x maternal backcross offspring");


[Plot: "F1 x maternal backcross offspring"; 20 panels, each showing chromosomes 1 and 2]
[5.1] Action and Question: Create a new function called `f1xparent_cross_interference()` by copying and pasting the `f1xparent_cross()` function from above. In your new function add an argument to the `meiosis()` function so that it says `interference=True`. Now rerun the function like above to produce 20 offspring. In each output, how many of the F1-backcross offspring inherited zero recombinant chromosomes vs. one recombinant chromosome vs. two recombinant chromosomes? Write your response in the markdown cell below.

In [ ]:

Response:


5.2: Produce an F2 (F1 x F1) cross

We are now crossing two F1 hybrid individuals. The code is very similar to above.


In [302]:
def f1xf1_cross():
    "returns a single offspring of a f1 x f1 cross"
    
    # generate two chromosomes (maternal is "A", paternal is "B")
    s1 = get_parental_chromosomes("A", 10)
    s2 = get_parental_chromosomes("B", 10)
    
    # generate two f1s siblings
    f1s = []
    for i in range(2):

        # generate gametes from pure parents
        s1gam = meiosis(get_chromatids(s1))
        s2gam = meiosis(get_chromatids(s2))

        # make F1 from parent cross
        f1s.append(mating(s1gam, s2gam))
        
    # generate gametes from F1
    f1gam1 = meiosis(get_chromatids(f1s[0]))
    f1gam2 = meiosis(get_chromatids(f1s[1]))

    # make F2 from the F1 x F1 cross
    f2 = mating(f1gam1, f1gam2)
    return f2

Here we generate 20 F2 offspring. Compare this plot to the one above (f1xbackcross) to see how they are different.


In [292]:
# produce 20 F2 (f1 x f1) offspring
f2s = [f1xf1_cross() for i in range(20)]

# draw chromosome of F2
draw_chromosomes(f2s, title="F1 x F1 offspring");


[Plot: "F1 x F1 offspring"; 20 panels, each showing chromosomes 1 and 2]
[5.2] Question: What is the most apparent difference between the chromosomes of F2 offspring versus those of the F1xbackcross offspring? Enter your answer into the Markdown cell below.

Response:


Part 6: Quantifying recombination

By quantifying the number of crossovers that occur between markers we can measure their linkage, meaning whether their alleles are correlated. The alleles at two markers can be completely uncorrelated; for example, if they are on different chromosomes. However, if two markers show a significant correlation then we say that they are in linkage disequilibrium. Such information can be used to measure the distances between markers on chromosomes.

Let's start by simply counting how many recombinant chromosomes are observed within the offspring of our experimental crosses. The function below will return for a list of offspring how many individuals had 0, 1, or 2 recombinant chromosomes.


In [271]:
def count_recombinants(offspring):
    "returns a dictionary of the number of individuals with 0, 1, or 2 recombinant chroms"
    # a dictionary for storing results
    recs = {i: 0 for i in (0, 1, 2)}
    
    # for each individual get the number of alleles per chromatid
    for ind in offspring:
        alleles_per_tid = ind.groupby("chrom").allele.unique().apply(len) # [1, 2]
        recs[alleles_per_tid.sum() - 2] += 1
    return recs

In [274]:
# a result for 20 offspring of f1xbackcross experiment
count_recombinants(f1ps)


Out[274]:
{0: 0, 1: 20, 2: 0}
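
To see why subtracting 2 gives the number of recombinant chromosomes, it helps to try the counting step on a few hand-made individuals. A non-recombinant chromosome carries one kind of allele and a recombinant one carries two, so the sum over both chromosomes ranges from 2 to 4. The helpers `make_individual` and `n_recombinant_chroms` below are hypothetical and exist only for this illustration.

```python
import pandas as pd

def make_individual(copy1, copy2):
    "Hypothetical helper: build a two-chromosome dataframe from allele strings"
    rows = []
    for chrom, alleles in ((1, copy1), (2, copy2)):
        for marker, allele in enumerate(alleles, start=1):
            rows.append({"chrom": chrom, "marker": marker, "allele": allele})
    return pd.DataFrame(rows)

def n_recombinant_chroms(ind):
    "Count chromosomes that carry alleles from both parental strains"
    alleles_per_chrom = ind.groupby("chrom").allele.unique().apply(len)
    return int(alleles_per_chrom.sum() - 2)

none = n_recombinant_chroms(make_individual("AAAA", "BBBB"))   # 0
one = n_recombinant_chroms(make_individual("AAAA", "AABB"))    # 1
both = n_recombinant_chroms(make_individual("ABBB", "BBAA"))   # 2
```

Note that this counts recombinant chromosomes, not crossover events: a chromosome with several breakpoints would still be counted once.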

6.1: Count recombinant chromosomes in F1xbackcross experiment

Using this function we can now count the number of recombinant chromosomes in a list of individuals, even for very large sample sizes. Below we use a for loop to change the variable i in each iteration to test increasingly larger samples of offspring. With small sample sizes we might expect to see variable results by chance, but with larger sample sizes we expect to converge closer to the expected ratio of recombinants. In the example below we generate f1xbackcross individuals. The results are collected into a DataFrame at the end to print as a nice table.


In [277]:
# a list to store results in
results = []

# iterate over increasingly larger samples of offspring
for i in (10, 25, 50, 100, 250, 500):
    
    # generate offspring for N f1 x parent crosses
    offspring = [f1xparent_cross() for _ in range(i)]
    
    # count recombinant chromosomes in each offspring
    crossovers = count_recombinants(offspring)
    
    # store results for this sample size
    results.append([i] + [crossovers[j] for j in (0, 1, 2)])

# print results as a dataframe
print("F1xparent backcross experiments")
pd.DataFrame(results, columns=["sample_size", "0-recomb", "1-recomb", "2-recomb"])


F1xparent backcross experiments
Out[277]:
sample_size 0-recomb 1-recomb 2-recomb
0 10 0 10 0
1 25 0 25 0
2 50 0 50 0
3 100 0 100 0
4 250 0 250 0
5 500 0 500 0
[6.1] Action and Question: Why are there no offspring with two recombinant chromosomes in the f1 x parent offspring? What happens if you replace the f1xparent_cross() function with your function f1xparent_cross_interference()? Do the results become more accurate with larger sample sizes? What do you think is approximately the ratio of 0 to 1 to 2 recombinant chromosomes per individual? Answer in the Markdown cell below.

Response:


6.2: Count recombinant chromosomes in F2 experiment

The code below is the same as above except we now call the f1xf1_cross() function to generate F2s instead of F1xbackcross individuals.


In [278]:
# a list to store results in
results = []

# iterate over increasingly larger samples of offspring
for i in (10, 25, 50, 100, 250, 500):
    
    # generate offspring for N f1 x f1 crosses
    offspring = [f1xf1_cross() for _ in range(i)]
    
    # count recombinant chromosomes in each offspring
    crossovers = count_recombinants(offspring)
    
    # store results for this sample size
    results.append([i] + [crossovers[j] for j in (0, 1, 2)])

# print results as a dataframe
print("F2 experiments")
pd.DataFrame(results, columns=["sample_size", "0-recomb", "1-recomb", "2-recomb"])


F2 experiments
Out[278]:
sample_size 0-recomb 1-recomb 2-recomb
0 10 0 0 10
1 25 0 0 25
2 50 0 0 50
3 100 0 0 100
4 250 0 0 250
5 500 0 0 500

6.3: Linkage measured in centimorgans

The function below uses the frequency of crossovers to calculate the distance between markers in units of centimorgans.


In [279]:
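def get_distance_in_centimorgans(chroms, marker1, marker2):
    "returns the percentage of chromosomes that are recombinant between two markers"
    ctotal = 0
    recomb = 0
    for chrom in chroms:
        # check each of the two homologous chromosomes in the individual
        for ctid in (1, 2):
            a1 = chrom[(chrom.chrom == ctid) & (chrom.marker == marker1)].allele.values[0]
            a2 = chrom[(chrom.chrom == ctid) & (chrom.marker == marker2)].allele.values[0]
            ctotal += 1
            # different alleles at the two markers indicate recombination
            if a1 != a2:
                recomb += 1
    return (recomb / ctotal) * 100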
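
Written as a formula, the function returns the percentage of sampled chromosomes that are recombinant between the two markers. The symbols $R$ (the `recomb` counter) and $N$ (the `ctotal` counter) are our own shorthand for this note.

```latex
d_{\mathrm{cM}} = 100 \times \frac{R}{N}
```

For example, 10 recombinant chromosomes among the 40 chromosomes of 20 diploid offspring would give 100 × 10/40 = 25 cM. Treating the recombination percentage directly as a map distance is an approximation that works best for closely spaced markers, where more than one crossover between them is rare.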

We can apply this function as shown below, by entering a list of offspring of a genetic experiment as the first argument, followed by the two markers that we want to measure the distance between. Once again, we expect that if we produce larger sample sizes we will be able to measure the genetic distance more accurately.


In [281]:
# get 20 f2 offspring
offspring = [f1xf1_cross() for i in range(20)]

# get distance between markers 1 and 3 in 20 f2 offspring 
get_distance_in_centimorgans(offspring, 1, 3)


Out[281]:
25.0
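
How much should we trust an estimate from only 20 offspring? A rough guide is the binomial standard error of the recombination fraction. The sketch below assumes that every sampled chromosome is an independent draw, and `recombination_standard_error` is a hypothetical helper, not part of the notebook.

```python
import math

def recombination_standard_error(distance_cm, n_offspring):
    """
    Approximate binomial standard error (in cM) of a distance estimated
    from n_offspring diploid individuals, i.e. 2 * n_offspring chromosomes.
    """
    p = distance_cm / 100
    n_chroms = 2 * n_offspring
    return 100 * math.sqrt(p * (1 - p) / n_chroms)

se_20 = recombination_standard_error(25.0, 20)     # about 6.8 cM
se_500 = recombination_standard_error(25.0, 500)   # about 1.4 cM
```

Under these assumptions an estimate of 25 cM from 20 offspring is uncertain by several centimorgans, and the uncertainty shrinks with the square root of the sample size.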

We can write a for-loop to calculate the genetic distances between many markers. Let's calculate the distance between marker 1 and each other marker and store the result so that we can plot it.


In [282]:
# get 100 f2 offspring
offspring = [f1xf1_cross() for i in range(100)]

# a list for storing results
distances = []

# iterate over markers
for marker in range(1, 11):
    dist = get_distance_in_centimorgans(offspring, 1, marker)
    distances.append(dist)

Using this list of values we can now plot the distances between marker 1 and each other marker. The line plot below shows these genetic distances.


In [283]:
# create a line plot
c, a, m = toyplot.plot(
    a=range(1, 11),
    b=distances, 
    width=300, 
    height=300, 
    ylabel="genetic distance (cM)",
    xlabel="genetic marker",
);

# style the plot
a.y.ticks.show = True
a.x.ticks.show = True
a.x.ticks.locator = toyplot.locator.Explicit(range(1, 11, 1))


[Plot: genetic distance in cM (y-axis, 0 to 100) against genetic marker (x-axis, 1 to 10)]
[6.3] Question: What is the relationship between genetic distance and the order of markers on the chromosome? Why is the line straight, and what might make it non-linear?

6.4: Variable recombination rates

Until now we have assumed that the recombination rate is constant across the chromosome, but this is not generally the case. Recombination tends to be reduced near the centromere, can differ toward the telomeres, and sometimes varies in association with other genomic features as well. If we can observe many crossover events, we can detect whether some genomic regions have more or fewer crossover events than expected. Let's explore this by simulating chromosomes with more crossover events in one particular region to see if we can measure and detect this difference.


In [298]:
def f1xf1_cross_variable_recomb_rates(weights):
    "returns a single offspring of a f1 x f1 cross with recomb hotspot"
    # generate two chromosomes (maternal is "A", paternal is "B")
    s1 = get_parental_chromosomes("A", 10)
    s2 = get_parental_chromosomes("B", 10)
        
    # generate two f1s siblings
    f1s = []
    for i in range(2):

        # generate gametes from pure parents
        s1gam = meiosis(get_chromatids(s1), weights=weights)
        s2gam = meiosis(get_chromatids(s2), weights=weights)

        # make F1 from parent cross
        f1s.append(mating(s1gam, s2gam))
        
    # generate gametes from F1
    f1gam1 = meiosis(get_chromatids(f1s[0]), weights=weights)
    f1gam2 = meiosis(get_chromatids(f1s[1]), weights=weights)

    # make F2 from the F1 x F1 cross
    f2 = mating(f1gam1, f1gam2)
    return f2

Now let's run an experiment to generate F2 offspring when there are variable recombination rates along the chromosome, as described by our list of crossover weights (weights).


In [301]:
# define weights of crossover occurring in intervals
weights = [1, 1, 20, 1, 1, 1, 1, 1, 1]

# generate f2s with variable recomb rates
f2s_var = [f1xf1_cross_variable_recomb_rates(weights) for i in range(20)]

# draw chromosomes
draw_chromosomes(f2s_var, title="F2s with recombination hotspot");


[Figure: F2s with recombination hotspot. Each of the 20 individuals is drawn as two chromosomes, labeled 1 and 2.]
[6.4] Question: Can you identify from these 20 individuals where a "recombination hotspot" is located? Answer in the Markdown cell below. How could we use measurements of genetic distances between markers in centimorgans to test the hypothesis that one region has more recombination than another?

Response:

[6.4] Action: Generate a new list of 100 F2 offspring with variable recombination rates by copying the code from the cell above. Then use the `get_distance_in_centimorgans()` function, as we did earlier, to measure the genetic distance of each marker relative to marker 1, and plot the result. How does the plot differ from the earlier one? Does it help you identify the recombination hotspot in these data?
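As a reminder of the arithmetic behind such a measurement, one common approximation takes the percentage of sampled chromosomes that are recombinant between two markers as their distance in centimorgans (this underestimates distance when markers are far apart and double crossovers occur). The sketch below is not the notebook's `get_distance_in_centimorgans()` function; `interval_centimorgans` is a hypothetical helper, and the input layout (one row per chromosome, one column per marker) is an assumption made for illustration.

```python
import numpy as np

def interval_centimorgans(haplotypes):
    """Return genetic distance (cM) for each pair of adjacent markers.

    haplotypes: 2D array-like of allele labels, one row per sampled
    chromosome and one column per marker (an assumed layout).
    """
    haps = np.asarray(haplotypes)
    # a recombinant chromosome switches parental allele between two markers
    recombinant = haps[:, :-1] != haps[:, 1:]
    # fraction of recombinant chromosomes in each interval, times 100
    return 100 * recombinant.mean(axis=0)

# four toy chromosomes genotyped at four markers
toy = [
    list("AAAA"),
    list("AABB"),
    list("BBAA"),
    list("ABBB"),
]
per_interval = interval_centimorgans(toy)
print(per_interval)             # distance of each adjacent interval
print(np.cumsum(per_interval))  # distance of markers 2..4 from marker 1
```

In this toy example the middle interval has twice as many recombinant chromosomes as the first, and the last interval has none. Comparing intervals in this way is one route to spotting a region with unusually frequent crossovers.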

In [ ]:


Part 7: Multi-generational experiment

7.1: Multigeneration sibling crosses

So far we have only looked at the results of one or two generations of crosses. Let's look at what happens if we continue a crossing experiment over many generations. First, we will simulate an example in which we create F1 siblings and cross them to each other to create F2s, as before, and then continue to cross siblings in each subsequent generation.


In [327]:
def generations_of_sib_crosses(ngens):
    "returns a list of offspring from successive sib crosses"
    
    # generate two homozygous parents
    s1 = get_parental_chromosomes("A", 10)
    s2 = get_parental_chromosomes("B", 10)
        
    # a list to store f1 each generation
    offspring = []
    
    # iterate over generations
    for gen in range(ngens):

        # generate two sibling offspring from the current parents
        f1s = []
        for i in range(2):

            # generate gametes from each current parent
            s1gam = meiosis(get_chromatids(s1))
            s2gam = meiosis(get_chromatids(s2))

            # make an offspring by combining the two gametes
            f1s.append(mating(s1gam, s2gam))
        
        # store one F1 for output
        offspring.append(f1s[0])
        
        # set the two sibs as parents for the next generation
        s1 = f1s[0]
        s2 = f1s[1]
        
    return offspring

Run this code cell multiple times to see how replicate results vary at random.


In [377]:
f1s = generations_of_sib_crosses(10)
draw_chromosomes(f1s, title="generation 1 --> 10: F1 sib crosses");


[Figure: generation 1 --> 10: F1 sib crosses. Each of the 10 individuals is drawn as two chromosomes, labeled 1 and 2.]

7.2: Multigeneration backcross experiment

We have also performed an F1 x parent backcross before. Now we will continue to backcross the offspring to the pure parental strain over multiple generations.
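Before running the simulation, it can help to work out roughly what to expect. As a back-of-the-envelope sketch (the helper `expected_donor_fraction` is hypothetical, and the calculation assumes fair segregation and no selection), the expected share of "B" alleles halves with every backcross to the pure "A" parent:

```python
def expected_donor_fraction(ngens):
    """Expected share of 'B' alleles in each generation of backcrossing.

    Generation 1 is the F1 (half 'B'); each later backcross to the pure
    'A' parent halves the expected share again.
    """
    return [0.5 ** gen for gen in range(1, ngens + 1)]

for gen, frac in enumerate(expected_donor_fraction(5), start=1):
    print(f"generation {gen}: expected 'B' fraction = {frac:.4f}")
```

Any single simulated individual can deviate from this expectation, because the segments it inherits depend on where crossovers happen to fall.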


In [362]:
def generations_of_backcross_to_maternal(ngens, nmarkers=10):
    "returns a list of offspring from successive backcrossing to the 'A' parent"
    
    # generate two parental diploids 
    s1 = get_parental_chromosomes("A", nmarkers)
    s2 = get_parental_chromosomes("B", nmarkers)
        
    # a list to store one offspring from each generation
    offspring = []
    
    # iterate over generations
    for gen in range(ngens):

        # generate two f1s siblings
        f1s = []
        for i in range(2):

            # generate gametes from the pure 'A' parent and the current second parent
            s1gam = meiosis(get_chromatids(s1))
            s2gam = meiosis(get_chromatids(s2))

            # make F1 from parent cross
            f1s.append(mating(s1gam, s2gam))
            
        # store one F1 offspring
        offspring.append(f1s[0])
        
        # keep the pure 'A' parent (s1) and use the new offspring as the other parent
        s2 = f1s[0]
        
    return offspring

Again, run this code cell multiple times to see how the results vary.


In [363]:
backs = generations_of_backcross_to_maternal(10)
draw_chromosomes(backs, title="generation 1 --> 10: backcrossed");


[Figure: generation 1 --> 10: backcrossed. Each of the 10 individuals is drawn as two chromosomes, labeled 1 and 2.]
[7.2] Question: What patterns do you observe over multiple generations in the f1xf1 cross versus the f1xparent cross? Why do you think these two types of crosses would be useful in genetic experiments, i.e., why would we want to create offspring with these combinations of recombined chromosomes? Answer in the Markdown cell below.

Response: