Tom Ellis, May 2020.
If you are impatient to run an analysis as quickly as possible without reading the rest of the documentation, this page provides a minimal example. The workflow is laid out step by step below.
It goes without saying that to understand what the code is doing and get the most out of the data, you should read the tutorials.
Import the package.
In [31]:
import faps as fp
import numpy as np
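Import genotype data. These are CSV files; the column arguments in the calls below indicate where the genotype data start (genotype_col) and, for the offspring, which column names each individual's mother (mothers_col).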
In [32]:
adults = fp.read_genotypes('../data/parents_2012_genotypes.csv', genotype_col=1)
progeny = fp.read_genotypes('../data/offspring_2012_genotypes.csv', genotype_col=2, mothers_col=1)
# Mothers are a subset of the adults.
mothers = adults.subset(individuals=np.unique(progeny.mothers))
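To make the column arguments concrete, here is a minimal sketch of the kind of offspring file they seem to describe. The layout is an assumption read off the arguments above (names in column 0, mothers in column 1, genotype data from column 2 onwards), and every name and value in the toy file is invented. It uses only the standard library and does not call faps.

```python
import csv
import io

# Hypothetical offspring file: names in column 0, mothers in column 1,
# genotype data from column 2 onwards. All values are invented.
toy_csv = io.StringIO(
    "name,mother,locus1,locus2\n"
    "off1,mumA,0,1\n"
    "off2,mumA,2,1\n"
    "off3,mumB,1,0\n"
)
mothers_col, genotype_col = 1, 2  # mirrors the arguments used for progeny

rows = list(csv.reader(toy_csv))
header, records = rows[0], rows[1:]

names = [r[0] for r in records]
offspring_mothers = [r[mothers_col] for r in records]
genotypes = [[int(x) for x in r[genotype_col:]] for r in records]

# Unique mother names, analogous to np.unique(progeny.mothers) above.
unique_mothers = sorted(set(offspring_mothers))
print(unique_mothers)
```

The list of unique mothers is what gets passed to the subset step, so each mother appears once however many offspring she has.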
In this example, the data are for multiple maternal families, each containing a mixture of full- and half-siblings. We need to divide the offspring and mothers into maternal families.
In [33]:
progeny = progeny.split(progeny.mothers)
mothers = mothers.split(mothers.names)
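Conceptually, splitting groups individuals by a label and returns one group per label. The sketch below shows that idea in plain Python with invented names. It is not the faps implementation, only an illustration of why the later steps can look families up by the mother's name.

```python
from collections import defaultdict

# Invented offspring names and the mother recorded for each one.
offspring = ["off1", "off2", "off3", "off4"]
mothers_of = ["mumA", "mumA", "mumB", "mumA"]

# Collect offspring under their mother's name.
families = defaultdict(list)
for child, mother in zip(offspring, mothers_of):
    families[mother].append(child)
families = dict(families)

for mother, children in families.items():
    print(mother, children)
```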
I expect that multiple maternal families will be the most common scenario, but if you happen to have only a single maternal family, you can skip this.
Calculate paternity of individuals. This is equivalent to the G matrix in Ellis et al. (2018).
In [34]:
patlik = fp.paternity_array(progeny, mothers, adults, mu=0.0015)
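As a rough intuition for what a paternity matrix holds, the sketch below treats it as a table of log-likelihoods with one row per offspring and one column per candidate father, then normalises each row to probabilities. Both the layout and the numbers are assumptions made for illustration; the code does not use faps and may not match how the package stores its results.

```python
import math

# Invented log-likelihoods: rows are offspring, columns are candidate fathers.
log_lik = [
    [-2.0, -5.0, -9.0],
    [-7.0, -1.5, -6.0],
]

def normalise_row(row):
    """Convert a row of log-likelihoods to probabilities that sum to one."""
    peak = max(row)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in row]
    total = sum(weights)
    return [w / total for w in weights]

probs = [normalise_row(row) for row in log_lik]

# Index of the best-supported candidate father for each offspring.
best_father = [row.index(max(row)) for row in probs]
print(best_father)
```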
Cluster offspring in each family into full-sibling families.
In [14]:
sibships = fp.sibship_clustering(patlik)
You can pull out various kinds of information about each clustered maternal family. For example, get the most-likely number of full-sib families in maternal family J1246.
In [25]:
sibships["J1246"].mean_nfamilies()
Out[25]:
Or do this for all families with a dict comprehension:
In [36]:
{k: v.mean_nfamilies() for k, v in sibships.items()}
Out[36]:
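The dict-comprehension pattern can be tried without any data by using a stand-in object. In the sketch below, ToySibship is a hypothetical class that is not part of faps, and its arithmetic (a probability-weighted mean over invented probabilities) is only an assumption about what a mean number of families could look like.

```python
class ToySibship:
    """Hypothetical stand-in for a clustered family; not part of faps."""

    def __init__(self, nfamilies_probs):
        # nfamilies_probs[i] is the invented probability of i + 1 families.
        self.nfamilies_probs = nfamilies_probs

    def mean_nfamilies(self):
        # Probability-weighted mean number of full-sib families.
        return sum((i + 1) * p for i, p in enumerate(self.nfamilies_probs))


toy_sibships = {
    "famA": ToySibship([0.5, 0.5]),
    "famB": ToySibship([0.0, 0.25, 0.75]),
}

# Same pattern as above: one summary value per maternal family.
summary = {k: v.mean_nfamilies() for k, v in toy_sibships.items()}

# The resulting dict can then be queried like any other, for example
# to find the family with the largest value.
largest = max(summary, key=summary.get)
print(summary, largest)
```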