Tom Ellis, February 2017
FAPS stands for Fractional Analysis of Paternity and Sibships. It is a Python package for reconstructing genealogical relationships in wild populations, and making inferences about biological processes. The sections of this document are intended as a user's guide to introduce how FAPS works. For full details of the method, see Ellis et al. 2018.
The motivation for developing FAPS was to provide a package to investigate biological processes in a large population of snapdragons in the wild. Existing packages to do this relied on computationally intense Markov-chain algorithms, which limited the scope for subsequent analysis and for checking assumptions through simulations. Because FAPS was developed for this system, most of the examples in this guide relate to snapdragons. That said, FAPS addresses general issues in pedigree reconstruction of wild populations, and it is hoped that FAPS will be useful for other plant and animal systems.
The specific aims of FAPS were to provide a package which would allow us to:
FAPS reconstructs relationships for one or more half-sibling arrays, that is to say a sample of offspring from one or more mothers whose identity is known. A half-sibling array might consist of seedlings from the same maternal plant, or a family of lambs from the same ewe, to give two examples. The paternity of each offspring is unknown, and hence it is unknown whether pairs of offspring are full or half siblings.
There is also a sample of males, each of whom is a candidate to be the true sire of each offspring individual. FAPS is used to identify likely sibling relationships between offspring and their shared fathers based on typed genetic markers, and to use this information to draw meaningful conclusions about population or mating biology. The procedure can be summarised as follows:
Like all statistical analyses, FAPS makes a number of assumptions about your data. It is good to state these explicitly so everyone is aware of the limitations of the data and method:
Depending on your biological questions, your data may not fit the assumptions listed above, and an alternative approach might be more appropriate. For example:
It is assumed you have read Ellis et al. 2018 for the basic background to the method. It would also be useful to read Devlin (1988) for an overview of the motivation and basic methods of fractional assignment. It is also assumed you have a basic understanding of probability and likelihood; see Bolker (2008, chapter 6) for an example of a general introduction.
FAPS uses Python as an interface, but it is hoped that this guide should allow users who aren't familiar with Python to adapt the code to their needs. It would be worthwhile to at least familiarise yourself with Python's data types, especially lists and NumPy arrays, and how list comprehensions work. A general introduction to Python concepts can be found here. I recommend interacting with FAPS through IPython/Jupyter, which allows you to test small pieces of code and annotate analyses as you go. This document, for example, is written in IPython.
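As a quick orientation for readers new to Python, the following minimal sketch shows the three ideas mentioned above: a list, a list comprehension, and a NumPy array. The variable names and numbers are made up for illustration and have nothing to do with FAPS itself.

```python
import numpy as np

# A list is an ordered collection; here, made-up numbers of offspring per family.
family_sizes = [12, 8, 20]

# A list comprehension builds a new list by applying an expression to each item.
doubled = [n * 2 for n in family_sizes]

# A NumPy array holds values of one type and supports element-wise arithmetic.
sizes = np.array(family_sizes)
proportions = sizes / sizes.sum()

print(doubled)
print(proportions)
```

The comprehension replaces an explicit for-loop, and dividing the array by its sum acts on every element at once, which is something a plain list cannot do.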
You will of course need to have Python installed on your machine. If you do not already have this, instructions can be found here. You will also need to install the NumPy, fastcluster and Pandas libraries. These should be installed automatically if you install FAPS using pip (see below), but if for some reason they are not, the easiest way to do this is to install one of the scientific Python bundles. Some of the simulation tools also make use of Jupyter widgets, but these are optional. There are no specific hardware requirements beyond what is needed to run Python, but it is possible that RAM will be a limiting factor if you are dealing with large samples (for example ~100 offspring and 10,000 candidate males).
All testing and development of FAPS was done on Linux and Mac machines. I have not tested it on Windows, nor do I intend to. That said, an advantage of Python is that it ought to work on any operating system, so in principle FAPS ought to run as well on Windows as on a Unix machine. One important difference is that Windows uses '\' instead of '/' in its file paths, so you will need to edit any paths accordingly.
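One way to sidestep the path-separator difference is Python's standard pathlib module, which builds paths with the separator appropriate to the operating system. This is only a sketch: the folder and file names below are hypothetical, and it is an assumption on my part that passing str(path) is the safest way to hand such a path to other functions.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Hypothetical folder and file names, for illustration only.
data_file = Path("data") / "offspring_genotypes.csv"

# On the machine running this code, the separator matches the operating system.
print(data_file)

# The "pure" path classes show what each convention looks like on any machine.
posix_style = PurePosixPath("data", "offspring_genotypes.csv")
windows_style = PureWindowsPath("data", "offspring_genotypes.csv")
print(posix_style)
print(windows_style)

# Convert to a plain string when a function expects a file name as text.
file_name = str(data_file)
```

Building paths this way means the same script can be shared between Unix and Windows users without editing separators by hand.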
The best way to install FAPS is to use Python's package manager, pip. Instructions to do so can be found at that project's documentation page. Windows users might also consider pip-Win. To download the stable release, run the following in the command line:
    pip install faps
If Python is unable to locate the package, try:
    pip install faps --user
You can download the development version of FAPS from the project GitHub repository.
Once in Python/IPython you'll need to import the package, as well as the NumPy library on which it is based. In the rest of this document, I'll assume you've run the following lines to do this if this isn't explicitly stated.
    from faps import *
    import numpy as np
The asterisk on the first line is a shortcut to tell Python to import all the functions and classes in FAPS. This is somewhat lazy, but saves us having to give the package name every time we call something.
The basic unit on which analyses are built is a matrix of likelihoods of paternities, with a row for each offspring individual and a column for each candidate father (matrix G in Ellis et al. 2018). Each element represents the likelihood that a single candidate male is the father of a single offspring individual based on alleles shared between them and the offspring's mother. One of the aims of FAPS was to create a method which did not depend on marker type, mating system, ploidy, or genotyping technology, so that it should be applicable to as broad a range of datasets as exist, or may yet exist. As such, the optimum way to estimate G will vary from case to case.
Although microsatellite data are fairly standard in format, SNP technologies are moving fast. Since every technology comes with its own quirks, FAPS was written with the expectation that most users will be using non-standard data in some way. As such, it is really difficult to write functions to calculate G that are general, and users are strongly encouraged to think about the most appropriate way to calculate G for their data. Once you have done this, all other aspects of the analysis are independent of marker type. See the sections on Importing genotype data and Paternity arrays for more details on how to import data.
FAPS will also work given an appropriate G matrix for a polyploid species, but you will need to provide G yourself. See Wang 2016 for inspiration. This topic is rather involved, and I personally do not feel comfortable implementing anything in this area myself, but I would be interested to hear from anyone who is willing to try it.
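To make the shape of G concrete, here is a toy sketch using NumPy alone. The likelihood values are invented, storing them on a log scale is my assumption for the example, and the row normalisation assumes the true father is always among the candidates. It illustrates the idea of fractional assignment and is not the code FAPS uses internally.

```python
import numpy as np

# Made-up likelihoods for 3 offspring (rows) and 4 candidate fathers (columns),
# stored on a log scale.
log_G = np.log(np.array([
    [0.20, 0.01, 0.01, 0.02],
    [0.18, 0.02, 0.01, 0.01],
    [0.01, 0.01, 0.30, 0.02],
]))

# Normalise each row so that an offspring's paternity probabilities sum to one.
# Subtracting the row maximum first keeps the exponentials numerically stable.
shifted = log_G - log_G.max(axis=1, keepdims=True)
weights = np.exp(shifted)
prob_G = weights / weights.sum(axis=1, keepdims=True)

# Column index of the best-supported candidate for each offspring.
most_likely = prob_G.argmax(axis=1)

print(prob_G.round(3))
print(most_likely)
```

In this toy example the first two offspring share their best-supported candidate, which is the kind of signal that sibship inference builds on. Rather than assigning each offspring to a single father, fractional methods keep the full row of probabilities.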