Village data
- 8000 individuals, simulated over 70 years
- Model contains gender, risk group, acute/chronic/AIDS
- Contamination from outside
- Labels contain ID, date, DOB, and gender
- Sequences sampled over a three-year period
- gag/pol/env: 1-1479, 1480-4479, 4480-6987
- GTR model?
- 3x3 simulations
  - time periods of increasing, decreasing or stationary incidence
  - 3 sampling fractions?
In [1]:
%load_ext rpy2.ipython
%Rdevice svg
In [2]:
%%R
library(ape)
library(magrittr)
library(phangorn)
library(adephylo)
In [4]:
%%R
villagedir <- "../rawdata/october2014/Village"
In [5]:
%%R
gag <- seq(1,1479)
pol <- seq(1480,4479)
env <- seq(4480,6987)
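The three index vectors above use 1-based, inclusive alignment columns. As a minimal sketch, the same coordinates can be checked for contiguity and converted to 0-based Python slices. The `REGIONS` table and the helper functions are hypothetical names for illustration, not part of the notebook.

```python
# Gene regions as 1-based, inclusive alignment columns (copied from the cell above).
# REGIONS and the helpers below are hypothetical names used only for illustration.
REGIONS = {"gag": (1, 1479), "pol": (1480, 4479), "env": (4480, 6987)}


def region_slice(name):
    """Return a 0-based, end-exclusive slice covering the named region."""
    start, end = REGIONS[name]
    return slice(start - 1, end)


def region_lengths():
    """Return the number of alignment columns in each region."""
    return {name: end - start + 1 for name, (start, end) in REGIONS.items()}


def is_contiguous():
    """True if each region starts one column after the previous one ends."""
    bounds = sorted(REGIONS.values())
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(bounds, bounds[1:]))
```

For a sequence string `seq` of alignment length 6987, `seq[region_slice("pol")]` would then select the pol columns.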
In [6]:
%%R
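# Read every FASTA alignment in the village directory as a DNAbin matrix.
seqdata.fn <- list.files(path=villagedir,pattern="fasta",full.names=TRUE)
numsc <- length(seqdata.fn)
seqdata <- vector("list",numsc)
# seq_len() gives an empty loop when no files are found (1:0 would not).
for(i in seq_len(numsc)){
  seqdata[[i]] <- read.dna(seqdata.fn[i],format="fasta",as.matrix=TRUE)
}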
In [20]:
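# Partition file: one "DNA, name = start-end" line per gene region.
s="""DNA, gag = 1-1479
DNA, pol = 1480-4479
DNA, env = 4480-6987\n"""
# The with-block closes the file even if the write fails.
with open("villages_partition",'w') as f:
    f.write(s)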
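The partition text could also be generated from a table of coordinates rather than typed by hand, which keeps it in step with the gag/pol/env ranges defined earlier. This is a minimal sketch, assuming the same "DNA, name = start-end" line layout as the cell above. The names `regions` and `partition_text` are hypothetical.

```python
# Hypothetical table of (name, start, end) with 1-based, inclusive coordinates,
# matching the gag/pol/env ranges used elsewhere in this notebook.
regions = [("gag", 1, 1479), ("pol", 1480, 4479), ("env", 4480, 6987)]


def partition_text(regions, datatype="DNA"):
    """Format one 'DNA, name = start-end' line per region, newline-terminated."""
    lines = [
        "{}, {} = {}-{}".format(datatype, name, start, end)
        for name, start, end in regions
    ]
    return "\n".join(lines) + "\n"
```

Writing `partition_text(regions)` to `villages_partition` should produce the same file as the hand-written string.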