Village data

- 8000 individuals, simulated over 70 years
- Model contains gender, risk group, acute/chronic/AIDS
- Contamination from outside
- Labels contain ID, date, DOB, and gender
- Sequences sampled over a three-year period
- gag/pol/env: 1-1479, 1480-4479, 4480-6987
- GTR model?
- 3x3 simulations: time periods of increasing, decreasing, or stationary incidence
- 3 sampling fractions?


In [2]:
%load_ext rpy2.ipython
%Rdevice svg


The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython

In [3]:
%%R
library(ape)
library(magrittr)
library(phangorn)
library(adephylo)


Loading required package: ade4

Attaching package: ‘adephylo’

The following object is masked from ‘package:ade4’:

    orthogram


In [4]:
%%R
villagedir <- "../rawdata/Village"

In [5]:
%%R
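# list.dirs() returns villagedir itself as the first entry, so drop it.
seqdirs <- list.dirs(villagedir)
seqdirs <- seqdirs[-1]
seqdata.fn <- c()  # full paths to the FASTA files
stubs <- c()       # file names without the ".fasta" extension
# seq_along() avoids the 1:0 trap when no subdirectories are found.
for(i in seq_along(seqdirs)){
    fas <- list.files(seqdirs[i],pattern="fasta")
    if(length(fas)>0){
        seqdata.fn <- c(seqdata.fn,file.path(seqdirs[i],fas))
        stubs <- c(stubs,gsub(".fasta","",fas,fixed=TRUE))
    }
}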
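
For comparison, the same file discovery can be sketched in plain Python. This is a minimal sketch, not part of the original pipeline: the function name `find_fasta` is hypothetical, and it assumes the same layout as the R cell (FASTA files sit in subdirectories of the data directory, and the top-level directory itself is skipped).

```python
from pathlib import Path

def find_fasta(root):
    """Return (paths, stubs) for files whose names contain 'fasta',
    searching subdirectories of root but not root itself."""
    paths, stubs = [], []
    # rglob("*") yields everything below root; keep only the directories.
    subdirs = sorted(p for p in Path(root).rglob("*") if p.is_dir())
    for d in subdirs:
        for f in sorted(d.iterdir()):
            if f.is_file() and "fasta" in f.name:
                paths.append(f)
                # Mirror gsub(".fasta", "", ..., fixed=TRUE): literal removal.
                stubs.append(f.name.replace(".fasta", ""))
    return paths, stubs
```

Calling `find_fasta("../rawdata/Village")` would return the file paths and the matching name stubs as two parallel lists, like `seqdata.fn` and `stubs` in the R cell.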

In [6]:
%%R
gag <- seq(1,1479)
pol <- seq(1480,4479)
env <- seq(4480,6987)

In [7]:
%%R
numsc <- length(seqdata.fn)
# Read each alignment as a DNAbin matrix (one row per sequence).
seqdata <- lapply(seqdata.fn,read.dna,format="fasta",as.matrix=TRUE)

In [8]:
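# Partition file: one line per gene, with 1-based inclusive alignment columns.
s="""DNA, gag = 1-1479
DNA, pol = 1480-4479
DNA, env = 4480-6987\n"""
# The context manager closes the file even if the write fails.
with open("villages_partition",'w') as f:
    f.write(s)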
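
The gene coordinates appear twice in this notebook, once as R ranges and once as partition lines, so the two can drift apart. The sketch below generates the partition text from a single table and checks that the regions tile the alignment without gaps or overlaps. The names `GENES`, `check_tiling`, and `partition_text` are hypothetical; the coordinates are copied from the cells above.

```python
# Gene coordinates (1-based, inclusive), copied from the cells above.
GENES = [("gag", 1, 1479), ("pol", 1480, 4479), ("env", 4480, 6987)]

def check_tiling(genes):
    """Raise ValueError unless the regions start at column 1 and
    follow each other without gaps or overlaps."""
    expected_start = 1
    for name, start, end in genes:
        if start != expected_start or end < start:
            raise ValueError(
                f"{name}: expected start {expected_start}, got {start}-{end}"
            )
        expected_start = end + 1

def partition_text(genes, datatype="DNA"):
    """Build one 'DNA, name = start-end' line per gene."""
    check_tiling(genes)
    return "".join(
        f"{datatype}, {name} = {start}-{end}\n" for name, start, end in genes
    )
```

`partition_text(GENES)` reproduces the three lines written to `villages_partition`. The same table gives the gene lengths as `end - start + 1`, which is 1479, 3000, and 2508 columns for gag, pol, and env.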

In [9]:
%%R
for(i in seq_len(numsc)){
    s <- seqdata[[i]]
    # Reorder the alignment rows by sequence name before writing.
    s <- s[order(row.names(s)),]
    write.dna(s,paste(stubs[i],".fas",sep=""),format="fasta",nbcol=-1,colsep="")
}

In [10]:
%%R
for(i in seq_len(numsc)){
    # Keep only the pol columns (1480-4479) of the alignment.
    s <- seqdata[[i]][,pol]
    # Reorder the alignment rows by sequence name before writing.
    s <- s[order(row.names(s)),]
    write.dna(s,paste(stubs[i],"_pol.fas",sep=""),format="fasta",nbcol=-1,colsep="")
}
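
The column extraction can also be sketched without ape. The example below is a minimal illustration under assumptions: it takes FASTA text with aligned sequences, uses 1-based inclusive columns as in the `pol` range, and writes records in name order, which appears to be what the `order(snames)` step intends. The helper names are hypothetical, and Python's `sorted` orders by code point, which can differ from R's locale-dependent `order`.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}; sequences may span lines."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def extract_region(records, start, end):
    """Return [(name, subsequence)] sorted by name.
    start and end are 1-based inclusive column numbers."""
    return [(n, records[n][start - 1:end]) for n in sorted(records)]

def to_fasta(pairs):
    """Write each record as a header line followed by one sequence line."""
    return "".join(f">{n}\n{s}\n" for n, s in pairs)
```

For the pol region, the call would be `extract_region(records, 1480, 4479)`, with each sequence written on a single line as in the `nbcol=-1` output above.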
