Now lets have a look at the results. Move into the data directory again.
In [ ]:
cd data/run_seroba
Look at the file called pred.tsv
in your results directoy, sample1
.
In [ ]:
cat sample1/pred.tsv
You can see three columns. The first one contains the prefix you chose for the run. The second one contains the predicted serotype and the third column may contain a comment regarding contamination. So, in this case we can see that sample1 was predicted to be of serotype 8 and at least 10% of the reads are called as a different snp than the other reads i.e. there is contamination.
Now, let's try again with the rest of the samples. All we need to do is to run the runSerotyping option, but this time we will do it in a for-loop so that we do not have to run the command manually for each sample.
The command we are going to use will run seroba runSerotyping
on all fastq files in the working directory, so to avoid serytyping sample1 again, first move the fastq files for this sample out from the working directory:
In [ ]:
mv sample1_* ../
Run the command below. It will take around 10 minutes.
In [ ]:
for file in *_1.fq.gz; do seroba runSerotyping \
../database/ $file "${file//_1.fq.gz/_2.fq.gz}" \
"${file//_1.fq.gz/}"; done
Now that we have performed multiple runs, we might want to create a summary of the results. To do this, move up one level to the data directory.
In [ ]:
cd ..
Now run the summary option:
`seroba summary results_dir`
Where the results_dir in this case is run_seroba
.
In [ ]:
seroba summary run_seroba
Have a look at the resulting tsv file.
In [ ]:
cat summary.tsv
As you can see, you now have a summary of all the runs in one file. One sample that might require some explanation is sample3 which was serotyped as 6E(6B). Serogroup 6 is a bit different from the other serogroups. The serotype of a sample can be 6E on the genotypic level, and 6A or 6B on the phenotypic level. In this case, sample3 has the 6E genotype and the 6B phenotype.
Now follows some questions for you to go over what you have learned in this tutorial one more time! You can find the answers in the link at the bottom of the page.
1. Your data requires you to use a kmer size of 60. You have already downloaded the database to a directory called database_60/. What command would you run to create a database for KMC and ARIBA with kmer size 60?
2. What is the predicted serotype of sample7?
3. How would you interpret the comment for sample7?
This is the end of the tutorial. You can return to the index or revisit the previous section.
The answers to the questions in this tutoial can be found here.