Once your samples have been sequenced or imported, it can be useful to match up the internal lane identifiers with the sample and supplier identifiers. We can look at the relationship between lane and sample using pf info, which returns several pieces of metadata for each lane, including the sample name.
Alternatively, you might want to know the EBI sample and submission numbers for a particular lane or sample. To get these, you can use pf accession, which returns the corresponding EBI accessions.
For more information about the EBI accession number format, please see www.ebi.ac.uk/ena/submit/read-data-format.
You can also use pf supplementary to generate a spreadsheet of supplementary data, which can be useful for publication. Optionally, pf supplementary can also return the sample description.
In this section of the tutorial we will cover:
- pf info to get sample metadata
- pf accession to get sample accessions
- pf supplementary to get supplementary data

First, let's tell the system the location of our tutorial configuration file.
In [ ]:
export PF_CONFIG_FILE=$PWD/data/pathfind.conf
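Before running any pf command, it can help to confirm that the configuration file actually exists, since a mistyped path is easy to miss. The sketch below assumes only that the file lives at data/pathfind.conf relative to the current directory, as in the export above. The function name use_pf_config is made up for illustration and is not part of pf.

```shell
# Hypothetical helper (not part of pf): export PF_CONFIG_FILE only if the file exists.
use_pf_config() {
    config="$1"
    if [ -f "$config" ]; then
        export PF_CONFIG_FILE="$config"
        echo "Using config: $PF_CONFIG_FILE"
    else
        echo "Config file not found: $config" >&2
        return 1
    fi
}

# Print a reminder instead of aborting if the file is missing.
use_pf_config "$PWD/data/pathfind.conf" || echo "Check that you are in the tutorial directory."
```

If the file is missing, the variable is left untouched and a message is printed, so the problem shows up here rather than in a later pf call.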
In [ ]:
pf info -h
Let's get the sample name that corresponds to lane 5477_6#1.
In [ ]:
pf info -t lane -i 5477_6#1
Here we can see that several pieces of metadata have been returned. One of these is the sample name: Tw01_0055.
Now, let's get the sample names for all lanes associated with study 664.
In [ ]:
pf info -t study -i 664
We can write this information to a file using the -o or --outfile option.
Let's write our lane metadata to file.
In [ ]:
pf info -t study -i 664 -o
This has generated a new file "infofind.csv" which contains our comma-separated lane metadata.
In [ ]:
cat infofind.csv
We can also give the output file a different name.
Let's call the metadata file for study 664 "study_664_info.csv".
In [ ]:
pf info -t study -i 664 -o study_664_info.csv
This generates the file "study_664_info.csv" which contains our metadata.
In [ ]:
cat study_664_info.csv
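Once the metadata is in a CSV file, standard shell tools can look up individual lanes without calling pf again. The sketch below uses a small mock file rather than real pf info output: the column names and their order (lane first, sample second) are assumptions, so check the header of your own file and adjust the column numbers. The lane and sample names are the ones seen earlier in this section, and sample_for_lane is a hypothetical helper.

```shell
# Mock file standing in for study_664_info.csv.
# ASSUMPTION: column 1 is the lane and column 2 is the sample; real output may differ.
mock_dir=$(mktemp -d)
cat > "$mock_dir/mock_info.csv" <<'EOF'
Lane,Sample
5477_6#1,Tw01_0055
EOF

# Hypothetical helper: print the sample name for a given lane.
# Usage: sample_for_lane LANE CSV_FILE
sample_for_lane() {
    # Skip the header (NR > 1) and print column 2 where column 1 matches the lane.
    awk -F',' -v lane="$1" 'NR > 1 && $1 == lane { print $2 }' "$2"
}

sample_for_lane '5477_6#1' "$mock_dir/mock_info.csv"
```

For the mock file this prints Tw01_0055. Splitting on commas like this is only safe when no field contains a comma itself.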
In [ ]:
pf accession -h
Let's get the EBI accessions for all lanes associated with study 664.
In [ ]:
pf accession -t study -i 664
As with pf info, we can also write the output of pf accession to a comma-delimited file.
Let's write the accessions associated with study 664 to a file called "study_664_accessions.csv".
In [ ]:
pf accession -t study -i 664 -o study_664_accessions.csv
This generates the file "study_664_accessions.csv" which contains our comma-separated accessions.
In [ ]:
cat study_664_accessions.csv
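A quick sanity check after writing both files is to compare how many data rows each contains. The sketch below assumes that each file has exactly one header line and one row per lane, which has not been confirmed against real pf output. It uses mock files with placeholder values, and count_rows is a hypothetical helper.

```shell
# Hypothetical helper: count data rows, skipping the header line.
# ASSUMPTION: the file has exactly one header line and one row per lane.
count_rows() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# Mock files with placeholder values, not real pf output.
mock_dir=$(mktemp -d)
printf 'header\nlane_a\nlane_b\n' > "$mock_dir/info.csv"
printf 'header\nlane_a\nlane_b\n' > "$mock_dir/accessions.csv"

if [ "$(count_rows "$mock_dir/info.csv")" -eq "$(count_rows "$mock_dir/accessions.csv")" ]; then
    echo "Row counts match"
else
    echo "Row counts differ" >&2
fi
```

To use it on the tutorial output, replace the mock paths with study_664_info.csv and study_664_accessions.csv.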
Finally, we can get the EBI URLs to download the raw data using the -f or --fastq option. By default, these will be written to a file called "fastq_urls.txt".
Let's get the URLs for downloading the FASTQ files for study 664 from the European Nucleotide Archive (ENA).
In [ ]:
pf accession -t study -i 664 -f
This generates a file called "fastq_urls.txt" which contains the URLs to download the raw sequencing data, one URL per file.
In [ ]:
cat fastq_urls.txt
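Because the file holds one URL per line, it can be turned into download commands with a short loop. The sketch below is a dry run: it only prints wget commands and does not download anything. The mock file and its example.org addresses are placeholders, not real ENA URLs, and print_download_commands is a hypothetical helper.

```shell
# Mock URL list; these are placeholder addresses, not real ENA URLs.
mock_dir=$(mktemp -d)
cat > "$mock_dir/fastq_urls.txt" <<'EOF'
ftp://example.org/placeholder_1.fastq.gz
ftp://example.org/placeholder_2.fastq.gz
EOF

# Hypothetical helper: print one wget command per non-empty line of the file.
print_download_commands() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue   # skip blank lines
        echo "wget '$url'"
    done < "$1"
}

print_download_commands "$mock_dir/fastq_urls.txt"
```

Review the printed commands before running them. If wget is installed, wget -i fastq_urls.txt reads the URLs from the file directly and may be simpler.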
In [ ]:
pf supplementary -h
Let's get the supplementary data for all lanes associated with study 664.
In [ ]:
pf supplementary -t study -i 664
As with pf info and pf accession, we can also write the output of pf supplementary to a comma-delimited file.
Let's write the supplementary data associated with study 664 to a file called "study_664_supplementary.csv".
In [ ]:
pf supplementary -t study -i 664 -o study_664_supplementary.csv
This generates the file "study_664_supplementary.csv" which contains our comma-separated supplementary data.
In [ ]:
cat study_664_supplementary.csv
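The three commands used in this section follow the same pattern, so they can be generated for any study ID. The sketch below only prints the commands rather than running them. The function name study_commands is hypothetical, and the output file names simply mirror the ones used above.

```shell
# Hypothetical helper: print the three pf commands for one study ID.
# Output file names follow the pattern used in this section.
study_commands() {
    study="$1"
    echo "pf info -t study -i $study -o study_${study}_info.csv"
    echo "pf accession -t study -i $study -o study_${study}_accessions.csv"
    echo "pf supplementary -t study -i $study -o study_${study}_supplementary.csv"
}

study_commands 664
```

Printing first makes it easy to check the commands before running them one at a time.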
Finally, we can include the sample description in the supplementary information by using the -d or --description option.
Let's get the supplementary data for all lanes associated with study 664, including the sample description.
In [ ]:
pf supplementary -t study -i 664 -d
Q1: What is the sample name that corresponds with lane 10018_1#1?
In [ ]:
# Enter your answer here
Q2: What lane name(s) correspond with sample APP_T1_OP2?
In [ ]:
# Enter your answer here
Q3: What are the sample and lane names of the last lane in the file "data/lanes_to_search.txt"?
Hint: use tail -1 to get the last line of the output.
In [ ]:
# Enter your answer here
Q4: What are the sample and lane accessions for lane 5477_6#1?
In [ ]:
# Enter your answer here
Q5: What are the two URLs which can be used to download the FASTQ files for lane 5477_6#1 from the ENA?
In [ ]:
# Enter your answer here
In [ ]:
# Enter your answer here
You can head back to finding your data.
Otherwise, let's move on to looking at analysis pipeline status.