TCR-seq protocol

By Roman Sasik (rsasik@ucsd.edu) This Notebook describes the sequence of commands used in TCR-seq analysis.

The multiplexing barcodes are assumed to follow the design described in this paper: "Linking T-cell receptor sequence to functional phenotype at the single-cell level", A Han, J Glanville and MD Davis, Nature Biotechnology, 2014, 32 (7), p.684-92

In addition to original perl scripts below, you need to install the superfast TCR repertoir processing java program mitcr.jar, which can be downloaded at http://mitcr.milaboratory.com/. The relevant paper is MiTCR: software for T-cell receptor sequencing data analysis by DA Bolotin et al., Nature Methods 10, 813-814 (2013).

Perl and java are assumed to be installed.

Demultiplexing TCR reads

Processing starts with demultiplexing the reads from a single pair of large fastq files:


In [22]:
!perl demultiplex_fastq_TCRplates.pl Sample_S1_L001_R1_001.fastq Sample_S1_L001_R2_001.fastq

In [23]:
!ls *[A,B].fastq


01A01A.fastq  01B08B.fastq  01D04A.fastq  01E11B.fastq	01G07A.fastq
01A01B.fastq  01B09A.fastq  01D04B.fastq  01E12A.fastq	01G07B.fastq
01A02A.fastq  01B09B.fastq  01D05A.fastq  01E12B.fastq	01G08A.fastq
01A02B.fastq  01B10A.fastq  01D05B.fastq  01F01A.fastq	01G08B.fastq
01A03A.fastq  01B10B.fastq  01D06A.fastq  01F01B.fastq	01G09A.fastq
01A03B.fastq  01B11A.fastq  01D06B.fastq  01F02A.fastq	01G09B.fastq
01A04A.fastq  01B11B.fastq  01D07A.fastq  01F02B.fastq	01G10A.fastq
01A04B.fastq  01B12A.fastq  01D07B.fastq  01F03A.fastq	01G10B.fastq
01A05A.fastq  01B12B.fastq  01D08A.fastq  01F03B.fastq	01G11A.fastq
01A05B.fastq  01C01A.fastq  01D08B.fastq  01F04A.fastq	01G11B.fastq
01A06A.fastq  01C01B.fastq  01D09A.fastq  01F04B.fastq	01G12A.fastq
01A06B.fastq  01C02A.fastq  01D09B.fastq  01F05A.fastq	01G12B.fastq
01A07A.fastq  01C02B.fastq  01D10A.fastq  01F05B.fastq	01H01A.fastq
01A07B.fastq  01C03A.fastq  01D10B.fastq  01F06A.fastq	01H01B.fastq
01A08A.fastq  01C03B.fastq  01D11A.fastq  01F06B.fastq	01H02A.fastq
01A08B.fastq  01C04A.fastq  01D11B.fastq  01F07A.fastq	01H02B.fastq
01A09A.fastq  01C04B.fastq  01D12A.fastq  01F07B.fastq	01H03A.fastq
01A09B.fastq  01C05A.fastq  01D12B.fastq  01F08A.fastq	01H03B.fastq
01A10A.fastq  01C05B.fastq  01E01A.fastq  01F08B.fastq	01H04A.fastq
01A10B.fastq  01C06A.fastq  01E01B.fastq  01F09A.fastq	01H04B.fastq
01A11A.fastq  01C06B.fastq  01E02A.fastq  01F09B.fastq	01H05A.fastq
01A11B.fastq  01C07A.fastq  01E02B.fastq  01F10A.fastq	01H05B.fastq
01A12A.fastq  01C07B.fastq  01E03A.fastq  01F10B.fastq	01H06A.fastq
01A12B.fastq  01C08A.fastq  01E03B.fastq  01F11A.fastq	01H06B.fastq
01B01A.fastq  01C08B.fastq  01E04A.fastq  01F11B.fastq	01H07A.fastq
01B01B.fastq  01C09A.fastq  01E04B.fastq  01F12A.fastq	01H07B.fastq
01B02A.fastq  01C09B.fastq  01E05A.fastq  01F12B.fastq	01H08A.fastq
01B02B.fastq  01C10A.fastq  01E05B.fastq  01G01A.fastq	01H08B.fastq
01B03A.fastq  01C10B.fastq  01E06A.fastq  01G01B.fastq	01H09A.fastq
01B03B.fastq  01C11A.fastq  01E06B.fastq  01G02A.fastq	01H09B.fastq
01B04A.fastq  01C11B.fastq  01E07A.fastq  01G02B.fastq	01H10A.fastq
01B04B.fastq  01C12A.fastq  01E07B.fastq  01G03A.fastq	01H10B.fastq
01B05A.fastq  01C12B.fastq  01E08A.fastq  01G03B.fastq	01H11A.fastq
01B05B.fastq  01D01A.fastq  01E08B.fastq  01G04A.fastq	01H11B.fastq
01B06A.fastq  01D01B.fastq  01E09A.fastq  01G04B.fastq	01H12A.fastq
01B06B.fastq  01D02A.fastq  01E09B.fastq  01G05A.fastq	01H12B.fastq
01B07A.fastq  01D02B.fastq  01E10A.fastq  01G05B.fastq
01B07B.fastq  01D03A.fastq  01E10B.fastq  01G06A.fastq
01B08A.fastq  01D03B.fastq  01E11A.fastq  01G06B.fastq

This script demultiplexes reads multiplexed in a single pair of large fastq files and saves them into separate fastq files whose names indicate Plate, Well, and TCR isoform (A or B), for instance 01H12B.fastq. Up to one mismatch is allowed in any of the Plate, Well Row, Well Column, and TCR Isoform barcodes.

It will create 2x96 files (one per TCR isoform) per each Plate (a lot of files!)

This script will ignore all reads from plates whose code is commented out (see below in source code). This is useful when there is a mixture of TCR genotyping reads and phenotyping reads.

There is a separate demultiplex script for the phenotyping reads (see below). This is demultiplex_fastq_TCRplates.pl:


In [ ]:
#!/usr/bin/perl
$fileR1 = $ARGV[0];
$fileR2 = $ARGV[1];

open(F1,$fileR1);
open(F2,$fileR2);

%plate = (
   "GCAGA" => "01", #uncomment this line if plate code 01 is among the sequences to be demultiplexed
#   "TCGAA" => "02",
#   "AACAA" => "03",
#   "GGTGC" => "04",
#   "TTGGT" => "05",
#   "CATTC" => "06",
#   "ATTGG" => "07",
#   "CGGTT" => "08",
#   "ATCCT" => "09",
#   "ATGTC" => "10",
#   "TCACG" => "11",
#   "AGACC" => "12",
#   "CCCCA" => "13",
#   "GCGCT" => "14",
#   "TCCTT" => "15",
#   "TATAT" => "16",
#   "CGTAA" => "17",
#   "AAGGT" => "18",
#   "AGCTC" => "19",
#   "CTTGC" => "20",
#   "GTATC" => "21",
#   "TATGA" => "22",
#   "CACAC" => "23",
#   "ACACT" => "24",
#   "ACTAC" => "25",
#   "GTTAC" => "26",
);
%row = ( #if you want output for all rows, leave them all uncommented
   "TAAGC" => "A", 
   "TGCAC" => "B",
   "CTCAG" => "C",
   "GGAAT" => "D",
   "CGAGG" => "E",
   "AGGAG" => "F",
   "TGTTG" => "G",
   "CAACT" => "H",
);
%col = ( #if you want output for all columns, leave them all uncommented
   "GTTCA" => "01", 
   "CAGGA" => "02",
   "TTATA" => "03",
   "CCTGT" => "04",
   "ACCGC" => "05",
   "ACTTA" => "06",
   "GCTAG" => "07",
   "GACGT" => "08",
   "GGCTA" => "09",
   "GAATG" => "10",
   "CCAAC" => "11",
   "GAGAC" => "12",
);
%TCR = (
   "GTCAC" => "A", # TCRA
   "GAGAT" => "B",
);

foreach $plateID (keys(%plate)) {
   foreach $rowID (keys(%row)) {
      foreach $colID (keys(%col)) {
         foreach $TCRID (keys(%TCR)) {
            $fh = $plate{$plateID}.$row{$rowID}.$col{$colID}.$TCR{$TCRID};
            open $fh, '>', $fh.".fastq"; #open file for writing at the end
         }
      }
   }
}

while($A1 = <F1>) { #read 4 lines from R1 and 4 lines from R2
   $A2 = <F1>;
   $A3 = <F1>;
   $A4 = <F1>;

   $B1 = <F2>;
   $B2 = <F2>;
   $B3 = <F2>;
   $B4 = <F2>;

   $ID = substr($A2, 2, 5); #plate ID barcode
#  now find what the true bar code should have been if imperfect match
   $score = 0;
   $trueID = "";
   foreach $key (keys(%plate)) {
      my $count = ($ID^$key)  =~ tr/\0//;
      if ($count > $score) {
         $score = $count;
         $trueID = $key
      }
   }
   if ($score >= 4) {#accept $true_plateID as the true plate ID
      $rowID = $trueID;
   } else {#leave $plateID blank - sequence won't be output
      $rowID = ""
   }

   $ID = substr($B2, 2, 5); #column ID
#  now find what the true bar code should have been if imperfect match
   $score = 0;
   $trueID = "";
   foreach $key (keys(%col)) {
      my $count = ($ID^$key)  =~ tr/\0//;
      if ($count > $score) {
         $score = $count;
         $trueID = $key
      }
   }
   if ($score >= 4) {#accept $true_plateID as the true plate ID
      $colID = $trueID;
   } else {#leave $plateID blank - sequence won't be output
      $colID = ""
   }

   $ID = substr($B2, 7, 5); #TCR ID
#  now find what the true bar code should have been if imperfect match
   $score = 0;
   $trueID = "";
   foreach $key (keys(%TCR)) {
      my $count = ($ID^$key)  =~ tr/\0//;
      if ($count > $score) {
         $score = $count;
         $trueID = $key
      }
   }
   if ($score >= 4) {
      $TCRID = $trueID;
   } else {
      $TCRID = ""
   }




   if (exists $plate{$plateID}  and exists $row{$rowID}  and exists $col{$colID} and exists $TCR{$TCRID}) {
      $fh = $plate{$plateID}.$row{$rowID}.$col{$colID}.$TCR{$TCRID};
      print $fh $A1.$A2.$A3.$A4.$B1.$B2.$B3.$B4;
   };
}
close(F1);
close(F2);

Analyzing demultiplexed fastq files for TCRA/B species

After demultiplexing, each individual fastq file will be processed by mitcr. The output is a separate result file for each well, e.g., 01A06A_result.txt. The example below will produce reports for plate 01, row A and columns 06 through 09 (see source code below).


In [24]:
!perl analyze_wells.pl
!ls *_result.txt


01A06B
Initialisation: progress unknown
01A06A
Initialisation: progress unknown
01A08B
Initialisation: progress unknown
01A08A
Initialisation: progress unknown
01A07B
Initialisation: progress unknown
01A07A
Initialisation: progress unknown
01A09B
Initialisation: progress unknown
01A09A
Initialisation: progress unknown
01A06A_result.txt  01A07A_result.txt  01A08A_result.txt  01A09A_result.txt
01A06B_result.txt  01A07B_result.txt  01A08B_result.txt  01A09B_result.txt

The output is a tab-delimited file whose main components are these (this is the content of file 01A06A_result.txt):

The first column is the number of times this sequence is seen; the second column is the fraction (not a percentage) of the total count of sequences in the well. This is especially useful when there are two species of TCRA expressed in a single cell (as in this case). It does not happen with TCRB.The v- j- and d- alleles of the TCR are listed. The last two lines (a tiny fraction of the number of reads) are a result of sequencing/PCR errors. The program mitcr has an error-checking algorithm that reduces these calls. For details see MiTCR: software for T-cell receptor sequencing data analysis by DA Bolotin et al., Nature Methods 10, 813-814 (2013).

This is the source of analyze_wells.pl:


In [ ]:
#!/usr/bin/perl

%plate = (
   "GCAGA" => "01",
#   "TCGAA" => "02",
#   "AACAA" => "03",
#   "GGTGC" => "04",
#   "TTGGT" => "05",
#   "CATTC" => "06",
#   "ATTGG" => "07",
#   "CGGTT" => "08",
#   "ATCCT" => "09",
#   "ATGTC" => "10",
#   "TCACG" => "11",
#   "AGACC" => "12",
#   "CCCCA" => "13",
#   "GCGCT" => "14",
#   "TCCTT" => "15",
#   "TATAT" => "16",
#   "CGTAA" => "17",
#   "AAGGT" => "18",
#   "AGCTC" => "19",
#   "CTTGC" => "20",
#   "GTATC" => "21",
#   "TATGA" => "22",
#   "CACAC" => "23",
#   "ACACT" => "24",
#   "ACTAC" => "25",
#   "GTTAC" => "26",
);
%row = ( #uncomment line if you want output for row A, etc.
   "TAAGC" => "A", 
#   "TGCAC" => "B",
#   "CTCAG" => "C",
#   "GGAAT" => "D",
#   "CGAGG" => "E",
#   "AGGAG" => "F",
#   "TGTTG" => "G",
#   "CAACT" => "H",
);
%col = ( #uncomment line if you want output for column 01, etc.
#   "GTTCA" => "01",
#   "CAGGA" => "02",
#   "TTATA" => "03",
#   "CCTGT" => "04",
#   "ACCGC" => "05",
   "ACTTA" => "06",
   "GCTAG" => "07",
   "GACGT" => "08",
   "GGCTA" => "09",
#   "GAATG" => "10",
#   "CCAAC" => "11",
#   "GAGAC" => "12",
);
%TCR = (
   "GTCAC" => "A", # TCRA
   "GAGAT" => "B",
);


foreach $plateID (sort (keys(%plate))) {
   foreach $rowID (sort (keys(%row))) {
      foreach $colID (sort (keys(%col))) {
         foreach $TCRID (sort (keys(%TCR))) {
            $fh = $plate{$plateID}.$row{$rowID}.$col{$colID}.$TCR{$TCRID};
            print "$fh\n";
            system("java -Xmx10g -jar ./mitcr.jar -pset flex -gene TR$TCR{$TCRID} $fh.fastq $fh\_result.txt")
         }
      }
   }
}

Demultiplexing phenotyping reads

The following command demultiplexes phenotyping reads multiplexed in a single pair of large fastq files and saves them into separate fastq files whose names indicate Plate, Well, and "R1" or "R2" for left or right read, for instance 03H12R1.fastq. Up to one mismatch is allowed in any of the Plate, Well Row, or Well Column barcodes.

It will create 2x96 files per each Plate.

This script will ignore all reads from plates whose code is commented out (see below in source code). This is useful when there is a mixture of TCR genotyping reads and phenotyping reads.


In [25]:
!perl demultiplex_fastq_phenoplates.pl Sample_S1_L001_R1_001.fastq Sample_S1_L001_R2_001.fastq
!ls 03*.fastq


03A01R1.fastq  03B08R2.fastq  03D04R1.fastq  03E11R2.fastq  03G07R1.fastq
03A01R2.fastq  03B09R1.fastq  03D04R2.fastq  03E12R1.fastq  03G07R2.fastq
03A02R1.fastq  03B09R2.fastq  03D05R1.fastq  03E12R2.fastq  03G08R1.fastq
03A02R2.fastq  03B10R1.fastq  03D05R2.fastq  03F01R1.fastq  03G08R2.fastq
03A03R1.fastq  03B10R2.fastq  03D06R1.fastq  03F01R2.fastq  03G09R1.fastq
03A03R2.fastq  03B11R1.fastq  03D06R2.fastq  03F02R1.fastq  03G09R2.fastq
03A04R1.fastq  03B11R2.fastq  03D07R1.fastq  03F02R2.fastq  03G10R1.fastq
03A04R2.fastq  03B12R1.fastq  03D07R2.fastq  03F03R1.fastq  03G10R2.fastq
03A05R1.fastq  03B12R2.fastq  03D08R1.fastq  03F03R2.fastq  03G11R1.fastq
03A05R2.fastq  03C01R1.fastq  03D08R2.fastq  03F04R1.fastq  03G11R2.fastq
03A06R1.fastq  03C01R2.fastq  03D09R1.fastq  03F04R2.fastq  03G12R1.fastq
03A06R2.fastq  03C02R1.fastq  03D09R2.fastq  03F05R1.fastq  03G12R2.fastq
03A07R1.fastq  03C02R2.fastq  03D10R1.fastq  03F05R2.fastq  03H01R1.fastq
03A07R2.fastq  03C03R1.fastq  03D10R2.fastq  03F06R1.fastq  03H01R2.fastq
03A08R1.fastq  03C03R2.fastq  03D11R1.fastq  03F06R2.fastq  03H02R1.fastq
03A08R2.fastq  03C04R1.fastq  03D11R2.fastq  03F07R1.fastq  03H02R2.fastq
03A09R1.fastq  03C04R2.fastq  03D12R1.fastq  03F07R2.fastq  03H03R1.fastq
03A09R2.fastq  03C05R1.fastq  03D12R2.fastq  03F08R1.fastq  03H03R2.fastq
03A10R1.fastq  03C05R2.fastq  03E01R1.fastq  03F08R2.fastq  03H04R1.fastq
03A10R2.fastq  03C06R1.fastq  03E01R2.fastq  03F09R1.fastq  03H04R2.fastq
03A11R1.fastq  03C06R2.fastq  03E02R1.fastq  03F09R2.fastq  03H05R1.fastq
03A11R2.fastq  03C07R1.fastq  03E02R2.fastq  03F10R1.fastq  03H05R2.fastq
03A12R1.fastq  03C07R2.fastq  03E03R1.fastq  03F10R2.fastq  03H06R1.fastq
03A12R2.fastq  03C08R1.fastq  03E03R2.fastq  03F11R1.fastq  03H06R2.fastq
03B01R1.fastq  03C08R2.fastq  03E04R1.fastq  03F11R2.fastq  03H07R1.fastq
03B01R2.fastq  03C09R1.fastq  03E04R2.fastq  03F12R1.fastq  03H07R2.fastq
03B02R1.fastq  03C09R2.fastq  03E05R1.fastq  03F12R2.fastq  03H08R1.fastq
03B02R2.fastq  03C10R1.fastq  03E05R2.fastq  03G01R1.fastq  03H08R2.fastq
03B03R1.fastq  03C10R2.fastq  03E06R1.fastq  03G01R2.fastq  03H09R1.fastq
03B03R2.fastq  03C11R1.fastq  03E06R2.fastq  03G02R1.fastq  03H09R2.fastq
03B04R1.fastq  03C11R2.fastq  03E07R1.fastq  03G02R2.fastq  03H10R1.fastq
03B04R2.fastq  03C12R1.fastq  03E07R2.fastq  03G03R1.fastq  03H10R2.fastq
03B05R1.fastq  03C12R2.fastq  03E08R1.fastq  03G03R2.fastq  03H11R1.fastq
03B05R2.fastq  03D01R1.fastq  03E08R2.fastq  03G04R1.fastq  03H11R2.fastq
03B06R1.fastq  03D01R2.fastq  03E09R1.fastq  03G04R2.fastq  03H12R1.fastq
03B06R2.fastq  03D02R1.fastq  03E09R2.fastq  03G05R1.fastq  03H12R2.fastq
03B07R1.fastq  03D02R2.fastq  03E10R1.fastq  03G05R2.fastq
03B07R2.fastq  03D03R1.fastq  03E10R2.fastq  03G06R1.fastq
03B08R1.fastq  03D03R2.fastq  03E11R1.fastq  03G06R2.fastq

The source code of demultiplex_fastq_phenoplates.pl is here (in this example, Plate 03 contains phenotyping reads):


In [ ]:
#!/usr/bin/perl

$fileR1 = $ARGV[0];
$fileR2 = $ARGV[1];

open(F1,$fileR1);
open(F2,$fileR2);

%plate = (
#   "GCAGA" => "01",
#   "TCGAA" => "02",
   "AACAA" => "03",
#   "GGTGC" => "04",
#   "TTGGT" => "05",
#   "CATTC" => "06",
);
%row = (
   "TAAGC" => "A",
   "TGCAC" => "B",
   "CTCAG" => "C",
   "GGAAT" => "D",
   "CGAGG" => "E",
   "AGGAG" => "F",
   "TGTTG" => "G",
   "CAACT" => "H",
);
%col = (
   "GTTCA" => "01",
   "CAGGA" => "02",
   "TTATA" => "03",
   "CCTGT" => "04",
   "ACCGC" => "05",
   "ACTTA" => "06",
   "GCTAG" => "07",
   "GACGT" => "08",
   "GGCTA" => "09",
   "GAATG" => "10",
   "CCAAC" => "11",
   "GAGAC" => "12",
);

foreach $plateID (keys(%plate)) {
   foreach $rowID (keys(%row)) {
      foreach $colID (keys(%col)) {
            $fh = $plate{$plateID}.$row{$rowID}.$col{$colID};
            $fh1 = $plate{$plateID}.$row{$rowID}.$col{$colID}."1";
            $fh2 = $plate{$plateID}.$row{$rowID}.$col{$colID}."2";
            open $fh1, '>', $fh."R1.fastq";
            open $fh2, '>', $fh."R2.fastq";
      }
   }
}

while($A1 = <F1>) { #read 4 lines from R1 and 4 lines from R2
   $A2 = <F1>;
   $A3 = <F1>;
   $A4 = <F1>;

   $B1 = <F2>;
   $B2 = <F2>;
   $B3 = <F2>;
   $B4 = <F2>;

#  now find out if the bar codes make sense

   $ID = substr($A2, 2, 5); #plate ID
#  now find what the true bar code should have been if imperfect match
   $score = 0;
   $trueID = "";
   foreach $key (keys(%plate)) {
      my $count = ($ID^$key)  =~ tr/\0//;
      if ($count > $score) {
         $score = $count;
         $trueID = $key
      }
   }
   if ($score >= 4) {#accept $true_plateID as the true plate ID
      $plateID = $trueID;
   } else {#leave $plateID blank - sequence won't be output
      $plateID = ""
   }

   $ID = substr($A2, 9, 5); #row ID
#  now find what the true bar code should have been if imperfect match
   $score = 0;
   $trueID = "";
   foreach $key (keys(%row)) {
      my $count = ($ID^$key)  =~ tr/\0//;
      if ($count > $score) {
         $score = $count;
         $trueID = $key
      }
   }
   if ($score >= 4) {
      $rowID = $trueID;
   } else {
      $rowID = ""
   }

   $ID = substr($B2, 2, 5); #column ID
#  now find what the true bar code should have been if imperfect match
   $score = 0;
   $trueID = "";
   foreach $key (keys(%col)) {
      my $count = ($ID^$key)  =~ tr/\0//;
      if ($count > $score) {
         $score = $count;
         $trueID = $key
      }
   }
   if ($score >= 4) {
      $colID = $trueID;
   } else {
      $colID = ""
   }

   if (exists $plate{$plateID}  and exists $row{$rowID}  and exists $col{$colID} ) {
      $fh1 = $plate{$plateID}.$row{$rowID}.$col{$colID}."1";
      $fh2 = $plate{$plateID}.$row{$rowID}.$col{$colID}."2";
      print $fh1 $A1.$A2.$A3.$A4;
      print $fh2 $B1.$B2.$B3.$B4;
   };
}
close(F1);
close(F2);

Analyze demultiplexed phenotyping fastq files for expression levels of 17 cytokines and transcription factors

The following command will produce expression counts for all 17 cytokines and TF's, separately for each well:


In [26]:
!perl count_cytokines.pl
!ls *.count


03A01R1.count  03B09R1.count  03D05R1.count  03F01R1.count  03G09R1.count
03A02R1.count  03B10R1.count  03D06R1.count  03F02R1.count  03G10R1.count
03A03R1.count  03B11R1.count  03D07R1.count  03F03R1.count  03G11R1.count
03A04R1.count  03B12R1.count  03D08R1.count  03F04R1.count  03G12R1.count
03A05R1.count  03C01R1.count  03D09R1.count  03F05R1.count  03H01R1.count
03A06R1.count  03C02R1.count  03D10R1.count  03F06R1.count  03H02R1.count
03A07R1.count  03C03R1.count  03D11R1.count  03F07R1.count  03H03R1.count
03A08R1.count  03C04R1.count  03D12R1.count  03F08R1.count  03H04R1.count
03A09R1.count  03C05R1.count  03E01R1.count  03F09R1.count  03H05R1.count
03A10R1.count  03C06R1.count  03E02R1.count  03F10R1.count  03H06R1.count
03A11R1.count  03C07R1.count  03E03R1.count  03F11R1.count  03H07R1.count
03A12R1.count  03C08R1.count  03E04R1.count  03F12R1.count  03H08R1.count
03B01R1.count  03C09R1.count  03E05R1.count  03G01R1.count  03H09R1.count
03B02R1.count  03C10R1.count  03E06R1.count  03G02R1.count  03H10R1.count
03B03R1.count  03C11R1.count  03E07R1.count  03G03R1.count  03H11R1.count
03B04R1.count  03C12R1.count  03E08R1.count  03G04R1.count  03H12R1.count
03B05R1.count  03D01R1.count  03E09R1.count  03G05R1.count
03B06R1.count  03D02R1.count  03E10R1.count  03G06R1.count
03B07R1.count  03D03R1.count  03E11R1.count  03G07R1.count
03B08R1.count  03D04R1.count  03E12R1.count  03G08R1.count

The output is a set of tab-delimited files such as 03F03R1.count. Only the R1 read is used for counting; the R2 read is redundant (and lower quality anyway). The content of this file looks something close to this:

The source code of count_cytokines.pl is here (Plate 03 has pheno reads):


In [ ]:
#!/usr/bin/perl

%plate = (
#  "GCAGA" => "01",
#  "TCGAA" => "02",
   "AACAA" => "03",
#  "GGTGC" => "04",
#  "TTGGT" => "05",
#  "CATTC" => "06",
);
%row = (
   "TAAGC" => "A",
   "TGCAC" => "B",
   "CTCAG" => "C",
   "GGAAT" => "D",
   "CGAGG" => "E",
   "AGGAG" => "F",
   "TGTTG" => "G",
   "CAACT" => "H",
);
%col = (
   "GTTCA" => "01",
   "CAGGA" => "02",
   "TTATA" => "03",
   "CCTGT" => "04",
   "ACCGC" => "05",
   "ACTTA" => "06",
   "GCTAG" => "07",
   "GACGT" => "08",
   "GGCTA" => "09",
   "GAATG" => "10",
   "CCAAC" => "11",
   "GAGAC" => "12",
);
%cyt = (
   "GCCGGAGGAGGTGGATGTGC" => "GATA3",
   "CCCAACACAGGAGCGCACTG" => "TBET",
   "GGCAGCCAAGGCCCTGTCGT" => "FOXP3",
   "AGAGGAAGTCCATGTGGGAG" => "RORC",
   "GCGAGCTGGTGCGCACCGAC" => "RUNX1",
   "GGACCACGCAGGCGAGCTCG" => "RUNX3",
   "CCTACACGGCCCCACCTGCC" => "BCL6",
   "CCACAGAACTGAAACATCTT" => "IL2",
   "CCCAAGCTGAGAACCAAGAC" => "IL10",
   "AGACCTCTTTTATGATGGCC" => "IL12A",
   "GGTATGGAGCATCAACCTGA" => "IL13",
   "CAACCTGAACATCCATAACC" => "IL17A",
   "GGGTTCTCTTGGCTGTTACT" => "IFNG",
   "GGAGGCGCTCCCCAAGAAGA" => "TNFA",
   "CCGAGAAGCGGTACCTGAAC" => "TGFB",
   "GCCAACTTTGCAGCCCAGAA" => "PRF1",
   "CCACAATATCAAAGAACAGG" => "GZMB",
);
foreach $plateID (sort (keys(%plate))) {
   foreach $rowID (sort (keys(%row))) {
      foreach $colID (sort (keys(%col))) {
         $fh = $plate{$plateID}.$row{$rowID}.$col{$colID};
         open(F1,$fh."R1.fastq");
         open $fh, '>', $fh."R1.count";
         print $fh "\t$fh\n"; #print header

#        zero out counters
         foreach $key (keys(%cyt)) {$count{$cyt{$key}} = 0};
         while($A1 = <F1>) { #read 4 lines from R1 and 4 lines from R2
            $A2 = <F1>;
            $A3 = <F1>;
            $A4 = <F1>;
#           now find out if the bar codes make sense
            $seq = substr($A2, 36, 20);
            if (exists $cyt{$seq}) {$count{$cyt{$seq}}++}; #add to count
         };
         foreach $key (keys(%cyt)) {
            print $fh $cyt{$key}."\t".$count{$cyt{$key}}."\n"
         };
         close(F1);
         close($fh);

      }
   }
}

Cleanup after exercize:


In [27]:
!rm 0*