Preferential pairing of J-alpha and V-alpha alleles in an antigen-specific T cell receptor repertoire

See an interactive version of the plot on plot.ly

Code for generating this plot will be coming soon!

T cell and TCR diversity

As part of the adaptive immune system, T cell receptors (TCRs) are signaling molecules on the surface of T cells that play a major role in detecting pathogens. Each receptor is specific to a particular part of a particular pathogen. To be prepared for a wide range of possible pathogens, the immune system is believed to have more than 100 million unique T cell receptors, though this number is so large that it is technically difficult to measure. Each TCR is made up of an alpha chain and a beta chain, and each chain is encoded in part by two genes: a J gene and a V gene. To generate the enormous diversity of TCRs that is needed, there are many different alleles of the J genes and V genes that can be combined to form a unique TCR. This network plot illustrates the alleles that were found to be preferentially paired in a small subset of T cells that all recognize the same part of the same pathogen.

This plot shows the preferential pairing of V-alpha and J-alpha genes in this population of T cells. The majority of cells use TRAV3N-3*01 (78%) and TRAJ26*01 (70%), though many cells use other alleles as well. Because both alleles are so common, they would often be seen together by chance alone. However, their observed paired frequency is 68%, which is significantly higher than the frequency expected if this set of observed alleles were randomly paired (0.78 × 0.70 ≈ 55%). This suggests that there is a biological reason for this pairing.
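The expected frequency quoted above follows from multiplying the two marginal frequencies, which is what random pairing implies. Here is a minimal sketch of that arithmetic in Python, using only the percentages reported in the text. The function and variable names are illustrative and are not taken from any package.

```python
def expected_pair_frequency(freq_a, freq_b):
    """Frequency of a pair if the two values were combined independently."""
    return freq_a * freq_b


# Marginal and paired frequencies as reported in the text.
v_freq = 0.78      # fraction of cells using the V-alpha allele
j_freq = 0.70      # fraction of cells using the J-alpha allele
observed = 0.68    # fraction of cells using both together

expected = expected_pair_frequency(v_freq, j_freq)
enrichment = observed / expected  # > 1 means the pair is over-represented

print(f"expected under random pairing: {expected:.3f}")
print(f"observed / expected: {enrichment:.2f}")
```

The ratio only describes the size of the effect. Whether the difference is statistically significant depends on the number of cells sampled, which is why a test on the underlying counts is needed.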

Categorical correlation and catcorr network plots

The plot represents two columns (red and blue) of paired categorical data. Each column has many unique values along with quite a few repeated values; the diameter of each node is proportional to the frequency of its value in the column. Two nodes that represent values in the two different columns are connected by an edge if they ever appear together (i.e. paired) in the data. The width of the edge is proportional to the number of times that pairing is observed in the data. If the pairing is more or less common than one would expect by chance (based on the marginal frequencies), then the edge is colored orange (p < 0.01 by Fisher’s exact test). The layout of the nodes is not directly driven by the data, but is instead automatically generated using network plotting algorithms in the pygraphviz and Graphviz software libraries.
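The quantities behind such a plot can be sketched with the Python standard library alone. The example below is an illustration under stated assumptions, not the catcorr implementation: the helper names (`fisher_exact_two_sided`, `pair_statistics`) and the toy data are hypothetical, and the test is assumed to be a two-sided Fisher's exact test on the 2x2 table of "has this value / does not have this value" for each edge.

```python
from collections import Counter
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def pair_statistics(pairs):
    """Node frequencies and per-edge statistics for a list of (left, right) tuples."""
    n = len(pairs)
    left = Counter(x for x, _ in pairs)    # node sizes, first column
    right = Counter(y for _, y in pairs)   # node sizes, second column
    edges = {}
    for (x, y), a in Counter(pairs).items():
        b = left[x] - a          # x paired with something other than y
        c = right[y] - a         # y paired with something other than x
        d = n - a - b - c        # neither x nor y
        edges[(x, y)] = {
            "count": a,                            # edge width
            "expected": left[x] * right[y] / n,    # count under random pairing
            "p_value": fisher_exact_two_sided(a, b, c, d),
        }
    return left, right, edges


# Hypothetical toy data: two columns of paired categorical values.
pairs = ([("V1", "J1")] * 8 + [("V2", "J2")] * 8
         + [("V1", "J2")] + [("V2", "J1")])

left, right, edges = pair_statistics(pairs)
for (x, y), s in sorted(edges.items()):
    flag = "highlight" if s["p_value"] < 0.01 else "plain"
    print(x, y, s["count"], s["expected"], round(s["p_value"], 4), flag)
```

Each edge is tested separately here with no correction for multiple comparisons. Whether the original plot applies one is not stated in the text.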

Plots like this one show the correlation among the categories in each column and can be generated using the “catcorr” Python package with any set of paired categorical data. Please visit the GitHub repository below for the code.