Quick Introduction to FlowCytometryTools

Author: Jonathan Friedman & Eugene Yurtsev
Last Updated Date: 2013-September-11
Please read the instructions on the webpage (https://bitbucket.org/gorelab) on how to install these packages.

Have fun!

Load Dependencies


In [80]:
%load_ext autoreload
%autoreload 2
import FlowCytometryTools
from FlowCytometryTools import FCPlate, FCMeasurement
import os
import glob
from GoreUtilities import graph
(Note: The lines above with the % mark instruct python to reload the dependencies if any of them changed.)

Loading Data

Loading individual files


In [81]:
datadir = os.path.join(FlowCytometryTools.__path__[0], 'tests', 'data', 'Plate01')
datafile = os.path.join(datadir, 'CFP_Well_A4.fcs')
print 'Loading file from path: {0}'.format(datafile)
sample = FCMeasurement(ID='Test Plate', datafile=datafile)


Loading file from path: FlowCytometryTools/tests/data/Plate01/CFP_Well_A4.fcs

Loading all fcs files in a folder into a plate (Recommended)


In [82]:
# You can insert your own directory instead of datadir here.
datadir = os.path.join(FlowCytometryTools.__path__[0], 'tests', 'data', 'Plate01')
print 'Loading files from directory path: {0}'.format(datadir)

# Load the files
plate = FCPlate.from_dir(ID='Test Plate', path=datadir)


Loading files from directory path: FlowCytometryTools/tests/data/Plate01

Transformations

One of the first things you should do with your data is tranform the fluorescence channels, so that they are easy to visualize. We will transform channels Y2-A and B1-A (two fluorescent channels) with an hlog transformation.

In [83]:
plate = FCPlate.from_dir(ID='Test Plate', path=datadir).transform('hlog', channels=('Y2-A', 'B1-A'))
# The line above is equivalent to the two steps below.
# plate = FCPlate.from_dir(ID='Test Plate', path=datadir)
# plate = plate.transform('hlog', channels=('Y2-A', 'B1-A'))

Examine the contents of the plate

The layout function tells for which wells the datafiles have been loaded.

In [84]:
print plate


ID:
Test Plate

Data:
  1  2     3     4  5     6     7  8  9  10 11 12
A        'A3'  'A4'     'A6'  'A7'               
B        'B3'  'B4'                              
C                             'C7'               
D                                                
E                                                
F                                                
G                                                
H                                                

Basics

Let's pick single well

Let's work with well 'A3' from the plate.

In [85]:
well = plate['A3']
print well


'A3'

Before we can do any plotting, we need to figure out what information was collected by the flow cytometer.


In [86]:
print well.channel_names


('HDR-T', 'FSC-A', 'FSC-H', 'FSC-W', 'SSC-A', 'SSC-H', 'SSC-W', 'V2-A', 'V2-H', 'V2-W', 'Y2-A', 'Y2-H', 'Y2-W', 'B1-A', 'B1-H', 'B1-W')

In [87]:
print well.channels


                 $PnN  $PnB $PnG               $PnE   $PnS    $PnR
Channel Number                                                    
1               HDR-T    32    1  0.000000,0.000000  HDR-T  262144
2               FSC-A    32    1  0.000000,0.000000  FSC-A  262144
3               FSC-H    32    1  0.000000,0.000000  FSC-H  262144
4               FSC-W    32    1  0.000000,0.000000  FSC-W  262144
5               SSC-A    32    1  0.000000,0.000000  SSC-A  262144
6               SSC-H    32    1  0.000000,0.000000  SSC-H  262144
7               SSC-W    32    1  0.000000,0.000000  SSC-W  262144
8                V2-A    32    1  0.000000,0.000000   V2-A  262144
9                V2-H    32    1  0.000000,0.000000   V2-H  262144
10               V2-W    32    1  0.000000,0.000000   V2-W  262144
11               Y2-A    32    1  0.000000,0.000000   Y2-A  262144
12               Y2-H    32    1  0.000000,0.000000   Y2-H  262144
13               Y2-W    32    1  0.000000,0.000000   Y2-W  262144
14               B1-A    32    1  0.000000,0.000000   B1-A  262144
15               B1-H    32    1  0.000000,0.000000   B1-H  262144
16               B1-W    32    1  0.000000,0.000000   B1-W  262144

If you want to understand what the different fields mean, you need to read the specification for the FCS format. To check which FCS format was used do the following:


In [88]:
well.meta['__header__']['FCS format']


Out[88]:
'FCS3.0'

There is more meta data buried inside of each FCS file. This meta data is stored inside of a dictionary called meta.


In [89]:
print type(well.meta)
print well.meta.keys()


<type 'dict'>
['$ENDDATA', '$ETIM', '$SRC', '$BEGINANALYSIS', '$BYTEORD', '$ENDSTEXT', '$TOT', '$BTIM', '$FIL', '$CYTSN', '_channels_', '$SYS', '$DATE', '$MODE', '$CELLS', '$PAR', '$BEGINSTEXT', '$NEXTDATA', '$ENDANALYSIS', '_channel_names_', '$CYT', '__header__', '$BEGINDATA', '$OP', '$DATATYPE']
For example, one of the keys is $TOT (Total number of events) and another is $DATE (Date of measurement).

In [90]:
print well.meta['$TOT']


10000

In [91]:
print well.meta['$DATE']


2013-Jul-19

Plotting

You'll find that some basic plotting functionality has already been implemented for you.
Let's check it out by first plotting 1d and 2d histograms on single wells, and then we'll move on to plots on an entire 96-well plate.

1d histograms


In [92]:
figure(figsize=(10, 10))
well.plot('B1-A', bins=100);


The plot function accepts all the same arguments as does the matplotlib histogram function.

In [93]:
figure(figsize=(10, 10))
well.plot('B1-A', bins=100, alpha=0.7, color='green', normed=1);


2d histograms

If you feed in the names of two channels to the plot function, it will plot a 2d histogram. Let's plots the channels B1-A and Y2-A on the x and y axis respectively.

In [18]:
figure(figsize=(10, 10))
well.plot(['B1-A', 'Y2-A']);


This plot function accepts all the same arguments as does the matplotlib pcolormesh function.
For example, we can change the default colormap used for plotting.

In [94]:
well.plot(['B1-A', 'Y2-A'], cmap=cm.Oranges, colorbar=False);


Also, instead of plotting histograms, we can choose to plot a scatter plot by setting the argument kind='scatter'.
In this case, the plot function will behave like the matplotlib scatter function and pass all additional arguments to scatter.

In [95]:
well.plot(['B1-A', 'Y2-A'], kind='scatter', color='red', s=1, alpha=0.3);


Working with Plates

You can check which wells have data loaded in them simply by using the print command


In [96]:
print plate


ID:
Test Plate

Data:
  1  2     3     4  5     6     7  8  9  10 11 12
A        'A3'  'A4'     'A6'  'A7'               
B        'B3'  'B4'                              
C                             'C7'               
D                                                
E                                                
F                                                
G                                                
H                                                

Many of the wells in the plate above are empty. Use the function dropna() to compactify the plate.


In [97]:
plate = plate.dropna()

print plate
plate.shape


ID:
Test Plate

Data:
      3     4     6     7
A  'A3'  'A4'  'A6'  'A7'
B  'B3'  'B4'            
C                    'C7'
Out[97]:
(3, 4)

Plotting plates


In [100]:
plate.plot('Y2-A', bins=100, autolabel=False, ylim=(0, 1000), xlim=(-1000, 10000));



In [101]:
figure(figsize=(10, 10))
plate.plot(['B1-A', 'Y2-A']);


Note: The plot command supports the plotting of specific wells.

In [102]:
figure(figsize=(10, 10))
plate.plot(['B1-A', 'Y2-A'], ids=['A3', 'C7']);


Gating

Load the gates we need from the library.

In [103]:
from FlowCytometryTools import ThresholdGate, PolyGate

Creating Gates

When creating a gate, you will need to provide set(s) of coordinates, name(s) of channel(s) and the region. For example, the command below will set a gate that will pass only events where the Y2-A value is ABOVE the value 1000.0.

In [104]:
y2_gate = ThresholdGate(1000.0, 'Y2-A', region='above')

GUI for creating gates (alpha release) (NOT YET IMPLEMENTED. SKIP SECTION)

You can launch the GUI for creating the gates, by calling the view() method of a selected well. This is an alpha release of the GUI, so it's usable but buggy at the moment. These bugs will be fixed in future releases. NOTE: Click on list gates to see the gates that have been created. These will appear in your terminal!

Important: You must install wx for python in order to use this GUI.


In [9]:
plate['A3'].view()

In [29]:
poly_gate = PolyGate([(7307.6923076923067, 5809.523809523811), (3483.5164835164837, 1455.7823129251719), (5901.0989010989015, -3714.2857142857138), 
(8230.7692307692305, 367.34693877551035), (8230.7692307692305, 367.34693877551035)], ['HDR-T', 'FSC-A'], region='in')

Visualizing Gates


In [105]:
figure(figsize=(10, 4))

ax1 = subplot(1, 2, 1)
title('Sample with gates drawn but not applied')
plate['A3'].plot(['B1-A', 'Y2-A'], gates=[y2_gate], apply_gates=False);

ax2 = subplot(1, 2, 2)
title('Sample with gates drawn and applied')
plate['A3'].plot(['B1-A', 'Y2-A'], gates=[y2_gate], apply_gates=True);

graph.autoscale_subplots([ax1, ax2])


Let's create a gate for picking up B1-A positive populations

In [106]:
b1_gate = ThresholdGate(2000.0, 'B1-A', region='above')
plate['A3'].plot(('B1-A','Y2-A'), gates=[y2_gate, b1_gate], apply_gates=False);


Let's plot both gates across the entire plate
NOTE: The plotting function for plates already uses the subplots call. So be careful when using subplot(). When in doubt just create a new figure.

In [108]:
figure(figsize=(8, 8))
#suptitle('All gates plotted but not applied')
plate.plot(('B1-A','Y2-A'), gates=[y2_gate, b1_gate], apply_gates=False);

figure(figsize=(8, 8))
#suptitle('B1-A gate applied')
plate.plot(('B1-A','Y2-A'), gates=[b1_gate], apply_gates=True);

figure(figsize=(8, 8))
#suptitle('Y2-A gate applied')
plate.plot(('B1-A','Y2-A'), gates=[y2_gate], apply_gates=True);


Applying gates to wells


In [109]:
original_well = plate['A3']
gated_well = original_well.gate(y2_gate, )

figure(figsize=(4, 10))
ax = subplot(2, 1, 1)
title('Original well')
original_well.plot('Y2-A', bins=300, color='gray', ax=ax)
xlim(-1000, 10000)

ax = subplot(2, 1, 2)
title('Gated well')
gated_well.plot('Y2-A', bins=300,  color='y', ax=ax)
ylim(0, 400)
xlim(-1000, 10000)


Out[109]:
(-1000, 10000)

Calculations

Example 01: Compute the total number of events

To make calculations, you need to write a function that accepts as input a well and returns the desired calculation.
As an example, we will create a function that counts the total number of events.

In [110]:
well = plate['A3']
To access the underlying data in the well, use the get_data function. 
The returned data is a pandas data frame.
That's right. You have the power of pandas at your hands!

In [111]:
data = well.get_data()

print data


<class 'pandas.core.frame.DataFrame'>
Int64Index: 10000 entries, 0 to 9999
Data columns (total 16 columns):
HDR-T    10000  non-null values
FSC-A    10000  non-null values
FSC-H    10000  non-null values
FSC-W    10000  non-null values
SSC-A    10000  non-null values
SSC-H    10000  non-null values
SSC-W    10000  non-null values
V2-A     10000  non-null values
V2-H     10000  non-null values
V2-W     10000  non-null values
Y2-A     10000  non-null values
Y2-H     10000  non-null values
Y2-W     10000  non-null values
B1-A     10000  non-null values
B1-H     10000  non-null values
B1-W     10000  non-null values
dtypes: float64(16)

In [112]:
data['Y2-A']


Out[112]:
0      403.400510
1     6599.577981
2      300.378417
3     -198.721151
4      -38.730308
5      127.777659
6     6708.679520
7      121.329122
8     6801.536594
9       52.695037
10      83.142991
11    3910.001459
12    6386.285924
13     276.940986
14     309.726443
...
9985     373.082932
9986    6385.726584
9987     268.389849
9988     117.292733
9989    6873.727725
9990    6637.909279
9991    6350.627572
9992    6493.302770
9993     159.846871
9994    5598.897305
9995    6269.275118
9996    7104.573524
9997    6584.836961
9998     -25.996083
9999    6605.191222
Name: Y2-A, Length: 10000, dtype: float64

In [113]:
data['Y2-A'].describe()


Out[113]:
count    10000.000000
mean      2871.121393
std       3210.111108
min       -528.319908
25%         39.917786
50%        303.091133
75%       6510.390682
max       8895.525581
dtype: float64
Let's get the count

In [114]:
count = data.shape[0]
print 'The count of events is: {0}'.format(count)


The count of events is: 10000
Now, let's put it all together into a single function that takes as an input a well and returns the desired output.

In [115]:
def count_events(well):
    data = well.get_data()
    count = data.shape[0]
    return count
Let's check that it works by applying it to a single well

In [116]:
print well.apply(count_events)


10000
apply this function to the entire plate (the result won't be very interesting since all the wells have exactly 10,000 events in them)

In [117]:
plate.apply(count_events)


Out[117]:
3 4 6 7
A 10000 10000 10000 10000
B 10000 10000 NaN NaN
C NaN NaN NaN 10000

Example 02: Calculating the standard deviation of data that passes a given gate.


In [124]:
def count_in_gate(well):
    gate = ThresholdGate(1000.0, 'Y2-A', region='above')
    new_gate = well.gate(gate)
    return new_gate.get_data().shape[0]

print plate.apply(count_in_gate)

gate = [ThresholdGate(1000.0, 'Y2-A', region='above')]
plate.plot('Y2-A', gates=gate, color='g');


      3   4    6   7
A  4433  50  420   6
B  5496   7  NaN NaN
C   NaN NaN  NaN   0