Using the Illumina InterOp Library in Python: Part 5

Install

If you do not have the Python InterOp library installed, you can install it with pip:

$ pip install interop

You can verify that InterOp is properly installed:

$ python -m interop --test
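
If you would rather check from inside Python before running the cells below, a minimal sketch such as the following can confirm that the package is importable. The helper name `is_installed` is hypothetical and not part of InterOp; it relies only on the standard library and does not run the library's own tests.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# "interop" is the package name installed by `pip install interop`.
print("interop available:", is_installed("interop"))
```

This only tells you that the package can be located; the `--test` command above remains the way to verify that it works.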

Before you begin

If you plan to use this tutorial in an interactive fashion, then you should get an example run folder that contains InterOp files such as TileMetricsOut.bin.

Please change the path below so that it points at the run folder you wish to use:


In [1]:
run_folder = r""
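
Before loading anything, it can help to confirm that the path looks like a run folder. The sketch below assumes the conventional layout in which RunInfo.xml sits at the top level and the .bin files live in an InterOp subdirectory; the function name `check_run_folder` is hypothetical and not part of the library.

```python
import os


def check_run_folder(path):
    """Return a list of problems found with a run folder path (empty if none were found)."""
    if not os.path.isdir(path):
        return ["not a directory: %r" % (path,)]
    problems = []
    # Assumed layout: <run folder>/RunInfo.xml and <run folder>/InterOp/*.bin
    if not os.path.isfile(os.path.join(path, "RunInfo.xml")):
        problems.append("missing RunInfo.xml")
    if not os.path.isdir(os.path.join(path, "InterOp")):
        problems.append("missing InterOp directory")
    return problems


# Example: inspect the current directory (replace "." with your own run folder).
print(check_run_folder("."))
```

An empty list means the basic layout is present; it does not guarantee that any particular InterOp file exists.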

Getting SAV Imaging Tab-like Metrics

The run_metrics class encapsulates the model for all the individual InterOp files and also holds information from RunInfo.xml. The Modules page contains a subset of the application programming interface for all the major classes in C++. The available Python models have the same names (with a few exceptions) and take the same parameters. The Modules page is useful for accessing specific values loaded from the individual files.


In [2]:
from interop import py_interop_run_metrics, py_interop_run, py_interop_table
import numpy
import pandas as pd

In [3]:
run_metrics = py_interop_run_metrics.run_metrics()

By default, the run_metrics class loads all the InterOp files:

run_metrics.read(run_folder)

The InterOp library can provide a list of all the InterOp files needed by a specific application. The following shows how to generate that list for the imaging table:


In [4]:
valid_to_load = py_interop_run.uchar_vector(py_interop_run.MetricCount, 0)

In [5]:
py_interop_table.list_imaging_table_metrics_to_load(valid_to_load, False)

The run_metrics class can use this list to load only the required InterOp files as follows:


In [6]:
run_metrics.read(run_folder, valid_to_load)

The column headers for the imaging table can be created as follows:


In [7]:
columns = py_interop_table.imaging_column_vector()
py_interop_table.create_imaging_table_columns(run_metrics, columns)

Convert the columns object to a list of strings.


In [8]:
headers = []
for i in range(columns.size()):
    column = columns[i]
    if column.has_children():
        headers.extend([column.name()+"("+subname+")" for subname in column.subcolumns()])
    else:
        headers.append(column.name())
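
To see what the loop above produces without a run folder, here is a small sketch that uses a stand-in column class. `StubColumn` and `flatten_headers` are hypothetical names, and the column names are made up for illustration; the stub only mimics the three methods the loop relies on (`name()`, `has_children()`, `subcolumns()`).

```python
class StubColumn(object):
    """Hypothetical stand-in exposing the methods used by the header loop."""

    def __init__(self, name, subcolumns=()):
        self._name = name
        self._subcolumns = list(subcolumns)

    def name(self):
        return self._name

    def has_children(self):
        return len(self._subcolumns) > 0

    def subcolumns(self):
        return self._subcolumns


def flatten_headers(columns):
    """Expand each column into one header per subcolumn, or a single header if it has none."""
    headers = []
    for column in columns:
        if column.has_children():
            headers.extend(column.name() + "(" + sub + ")" for sub in column.subcolumns())
        else:
            headers.append(column.name())
    return headers


stub_columns = [StubColumn("Lane"), StubColumn("Example", ["A", "B"])]
print(flatten_headers(stub_columns))  # ['Lane', 'Example(A)', 'Example(B)']
```

A column with children contributes one header per subcolumn, which is why the header list can be longer than the number of column objects.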

Subsample rows and columns


In [9]:
row_count=3
column_count=7
headers=headers[:column_count]

In [10]:
print(headers)


['Lane', 'Tile', 'Cycle', 'Read', 'Cycle Within Read', 'Density(k/mm2)', 'Density Pf(k/mm2)']

The data from the imaging table can populate a numpy ndarray as follows:


In [11]:
column_count = py_interop_table.count_table_columns(columns)
row_offsets = py_interop_table.map_id_offset()
py_interop_table.count_table_rows(run_metrics, row_offsets)
data = numpy.zeros((row_offsets.size(), column_count), dtype=numpy.float32)
py_interop_table.populate_imaging_table_data(run_metrics, columns, row_offsets, data.ravel())
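
The call above passes `data.ravel()` rather than `data`. A plausible reading is that the library fills a flat buffer, and because a freshly created array is C-contiguous, `ravel()` returns a view, so the 2-D array sees the values. The sketch below illustrates only that NumPy behavior; the assignment is a stand-in for the library call, not the library itself.

```python
import numpy

data = numpy.zeros((2, 3), dtype=numpy.float32)
flat = data.ravel()        # a view here, because `data` is C-contiguous
flat[:] = numpy.arange(6)  # stand-in for a function that fills the flat buffer

print(numpy.shares_memory(data, flat))
print(data)
```

If the array were not contiguous (for example a transposed array), `ravel()` would return a copy and the original array would stay filled with zeros.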

In [12]:
data=data[:row_count, :]

Convert the header list and data ndarray into a Pandas table.


In [13]:
d = []
for col, label in enumerate(headers):
    d.append( (label, pd.Series([val for val in data[:, col]], index=[tuple(r) for r in  data[:, :3]])))

Render the Imaging Table data using Pandas


In [14]:
df = pd.DataFrame.from_dict(dict(d))
print(df.to_string(index=False))


Cycle  Cycle Within Read  Density Pf(k/mm2)  Density(k/mm2)  Lane  Read    Tile
  1.0                1.0              230.0           773.0   1.0   1.0  2101.0
  2.0                2.0              230.0           773.0   1.0   1.0  2101.0
  3.0                3.0              230.0           773.0   1.0   1.0  2101.0
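
In the output above the columns appear in alphabetical order rather than in header order, which is one possible result of building the frame from a plain dict (this depends on the Python and pandas versions in use). As an alternative sketch, the frame can be built directly from the array with an explicit column list. The small array below is a stand-in that reuses the lane, tile, and cycle values printed above.

```python
import numpy
import pandas as pd

headers = ["Lane", "Tile", "Cycle"]
# Stand-in rows; in the tutorial these would come from the populated `data` array.
data = numpy.array([[1, 2101, 1],
                    [1, 2101, 2],
                    [1, 2101, 3]], dtype=numpy.float32)

# Slice to the number of headers so the two always agree, and keep header order.
df = pd.DataFrame(data[:, :len(headers)], columns=headers)
print(df.to_string(index=False))
```

Passing `columns=` explicitly keeps the display order tied to the header list, independent of dict ordering.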

Getting Only Occupancy from the Imaging Table

This section shows how to get all metrics from a single InterOp file. Here we get the metrics from ExtendedTileMetricsOut.bin, which provides % Occupied.


In [15]:
valid_to_load = py_interop_run.uchar_vector(py_interop_run.MetricCount, 0)

Setting individual entries of this vector selects the specific files to load.

Note that building a table requires at least one cycle-based metric set. Extraction is included below for that reason.


In [16]:
valid_to_load[py_interop_run.ExtendedTile] = 1
valid_to_load[py_interop_run.Tile] = 1
valid_to_load[py_interop_run.Extraction] = 1
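
When several metric sets need to be enabled, the assignments above can be wrapped in a small helper. This is a sketch only: `select_metrics` is a hypothetical name, and the demo uses a plain list with made-up indices in place of the `uchar_vector` and the `py_interop_run` constants.

```python
def select_metrics(flags, metric_indices):
    """Set flags[i] = 1 for each index in metric_indices and leave the rest untouched."""
    for index in metric_indices:
        flags[index] = 1
    return flags


# Stand-in demo: a list of five zeros, with positions 1 and 3 switched on.
flags = select_metrics([0] * 5, [1, 3])
print(flags)  # [0, 1, 0, 1, 0]
```

With the real objects, the same helper could presumably be called as `select_metrics(valid_to_load, [py_interop_run.ExtendedTile, py_interop_run.Tile, py_interop_run.Extraction])`, since the vector accepts item assignment as shown in the cell above.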

In [17]:
run_metrics.clear()
run_metrics.read(run_folder, valid_to_load)

In [18]:
py_interop_table.create_imaging_table_columns(run_metrics, columns)

In [19]:
headers = []
for i in range(columns.size()):
    column = columns[i]
    if column.has_children():
        headers.extend([column.name()+"("+subname+")" for subname in column.subcolumns()])
    else:
        headers.append(column.name())

In [20]:
column_count = py_interop_table.count_table_columns(columns)
row_offsets = py_interop_table.map_id_offset()
py_interop_table.count_table_rows(run_metrics, row_offsets)
data = numpy.zeros((row_offsets.size(), column_count), dtype=numpy.float32)
py_interop_table.populate_imaging_table_data(run_metrics, columns, row_offsets, data.ravel())

Select only the first row_count rows.


In [21]:
data=data[:row_count, :]

Select a subset of columns:

Lane, Tile, Cycle, % Occupied


In [22]:
header_subset = ["Lane", "Tile", "Cycle", "% Occupied"]
header_index = [(header, headers.index(header)) for header in header_subset]
ids = numpy.asarray([headers.index(header) for header in header_subset[:3]])
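
Note that `headers.index(...)` raises a `ValueError` if a header is absent, which can happen for "% Occupied" if the corresponding file was not loaded. The sketch below shows one way to separate found from missing headers before indexing; `find_headers` is a hypothetical helper, and the header lists are illustrative.

```python
def find_headers(headers, wanted):
    """Return (found, missing): found holds (name, index) pairs, missing holds absent names."""
    positions = {}
    for i, name in enumerate(headers):
        positions.setdefault(name, i)  # keep the first occurrence, like list.index
    found = [(name, positions[name]) for name in wanted if name in positions]
    missing = [name for name in wanted if name not in positions]
    return found, missing


found, missing = find_headers(["Lane", "Tile", "Cycle"], ["Lane", "Cycle", "% Occupied"])
print(found)    # [('Lane', 0), ('Cycle', 2)]
print(missing)  # ['% Occupied']
```

Checking `missing` first lets you report a clear message instead of failing in the middle of building the table.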

In [23]:
d = []
for label, col in header_index:
    d.append( (label, pd.Series([val for val in data[:, col]], index=[tuple(r) for r in data[:, ids]])))

Convert to a Pandas DataFrame object


In [24]:
df = pd.DataFrame.from_dict(dict(d))

Only display data from the first cycle


In [25]:
df = df.loc[df['Cycle'] == 1.0]

In [26]:
print(df.to_string(index=False))


% Occupied  Cycle  Lane    Tile
 85.699997    1.0   1.0  2101.0