Using the Illumina InterOp Library in Python: Part 4

Install

If you do not have the Python InterOp library installed, then you can do the following:

$ pip install -f https://github.com/Illumina/interop/releases/latest interop

You can verify that InterOp is properly installed:

$ python -m interop --test
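From inside a notebook, a quick check can also confirm that the `interop` package is importable by the current interpreter. This minimal sketch only tests whether Python can find a module named `interop`; it does not run the library's own self-test the way the command above does. The helper name `interop_installed` is hypothetical.

```python
import importlib.util

def interop_installed():
    """Return True if a module named "interop" can be imported (hypothetical helper)."""
    # find_spec returns None when no module with this name is on sys.path
    return importlib.util.find_spec("interop") is not None

print("InterOp importable:", interop_installed())
```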

Before you begin

If you plan to run this tutorial interactively, you need an example run folder that contains an IndexMetricsOut.bin file.

Please change the path below so that it points at the run folder you wish to use:


In [1]:
run_folder = ""
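Before loading, it can help to confirm that the folder you set above has the files this tutorial relies on. The sketch below assumes that the binary InterOp files live in an `InterOp` subfolder of the run folder and that `RunInfo.xml` sits at its root, which is the usual Illumina layout. Some instruments may need additional files (for example, `RunParameters.xml`), so treat the `REQUIRED_FILES` list and the `missing_index_files` helper as hypothetical and adjust them for your run.

```python
import os

# Files this sketch assumes the indexing tutorial needs; adjust for your run.
REQUIRED_FILES = ("RunInfo.xml", os.path.join("InterOp", "IndexMetricsOut.bin"))

def missing_index_files(run_folder):
    """Return the entries of REQUIRED_FILES that do not exist under run_folder."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(run_folder, name))]
```

For example, `missing_index_files(run_folder)` returns an empty list when both files are present.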

Getting SAV Indexing Tab-like Metrics

The run_metrics class encapsulates the model for all the individual InterOp files, as well as information from RunInfo.xml. The Modules page documents a subset of the application programming interface (API) for all the major classes in C++. The Python classes have the same names (with a few exceptions) and take the same parameters, so that page is also useful for finding how to access specific values loaded from the individual files.


In [2]:
from interop import py_interop_run_metrics, py_interop_run, py_interop_summary

In [3]:
run_metrics = py_interop_run_metrics.run_metrics()

By default, the run_metrics class loads all the InterOp files in the run folder:

run_metrics.read(run_folder)

The InterOp library can provide a list of all necessary InterOp files for a specific application. The following shows how to generate that list for the index summary statistics:


In [4]:
# One flag per InterOp metric type, all initialized to 0
valid_to_load = py_interop_run.uchar_vector(py_interop_run.MetricCount, 0)

In [ ]:
# Set the flags for the metrics needed by the index summary
py_interop_run_metrics.list_index_metrics_to_load(valid_to_load)

The run_metrics class can use this list to load only the required InterOp files as follows:


In [ ]:
run_metrics.read(run_folder, valid_to_load)

The index_flowcell_summary class encapsulates all the metrics displayed on the SAV Indexing tab. It has a tree-like structure: metrics summarizing the whole run are at the root, each lane summary is a branch, and each index count summary (one per sample) is a sub-branch of its lane.
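The tree can be walked with the `size()` and `at(i)` accessors used in the pandas cell later in this tutorial. This is a minimal sketch that assumes the flowcell summary also exposes `size()` (returning the number of lanes), mirroring the `lane_summary.size()` call used later; the generator name `iter_count_summaries` is hypothetical.

```python
def iter_count_summaries(flowcell_summary):
    """Yield (lane_index, count_summary) for every index count summary.

    Assumes both the flowcell summary (whose children are lanes) and each lane
    summary (whose children are count summaries) provide size() and at(i).
    lane_index is the 0-based position passed to at(), not a lane number.
    """
    for lane_index in range(flowcell_summary.size()):
        lane_summary = flowcell_summary.at(lane_index)
        for i in range(lane_summary.size()):
            yield lane_index, lane_summary.at(i)
```

Once the summary has been populated, a loop such as `for lane, count in iter_count_summaries(summary): print(lane, count.sample_id(), count.fraction_mapped())` would visit every sample in every lane.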


In [ ]:
summary = py_interop_summary.index_flowcell_summary()

The index_flowcell_summary object is populated from the run_metrics object as follows:


In [ ]:
py_interop_summary.summarize_index_metrics(run_metrics, summary)

Index Lane Summary

The index flowcell summary is composed of index lane summaries. Each index lane summary contains metrics summarizing the entire lane, along with child index count summaries, each of which describes a single sample.

Below, we use pandas to display the index count summaries for the first lane, as shown on the SAV Indexing tab:


In [ ]:
import pandas as pd

# (column label, index_count_summary accessor) pairs
columns = (('Index Number', 'id'), ('Sample Id', 'sample_id'), ('Project', 'project_name'),
           ('Index 1 (I7)', 'index1'), ('Index 2 (I5)', 'index2'),
           ('% Reads Identified (PF)', 'fraction_mapped'))
lane_summary = summary.at(0)  # summary for the first lane

# One index count summary per sample in the lane
count_summaries = [lane_summary.at(i) for i in range(lane_summary.size())]
# Build one column per accessor; dict insertion order sets the column order
df = pd.DataFrame({label: [getattr(count, func)() for count in count_summaries]
                   for label, func in columns},
                  index=[count.id() for count in count_summaries])
df
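With the table in a DataFrame, ordinary pandas operations apply. For instance, grouping by project totals the identified reads per project. The rows below are made-up placeholder values that reuse the column labels of the table above, so the sketch runs without a run folder; substitute the real `df`. Whatever units `fraction_mapped` reports, the per-project sums are in the same units.

```python
import pandas as pd

# Hypothetical placeholder rows using the same column labels as the table above
df_example = pd.DataFrame({
    'Sample Id': ['S1', 'S2', 'S3'],
    'Project': ['ProjA', 'ProjA', 'ProjB'],
    '% Reads Identified (PF)': [30.0, 25.0, 40.0],
})

# Total identified reads per project, largest first
per_project = (df_example.groupby('Project')['% Reads Identified (PF)']
               .sum()
               .sort_values(ascending=False))
print(per_project)
```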

You can also list the metrics available on a lane summary as follows:


In [ ]:
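# List the public accessors, skipping container helpers such as push_back and resize
excluded = ("set", "push_back", "reserve", "this", "resize", "clear", "sort")
print("\n".join(method for method in dir(lane_summary)
                if not method.startswith('_') and method not in excluded))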
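The same filtering can be wrapped in a small reusable function, so it also works on other summary objects, for example `public_methods(lane_summary.at(0))` for an index count summary. This is a sketch; the name `public_methods` and its default exclusion list are hypothetical, and the result includes any public attribute, not only callable methods.

```python
def public_methods(obj, exclude=("set", "push_back", "reserve", "this",
                                 "resize", "clear", "sort")):
    """Return the sorted public attribute names of obj, minus the excluded names."""
    return sorted(name for name in dir(obj)
                  if not name.startswith('_') and name not in exclude)

# Example: print("\n".join(public_methods(lane_summary)))
```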
