In [1]:
from gatb import Graph
In [2]:
graph = Graph('-in ../../DiscoSnp/test/large_test/discoRes_k_31_c_auto.h5') # chr1 with simulated variants
graph
Out[2]:
In [3]:
help(graph)
Iterate over branching nodes:
In [4]:
for i, node in enumerate(graph):
    print('{}: {!r}'.format(i, node))
    if i > 10:  # stop after the first 12 branching nodes
        break
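For reference, the sketch below shows one common definition of a branching node: a k-mer whose in-degree or out-degree differs from 1. It runs on a toy single-stranded de Bruijn graph stored as a Python set of bytes k-mers. It does not use the `gatb` package. The helper names `degrees` and `is_branching` and the example k-mers are hypothetical, and GATB's own definition, which works on canonical, bidirected nodes, may differ in details.

```python
# Toy single-stranded de Bruijn graph: a set of bytes k-mers (not the gatb API).
BASES = (b'A', b'C', b'G', b'T')


def degrees(kmer, kmers):
    """Return (in_degree, out_degree) by testing every one-base shift."""
    in_deg = sum(base + kmer[:-1] in kmers for base in BASES)
    out_deg = sum(kmer[1:] + base in kmers for base in BASES)
    return in_deg, out_deg


def is_branching(kmer, kmers):
    """A node is branching unless it has exactly one predecessor and one successor."""
    in_deg, out_deg = degrees(kmer, kmers)
    return in_deg != 1 or out_deg != 1


# Hypothetical k=3 example: TTACGAA and TTATGAA form a bubble between TTA and GAA.
kmers = {b'TTA', b'TAC', b'ACG', b'CGA', b'TAT', b'ATG', b'TGA', b'GAA'}
branching = sorted(k for k in kmers if is_branching(k, kmers))
print(branching)  # [b'GAA', b'TTA']
```

In this toy graph, only the source `TTA` (in-degree 0, out-degree 2) and the sink `GAA` (in-degree 2, out-degree 0) are branching. The six k-mers between them form two simple paths.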
Graph is a factory for Nodes:
In [5]:
kmer = b'AACGAGCACCAAAGACTTAGCATGAAAACCC'
node = graph[kmer]  # The k-mer must be an actual graph node or a neighbor of one
node
Out[5]:
In [6]:
help(node)
In [7]:
bytes(node)  # Convert to the k-mer encoded as a bytestring
Out[7]:
In [8]:
assert node.reversed == node  # A node compares equal to its reversed version
node.reversed
Out[8]:
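The `reversed` node presumably represents the same k-mer read on the opposite strand, which is why the assertion above holds. Below is a minimal pure-Python sketch of the reverse complement, and of picking one canonical representative per k-mer/reverse-complement pair. It assumes plain A/C/G/T bytes and uses lexicographic order, which need not match the nucleotide order GATB uses internally. The function names are hypothetical.

```python
# Reverse complement and canonical k-mer on bytes (independent of gatb).
_COMPLEMENT = bytes.maketrans(b'ACGT', b'TGCA')


def reverse_complement(kmer):
    """Complement each base, then reverse the order."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """One representative per k-mer/RC pair: here the lexicographic minimum."""
    return min(kmer, reverse_complement(kmer))


kmer = b'AACGAGCACCAAAGACTTAGCATGAAAACCC'
rc = reverse_complement(kmer)
assert canonical(kmer) == canonical(rc)  # both strands share one representative
print(rc)
```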
Query neighbors and degrees:
In [9]:
print(node.succs, node.out_degree)
print(node.preds, node.in_degree)
Query successors manually by extending the k-mer:
In [10]:
node_kmer = bytes(node)
for ext in b'ATGC':  # NB: iterating over bytes yields integer character codes
    ext = bytes((ext,))  # so this line rebuilds a single-character bytes object
    ext_kmer = node_kmer[1:] + ext  # drop the first base, append the extension
    ext_node = graph[ext_kmer]  # Construct the Node from the bytes-encoded k-mer
    if ext_node in graph:  # Check whether the node belongs to the graph
        print(ext_node)
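The cell above extends only to the right, so it finds successors. Predecessors can be found the same way by prepending a base and dropping the last one. The sketch below shows both directions on a toy set of k-mers that stands in for the graph. Membership in a Python set replaces `ext_node in graph`, and unlike a GATB graph it does not consider reverse complements. `extend` and the example k-mers are hypothetical.

```python
# Manual extension on a toy, single-stranded k-mer set (not the gatb API).
def extend(kmer, kmers, forward=True):
    """Return the neighbors of kmer obtained by a one-base shift."""
    neighbors = []
    for code in b'ATGC':  # iterating over bytes yields integer codes
        base = bytes((code,))  # so rebuild a one-character bytes object
        # Successor: drop first base and append; predecessor: prepend and drop last
        candidate = kmer[1:] + base if forward else base + kmer[:-1]
        if candidate in kmers:  # stands in for `ext_node in graph`
            neighbors.append(candidate)
    return neighbors


kmers = {b'TTA', b'TAC', b'ACG', b'CGA', b'TAT', b'ATG', b'TGA', b'GAA'}
print(extend(b'TTA', kmers))  # successors: [b'TAT', b'TAC']
print(extend(b'GAA', kmers, forward=False))  # predecessors: [b'TGA', b'CGA']
```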
Simple paths starting from the neighbors are obtained as a list of (path, end node, end reason) tuples:
In [11]:
node.paths
Out[11]:
Both paths end at the same k-mer: this is a bubble induced by an A/T difference at the first nucleotide of the path.
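To make the `(path, end node, end reason)` triples concrete, the sketch below walks simple paths on a toy k-mer set. It follows each successor while the current node has exactly one predecessor and one successor. It uses the end-reason codes that `check_paths` relies on (1 = out-branching, 2 = in-branching, checked first, 3 = dead-end). The extra code 0 for hitting a length limit, the `max_len` guard against cycles, and all names are assumptions of this sketch, not pyGATB behavior.

```python
# Simple-path walk on a toy, single-stranded k-mer set (not the gatb API).
BASES = (b'A', b'C', b'G', b'T')
OUT_BRANCHING, IN_BRANCHING, DEAD_END = 1, 2, 3  # codes as used by check_paths
LENGTH_LIMIT = 0  # hypothetical code for this sketch only


def successors(kmer, kmers):
    return [kmer[1:] + b for b in BASES if kmer[1:] + b in kmers]


def in_degree(kmer, kmers):
    return sum(b + kmer[:-1] in kmers for b in BASES)


def simple_paths(origin, kmers, max_len=1000):
    """Return one (path, end_node, end_reason) triple per successor of origin."""
    results = []
    for first in successors(origin, kmers):
        node, path, reason = first, first[-1:], None
        while reason is None:
            succs = successors(node, kmers)
            if in_degree(node, kmers) > 1:  # in-branching is checked first
                reason = IN_BRANCHING
            elif len(succs) > 1:
                reason = OUT_BRANCHING
            elif not succs:
                reason = DEAD_END
            elif len(path) >= max_len:  # guards against simple cycles
                reason = LENGTH_LIMIT
            else:
                node = succs[0]
                path += node[-1:]  # path holds the bases appended to origin
        results.append((path, node, reason))
    return results


kmers = {b'TTA', b'TAC', b'ACG', b'CGA', b'TAT', b'ATG', b'TGA', b'GAA'}
for path, end_node, end_reason in simple_paths(b'TTA', kmers):
    assert (b'TTA' + path).endswith(end_node)
    print(path, end_node, end_reason)
```

Both paths from `TTA` end at `GAA` with reason 2. This is the toy counterpart of a bubble: two simple paths that leave the same branching node and rejoin at the same in-branching node.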
This code checks the forward-path results for all branching nodes and their reverse complements (RCs):
In [12]:
def check_paths(origin_node):
    origin_node_kmer = bytes(origin_node)
    for path, end_node, end_reason in origin_node.paths:
        # The origin k-mer extended by the path must end with the end node's k-mer
        assert (origin_node_kmer + path).endswith(bytes(end_node))
        if end_reason == 2:  # In-branching, prioritized over out-branching
            assert end_node.in_degree > 1
        elif end_reason == 1:  # Out-branching
            assert end_node.out_degree > 1
        elif end_reason == 3:  # Dead-end
            assert end_node.out_degree == 0

for origin_node in graph:
    check_paths(origin_node)
    origin_node.reverse()  # In-place reverse complement
    check_paths(origin_node)
A graph whose vertices are branching nodes and whose edges are simple paths is built by a bounded breadth-first search over forward simple paths.
The start node is the running example (AACGAGCACCAAAGACTTAGCATGAAAACCC), a node preceding a bubble.
The edges (paths) visited by the BFS are then displayed with `igraph`; the number on each edge is the length of the corresponding simple path:
In [13]:
from plot_graph import bfs_igraph
bfs_igraph(node, max_depth=20)
Out[13]:
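`bfs_igraph` comes from a local `plot_graph` module whose source is not shown here. The sketch below is only a guess at the traversal part. It is a breadth-first search over a graph whose vertices are branching nodes and whose edges carry simple-path lengths, and it stops expanding `max_depth` hops from the start. Several details are assumptions: the `edges_of` callback, which returns `(path_length, end_node)` pairs (for instance `len(path)` and `end_node` taken from `node.paths`), the hop-based meaning of `max_depth`, and the example adjacency.

```python
from collections import deque


def bounded_bfs(start, edges_of, max_depth):
    """Collect (source, target, path_length) edges reachable within max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    edges = []
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:  # do not expand nodes at the depth bound
            continue
        for length, target in edges_of(node):
            edges.append((node, target, length))  # parallel edges are kept
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return edges


# Hypothetical graph: two parallel simple paths (a bubble) from 'A' to 'B'.
adjacency = {'A': [(4, 'B'), (4, 'B')], 'B': [(10, 'C'), (7, 'D')], 'D': [(3, 'E')]}
print(bounded_bfs('A', lambda n: adjacency.get(n, []), max_depth=2))
# [('A', 'B', 4), ('A', 'B', 4), ('B', 'C', 10), ('B', 'D', 7)]
```

Parallel edges are kept on purpose. A bubble then shows up as two edges between the same pair of branching nodes, which is what makes it visible in a plot.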