You can write your own scripts using pergola as a Python library. Here we summarize some examples of how you can use it in your scripts.
This notebook is available under /pergola/doc/notebooks in the pergola GitHub repository. Should you want, you can interactively execute the code using Jupyter.
The two basic data inputs pergola uses are a file with longitudinal recordings (a sequence of temporal events) in CSV or xlsx format, and a mapping file containing the correspondence between the fields in that file and the pergola ontology.
Pergola can process any sequence of temporal events contained in a character-separated file as in the example below:
Animal StartT EndT Behavior Value
1 137 156 eat 0.06
1 168 192 drink 0.02
1 250 281 eat 0.07
1 311 333 eat 0.08
1 457 482 drink 0.02
1 569 601 drink 0.03
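To make the input layout concrete, here is a minimal sketch that reads the same records with Python's standard csv module. It is not pergola code: the tab delimiter, the `read_events` helper and the tuple layout are assumptions made only for this illustration.

```python
import csv
import io

# Sample recordings copied from the example above.
# The tab delimiter is an assumption; any single character would work.
SAMPLE = (
    "Animal\tStartT\tEndT\tBehavior\tValue\n"
    "1\t137\t156\teat\t0.06\n"
    "1\t168\t192\tdrink\t0.02\n"
    "1\t250\t281\teat\t0.07\n"
    "1\t311\t333\teat\t0.08\n"
    "1\t457\t482\tdrink\t0.02\n"
    "1\t569\t601\tdrink\t0.03\n"
)


def read_events(handle, delimiter="\t"):
    """Return a list of (animal, start, end, behavior, value) tuples."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    events = []
    for row in reader:
        events.append((row["Animal"], int(row["StartT"]), int(row["EndT"]),
                       row["Behavior"], float(row["Value"])))
    return events


events = read_events(io.StringIO(SAMPLE))
```

Each event keeps its start and end time, which is what lets a sequence of behaviours be treated as a set of intervals.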
Pergola requires you to set the equivalences between the fields of the input data and a controlled vocabulary defined by the pergola ontology. The mapping file uses the external mapping file format from the Gene Ontology Consortium; you can see an example below:
! Mapping of behavioural fields into genome browser fields
!
behavioural_file:Animal > pergola:track
behavioural_file:StartT > pergola:chromStart
behavioural_file:EndT > pergola:chromEnd
behavioural_file:Behavior > pergola:dataTypes
behavioural_file:Value > pergola:dataValue
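The structure of this file can be illustrated with a short parser. This is only a sketch of the format shown above, not pergola's own loader: the `parse_mapping` function is hypothetical, and it assumes that lines starting with "!" are comments and that each remaining line has the form `source:field > pergola:term`.

```python
MAPPING_TEXT = """\
! Mapping of behavioural fields into genome browser fields
!
behavioural_file:Animal > pergola:track
behavioural_file:StartT > pergola:chromStart
behavioural_file:EndT > pergola:chromEnd
behavioural_file:Behavior > pergola:dataTypes
behavioural_file:Value > pergola:dataValue
"""


def parse_mapping(text):
    """Map each input field name to its ontology term (hypothetical helper)."""
    correspondence = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "!" comment lines.
        if not line or line.startswith("!"):
            continue
        source, target = [part.strip() for part in line.split(">")]
        # Drop the "behavioural_file:" and "pergola:" prefixes.
        field = source.split(":", 1)[1]
        term = target.split(":", 1)[1]
        correspondence[field] = term
    return correspondence


correspondence = parse_mapping(MAPPING_TEXT)
```

With this sketch, `correspondence["EndT"]` evaluates to `"chromEnd"`.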
In [1]:
# You might have to set the path to run this notebook directly from ipython notebook
import sys
my_path_to_modules = "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/"
sys.path.append(my_path_to_modules)
Mappings between the input data and pergola ontology are loaded in MappingInfo objects:
In [2]:
from pergola import mapping
# load mapping file
mapping_info = mapping.MappingInfo("../../sample_data/feeding_behavior/b2p.txt")
To view the mappings, MappingInfo objects provide the write method:
In [3]:
mapping_info.write()
MappingInfo objects are needed to load data into IntData objects, as explained below.
IntData objects load all the intervals of a file:
In [7]:
from pergola import parsers
from pergola import intervals
# load the data into an IntData object that will store the sequence of events
int_data = intervals.IntData("../../sample_data/feeding_behavior/feeding_behavior_HF_mice.csv", map_dict=mapping_info.correspondence)
Once loaded, intervals are stored in a list of tuples that can be accessed through the data attribute:
In [8]:
#Displays first 10 tuples of data list
int_data.data[:10]
Out[8]:
IntData objects also provide some other attributes, such as the set of different data types (dataTypes, the term for types of data in the pergola ontology) contained in the data:
In [9]:
int_data.data_types
Out[9]:
The minimum value present in the data:
In [10]:
int_data.min
Out[10]:
The maximum value:
In [11]:
int_data.max
Out[11]:
The set of different tracks present in the data (the term for different IDs in the pergola ontology). In this case, the different IDs correspond to each mouse:
In [12]:
int_data.tracks
Out[12]:
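The attributes above can be pictured as simple summaries over the list of tuples. The sketch below is plain Python and does not use pergola. The tuple layout (track, start, end, data type, value) and the reading of min and max as the earliest start and latest end time are assumptions made for illustration.

```python
# Hypothetical records laid out as (track, start, end, data_type, value).
records = [
    ("1", 137, 156, "eat", 0.06),
    ("1", 168, 192, "drink", 0.02),
    ("1", 250, 281, "eat", 0.07),
    ("1", 311, 333, "eat", 0.08),
    ("1", 457, 482, "drink", 0.02),
    ("1", 569, 601, "drink", 0.03),
]


def summarize(rows):
    """Collect the distinct tracks and data types, and the covered time span."""
    return {
        "tracks": set(row[0] for row in rows),
        "data_types": set(row[3] for row in rows),
        # Assumption: min/max are the earliest start and the latest end.
        "min": min(row[1] for row in rows),
        "max": max(row[2] for row in rows),
    }


summary = summarize(records)
```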
And finally, the dataTypes (the term for different types of data in the pergola ontology), which can be used to encode, for example, different behaviours:
In [14]:
mapping_info.write()
In [15]:
mapping_info.correspondence['EndT']
Out[15]:
GenomicContainer is a generic class from which three subclasses derive:
Data can be loaded into a Track object with the read function. This function can convert the intervals to relative values, using the first time point as 0:
In [16]:
int_data_read = int_data.read(relative_coord=True)
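The idea behind relative coordinates can be sketched in a few lines of plain Python. This is not the implementation behind read: the `to_relative` helper is hypothetical and assumes that the origin is the earliest start time among the intervals passed in.

```python
def to_relative(intervals):
    """Shift (start, end) pairs so that the earliest start becomes 0."""
    origin = min(start for start, _ in intervals)
    return [(start - origin, end - origin) for start, end in intervals]


# The three "eat" events of the sample input file.
eat_absolute = [(137, 156), (250, 281), (311, 333)]
eat_relative = to_relative(eat_absolute)
```

Here the first event starts at 137, so every coordinate is shifted by 137 and the first interval becomes (0, 19).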
In [17]:
int_data_read.list_tracks
Out[17]:
In [18]:
int_data_read.range_values
Out[18]:
In [19]:
dict_bed = int_data_read.convert(mode='bed')
In [20]:
#dict_bed = data_read.convert(mode='bed')
for key in dict_bed:
    print "key.......: ", key
    bedSingle = dict_bed[key]
    print "::::::::::::::", bedSingle.data_types
In [21]:
bed_12_food_sc = dict_bed[('2', 'food_sc')]
In [22]:
bed_12_food_sc.range_values
Out[22]:
In [23]:
type(bed_12_food_sc)
Out[23]:
In [24]:
bed_12_food_sc.data
# Code to print the data inside a bed object (generator object)
#for row in bed_12_food_sc.data:
# print row
Out[24]:
In [25]:
dict_bedGraph = int_data_read.convert(mode='bedGraph')
In [26]:
for key in dict_bedGraph:
    print "key.......: ", key
    bedGraphSingle = dict_bedGraph[key]
    print "::::::::::::::", bedGraphSingle.data_types
In [27]:
bedG_8_food_sc = dict_bedGraph[('8', 'food_sc')]
In [28]:
bedG_8_food_sc.data
# Code to print the data inside a bedGraph object (generator object)
#for row in bedG_8_food_sc.data:
#    print row
Out[28]:
In [29]:
type(int_data_read)
Out[29]:
In [30]:
type(int_data_read.data)
Out[30]:
In [31]:
int_data_read.range_values
Out[31]:
In [32]:
int_data_read.list_tracks
Out[32]:
In [33]:
int_data_read.data[-10]
Out[33]:
In [34]:
int_data_read.data_types
Out[34]:
In [35]:
#data_read.convert(mode=write_format, tracks=sel_tracks, tracks_merge=tracks2merge,
# data_types=data_types_list, dataTypes_actions=dataTypes_act,
# window=window_size)
In [36]:
mapping.write_chr(int_data_read)
In [37]:
# Generate a cytoband file and a bed file with phases
mapping.write_cytoband(end = int_data.max - int_data.min, delta=43200, start_phase="dark", lab_bed=False)
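To picture what alternating phases of 43200 seconds (12 hours) look like, here is a small sketch in plain Python. It does not reproduce write_cytoband, whose output format and handling of the last phase are not shown here. The `phase_intervals` helper is hypothetical and simply truncates the final phase at `end`.

```python
def phase_intervals(end, delta=43200, start_phase="dark"):
    """Return (start, stop, phase) tuples alternating dark and light phases."""
    if start_phase == "dark":
        phases = ("dark", "light")
    else:
        phases = ("light", "dark")
    result = []
    start = 0
    index = 0
    while start < end:
        # Assumption: the last phase is cut at the end of the recording.
        stop = min(start + delta, end)
        result.append((start, stop, phases[index % 2]))
        start = stop
        index += 1
    return result


phases = phase_intervals(100000)
```

For a recording of 100000 seconds this yields one full dark phase, one full light phase and a shorter final dark phase.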
In [38]:
#data_read = intData.read(relative_coord=True, multiply_t=1)
data_read = int_data.read(relative_coord=True)
In [39]:
#for i in data_read.data:
# print i
In [40]:
data_type_col = {'food_sc': 'orange', 'food_fat':'blue'}
In [41]:
bed_str = data_read.convert(mode="bed", data_types=["food_sc", "food_fat"], dataTypes_actions="all",
                            color_restrictions=data_type_col)
In [42]:
for key in bed_str:
    bedSingle = bed_str[key]
    bedSingle.save_track()
pergola allows conversion to several genomic formats. Here we summarize some commands and operations as examples of pergola's capabilities:
track type=bed name="1_eat" description="1 eat" visibility=2 itemRgb="On" priority=20
chr1 137.0 156.0 "" 0.06 + 137.0 156.0 51,254,51
chr1 250.0 281.0 "" 0.07 + 250.0 281.0 0,254,0
chr1 311.0 333.0 "" 0.08 + 311.0 333.0 25,115,25
track type=bed name="1_eat" description="1 eat" visibility=2 itemRgb="On" priority=20
chr1 0 19 "" 0.06 + 0 19 51,254,51
chr1 113 144 "" 0.07 + 113 144 0,254,0
chr1 174 196 "" 0.08 + 174 196 25,115,25
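The column layout of the records above can be sketched with a small formatter. This is an illustration of the listing only, not pergola's writer: the `format_bed_line` helper, the tab separator and passing the colour in explicitly are all assumptions.

```python
def format_bed_line(chrom, start, end, value, rgb, strand="+"):
    """Build one BED record with the nine columns used in the listing above."""
    fields = [
        chrom,
        str(start),
        str(end),
        '""',          # empty name column, as in the listing
        str(value),    # score column carries the data value
        strand,
        str(start),    # thickStart repeats the interval start
        str(end),      # thickEnd repeats the interval end
        ",".join(str(channel) for channel in rgb),
    ]
    # Assumption: fields are tab-separated, as is usual for BED files.
    return "\t".join(fields)


line = format_bed_line("chr1", 0, 19, 0.06, (51, 254, 51))
```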
In [43]:
data_type_col_bedGraph = {'food_sc':'orange', 'food_fat_food_sc':'blue'}
In [44]:
bedGraph_str = data_read.convert(mode="bedGraph", window=1800, data_types=["food_sc", "food_fat"], dataTypes_actions="all", color_restrictions=data_type_col_bedGraph)
In [45]:
for key in bedGraph_str:
    bedGraph_single = bedGraph_str[key]
    bedGraph_single.save_track()
track type=bedGraph name="1_eat" description="1_eat" visibility=full color=0,254,0 altColor=25,115,25 priority=20
chr1 0 30 0.06
chr1 30 60 0
chr1 60 90 0
chr1 90 120 0.0158064516129
chr1 120 150 0.0541935483871
chr1 150 180 0.0218181818182
chr1 180 210 0.0581818181818
chr1 210 240 0
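The values in this listing are consistent with spreading each interval's value over fixed windows in proportion to the overlap. The sketch below shows that calculation in plain Python with 30-unit windows, as in the listing. It is an assumption about how the numbers arise, not pergola's implementation, and the `bin_intervals` helper is hypothetical.

```python
def bin_intervals(intervals, window):
    """Spread each (start, end, value) over fixed windows by overlap fraction."""
    last_end = max(end for _, end, _ in intervals)
    n_windows = -(-last_end // window)  # ceiling division
    totals = [0.0] * n_windows
    for start, end, value in intervals:
        length = float(end - start)
        if length <= 0:
            continue  # ignore empty intervals
        for index in range(start // window, (end - 1) // window + 1):
            w_start = index * window
            w_end = w_start + window
            overlap = min(end, w_end) - max(start, w_start)
            # Each window receives the share of the value that falls inside it.
            totals[index] += value * overlap / length
    return [(i * window, (i + 1) * window, total)
            for i, total in enumerate(totals)]


# The "eat" events of animal 1 in relative coordinates.
eat = [(0, 19, 0.06), (113, 144, 0.07), (174, 196, 0.08)]
binned = bin_intervals(eat, window=30)
```

For example, the interval 113 to 144 lasts 31 time units, 7 of which fall in the window 90 to 120, so that window receives 0.07 * 7 / 31, approximately 0.0158.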
In [48]:
## Bed file showing the files (recordings)
# reading correspondence file
mapping_file_data = mapping.MappingInfo("../../sample_data/feeding_behavior/f2g.txt")
In [49]:
mapping_file_data.write()
In [50]:
# Reading file info
files_data = intervals.IntData("../../sample_data/feeding_behavior/files.csv", map_dict=mapping_file_data.correspondence)
data_file_read = files_data.read(relative_coord=True)
In [51]:
bed_file = data_file_read.convert(mode="bed", dataTypes_actions="all", tracks_merge=files_data.tracks)
In [52]:
for key in bed_file:
    bed_file_single = bed_file[key]
    bed_file_single.save_track(name_file="files_data")
In [53]:
# Reading phase info
phase_data = intervals.IntData("../../sample_data/feeding_behavior/phases_exp.csv", map_dict=mapping_file_data.correspondence)
data_phase_read = phase_data.read(relative_coord=True)
In [54]:
bed_file = data_phase_read.convert(mode="bed", dataTypes_actions="all", tracks_merge=phase_data.tracks)
In [55]:
for key in bed_file:
    bed_file_single = bed_file[key]
    bed_file_single.save_track(name_file="phase_exp")
means bed file to delete
chr1 1 1801 "" 1000 + 0 1 0.06
chr1 137171 138971 "" 1000 + 132936 137171 0
chr1 397442 399242 "" 1000 + 391684 397442 0
chr1 568633 570433 "" 1000 + 563646 568633 0.125714
intermeal to delete
chr1 1 30 "" 1000 + 1 30 0
chr1 183 345 "" 1000 + 183 345 0
chr1 502 924 "" 1000 + 502 924 0