pyOpenCGA Basic User Usage


[NOTE] The server methods used by pyopencga client are defined in the following swagger URL:

For tutorials and more information about accessing the OpenCGA REST API, please read the documentation at http://docs.opencb.org/display/opencga/Python

Loading pyOpenCGA


In [1]:
# Initialize PYTHONPATH for pyopencga
import sys
import os
from pprint import pprint

Now it is time to import the pyopencga modules.

You have two options:

a) You can import pyopencga directly (skip the next section) if you have installed it with pip install pyopencga (remember to use sudo unless you are using your own Python install or a virtualenv).

b) If you need to import from the source code, remember that Python 3 does not support implicit relative imports, so you need to append the module path to sys.path.

Preparing environment for importing from source


In [2]:
cwd = os.getcwd()
print("current_dir: ...."+cwd[-10:])

base_modules_dir = os.path.dirname(cwd)
print("base_modules_dir: ...."+base_modules_dir[-10:])

sys.path.append(base_modules_dir)


current_dir: ..../notebooks
base_modules_dir: ..../pyOpenCGA

Importing pyopencga


In [3]:
from pyopencga.opencga_config import ClientConfiguration 
from pyopencga.opencga_client import OpenCGAClient
import json

Creating some useful functions to manage the results


In [28]:
def get_not_private_methods(client):
    all_methods = dir(client)
    
    # show all methods (except the ones starting with "_", as they are private for the API)
    methods = [method for method in all_methods if not method.startswith("_")]
    return methods
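
A name-based filter like the one above cannot tell callables from plain attributes, so the returned list may mix both. The following is a minimal sketch of one way to separate them. `split_public_members` and the `FakeClient` stand-in are hypothetical names used for illustration only; they are not part of pyopencga.

```python
def split_public_members(obj):
    """Return (methods, attributes) for the public names of obj."""
    methods, attributes = [], []
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and dunder names
        member = getattr(obj, name)
        # callable() separates methods from plain data attributes
        (methods if callable(member) else attributes).append(name)
    return methods, attributes


class FakeClient:
    """Hypothetical stand-in for a client object; not a pyopencga class."""
    token = None

    def info(self):
        return {}

    def _private(self):
        return None


methods, attributes = split_public_members(FakeClient())
print("methods:", methods)
print("attributes:", attributes)
```

The same function could be called on a real client object such as oc.users, assuming that reading its public attributes has no side effects.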

Setup client and login

Configuration and Credentials

You need to provide a server URL in the standard OpenCGA configuration format, either as a dict or in a JSON file.

Regarding credentials, if you don't pass the password, you will be prompted for it interactively, without echo.
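
As a rough sketch of the JSON-file route, the configuration can be kept in a file and read back into a plain dict with the standard json module. The file name client_config.json and the temporary directory are assumptions made for illustration. The dict has the same shape as the one passed to ClientConfiguration in this notebook.

```python
import json
import os
import tempfile

# Same structure as the configuration dict used in this notebook.
config_dict = {"rest": {"host": "http://bioinfo.hpc.cam.ac.uk/opencga-demo"}}

# Write the configuration to a JSON file (temporary path, for illustration only).
config_path = os.path.join(tempfile.mkdtemp(), "client_config.json")
with open(config_path, "w") as fh:
    json.dump(config_dict, fh, indent=2)

# Read it back into a plain dict.
with open(config_path) as fh:
    loaded_config = json.load(fh)

print(loaded_config["rest"]["host"])
```

The loaded dict can then be used wherever a hand-written configuration dict would be.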


In [4]:
# server host

# user credentials
user = "demo"
passwd = "demo"

# the demo user accesses projects owned by the opencga user
prj_owner = "opencga"

Creating ClientConfiguration for server connection configuration


In [5]:
# Creating ClientConfiguration dict
host = 'http://bioinfo.hpc.cam.ac.uk/opencga-demo'

config_dict = {
    "rest": {
        "host": host
    }
}

print("Config information:\n", config_dict)


Config information:
 {'rest': {'host': 'http://bioinfo.hpc.cam.ac.uk/opencga-demo'}}

Initialize the client configuration

You can pass a dictionary to the ClientConfiguration


In [6]:
config = ClientConfiguration(config_dict)
oc = OpenCGAClient(config)

Log in


In [8]:
# pass only the user name so that the password is requested interactively
oc.login(user)


You are now connected to OpenCGA

Working with Users


In [29]:
# Listing available methods for the user client object
user_client = oc.users

# showing all methods (except the ones starting with "_", as they are private for the API)
get_not_private_methods(user_client)


Out[29]:
['auto_refresh',
 'configs',
 'create',
 'delete',
 'filters',
 'info',
 'login',
 'login_handler',
 'logout',
 'on_retry',
 'projects',
 'refresh_token',
 'session_id',
 'token',
 'update',
 'update_configs',
 'update_filter',
 'update_filters',
 'update_password']

In [17]:
## getting user information
## [NOTE] The user client takes the query_id string directly --> (user)
uc_info = user_client.info(user).responses[0]['results'][0]

print("user info:")
print("name: {}\towned_projects: {}".format(uc_info["name"], len(uc_info["projects"])))


user info:
name: Demo	owned_projects: 0

The demo user has no projects of its own, but it has access to some projects owned by the opencga user.

Let's see how to find them.

We need to list the project info from the project client, not from the user client.

We use the search() method.

Remember that OpenCGA REST objects encapsulate the result inside the responses property, so we need to access the first element of the responses array.
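
The access pattern responses[0]["results"] can be wrapped in a small helper that also copes with an empty response. This is a sketch only: `first_results` is a hypothetical helper, and `fake_response` is a stand-in object that mimics the shape used in this notebook, not a real pyopencga response.

```python
from types import SimpleNamespace


def first_results(response):
    """Return the 'results' list of the first response, or [] if there is none."""
    if not response.responses:
        return []
    return response.responses[0].get("results", [])


# Hypothetical stand-in with the same shape as the responses in this notebook.
fake_response = SimpleNamespace(
    responses=[
        {"results": [{"name": "Exomes GRCh37", "fqn": "opencga@exomes_grch37"}]}
    ]
)

for project in first_results(fake_response):
    print("Name: {}\tfull_id: {}".format(project["name"], project["fqn"]))
```

Returning an empty list for an empty response keeps the calling loop free of index checks.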


In [37]:
## Getting user projects
## [NOTE] Client-specific methods take the query_id as a key:value pair (e.g. user=user_id)
project_client = oc.projects
projects_info = project_client.search().responses[0]["results"]

for project in projects_info:
    print("Name: {}\tfull_id: {}".format(project["name"], project["fqn"]))


Name: Exomes GRCh37	full_id: opencga@exomes_grch37

The demo user has access to one project, opencga@exomes_grch37.

Note: in OpenCGA, projects and studies have a fully qualified name (fqn) with the format [owner]@[project]:[study]
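
To make the format concrete, here is a minimal sketch that splits such a name into its parts. `parse_fqn` is a hypothetical helper written for this notebook, not a pyopencga function. It assumes the study part is optional, as in the project-level name shown above.

```python
def parse_fqn(fqn):
    """Split '[owner]@[project]:[study]' into (owner, project, study).

    study is None when the name refers only to a project.
    """
    owner, sep, rest = fqn.partition("@")
    if not sep or not owner or not rest:
        raise ValueError("expected '[owner]@[project]' in {!r}".format(fqn))
    project, _, study = rest.partition(":")
    return owner, project, study or None


print(parse_fqn("opencga@exomes_grch37"))
print(parse_fqn("opencga@exomes_grch37:corpasome"))
```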

Working with Projects


In [32]:
project_client = oc.projects

get_not_private_methods(project_client)


Out[32]:
['aggregation_stats',
 'auto_refresh',
 'create',
 'delete',
 'increment_release',
 'info',
 'login_handler',
 'on_retry',
 'search',
 'session_id',
 'studies',
 'token',
 'update']

In [38]:
## Getting all projects from the logged-in user
project_client = oc.projects
projects_list = project_client.search().responses[0]["results"]

for project in projects_list:
    print("Name: {}\tfull_id: {}".format(project["name"], project["fqn"]))


Name: Exomes GRCh37	full_id: opencga@exomes_grch37

In [56]:
## Getting information from a specific project
project_name = 'exomes_grch37'
project_info = project_client.info(project_name).responses[0]['results'][0]

# show the studies
for study in project_info['studies']:
    print("project:{}\nstudy:{}\ttype:{}".format(project_name, study['name'], study['type'] ))
    print('--')


project:exomes_grch37
study:Corpasome	type:CASE_CONTROL
--
project:exomes_grch37
study:CEPH Trio	type:CASE_CONTROL
--

In [58]:
## Fetching the studies from a project using the studies method
results = project_client.studies(project_name).responses[0]['results']
for result in results:
    pprint(result)


{'attributes': {},
 'cipher': 'none',
 'cohorts': [],
 'creationDate': '20190604154741',
 'dataStores': {},
 'datasets': [],
 'description': '',
 'experiments': [],
 'files': [],
 'fqn': 'opencga@exomes_grch37:corpasome',
 'groups': [{'id': '@members',
             'name': '@members',
             'userIds': ['opencga', 'demo']},
            {'id': '@admins', 'name': '@admins', 'userIds': []}],
 'id': 'corpasome',
 'individuals': [],
 'jobs': [],
 'lastModified': '20190604154741',
 'modificationDate': '20190604154741',
 'name': 'Corpasome',
 'panels': [],
 'permissionRules': {},
 'release': 1,
 'samples': [],
 'size': 0,
 'stats': {},
 'status': {'date': '20190604154741', 'message': '', 'name': 'READY'},
 'type': 'CASE_CONTROL',
 'uri': 'file:///mnt/data/opencga-demo/sessions/users/opencga/projects/1/2/',
 'uuid': 'Iyy1cwFrAAIAAViPXu86gw',
 'variableSets': []}
{'attributes': {},
 'cipher': 'none',
 'cohorts': [],
 'creationDate': '20190617155526',
 'dataStores': {},
 'datasets': [],
 'description': 'Data generated from replicates of the CEPH trio NA12878, '
                'NA12891 and NA12892 sequenced on NextSeq 500 using V2 '
                'reagents. Samples prepared using Nextera Rapid Capture Exome '
                'reagent kit.',
 'experiments': [],
 'files': [],
 'fqn': 'opencga@exomes_grch37:ceph_trio',
 'groups': [{'id': '@members',
             'name': '@members',
             'userIds': ['opencga', 'demo']},
            {'id': '@admins', 'name': '@admins', 'userIds': []}],
 'id': 'ceph_trio',
 'individuals': [],
 'jobs': [],
 'lastModified': '20190617155526',
 'modificationDate': '20190617155526',
 'name': 'CEPH Trio',
 'panels': [],
 'permissionRules': {},
 'release': 1,
 'samples': [],
 'size': 0,
 'stats': {},
 'status': {'date': '20190617155526', 'message': '', 'name': 'READY'},
 'type': 'CASE_CONTROL',
 'uri': 'file:///mnt/data/opencga-demo/sessions/users/opencga/projects/1/16/',
 'uuid': 'ZiZ7WgFrAAIAATfAQacuCw',
 'variableSets': []}
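
The date fields in the output above are compact strings. Judging from the sample values, they appear to follow a year-month-day-hour-minute-second layout. That layout is an assumption inferred from this output, not a documented guarantee, and the time zone is unknown. Under that assumption, a hypothetical helper can turn them into datetime objects:

```python
from datetime import datetime


def parse_opencga_date(value):
    """Parse a 14-digit date string, assuming the layout YYYYMMDDHHMMSS."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


# Values copied from the study output above.
studies = [
    {"id": "corpasome", "creationDate": "20190604154741"},
    {"id": "ceph_trio", "creationDate": "20190617155526"},
]

for study in studies:
    created = parse_opencga_date(study["creationDate"])
    print("{}\tcreated: {}".format(study["id"], created.isoformat()))
```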