Functional genomics data is often very sparse, difficult to normalize ahead of time, and sometimes very large.
celldb is here to help make life easier! celldb is a client for a distributed key-value store that lets you work with these data quickly and easily.
We'll start by running a Docker container that runs Redis.
Run this command in a separate terminal:
sudo docker run -itd -p 6379:6379 redis
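Before connecting, it can help to confirm that the container is accepting connections. The sketch below is a hypothetical helper (redis_is_up is not part of celldb) that sends Redis's plain-text inline PING command over a raw socket using only the standard library. It assumes the default port 6379 published by the docker command.

```python
import socket


def redis_is_up(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG.

    Sends the plain-text (inline) form of PING, so no Redis client library
    is required.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Connection refused, timeout, or unreachable host.
        return False
    return reply.startswith(b"+PONG")
```

If redis_is_up() returns False, `docker ps` should show whether the redis container is actually running.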
After importing the celldb client, we connect to the Redis server on localhost.
In [4]:
from celldb import client as celldb
connection = celldb.connect("localhost")
Note that we save the returned connection: every later client call takes this connection object as its first argument.
At first, there are no data in celldb, but the list_samples function will still return an iterator.
In [5]:
samples_iter = celldb.list_samples(connection)
print(samples_iter)
To materialize an iterator as a list, call the built-in list function on it.
In [6]:
samples_list = list(samples_iter)
print(samples_list)
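One caveat: an iterator can generally be consumed only once. Assuming list_samples returns a one-shot iterator such as a generator (the tutorial only says "an iterator"), the pure-Python sketch below shows what a second pass yields. fake_list_samples is a hypothetical stand-in, not a celldb function.

```python
def fake_list_samples():
    """Hypothetical stand-in for a call that returns a one-shot iterator."""
    yield from ["sample_A", "sample_B"]


samples_iter = fake_list_samples()
first_pass = list(samples_iter)   # consumes every item
second_pass = list(samples_iter)  # the iterator is exhausted, nothing is left

print(first_pass)   # ['sample_A', 'sample_B']
print(second_pass)  # []
```

If you need the results more than once, keep the list, or call the listing function again for a fresh iterator.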
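You can also loop over the results directly with a for loop, or apply list-style patterns such as generator expressions. We'll look at this again later. Since list() may already have exhausted samples_iter, we request a fresh iterator here.
In [7]:
for sample_id in celldb.list_samples(connection):
    # the database is still empty, so the loop body never runs
    print(sample_id)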
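A generator expression applies a list-style pattern lazily to any iterable. The sample IDs below are made up for illustration; in practice the input could be the iterator returned by list_samples.

```python
# Hypothetical sample IDs standing in for the iterator returned by list_samples.
sample_ids = iter(["sample_A", "control_1", "sample_B"])

# The generator expression filters lazily: nothing runs until it is consumed.
treated = (s for s in sample_ids if s.startswith("sample_"))

# Built-ins such as sorted() accept any iterable, including a generator.
treated_sorted = sorted(treated)
print(treated_sorted)  # ['sample_A', 'sample_B']
```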
RNA expression quantifications tell us how much a given gene was expressed in a given sample. Here we create a simple sample called sample_A and quantify two genes, gene_1 and gene_2, with values 0.5 and 1.0, respectively.
Upserting means either inserting or updating. Because samples are uniquely identified by their sampleId, a single value can be updated without inserting a new column or rewriting the entire row.
In [8]:
sampleId = "sample_A"
features = ["gene_1", "gene_2"]
values = [0.5, 1.0]
print(celldb.upsert_sample(connection, sampleId, features, values))
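To make the upsert semantics concrete, here is a minimal in-memory sketch, assuming each sample is stored as a mapping from feature name to value. toy_upsert and the dict layout are hypothetical illustrations of the idea, not how celldb actually stores data in Redis.

```python
from typing import Dict, Sequence

# Hypothetical layout: sampleId -> {feature: value}.
# Only quantified features are stored, so sparse rows stay small.
Store = Dict[str, Dict[str, float]]


def toy_upsert(store: Store, sample_id: str,
               features: Sequence[str], values: Sequence[float]) -> None:
    """Insert or update individual feature values for one sample."""
    if len(features) != len(values):
        raise ValueError("features and values must have the same length")
    row = store.setdefault(sample_id, {})  # create the row only if it is new
    row.update(zip(features, values))      # touch only the given features


store: Store = {}
toy_upsert(store, "sample_A", ["gene_1", "gene_2"], [0.5, 1.0])
toy_upsert(store, "sample_A", ["gene_2"], [2.0])  # update a single value
print(store)  # {'sample_A': {'gene_1': 0.5, 'gene_2': 2.0}}
```

Because each row is a mapping, the second call changes only gene_2 and leaves gene_1 untouched.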
A convenience method is offered for collecting the list of sampleIds.
In [9]:
samples = celldb.list_samples(connection)
sample_list = list(samples)
We can also discover the features that have been quantified:
In [10]:
features = celldb.list_features(connection)
feature_list = list(features)
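One way a feature listing could be derived from sparse data is as the union of feature names across all samples. The sketch below assumes a hypothetical dict layout mapping each sampleId to {feature: value}; celldb may well maintain a separate feature index instead, so treat this as an illustration of the concept only.

```python
from typing import Dict, Iterator

Store = Dict[str, Dict[str, float]]


def iter_features(store: Store) -> Iterator[str]:
    """Yield each feature name once, in first-seen order."""
    seen = set()
    for row in store.values():
        for feature in row:
            if feature not in seen:
                seen.add(feature)
                yield feature


store = {
    "sample_A": {"gene_1": 0.5, "gene_2": 1.0},
    "sample_B": {"gene_2": 3.0, "gene_3": 0.2},  # hypothetical second sample
}
features_seen = list(iter_features(store))
print(features_seen)  # ['gene_1', 'gene_2', 'gene_3']
```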
RNA expression analysis centers on gene-by-cell matrices, so celldb provides a matrix function that gathers the values for a list of samples and features in a single call.
In [11]:
celldb.matrix(connection, sample_list, feature_list)
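Conceptually, building a matrix from sparse rows means looking up every (sample, feature) pair and filling the gaps. The sketch below assumes a hypothetical in-memory layout, puts samples in rows and features in columns, and fills unquantified entries with 0.0; celldb's actual orientation, missing-value handling, and return type may differ.

```python
from typing import Dict, List, Sequence

Store = Dict[str, Dict[str, float]]


def dense_matrix(store: Store, samples: Sequence[str], features: Sequence[str],
                 fill_value: float = 0.0) -> List[List[float]]:
    """Return one row per sample and one column per feature.

    Features a sample never quantified get fill_value (an assumption;
    NaN is an equally reasonable choice for "not measured").
    """
    return [
        [store.get(sample, {}).get(feature, fill_value) for feature in features]
        for sample in samples
    ]


store = {
    "sample_A": {"gene_1": 0.5, "gene_2": 1.0},
    "sample_B": {"gene_2": 3.0},  # hypothetical sample missing gene_1
}
matrix = dense_matrix(store, ["sample_A", "sample_B"], ["gene_1", "gene_2"])
for row in matrix:
    print(row)
# [0.5, 1.0]
# [0.0, 3.0]
```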