Welcome to celldb!

Functional genomics data is often very sparse, difficult to normalize ahead of time, and sometimes very large.

celldb is here to make life easier! It is a client for a distributed key-value store that lets you work with these data quickly and easily.

We'll start by running a Docker container that runs Redis.

Run this command in a separate terminal:

sudo docker run -itd -p 6379:6379 redis
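
Before moving on, it can be useful to confirm that something is accepting connections on port 6379. The following is a minimal sketch using only the Python standard library. The helper name port_is_open is hypothetical and not part of celldb, and a successful TCP connection only shows that the port is open, not that the server behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError if nothing is listening
        # or the attempt times out
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 6379 is the port published by the docker command above
    print(port_is_open("localhost", 6379))
```

If this prints False, check that the container started and that the port mapping matches.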

Connecting using the celldb client

After importing the celldb client, we connect to localhost.

In [4]:
from celldb import client as celldb
connection = celldb.connect("localhost")

Note that we save the returned connection. From here on, this connection object is passed to the client functions to perform actions.


Methods that return a list of results return a Python iterator. This allows your code to step through the results one at a time (useful if they are very large), or to realize them as a list as needed.

At first, there are no data in celldb, but the list_samples function will still return an iterator.

In [5]:
samples_iter = celldb.list_samples(connection)
samples_iter

<generator object <genexpr> at 0x7f2962fabfa0>

To realize an iterator as a list, you can simply call the list function on it.

In [6]:
samples_list = list(samples_iter)


You can also loop over the generator directly, using the same patterns that work on lists. We'll look at this again later.

In [7]:
for sample_id in samples_iter:
    # there's nothing in the iterator so it should never reach here
    pass
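
One consequence of returning generators is that results can only be consumed once: after list(samples_iter) in the earlier cell, the same object yields nothing more, whether or not the store holds data. The sketch below illustrates this with a plain Python generator standing in for list_samples. The function fake_list_samples and its sample ids are made up for illustration and do not touch celldb.

```python
from itertools import islice


def fake_list_samples():
    """Hypothetical stand-in for celldb.list_samples: yields ids lazily."""
    return (f"sample_{i}" for i in range(5))


samples_iter = fake_list_samples()

# Step through only the first two results without realizing the rest
first_two = list(islice(samples_iter, 2))

# Realize whatever is left as a list
remaining = list(samples_iter)

# The generator is now exhausted, so a second pass finds nothing
second_pass = list(samples_iter)

print(first_two)    # ['sample_0', 'sample_1']
print(remaining)    # ['sample_2', 'sample_3', 'sample_4']
print(second_pass)  # []
```

To iterate more than once, either request a fresh iterator or keep the realized list around.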

Upserting samples

RNA expression quantifications tell us, for a given sample, how much a given gene was expressed. We will define a simple sample here, called sample_A, and quantify two genes, gene_1 and gene_2, with values 0.5 and 1.0 respectively.

Upserting simply means we are either inserting or updating. Since samples are identified uniquely by their sampleId, a single value can be updated without inserting a new column or updating the entire row.

In [8]:
sampleId = "sample_A"
features = ["gene_1", "gene_2"]
values = [0.5, 1.0]
print(celldb.upsert_sample(connection, sampleId, features, values))
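
To make the insert-or-update semantics concrete, here is a minimal in-memory sketch that uses a dict of dicts as the store. The upsert function below is a hypothetical illustration of the idea, not celldb's implementation, and it says nothing about how celldb lays out data in Redis.

```python
def upsert(store, sample_id, features, values):
    """Insert or update feature values for one sample in a dict of dicts."""
    if len(features) != len(values):
        raise ValueError("features and values must have the same length")
    # setdefault creates the row only if this sample has not been seen before
    row = store.setdefault(sample_id, {})
    # update overwrites existing features and adds new ones,
    # leaving all other features untouched
    row.update(zip(features, values))
    return store


store = {}
upsert(store, "sample_A", ["gene_1", "gene_2"], [0.5, 1.0])
# Updating a single value does not disturb gene_1
upsert(store, "sample_A", ["gene_2"], [2.0])
print(store)  # {'sample_A': {'gene_1': 0.5, 'gene_2': 2.0}}
```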


Listing samples

A convenience method is offered for collecting the list of sampleIds.

In [9]:
samples = celldb.list_samples(connection)
sample_list = list(samples)

Listing features

We can also discover the features that have been quantified:

In [10]:
features = celldb.list_features(connection)
feature_list = list(features)

Retrieving a matrix

RNA expression analysis centers on gene-cell matrices, and so a function is provided which simplifies gathering these data.

In [11]:
celldb.matrix(connection, sample_list, feature_list)

[['sample_A', 0.5, 1.0]]
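
Each row in the output above starts with the sample id, followed by one value per requested feature. Assuming the values follow the order of the feature list that was passed in (the output suggests this, but it is an assumption here), the rows can be turned into per-sample dictionaries for lookup by name. rows_to_dicts is a hypothetical helper, and the rows and feature list are hard-coded so the sketch runs without a server.

```python
def rows_to_dicts(rows, feature_list):
    """Map each [sample_id, v1, v2, ...] row to {sample_id: {feature: value}}."""
    result = {}
    for row in rows:
        sample_id, values = row[0], row[1:]
        # Pair each value with the feature at the same position
        result[sample_id] = dict(zip(feature_list, values))
    return result


# Hard-coded copy of the matrix output shown above
rows = [["sample_A", 0.5, 1.0]]
feature_list = ["gene_1", "gene_2"]

by_sample = rows_to_dicts(rows, feature_list)
print(by_sample["sample_A"]["gene_2"])  # 1.0
```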
