Codemonkeys 19/01/2017

interactive jupyter notebooks with widgets

Setting up

First install the package:

pip install ipywidgets

then activate the plugin for jupyter:

jupyter nbextension enable --py widgetsnbextension

(both on the command line).

If you want to run all the examples you'll need other stuff:

pip install jupyter folium seaborn pandas scikit-image

Important: you have to restart jupyter after enabling the extension

In a new notebook, import what we need from the module:


In [2]:
from ipywidgets import interact

In [118]:
# minimal example 
# interact() automatically generates the widget based on the type of the argument

# define a function which prints the square of its argument
def print_square(x):
    print("the square of {} is {}".format(x, round(x**2, 5))) #round to ignore floating point errors
    
# call it to make sure it works
print_square(3)
print_square(12)


the square of 3 is 9
the square of 12 is 144

In [114]:
# the first argument to interact is the name of the function
# subsequent arguments are the arguments to the function in keyword style
# interact uses the arguments to figure out what widgets to display

interact(print_square, x=10)


the square of 6 is 36

Because we gave an integer argument, interact() automatically creates an integer slider. It looks like for argument value n the slider goes from -n to 3n.


In [5]:
# by giving a tuple we can set (min,max,stepsize)
    
interact(print_square, x=(1,10,0.1))


the square of 4.9 is 24.01
Out[5]:
<function __main__.print_square>

In [6]:
# interact automatically generates different controls for different argument types 
# giving a list (not any iterable) generates a drop down 

def print_fruit(fruit):
    print("you have chosen {}".format(fruit))
    
interact(print_fruit, fruit=['apple', 'banana', 'pear'])


you have chosen apple
Out[6]:
<function __main__.print_fruit>

In [7]:
# see what happens if we pass a string
interact(print_fruit, fruit="apple")


you have chosen apple
Out[7]:
<function __main__.print_fruit>

In [8]:
# multiple arguments will generate multiple widgets
def repeat_fruit(x, fruit):
    print((fruit + ', ') * x)
    
interact(repeat_fruit, x=(1,10), fruit=['apple', 'banana', 'pear'])


apple, apple, apple, apple, apple, 
Out[8]:
<function __main__.repeat_fruit>

In [9]:
# in python 3 we can also use this syntax (function annotation)
def repeat_fruit(x:(1,10),fruit:['apple', 'banana', 'pear']):
    print((fruit + ', ') * x)
    
interact(repeat_fruit)
# I will not do this for the rest of the talk as it still looks weird to me :-)


apple, apple, apple, apple, apple, 
Out[9]:
<function __main__.repeat_fruit>

In [10]:
# for slow functions it will be unusable to update the output every time we move the slider
# add the  __manual argument to get an explicit button

def repeat_fruit(x, fruit):
    print((fruit + ', ') * x)
    
interact(repeat_fruit, x=(1,10), fruit=['apple', 'banana', 'pear'], __manual=True)


Out[10]:
<function __main__.repeat_fruit>

Example: putting a user interface on a function


In [11]:
# example taken from my talk on building command line interfaces with argparse
# a function that reads dna from a file and finds kmers that make up more than a
# given fraction of the total. Don't worry about the code, just look at the signature

import collections
from tqdm import tqdm

def find_common_kmers(filename, kmer_length, threshold, report):

    dna = open(filename).read().replace("\n", '') 

    all_kmers = []
    for start in tqdm(range(len(dna) - kmer_length + 1)):
        kmer = dna[start:start+kmer_length]
        all_kmers.append(kmer)

    kmer_counts = collections.Counter(all_kmers)
    total_count = len(all_kmers)

    for kmer, count in kmer_counts.items():
        fraction = count / total_count
        if fraction > threshold:
            if report == 'count':
                print(kmer, count)
            elif report == 'fraction':
                print(kmer, fraction)

In [12]:
# an example run
find_common_kmers('small.dna', 4, 0.01, 'fraction')


100%|██████████| 6999997/6999997 [00:02<00:00, 2544655.27it/s]
ATTT 0.019971008559003668
GAAA 0.013161434212043233
AATT 0.018285150693636013
AAAA 0.03301787129337341
CAAA 0.010694290297552984
TTTG 0.010571147387634594
TTTC 0.01341772003616573
TTTA 0.010059432882614093
TTTT 0.03351472864916942
AAAT 0.01972143702347301


In [13]:
# now let's use interact
# on multiple lines for readability
interact(
    find_common_kmers, 
    filename='small.dna',
    kmer_length = (1,10),
    threshold = (0.0, 0.1, 0.01),
    report = ['count', 'fraction'],
    __manual = True
)


Out[13]:
<function __main__.find_common_kmers>

The interface works fine but looks terrible. There is a whole system for laying out / stying widgets that I will not talk about.

Interactive dataframes


In [14]:
# set up pandas/seaborn stuff
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [15]:
# brief digression for pandas
# we can filter rows from a dataframe like this
# it gets displayed as a nice table using jupyter magic
euk = pd.read_csv("eukaryotes.tsv", sep="\t", na_values=['-'])
euk[euk['Group'] == 'Protists']


Out[15]:
Organism/Name TaxID BioProject_Accession BioProject_ID Group SubGroup Size_(Mb) GC% Assembly_Accession Chromosomes ... Plasmids WGS Scaffolds Genes Proteins Release_Date Modify_Date Status Center BioSample_Accession
0 Emiliania_huxleyi_CCMP1516 280463 PRJNA77753 77753 Protists Other_Protists 167.67600 64.50000 GCA_000372725.1 NaN ... NaN AHAL01 7795.0 38549.0 38554.0 2013/04/19 2014/08/01 Scaffold JGI SAMN02744062
81 Leishmania_major_strain_Friedlin 347515 PRJNA10724 10724 Protists Kinetoplasts 32.85510 59.71140 GCA_000002725.2 36.0 ... NaN NaN 36.0 9686.0 8316.0 1998/06/29 2015/02/27 Complete_Genome Friedlin_Consortium SAMEA3138173
82 Leishmania_major_strain_SD_75.1 860570 PRJNA50303 50303 Protists Kinetoplasts 31.24280 59.50000 GCA_000250755.2 NaN ... NaN AFZI01 36.0 NaN NaN 2012/02/28 2014/08/06 Scaffold WUGSC SAMN02953800
83 Leishmania_major_strain_LV39c5 860569 PRJNA50301 50301 Protists Kinetoplasts 32.32750 59.30000 GCA_000331345.1 NaN ... NaN AODR01 849.0 NaN NaN 2013/01/09 2014/08/04 Scaffold WUGSC SAMN01129976
84 Trypanosoma_brucei_gambiense_DAL972 679716 PRJNA260635 260635 Protists Kinetoplasts 22.14810 47.15250 GCA_000210295.1 11.0 ... NaN NaN 11.0 9930.0 9822.0 2009/10/14 2015/06/30 Chromosome NCBI SAMEA2272188
85 Trypanosoma_cruzi 5693 PRJNA11755 11755 Protists Kinetoplasts 89.93750 51.70000 GCA_000209065.1 NaN ... NaN AAHK01 29495.0 23696.0 19607.0 2005/07/14 2014/08/06 Scaffold Trypanosoma_cruzi_consortium SAMN02953627
86 Trypanosoma_cruzi_strain_Esmeraldo 366581 PRJNA50493 50493 Protists Kinetoplasts 38.08110 50.60000 GCA_000327425.1 NaN ... NaN ANOX01 15803.0 NaN NaN 2012/12/20 2014/08/04 Scaffold WUGSC SAMN00016463
87 Trypanosoma_cruzi_JR_cl._4 914063 PRJNA59941 59941 Protists Kinetoplasts 41.48080 51.20000 GCA_000331405.1 NaN ... NaN AODP01 15312.0 NaN NaN 2013/01/09 2014/08/06 Scaffold Genome_Sequencing_Center_(GSC)_at_Washington_U... SAMN02953827
88 Trypanosoma_cruzi_Tula_cl2 1206070 PRJNA169675 169675 Protists Kinetoplasts 83.51120 51.10000 GCA_000365225.1 NaN ... NaN AQHO01 45711.0 NaN NaN 2013/04/15 2014/08/06 Scaffold Kinetoplastid_Genomes_Consortium SAMN02953848
89 Trypanosoma_cruzi_marinkellei 85056 PRJNA77843 77843 Protists Kinetoplasts 34.22620 50.90000 GCA_000300495.1 NaN ... NaN AHKC01 NaN 10117.0 10104.0 2012/10/01 2014/08/06 Contig Karolinska_Institutet SAMN02953810
90 Trypanosoma_cruzi 5693 PRJNA40815 40815 Protists Kinetoplasts 38.58950 51.20000 GCA_000188675.2 NaN ... NaN ADWP02 NaN 10861.0 10847.0 2011/02/09 2014/08/06 Contig Karolinska_Institutet SAMN02953776
91 Giardia_lamblia_ATCC_50803 184922 PRJNA1439 1439 Protists Other_Protists 11.21360 49.20000 GCA_000002435.1 NaN ... NaN AACB02 92.0 6583.0 6502.0 2003/03/26 2014/08/05 Scaffold Marine_Biological_Laboratory SAMN02952905
92 Giardia_intestinalis_ATCC_50581 598745 PRJNA33815 33815 Protists Other_Protists 11.00150 47.30000 GCA_000182405.1 NaN ... NaN ACGJ01 2931.0 4550.0 4470.0 2009/07/13 2014/08/06 Contig Karolinska_Institutet SAMN02953749
93 Giardia_lamblia_P15 658858 PRJNA39315 39315 Protists Other_Protists 11.52210 47.20000 GCA_000182665.1 NaN ... NaN ACVC01 820.0 5150.0 5007.0 2010/10/05 2014/08/06 Contig Karolinska_Institutet,_Department_of_Cell_and_... SAMN02953764
94 Entamoeba_histolytica_HM-1:IMSS 294381 PRJNA142 142 Protists Other_Protists 20.83540 24.30000 GCA_000208925.2 NaN ... NaN AAFB02 1529.0 8268.0 8163.0 2004/12/09 2014/08/06 Scaffold J._Craig_Venter_Institute_(TIGR) SAMN02953605
95 Entamoeba_histolytica_KU27 885311 PRJNA51233 51233 Protists Other_Protists 15.27210 25.20000 GCA_000338855.1 NaN ... NaN AOSC01 1796.0 7464.0 7455.0 2013/02/08 2014/08/06 Scaffold J._Craig_Venter_Institute SAMN02953834
96 Entamoeba_histolytica_HM-1:IMSS-B 885319 PRJNA51239 51239 Protists Other_Protists 12.78190 27.10000 GCA_000344925.1 NaN ... NaN APGH01 1938.0 6301.0 6301.0 2013/03/07 2014/08/06 Scaffold J._Craig_Venter_Institute SAMN02953843
97 Entamoeba_histolytica_HM-3:IMSS 885315 PRJNA72935 72935 Protists Other_Protists 13.83070 25.30000 GCA_000346345.1 NaN ... NaN APGI01 1880.0 7362.0 7358.0 2013/03/13 2014/08/06 Scaffold J._Craig_Venter_Institute SAMN02953844
98 Entamoeba_histolytica_HM-1:IMSS-A 885318 PRJNA51237 51237 Protists Other_Protists 12.29230 25.20000 GCA_000365475.1 NaN ... NaN APBR01 1685.0 6327.0 6327.0 2013/04/16 2014/08/06 Scaffold J._Craig_Venter_Institute SAMN02953836
99 Eimeria_tenella 5802 PRJNA364 364 Protists Apicomplexans 1.38246 49.40000 GCA_000002835.1 1.0 ... NaN NaN 2.0 218.0 216.0 2006/06/16 2015/02/27 Chromosome Sanger_Institute SAMEA3138174
100 Cryptosporidium_parvum_Iowa_II 353152 PRJNA144 144 Protists Apicomplexans 9.10232 30.25130 GCA_000165345.1 8.0 ... NaN AAEE01 8.0 3887.0 3805.0 2004/04/05 2009/11/11 Chromosome Univ._Minnesota SAMN02952908
101 Cryptosporidium_parvum 5807 PRJNA13873 13873 Protists Apicomplexans 1.16470 31.10000 GCA_000209695.1 1.0 ... NaN NaN 1.0 473.0 473.0 2003/07/02 2015/02/27 Chromosome MRC_Laboratory_of_Molecular_Biology,_UK SAMEA3138349
102 Toxoplasma_gondii_ME49 508771 PRJNA28893 28893 Protists Apicomplexans 62.99930 52.30000 GCA_000006565.1 NaN ... NaN ABPA01 381.0 8151.0 7987.0 2008/05/20 2013/11/01 Scaffold J._Craig_Venter_Institute NaN
103 Toxoplasma_gondii_GT1 507601 PRJNA16727 16727 Protists Apicomplexans 65.06220 52.30000 GCA_000149715.2 NaN ... NaN AAQM03 1616.0 8627.0 8460.0 2006/05/05 2014/08/06 Scaffold TIGR SAMN02953654
104 Toxoplasma_gondii_TgCATBr9 943120 PRJNA61549 61549 Protists Apicomplexans 61.82420 52.40000 GCA_000224825.1 NaN ... NaN AFHV01 NaN NaN NaN 2011/05/17 2014/08/06 Contig J._Craig_Venter_Institute SAMN02953794
105 Toxoplasma_gondii_TgCATBr5 943121 PRJNA61551 61551 Protists Apicomplexans 61.63620 52.40000 GCA_000259835.1 NaN ... NaN AFPV01 NaN NaN NaN 2011/08/26 2014/08/06 Contig J._Craig_Venter_Institute SAMN02953796
106 Toxoplasma_gondii_CtCo5 1194599 PRJNA167493 167493 Protists Apicomplexans 62.62100 52.40000 GCA_000278365.1 NaN ... NaN AKIR01 NaN NaN NaN 2012/07/17 2014/08/06 Contig J._Craig_Venter_Institute SAMN02953818
107 Toxoplasma_gondii_COUG 1074873 PRJNA71479 71479 Protists Apicomplexans 63.69580 52.30000 GCA_000338675.1 NaN ... NaN AGQR01 NaN NaN NaN 2013/02/07 2014/08/06 Contig J._Craig_Venter_Institute SAMN02953803
108 Toxoplasma_gondii 5811 PRJNA61553 61553 Protists Apicomplexans 63.04690 52.40000 GCA_000256705.1 NaN ... NaN AHIV01 NaN NaN NaN 2012/03/27 2013/11/01 Contig J._Craig_Venter_Institute SAMN00736208
109 Plasmodium_berghei 5821 PRJNA146 146 Protists Apicomplexans 17.95460 23.70000 GCA_000005395.1 NaN ... NaN CAAI01 7479.0 10024.0 9821.0 2004/11/15 2015/01/30 Scaffold Sanger_Institute SAMEA3138182
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2961 Nannochloropsis_gaditana 72520 PRJNA65109 65109 Protists Other_Protists 25.61900 54.40000 GCA_001614215.1 NaN ... NaN AFGN01 NaN NaN NaN 2016/04/08 2016/04/08 Contig Qingdao_Institute_of_Bioenergy_and_Bioprocess_... SAMN04613695
2962 Nannochloropsis_salina_CCMP1776 1027361 PRJNA65115 65115 Protists Other_Protists 24.35710 54.10000 GCA_001614245.1 NaN ... NaN AFGQ01 4764.0 NaN NaN 2016/04/08 2016/04/08 Contig ingdao_Institute_of_Bioenergy_and_Bioprocess_T... SAMN04613696
2963 Nannochloropsis_oceanica_OZ-1 1027359 PRJNA65101 65101 Protists Other_Protists 28.02190 53.70000 GCA_001614235.1 NaN ... NaN AFGK01 1871.0 NaN NaN 2016/04/08 2016/04/08 Contig Qingdao_Institute_of_Bioenergy_and_Bioprocess_... SAMN04613693
2964 Plasmodium_gaboni 647221 PRJNA295394 295394 Protists Apicomplexans 20.38550 17.56100 GCA_001602025.1 14.0 ... NaN LVLB01 833.0 5375.0 5356.0 2016/03/28 2016/04/08 Chromosome UPENNBL SAMN04053639
2999 Haemoproteus_tartakovskyi 707206 PRJNA309868 309868 Protists Apicomplexans 23.20900 0.00821 GCA_001625125.1 NaN ... NaN LSRZ01 2983.0 NaN NaN 2016/04/20 2016/04/24 Scaffold BioCI SAMN04441127
3043 Euglena_gracilis_var._bacillaris 158060 PRJNA294935 294935 Protists Other_Protists 41.19580 50.30000 GCA_001638955.1 NaN ... NaN LQMU01 NaN NaN NaN 2016/05/06 2016/05/06 Contig Biology_Centre,_ASCR,_v.v.i. SAMN04038451
3052 Cryptosporidium_parvum 5807 PRJNA253836 253836 Protists Apicomplexans 9.10482 30.10000 GCA_001305455.2 NaN ... NaN LKHK02 18.0 NaN NaN 2015/10/01 2016/05/09 Scaffold Public_Health_Wales_(Microbiology) SAMN04088909
3080 Monocercomonoides_sp._PA203 453998 PRJNA304271 304271 Protists Other_Protists 74.72190 36.80000 GCA_001643675.1 NaN ... NaN LSRY01 2095.0 NaN NaN 2016/05/13 2016/05/13 Scaffold Charles_University_in_Prague SAMN04297179
3086 Leishmania_sp._MAR_LEM2494 1303197 PRJNA192703 192703 Protists Kinetoplasts 30.81400 59.58420 GCA_000409445.2 36.0 ... NaN ATAD02 251.0 NaN NaN 2013/06/07 2016/05/17 Chromosome Kinetoplastid_Genomes_Consortium SAMN04576851
3095 Leptomonas_seymouri_BHU1095 1263718 PRJNA176882 176882 Protists Kinetoplasts 26.51290 55.30000 GCA_000333875.2 NaN ... NaN ANAF02 2245.0 NaN NaN 2013/01/25 2016/05/20 Scaffold Central_Drug_Research_Institute SAMN02953823
3098 Babesia_microti 5868 PRJNA157385 157385 Protists Apicomplexans 6.63000 36.50000 GCA_001650055.1 NaN ... NaN JGVA01 234.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203534
3099 Babesia_microti 5868 PRJNA157387 157387 Protists Apicomplexans 6.34611 36.30000 GCA_001650065.1 NaN ... NaN JGUZ01 82.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203880
3100 Babesia_microti 5868 PRJNA157395 157395 Protists Apicomplexans 6.80056 36.30000 GCA_001650075.1 NaN ... NaN JGUW01 250.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203869
3101 Babesia_microti 5868 PRJNA157389 157389 Protists Apicomplexans 6.87819 36.20000 GCA_001650105.1 NaN ... NaN JGUY01 140.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203531
3102 Babesia_microti 5868 PRJNA157391 157391 Protists Apicomplexans 6.43801 36.40000 GCA_001650135.1 NaN ... NaN JGUX01 131.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203533
3103 Babesia_microti 5868 PRJNA157393 157393 Protists Apicomplexans 6.36105 36.30000 GCA_001650145.1 NaN ... NaN JGUV01 131.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203864
3108 Blastocystis_sp._NandII 463137 PRJNA308101 308101 Protists Other_Protists 16.46830 53.00000 GCA_001651215.1 NaN ... NaN LXWW01 580.0 6611.0 6544.0 2016/05/24 2016/05/24 Scaffold Dalhousie_University SAMN04386717
3121 Prorocentrum_minimum 39449 PRJNA271046 271046 Protists Other_Protists 29.34900 NaN GCA_001652855.1 NaN ... NaN JXLM01 NaN NaN NaN 2016/05/26 2016/05/26 Contig Sangmyung_University SAMN03272508
3132 Uroleptopsis_citrina 693449 PRJNA319592 319592 Protists Other_Protists 32.34720 NaN GCA_001653735.1 1.0 ... NaN LXJT01 NaN NaN NaN 2016/05/31 2016/05/31 Contig Institute_of_Evolution_&_Marine_Biodiversity,_... SAMN04901492
3137 Phytophthora_kernoviae 325452 PRJNA190826 190826 Protists Other_Protists 39.48120 50.20000 GCA_000448265.2 NaN ... NaN AUUF02 5026.0 NaN NaN 2013/08/19 2016/06/02 Scaffold Tree_Aggressors_Identification_using_Genomic_A... SAMN02178789
3141 Diplonema_papillatum 91374 PRJNA301207 301207 Protists Other_Protists 107.91500 NaN GCA_001655075.1 NaN ... NaN LMZG01 NaN NaN NaN 2016/06/02 2016/06/02 Contig Juntendo_University,_Tokyo SAMN04241767
3145 Eukaryota_sp._EH-2015 1653305 PRJNA277740 277740 Protists Other_Protists 48.99680 39.10000 GCA_001655205.1 NaN ... NaN LPNZ01 11727.0 NaN NaN 2016/06/03 2016/06/03 Scaffold University_of_Calgary SAMN03396259
3171 Angomonas_deanei 59799 PRJNA320679 320679 Protists Kinetoplasts 19.28230 49.60000 GCA_001659865.1 NaN ... NaN LXWQ01 408.0 NaN NaN 2016/06/08 2016/06/08 Scaffold Heinrich_Heine_University_Duesseldorf SAMN04954948
3174 Entamoeba_histolytica 5759 PRJDB4673 324242 Protists Other_Protists 19.92460 24.70000 GCA_001662325.1 NaN ... NaN BDEQ01 1.0 8476.0 8394.0 2016/06/01 2016/06/08 Contig National_Institute_of_Infectious_Diseases SAMD00049186
3176 Phytophthora_infestans 4787 PRJNA322103 322103 Protists Other_Protists 152.11500 23.20000 GCA_001661535.1 NaN ... NaN LYVM01 58.0 NaN NaN 2016/06/10 2016/06/10 Contig Central_Potato_Research_Institute SAMN05006719
3260 Plasmodium_ovale_wallikeri 864142 PRJEB12679 317846 Protists Apicomplexans 35.65480 28.90000 GCA_900088485.1 NaN ... NaN FLRD01 1914.0 8571.0 8421.0 2016/06/16 2016/06/16 Scaffold KAUST SAMEA3867551
3261 Plasmodium_ovale_wallikeri 864142 PRJEB12679 317846 Protists Apicomplexans 36.40700 29.40000 GCA_900088545.1 NaN ... NaN FLRE01 3484.0 8790.0 8646.0 2016/06/16 2016/06/16 Scaffold KAUST SAMEA3867552
3262 Plasmodium_ovale_curtisi 864141 PRJEB12678 317845 Protists Apicomplexans 34.51870 28.40000 GCA_900088555.1 NaN ... NaN FLQV01 4025.0 7928.0 7776.0 2016/06/16 2016/06/16 Scaffold KAUST SAMEA3867549
3263 Plasmodium_ovale_curtisi 864141 PRJEB12678 317845 Protists Apicomplexans 38.01000 27.70000 GCA_900088565.1 NaN ... NaN FLQU01 2227.0 8813.0 8625.0 2016/06/15 2016/06/15 Scaffold KAUST SAMEA3867550
3265 Plasmodium_malariae 5858 PRJEB12680 317842 Protists Apicomplexans 31.92520 24.70000 GCA_900088575.1 NaN ... NaN FLQW01 7270.0 6410.0 6343.0 2016/06/16 2016/06/16 Scaffold KAUST SAMEA3867575

381 rows × 21 columns


In [16]:
# turn this into a function
def filter_genomes(group):
    return euk[euk['Group'] == group]

filter_genomes('Plants')


Out[16]:
Organism/Name TaxID BioProject_Accession BioProject_ID Group SubGroup Size_(Mb) GC% Assembly_Accession Chromosomes ... Plasmids WGS Scaffolds Genes Proteins Release_Date Modify_Date Status Center BioSample_Accession
1 Arabidopsis_thaliana 3702 PRJNA10719 10719 Plants Land_Plants 119.668000 36.0528 GCA_000001735.1 5.0 ... NaN NaN 7.0 33583.0 35378.0 2001/08/13 2016/06/20 Chromosome The_Arabidopsis_Information_Resource_(TAIR) SAMN03081427
2 Arabidopsis_thaliana 3702 PRJNA30811 30811 Plants Land_Plants 96.500200 36.7000 GCA_000222325.1 NaN ... NaN AFNA01 2143.0 NaN NaN 2011/07/18 2014/08/11 Scaffold 1001genomes SAMN02981334
3 Arabidopsis_thaliana 3702 PRJNA30811 30811 Plants Land_Plants 98.066200 36.6000 GCA_000222345.1 NaN ... NaN AFNB01 1740.0 NaN NaN 2011/07/18 2014/08/11 Scaffold 1001genomes SAMN02981335
4 Arabidopsis_thaliana 3702 PRJNA30811 30811 Plants Land_Plants 96.256500 36.6000 GCA_000222365.1 NaN ... NaN AFMZ01 1261.0 NaN NaN 2011/07/18 2014/08/11 Scaffold 1001genomes SAMN02981333
5 Arabidopsis_thaliana 3702 PRJNA30811 30811 Plants Land_Plants 96.694000 36.7000 GCA_000222385.1 NaN ... NaN AFNC01 2408.0 NaN NaN 2011/07/18 2014/08/11 Scaffold 1001genomes SAMN02981336
6 Arabidopsis_thaliana 3702 PRJNA13190 13190 Plants Land_Plants 93.654500 36.0433 GCA_000211275.1 6.0 ... NaN NaN 6.0 16842.0 20111.0 2000/12/14 2007/05/19 Chromosome Arabidopsis_Genome_Initiative SAMN03081413
7 Solanum_lycopersicum 4081 PRJNA41343 41343 Plants Land_Plants 540.589000 34.5000 GCA_000181095.1 NaN ... NaN BABP01 100783.0 NaN NaN 2009/11/06 2015/09/25 Scaffold Kazusa_DNA_Research_Institute SAMD00036540
8 Solanum_lycopersicum 4081 PRJNA67471 67471 Plants Land_Plants 0.575198 43.5000 GCA_000325825.1 NaN ... NaN AFYB01 195.0 NaN NaN 2012/02/24 2014/08/11 Contig Mitochondrial_Genome SAMN02981358
9 Hordeum_vulgare_subsp._vulgare 112509 PRJEB86 179052 Plants Land_Plants 1868.640000 44.3000 GCA_000326085.1 NaN ... NaN CAJW01 2670738.0 NaN NaN 2012/10/29 2015/02/01 Scaffold IPK-Gatersleben SAMEA2272000
10 Hordeum_vulgare_subsp._vulgare 112509 PRJDA62403 62403 Plants Land_Plants 28.016000 44.7000 GCA_000227425.1 NaN ... NaN BACC01 NaN NaN NaN 2011/06/22 2015/09/16 Contig Institute_of_Plant_Science_and_Resources,_Okay... SAMD00036549
11 Hordeum_vulgare_subsp._vulgare 112509 PRJEB88 179053 Plants Land_Plants 1779.490000 44.9000 GCA_000326125.1 NaN ... NaN CAJX01 2077901.0 NaN NaN 2012/10/29 2015/02/01 Scaffold IPK-Gatersleben SAMEA2272683
12 Oryza_sativa_Japonica_Group 39947 PRJNA13141 13141 Plants Land_Plants 382.778000 42.6517 GCA_000005425.2 12.0 ... NaN NaN 15.0 30294.0 28392.0 2005/02/02 2012/08/09 Chromosome International_Rice_Genome_Sequencing_Project NaN
13 Oryza_sativa_Indica_Group 39946 PRJNA361 361 Plants Land_Plants 426.337000 43.7299 GCA_000004655.2 12.0 ... NaN AAAA02 10627.0 39285.0 37358.0 2002/04/04 2014/08/06 Chromosome Beijing_Institute_of_Genomics,_Chinese_Academy... SAMN02953581
14 Oryza_sativa_Japonica_Group 39947 PRJNA13139 13139 Plants Land_Plants 391.148000 44.0846 GCA_000149285.1 12.0 ... NaN AACV01 7777.0 37032.0 35394.0 2004/10/21 2014/08/06 Chromosome Beijing_Genomics_Institute SAMN02953597
15 Oryza_sativa_Japonica_Group 39947 PRJDA39809 39809 Plants Land_Plants 382.151000 43.0871 GCA_000164945.1 12.0 ... NaN BABO01 12.0 NaN NaN 2010/04/01 2010/05/24 Chromosome QTL_Genomics_Research_Center,_National_Institu... SAMD00009497
16 Oryza_sativa_Japonica_Group 39947 PRJDA67163 67163 Plants Land_Plants 382.627000 43.3738 GCA_000321445.1 12.0 ... NaN BACJ01 12.0 NaN NaN 2011/12/27 2011/12/28 Chromosome Iwate_Biotechnology_Research_Center SAMD00036555
17 Triticum_aestivum 4565 PRJEA41525 41525 Plants Land_Plants 1.266080 44.8000 GCA_000210335.1 1.0 ... NaN NaN 1.0 45.0 21.0 2010/07/15 2015/02/27 Chromosome International_Wheat_Genome_Sequencing_Consortium SAMEA2272260
18 Triticum_aestivum 4565 PRJNA61773 61773 Plants Land_Plants 159.087000 43.6000 GCA_000188135.1 NaN ... NaN AEOM01 NaN NaN NaN 2011/02/04 2014/08/11 Contig International_Wheat_Genome_Sequencing_Consortium SAMN02981295
19 Triticum_aestivum 4565 PRJEB217 171500 Plants Land_Plants 3800.330000 47.7000 GCA_000334095.1 NaN ... NaN CALP01 NaN NaN NaN 2012/12/23 2015/02/01 Contig MIPS/HMGU SAMEA2272365
20 Triticum_aestivum 4565 PRJEB217 171500 Plants Land_Plants 437.106000 48.3000 GCA_000334135.1 NaN ... NaN CALO01 NaN NaN NaN 2013/01/09 2015/01/30 Contig MIPS/HMGU SAMEA2272598
21 Zea_mays 4577 PRJNA249074 249074 Plants Land_Plants 2067.620000 46.8286 GCA_000005005.5 20.0 ... NaN NaN 523.0 104305.0 116015.0 2010/01/29 2014/08/02 Chromosome NCBI NaN
22 Zea_mays 4577 PRJNA51041 51041 Plants Land_Plants 177.051000 45.9000 GCA_000223545.1 NaN ... NaN AECO01 196697.0 NaN NaN 2011/08/16 2014/08/11 Scaffold Laboratorio_Nacional_de_Genómica_para_la_Biodi... SAMN02981271
23 Zea_mays 4577 PRJNA249074 249074 Plants Land_Plants 1.335070 45.4000 GCA_000275765.1 NaN ... NaN AHID01 1844.0 NaN NaN 2012/07/03 2014/08/11 Contig NCBI SAMN02981394
224 Sorghum_bicolor 4558 PRJNA13876 13876 Plants Land_Plants 739.150000 44.2547 GCA_000003195.1 10.0 ... NaN ABXC01 3316.0 33080.0 33005.0 2009/05/22 2015/03/12 Chromosome Sorghum_Consortium SAMN02953738
225 Sorghum_bicolor 4558 PRJNA74553 74553 Plants Land_Plants 0.018494 45.0000 GCA_000236725.2 NaN ... NaN AHAQ01 20.0 22.0 22.0 2011/12/02 2014/08/11 Contig BGI SAMN02981391
226 Sorghum_bicolor 4558 PRJNA74553 74553 Plants Land_Plants 0.021299 52.0000 GCA_000236745.2 NaN ... NaN AHAP01 35.0 35.0 35.0 2011/12/02 2014/08/11 Contig BGI SAMN02981390
227 Sorghum_bicolor 4558 PRJNA74553 74553 Plants Land_Plants 0.015475 48.9000 GCA_000236765.2 NaN ... NaN AHAO01 16.0 16.0 16.0 2011/12/02 2014/08/11 Contig BGI SAMN02981389
232 Chlamydomonas_reinhardtii 3055 PRJNA12260 12260 Plants Green_Algae 120.405000 61.9512 GCA_000002595.2 NaN ... NaN ABCN01 1558.0 14488.0 14489.0 2007/08/03 2014/08/06 Scaffold DOE_Joint_Genome_Institute SAMN02953692
260 Beta_vulgaris_subsp._vulgaris 3555 PRJNA176558 176558 Plants Land_Plants 426.675000 35.9000 GCA_000397105.1 NaN ... NaN ARYA01 260142.0 NaN NaN 2013/05/13 2014/08/04 Scaffold NWISRL-ARS-USDA SAMN02262047
270 Brassica_rapa 3711 PRJNA59981 59981 Plants Land_Plants 284.129000 35.8260 GCA_000309985.1 10.0 ... NaN AENI01 40432.0 48705.0 51005.0 2011/07/14 2014/08/11 Chromosome Brassica_rapa_genome_sequencing_project,_BraGSP SAMN02981293
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3002 Coelastrella_sp._M60 1489057 PRJNA306811 306811 Plants Green_Algae 80.221800 51.8000 GCA_001630525.1 NaN ... NaN LWCB01 NaN NaN NaN 2016/04/25 2016/04/25 Contig Madurai_Kamaraj_University SAMN04497931
3021 Carthamus_tinctorius 4222 PRJNA313459 313459 Plants Land_Plants 661.938000 36.5000 GCA_001633085.1 NaN ... NaN LUCG01 463906.0 NaN NaN 2016/04/27 2016/04/27 Scaffold University_of_Georgia SAMN04523257
3031 Quercus_lobata 97700 PRJNA308314 308314 Plants Land_Plants 759.241000 33.0000 GCA_001633185.1 NaN ... NaN LRBV01 40156.0 NaN NaN 2016/04/28 2016/04/28 Scaffold University_of_Maryland_Center_for_Environmenta... SAMN04359389
3032 Dichanthelium_oligosanthes 888268 PRJNA305932 305932 Plants Land_Plants 475.878000 47.1000 GCA_001633215.1 NaN ... NaN LWDX01 17441.0 NaN NaN 2016/04/28 2016/04/28 Scaffold University_of_Nebraska SAMN04335715
3033 Dorcoceras_hygrometricum 472368 PRJNA182117 182117 Plants Land_Plants 1521.360000 43.1000 GCA_001598015.1 NaN ... NaN LVEL01 401752.0 47778.0 47778.0 2016/03/25 2016/04/28 Scaffold Capital_Normal_University,_Beijing,_China SAMN02215335
3035 Nicotiana_tabacum 4097 PRJNA208209 208209 Plants Land_Plants 3643.470000 39.2006 GCA_000715135.1 NaN ... NaN AYMY01 168247.0 69595.0 84255.0 2014/05/29 2015/04/28 Scaffold Philip_Morris_International_R&D SAMN02316627
3064 Marchantia_polymorpha_subsp._polymorpha 1480153 PRJNA310693 310693 Plants Land_Plants 205.718000 36.1000 GCA_001641455.1 NaN ... NaN LVLJ01 4137.0 17956.0 17956.0 2016/05/10 2016/05/11 Scaffold Oxford_University SAMN04450698
3071 Mentha_longifolia 38859 PRJNA310613 310613 Plants Land_Plants 353.287000 36.7000 GCA_001642375.1 NaN ... NaN LSBG01 190876.0 NaN NaN 2016/05/11 2016/05/11 Scaffold Oregon_State_University SAMN04452879
3078 Trifolium_pratense 57577 PRJEB9186 284476 Plants Land_Plants 345.991000 18.9292 GCA_900079335.1 7.0 ... NaN FKJA01 39051.0 NaN NaN 2016/04/12 2016/04/12 Chromosome TGAC SAMEA3696998
3082 Zea_mays_subsp._mays 381124 PRJNA311133 311133 Plants Land_Plants 2377.040000 46.7918 GCA_001644905.1 10.0 ... NaN LWRW01 149686.0 NaN NaN 2016/05/11 2016/05/17 Chromosome W22_Sequencing_Consortium SAMN04479043
3092 Oryza_sativa 4530 PRJNA313502 313502 Plants Land_Plants 295.390000 NaN GCA_001648745.1 NaN ... NaN LVCH01 64800.0 NaN NaN 2016/05/19 2016/05/19 Scaffold Centre_for_Cellular_and_Molecular_Biology SAMN04538187
3093 Solanum_tuberosum 4113 PRJEB8869 321712 Plants Land_Plants 90.458200 39.1000 GCA_900004685.1 NaN ... NaN CVMJ01 NaN NaN NaN 2016/05/16 2016/05/16 Contig Bioforsk_-_Norwegian_Institute_of_Agricultural... SAMEA3335593
3097 Eichhornia_paniculata 44951 PRJNA310303 310303 Plants Land_Plants 571.388000 33.2000 GCA_001647135.1 NaN ... NaN LTAE01 40286.0 NaN NaN 2016/05/02 2016/05/02 Scaffold University_of_Toronto SAMN04450015
3106 Musa_itinerans 574487 PRJNA312694 312694 Plants Land_Plants 455.349000 35.4000 GCA_001649415.1 NaN ... NaN LVTN01 28415.0 NaN NaN 2016/05/21 2016/05/21 Scaffold South_China_Botanic_Garden,_CAS SAMN04505257
3111 Daucus_carota_subsp._sativus 79200 PRJNA268187 268187 Plants Land_Plants 421.503000 30.4880 GCA_001625215.1 9.0 ... NaN LNRQ01 4826.0 33453.0 32113.0 2016/04/20 2016/05/27 Chromosome USDA_ARS SAMN03216637
3116 Oryza_sativa_Indica_Group 39946 PRJNA276972 276972 Plants Land_Plants 398.762000 43.6902 GCA_001618785.1 12.0 ... NaN LBAZ01 11486.0 NaN NaN 2016/04/11 2016/04/14 Chromosome Huazhong_Agricultural_University SAMN03380733
3117 Setaria_italica 4555 PRJNA314430 314430 Plants Land_Plants 477.542000 46.2756 GCA_001652605.1 9.0 ... NaN LWRS01 2689.0 NaN NaN 2016/05/26 2016/05/26 Chromosome Academia_Sinica SAMN04534922
3122 Arabidopsis_thaliana 3702 PRJNA311266 311266 Plants Land_Plants 118.891000 35.8882 GCA_001651475.1 5.0 ... NaN LUHQ01 30.0 27158.0 30837.0 2016/05/25 2016/05/27 Chromosome Center_for_genomic_Regulation SAMN04457953
3123 Oryza_sativa_Indica_Group 39946 PRJNA276972 276972 Plants Land_Plants 386.486000 43.5152 GCA_001618795.1 12.0 ... NaN LBBA01 8481.0 NaN NaN 2016/04/11 2016/04/14 Chromosome Huazhong_Agricultural_University SAMN03380734
3133 Hevea_brasiliensis 3981 PRJNA310386 310386 Plants Land_Plants 1373.370000 32.8000 GCA_001654055.1 NaN ... NaN LVXX01 7452.0 NaN NaN 2016/06/01 2016/06/01 Scaffold Rubber_Research_Institute SAMN04451765
3177 Manihot_esculenta 3983 PRJNA234389 234389 Plants Land_Plants 582.118000 37.8526 GCA_001659605.1 18.0 ... NaN LTYI01 2019.0 33044.0 41393.0 2016/06/07 2016/06/10 Chromosome DOE-Joint_Genome_Institute SAMN04116956
3179 Fagopyrum_esculentum 3617 PRJDB4232 313366 Plants Land_Plants 1177.690000 NaN GCA_001661195.1 NaN ... NaN BCYN01 387594.0 NaN NaN 2016/05/11 2016/05/11 Scaffold Kazusa_DNA_Research_Institute SAMD00041197
3181 Ananas_comosus 4615 PRJNA300906 300906 Plants Land_Plants 524.070000 37.8000 GCA_001661175.1 NaN ... NaN LSRQ01 8448.0 23598.0 23598.0 2016/06/10 2016/06/10 Scaffold Universiti_Malaysia_Sabah SAMN04230745
3189 Metrosideros_polymorpha_var._glaberrima 101978 PRJDB4221 320241 Plants Land_Plants 304.366000 38.2000 GCA_001662345.1 NaN ... NaN BCNH01 36376.0 NaN NaN 2016/04/28 2016/04/28 Scaffold Forest_Biology,_Division_of_Forest_and_Biomate... SAMD00044343
3190 Chlamydomonas_applanata 35704 PRJDB4711 320242 Plants Green_Algae 78.504200 56.3000 GCA_001662365.1 NaN ... NaN BDCZ01 2533.0 NaN NaN 2016/04/22 2016/04/22 Scaffold Department_of_Life_Sciences,_Graduate_School_o... SAMD00049913
3191 Chlamydomonas_asymmetrica 51683 PRJDB4711 320242 Plants Green_Algae 141.916000 55.1000 GCA_001662385.1 NaN ... NaN BDDA01 4102.0 NaN NaN 2016/04/22 2016/04/22 Scaffold Department_of_Life_Sciences,_Graduate_School_o... SAMD00049914
3192 Chlamydomonas_debaryana 47281 PRJDB4711 320242 Plants Green_Algae 120.364000 62.3000 GCA_001662405.1 NaN ... NaN BDDB01 10139.0 NaN NaN 2016/04/22 2016/04/22 Scaffold Department_of_Life_Sciences,_Graduate_School_o... SAMD00049915
3193 Chlamydomonas_sphaeroides 28458 PRJDB4711 320242 Plants Green_Algae 122.189000 65.9000 GCA_001662425.1 NaN ... NaN BDDC01 6890.0 NaN NaN 2016/04/22 2016/04/22 Scaffold Department_of_Life_Sciences,_Graduate_School_o... SAMD00049916
3195 Rosa_x_damascena 3765 PRJNA322107 322107 Plants Land_Plants 711.720000 37.6000 GCA_001662545.1 NaN ... NaN LYNE01 307872.0 NaN NaN 2016/06/13 2016/06/13 Scaffold BIO-FD&C_CO.,LTD SAMN05017599
3257 Elaeis_guineensis 51953 PRJNA262510 262510 Plants Land_Plants 499.029000 32.1000 GCA_001672495.1 NaN ... NaN JRVM01 218141.0 NaN NaN 2016/06/20 2016/06/20 Scaffold Seoul_National_University SAMN03083522

314 rows × 21 columns


In [17]:
# we can use pandas to get a list of the groups in the dataframe
list(euk['Group'].unique())


Out[17]:
['Protists', 'Plants', 'Fungi', 'Animals', 'Other']

In [18]:
# use interact to populate a drop down from the dataframe itself
interact(filter_genomes, group=list(euk['Group'].unique()))


Organism/Name TaxID BioProject_Accession BioProject_ID Group SubGroup Size_(Mb) GC% Assembly_Accession Chromosomes ... Plasmids WGS Scaffolds Genes Proteins Release_Date Modify_Date Status Center BioSample_Accession
0 Emiliania_huxleyi_CCMP1516 280463 PRJNA77753 77753 Protists Other_Protists 167.67600 64.50000 GCA_000372725.1 NaN ... NaN AHAL01 7795.0 38549.0 38554.0 2013/04/19 2014/08/01 Scaffold JGI SAMN02744062
81 Leishmania_major_strain_Friedlin 347515 PRJNA10724 10724 Protists Kinetoplasts 32.85510 59.71140 GCA_000002725.2 36.0 ... NaN NaN 36.0 9686.0 8316.0 1998/06/29 2015/02/27 Complete_Genome Friedlin_Consortium SAMEA3138173
82 Leishmania_major_strain_SD_75.1 860570 PRJNA50303 50303 Protists Kinetoplasts 31.24280 59.50000 GCA_000250755.2 NaN ... NaN AFZI01 36.0 NaN NaN 2012/02/28 2014/08/06 Scaffold WUGSC SAMN02953800
83 Leishmania_major_strain_LV39c5 860569 PRJNA50301 50301 Protists Kinetoplasts 32.32750 59.30000 GCA_000331345.1 NaN ... NaN AODR01 849.0 NaN NaN 2013/01/09 2014/08/04 Scaffold WUGSC SAMN01129976
84 Trypanosoma_brucei_gambiense_DAL972 679716 PRJNA260635 260635 Protists Kinetoplasts 22.14810 47.15250 GCA_000210295.1 11.0 ... NaN NaN 11.0 9930.0 9822.0 2009/10/14 2015/06/30 Chromosome NCBI SAMEA2272188
85 Trypanosoma_cruzi 5693 PRJNA11755 11755 Protists Kinetoplasts 89.93750 51.70000 GCA_000209065.1 NaN ... NaN AAHK01 29495.0 23696.0 19607.0 2005/07/14 2014/08/06 Scaffold Trypanosoma_cruzi_consortium SAMN02953627
86 Trypanosoma_cruzi_strain_Esmeraldo 366581 PRJNA50493 50493 Protists Kinetoplasts 38.08110 50.60000 GCA_000327425.1 NaN ... NaN ANOX01 15803.0 NaN NaN 2012/12/20 2014/08/04 Scaffold WUGSC SAMN00016463
87 Trypanosoma_cruzi_JR_cl._4 914063 PRJNA59941 59941 Protists Kinetoplasts 41.48080 51.20000 GCA_000331405.1 NaN ... NaN AODP01 15312.0 NaN NaN 2013/01/09 2014/08/06 Scaffold Genome_Sequencing_Center_(GSC)_at_Washington_U... SAMN02953827
88 Trypanosoma_cruzi_Tula_cl2 1206070 PRJNA169675 169675 Protists Kinetoplasts 83.51120 51.10000 GCA_000365225.1 NaN ... NaN AQHO01 45711.0 NaN NaN 2013/04/15 2014/08/06 Scaffold Kinetoplastid_Genomes_Consortium SAMN02953848
89 Trypanosoma_cruzi_marinkellei 85056 PRJNA77843 77843 Protists Kinetoplasts 34.22620 50.90000 GCA_000300495.1 NaN ... NaN AHKC01 NaN 10117.0 10104.0 2012/10/01 2014/08/06 Contig Karolinska_Institutet SAMN02953810
90 Trypanosoma_cruzi 5693 PRJNA40815 40815 Protists Kinetoplasts 38.58950 51.20000 GCA_000188675.2 NaN ... NaN ADWP02 NaN 10861.0 10847.0 2011/02/09 2014/08/06 Contig Karolinska_Institutet SAMN02953776
91 Giardia_lamblia_ATCC_50803 184922 PRJNA1439 1439 Protists Other_Protists 11.21360 49.20000 GCA_000002435.1 NaN ... NaN AACB02 92.0 6583.0 6502.0 2003/03/26 2014/08/05 Scaffold Marine_Biological_Laboratory SAMN02952905
92 Giardia_intestinalis_ATCC_50581 598745 PRJNA33815 33815 Protists Other_Protists 11.00150 47.30000 GCA_000182405.1 NaN ... NaN ACGJ01 2931.0 4550.0 4470.0 2009/07/13 2014/08/06 Contig Karolinska_Institutet SAMN02953749
93 Giardia_lamblia_P15 658858 PRJNA39315 39315 Protists Other_Protists 11.52210 47.20000 GCA_000182665.1 NaN ... NaN ACVC01 820.0 5150.0 5007.0 2010/10/05 2014/08/06 Contig Karolinska_Institutet,_Department_of_Cell_and_... SAMN02953764
94 Entamoeba_histolytica_HM-1:IMSS 294381 PRJNA142 142 Protists Other_Protists 20.83540 24.30000 GCA_000208925.2 NaN ... NaN AAFB02 1529.0 8268.0 8163.0 2004/12/09 2014/08/06 Scaffold J._Craig_Venter_Institute_(TIGR) SAMN02953605
95 Entamoeba_histolytica_KU27 885311 PRJNA51233 51233 Protists Other_Protists 15.27210 25.20000 GCA_000338855.1 NaN ... NaN AOSC01 1796.0 7464.0 7455.0 2013/02/08 2014/08/06 Scaffold J._Craig_Venter_Institute SAMN02953834
96 Entamoeba_histolytica_HM-1:IMSS-B 885319 PRJNA51239 51239 Protists Other_Protists 12.78190 27.10000 GCA_000344925.1 NaN ... NaN APGH01 1938.0 6301.0 6301.0 2013/03/07 2014/08/06 Scaffold J._Craig_Venter_Institute SAMN02953843
97 Entamoeba_histolytica_HM-3:IMSS 885315 PRJNA72935 72935 Protists Other_Protists 13.83070 25.30000 GCA_000346345.1 NaN ... NaN APGI01 1880.0 7362.0 7358.0 2013/03/13 2014/08/06 Scaffold J._Craig_Venter_Institute SAMN02953844
98 Entamoeba_histolytica_HM-1:IMSS-A 885318 PRJNA51237 51237 Protists Other_Protists 12.29230 25.20000 GCA_000365475.1 NaN ... NaN APBR01 1685.0 6327.0 6327.0 2013/04/16 2014/08/06 Scaffold J._Craig_Venter_Institute SAMN02953836
99 Eimeria_tenella 5802 PRJNA364 364 Protists Apicomplexans 1.38246 49.40000 GCA_000002835.1 1.0 ... NaN NaN 2.0 218.0 216.0 2006/06/16 2015/02/27 Chromosome Sanger_Institute SAMEA3138174
100 Cryptosporidium_parvum_Iowa_II 353152 PRJNA144 144 Protists Apicomplexans 9.10232 30.25130 GCA_000165345.1 8.0 ... NaN AAEE01 8.0 3887.0 3805.0 2004/04/05 2009/11/11 Chromosome Univ._Minnesota SAMN02952908
101 Cryptosporidium_parvum 5807 PRJNA13873 13873 Protists Apicomplexans 1.16470 31.10000 GCA_000209695.1 1.0 ... NaN NaN 1.0 473.0 473.0 2003/07/02 2015/02/27 Chromosome MRC_Laboratory_of_Molecular_Biology,_UK SAMEA3138349
102 Toxoplasma_gondii_ME49 508771 PRJNA28893 28893 Protists Apicomplexans 62.99930 52.30000 GCA_000006565.1 NaN ... NaN ABPA01 381.0 8151.0 7987.0 2008/05/20 2013/11/01 Scaffold J._Craig_Venter_Institute NaN
103 Toxoplasma_gondii_GT1 507601 PRJNA16727 16727 Protists Apicomplexans 65.06220 52.30000 GCA_000149715.2 NaN ... NaN AAQM03 1616.0 8627.0 8460.0 2006/05/05 2014/08/06 Scaffold TIGR SAMN02953654
104 Toxoplasma_gondii_TgCATBr9 943120 PRJNA61549 61549 Protists Apicomplexans 61.82420 52.40000 GCA_000224825.1 NaN ... NaN AFHV01 NaN NaN NaN 2011/05/17 2014/08/06 Contig J._Craig_Venter_Institute SAMN02953794
105 Toxoplasma_gondii_TgCATBr5 943121 PRJNA61551 61551 Protists Apicomplexans 61.63620 52.40000 GCA_000259835.1 NaN ... NaN AFPV01 NaN NaN NaN 2011/08/26 2014/08/06 Contig J._Craig_Venter_Institute SAMN02953796
106 Toxoplasma_gondii_CtCo5 1194599 PRJNA167493 167493 Protists Apicomplexans 62.62100 52.40000 GCA_000278365.1 NaN ... NaN AKIR01 NaN NaN NaN 2012/07/17 2014/08/06 Contig J._Craig_Venter_Institute SAMN02953818
107 Toxoplasma_gondii_COUG 1074873 PRJNA71479 71479 Protists Apicomplexans 63.69580 52.30000 GCA_000338675.1 NaN ... NaN AGQR01 NaN NaN NaN 2013/02/07 2014/08/06 Contig J._Craig_Venter_Institute SAMN02953803
108 Toxoplasma_gondii 5811 PRJNA61553 61553 Protists Apicomplexans 63.04690 52.40000 GCA_000256705.1 NaN ... NaN AHIV01 NaN NaN NaN 2012/03/27 2013/11/01 Contig J._Craig_Venter_Institute SAMN00736208
109 Plasmodium_berghei 5821 PRJNA146 146 Protists Apicomplexans 17.95460 23.70000 GCA_000005395.1 NaN ... NaN CAAI01 7479.0 10024.0 9821.0 2004/11/15 2015/01/30 Scaffold Sanger_Institute SAMEA3138182
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2961 Nannochloropsis_gaditana 72520 PRJNA65109 65109 Protists Other_Protists 25.61900 54.40000 GCA_001614215.1 NaN ... NaN AFGN01 NaN NaN NaN 2016/04/08 2016/04/08 Contig Qingdao_Institute_of_Bioenergy_and_Bioprocess_... SAMN04613695
2962 Nannochloropsis_salina_CCMP1776 1027361 PRJNA65115 65115 Protists Other_Protists 24.35710 54.10000 GCA_001614245.1 NaN ... NaN AFGQ01 4764.0 NaN NaN 2016/04/08 2016/04/08 Contig ingdao_Institute_of_Bioenergy_and_Bioprocess_T... SAMN04613696
2963 Nannochloropsis_oceanica_OZ-1 1027359 PRJNA65101 65101 Protists Other_Protists 28.02190 53.70000 GCA_001614235.1 NaN ... NaN AFGK01 1871.0 NaN NaN 2016/04/08 2016/04/08 Contig Qingdao_Institute_of_Bioenergy_and_Bioprocess_... SAMN04613693
2964 Plasmodium_gaboni 647221 PRJNA295394 295394 Protists Apicomplexans 20.38550 17.56100 GCA_001602025.1 14.0 ... NaN LVLB01 833.0 5375.0 5356.0 2016/03/28 2016/04/08 Chromosome UPENNBL SAMN04053639
2999 Haemoproteus_tartakovskyi 707206 PRJNA309868 309868 Protists Apicomplexans 23.20900 0.00821 GCA_001625125.1 NaN ... NaN LSRZ01 2983.0 NaN NaN 2016/04/20 2016/04/24 Scaffold BioCI SAMN04441127
3043 Euglena_gracilis_var._bacillaris 158060 PRJNA294935 294935 Protists Other_Protists 41.19580 50.30000 GCA_001638955.1 NaN ... NaN LQMU01 NaN NaN NaN 2016/05/06 2016/05/06 Contig Biology_Centre,_ASCR,_v.v.i. SAMN04038451
3052 Cryptosporidium_parvum 5807 PRJNA253836 253836 Protists Apicomplexans 9.10482 30.10000 GCA_001305455.2 NaN ... NaN LKHK02 18.0 NaN NaN 2015/10/01 2016/05/09 Scaffold Public_Health_Wales_(Microbiology) SAMN04088909
3080 Monocercomonoides_sp._PA203 453998 PRJNA304271 304271 Protists Other_Protists 74.72190 36.80000 GCA_001643675.1 NaN ... NaN LSRY01 2095.0 NaN NaN 2016/05/13 2016/05/13 Scaffold Charles_University_in_Prague SAMN04297179
3086 Leishmania_sp._MAR_LEM2494 1303197 PRJNA192703 192703 Protists Kinetoplasts 30.81400 59.58420 GCA_000409445.2 36.0 ... NaN ATAD02 251.0 NaN NaN 2013/06/07 2016/05/17 Chromosome Kinetoplastid_Genomes_Consortium SAMN04576851
3095 Leptomonas_seymouri_BHU1095 1263718 PRJNA176882 176882 Protists Kinetoplasts 26.51290 55.30000 GCA_000333875.2 NaN ... NaN ANAF02 2245.0 NaN NaN 2013/01/25 2016/05/20 Scaffold Central_Drug_Research_Institute SAMN02953823
3098 Babesia_microti 5868 PRJNA157385 157385 Protists Apicomplexans 6.63000 36.50000 GCA_001650055.1 NaN ... NaN JGVA01 234.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203534
3099 Babesia_microti 5868 PRJNA157387 157387 Protists Apicomplexans 6.34611 36.30000 GCA_001650065.1 NaN ... NaN JGUZ01 82.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203880
3100 Babesia_microti 5868 PRJNA157395 157395 Protists Apicomplexans 6.80056 36.30000 GCA_001650075.1 NaN ... NaN JGUW01 250.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203869
3101 Babesia_microti 5868 PRJNA157389 157389 Protists Apicomplexans 6.87819 36.20000 GCA_001650105.1 NaN ... NaN JGUY01 140.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203531
3102 Babesia_microti 5868 PRJNA157391 157391 Protists Apicomplexans 6.43801 36.40000 GCA_001650135.1 NaN ... NaN JGUX01 131.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203533
3103 Babesia_microti 5868 PRJNA157393 157393 Protists Apicomplexans 6.36105 36.30000 GCA_001650145.1 NaN ... NaN JGUV01 131.0 NaN NaN 2016/05/23 2016/05/23 Contig IGS SAMN02203864
3108 Blastocystis_sp._NandII 463137 PRJNA308101 308101 Protists Other_Protists 16.46830 53.00000 GCA_001651215.1 NaN ... NaN LXWW01 580.0 6611.0 6544.0 2016/05/24 2016/05/24 Scaffold Dalhousie_University SAMN04386717
3121 Prorocentrum_minimum 39449 PRJNA271046 271046 Protists Other_Protists 29.34900 NaN GCA_001652855.1 NaN ... NaN JXLM01 NaN NaN NaN 2016/05/26 2016/05/26 Contig Sangmyung_University SAMN03272508
3132 Uroleptopsis_citrina 693449 PRJNA319592 319592 Protists Other_Protists 32.34720 NaN GCA_001653735.1 1.0 ... NaN LXJT01 NaN NaN NaN 2016/05/31 2016/05/31 Contig Institute_of_Evolution_&_Marine_Biodiversity,_... SAMN04901492
3137 Phytophthora_kernoviae 325452 PRJNA190826 190826 Protists Other_Protists 39.48120 50.20000 GCA_000448265.2 NaN ... NaN AUUF02 5026.0 NaN NaN 2013/08/19 2016/06/02 Scaffold Tree_Aggressors_Identification_using_Genomic_A... SAMN02178789
3141 Diplonema_papillatum 91374 PRJNA301207 301207 Protists Other_Protists 107.91500 NaN GCA_001655075.1 NaN ... NaN LMZG01 NaN NaN NaN 2016/06/02 2016/06/02 Contig Juntendo_University,_Tokyo SAMN04241767
3145 Eukaryota_sp._EH-2015 1653305 PRJNA277740 277740 Protists Other_Protists 48.99680 39.10000 GCA_001655205.1 NaN ... NaN LPNZ01 11727.0 NaN NaN 2016/06/03 2016/06/03 Scaffold University_of_Calgary SAMN03396259
3171 Angomonas_deanei 59799 PRJNA320679 320679 Protists Kinetoplasts 19.28230 49.60000 GCA_001659865.1 NaN ... NaN LXWQ01 408.0 NaN NaN 2016/06/08 2016/06/08 Scaffold Heinrich_Heine_University_Duesseldorf SAMN04954948
3174 Entamoeba_histolytica 5759 PRJDB4673 324242 Protists Other_Protists 19.92460 24.70000 GCA_001662325.1 NaN ... NaN BDEQ01 1.0 8476.0 8394.0 2016/06/01 2016/06/08 Contig National_Institute_of_Infectious_Diseases SAMD00049186
3176 Phytophthora_infestans 4787 PRJNA322103 322103 Protists Other_Protists 152.11500 23.20000 GCA_001661535.1 NaN ... NaN LYVM01 58.0 NaN NaN 2016/06/10 2016/06/10 Contig Central_Potato_Research_Institute SAMN05006719
3260 Plasmodium_ovale_wallikeri 864142 PRJEB12679 317846 Protists Apicomplexans 35.65480 28.90000 GCA_900088485.1 NaN ... NaN FLRD01 1914.0 8571.0 8421.0 2016/06/16 2016/06/16 Scaffold KAUST SAMEA3867551
3261 Plasmodium_ovale_wallikeri 864142 PRJEB12679 317846 Protists Apicomplexans 36.40700 29.40000 GCA_900088545.1 NaN ... NaN FLRE01 3484.0 8790.0 8646.0 2016/06/16 2016/06/16 Scaffold KAUST SAMEA3867552
3262 Plasmodium_ovale_curtisi 864141 PRJEB12678 317845 Protists Apicomplexans 34.51870 28.40000 GCA_900088555.1 NaN ... NaN FLQV01 4025.0 7928.0 7776.0 2016/06/16 2016/06/16 Scaffold KAUST SAMEA3867549
3263 Plasmodium_ovale_curtisi 864141 PRJEB12678 317845 Protists Apicomplexans 38.01000 27.70000 GCA_900088565.1 NaN ... NaN FLQU01 2227.0 8813.0 8625.0 2016/06/15 2016/06/15 Scaffold KAUST SAMEA3867550
3265 Plasmodium_malariae 5858 PRJEB12680 317842 Protists Apicomplexans 31.92520 24.70000 GCA_900088575.1 NaN ... NaN FLQW01 7270.0 6410.0 6343.0 2016/06/16 2016/06/16 Scaffold KAUST SAMEA3867575

381 rows × 21 columns

Out[18]:
<function __main__.filter_genomes>

In [19]:
# we can filter with multiple criteria
# this will look weird if you're not used to pandas, don't worry
euk[(euk['Size_(Mb)'] < 1000) & (euk['Group']=='Protists') & (euk['Center'] == 'JGI')]


Out[19]:
Organism/Name TaxID BioProject_Accession BioProject_ID Group SubGroup Size_(Mb) GC% Assembly_Accession Chromosomes ... Plasmids WGS Scaffolds Genes Proteins Release_Date Modify_Date Status Center BioSample_Accession
0 Emiliania_huxleyi_CCMP1516 280463 PRJNA77753 77753 Protists Other_Protists 167.6760 64.5 GCA_000372725.1 NaN ... NaN AHAL01 7795.0 38549.0 38554.0 2013/04/19 2014/08/01 Scaffold JGI SAMN02744062
827 Phytophthora_capsici_LT1534 763924 PRJNA48515 48515 Protists Other_Protists 56.0343 50.4 GCA_000325885.1 NaN ... NaN ADVJ01 NaN NaN NaN 2012/07/31 2014/08/11 Contig JGI SAMN02981264

2 rows × 21 columns


In [20]:
# as a function
def show_genomes(max_size, group, center):
    return euk[(euk['Size_(Mb)'] < max_size) & (euk['Group']==group) & (euk['Center'] == center)]

show_genomes(1000, 'Protists', 'JGI')


Out[20]:
Organism/Name TaxID BioProject_Accession BioProject_ID Group SubGroup Size_(Mb) GC% Assembly_Accession Chromosomes ... Plasmids WGS Scaffolds Genes Proteins Release_Date Modify_Date Status Center BioSample_Accession
0 Emiliania_huxleyi_CCMP1516 280463 PRJNA77753 77753 Protists Other_Protists 167.6760 64.5 GCA_000372725.1 NaN ... NaN AHAL01 7795.0 38549.0 38554.0 2013/04/19 2014/08/01 Scaffold JGI SAMN02744062
827 Phytophthora_capsici_LT1534 763924 PRJNA48515 48515 Protists Other_Protists 56.0343 50.4 GCA_000325885.1 NaN ... NaN ADVJ01 NaN NaN NaN 2012/07/31 2014/08/11 Contig JGI SAMN02981264

2 rows × 21 columns


In [21]:
# and with widgets
interact(
    show_genomes,
    max_size=(0, euk['Size_(Mb)'].max()),
    group = list(euk['Group'].unique()),
    center = list(euk.groupby('Center').size().sort_values(ascending=False).index) # pandas to sort centers by number of genomes
             )


Organism/Name TaxID BioProject_Accession BioProject_ID Group SubGroup Size_(Mb) GC% Assembly_Accession Chromosomes ... Plasmids WGS Scaffolds Genes Proteins Release_Date Modify_Date Status Center BioSample_Accession
112 Plasmodium_falciparum_Dd2 57267 PRJNA17829 17829 Protists Apicomplexans 20.8756 22.9000 GCA_000149795.1 NaN ... NaN AASM01 2837.0 5480.0 5139.0 2006/09/08 2015/08/11 Scaffold Broad_Institute SAMN02953660
113 Plasmodium_falciparum_VS/1 478864 PRJNA20867 20867 Protists Apicomplexans 18.8876 21.8000 GCA_000150295.1 NaN ... NaN ABGS01 5856.0 NaN NaN 2007/11/09 2014/08/06 Scaffold Broad_Institute SAMN02953700
114 Plasmodium_falciparum_Senegal_V34.04 478863 PRJNA20865 20865 Protists Apicomplexans 13.2408 21.3000 GCA_000150315.1 NaN ... NaN ABGT01 4329.0 NaN NaN 2007/11/09 2014/08/06 Scaffold Broad_Institute SAMN02953701
115 Plasmodium_falciparum_RO-33 5834 PRJNA20863 20863 Protists Apicomplexans 13.7141 23.1000 GCA_000150335.1 NaN ... NaN ABGU01 4991.0 NaN NaN 2007/11/09 2014/08/06 Scaffold Broad_Institute SAMN02953702
116 Plasmodium_falciparum_K1 5839 PRJNA20861 20861 Protists Apicomplexans 13.2909 21.8000 GCA_000150355.1 NaN ... NaN ABGV01 4772.0 NaN NaN 2007/11/09 2014/08/06 Scaffold Broad_Institute SAMN02953703
117 Plasmodium_falciparum_FCC-2/Hainan 478862 PRJNA20859 20859 Protists Apicomplexans 12.9639 23.5000 GCA_000150375.1 NaN ... NaN ABGW01 4956.0 NaN NaN 2007/11/09 2014/08/06 Scaffold Broad_Institute SAMN02953704
118 Plasmodium_falciparum_D10 478861 PRJNA20857 20857 Protists Apicomplexans 13.3751 22.2000 GCA_000150395.1 NaN ... NaN ABGX01 4471.0 NaN NaN 2007/11/09 2014/08/06 Scaffold Broad_Institute SAMN02953705
119 Plasmodium_falciparum_D6 478860 PRJNA20853 20853 Protists Apicomplexans 13.2165 22.4000 GCA_000150415.1 NaN ... NaN ABGY01 5011.0 NaN NaN 2007/11/09 2014/08/06 Scaffold Broad_Institute SAMN02953706
262 Tetrahymena_thermophila_SB210 312017 PRJNA51571 51571 Protists Other_Protists 157.6930 23.7000 GCA_000261185.1 NaN ... NaN AFSS02 1464.0 NaN NaN 2011/10/24 2014/08/06 Scaffold Broad_Institute SAMN02953797
316 Phytophthora_infestans_T30-4 403677 PRJNA17665 17665 Protists Other_Protists 228.5440 50.6000 GCA_000142945.1 NaN ... NaN AATU01 4921.0 19150.0 17797.0 2006/11/15 2014/08/06 Scaffold Broad_Institute SAMN02953670
559 Thecamonas_trahens_ATCC_50062 461836 PRJNA37929 37929 Protists Other_Protists 28.6806 63.6000 GCA_000142905.1 NaN ... NaN ADVD01 131.0 10656.0 10626.0 2010/06/03 2015/07/28 Scaffold Broad_Institute SAMN02953775
640 Tetrahymena_malaccensis_436 1075772 PRJNA51577 51577 Protists Other_Protists 106.6720 23.2000 GCA_000231845.2 NaN ... NaN AFXY02 554.0 NaN NaN 2011/10/24 2014/08/06 Scaffold Broad_Institute SAMN02953799
653 Tetrahymena_borealis 5893 PRJNA51575 51575 Protists Other_Protists 93.5061 24.3000 GCA_000260095.1 NaN ... NaN AHAN01 325.0 NaN NaN 2012/04/27 2013/11/01 Scaffold Broad_Institute SAMN00727833
654 Tetrahymena_elliotti_4EA 1075773 PRJNA51573 51573 Protists Other_Protists 90.8376 23.5000 GCA_000231825.2 NaN ... NaN AFXF02 331.0 NaN NaN 2011/10/24 2014/08/06 Scaffold Broad_Institute SAMN02953798
711 Phytophthora_parasitica_P1569 1317065 PRJNA181332 181332 Protists Other_Protists 55.2296 49.6000 GCA_000365505.1 NaN ... NaN ANIZ01 NaN 23251.0 28117.0 2013/04/16 2014/08/04 Contig Broad_Institute SAMN01816558
712 Phytophthora_parasitica_P1976 1317066 PRJNA181333 181333 Protists Other_Protists 54.8843 49.6000 GCA_000365525.1 NaN ... NaN ANJA01 NaN 23215.0 28082.0 2013/04/16 2014/08/04 Contig Broad_Institute SAMN01816559
713 Phytophthora_parasitica_CJ01A1 1317063 PRJNA181330 181330 Protists Other_Protists 54.2899 49.6000 GCA_000365545.1 NaN ... NaN ANIX01 NaN 23141.0 28065.0 2013/04/16 2014/08/04 Contig Broad_Institute SAMN01816556
714 Phytophthora_parasitica_P10297 1317064 PRJNA181331 181331 Protists Other_Protists 54.5743 49.6000 GCA_000367145.1 NaN ... NaN ANIY01 NaN 23199.0 27956.0 2013/04/16 2014/08/04 Contig Broad_Institute SAMN01816557
809 Saprolegnia_diclina_VS20 1156394 PRJNA86859 86859 Protists Other_Protists 62.8858 55.3000 GCA_000281045.1 NaN ... NaN AIJL01 390.0 17448.0 18229.0 2012/07/23 2014/08/11 Scaffold Broad_Institute SAMN02981407
940 Salpingoeca_rosetta 946362 PRJNA37927 37927 Protists Other_Protists 55.4403 55.5000 GCA_000188695.1 NaN ... NaN ACSY01 154.0 11796.0 11731.0 2011/02/09 2013/11/06 Scaffold Broad_Institute SAMN00013333
973 Plasmodium_yoelii_17X 1323249 PRJNA163121 163121 Protists Apicomplexans 22.2224 23.0000 GCA_000505035.1 NaN ... NaN AMYO01 130.0 5922.0 7508.0 2013/12/06 2014/08/04 Scaffold Broad_Institute SAMN00974094
1004 Phytophthora_parasitica_INRA-310 761204 PRJNA73155 73155 Protists Other_Protists 82.3892 49.4000 GCA_000247585.2 NaN ... NaN AGFV02 708.0 23232.0 27942.0 2012/02/13 2014/08/27 Scaffold Broad_Institute SAMN02981373
1005 Phytophthora_parasitica 4792 PRJNA205154 205154 Protists Other_Protists 48.0752 49.6000 GCA_000509465.1 NaN ... NaN AVGB01 6489.0 22040.0 26824.0 2013/12/16 2014/08/04 Scaffold Broad_Institute SAMN02178347
1006 Phytophthora_parasitica 4792 PRJNA205155 205155 Protists Other_Protists 47.4491 49.6000 GCA_000509485.1 NaN ... NaN AVGC01 6813.0 21855.0 26676.0 2013/12/16 2014/08/04 Scaffold Broad_Institute SAMN02178348
1007 Phytophthora_parasitica 4792 PRJNA205156 205156 Protists Other_Protists 47.8094 49.6000 GCA_000509505.1 NaN ... NaN AVGD01 6724.0 21737.0 26512.0 2013/12/16 2014/08/04 Scaffold Broad_Institute SAMN02178349
1008 Phytophthora_parasitica 4792 PRJNA205153 205153 Protists Other_Protists 47.7167 49.6000 GCA_000509525.1 NaN ... NaN AVGE01 6791.0 21842.0 26648.0 2013/12/16 2014/08/04 Scaffold Broad_Institute SAMN02178346
1030 Aphanomyces_astaci 112090 PRJNA187372 187372 Protists Other_Protists 75.8444 49.6000 GCA_000520075.1 NaN ... NaN AYTG01 835.0 19570.0 26259.0 NaN NaN Scaffold Broad_Institute SAMN01906578
1031 Aphanomyces_invadans 157072 PRJNA188082 188082 Protists Other_Protists 71.4025 52.1000 GCA_000520115.1 NaN ... NaN AYTH01 481.0 15408.0 20816.0 NaN NaN Scaffold Broad_Institute SAMN01907307
1032 Plasmodium_falciparum_NF54 5843 PRJNA67505 67505 Protists Apicomplexans 25.8002 28.4000 GCA_000401695.2 NaN ... NaN AMYQ01 371.0 5938.0 5936.0 2013/05/16 2014/02/06 Scaffold Broad_Institute SAMN01737343
1033 Plasmodium_falciparum_UGT5.1 1237627 PRJNA176388 176388 Protists Apicomplexans 25.5834 27.0000 GCA_000401715.2 NaN ... NaN AMYP01 545.0 5918.0 5922.0 2013/05/16 2014/08/04 Scaffold Broad_Institute SAMN01737342
1034 Plasmodium_falciparum_Vietnam_Oak-Knoll_(FVO) 1036723 PRJNA67477 67477 Protists Apicomplexans 25.8857 25.0000 GCA_000521015.1 NaN ... NaN AOPP01 447.0 6227.0 6234.0 2014/01/16 2014/01/16 Scaffold Broad_Institute SAMN00765681
1035 Plasmodium_falciparum_MaliPS096_E11 1036727 PRJNA67491 67491 Protists Apicomplexans 26.6052 26.2000 GCA_000521035.1 NaN ... NaN AOPQ01 547.0 6304.0 6317.0 2014/01/16 2014/01/17 Scaffold Broad_Institute SAMN00765679
1036 Plasmodium_falciparum_Tanzania_(2000708) 1036725 PRJNA67485 67485 Protists Apicomplexans 27.0235 25.8000 GCA_000521055.1 NaN ... NaN AOPR01 988.0 6702.0 6719.0 2014/01/16 2014/08/06 Scaffold Broad_Institute SAMN02953832
1037 Plasmodium_falciparum_NF135/5.C10 1036726 PRJNA67487 67487 Protists Apicomplexans 24.4990 22.8000 GCA_000521075.1 NaN ... NaN AOPS01 234.0 6331.0 6349.0 2014/01/16 2014/01/17 Scaffold Broad_Institute SAMN00768929
1038 Plasmodium_falciparum_Palo_Alto/Uganda 57270 PRJNA67499 67499 Protists Apicomplexans 24.5624 25.8000 GCA_000521095.1 NaN ... NaN AOPT01 218.0 6039.0 6048.0 2014/01/16 2014/08/06 Scaffold Broad_Institute SAMN02953833
1039 Plasmodium_falciparum_CAMP/Malaysia 5835 PRJNA67497 67497 Protists Apicomplexans 23.5691 22.9000 GCA_000521115.1 NaN ... NaN AOPU01 331.0 6100.0 6118.0 2014/01/16 2014/01/17 Scaffold Broad_Institute SAMN00765680
1040 Plasmodium_falciparum_FCH/4 1036724 PRJNA67481 67481 Protists Apicomplexans 24.6945 32.2000 GCA_000521155.1 NaN ... NaN AOPV01 405.0 5778.0 5770.0 2014/01/16 2014/01/17 Scaffold Broad_Institute SAMN00765683
1045 Plasmodium_falciparum_7G8 57266 PRJNA20851 20851 Protists Apicomplexans 24.5591 23.0000 GCA_000150435.3 NaN ... NaN ABGZ02 277.0 6290.0 6314.0 2007/11/09 2014/02/06 Scaffold Broad_Institute SAMN00773069
1046 Plasmodium_falciparum_Santa_Lucia 478859 PRJNA20849 20849 Protists Apicomplexans 23.4710 22.1000 GCA_000150455.3 NaN ... NaN ABHA02 247.0 6179.0 6195.0 2007/11/09 2014/08/06 Scaffold Broad_Institute SAMN02953707
1061 Plasmodium_inui_San_Antonio_1 1237626 PRJNA176387 176387 Protists Apicomplexans 27.4050 42.4000 GCA_000524495.1 NaN ... NaN AMYR01 323.0 5878.0 5832.0 2014/01/27 2014/08/08 Scaffold Broad_Institute SAMN01737341
1173 Fonticula_alba 691883 PRJNA189482 189482 Protists Other_Protists 31.2965 64.3000 GCA_000388065.2 NaN ... NaN AROH01 214.0 6454.0 6309.0 2013/05/02 2014/05/01 Scaffold Broad_Institute SAMN02741864
1228 Saprolegnia_parasitica_CBS_223.65 695850 PRJNA36583 36583 Protists Other_Protists 53.1316 57.5000 GCA_000151545.2 NaN ... NaN ADCG02 1443.0 20399.0 20121.0 2010/02/04 2014/08/11 Scaffold Broad_Institute SAMN02981252
1317 Plasmodium_vinckei_vinckei 54757 PRJNA163123 163123 Protists Apicomplexans 18.2216 23.4000 GCA_000709005.1 NaN ... NaN AMYS01 49.0 5009.0 4954.0 2014/06/16 2014/07/30 Scaffold Broad_Institute SAMN00974095
1752 Capsaspora_owczarzaki_ATCC_30864 595528 PRJNA20341 20341 Protists Other_Protists 27.9678 53.7000 GCA_000151315.2 NaN ... NaN ACFS02 84.0 8793.0 8792.0 2009/08/05 2015/03/06 Scaffold Broad_Institute SAMN02953747
1892 Plasmodium_fragile 5857 PRJNA67411 67411 Protists Apicomplexans 25.9145 42.1000 GCA_000956335.1 NaN ... NaN JOOM01 248.0 5744.0 5672.0 2014/06/25 2015/03/20 Scaffold Broad_Institute SAMN00013493
2172 Plasmodium_vivax_India_VII 1077284 PRJNA65119 65119 Protists Apicomplexans 29.2515 41.5000 GCA_000320625.2 NaN ... NaN AFBK01 568.0 6631.0 6616.0 2012/07/25 2015/07/22 Scaffold Broad_Institute SAMN00710644
2173 Plasmodium_vivax_Brazil_I 1033975 PRJNA67065 67065 Protists Apicomplexans 28.8987 40.4723 GCA_000320645.2 NaN ... NaN AFMK01 261.0 6541.0 6464.0 2012/07/25 2015/07/22 Scaffold Broad_Institute SAMN00710434
2174 Plasmodium_vivax_Mauritania_I 1035515 PRJNA67237 67237 Protists Apicomplexans 28.4638 40.5718 GCA_000320665.2 NaN ... NaN AFNI01 206.0 6442.0 6381.0 2012/07/25 2015/07/22 Scaffold Broad_Institute SAMN00710347
2175 Plasmodium_vivax_North_Korean 1035514 PRJNA67239 67239 Protists Apicomplexans 29.6790 40.3732 GCA_000320685.2 NaN ... NaN AFNJ01 542.0 6782.0 6695.0 2012/07/25 2015/07/22 Scaffold Broad_Institute SAMN00710542
2188 Plasmodium_falciparum_RAJ116 580058 PRJNA33065 33065 Protists Apicomplexans 14.1493 26.0000 GCA_000186025.2 NaN ... NaN ACBR01 1203.0 3250.0 3210.0 2009/04/03 2015/07/28 Scaffold Broad_Institute SAMN02953739
2190 Plasmodium_falciparum_IGH-CR14 580059 PRJNA33119 33119 Protists Apicomplexans 21.7788 21.3000 GCA_000186055.2 NaN ... NaN ACBS01 853.0 5125.0 5073.0 2009/04/03 2015/08/04 Scaffold Broad_Institute SAMN02953740
2193 Sphaeroforma_arctica_JP610 667725 PRJNA20463 20463 Protists Other_Protists 121.6310 43.2000 GCA_001186125.1 NaN ... NaN AEOD01 15619.0 18525.0 18730.0 2011/08/19 2015/07/29 Scaffold Broad_Institute SAMN02953786
2223 Plasmodium_falciparum_HB3 137071 PRJNA16340 16340 Protists Apicomplexans 24.2940 20.8933 GCA_000149665.2 NaN ... NaN AANS01 1191.0 5696.0 5462.0 2006/03/27 2015/08/11 Scaffold Broad_Institute SAMN02953642

53 rows × 21 columns

Out[21]:
<function __main__.show_genomes>

In [22]:
from ipywidgets import IntRangeSlider, SelectMultiple

# for more control we can create the widgets explicitly

# a function that takes a (min, max) size range and a list of subgroups (insects, birds, etc.)
# and returns a dataframe, filtered, and just showing some columns
def show_genomes(size_range, subgroups):
    min_size, max_size = size_range # the size_range widget will give us a (min, max) tuple   
    selected =  euk[(min_size < euk['Size_(Mb)']) & (euk['Size_(Mb)'] < max_size) & (euk['SubGroup'].isin(subgroups))]
    return selected[['Organism/Name', 'Size_(Mb)', 'Group', 'SubGroup', 'GC%', 'Status', 'Center']]


# now we have to explicitly create our widgets
# create size slider for min/max interval
size_slider = IntRangeSlider(min=0, max=euk['Size_(Mb)'].max(), description='Genome size', continuous_update=False)

# allow multiple selection from subgroups
subgroup_select = SelectMultiple(
    options=list(euk['SubGroup'].unique()),
    description='Subgroups'
)

# note that while we have to construct the widgets, we have never (yet) 
# written any code to handle events - this is very declarative
interact(show_genomes, size_range=size_slider, subgroups=subgroup_select)


Organism/Name Size_(Mb) Group SubGroup GC% Status Center
Out[22]:
<function __main__.show_genomes>

In [23]:
# for extra fancyness we can update widgets based on other widgets

# we create our two widgets just as before
size_slider = IntRangeSlider(min=0, max=euk['Size_(Mb)'].max(), description='Genome size', continuous_update=False)

subgroup_select = SelectMultiple(
    options=list(euk['SubGroup'].unique()),
    description='Subgroups'
)

# function to update the size range slider limits when we select new groups
def update_size_range(change):
    subgroups = change['new']
    selected_genome_sizes = euk[euk['SubGroup'].isin(subgroups)]['Size_(Mb)']
    size_slider.min = selected_genome_sizes.min()
    size_slider.max = selected_genome_sizes.max()
    size_slider.value = (0, size_slider.max*0.8)

# tell the selecter to call the update function whenever it changes
subgroup_select.observe(update_size_range, 'value')


interact(show_genomes, size_range=size_slider, subgroups=subgroup_select)


Organism/Name Size_(Mb) Group SubGroup GC% Status Center
Out[23]:
<function __main__.show_genomes>

In [24]:
from ipywidgets import Select, IntSlider
# when combined with charting this gets interesting

# set a proper index on the pandas dataframe
euk = pd.read_csv("eukaryotes.tsv", sep="\t", na_values=['-'])
euk.index = euk.apply(lambda x : "{} ({})".format(x['Organism/Name'], x['BioSample_Accession']), axis=1)

# now our function will plot the genomes instead of displaying a dataframe
# takes a number of genomes to show, and the name of a subgroup
def show_genomes(count, subgroup):
    plt.gcf().clear() # clear the plot before drawing a new one
    selected = euk[euk['SubGroup'] == subgroup]
    
    selected.sort_values('Genes', ascending=False)[:count][['Genes', 'Proteins', 'Size_(Mb)']].plot.barh(
        figsize=(10,4), 
        subplots=True,
        sharex=False # try setting this to True and see what happens
    )

subgroup_select = Select(
    options=list(euk['SubGroup'].unique()),
    description='Subgroup'
)

# the count is controlled by a manual integer slider
count_slider = IntSlider(
    min = 2, max=100, value=10, continuous_update=False
)

interact(show_genomes, subgroup=subgroup_select, count=count_slider)


Out[24]:
<function __main__.show_genomes>
<matplotlib.figure.Figure at 0x7f6eadf434e0>

In [25]:
# another example with seaborn
sns.set_style("whitegrid")

# this function will take a list of subgroups
# select rows belonging to them
# and plot size vs. number of predicted proteins
def plot_size_proteins(subgroups):
    sns.lmplot(
        data=euk[euk['SubGroup'].isin(subgroups)], 
        x='Size_(Mb)', 
        y='Proteins', 
        size=4, 
        hue='SubGroup') # use hue to set colour column

# for this we want just one widget, a multiple selection
subgroup_select = SelectMultiple(
    options=list(euk['SubGroup'].unique()),
    description='Subgroups'
    )

interact(plot_size_proteins, subgroups=subgroup_select)


Out[25]:
<function __main__.plot_size_proteins>

In [26]:
# a blobtools-like example with taxonomically-annotated contig data
import numpy as np
con=pd.read_csv('contigs.csv')

# make a log coverage column
con['log_coverage'] = con.apply(lambda x : np.log10(x['coverage']), axis=1)
con.head()


/home/martin/.virtualenvs/datavis/lib/python3.5/site-packages/ipykernel/__main__.py:6: RuntimeWarning: divide by zero encountered in log10
Out[26]:
name length GC coverage phylum log_coverage
0 scaffold1_size1534183 1534183 0.4304 0.603315 Bacteroidetes -0.219456
1 scaffold2_size1255804 1255804 0.4237 1.266944 Bacteroidetes 0.102757
2 scaffold3_size1208507 1208507 0.5007 0.364660 Armatimonadetes -0.438112
3 scaffold4_size1204010 1204010 0.4281 0.499764 Bacteroidetes -0.301235
4 scaffold5_size1189196 1189196 0.4942 0.320681 Proteobacteria -0.493927

In [27]:
from ipywidgets import Dropdown

# a function which draws a scatter plot of GC vs log coverage
# filtered by which phylum the contig (presumably) comes from 
# and also filtered by minimum contig size
# it also samples a fraction of the total contigs for plotting
def draw_plot(phyla, min_size, frac):
    con_sample = con.sample(frac=frac)
    sns.lmplot(
        
        data = con_sample[(con_sample['phylum'].isin(phyla)) & (con['length'] > min_size)],
        x = 'GC',
        y = 'log_coverage',
        hue='phylum',
        fit_reg=False,
        size=4
    )

phylum_select = SelectMultiple(
    options=list(con.groupby('phylum').size().sort_values(ascending=False).index),
    description='phyla'
    )

min_size_slider = IntSlider(
    min = 1000,
    max = 100000,
    continuous_update=False,
    description = 'min contig length'
)

sample_select = Dropdown(
    options = [1, 0.5, 0.2, 0.1],
    description = 'sample fraction'
)

interact(draw_plot, phyla=phylum_select, min_size=min_size_slider, frac=sample_select)


/home/martin/.virtualenvs/datavis/lib/python3.5/site-packages/ipykernel/__main__.py:11: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
Out[27]:
<function __main__.draw_plot>

In [61]:
# one more with the eukaryotic genomes data

# function to plot a heatmap showing number of genomes for each group and completion status
# filtered by sequencing center
def plot_heatmap(center):
    plt.figure(figsize=(7,7))

    # add some spaces and change the colour map
    # see here http://chrisalbon.com/python/seaborn_color_palettes.html for colour maps
    size_v_status = euk[euk['Center'] == center].groupby(['SubGroup', 'Status']).size().unstack()
    sns.heatmap(size_v_status, square=True, linewidths=2, cmap='OrRd', annot=True, fmt="3.0f")
    
interact(plot_heatmap, center=list(euk.groupby('Center').size().sort_values(ascending=False).index))


Out[61]:
<function __main__.plot_heatmap>

In [109]:
import folium
from folium import plugins

# shall we do one with maps?
# using distribution of Anopheles gambiae from here
# http://lifemapper.org/species/Anopheles%20gambiae

# read the csv file into a dataframe
ano = pd.read_csv('ag.csv')

# function to draw a map showing specimens between two years
def draw_map(target_year):
    
    # make a map centered on Africa
    map_osm = folium.Map(location=[-10, -0], zoom_start=3)
    
    
    # use pandas to grab the rows between the two target years
    selected = ano[(ano['year'] >= target_year[0]) & (ano['year'] <= target_year[1])]
    
    # make a list of (lat, long) tuples
    locations = zip(list(selected['dec_lat']), list(selected['dec_long']))

    # pass the list to folium and ask it to make a heatmap
    map_osm.add_children(folium.plugins.HeatMap(locations))
    return map_osm

# call it like this

draw_map((1970, 1979))


Out[109]:

In [110]:
# now make it interactive by adding a slider for the year range
years_slider = IntRangeSlider(
    min=1967, 
    max=1998, 
    description='years', 
    continuous_update=False)

interact(draw_map, target_year=years_slider)


Out[110]:
<function __main__.draw_map>

In [84]:
# and one with images
# load an image of cells under a microscope that I found on the internet
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

from skimage import io
from skimage.filters import threshold_otsu
from skimage.segmentation import clear_border
from skimage.measure import label, regionprops
from skimage.morphology import closing, square
from skimage.color import label2rgb, rgb2gray
from skimage.viewer import ImageViewer


image = rgb2gray(io.imread('cells.png'))
io.imshow(image)


Out[84]:
<matplotlib.image.AxesImage at 0x7f6ea8623390>

In [87]:
# use this example from the scikit-image tutorial to detect and label cells
# create a slider to control minimum size of areas detected
def find_shapes(min_size):
    # apply threshold
    thresh = threshold_otsu(image)

    bw = closing(image > thresh, square(1))

    # remove artifacts connected to image border
    cleared = clear_border(bw)

    # label image regions
    label_image = label(cleared)
    image_label_overlay = label2rgb(label_image, image=image)

    fig, ax = plt.subplots(figsize=(5, 5))
    ax.imshow(image_label_overlay)

    for region in regionprops(label_image):
        # take regions with large enough areas
        if region.area >= min_size:
            # draw rectangle around segmented coins
            minr, minc, maxr, maxc = region.bbox
            rect = mpatches.Rectangle((minc, minr), maxc - minc, maxr - minr,
                                      fill=False, edgecolor='red', linewidth=2)
            ax.add_patch(rect)

    ax.set_axis_off()
    plt.tight_layout()
    plt.show()
 
# we can call the function thus:
find_shapes(200)



In [88]:
# now to make it interactive we just add a slider for the minimum feature size

min_size_slider = IntSlider(min=50, max=1000, continous_update=False)
    
interact(find_shapes, min_size=min_size_slider)


Out[88]:
<function __main__.find_shapes>

Other stuff not mentioned

  • more widget types, date picker, colour picker, boolean types, strings, radio buttons
  • widgets to play/pause/rewind animations
  • there's a whole layout system which seems to work kind of like css
  • css styles for colour, size, text, etc.
  • more event handling stuff

In [ ]:


In [ ]:


In [ ]:


In [73]:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

from skimage import io
from skimage.filters import threshold_otsu
from skimage.segmentation import clear_border
from skimage.measure import label, regionprops
from skimage.morphology import closing, square
from skimage.color import label2rgb, rgb2gray
from skimage.viewer import ImageViewer


image = rgb2gray(io.imread('cells.png'))
io.imshow(image)


Out[73]:
<matplotlib.image.AxesImage at 0x7f6ea8ec1550>

In [82]:
def find_shapes(min_size):
    # apply threshold
    thresh = threshold_otsu(image)

    bw = closing(image > thresh, square(1))

    # remove artifacts connected to image border
    cleared = clear_border(bw)

    # label image regions
    label_image = label(cleared)
    image_label_overlay = label2rgb(label_image, image=image)

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.imshow(image_label_overlay)

    for region in regionprops(label_image):
        # take regions with large enough areas
        if region.area >= min_size:
            # draw rectangle around segmented coins
            minr, minc, maxr, maxc = region.bbox
            rect = mpatches.Rectangle((minc, minr), maxc - minc, maxr - minr,
                                      fill=False, edgecolor='red', linewidth=2)
            ax.add_patch(rect)

    ax.set_axis_off()
    plt.tight_layout()
    plt.show()
    
min_size_slider = IntSlider(min=50, max=1000, continous_update=False)
    
interact(find_shapes, min_size=min_size_slider)


Out[82]:
<function __main__.find_shapes>