Querying a Population and Plotting the Results

Before we can query a population, we must have one. We will use a population of satellites as an example.

In this portion of the tutorial we query and plot the raw population data, not the inferences that can be drawn from them. All of the queries here are analogous or identical to queries in Structured Query Language (SQL).

For simplicity we still use the same pre-analyzed population database (.bdb file) as in other portions of the tutorial, even though nothing here depends on any probabilistic analysis having been done. If you have not yet downloaded that pre-analyzed database, do so now:



In [1]:

    
import os
import subprocess
if not os.path.exists('satellites.bdb'):
    subprocess.check_call(['curl', '-O', 'http://probcomp.csail.mit.edu/bayesdb/downloads/satellites.bdb'])

We construct a Population instance that helps us read, query, and visualize a particular population.



In [2]:

    
from bdbcontrib import Population
satellites = Population(name='satellites', bdb_path='satellites.bdb')









    




BQL [SELECT * FROM bayesdb_generator] ()

Querying the data using SQL

Before querying the implications of a population, it can be useful to look at a sample of the raw data and metadata. This can be done using a combination of ordinary SQL and convenience functions built into bayeslite. We start by finding one of the most well-known satellites, the International Space Station:



In [3]:

    
satellites.q("""
SELECT * FROM satellites 
WHERE Name LIKE 'International Space Station%'
""").transpose()









    




BQL [
SELECT * FROM satellites 
WHERE Name LIKE 'International Space Station%'
] ()






    Out[3]:

                              0
Name                          International Space Station (ISS [first elemen...
Country_of_Operator           Multinational
Operator_Owner                NASA/Multinational
Users                         Government
Purpose                       Scientific Research
Class_of_Orbit                LEO
Type_of_Orbit                 Intermediate
Perigee_km                    401
Apogee_km                     422
Eccentricity                  0.00155
Period_minutes                92.8
Launch_Mass_kg                NaN
Dry_Mass_kg                   NaN
Power_watts                   NaN
Date_of_Launch                36119
Anticipated_Lifetime          30
Contractor                    Boeing Satellite Systems (prime)/Multinational
Country_of_Contractor         Multinational
Launch_Site                   Baikonur Cosmodrome
Launch_Vehicle                Proton
Source_Used_for_Orbital_Data  www.satellitedebris.net 12/12
longitude_radians_of_geo      NaN
Inclination_radians           0.90059



In [4]:

    
satellites.q("""SELECT COUNT(*) FROM satellites;""")









    




BQL [SELECT COUNT(*) FROM satellites;] ()






    Out[4]:

   "COUNT"(*)
0  1167

We can select multiple items using a SQL wildcard, in this case the match-anything '%' on either side of "GPS".

We ask for variables as rows and observations as columns by using .transpose() as we did for the ISS above. By default, observations map to rows, and variables map to columns.



In [5]:

    
satellites.q("""SELECT * FROM satellites WHERE Name LIKE '%GPS%'""").transpose()









    




BQL [SELECT * FROM satellites WHERE Name LIKE '%GPS%'] ()






    Out[5]:

  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31
Name | Navstar GPS II-10 (Navstar SVN 23, PRN 32,  US... | Navstar GPS II-14 (Navstar SVN 26, PRN 26, USA... | Navstar GPS II-21 (Navstar SVN 39, PRN 09, USA... | Navstar GPS II-23 (Navstar SVN 34, PRN 04, USA... | Navstar GPS II-24 (Navstar SVN 36, PRN 06, USA... | Navstar GPS II-25 (Navstar SVN 33, PRN 03, USA... | Navstar GPS II-26 (Navstar SVN 40, PRN 10, USA... | Navstar GPS II-28 (Navstar SVN 38, PRN 08, USA... | Navstar GPS II-35 (Navstar SVN 35, PRN 30, USA... | Navstar GPS IIF-1 (Navstar SVN 62, PRN 25, USA... | ... | Navstar GPS IIR-7 (Navstar SVN 54, PRN 18, USA... | Navstar GPS IIR-8 (Navstar SVN 56, PRN 16, USA... | Navstar GPS IIR-9 (Navstar SVN 45, PRN 21, USA... | Navstar GPS IIR-M-1 (Navstar SVN 53, PRN 17, U... | Navstar GPS IIR-M-2 (Navstar SVN 52, PRN 31, U... | Navstar GPS IIR-M-3 (Navstar SVN 58, PRN 12, U... | Navstar GPS IIR-M-4 (Navstar SVN 55, PRN 15, U... | Navstar GPS IIR-M-5 (Navstar SVN 57, PRN 29, U... | Navstar GPS IIR-M-6 (Navstar SVN 48, PRN 07, U... | Navstar GPS IIR-M-8 (Navstar SVN 50, PRN 05, U...
Country_of_Operator | USA | USA | USA | USA | USA | USA | USA | USA | USA | USA | ... | USA | USA | USA | USA | USA | USA | USA | USA | USA | USA
Operator_Owner | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | ... | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force | DoD/US Air Force
Users | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | ... | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial | Military/Commercial
Purpose | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | ... | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning | Navigation/Global Positioning
Class_of_Orbit | MEO | MEO | MEO | MEO | MEO | MEO | MEO | MEO | MEO | MEO | ... | MEO | MEO | MEO | MEO | MEO | MEO | MEO | MEO | MEO | MEO
Type_of_Orbit | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | ... | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A
Perigee_km | 19781 | 19959 | 20120 | 20104 | 19986 | 20080 | 20134 | 19912 | 20109 | 20188 | ... | 20104 | 20155 | 20063 | 20142 | 20020 | 20206 | 20149 | 20150 | 20135 | 20160
Apogee_km | 20582 | 20403 | 20244 | 20260 | 20315 | 20284 | 20227 | 20449 | 20257 | 20224 | ... | 20266 | 20344 | 20433 | 20221 | 20342 | 20366 | 20213 | 20311 | 20152 | 20209
Eccentricity | 0.01508 | 0.00836 | 0.00234 | 0.00294 | 0.0062 | 0.00384 | 0.00175 | 0.01011 | 0.00279 | 0.00068 | ... | 0.00305 | 0.00355 | 0.00695 | 0.00149 | 0.00606 | 0.003 | 0.00121 | 0.00303 | 0.00032 | 0.00092
Period_minutes | 717.94 | 717.93 | 717.97 | 717.97 | 716.69 | 717.97 | 717.91 | 717.91 | 718.01 | 718.94 | ... | 718.09 | 720.71 | 720.65 | 717.95 | 717.93 | 722.19 | 717.93 | 719.92 | 716.4 | 718.07
Launch_Mass_kg | 1816 | 1816 | 1816 | 1816 | 1816 | 1816 | 1816 | 1816 | 1816 | 1630 | ... | 2217 | 2217 | 2217 | 2217 | 2060 | 2060 | 2217 | 2060 | 2217 | 2059
Dry_Mass_kg | 844 | 844 | 844 | 844 | 844 | 844 | 844 | 844 | 844 | NaN | ... | 980 | 980 | 980 | 980 | NaN | NaN | 980 | NaN | 980 | 980
Power_watts | 700 | 700 | 700 | 700 | 700 | 700 | 700 | 700 | 700 | NaN | ... | 1136 | 1136 | 1136 | 1136 | NaN | NaN | 1136 | NaN | 1136 | 1136
Date_of_Launch | 33203 | 33792 | 34146 | 34268 | 34403 | 35152 | 35262 | 35740 | 34211 | 40326 | ... | 36921 | 37650 | 37711 | 38621 | 38985 | 39038 | 39372 | 39436 | 39522 | 40042
Anticipated_Lifetime | 7.5 | 7.5 | 7.5 | 7.5 | 7.5 | 7.5 | 7.5 | 7.5 | 7.5 | 12 | ... | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10
Contractor | Rockwell International | Rockwell International | Rockwell International | Rockwell International | Rockwell International | Rockwell International | Rockwell International | Rockwell International | Rockwell International | Boeing Satellite Systems | ... | Lockheed Martin | Lockheed Martin | Lockheed Martin | Lockheed Martin | Lockheed Martin Missiles and Space | Lockheed Martin Missiles and Space | Lockheed Martin Missiles and Space | Lockheed Martin Missiles and Space | Lockheed Martin | Lockheed Martin Missiles and Space
Country_of_Contractor | USA | USA | USA | USA | USA | USA | USA | USA | USA | USA | ... | USA | USA | USA | USA | USA | USA | USA | USA | USA | USA
Launch_Site | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | ... | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral | Cape Canaveral
Launch_Vehicle | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 4 | ... | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2 | Delta 2
Source_Used_for_Orbital_Data | None | JMSatcat304 | JMSatcat304 | JMSatcat304 | JMSatcat304 | JMSatcat304 | JMSatcat304 | JMSatcat304 | JMSatcat304 | SC - ASCR | ... | JMSatcat304 | JMSatcat304 | JMSatcat304 | SC - ASCR | SC - ASCR | SC - ASCR | None | SC - ASCR | SC - ASCR | SC - ASCR
longitude_radians_of_geo | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
Inclination_radians | 0.973545 | 0.959931 | 0.954695 | 0.959931 | 0.958186 | 0.954695 | 0.959931 | 0.958186 | 0.95644 | 0.959931 | ... | 0.959931 | 0.959931 | 0.958186 | 0.961327 | 0.958884 | 0.96028 | 0.95644 | 0.959233 | 0.961676 | 0.962025

23 rows × 32 columns
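The wildcard matching used in the query above can be tried in isolation. The following is a minimal sketch using Python's standard sqlite3 module on a small in-memory table; the table name `demo`, its rows, and the helper `names_like` are made up for illustration and are not part of bdbcontrib. It assumes that BQL passes the LIKE clause through to SQLite unchanged.

```python
import sqlite3

# Tiny in-memory stand-in for the satellites table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (Name TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?)",
    [
        ("Navstar GPS II-10",),
        ("GPS Block X",),
        ("International Space Station",),
        ("Glonass 701",),
    ],
)


def names_like(pattern):
    """Return the names matching a LIKE pattern, in alphabetical order."""
    rows = conn.execute(
        "SELECT Name FROM demo WHERE Name LIKE ? ORDER BY Name", (pattern,)
    )
    return [name for (name,) in rows]


anywhere = names_like("%GPS%")  # 'GPS' may appear anywhere in the name
prefix = names_like("GPS%")     # the name must start with 'GPS'
print(anywhere)
print(prefix)
```

A '%' on both sides matches names that merely contain the text, while a '%' on one side only anchors the match to the start or the end of the value.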

Select just a few variables in the data, ordering by the time it takes the satellite to complete one orbit, measured in minutes, and sorted ascending (ASC, as opposed to DESC), again as in SQL:



In [6]:

    
satellites.q("""
SELECT name, dry_mass_kg, period_minutes, class_of_orbit FROM satellites 
ORDER BY period_minutes ASC LIMIT 10;
""")









    




BQL [
SELECT name, dry_mass_kg, period_minutes, class_of_orbit FROM satellites 
ORDER BY period_minutes ASC LIMIT 10;
] ()






    Out[6]:

   Name                                               Dry_Mass_kg  Period_minutes  Class_of_Orbit
0  Advanced Orion 5 (NRO L-32, USA 223)               NaN          NaN             GEO
1  IGS-8B (Information Gathering Satellite 8B, IG...  NaN          NaN             LEO
2  Interstellar Boundary EXplorer (IBEX)              NaN          0.22            Elliptical
3  Spektr-R/RadioAstron                               NaN          0.22            Elliptical
4  SDS III-6 (Satellite Data System) NRO L-27, Gr...  NaN          14.36           GEO
5  Advanced Orion 6 (NRO L-15, USA 237)               NaN          23.94           GEO
6  SDS III-7 (Satellite Data System) NRO L-38, Dr...  NaN          23.94           GEO
7  RISat-2 (Radar Imaging Satellite 2)                NaN          41.20           LEO
8  Kuaizhou-1 (KZ-1)                                  NaN          90.61           LEO
9  X37-B OTV-1 (USA 240)                              NaN          91.54           LEO

Note that missing values (displayed as NaN) are ordered before every number, including 0, in this ascending sort.
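This ordering can be reproduced with plain SQLite, where a missing value is stored as NULL and sorts before any number in an ascending sort. The sketch below uses a hypothetical in-memory table `demo`; it assumes that the NaN entries shown in the notebook output correspond to NULL in the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (name TEXT, period_minutes REAL)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("a", 92.8), ("b", None), ("c", 0.22), ("d", 717.94)],  # None becomes NULL
)

# Ascending sort: the NULL row comes first.
with_nulls = [
    period
    for (period,) in conn.execute(
        "SELECT period_minutes FROM demo ORDER BY period_minutes ASC"
    )
]

# Filtering with IS NOT NULL keeps only rows that have a measured period.
without_nulls = [
    period
    for (period,) in conn.execute(
        "SELECT period_minutes FROM demo "
        "WHERE period_minutes IS NOT NULL "
        "ORDER BY period_minutes ASC"
    )
]

print(with_nulls)
print(without_nulls)
```

Adding `WHERE period_minutes IS NOT NULL` to a query is one way to keep missing values from occupying the first rows of a LIMIT-ed result.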

Plots and Graphs

Bayeslite includes statistical graphics procedures designed for easy use with data extracted from an SQL database.

Before we introduce those, let the notebook know that we would like to use and display matplotlib figures within the notebook:



In [7]:

    
%matplotlib inline

Let's see a menu of the easily available plotting utilities:



In [8]:

    
satellites.help("plot")









    



bdbcontrib.plot_utils.mi_hist(col1, col2, num_samples=1000, bins=5)
    Histogram of estimated mutual information between the two columns for each
    of a generator's model instances.
bdbcontrib.plot_utils.pairplot(df, generator_name=None, show_contour=False, colorby=None, show_missing=False, show_full=False, pad=None, h_pad=None, **kwargs)
    Plot array of plots for all pairs of columns.
bdbcontrib.plot_utils.pairplot_vars(varnames, colorby=None, generator_name=None, population_name=None, **kwargs)
    Use pairplot to show the given variables.
bdbcontrib.plot_utils.histogram(df, nbins=15, bins=None, normed=None)
    Plot histogram of one- or two-column table.
bdbcontrib.plot_utils.barplot(df)
    Make bar-plot from categories and their heights.
bdbcontrib.plot_utils.heatmap(data_df, row_ordering=None, col_ordering=None, **kwargs)
    Plot heatmaps, optionally clustered.
bdbcontrib.recipes.quick_explore_vars(varnames, plotfile=None, nsimilar=20)
    Show dependence probabilities and neighborhoods based on those.

We will get more detailed help on each plotting utility as we introduce it.

Pairplots — Exploring two variables at a time

The methods pairplot and pairplot_vars are intended to plot all pairs within a group of variables. The plots are arranged as a lower-triangular matrix of plots.

Along the diagonal, there are histograms with the values of the given variable along the x axis, and the counts of occurrences of those values (or bucketed ranges of those values) on the y axis.

The rest of the lower triangle plots the row variable on the y axis against the column variable on the x axis.

Different kinds of plots are used for categorical vs. numeric values.

The fuller documentation:



In [9]:

    
help(satellites.pairplot)









    



Help on method bdbcontrib.plot_utils.pairplot in bdbcontrib.plot_utils:

bdbcontrib.plot_utils.pairplot(self, *args, **kwargs) method of bdbcontrib.population.Population instance
    wrapped as population.pairplot(df, generator_name=None, show_contour=False, colorby=None, show_missing=False, show_full=False, pad=None, h_pad=None, **kwargs)
    
    Plot array of plots for all pairs of columns.
    
        Plots continuous-continuous pairs as scatter (optional KDE contour).
        Plots continuous-categorical pairs as violinplot.
        Plots categorical-categorical pairs as heatmap.
    
        Parameters
        ----------
    
        df : a pandas.DataFrame or BQL query.
    
        show_contour : bool, optional
            If True, KDE contours are plotted on top of scatter plots
            and histograms.
        show_missing : bool, optional
            If True, rows with one missing value are plotted as lines on scatter
            plots.
        colorby : str, optional
            Name of a column to use to color data points in histograms and scatter
            plots.
        show_full : bool, optional
            Show full pairwise plots, rather than only lower triangular plots.
        pad : number, optional
            Adjust the vertical padding between plot components.
        h_pad : number, optional
            Adjust the horizontal padding between plot components.
        **kwargs : dict, optional
            Options to pass through to underlying plotter for pairs.
    
        Returns
        -------
        figure : matplotlib.figure.Figure
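To make the layout concrete, here is a small sketch that lists which plot would appear in each cell of the lower triangle, following the rules in the prose and help text above. It is only an illustration of the stated rules: the functions `plot_kind` and `lower_triangle_layout` are hypothetical, they do not call bdbcontrib, and the library's own implementation may differ.

```python
def plot_kind(x_is_numeric, y_is_numeric):
    """Pick the plot type for an off-diagonal cell, per the help text."""
    if x_is_numeric and y_is_numeric:
        return "scatter"
    if x_is_numeric or y_is_numeric:
        return "violinplot"
    return "heatmap"


def lower_triangle_layout(variables):
    """Map (row, col) grid positions to a description of the plot drawn there.

    variables is a list of (name, is_numeric) pairs. Only cells with
    col <= row are filled, which gives the lower-triangular arrangement.
    """
    layout = {}
    for row, (y_name, y_numeric) in enumerate(variables):
        for col, (x_name, x_numeric) in enumerate(variables[: row + 1]):
            if row == col:
                layout[(row, col)] = "histogram of %s" % y_name
            else:
                kind = plot_kind(x_numeric, y_numeric)
                layout[(row, col)] = "%s: %s (y) vs %s (x)" % (kind, y_name, x_name)
    return layout


layout = lower_triangle_layout(
    [("purpose", False), ("power_watts", True), ("launch_mass_kg", True)]
)
for cell in sorted(layout):
    print(cell, layout[cell])
```

With three variables this yields six cells: three histograms on the diagonal and three pairwise plots below it.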

Pairplots: `pairplot_vars`

pairplot_vars is a shortcut to help you just name the variables you want to see, rather than writing the BQL to select those variables. As we will see, you may often start with pairplot_vars, and decide to refine your query in BQL to focus on particular areas of interest:



In [10]:

    
satellites.pairplot_vars(['purpose', 'power_watts', 'launch_mass_kg'], 
                         colorby='class_of_orbit', show_contour=False);









    



/Users/probcomp/GoogleDrive/ProbComp/venv-2.7.11-0.1.4/lib/python2.7/site-packages/numpy/lib/function_base.py:564: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  n = np.zeros(bins, ntype)
/Users/probcomp/GoogleDrive/ProbComp/venv-2.7.11-0.1.4/lib/python2.7/site-packages/numpy/lib/function_base.py:600: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  n += np.bincount(indices, weights=tmp_w, minlength=bins).astype(ntype)
/Users/probcomp/GoogleDrive/ProbComp/venv-2.7.11-0.1.4/lib/python2.7/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):

Pairplots: with SQL `WHERE`

The purposes are hard to read, and we may not be interested in all of them. Say we're interested only in meteorology satellites of one variety or another. It's easy to restrict to just those if you use pairplot instead of pairplot_vars, and use a bit of extra BQL:



In [11]:

    
satellites.pairplot("""SELECT purpose, power_watts, launch_mass_kg, class_of_orbit 
                       FROM satellites 
                       WHERE purpose LIKE '%Meteorology%';""", 
                    colorby='class_of_orbit', show_contour=False);









    




BQL [SELECT purpose, power_watts, launch_mass_kg, class_of_orbit 
                       FROM satellites 
                       WHERE purpose LIKE '%Meteorology%';] ()

We might learn that meteorology satellites in geosynchronous orbit use about as much or more power than meteorology satellites in low-earth orbit (see power_watts row of plots), but that they use a little less power at a given mass (see scatter of launch mass vs. power_watts), and that there are no meteorology satellites in medium earth orbit or in elliptical orbits (class_of_orbit color legend box).

An expert might be able to help us interpret these observations, e.g. why certain orbits are preferred for meteorology, what the driving considerations are for power consumption and launch mass, etc., but pairplots are a powerful tool for visually finding questions to ask.

Pairplots: `show_contour`

Why did we choose not to show contours? Let's try:



In [12]:

    
satellites.pairplot("""SELECT purpose, power_watts, launch_mass_kg, class_of_orbit 
                       FROM satellites 
                       WHERE purpose LIKE '%Meteorology%';""", 
                    colorby='class_of_orbit', show_contour=True);









    




BQL [SELECT purpose, power_watts, launch_mass_kg, class_of_orbit 
                       FROM satellites 
                       WHERE purpose LIKE '%Meteorology%';] ()






    



/Users/probcomp/GoogleDrive/ProbComp/venv-2.7.11-0.1.4/lib/python2.7/site-packages/matplotlib/collections.py:650: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors_original != str('face'):

Because of the way the underlying plotting utility draws them, the contours suggest negative wattages and masses!

The contours in the power vs. mass plot also obscure the small number of data points, lending a false sense of meaning.
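The reason contours can spill into negative territory is that a kernel density estimate places a smooth bump around every data point, and the bumps have no notion that the quantity cannot be negative. The sketch below shows this with a hand-written Gaussian kernel density estimator; the sample masses and the bandwidth are invented, and the estimator used by the plotting library may choose its kernel and bandwidth differently.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian bumps around each sample."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )
        return total / norm

    return density


masses = [50.0, 120.0, 300.0, 900.0]  # hypothetical masses, all positive
density = gaussian_kde(masses, bandwidth=200.0)

# Every sample is positive, yet the estimate is non-zero at a negative mass.
density_at_minus_100 = density(-100.0)
print(density_at_minus_100)
print(density(300.0))
```

With only a handful of points the bumps dominate the picture, which is why the contours are more trustworthy when there are many data points.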

When there are enough data points, it can be useful to plot kernel density estimators (contours) on each plot, to see tendencies overlaid above the data points, so long as one keeps the above shortcomings in mind:



In [13]:

    
satellites.pairplot("""SELECT power_watts, launch_mass_kg 
                       FROM satellites""", 
                    show_contour=True);









    




BQL [SELECT power_watts, launch_mass_kg 
                       FROM satellites] ()

Pairplots: two categoricals

Where two variables are both categorical, we show a 2d histogram (a heatmap).

Also, we can turn off the one-variable histograms along the diagonal:



In [14]:

    
satellites.pairplot_vars(['purpose', 'class_of_orbit']);

Pairplots: with SQL `HAVING`

We can use the usual SQL constructs to restrict our plot. For example, in this plot of purpose vs. class of orbit, restrict to those purposes that have at least five satellites:



In [15]:

    
satellites.pairplot("""SELECT purpose, class_of_orbit FROM %t
                       GROUP BY purpose
                       HAVING COUNT(purpose) >= 5;""");









    




BQL [SELECT purpose, class_of_orbit FROM "satellites"
                       GROUP BY purpose
                       HAVING COUNT(purpose) >= 5;] ()
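One point worth keeping in mind with GROUP BY and HAVING is that, in SQLite, a grouped query returns one row per group, not every row belonging to the qualifying groups. The sketch below contrasts the two on a hypothetical in-memory table `demo`; it assumes BQL hands this part of the query to SQLite unchanged, which is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (purpose TEXT, class_of_orbit TEXT)")
rows = (
    [("Communications", "GEO")] * 4
    + [("Communications", "LEO")] * 2
    + [("Navigation", "MEO")] * 3
)
conn.executemany("INSERT INTO demo VALUES (?, ?)", rows)

# Grouped form: one output row per purpose that has at least five rows.
grouped = conn.execute(
    "SELECT purpose, class_of_orbit FROM demo "
    "GROUP BY purpose HAVING COUNT(purpose) >= 5"
).fetchall()

# Subquery form: every row whose purpose has at least five rows.
filtered = conn.execute(
    "SELECT purpose, class_of_orbit FROM demo WHERE purpose IN "
    "(SELECT purpose FROM demo GROUP BY purpose HAVING COUNT(*) >= 5)"
).fetchall()

print(len(grouped), len(filtered))
```

If the intent is to plot all satellites belonging to the common purposes, the subquery form keeps every such row, whereas the grouped form keeps a single representative row per purpose.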

Pairplots: with `show_missing` and `NULL` values.



In [16]:

    
satellites.q('''SELECT apogee_km FROM %t WHERE period_minutes is NULL;''')









    




BQL [SELECT apogee_km FROM "satellites" WHERE period_minutes is NULL;] ()






    Out[16]:

   Apogee_km
0  35500
1  NaN

When we pairplot these, that data point would normally be missing altogether, but with show_missing there is a line indicating that period_minutes could be anything at an apogee of about 35,500 km.



In [17]:

    
satellites.pairplot_vars(['period_minutes', 'apogee_km'], show_missing=True);
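The query that found the rows with a missing period used `IS NULL` rather than `= NULL`. In SQL a comparison with NULL is never true, so the equality form silently matches nothing. A minimal sketch with Python's standard sqlite3 module, on a hypothetical table `demo`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (apogee_km REAL, period_minutes REAL)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(35500.0, None), (422.0, 92.8), (None, None)],  # None becomes NULL
)

# IS NULL finds the rows whose period is missing.
is_null = conn.execute(
    "SELECT COUNT(*) FROM demo WHERE period_minutes IS NULL"
).fetchone()[0]

# '= NULL' evaluates to NULL for every row, so no row passes the filter.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM demo WHERE period_minutes = NULL"
).fetchone()[0]

print(is_null, equals_null)
```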

Pairplots: with SQL arithmetic

The values are large enough to be hard to read, but of course we can resolve that in the query:



In [18]:

    
satellites.pairplot("""
SELECT period_minutes / 60.0 as period_hours, 
       apogee_km / 1000.0 as apogee_x1000km FROM %t""", 
                    show_missing=True, show_contour=False);









    




BQL [
SELECT period_minutes / 60.0 as period_hours, 
       apogee_km / 1000.0 as apogee_x1000km FROM "satellites"] ()
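The divisors in the query above are written as 60.0 and 1000.0 rather than 60 and 1000. In SQLite, dividing one integer by another truncates the result, so a floating-point divisor is the safe choice whenever a column might hold integers. The sketch below shows the difference on a hypothetical one-row table; whether the satellites columns are stored as integers or reals is not checked here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (period_minutes INTEGER)")
conn.execute("INSERT INTO demo VALUES (90)")

# Integer / integer truncates; integer / real keeps the fraction.
int_div, float_div = conn.execute(
    "SELECT period_minutes / 60 AS truncated_hours, "
    "       period_minutes / 60.0 AS period_hours "
    "FROM demo"
).fetchone()

print(int_div, float_div)
```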

Other Plot Types

Barplot



In [19]:

    
help(satellites.barplot)









    



Help on method bdbcontrib.plot_utils.barplot in bdbcontrib.plot_utils:

bdbcontrib.plot_utils.barplot(self, *args, **kwargs) method of bdbcontrib.population.Population instance
    wrapped as population.barplot(df)
    
    Make bar-plot from categories and their heights.
    
        First column specifies names; second column specifies heights.
    
        Parameters
        ----------
    
        df : a pandas.DataFrame or BQL query.
    
        Returns
        ----------
        figure: matplotlib.figure.Figure



In [20]:

    
satellites.barplot("""
SELECT class_of_orbit, count(*) AS class_count FROM satellites 
GROUP BY class_of_orbit
ORDER BY class_count DESC
""");









    




BQL [
SELECT class_of_orbit, count(*) AS class_count FROM satellites 
GROUP BY class_of_orbit
ORDER BY class_count DESC
] ()

Let's add the type of orbit too:



In [21]:

    
satellites.barplot("""
SELECT class_of_orbit || "--" || type_of_orbit as class_type, 
       count(*) AS class_type_count
FROM satellites 
GROUP BY class_type
ORDER BY class_type_count DESC
""");









    




BQL [
SELECT class_of_orbit || "--" || type_of_orbit as class_type, 
       count(*) AS class_type_count
FROM satellites 
GROUP BY class_type
ORDER BY class_type_count DESC
] ()
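The `||` operator concatenates strings, but in SQLite the result is NULL whenever either operand is NULL, so a label built this way would vanish for any row with a missing part. The sketch below shows the effect and one way around it using COALESCE. The table `demo`, its rows, and the placeholder text 'unknown' are hypothetical, and the satellites data may not contain NULLs in these two columns at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (class_of_orbit TEXT, type_of_orbit TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("LEO", "Polar"), ("GEO", None)],  # None becomes NULL
)

# Plain concatenation: the GEO row yields NULL because its type is NULL.
plain = conn.execute(
    "SELECT class_of_orbit || '--' || type_of_orbit FROM demo "
    "ORDER BY class_of_orbit DESC"
).fetchall()

# COALESCE substitutes a placeholder before concatenating.
patched = conn.execute(
    "SELECT class_of_orbit || '--' || COALESCE(type_of_orbit, 'unknown') "
    "FROM demo ORDER BY class_of_orbit DESC"
).fetchall()

print(plain)
print(patched)
```

The sketch writes the separator as '--' in single quotes, which is the standard SQL form for a string literal.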

One can even do a bit of computation here, in this case computing and plotting the average power_watts, rather than merely the count:



In [22]:

    
satellites.barplot("""
SELECT class_of_orbit || "--" || type_of_orbit as class_type, 
       sum(power_watts)/count(*) AS average_power
FROM satellites
GROUP BY class_type
ORDER BY average_power DESC
""");









    




BQL [
SELECT class_of_orbit || "--" || type_of_orbit as class_type, 
       sum(power_watts)/count(*) AS average_power
FROM satellites
GROUP BY class_type
ORDER BY average_power DESC
] ()
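Note that `sum(power_watts)/count(*)` divides by the number of rows in each group, including rows whose power is missing, whereas SQL's AVG divides only by the number of non-missing values. The sketch below compares the two in SQLite on a hypothetical table `demo`; which of the two is the more useful average depends on how missing values should be treated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (class_type TEXT, power_watts REAL)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("LEO", 100.0), ("LEO", 300.0), ("LEO", None)],  # one missing value
)

sum_over_rows, sum_over_known, avg = conn.execute(
    "SELECT SUM(power_watts) / COUNT(*), "            # divides by all 3 rows
    "       SUM(power_watts) / COUNT(power_watts), "  # divides by 2 known values
    "       AVG(power_watts) "                        # also ignores the NULL
    "FROM demo"
).fetchone()

print(sum_over_rows, sum_over_known, avg)
```

When many satellites in a group have no recorded power, dividing by `count(*)` pulls the plotted average toward zero.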

Histogram



In [23]:

    
help(satellites.histogram)









    



Help on method bdbcontrib.plot_utils.histogram in bdbcontrib.plot_utils:

bdbcontrib.plot_utils.histogram(self, *args, **kwargs) method of bdbcontrib.population.Population instance
    wrapped as population.histogram(df, nbins=15, bins=None, normed=None)
    
    Plot histogram of one- or two-column table.
    
        If two-column, subdivide the first column according to labels in
        the second column
    
        Parameters
        ----------
    
        df : a pandas.DataFrame or BQL query.
        nbins : int, optional
            Number of bins in the histogram.
        normed : bool, optional
            If True, normalizes the the area of the histogram (or each
            sub-histogram if df has two columns) to 1.
    
        Returns
        ----------
        figure: matplotlib.figure.Figure



In [24]:

    
satellites.histogram("""SELECT dry_mass_kg FROM %t""", nbins=35);









    




BQL [SELECT dry_mass_kg FROM "satellites"] ()

We can break down that silhouette according to a categorical column that comes second.

We can also normalize each sub-histogram to unit area, rather than showing absolute counts, using normed.



In [25]:

    
satellites.histogram("""
SELECT dry_mass_kg, class_of_orbit FROM satellites 
WHERE dry_mass_kg < 5000
""", nbins=15, normed=True);









    




BQL [
SELECT dry_mass_kg, class_of_orbit FROM satellites 
WHERE dry_mass_kg < 5000
] ()
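According to the help text above, normed rescales the histogram so that its area is 1. The following sketch shows what that means numerically for a single histogram with equal-width bins. The function `histogram` here is a hypothetical stand-in written for illustration, not the bdbcontrib function of the same name, and the masses are invented.

```python
def histogram(values, nbins, lo, hi, normed=False):
    """Bin values into nbins equal-width bins over [lo, hi].

    Returns (heights, bin_width). With normed=True the heights are
    densities, scaled so that sum(height * bin_width) equals 1.
    """
    width = (hi - lo) / float(nbins)
    counts = [0] * nbins
    for v in values:
        # Clamp so that a value equal to hi lands in the last bin.
        index = min(int((v - lo) / width), nbins - 1)
        counts[index] += 1
    if not normed:
        return counts, width
    total = float(len(values))
    return [c / (total * width) for c in counts], width


masses = [100.0, 150.0, 400.0, 450.0, 900.0, 1000.0]  # hypothetical dry masses
counts, width = histogram(masses, nbins=2, lo=0.0, hi=1000.0)
heights, _ = histogram(masses, nbins=2, lo=0.0, hi=1000.0, normed=True)
area = sum(h * width for h in heights)

print(counts)
print(heights)
print(area)
```

Because each sub-histogram is scaled separately, normalized groups of very different sizes can be compared by shape, at the cost of hiding how many satellites each group contains.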

Heatmap (a.k.a. 2d histogram)



In [26]:

    
help(satellites.heatmap)









    



Help on method bdbcontrib.plot_utils.heatmap in bdbcontrib.plot_utils:

bdbcontrib.plot_utils.heatmap(self, *args, **kwargs) method of bdbcontrib.population.Population instance
    wrapped as population.heatmap(data_df, row_ordering=None, col_ordering=None, **kwargs)
    
    Plot heatmaps, optionally clustered.
    
        Parameters
        ----------
        deps : a pandas.DataFrame or BQL query.
            Must have two categorical columns and a numeric column.
            The format is assumed to be that the numeric column of values is last,
            and the two categorical columns are immediately before that.
    
            The canonical example of that kind of data is the result of an
            ESTIMATE ... PAIRWISE query, estimating dependence probability,
            mutual information, correlation, etc.
    
            If your columns are not at the end that way, then pass pivot_kws too.
    
        row_ordering, col_ordering : list<int>
            Specify the order of labels on the x and y axis of the heatmap.
            If these are specified, we will not call try to cluster a different way,
            and return a plain heatmap.
    
            To access the row and column indices from a clustermap object, use:
            clustermap.dendrogram_row.reordered_ind (for rows)
            clustermap.dendrogram_col.reordered_ind (for cols)
    
        **kwargs :
            to pass to seaborn.clustermap. See seaborn documentation. Of particular
            importance is the `pivot_kws` kwarg. `pivot_kws` is a dict with entries
            index, column, and values that let clustermap know how to reshape the
            data. If the query does not follow the standard ESTIMATE PAIRWISE
            output, it may be necessary to define `pivot_kws`.
            Other keywords here include vmin and vmax, linewidths, figsize, etc.
    
        Returns
        -------
        clustermap: seaborn.clustermap
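The help text asks for two categorical columns followed by a numeric column. A heatmap needs those rows reshaped into a grid, with one categorical column supplying the row labels, the other the column labels, and the numeric column the cell values. The sketch below does that reshaping in plain Python to show the idea; the function `pivot` and the records are hypothetical, and the library itself is described above as delegating this step to seaborn via `pivot_kws`.

```python
def pivot(records):
    """Reshape a list of (row_label, col_label, value) into a 2-D grid.

    Returns (row_labels, col_labels, matrix), where matrix[i][j] is the
    value for row_labels[i] and col_labels[j], or None if the pair is absent.
    """
    row_labels = sorted({r for r, _, _ in records})
    col_labels = sorted({c for _, c, _ in records})
    cell = {(r, c): v for r, c, v in records}
    matrix = [[cell.get((r, c)) for c in col_labels] for r in row_labels]
    return row_labels, col_labels, matrix


records = [
    ("Commercial", "USA", 10),
    ("Government", "USA", 7),
    ("Commercial", "Russia", 2),
]
row_labels, col_labels, matrix = pivot(records)

print(row_labels)
print(col_labels)
print(matrix)
```

Pairs that never occur in the query result show up as empty cells, which is worth remembering when reading sparse heatmaps.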



In [27]:

    
satellites.heatmap("""
SELECT users, country_of_operator, COUNT(country_of_operator) as country_count FROM %t
GROUP BY country_of_operator
HAVING COUNT(country_of_operator) >= 5;
""")









    




BQL [
SELECT users, country_of_operator, COUNT(country_of_operator) as country_count FROM "satellites"
GROUP BY country_of_operator
HAVING COUNT(country_of_operator) >= 5;
] ()






    



Detected value limits as [5.0, 486.0]. Override with vmin and vmax.






    Out[27]:





<seaborn.matrix.ClusterGrid at 0x10e102c10>

Figsize

But that's a bit too small to read. For most of these plot functions, you can specify a figure size as a tuple (width-in-inches, height-in-inches):



In [28]:

    
satellites.heatmap("""
SELECT users, country_of_operator, COUNT(country_of_operator) as country_count FROM %t
GROUP BY country_of_operator
HAVING COUNT(country_of_operator) >= 5;""", 
                  figsize=(12, 10))









    




BQL [
SELECT users, country_of_operator, COUNT(country_of_operator) as country_count FROM "satellites"
GROUP BY country_of_operator
HAVING COUNT(country_of_operator) >= 5;] ()






    



Detected value limits as [5.0, 486.0]. Override with vmin and vmax.






    Out[28]:





<seaborn.matrix.ClusterGrid at 0x10f14c710>




Licensed under Apache 2.0 (edit cell for details).



In [ ]:

	0
Name	International Space Station (ISS [first elemen...
Country_of_Operator	Multinational
Operator_Owner	NASA/Multinational
Users	Government
Purpose	Scientific Research
Class_of_Orbit	LEO
Type_of_Orbit	Intermediate
Perigee_km	401
Apogee_km	422
Eccentricity	0.00155
Period_minutes	92.8
Launch_Mass_kg	NaN
Dry_Mass_kg	NaN
Power_watts	NaN
Date_of_Launch	36119
Anticipated_Lifetime	30
Contractor	Boeing Satellite Systems (prime)/Multinational
Country_of_Contractor	Multinational
Launch_Site	Baikonur Cosmodrome
Launch_Vehicle	Proton
Source_Used_for_Orbital_Data	www.satellitedebris.net 12/12
longitude_radians_of_geo	NaN
Inclination_radians	0.90059
