Introduction to Glue-Viz

version 0.1


By AA Miller (Northwestern CIERA/Adler Planetarium) 03 May 2018

Introduction

[All of my slides from Tuesday morning]

... that is all

Glue

As a point of review, on Tuesday we learned about ParaView. I'd summarize the major strength of ParaView as providing an interface to create really nice 3D representations of data (and we barely scratched the surface of the most complex renderings that you can create).

On Wednesday, we learned about bokeh. I would summarize the major strengths of bokeh as being the ability to create linked plots, as well as the relative ease of getting the output from bokeh into a server and on the web.

Today we are going to learn about glue, which is a pure python library that is designed to explore the relationships between related datasets. glue is actually developed by astronomers (in collaboration with medical imaging researchers), so a lot of the functionality is designed with our needs in mind.

(Note, though, that it was created as a general purpose tool. If there is something that you'd like to see in glue that does not exist, you can reach out to the developers and maybe they will develop it.)

glue includes elements that we have already explored this week. In particular, glue, due to the medical imaging connection, provides nice functionality for visualizing 3D data sets. Additionally, given the large collection of heterogeneous catalogs in astronomy, glue is designed to make linking between data sets very straightforward.

You should have already installed glue, but if not

conda install -c glueviz glueviz

Furthermore, our first example will use the data included in this tarball: https://northwestern.box.com/s/uiwq47ir8r4h6njlxv6njtx174wdeoox

Problem 1) Using Glue

Problem 1a

Open glue. The standard way to do this is to launch the program from the command line: glue.

At this stage you will notice 4 primary windows within the application:

  • upper left ––– data collection (lists all the open data, as well as the selected subsets)
  • middle left ––– viewer layers (shows the different layers, and allows control over which are displayed)
  • lower left ––– viewer options (includes global options for the active viewer)
  • right ––– visualization canvas (this is where the data renderings are actually shown)

Problem 1b

Open the w5 fits image in glue.

As a quick note - this image is from the WISE satellite and it is showing the Westerhout 5 (W5) star forming region.

Hint - you can drag and drop, or select the file path in [File $\rightarrow$ Open Data Set].

Problem 1c

Render the image, by dragging the w5 entry in the data collection window to the visualization canvas.

This will pop up a drop down menu asking what type of render you would like. Select the best option.

Hint - you may want to resize the window within the visualization canvas.

As previously noted, one of the great strengths of glue is the ability to drill down on subsets of linked data.

At the top of the 2D image window there are 5 different methods for selecting subsets of the data. From left to right they are: rectangular selection, vertical selection, horizontal selection, circular selection, and finally freeform selection [this has similar functionality to bokeh's lasso].

Problem 1d

Use the horizontal selection tool to select the subset of the data near the center of the image (this is done via click and drag).

Then, use the vertical selection tool to select the subset of the data near the center of the image.

Notice that there are now 2 subsets in the data collection panel, as well as additional entries in the viewer layers panel.

Problem 1e

Adjust the color of subset 1 to be "DSFP blue" and adjust the transparency bar to its maximum (i.e. minimize the transparency of the selection).

Adjust the color of subset 2 to be "DSFP light grey" and make this selection more transparent.

At this point, it is a little difficult to see the emission under the selected subsets.

Problem 1f

Select the w5 data in the data collection, and adjust the data scaling to match the optimal choice for astronomical images.

Hint - think back to Zolt's lecture.

[You may want to adjust the colorbar and range of the data being displayed. Be sure the subset panels can still be seen after making these changes.]
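
For some intuition about what the different data scalings do to the pixel values, here is a minimal numpy sketch. It is not glue's internal implementation: the `stretch` function, the contrast parameters, and the pixel values are all made up for illustration, and it does not tell you which option is the "right" answer above.

```python
import numpy as np

def stretch(values, kind="linear", vmin=None, vmax=None):
    """Map pixel values onto [0, 1] with a simple stretch (illustrative only)."""
    values = np.asarray(values, dtype=float)
    vmin = values.min() if vmin is None else vmin
    vmax = values.max() if vmax is None else vmax
    # Normalize to [0, 1] first, clipping anything outside the display range.
    x = np.clip((values - vmin) / (vmax - vmin), 0.0, 1.0)
    if kind == "linear":
        return x
    if kind == "sqrt":
        return np.sqrt(x)
    if kind == "log":
        a = 1000.0  # hypothetical contrast parameter
        return np.log10(1.0 + a * x) / np.log10(1.0 + a)
    if kind == "arcsinh":
        a = 10.0  # hypothetical softening parameter
        return np.arcsinh(a * x) / np.arcsinh(a)
    raise ValueError(f"unknown stretch: {kind}")

# Synthetic "image": faint background plus one very bright pixel.
pixels = np.array([400.0, 450.0, 500.0, 600.0, 1000.0, 8000.0])
for kind in ("linear", "sqrt", "log", "arcsinh"):
    print(kind, np.round(stretch(pixels, kind), 2))
```

With a linear scaling the single bright pixel squashes everything else toward zero, while the nonlinear stretches spread the faint values over more of the display range.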

There is a bright knot of emission in the northwest portion of the nebula. We will now focus on that.

Problem 1g

Adjust the subset selections to be centered on the bright emission knot in the northwest portion of the nebula. This can be done by selecting the subset in the data collection and then holding ctrl while dragging the mouse over a new region to redefine the subset.

Problem 1h

Create a histogram of the brightness data in the fits image [drag the w5 data from the data collection into the visualization canvas and select the appropriate option from the drop down menu].

Notice that you now have histograms in 3 different colors. This is because the data linking in glue is (in some cases) automatic. By creating the histogram for the data, you have also automatically created a histogram for the two subsets of the data.

You will also notice that the histogram, as currently constructed, is not particularly informative.

Problem 1i

Update the range of the histogram to extend to a maximum value of 1000. Increase the number of bins to 25. Finally, normalize the histogram.

Does the resulting histogram make sense?
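
If you want to sanity check what the viewer is doing, the same histogram settings can be sketched with numpy. The pixel values here are synthetic, and I am assuming that "normalize" means the bars are scaled to unit area (numpy's `density=True`); glue's option may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for image pixels: mostly faint background, a few bright ones.
pixels = np.concatenate([rng.normal(450, 30, 10_000), rng.uniform(500, 5000, 200)])

# 25 bins, upper edge at 1000, normalized so the bar areas sum to 1.
counts, edges = np.histogram(pixels, bins=25, range=(pixels.min(), 1000.0), density=True)
area = np.sum(counts * np.diff(edges))
print(len(counts), edges[-1], area)
```

Pixels brighter than 1000 fall outside the range and are simply dropped before the normalization is applied.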

The current histograms are strongly polluted by background pixels. We can improve this with the selection tools.

Problem 1j

Select the pixels in the bright knot by changing the selection mode to "remove" (5th option). Then select the horizontal selection tool in the histogram plot. Click and drag to select the region with pixel values less than 500 to remove those from the selection.

How do the histograms look now? Do the resulting image layers/histogram make sense?

Note - don't forget to return to the default selection mode after you have done this.
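
Conceptually, the "remove" mode combines the existing subset with the new selection as a boolean "and not". A minimal sketch with a made-up 3x3 image and a hypothetical rectangular subset:

```python
import numpy as np

pixels = np.array([[420, 480, 510],
                   [650, 900, 470],
                   [300, 1200, 530]])

# Existing subset: a hypothetical rectangular selection covering the right two columns.
subset = np.zeros(pixels.shape, dtype=bool)
subset[:, 1:] = True

# "Remove" mode: keep what was already selected AND is not in the new selection.
faint = pixels < 500
updated = subset & ~faint
print(updated.sum(), sorted(pixels[updated].tolist()))
```

Only pixels that were inside the original subset and are at least 500 survive, which is why the faint background drops out of both the image layer and the histogram.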

Problem 2) Linking Data Sets

So far we have only employed automatic linking via data subsets. This has some utility (for instance, I could imagine teaching non-experts about source detection using the steps we just covered regarding the removal of faint pixels), but the real power of glue is in linking heterogeneous data sets.

Problem 2a

Open the second data file from the tarball w5_psc.vot.

Aside - this VO table file includes sources in the W5 region that were detected by the Spitzer space telescope. One reason for comparing WISE to Spitzer is that WISE covers the entire sky, while Spitzer offers higher resolution and greater depth, so it has more complete catalogs in the areas that it has observed.

Given that the catalog and image are heterogeneous, linking will not be automatic (as it was for the subsets created in problem 1).

Problem 2b

Link the data sets by selecting the Link Data option in the top of the menu bar.

Select an appropriate component from the image and catalog data, and then link those components by clicking on the glue button.

Get it? Link the things by "glueing" them together, using glue.

.

.

.

Get it?

No seriously,

Do you get it?

Be sure that you glue both of the relevant variables that connect these two data sets.

Hold on, now it's about to get real.

With the catalog and image now linked, subsets selected in either space (e.g., the bright emission knot selected in Problem 1) will automatically be reflected in the other space.

Problem 2c

Create a scatter plot of the catalog data by dragging w5_psc.vot into the visualization canvas.

For the scatter plot show the [4.5] - [5.8] vs. [3.6] color magnitude diagram.

Problem 2d

Remove the previously created subsets. In the 2D image, choose the circular selection tool and highlight a small region centered on the bright knot in the northwest portion of the nebula.

What do you notice when you make this selection?

Problem 2e

Show the individual Spitzer point sources on the image by selecting the subset in the data collection and dragging it onto the 2D image.

Look at the overlap of the sources relative to the bright knot - does this make sense?

Problem 2f

Adjust the plot of the subset of points to provide a linear colormap for the data.

Color the points by their [3.6] magnitude. Does the image make sense?

What about the reverse? Can we select interesting sources in CMD space and highlight their spatial positions in the cluster?

This could be useful, for example, to identify the location of the youngest stars within the W5 star-forming region.

Problem 2g

Select the Spitzer point source catalog in the data collection. Then, using the rectangular selection tool in the CMD, choose all the red sources with [4.5] - [5.8] > 1 mag.

What can you say about the positions of the red sources relative to the 12 micron emission?
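
The rectangular selection in the CMD amounts to a simple cut on a color. As a sketch of the same cut outside of glue (the magnitudes below are invented; the real values live in w5_psc.vot, and the variable names are placeholders):

```python
import numpy as np

# Hypothetical magnitudes for five sources.
mag_36 = np.array([12.1, 13.4, 11.8, 14.0, 12.9])
mag_45 = np.array([11.9, 12.6, 11.7, 13.1, 12.8])
mag_58 = np.array([11.8, 11.2, 11.6, 11.9, 12.7])

color = mag_45 - mag_58  # [4.5] - [5.8]
red = color > 1.0        # boolean mask of the red sources
print(np.round(color, 2), np.flatnonzero(red))
```

The boolean mask `red` plays the role of the glue subset: it can be applied to any other column of the same catalog (positions, for example) to see where the red sources sit.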

Problem 3) Reproducibility

Hopefully at this point it is clear that glue can be very powerful in the way that it allows linking across image and catalog data.

However, everything we have done has been in an interactive mode that may be hard to reproduce.

Fortunately, glue provides multiple different ways to save your work.

You can either save your entire session, save specific plots from your session, or save subsets created via the various selection tools from your session.

Problem 4) Easy (?) False Color Images

You should have already unpacked a tarball with 5 fits images: https://northwestern.box.com/s/hmitigmvcfi2tuzlgt1psatebkyrk0e3

Problem 4a

Open each of the 5 fits files (named g, r, i, z, y) in glue.

Note - as you open each image after the first you will be prompted to "merge" the data. Select no on that option for now.

Problem 4b

Create a 2D image of the g-band data.

Problem 4c

Drag and drop the data from each of the other filters on to the g band image.

Problem 4d

Change the color option from colorbar to "one color per channel". Then select 3 layers for the 2D image, assigning each of them to one of the R, G, and B channels.

Problem 4e

Adjust the scalings (and colors if necessary) to create a nice false color image of the galaxy.
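
Roughly speaking, a false color image is three independently scaled images stacked into the red, green, and blue channels. Here is a minimal numpy sketch of that idea; the 4x4 arrays are random stand-ins for three of the filters, the `normalize` helper is hypothetical, and mapping the reddest filter to R is just one common convention.

```python
import numpy as np

def normalize(band, vmin, vmax):
    """Linearly scale a band to [0, 1], clipping values outside [vmin, vmax]."""
    return np.clip((band - vmin) / (vmax - vmin), 0.0, 1.0)

rng = np.random.default_rng(1)
# Synthetic 4x4 "images" standing in for three of the filters.
g, r, i = (rng.uniform(0, 100, size=(4, 4)) for _ in range(3))

# Reddest filter -> R, bluest filter -> B; each layer gets its own scaling.
rgb = np.dstack([normalize(i, 0, 100), normalize(r, 10, 90), normalize(g, 20, 80)])
print(rgb.shape, rgb.min(), rgb.max())
```

Changing the `vmin`/`vmax` of a single channel is the scripted equivalent of adjusting the scaling of one layer in the viewer.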

Problem 5) 3D scatter plots in glue

Warning 1

3D viewing in glue is relatively new, and as such it does not provide the full range of functionality that is eventually expected within the package.

Read the docs for caveats regarding the use of the 3D viewer.

Warning 2

There is a very very good chance that you may have a non-working version of glue on your machine if you have not updated your anaconda software since session 4. At this point, please proceed carefully to make sure you have the correct install for 3D rendering in glue.

As a first test, please try:

conda list glue

If that returns something like this:

# Name                    Version                   Build  Channel
glue-core                 0.13.2                   py36_0    glueviz
glue-vispy-viewers        0.10                     py36_1    glueviz
glueviz                   0.13.2                        0    glueviz

Namely, glue-core and glueviz versions 0.13.x AND glue-vispy-viewers version 0.10 –– then you are safe and ready to proceed.

Alternatively, if you have something like this:

# Name                    Version                   Build  Channel
glue-core                 0.12.5                   py36_0    glueviz
glue-vispy-viewers        0.10                     py36_1    glueviz
glueviz                   0.13.0                        0    glueviz

Or, any combination of glue-core or glueviz <= 0.12.x AND glue-vispy-viewers version 0.10 –– then 3D viewing is likely not going to be supported in your installation.

The easiest way to address this right now is to roll back your glueviz packages:

conda install -c glueviz glueviz=0.12.4
conda install -c glueviz glue-vispy-viewers=0.9

If you are unsure about any of this, please raise your hand and I'll stop by to make sure everything is set up correctly.
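
The version rule above can also be written as a small check. This is only a sketch of the rule of thumb stated here (both helper functions are made up, and it only recognizes the one combination explicitly described as safe), so it will report False for the rolled-back install as well.

```python
def major_minor(version):
    """Return (major, minor) from a version string such as '0.13.2' or '0.10'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])

def matches_safe_combo(versions):
    """True if glue-core and glueviz are 0.13.x and glue-vispy-viewers is 0.10."""
    core = major_minor(versions["glue-core"])
    viz = major_minor(versions["glueviz"])
    vispy = major_minor(versions["glue-vispy-viewers"])
    return core == (0, 13) and viz == (0, 13) and vispy == (0, 10)

# The two example listings from above, typed in by hand.
good = {"glue-core": "0.13.2", "glue-vispy-viewers": "0.10", "glueviz": "0.13.2"}
bad = {"glue-core": "0.12.5", "glue-vispy-viewers": "0.10", "glueviz": "0.13.0"}
print(matches_safe_combo(good), matches_safe_combo(bad))
```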

As an example of a 3D scatter plot in glue, we will create a fits table using the training data from the feature engineering lecture, and then render the data.

Problem 5a

Create astropy.io.fits columns for each of the 3 data arrays.

Hint - fits.Column takes name, format, and array as optional arguments. For format "D" = double precision, and "J" = integer. You'll want to pass np.arrays to the array argument.


In [1]:
import pandas as pd
from astropy.io import fits
import numpy as np

# Load the training set into a DataFrame.
train_df = pd.read_csv("training_sources.csv")

# One FITS column per feature: "D" = double precision, "J" = 32-bit integer.
col1 = fits.Column(name="mean", format="D", array=np.array(train_df["mean"]))
col2 = fits.Column(name="nobs", format="J", array=np.array(train_df["nobs"]))
col3 = fits.Column(name="duration", format="J", array=np.array(train_df["duration"]))

Problem 5b

Merge the columns into a fits hdu object.


In [2]:
hdu = fits.BinTableHDU.from_columns([col1, col2, col3])

Problem 5c

Write the hdu object to a fits file.


In [3]:
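# overwrite=True lets this cell be re-run without an error if the file already exists.
hdu.writeto("training_set.fits", overwrite=True)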

Problem 5d

Open the fits file in glue.

Drag the file to the canvas, and select 3D scatter plot.

Zoom, rotate, and adjust the rendering to get a sense of how this 3D scatter plot compares to ParaView.

Problem 5e

Adjust the size of the points and color the data via the value of the mean. Choose a colorbar that makes it easier to see the variation in the data.

Problem 5f

Identify the predominant axis in the data. Adjust its stretch value to 10.

Change the limits on duration to extend from 2000 to 2500.

Do these changes help or harm your visualization?

Problem 5g

Create a new fits table including more informative features from Tuesday's lecture, as well as the class information for each source.

Open the fits table in glue and create a 3D scatterplot highlighting 3 very useful features with the points colored by the classification of each star.

Problem 5h

Use the circle selection tool to create a subset of the data. Generate a histogram of that subset showing a variable that is not displayed in the 3D render.

Problem 6) 3D volume renderings

You should have already downloaded the astropy fits cube for L1448, which includes $^{13}$CO data.

Problem 6a

Open the l1448_13co.fits file in glue.

Adjust the scaling parameters and rotate the cube to get a sense of the data.

Problem 6b

Create a 1D histogram of the same data. Use the horizontal select tool to select the brightest pixels with PRIMARY > 1.5.

How does this change the appearance of the data cube?
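
The histogram selection is just a threshold applied to every voxel in the cube. A minimal numpy sketch of the same cut, using a synthetic noise cube with one bright blob rather than the real L1448 data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic (velocity, y, x) cube: low-level noise plus one bright blob.
cube = rng.normal(0.0, 0.1, size=(20, 30, 30))
cube[8:12, 10:15, 10:15] += 3.0

bright = cube > 1.5
# Bounding box of the selected voxels along each axis.
idx = np.argwhere(bright)
lo, hi = idx.min(axis=0), idx.max(axis=0)
print(bright.sum(), "voxels selected; bounding box", lo, hi)
```

Because the mask has the same shape as the cube, the selected voxels trace out a 3D region, which is what the volume rendering highlights.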

Problem 6c

Click the start/stop rotation button.

Preeeeeeeeeeeeeeety.

Problem 6d

Record a movie as you move around the cube. Press the camera button and choose a location to save the movie. At that point you will see a red button which means you are recording –– select this button to stop the recording.

The movie file that this generates is an animated gif. Open it in a browser to see the output.

Note - I could not get the record and rotation button (6c) to work simultaneously.

Finally - unfortunately, I do not have the table data to make a link here, but if you did, you could then select information within the data cube and highlight it in the table, in much the same way we did in Problem 2.

Problem 7) Python Scripting

Like the other tools introduced this week, glue supports python scripting.

The easiest way to launch this is to click on the IPython Terminal button at the top of the glue window.

This launches an IPython instance, from which you can perform several operations, so read the docs. Here we will re-create some of the operations from Problem 2 using the command line.

If necessary, reload the W5 data (both the fits image and the catalog).

Problem 7a

Launch the IPython Terminal in glue and examine the data collection:

dc

The data collection is effectively a list. We will focus on the catalog data.

Problem 7b

Select the catalog data by creating a variable psc_cat which is equal to the second list entry in dc.

Examine the contents of psc_cat

psc_cat.components

You can now manipulate psc_cat in the same way that you would a dictionary. This also means that np style indexing can be performed on the data.

Problem 7c

To test this, print out the "Hmag" values from the catalog.

Now that we have the data in python we can combine multiple attributes within the same data set.

Problem 7d

Create an array that is equal to the [3.6] - [4.5] color. Add this array to the psc_cat.

Hint - this is handled in exactly the same way as it would be for a dictionary.

Problem 7e

Create a new subset of the data where the [3.6] - [4.5] color is > 1.2.

state = psc_cat.id["__3.6__-__4.5__"] > 1.2
label = "[3.6] - [4.5] > 1.2"
subset_group = dc.new_subset_group(label, state)

You should now see a new subset in the data collection panel. In this way it is possible to create precise selections based on the parameters in the catalog, or to create new variables within the catalog.

There is more functionality for adjusting the subsets within IPython. To learn more about those details, read the docs.