[All of my slides from Tuesday morning]
... that is all
As a point of review, on Tuesday we learned about ParaView. I'd summarize the major strength of ParaView as providing an interface to create really nice 3D representations of data (and we barely scratched the surface of the most complex renderings that you can create).
On Wednesday, we learned about bokeh. I would summarize the major strengths of bokeh as the ability to create linked plots, as well as the relative ease of getting bokeh output onto a server and onto the web.
Today we are going to learn about glue, which is a pure python library that is designed to explore the relationships between related datasets. glue is actually developed by astronomers (in collaboration with medical imaging researchers), so a lot of the functionality is designed with our needs in mind.
(Note, though, that glue is created as a general purpose tool. If there is something that you'd like to see in glue that does not exist, you can reach out and maybe they will develop it.)
glue includes elements that we have already explored this week. In particular, due to the medical imaging connection, glue provides nice functionality for visualizing 3D data sets. Additionally, given the large collection of heterogeneous catalogs in astronomy, glue is designed to make linking between data sets very straightforward.
You should have already installed glue, but if not:
conda install -c glueviz glueviz
Furthermore, our first example will use the data included in this tarball: https://northwestern.box.com/s/uiwq47ir8r4h6njlxv6njtx174wdeoox
Problem 1a
Open glue. The standard way to do this is to launch the program from the command line: glue.
At this stage you will notice 4 primary windows within the application: the data collection, the view layers panel, the plot options panel, and the visualization canvas.
Problem 1b
Open the w5 fits image in glue.
As a quick note - this image is from the WISE satellite and it is showing the Westerhout 5 (W5) star forming region.
Hint - you can drag and drop the file, or select the file path via [File $\rightarrow$ Open Data Set].
Problem 1c
Render the image by dragging the w5 entry in the data collection window to the visualization canvas.
This will pop up a drop-down menu asking what type of render you would like. Select the best option.
Hint - you may want to resize the window within the visualization canvas.
As previously noted, one of the great strengths of glue is the ability to drill down on subsets of linked data.
At the top of the 2D image window there are 5 different methods for selecting subsets of the data. From left to right they include: rectangular selection, vertical selection, horizontal selection, circular selection, and finally freeform selection [this has similar functionality to bokeh's lasso].
Problem 1d
Use the horizontal selection tool to select the subset of the data near the center of the image (this is done by clicking and dragging).
Then, use the vertical selection tool to select the subset of the data near the center of the image.
Notice that there are now 2 subsets in the data collection panel, as well as additional entries in the view layers panels.
Problem 1e
Adjust the color of subset 1 to be "DSFP blue" and adjust the transparency bar to its maximum (i.e., minimize the transparency of the selection).
Adjust the color of subset 2 to be "DSFP light grey" and make this selection more transparent.
At this point, it is a little difficult to see the emission under the selected subsets.
Problem 1f
Select the w5 data in the data collection, and adjust the data scaling to match the optimal choice for astronomical images.
Hint - think back to Zolt's lecture.
[You may want to adjust the colorbar and range of the data being displayed. Be sure the subset panels can still be seen after making these changes.]
There is a bright knot of emission in the northwest portion of the nebula; we will now focus on that.
Problem 1g
Adjust the subset selections to be centered on the bright emission knot in the northwest portion of the nebula. This can be done by selecting the subset in the data collection and then clicking and dragging to redefine the subset.
Problem 1h
Create a histogram of the brightness data in the fits image [drag the w5 data from the data collection into the visualization canvas and select the appropriate option from the drop down menu].
Notice that you now have histograms in 3 different colors. This is because the data linking in glue is (in some cases) automatic. By creating the histogram for the data, you have also automatically created a histogram for the two subsets of the data.
You will also notice that the histogram, as currently constructed, is not particularly informative.
Problem 1i
Update the range of the histogram to extend to a maximum value of 1000. Increase the number of bins to 25. Finally, normalize the histogram.
Does the resulting histogram make sense?
The current histograms are strongly polluted by background pixels. We can improve this with the selection tools.
Problem 1j
Select the pixels in the bright knot by changing the selection mode to "remove" (5th option). Then select the horizontal selection tool in the histogram plot. Click and drag to select the region with pixel values less than 500 to remove those pixels from the selection.
How do the histograms look now? Do the resulting image layers/histogram make sense?
Note - don't forget to return to the default selection mode after you have done this.
So far we have only employed automatic linking via data subsets. This has some utility (for instance, I could imagine teaching non-experts about source detection using the steps we just covered regarding the removal of faint pixels), but the real power of glue is in linking heterogeneous data sets.
Problem 2a
Open the second data file from the tarball, w5_psc.vot.
Aside - this VO table file includes sources in the W5 region that were detected by the Spitzer space telescope. One reason for comparing WISE to Spitzer is that WISE covers the entire sky, while Spitzer offers higher resolution and greater depth, so it has more complete catalogs in the areas that it has observed.
Given that the catalog and image are heterogeneous, linking will not be automatic (as it was for the subsets created in problem 1).
Problem 2b
Link the data sets by selecting the Link Data option in the top of the menu bar.
Select an appropriate component from the image and catalog data, and then link those components by clicking on the glue button.
Get it? Link the things by "glueing" them together, using glue.
.
.
.
Get it?
No seriously,
Do you get it?
Be sure that you glue both of the relevant variables that connect these two data sets.
Hold on, now it's about to get real.
With the catalog and image now linked, subsets selected in either space (e.g., the bright emission knot selected in Problem 1) will automatically be reflected in the other space.
Problem 2c
Create a scatter plot of the catalog data by dragging w5_psc.vot into the visualization canvas.
For the scatter plot, show the [4.5] - [5.8] vs. [3.6] color-magnitude diagram.
Problem 2d
Remove the previously created subsets. In the 2D image, choose the circular selection tool and highlight a small region centered on the bright knot in the northwest portion of the nebula.
What do you notice when you make this selection?
Problem 2e
Show the individual Spitzer point sources on the image by selecting the subset in the data collection and dragging it onto the 2D image.
Look at the overlap of the sources relative to the bright knot - does this make sense?
Problem 2f
Adjust the plot of the subset of points to provide a linear colormap for the data.
Color the points by their [3.6] magnitude. Does the image make sense?
What about the reverse? Can we select interesting sources in CMD space and highlight their spatial positions in the cluster?
This could be useful, for example, to identify the location of the youngest stars within the W5 star-forming region.
Problem 2g
Select the Spitzer point source catalog in the data collection. Then, using the rectangular selection tool in the CMD, choose all the red sources with [4.5] - [5.8] > 1 mag.
What can you say about the positions of the red sources relative to the 12 micron emission?
Fortunately, glue provides multiple different ways to save your work.
You can either save your entire session, save specific plots from your session, or save subsets created via the various selection tools from your session.
You should have already unpacked a tarball with 5 fits images: https://northwestern.box.com/s/hmitigmvcfi2tuzlgt1psatebkyrk0e3
Problem 4a
Open each of the 5 fits files (named g, r, i, z, y) in glue.
Note - as you open each image after the first you will be prompted to "merge" the data. Select no on that option for now.
Problem 4b
Create a 2D image of the g-band data.
Problem 4c
Drag and drop the data from each of the other filters on to the g band image.
Problem 4d
Change the color option from colorbar to "one color per channel". Then select 3 layers for the 2D image, assigning one of red, green, or blue to each of those layers.
Problem 4e
Adjust the scalings (and colors if necessary) to create a nice false color image of the galaxy.
3D viewing in glue is relatively new, and as such it does not provide the full range of functionality that is eventually expected within the package.
Read the docs for caveats regarding the use of the 3D viewer.
There is a very, very good chance that you may have a non-working version of glue on your machine if you have not updated your anaconda software since session 4. At this point, please proceed carefully to make sure you have the correct install for 3D rendering in glue.
As a first test, please try:
conda list glue
If that returns something like this:
# Name                 Version    Build     Channel
glue-core              0.13.2     py36_0    glueviz
glue-vispy-viewers     0.10       py36_1    glueviz
glueviz                0.13.2     0         glueviz
namely, glue-core and glueviz versions 0.13.x AND glue-vispy-viewers version 0.10, then you are safe and ready to proceed.
Alternatively, if you have something like this:
# Name                 Version    Build     Channel
glue-core              0.12.5     py36_0    glueviz
glue-vispy-viewers     0.10       py36_1    glueviz
glueviz                0.13.0     0         glueviz
or any combination of glue-core or glueviz <= 0.12.x AND glue-vispy-viewers version 0.10, then 3D viewing is likely not going to be supported in your installation.
The easiest way to address this right now is to roll back your glueviz packages:
conda install -c glueviz glueviz=0.12.4
conda install -c glueviz glue-vispy-viewers=0.9
If you are unsure about any of this, please raise your hand and I'll stop by to make sure everything is set up correctly.
As an example of a 3D scatter plot in glue, we will create a fits table using the training data from the feature engineering lecture, and then render the data.
Problem 5a
Create astropy.io.fits columns for each of the 3 data arrays.
Hint - fits.Column takes name, format, and array as keyword arguments. For format, "D" = double precision and "J" = integer. You'll want to pass np.array objects to the array argument. (See the sketch after Problem 5c for one possible completion.)
In [ ]:
import pandas as pd
from astropy.io import fits
import numpy as np
train_df = pd.read_csv("training_sources.csv")
col1 = fits.Column( # complete
col2 = # complete
col3 = # complete
Problem 5b
Merge the columns into a fits hdu object.
In [ ]:
hdu = fits.BinTableHDU.from_columns( # complete
Problem 5c
Write the hdu object to a fits file.
In [ ]:
hdu.writeto( # complete
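If you get stuck, here is a minimal sketch of one possible completion of Problems 5a-5c. The column names are illustrative: mean and duration are referenced later in this notebook, while std is a hypothetical placeholder; substitute the actual column names from training_sources.csv. The output filename is also arbitrary.

import numpy as np
import pandas as pd
from astropy.io import fits

train_df = pd.read_csv("training_sources.csv")

# build one fits.Column per data array; format "D" = double precision
col1 = fits.Column(name="mean", format="D", array=np.array(train_df["mean"]))
col2 = fits.Column(name="std", format="D", array=np.array(train_df["std"]))
col3 = fits.Column(name="duration", format="D", array=np.array(train_df["duration"]))

# merge the columns into a binary table HDU and write it to disk
hdu = fits.BinTableHDU.from_columns([col1, col2, col3])
hdu.writeto("training_sources.fits", overwrite=True)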
Problem 5d
Open the fits file in glue.
Drag the file to the canvas, and select 3D scatter plot.
Zoom, rotate, adjust the rendering to get a sense of how this 3d scatter plot compares to ParaView.
Problem 5e
Adjust the size of the points and color the data via the value of the mean. Choose a colorbar that makes it easier to see the variation in the data.
Problem 5f
Identify the predominant axis in the data. Adjust its stretch value to 10.
Change the limits on duration to extend from 2000 to 2500.
Do these changes help or harm your visualization?
Problem 5g
Create a new fits table including more informative features from Tuesday's lecture, as well as the class information for each source.
Open the fits table in glue and create a 3D scatter plot highlighting 3 very useful features, with the points colored by the classification of each star.
Problem 5h
Use the circle selection tool to create a subset of the data. Generate a histogram of that subset showing a variable that is not displayed in the 3D render.
You should have already downloaded the astropy fits cube for L1448, which includes $^{13}$CO data.
Problem 6a
Open the l1448_13co.fits file in glue.
Adjust the scaling parameters and rotate the cube to get a sense of the data.
Problem 6b
Create a 1D histogram of the same data. Use the horizontal select tool to select the brightest pixels with PRIMARY > 1.5.
How does this change the appearance of the data cube?
Problem 6c
Click the start/stop rotation button.
Preeeeeeeeeeeeeeety.
Problem 6d
Record a movie as you move around the cube. Press the camera button and choose a location to save the movie. At that point you will see a red button, which means you are recording; select this button to stop the recording.
The movie file that this generates is an animated gif. Open it in a browser to see the output.
Note - I could not get the record and rotation button (6c) to work simultaneously.
Finally - unfortunately, I do not have the table data to make a link here, but if you did, you could then select information within the data cube and highlight it in the table, in much the same way we did in Problem 2.
Like the other tools introduced this week, glue supports python scripting.
The easiest way to launch this is to click on the IPython Terminal button at the top of the glue window.
This launches an IPython instance, from which you can perform several operations, so read the docs. Here we will re-create some of the operations from Problem 2 using the command line.
If necessary, reload the W5 data (both the image and the catalog).
Problem 7a
Launch the IPython Terminal in glue and examine the data collection:
dc
The data collection is effectively a list. We will focus on the catalog data.
Problem 7b
Select the catalog data by creating a variable psc_cat which is equal to the second list entry in dc.
Examine the contents of psc_cat:
psc_cat.components
You can now manipulate psc_cat in the same way that you would a dictionary. This also means that np-style indexing can be performed on the data.
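As a minimal sketch (assuming psc_cat was defined in Problem 7b; the component name "Jmag" is a hypothetical placeholder, so check psc_cat.components for the labels in your catalog):

# dictionary-style access returns a numpy array for that component
mags = psc_cat["Jmag"]
# numpy-style indexing then works as usual, e.g. keep only the bright sources
bright = mags[mags < 12]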
Problem 7c
To test this, print out the "Hmag" values from the catalog.
Now that we have the data in python, we can combine multiple attributes within the same data set.
Problem 7d
Create an array that is equal to the [3.6] - [4.5] color. Add this array to psc_cat.
Hint - this is handled in exactly the same way as it would be for a dictionary.
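Here is a minimal sketch of one way to do this; the component names "_3.6_" and "_4.5_" are hypothetical placeholders (check psc_cat.components for the actual labels in the catalog), while the new component name matches the one used in Problem 7e below:

# compute the color from the two (hypothetically named) magnitude components
color = psc_cat["_3.6_"] - psc_cat["_4.5_"]
# add the new array to the catalog, exactly as you would for a dictionary
psc_cat["__3.6__-__4.5__"] = color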
Problem 7e
Create a new subset of the data where the [3.6] - [4.5] color is > 1.2.
state = psc_cat.id["__3.6__-__4.5__"] > 1.2
label = "[3.6] - [4.5] > 1.2"
subset_group = dc.new_subset_group(label, state)
You should now see a new subset in the data collection panel. In this way it is possible to create precise selections based on the parameters in the catalog, or to create new variables within the catalog.
There is more functionality for adjusting the subsets within IPython; to learn more about those details, read the docs.