Searching major star catalogs and Simbad by position or name with vo_conesearch

Authors

P. L. Lim

Learning Goals

  • Perform a cone search around M31 using a web service.
  • Write the result out to a LaTeX table.
  • Perform a SIMBAD query using the cone search result.
  • Extract metadata from the cone search catalog.
  • Sort cone search results by angular distance.
  • Search multiple cone search services at once (synchronously and asynchronously).
  • Estimate the run time of a cone search.

Keywords

astroquery, table, coordinates, units, vo_conesearch, LaTeX, SIMBAD, matplotlib

Summary

This tutorial demonstrates the Cone Search subpackage, which lets you query a catalog of astronomical sources and obtain those that lie within a cone of a given radius around a given position.
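
To make the geometry concrete, here is a minimal sketch of the membership test behind a cone search, using only the Python standard library. The helper names, the approximate M31 position, and the three sources are made up for illustration; a real Cone Search service performs this filtering on the server side.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_cone(center, radius_deg, sources):
    """Return the names of sources within radius_deg of center."""
    ra0, dec0 = center
    return [name for name, ra, dec in sources
            if angular_separation_deg(ra0, dec0, ra, dec) <= radius_deg]


# Approximate ICRS position of M31 in degrees (illustrative value).
m31 = (10.6847, 41.2687)

# Hypothetical sources: (name, RA in degrees, Dec in degrees).
sources = [
    ('a', 10.70, 41.30),
    ('b', 10.68, 41.45),
    ('c', 11.00, 41.27),
]

inside = in_cone(m31, 0.1, sources)
print('Sources inside the 0.1 degree cone:', inside)
```

Only the first source falls inside the cone; the other two are offset by more than 0.1 degree in declination and right ascension, respectively.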

Imports


In [ ]:
# Python standard library
import time
import warnings

# Third-party software
import numpy as np

# Astropy
from astropy import coordinates as coord
from astropy import units as u
from astropy.table import Table

# Astroquery. This tutorial requires 0.3.5 or greater.
import astroquery
from astroquery.simbad import Simbad
from astroquery.vo_conesearch import conf, conesearch, vos_catalog

# Set up matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

If you are running a version of astroquery older than 0.3.10, you might need to set vos_baseurl yourself, as follows.


In [ ]:
from astropy.utils import minversion

if not minversion(astroquery, '0.3.10'):
    conf.vos_baseurl = 'https://astroconda.org/aux/vo_databases/'

To start, it is useful to list the available Cone Search catalogs. By default, only catalogs that pass nightly validation are included. Validation is hosted by the Space Telescope Science Institute (STScI).


In [ ]:
conesearch.list_catalogs()

Next, let's pick an astronomical object of interest, such as M31.


In [ ]:
c = coord.SkyCoord.from_name('M31', frame='icrs')
print(c)

By default, a basic Cone Search goes through the list of catalogs and stops at the first one that returns a non-empty VO table. Let's search for objects within 0.1 degree of M31. You will see many warnings that were generated by the VO table parser but ignored by the Cone Search service validator. VO compliance enforced by Cone Search providers is beyond the control of the astroquery.vo_conesearch package.

The result is an Astropy table.


In [ ]:
result = conesearch.conesearch(c, 0.1 * u.degree)

In [ ]:
print('First non-empty table returned by', result.url)
print('Number of rows is', len(result))
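
The "stop at the first non-empty result" behavior described above can be sketched in plain Python. This is only an illustration of the idea, not astroquery's implementation: the service URLs, the fake_query function, and the choice to skip services that raise an error are all assumptions made for this example.

```python
def first_non_empty(services, query):
    """Return (url, rows) from the first service that yields a non-empty result."""
    for url in services:
        try:
            rows = query(url)
        except ConnectionError as exc:
            # Assumed policy for this sketch: report the failure and move on.
            print('Skipping', url, 'because', exc)
            continue
        if rows:
            return url, rows
    raise LookupError('No service returned any rows')


# Hypothetical canned responses standing in for remote services.
responses = {
    'https://example.org/a': [],
    'https://example.org/b': [{'id': 1}, {'id': 2}],
    'https://example.org/c': [{'id': 3}],
}


def fake_query(url):
    """Pretend network call: one service is down, the rest return canned rows."""
    if url == 'https://example.org/down':
        raise ConnectionError('service unavailable')
    return responses[url]


services = ['https://example.org/down', 'https://example.org/a',
            'https://example.org/b', 'https://example.org/c']
url, rows = first_non_empty(services, fake_query)
print('First non-empty result came from', url, 'with', len(rows), 'rows')
```

Because the loop returns as soon as one service answers with rows, the third working service is never contacted. This is also why the service that answers can differ from run to run.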

In [ ]:
print(result)

This table can be manipulated like any other Astropy table. For example, you can write it out in LaTeX format.


In [ ]:
result.write('my_result.tex', format='ascii.latex', overwrite=True)

You can now use your favorite text editor to open the my_result.tex file, but here, we are going to read it back into another Astropy table.

Note that the extra data_start=4 option is necessary because the LaTeX reader and writer do not round-trip (see astropy issue 5205).


In [ ]:
result_tex = Table.read('my_result.tex', format='ascii.latex', data_start=4)
print(result_tex)

Cone Search results can also be used in conjunction with other types of queries. For example, you can query SIMBAD for the first entry in your result above.


In [ ]:
# Due to the unpredictability of external services,
# the first successful query result (above) might differ
# from run to run.
#
# CHANGE THESE VALUES to the appropriate RA and DEC
# column names you see above, if necessary.
# These are for http://gsss.stsci.edu/webservices/vo/ConeSearch.aspx?CAT=GSC23&
ra_colname = 'ra'
dec_colname = 'dec'

In [ ]:
# Don't run this cell if column names above are invalid.
if ra_colname in result.colnames and dec_colname in result.colnames:
    row = result[0]
    simbad_obj = coord.SkyCoord(ra=row[ra_colname]*u.deg, dec=row[dec_colname]*u.deg)
    print('Searching SIMBAD for\n{}\n'.format(simbad_obj))
    simbad_result = Simbad.query_region(simbad_obj, radius=5*u.arcsec)
    print(simbad_result)
else:
    print('{} or {} not in search results. Choose from: {}'.format(
        ra_colname, dec_colname, ' '.join(result.colnames)))

Now back to Cone Search... You can extract metadata of this Cone Search catalog.


In [ ]:
my_db = vos_catalog.get_remote_catalog_db(conf.conesearch_dbname)
my_cat = my_db.get_catalog_by_url(result.url + '&')
print(my_cat.dumps())

If you have a favorite catalog in mind, you can also perform Cone Search only on that catalog. A list of available catalogs can be obtained by calling conesearch.list_catalogs(), as mentioned above.


In [ ]:
result = conesearch.conesearch(
    c, 0.1 * u.degree, catalog_db='The USNO-A2.0 Catalogue (Monet+ 1998) 1')

In [ ]:
print('Number of rows is', len(result))

Let's explore the 3 rows of astronomical objects found within 0.1 degree of M31 in the given catalog and sort them by increasing distance. For this example, the VO table has several columns that might include:

  • _r = Angular distance (in degrees) between object and M31
  • USNO-A2.0 = Catalog ID of the object
  • RAJ2000 = Right ascension of the object (epoch=J2000)
  • DEJ2000 = Declination of the object (epoch=J2000)

Note that column names, meanings, order, etc. might vary from catalog to catalog.


In [ ]:
col_names = result.colnames
print(col_names)

In [ ]:
# Before sort
print(result)

In [ ]:
# After sort
result.sort('_r')
print(result)

You can also convert the distance to arcseconds.


In [ ]:
result['_r'].to(u.arcsec)

What if you want all the results from all the catalogs? And you also want to suppress all the VO table warnings and informational messages?

Warning: This can be time and resource intensive.


In [ ]:
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    all_results = conesearch.search_all(c, 0.1 * u.degree, verbose=False)

In [ ]:
for url, tab in all_results.items():
    print(url, 'returned', len(tab), 'rows')

In [ ]:
# Pick out the first one with "I/220" in it, if any service returned one.
i220keys = [k for k in all_results if 'I/220' in k]
if i220keys:
    my_favorite_result = all_results[i220keys[0]]
    print(my_favorite_result)
else:
    print('No result URL contains "I/220".')

Asynchronous Searches

Asynchronous versions of conesearch() and search_all(), which run the search in the background, are also available. The result can be obtained using the asynchronous instance's get() method, which returns the result upon completion or after a given timeout value in seconds.


In [ ]:
async_search = conesearch.AsyncConeSearch(
    c, 0.1 * u.degree, catalog_db='The USNO-A2.0 Catalogue (Monet+ 1998) 1')
print('Am I running?', async_search.running())

time.sleep(3)
print('After 3 seconds. Am I done?', async_search.done())
print()

result = async_search.get(timeout=30)
print('Number of rows returned is', len(result))

In [ ]:
async_search_all = conesearch.AsyncSearchAll(c, 0.1 * u.degree)
print('Am I running?', async_search_all.running())
print('Am I done?', async_search_all.done())
print()

all_results = async_search_all.get(timeout=30)
for url, tab in all_results.items():
    print(url, 'returned', len(tab), 'rows')
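
The background-search pattern used above (start, poll, then fetch with a timeout) can be tried without any network access. The sketch below uses concurrent.futures from the standard library; slow_search is a hypothetical stand-in for a remote query, and nothing here is meant to describe how AsyncConeSearch is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_search(n_rows, delay=0.2):
    """Hypothetical stand-in for a network query: sleep, then return rows."""
    time.sleep(delay)
    return list(range(n_rows))


executor = ThreadPoolExecutor(max_workers=1)

# submit() returns immediately; the search runs in a worker thread.
future = executor.submit(slow_search, 3)
print('Done right after submit?', future.done())

# result() blocks until the search finishes, or raises a timeout error
# if it takes longer than the given number of seconds.
rows = future.result(timeout=5)
print('Done now?', future.done())
print('Number of rows returned is', len(rows))

executor.shutdown()
```

Polling with done() lets the rest of a script keep working while the search is in flight, and the timeout keeps a slow service from blocking it indefinitely.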

Estimating the Search Time

Let's predict the run time of performing Cone Search on http://gsss.stsci.edu/webservices/vo/ConeSearch.aspx?CAT=GSC23& with a radius of 0.1 degrees. For now, the prediction assumes a very simple linear model, which might or might not reflect the actual trend.

This might take a while.


In [ ]:
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    t_est, n_est = conesearch.predict_search(
        'http://gsss.stsci.edu/webservices/vo/ConeSearch.aspx?CAT=GSC23&',
        c, 0.1 * u.degree, verbose=False, plot=True)

In [ ]:
print('Predicted run time is', t_est, 'seconds')
print('Predicted number of rows is', n_est)
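
To illustrate what a simple linear model means here, the sketch below fits a straight line to a few timings measured at small radii and extrapolates to the radius of interest. The timings are made up, and fit_line is a hypothetical helper; this shows the general idea of linear extrapolation and is not a reproduction of what predict_search does internally.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up run times (seconds) for baseline searches at small radii (degrees).
# By construction these points lie on the line t = 10 * r + 0.5.
radii = [0.01, 0.02, 0.03, 0.04]
times = [0.6, 0.7, 0.8, 0.9]

slope, intercept = fit_line(radii, times)

# Extrapolate to the full search radius of 0.1 degree.
t_pred = slope * 0.1 + intercept
print('Extrapolated run time is about', round(t_pred, 3), 'seconds')
```

Real timings scatter around any such line because of network latency and server load, which is one reason a prediction can miss the actual run time.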

Let's get the actual run time and number of rows to compare with the prediction above. This might take a while.

As you will see, the prediction is not spot on, but it's not too shabby (at least, not when we tried it!). Note that both predicted and actual run time results also depend on network latency and responsiveness of the service provider.


In [ ]:
t_real, tab = conesearch.conesearch_timer(
    c, 0.1 * u.degree,
    catalog_db='http://gsss.stsci.edu/webservices/vo/ConeSearch.aspx?CAT=GSC23&',
    verbose=False)

In [ ]:
print('Actual run time is', t_real, 'seconds')
print('Actual number of rows is', len(tab))