One Table To Rule Them All: Radio

This notebook generates a table of radio components in the CDFS and ELAIS-S1 fields, according to various incarnations of the ATLAS survey. To run it, you will need a MongoDB server with the RGZ database loaded. All other data is fetched from the internet.

In the following cell, specify the MongoDB server details:


In [2]:
MONGO_HOST = 'localhost'
MONGO_PORT = 27017

In this cell, specify if you have access to a crowdastro output file (crowdastro.h5), and if so, where it is:


In [3]:
USING_CROWDASTRO = True
CROWDASTRO_PATH = 'crowdastro-swire.h5'
# To get this file, run `crowdastro import_data --ir swire`.

In this cell, specify if you have access to a CSV of the Fan et al. (2015) cross-identifications, and if so, where it is:


In [4]:
USING_FAN = True
FAN_PATH = 'fan_2015_a.csv'

In this cell, specify if you have access to the 11 January 2014 Franzen catalogue, and if so, where it is:


In [5]:
USING_FRANZEN = True
FRANZEN_PATH = 'ATLASDR3_CDFS_cmpcat_11JAN2014.dat'

Next, we will fetch the resources we need.


In [6]:
NORRIS_COMPONENTS_URI = 'http://www.atnf.csiro.au/people/rnorris/papers/n202/tab4.txt'
NORRIS_CROSS_IDENTIFICATIONS_URI = 'http://www.atnf.csiro.au/people/rnorris/papers/n202/tab6.txt'
MIDDELBERG_COMPONENTS_URI = 'http://iopscience.iop.org/article/10.1086/508275/fulltext/datafile4.txt'
MIDDELBERG_CROSS_IDENTIFICATIONS_URI = 'http://iopscience.iop.org/article/10.1086/508275/fulltext/datafile6.txt'

In [7]:
# Load Norris components.
import requests, io, astropy.io.ascii as asc, astropy.table, pandas
norris_components = astropy.table.Table.from_pandas(
    pandas.read_fwf(
        io.StringIO(
            requests.get(NORRIS_COMPONENTS_URI).text
        ),
        skiprows=[0, 2],
        header=0,
        widths=map(len, [
                '   # ',
                'Name                       ',
                'Radio RA     ',
                'Radio dec  ',
                'err(RA) ',
                'err(dec) ',
                'Peak Flux  ',
                'Int flux   ',
                'Bmaj   ',
                'Bmin   ',
                ' Bpa      ',
                ' rms  ',
            ])
    )
)
norris_components


Out[7]:
<Table length=784>
#NameRadio RARadio decerr(RA)err(dec)Peak FluxInt fluxBmajBminBparms
str4str26str11str11float64float64float64float64float64float64float64float64
C001ATCDFS_J032602.78-284709.03:26:02.785-28:47:09.00.780.730.71.388.32.660.879.3
C002ATCDFS_J032604.15-275659.33:26:04.152-27:56:59.30.550.90.711.9711.57.3-17.071.9
C003ATCDFS_J032605.68-274734.43:26:05.685-27:47:34.40.10.1140.8174.75.95.685.7119.1
C004ATCDFS_J032606.95-275332.23:26:06.955-27:53:32.20.521.190.410.430.00.0-1.076.7
C005ATCDFS_J032611.47-273243.83:26:11.475-27:32:43.80.10.169.65110.95.33.389.6156.3
C006ATCDFS_J032613.70-281717.73:26:13.701-28:17:17.70.570.790.480.540.00.00.077.7
C007ATCDFS_J032615.48-284629.23:26:15.489-28:46:29.20.340.360.450.710.00.00.066.0
C008ATCDFS_J032615.55-280601.03:26:15.557-28:06:01.00.30.470.731.060.00.0-1.056.9
C009ATCDFS_J032616.35-280014.63:26:16.353-28:00:14.60.180.31.241.660.00.0-1.060.1
C010ATCDFS_J032616.41-271621.13:26:16.419-27:16:21.10.150.234.177.849.12.8-40.090.2
....................................
C775ATCDFS_J033544.32-274323.33:35:44.326-27:43:23.30.480.981.362.819.55.43.673.1
C776ATCDFS_J033544.35-282050.93:35:44.350-28:20:50.90.350.770.671.249.64.3-9.570.4
C777ATCDFS_J033544.96-274308.03:35:44.960-27:43:08.00.410.80.581.319.46.216.071.5
C778ATCDFS_J033546.38-283805.23:35:46.383-28:38:05.20.330.560.981.555.53.954.266.9
C779ATCDFS_J033548.20-284402.33:35:48.204-28:44:02.30.120.157.0912.376.24.6-48.876.4
C780ATCDFS_J033549.15-274918.53:35:49.152-27:49:18.50.120.166.0110.976.75.322.279.5
C781ATCDFS_J033553.33-272740.43:35:53.336-27:27:40.40.10.173.88126.96.03.671.9149.7
C782ATCDFS_J033553.48-275510.93:35:53.480-27:55:10.90.350.620.590.995.84.5-43.673.5
C783ATCDFS_J033553.58-280956.83:35:53.589-28:09:56.80.160.40.460.510.00.00.061.2
C784ATCDFS_J033553.63-282923.53:35:53.634-28:29:23.50.640.960.531.177.46.6-61.280.0

In [8]:
# Load Norris cross-identifications.
# This table has inconsistent tabs, so we will have to convert them to "soft tabs".
def replace_tabs(s, tabstop=8):
    """Convert tabs to spaces."""
    out = ''
    upto = 0
    last = None
    for c in s:
        if c == '\t':
            # Fill up to next tabstop.
            diff = tabstop - upto % tabstop
            if diff == 0:
                diff = tabstop
            out += ' ' * diff
            upto += diff
            last = c
            continue
        
        last = c
        out += c
        upto += 1

    return out

test_input = ('S001	ATCDFS_J032602.78-284709.0	C001	            SWIRE3_J032603.15-284708.5	3:26:02.785	-28:47:09.06	1.4	33.8	21.1	-1.0	-1.0	-1.0						 	4					looks like a group in irac 1')
test_output = ('S001    ATCDFS_J032602.78-284709.0      C001                SWIRE3_J032603.15-284708.5  3:26:02.785     -28:47:09.06    1.4     33.8    21.1    -1.0    -1.0    -1.0                                                    4                                       looks like a group in irac 1')

assert test_output == replace_tabs(test_input)


norris_cross_identifications = astropy.table.Table.from_pandas(
    pandas.read_fwf(
        io.StringIO(
            '\n'.join(map(
                    lambda s: replace_tabs(s, 8),
                    requests.get(NORRIS_CROSS_IDENTIFICATIONS_URI).text.split('\r\n'))
             )
        ),
        skiprows=[0, 2],
        header=0,
        widths=[8, 32, 20, 28, 16, 16, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 16, 8, 16]
    )
)
norris_cross_identifications[700:710]


Out[8]:
<Table masked=True length=10>
#NameComponentSWIRERadio RARadio dec20cm3.6µm4.5µm5.8µm8.0µm24µmUGRIZz(sp)IDClassBasis
str4str26str19str26str11str12float64float64str8str8str8float64float64float64float64float64float64float64int64str4str3
S701ATCDFS_J033529.42-285156.5C753SWIRE3_J033529.46-285154.53:35:29.424-28:51:56.541.7716.0521.4194.3146.7-1.020.618.817.516.716.4--3----
S702ATCDFS_J033530.22-281108.4C754SWIRE3_J033530.19-281108.53:35:30.229-28:11:08.450.832.238.5-1.056.2376.399.099.099.099.099.0--1AGNb
S703ATCDFS_J033531.02-272702.2C755--3:35:31.025-27:27:02.2026.1----------------------9----
S704ATCDFS_J033532.11-274547.6C756SWIRE3_J033532.11-274545.23:35:32.115-27:45:47.612.2164.484.655.340.6-1.0------------3----
S705ATCDFS_J033532.19-284801.0C757SWIRE3_J033532.19-284801.13:35:32.192-28:48:01.080.9222.5417.81084.82999.410830.199.024.222.020.720.1--1----
S706ATCDFS_J033533.22-280621.8C758--3:35:33.224-28:06:21.850.3----------------------9----
S707ATCDFS_J033533.96-273313.4C759,C760,C761,C764SWIRE3_J033533.90-273310.93:35:33.963-27:33:13.469.4451.6512.5724.7966.62208.2--19.819.518.9----3AGNb
S708ATCDFS_J033534.47-284213.3C763SWIRE3_J033534.66-284221.73:35:34.479-28:42:13.310.579.853.1-1.044.3-1.099.022.721.320.419.9--4----
S709ATCDFS_J033535.20-281729.0C762,C765SWIRE3_J033535.20-281729.03:35:35.20-28:17:29.008.434.525.3-1.0-1.0-1.099.099.099.022.621.9---1AGNa
S710ATCDFS_J033536.48-282618.2C766(154935)3:35:36.488-28:26:18.270.34.1-1.0-1.0-1.0-1.099.099.099.099.099.0--1----

In [9]:
# Load Middelberg tables.
middelberg_components = asc.read(MIDDELBERG_COMPONENTS_URI)
print(middelberg_components[0])
middelberg_cross_identifications = asc.read(MIDDELBERG_CROSS_IDENTIFICATIONS_URI)
print(middelberg_cross_identifications[0])


Downloading http://iopscience.iop.org/article/10.1086/508275/fulltext/datafile4.txt [Done]
 ID             Name            RAh RAm  RAs  DE- DEd  DEm    DEs   e_RAs  e_DEs  PFlux IFlux MajAxis MinAxis PosAng rms  Com
                                 h  min   s       deg arcmin arcsec arcsec arcsec  mJy   mJy   arcsec  arcsec  deg   uJy     
---- -------------------------- --- --- ----- --- --- ------ ------ ------ ------ ----- ----- ------- ------- ------ ---- ---
C001 ATCDFS_J032602.78-284709.0   3  26 2.785   -  28     47    9.0   0.78   0.73   0.7  1.38     8.3     2.6   60.8 79.3  --
Downloading http://iopscience.iop.org/article/10.1086/508275/fulltext/datafile6.txt [Done]
 ID             Name            CID            SName            RAh RAm  RAs  DE- DEd  DEm    DEs   F20 F3.6 F4.5 F5.8 F8.0 F24  umag gmag rmag Imag zmag  z  Type Class f_Class             Comm            
                                                                 h  min   s       deg arcmin arcsec mJy uJy  uJy  uJy  uJy  uJy  mag  mag  mag  mag  mag                                                     
---- -------------------------- ---- -------------------------- --- --- ----- --- --- ------ ------ --- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- --- ---- ----- ------- ----------------------------
S001 ATCDFS_J032602.78-284709.0 C001 SWIRE3_J032603.15-284708.5   3  26 2.785   -  28     47   9.06 1.4 33.8 21.1 -1.0 -1.0 -1.0   --   --   --   --   --  --    4    --      -- looks like a group in irac 1

In [10]:
# Convert Middelberg data into columns. There's no catalogue matching to do here so we can
# throw everything in right away.
import astropy.coordinates

_middelberg_component_ids = middelberg_components['ID']
_middelberg_component_names = middelberg_components['Name']
_middelberg_component_positions = [
    astropy.coordinates.SkyCoord(ra=(r['RAh'], r['RAm'], r['RAs']),
                                 dec=(-r['DEd'], r['DEm'], r['DEs']),
                                 unit=('hourangle', 'deg'))
    for r in middelberg_components
]
_middelberg_component_ras = [r.ra.deg for r in _middelberg_component_positions]
_middelberg_component_decs = [r.dec.deg for r in _middelberg_component_positions]
_middelberg_component_peak_flux = middelberg_components['PFlux']
_middelberg_component_int_flux = middelberg_components['IFlux']
_middelberg_source_ids = middelberg_components['ID']
_middelberg_cid_to_source_id = {}
_middelberg_cid_to_source_name = {}
_middelberg_cid_to_swire = {}
_middelberg_cid_to_source_z = {}
_middelberg_cid_to_source_ra = {}
_middelberg_cid_to_source_dec = {}
for row in middelberg_cross_identifications:
    for component in row['CID'].split(','):
        component = component.strip()
        _middelberg_cid_to_source_id[component] = row['ID']
        _middelberg_cid_to_source_name[component] = row['Name']
        _middelberg_cid_to_swire[component] = row['SName']
        _middelberg_cid_to_source_z[component] = row['z']
        pos = astropy.coordinates.SkyCoord(ra=(row['RAh'], row['RAm'], row['RAs']),
                                           dec=(-row['DEd'], row['DEm'], row['DEs']),
                                           unit=('hourangle', 'deg'))
        _middelberg_cid_to_source_ra[component] = pos.ra.deg
        _middelberg_cid_to_source_dec[component] = pos.dec.deg

_middelberg_component_source_ids = [_middelberg_cid_to_source_id[c] for c in _middelberg_component_ids]
_middelberg_component_source_names = [_middelberg_cid_to_source_name[c] for c in _middelberg_component_ids]
_middelberg_component_swires = [_middelberg_cid_to_swire[c] for c in _middelberg_component_ids]
_middelberg_component_source_zs = [_middelberg_cid_to_source_z[c] for c in _middelberg_component_ids]
_middelberg_component_source_ras = [_middelberg_cid_to_source_ra[c] for c in _middelberg_component_ids]
_middelberg_component_source_decs = [_middelberg_cid_to_source_dec[c] for c in _middelberg_component_ids]

In [13]:
# Load RGZ.
import pymongo, numpy

client = pymongo.MongoClient(MONGO_HOST, MONGO_PORT)
db = client['radio']

_rgz_sources = []
_rgz_coords = []
_rgz_zids = []

for subject in db.radio_subjects.find({'metadata.survey': 'atlas'}):
    source = subject['metadata']['source']
    ra, dec = subject['coords']
    zid = subject['zooniverse_id']
    _rgz_sources.append(source)
    _rgz_coords.append((ra, dec))
    _rgz_zids.append(zid)

_rgz_coords = numpy.array(_rgz_coords)

In [71]:
# Load consensuses from crowdastro.
import h5py
with h5py.File(CROWDASTRO_PATH, 'r') as crowdastro_h5:
    # (atlas_i, ir_i, success, percentage)
    _crowdastro_consensus_objects = crowdastro_h5['/atlas/cdfs/consensus_objects']
    _crowdastro_zids = [r[0].decode('ascii') for r in crowdastro_h5['/atlas/cdfs/string']]
    _crowdastro_swire_names = [r.decode('ascii') for r in crowdastro_h5['/swire/cdfs/string']]
    _crowdastro_zid_to_swire = {}
    _crowdastro_zid_to_percentages = {}
    _crowdastro_zid_to_fit_success = {}
    for atlas_i, ir_i, success, percentage in _crowdastro_consensus_objects:
        _crowdastro_zid_to_swire[_crowdastro_zids[int(atlas_i)]] = _crowdastro_swire_names[int(ir_i)]
        _crowdastro_zid_to_percentages[_crowdastro_zids[int(atlas_i)]] = percentage
        _crowdastro_zid_to_fit_success[_crowdastro_zids[int(atlas_i)]] = bool(success)

In [15]:
# Load Franzen.
franzen = asc.read(FRANZEN_PATH)
# Note that multi-component Franzen objects are matched to exactly one RGZ object, which is associated with the
# first component of said object. We will not make the same assumption here.
franzen


Out[15]:
<Table length=3079>
IDNameRADECRA_ERRDEC_ERRSNRRMSBWSSpSp_ERRSS_ERRDECONVDECONV_ERRVOBS_FREQSINDEXINDEX_ERR
str8str27float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64
CI0001C1ATLAS3_J032836.59-284145.1C52.152477-28.695870.001990.0017623135.610.0410.96948.560.32984.240.342.0880.0131.01472.61-0.7030.005
CI0001C2ATLAS3_J032836.49-284201.1C52.152059-28.7003120.01310.009564266.590.0410.96174.930.33200.050.384.080.0381.01472.46-0.7030.005
CI0001C3ATLAS3_J032836.38-284155.0C52.151573-28.6986180.05030.0441924.150.0410.9637.890.27112.080.815.0660.0391.01472.53-0.7030.005
CI0001C4ATLAS3_J032835.99-284104.2C52.149971-28.6844961.0712.60615.660.0410.960.6420.1991.210.3810.1272.5250.9151473.08-0.7030.005
CI0001C5ATLAS3_J032836.77-284148.2C52.153192-28.696730.1140.1825223.660.0410.969.170.3428.51.115.6320.1981.01472.58-0.7030.005
CI0002ATLAS3_J032734.01-284621.5C51.89172-28.7726260.0170.0354819.00.0310.97153.357.819159.3147.9662.1231.9491.01480.37-0.2330.012
CI0003ATLAS3_J032611.46-273243.0C51.547764-27.545290.0170.0353889.00.0260.95106.6445.438107.6355.3821.0383.9881.01488.15-0.4810.014
CI0004ATLAS3_J033409.28-282420.0C53.538672-28.4055430.0170.0353616.00.0180.9765.1283.32175.9893.7994.3970.9411.01493.32-0.6210.014
CI0005C1ATLAS3_J032737.35-280132.8C51.905615-28.0257720.06210.0487447.330.0150.976.710.420.00.00.00.01.01488.39-0.4740.013
CI0005C2ATLAS3_J032738.11-280126.6C51.908796-28.0240420.01430.018651445.330.0150.9721.680.1823.220.192.870.2351.01488.21-0.4740.013
.........................................................
CI2929ATLAS3_J032711.74-272743.0C51.798926-27.4619450.9832.3945.010.0140.990.0720.0150.00.00.00.00.2921516.36-99.0-99.0
CI2932ATLAS3_J032711.09-284100.8C51.796229-28.6835530.9832.3945.010.0210.980.1090.0220.00.00.00.00.5411491.79-99.0-99.0
CI2933ATLAS3_J032655.06-273633.9C51.729407-27.6094230.9792.3835.030.0150.980.0790.0160.00.00.00.00.3691507.75-99.0-99.0
CI2935ATLAS3_J033459.59-284302.6C53.748297-28.7174010.9862.3985.00.0160.980.0820.0170.00.00.00.00.3941504.4-99.0-99.0
CI2944ATLAS3_J032926.44-282423.6C52.36016-28.4065570.9762.3765.050.0150.970.0770.0160.00.00.00.00.3441486.02-99.0-99.0
CI2947ATLAS3_J032713.94-273944.3C51.808064-27.6623110.9832.395.010.0140.980.0740.0150.00.00.00.00.3051501.18-99.0-99.0
CI2949ATLAS3_J032959.65-280220.3C52.498556-28.0389630.9832.3945.010.0130.970.0690.0140.00.00.00.00.2581484.95-99.0-99.0
CI2983ATLAS3_J033059.50-284952.0C52.747925-28.8311070.9832.3945.010.0230.950.1190.0240.00.00.00.00.5711469.73-99.0-99.0
CI2998ATLAS3_J032911.97-282138.3C52.29988-28.3606340.9862.3985.00.0150.960.0780.0160.00.00.00.00.3531487.55-99.0-99.0
CI3180ATLAS3_J033202.85-274656.2C53.011863-27.782280.9832.395.010.0210.940.1120.0230.00.00.00.00.5511417.17-99.0-99.0

In [27]:
# Match Franzen to Norris.
import scipy.spatial
_franzen_cid_to_norris = {}  # Maps Franzen CID -> Norris CID (RGZ uses Franzen CIDs)
_norris_cids = [r['#'] for r in norris_components]
_norris_coords = [astropy.coordinates.SkyCoord(
    ra=r['Radio RA'],
    dec=r['Radio dec'],
    unit=('hourangle', 'deg')) for r in norris_components]

_norris_coords = numpy.array([(p.ra.deg, p.dec.deg) for p in _norris_coords])
_norris_tree = scipy.spatial.KDTree(_norris_coords)
_franzen_coords = numpy.array(list(zip([franzen['RA'], franzen['DEC']])))[:, 0, :].T
_franzen_cids = franzen['ID']
_dists, _indices = _norris_tree.query(_franzen_coords)
_matches = _dists < 5 / 60 / 60

for cid, match, index in zip(_franzen_cids, _matches, _indices):
    if not match:
        continue
    
    _franzen_cid_to_norris[cid] = _norris_cids[index]
_norris_to_franzen_cid = {j:i for i, j in _franzen_cid_to_norris.items()}

In [21]:
# Load Fan.
fan_cross_identifications = asc.read(FAN_PATH, header_start=0, delimiter=',')
_fan_source_ids = fan_cross_identifications['id']
_fan_id_to_swire = {r['id']:r['swire'] for r in fan_cross_identifications}
# Assuming that CID in Fan = CID in Norris.
_fan_component_to_source = {}
_fan_component_to_swire = {}
for row in fan_cross_identifications:
    components = row['radios'].split(',')
    for component in components:
        component = component.strip()
        _fan_component_to_source[component] = row['id']
        _fan_component_to_swire[component] = row['swire']

Now, we can construct the table. We will have the following columns:

  • Key
  • [component columns in Norris] (Norris)
  • [source columns in Norris] (Norris)
  • Source ID (Fan)
  • SWIRE Name (Fan)
  • [component columns in Franzen] (Franzen)
  • Component ID (RGZ)
  • Zooniverse ID (RGZ)
  • SWIRE Name (RGZ-MV)
  • Gaussian Click Fit Success (RGZ)
  • Click Agreement (RGZ)

In [72]:
names = ['Key'] + [
    'Component ' + k + ' (Norris)' for k in norris_components.columns.keys()] + [
    'Source ' + k + ' (Norris)' for k in norris_cross_identifications.columns.keys()] + [
    'Source # (Fan)',
    'Source SWIRE Name (Fan)'] + [
    'Component ' + k + ' (Franzen)' for k in franzen.columns.keys()] + [
    'Component Zooniverse ID (RGZ)',
    'Primary Component ID (RGZ)',
    'Source SWIRE Name (RGZ)',
    'Gaussian Click Fit Success (RGZ)',
    'Click Agreement (RGZ)']

names


Out[72]:
['Key',
 'Component # (Norris)',
 'Component Name (Norris)',
 'Component Radio RA (Norris)',
 'Component Radio dec (Norris)',
 'Component err(RA) (Norris)',
 'Component err(dec) (Norris)',
 'Component Peak Flux (Norris)',
 'Component Int flux (Norris)',
 'Component Bmaj (Norris)',
 'Component Bmin (Norris)',
 'Component Bpa (Norris)',
 'Component rms (Norris)',
 'Source # (Norris)',
 'Source Name (Norris)',
 'Source Component (Norris)',
 'Source SWIRE (Norris)',
 'Source Radio RA (Norris)',
 'Source Radio dec (Norris)',
 'Source 20cm (Norris)',
 'Source 3.6µm (Norris)',
 'Source 4.5µm (Norris)',
 'Source 5.8µm (Norris)',
 'Source 8.0µm (Norris)',
 'Source 24µm (Norris)',
 'Source U (Norris)',
 'Source G (Norris)',
 'Source R (Norris)',
 'Source I (Norris)',
 'Source Z (Norris)',
 'Source z(sp) (Norris)',
 'Source ID (Norris)',
 'Source Class (Norris)',
 'Source Basis (Norris)',
 'Source # (Fan)',
 'Source SWIRE Name (Fan)',
 'Component ID (Franzen)',
 'Component Name (Franzen)',
 'Component RA (Franzen)',
 'Component DEC (Franzen)',
 'Component RA_ERR (Franzen)',
 'Component DEC_ERR (Franzen)',
 'Component SNR (Franzen)',
 'Component RMS (Franzen)',
 'Component BWS (Franzen)',
 'Component Sp (Franzen)',
 'Component Sp_ERR (Franzen)',
 'Component S (Franzen)',
 'Component S_ERR (Franzen)',
 'Component DECONV (Franzen)',
 'Component DECONV_ERR (Franzen)',
 'Component V (Franzen)',
 'Component OBS_FREQ (Franzen)',
 'Component SINDEX (Franzen)',
 'Component INDEX_ERR (Franzen)',
 'Component Zooniverse ID (RGZ)',
 'Primary Component ID (RGZ)',
 'Source SWIRE Name (RGZ)',
 'Gaussian Click Fit Success (RGZ)',
 'Click Agreement (RGZ)']

In [91]:
import astropy.table

# Component (Norris)
comp_columns = []
for column in norris_components.columns:
    comp_columns.append(list(norris_components[column]))

# Source (Norris)
source_columns = []
for column in norris_cross_identifications.columns:
    _component_to_value = {}
    for row in norris_cross_identifications:
        components = row['Component'].split(',')
        for component in components:
            component = component.strip()
            _component_to_value[component] = row[column]
    
    column = []
    for component in comp_columns[0]:
        column.append(_component_to_value[component])
    source_columns.append(column)

# Add in the Fan matches.
fan_sources = [_fan_component_to_source.get(c, '') for c in comp_columns[0]]
fan_swires = [_fan_component_to_swire.get(c, '') for c in comp_columns[0]]
fan_columns = [fan_sources, fan_swires]

# Add in the Franzen matches.
franzen_columns = []
for column in franzen.columns:
    _component_to_value = {}
    for row in franzen:
        component = row['ID']
        _component_to_value[component] = row[column]
    
    column = []
    for norris_cid in norris_components['#']:
        if norris_cid not in _norris_to_franzen_cid:
            column.append(float('nan'))
        else:
            franzen_cid = _norris_to_franzen_cid[norris_cid]
            column.append(_component_to_value[franzen_cid])
    franzen_columns.append(column)

# Add in all the Franzen objects with no corresponding Norris.
_all_franzen_ids = set(franzen['ID'])
_included_franzen_ids = {i for i in franzen_columns[0] if isinstance(i, str)}
assert 700 < len(_included_franzen_ids) < 800  # Sanity check.
_missing_franzen_ids = _all_franzen_ids - _included_franzen_ids
# Pad existing columns to include new datapoints.
for column in itertools.chain(comp_columns, source_columns, fan_columns):
    column.extend([float('nan')] * len(_missing_franzen_ids))
# Fill in the Franzen columns.
for row in franzen:
    if row['ID'] in _missing_franzen_ids:
        for column, column_name in zip(franzen_columns, franzen.columns):
            column.append(row[column_name])

# Add in the RGZ data.
# Columns: Zooniverse ID, Primary Component ID, SWIRE Name, Gaussian Click Fit Success,
#          Click Agreement.
# RGZ is a proper subset of Franzen, so all objects in RGZ should already be in the table.
# For multi-component radio objects like CI0001C1..CI0001C5, there is only one RGZ object,
# and it is associated with the first component. Each component here should be associated
# with the same Zooniverse ID, Component ID, and SWIRE Name. The component ID will be the
# ID of the primary component.
_cid_to_zid = dict(zip(_rgz_sources, _rgz_zids))
zooniverse_ids = []
rgz_primary_components = []
rgz_swire_names = []
rgz_successes = []
rgz_percentages = []
import re
for cid in franzen_columns[0]:
    if not cid or not isinstance(cid, str):
        zooniverse_ids.append('')
        rgz_primary_components.append('')
        rgz_swire_names.append('')
        rgz_successes.append('')
        rgz_percentages.append('')
        continue
    multi_component = re.match(r'(CI\d+)C\d+', cid)
    if multi_component:
        primary = multi_component.group(1) + 'C1'
    else:
        primary = cid
    if primary not in _cid_to_zid:
        zooniverse_ids.append('')
        rgz_primary_components.append('')
        rgz_swire_names.append('')
        rgz_successes.append('')
        rgz_percentages.append('')
        continue
    rgz_primary_components.append(primary)
    zooniverse_ids.append(_cid_to_zid[primary])
    rgz_swire_names.append(_crowdastro_zid_to_swire.get(_cid_to_zid[primary], ''))
    rgz_successes.append(_crowdastro_zid_to_fit_success.get(_cid_to_zid[primary], ''))
    rgz_percentages.append(_crowdastro_zid_to_percentages.get(_cid_to_zid[primary], ''))
# Check that all RGZ objects are in the table.
assert all(z in zooniverse_ids for z in _rgz_zids)
assert all(c in zooniverse_ids for c in _crowdastro_zid_to_swire)
rgz_columns = [zooniverse_ids, rgz_primary_components, rgz_swire_names, rgz_successes, rgz_percentages]

# Key
keys = list(range(len(comp_columns[0])))

# Assemble the table data.
data = [keys] + comp_columns + source_columns + fan_columns + franzen_columns + rgz_columns

# Mask everything that is NaN or empty.
for i, column in enumerate(data):
    column = numpy.array(column)
    try:
        masked = numpy.ma.MaskedArray(data=column,
                                      mask=numpy.logical_or(numpy.isnan(column), column == -99.0))
    except TypeError:
        masked = numpy.ma.MaskedArray(data=column,
                                      mask=numpy.logical_or(column == '', column == 'nan'))
    data[i] = masked

# Assemble the whole table.
table = astropy.table.Table(data=data,
                            names=names)
table


/usr/local/lib/python3.6/site-packages/ipykernel/__main__.py:112: UserWarning: Warning: converting a masked element to nan.
Out[91]:
<Table masked=True length=3155>
KeyComponent # (Norris)Component Name (Norris)Component Radio RA (Norris)Component Radio dec (Norris)Component err(RA) (Norris)Component err(dec) (Norris)Component Peak Flux (Norris)Component Int flux (Norris)Component Bmaj (Norris)Component Bmin (Norris)Component Bpa (Norris)Component rms (Norris)Source # (Norris)Source Name (Norris)Source Component (Norris)Source SWIRE (Norris)Source Radio RA (Norris)Source Radio dec (Norris)Source 20cm (Norris)Source 3.6µm (Norris)Source 4.5µm (Norris)Source 5.8µm (Norris)Source 8.0µm (Norris)Source 24µm (Norris)Source U (Norris)Source G (Norris)Source R (Norris)Source I (Norris)Source Z (Norris)Source z(sp) (Norris)Source ID (Norris)Source Class (Norris)Source Basis (Norris)Source # (Fan)Source SWIRE Name (Fan)Component ID (Franzen)Component Name (Franzen)Component RA (Franzen)Component DEC (Franzen)Component RA_ERR (Franzen)Component DEC_ERR (Franzen)Component SNR (Franzen)Component RMS (Franzen)Component BWS (Franzen)Component Sp (Franzen)Component Sp_ERR (Franzen)Component S (Franzen)Component S_ERR (Franzen)Component DECONV (Franzen)Component DECONV_ERR (Franzen)Component V (Franzen)Component OBS_FREQ (Franzen)Component SINDEX (Franzen)Component INDEX_ERR (Franzen)Component Zooniverse ID (RGZ)Primary Component ID (RGZ)Source SWIRE Name (RGZ)Gaussian Click Fit Success (RGZ)Click Agreement (RGZ)
int64str4str26str11str11float64float64float64float64float64float64float64float64str4str26str19str32str11str12float64float64str32str32str32float64float64float64float64float64float64float64float64str32str32str5str26str8str27float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64str10str8str26str5str32
0C001ATCDFS_J032602.78-284709.03:26:02.785-28:47:09.00.780.730.71.388.32.660.879.3S001ATCDFS_J032602.78-284709.0C001SWIRE3_J032603.15-284708.53:26:02.785-28:47:09.061.433.821.1-1.0-1.0-1.0------------4.0----91SWIRE3_J032603.15-284708.5CI0412ATLAS3_J032602.82-284708.1C51.511734-28.7855750.1710.41429.00.0450.931.3950.0862.2110.1198.2340.5760.9951450.89----ARG0003rb2CI0412SWIRE3_J032602.36-284711.5True1.0
1C002ATCDFS_J032604.15-275659.33:26:04.152-27:56:59.30.550.90.711.9711.57.3-17.071.9S002ATCDFS_J032604.15-275659.3C002--3:26:04.152-27:56:59.392.0----------------------8.0--------CI0354ATLAS3_J032604.21-275659.6C51.517555-27.9498750.1410.34235.250.0360.931.350.0791.8260.0986.3930.720.9941449.61--------------
2C003ATCDFS_J032605.68-274734.43:26:05.685-27:47:34.40.10.1140.8174.75.95.685.7119.1S003ATCDFS_J032605.68-274734.4C003--3:26:05.685-27:47:34.4874.7----------------------8.0--------------------------------------------------------
3C004ATCDFS_J032606.95-275332.23:26:06.955-27:53:32.20.521.190.410.430.00.0-1.076.7S004ATCDFS_J032606.95-275332.2C004--3:26:06.955-27:53:32.260.4----------------------8.0--------------------------------------------------------
4C005ATCDFS_J032611.47-273243.83:26:11.475-27:32:43.80.10.169.65110.95.33.389.6156.3S005ATCDFS_J032611.47-273243.8C005--3:26:11.475-27:32:43.81110.9----------------------8.0--------CI0003ATLAS3_J032611.46-273243.0C51.547764-27.545290.0170.0353889.00.0260.95106.6445.438107.6355.3821.0383.9881.01488.15-0.4810.014----------
5C006ATCDFS_J032613.70-281717.73:26:13.701-28:17:17.70.570.790.480.540.00.00.077.7S006ATCDFS_J032613.70-281717.7C006(441298)3:26:13.701-28:17:17.710.56.4-1.0-1.0-1.0-1.0------------3.0--------------------------------------------------------
6C007ATCDFS_J032615.48-284629.23:26:15.489-28:46:29.20.340.360.450.710.00.00.066.0S007ATCDFS_J032615.48-284629.2C007SWIRE3_J032615.41-284630.73:26:15.489-28:46:29.240.7265.0136.9132.767.1-1.0------------2.0----153SWIRE3_J032615.41-284630.7CI0614ATLAS3_J032615.49-284629.4C51.564555-28.7748470.2670.64818.510.0360.940.7030.0521.2990.0749.9130.5460.9271465.03----ARG0003rfrCI0614SWIRE3_J032615.41-284630.7True1.0
7C008ATCDFS_J032615.55-280601.03:26:15.557-28:06:01.00.30.470.731.060.00.0-1.056.9S008ATCDFS_J032615.55-280601.0C008SWIRE3_J032615.52-280559.83:26:15.557-28:06:01.051.140.351.4-1.0-1.0-1.0------------2.0----156SWIRE3_J032615.52-280559.8CI0320ATLAS3_J032615.55-280559.8C51.564799-28.0999550.1210.29241.320.0260.941.1260.0640.00.00.00.00.9851472.47----ARG0003r8sCI0320SWIRE3_J032615.52-280559.8True1.0
8C009ATCDFS_J032616.35-280014.63:26:16.353-28:00:14.60.180.31.241.660.00.0-1.060.1S009ATCDFS_J032616.35-280014.6C009SWIRE3_J032616.31-280014.73:26:16.353-28:00:14.611.7143.673.9-1.056.7-1.0------------1.0----996SWIRE3_J032616.31-280014.7CI0240ATLAS3_J032616.36-280013.6C51.568155-28.0037870.0780.1964.330.0270.941.8230.0970.00.00.00.00.9981468.25-0.0470.186----------
9C010ATCDFS_J032616.41-271621.13:26:16.419-27:16:21.10.150.234.177.849.12.8-40.090.2S010ATCDFS_J032616.41-271621.1C010--3:26:16.419-27:16:21.107.8----------------------8.0--------------------------------------------------------
....................................................................................................................................................................................
3145----------------------------------------------------------------------CI2929ATLAS3_J032711.74-272743.0C51.798926-27.4619450.9832.3945.010.0140.990.0720.0150.00.00.00.00.2921516.36----ARG0003sxaCI2929------
3146----------------------------------------------------------------------CI2932ATLAS3_J032711.09-284100.8C51.796229-28.6835530.9832.3945.010.0210.980.1090.0220.00.00.00.00.5411491.79----ARG0003sxbCI2932------
3147----------------------------------------------------------------------CI2933ATLAS3_J032655.06-273633.9C51.729407-27.6094230.9792.3835.030.0150.980.0790.0160.00.00.00.00.3691507.75----ARG0003sxcCI2933------
3148----------------------------------------------------------------------CI2935ATLAS3_J033459.59-284302.6C53.748297-28.7174010.9862.3985.00.0160.980.0820.0170.00.00.00.00.3941504.4----ARG0003sxdCI2935SWIRE3_J033459.65-284301.3True1.0
3149----------------------------------------------------------------------CI2944ATLAS3_J032926.44-282423.6C52.36016-28.4065570.9762.3765.050.0150.970.0770.0160.00.00.00.00.3441486.02--------------
3150----------------------------------------------------------------------CI2947ATLAS3_J032713.94-273944.3C51.808064-27.6623110.9832.395.010.0140.980.0740.0150.00.00.00.00.3051501.18----ARG0003sxeCI2947------
3151----------------------------------------------------------------------CI2949ATLAS3_J032959.65-280220.3C52.498556-28.0389630.9832.3945.010.0130.970.0690.0140.00.00.00.00.2581484.95----ARG0003sxfCI2949------
3152----------------------------------------------------------------------CI2983ATLAS3_J033059.50-284952.0C52.747925-28.8311070.9832.3945.010.0230.950.1190.0240.00.00.00.00.5711469.73----ARG0003sxgCI2983------
3153----------------------------------------------------------------------CI2998ATLAS3_J032911.97-282138.3C52.29988-28.3606340.9862.3985.00.0150.960.0780.0160.00.00.00.00.3531487.55----ARG0003sxhCI2998------
3154----------------------------------------------------------------------CI3180ATLAS3_J033202.85-274656.2C53.011863-27.782280.9832.395.010.0210.940.1120.0230.00.00.00.00.5511417.17----ARG0003sxiCI3180------

In [92]:
table.write('one-table-to-rule-them-all.tbl', format='csv')


WARNING: AstropyDeprecationWarning: one-table-to-rule-them-all.tbl already exists. Automatically overwriting ASCII files is deprecated. Use the argument 'overwrite=True' in the future. [astropy.io.ascii.ui]

In [46]:
import matplotlib.pyplot as plt
%matplotlib inline
def decimalify(ras, decs):
    from astropy.coordinates import SkyCoord
    coords = []
    for ra, dec in zip(ras, decs):
        sc = SkyCoord(ra=ra, dec=dec, unit=('hourangle', 'deg'))
        coords.append((sc.ra.deg, sc.dec.deg))
    return zip(*coords)
not_in_franzen = table[[not bool(i) for i in table['Component ID (Franzen)']]]
ras = not_in_franzen['Component Radio RA (Norris)']
decs = not_in_franzen['Component Radio dec (Norris)']
plt.scatter(*decimalify(ras, decs), color='green', marker='+')
plt.show()



In [ ]: