Assignment 1

Hui Lyu

Downloading Gaia data

This notebook is a simple example of downloading Gaia data, loading it, and making basic plots of it.


In [1]:
!curl -OJ https://girder.hub.yt/api/v1/file/57fcf27bb8805f000164ab40/download
# Windows 10


curl: Saved to filename 'gaia_validp.h5'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

100 93.8M  100 93.8M    0     0  2674k      0  0:00:35  0:00:35 --:--:-- 3221k

In [2]:
%matplotlib inline

In [3]:
import h5py
import numpy as np
import matplotlib.pyplot as plt

In [4]:
# Read every dataset in the HDF5 file into memory as a NumPy array.
data = {}
with h5py.File("gaia_validp.h5", "r") as f:
    for k in f:
        data[k] = f[k][:]

In [5]:
print(data.keys())


dict_keys(['pmra', 'dec', 'ra', 'phot_g_mean_mag', 'pmdec', 'parallax'])

In [6]:
type(data)


Out[6]:
dict

In [7]:
data


Out[7]:
{'dec': array([ 0.23539165,  0.20006769,  0.24882544, ..., -0.34317732,
        -0.2281136 , -0.22130082]),
 'parallax': array([ 6.35295075,  3.90032894,  3.15531322, ...,  6.03693811,
         1.48414231,  2.68011134]),
 'phot_g_mean_mag': array([  7.99137783,  10.58095872,  10.74310238, ...,   9.23885216,
          9.01706935,   9.73257118]),
 'pmdec': array([ -7.64198999, -55.10917286,  -1.6028671 , ..., -27.85234475,
          1.84710791,   3.15173424]),
 'pmra': array([ 43.75231342,  10.036263  ,   2.93228368, ...,  15.71355591,
         11.35288892,   2.89787878]),
 'ra': array([  45.03433035,   45.16500677,   45.08615484, ...,  315.28287959,
         314.74064816,  314.9607306 ])}
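Printing the whole dictionary only shows the first and last few values of each array. A compact per-column summary may be easier to read. The sketch below is one possible way to do that; `summarize` is a hypothetical helper, not part of the original notebook, and `sample` is a small stand-in built from the first values shown in Out[7].

```python
import numpy as np

def summarize(columns):
    """Return one row per column: (name, size, dtype, min, max)."""
    rows = []
    for name in sorted(columns):
        values = np.asarray(columns[name])
        rows.append((name, values.size, str(values.dtype),
                     float(values.min()), float(values.max())))
    return rows

# Small stand-in for the notebook's `data` dict (first values from Out[7]).
sample = {
    "ra": np.array([45.03433035, 45.16500677, 45.08615484]),
    "dec": np.array([0.23539165, 0.20006769, 0.24882544]),
}

for name, size, dtype, low, high in summarize(sample):
    print(f"{name:>4}  n={size}  {dtype}  min={low:.4f}  max={high:.4f}")
```

In the notebook itself, `summarize(data)` would be called on the full dictionary instead of `sample`.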

Gaia Data Documentation

https://gaia.esac.esa.int/documentation/GDR1/datamodel/Ch1/gaia_source.html

  • dec : Declination (double, Angle[deg])

Barycentric declination δ of the source in ICRS at the reference epoch ref_epoch.

  • parallax : Parallax (double, Angle[mas] )

Absolute barycentric stellar parallax ϖ of the source at the reference epoch ref_epoch.

  • phot_g_mean_mag : G-band mean magnitude (double, Magnitude[mag])

Mean magnitude in the G band. This is computed from the G-band mean flux applying the magnitude zero-point in the Vega scale.

  • pmdec : Proper motion in declination direction (double, Angular Velocity[mas/year] )

Proper motion in declination μδ of the source at the reference epoch ref_epoch. This is the projection of the proper motion vector in the direction of increasing declination.

  • pmra : Proper motion in right ascension direction (double, Angular Velocity[mas/year] )

Proper motion in right ascension μα* of the source in ICRS at the reference epoch ref_epoch. This is the projection of the proper motion vector in the direction of increasing right ascension.

  • ra : Right ascension (double, Angle[deg])

Barycentric right ascension α of the source in ICRS at the reference epoch ref_epoch.
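The units listed above determine how the columns can be combined. As a rough sketch, a parallax in mas converts to a distance in parsecs through d = 1000 / parallax, the two proper motion components combine into a total proper motion, and the apparent G magnitude plus the parallax give an absolute magnitude. The function names below are hypothetical. The formulas assume positive parallaxes with small relative errors and ignore extinction, which may not hold for every source in the file.

```python
import numpy as np

def distance_pc(parallax_mas):
    """Distance in parsecs from a parallax in milliarcseconds."""
    return 1000.0 / np.asarray(parallax_mas, dtype=float)

def total_proper_motion(pmra, pmdec):
    """Magnitude of the proper motion vector in mas/year.

    pmra is already the projected component, so no extra
    cos(dec) factor is applied here.
    """
    return np.hypot(pmra, pmdec)

def absolute_g_mag(phot_g_mean_mag, parallax_mas):
    """Absolute magnitude M = m + 5*log10(parallax_mas) - 10 (no extinction)."""
    m = np.asarray(phot_g_mean_mag, dtype=float)
    p = np.asarray(parallax_mas, dtype=float)
    return m + 5.0 * np.log10(p) - 10.0

# First three values of each column, copied from Out[7].
parallax = np.array([6.35295075, 3.90032894, 3.15531322])
g_mag = np.array([7.99137783, 10.58095872, 10.74310238])
pmra = np.array([43.75231342, 10.036263, 2.93228368])
pmdec = np.array([-7.64198999, -55.10917286, -1.6028671])

print(distance_pc(parallax))
print(total_proper_motion(pmra, pmdec))
print(absolute_g_mag(g_mag, parallax))
```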


In [8]:
# Plot a histogram of dec to show its distribution.
# The counts per interval give a general picture of how the
# declination values are spread.
plt.hist(data['dec'])
plt.title('Histogram of dec')
plt.xlabel('dec (deg)')
plt.ylabel('count')
plt.grid(True)
plt.show()
# Most values lie between roughly -75 and 75 degrees.
# The intervals (-75, -18) and (18, 55) contain the most values.
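Statements about which declination intervals hold the most values can be checked numerically instead of being read off the plot. The sketch below uses a hypothetical helper `fraction_in_interval` and a small made-up array purely for illustration; in the notebook it would be applied to `data['dec']`. `np.histogram` with `bins=10` returns the same counts that `plt.hist` draws with its default settings.

```python
import numpy as np

def fraction_in_interval(values, low, high):
    """Fraction of values with low <= value < high."""
    values = np.asarray(values, dtype=float)
    inside = (values >= low) & (values < high)
    return float(inside.mean())

# Illustrative values only, not taken from the Gaia file.
demo_dec = np.array([-80.0, -60.0, -30.0, 0.0, 20.0, 40.0, 60.0, 80.0])

counts, edges = np.histogram(demo_dec, bins=10)
print(counts)
print(edges)
print(fraction_in_interval(demo_dec, -75, -18))
print(fraction_in_interval(demo_dec, 18, 55))
```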



In [9]:
# Because dec is distributed fairly evenly around its center, a boxplot is also informative.
# It shows the quartiles directly, and no outliers are drawn.
plt.boxplot(data['dec'])
plt.title('Boxplot of dec')
plt.show()
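By default, `plt.boxplot` draws as outliers the points that lie more than 1.5 times the interquartile range below the first quartile or above the third quartile. The sketch below applies the same rule directly, so the "no outliers" observation can be confirmed with a count. `count_boxplot_outliers` is a hypothetical helper and the input arrays are made up for illustration.

```python
import numpy as np

def count_boxplot_outliers(values, whis=1.5):
    """Count points outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low = q1 - whis * iqr
    high = q3 + whis * iqr
    return int(np.sum((values < low) | (values > high)))

# Illustrative inputs only.
print(count_boxplot_outliers(np.arange(10)))
print(count_boxplot_outliers([1, 2, 3, 4, 5, 100]))
```

In the notebook, `count_boxplot_outliers(data['dec'])` would be expected to return 0 if the boxplot indeed shows no outliers.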



In [10]:
# Plot of parallax
# Plotting parallax directly against the array index shows every value without binning.
# Showing the individual values may reveal patterns such as a cycle.
plt.plot(data['parallax'], "g")
plt.title('parallax')
plt.xlabel('array index')
plt.ylabel('parallax (mas)')
plt.grid(True)
plt.show()
# The values appear to follow a rough cycle along the index.
# Looking more closely at the largest values could help explain this.
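With about two million points, a line plot of the raw values is dense, and any trend along the index can be hard to judge by eye. One possible way to look for a slow pattern is to smooth the series with a moving average before plotting. The sketch below is only an illustration: `rolling_mean` is a hypothetical helper, and the window size would need to be chosen by experiment.

```python
import numpy as np

def rolling_mean(values, window):
    """Mean over a sliding window; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Tiny illustrative input.
smoothed = rolling_mean([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(smoothed)
```

In the notebook, something like `plt.plot(rolling_mean(data['parallax'], 10000))` could then be compared with the raw plot.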



In [11]:
# Plot a histogram of phot_g_mean_mag to show its distribution.
# This gives an overview of the data;
# I do not need to see each individual value here.
plt.hist(data['phot_g_mean_mag'], bins=30)
plt.title('Histogram of phot_g_mean_mag')
plt.xlabel('phot_g_mean_mag (mag)')
plt.grid(True)
plt.show()
# Most values are between about 8 and 13.
# The histogram also shows a left-skewed distribution.
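Skewness can also be measured rather than judged from the histogram. The sketch below computes the standard moment-based sample skewness, where a negative value indicates a longer tail on the left. `sample_skewness` is a hypothetical helper and the arrays are made up for illustration; in the notebook it would be applied to `data['phot_g_mean_mag']`.

```python
import numpy as np

def sample_skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return float(np.mean(centered ** 3) / np.std(values) ** 3)

# Illustrative inputs only.
print(sample_skewness([1.0, 8.0, 9.0, 10.0]))   # long tail on the left
print(sample_skewness([1.0, 2.0, 3.0, 10.0]))   # long tail on the right
```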



In [12]:
# Hexbin plot of dec vs ra
# I mainly want to try out the hexbin plot.
# I tried several pairs of columns from the data
# and found that ra and dec produce a visually appealing graph.
# I need more background knowledge to interpret it.
plt.hexbin(data['ra'], data['dec'], cmap='plasma')
plt.xlabel('ra (deg)')
plt.ylabel('dec (deg)')
plt.title("Hexbin plot of dec vs ra")
plt.show()
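A hexbin plot is a two-dimensional histogram: each cell is colored by the number of points that fall inside it. The same idea with rectangular cells can be computed with `np.histogram2d`, which makes the counts available as numbers. The sketch below assumes 10-degree cells and uses a hypothetical helper `sky_counts` on a few values from Out[7]. Cells of equal angular size do not cover equal areas on the sky, so the counts near the poles are not directly comparable to those near the equator.

```python
import numpy as np

def sky_counts(ra_deg, dec_deg, ra_bins=36, dec_bins=18):
    """Count sources in a regular ra/dec grid covering the whole sky."""
    counts, ra_edges, dec_edges = np.histogram2d(
        ra_deg, dec_deg,
        bins=(ra_bins, dec_bins),
        range=[[0.0, 360.0], [-90.0, 90.0]],
    )
    return counts, ra_edges, dec_edges

# A few values copied from Out[7], for illustration.
ra = np.array([45.03433035, 45.16500677, 315.28287959])
dec = np.array([0.23539165, 0.20006769, -0.34317732])

counts, ra_edges, dec_edges = sky_counts(ra, dec)
print(counts.shape)
print(counts.sum())
```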



In [13]:
data['pmra'].size


Out[13]:
2057050

In [14]:
data['ra'].size


Out[14]:
2057050

In [15]:
# Scatter plot of pmra against ra to show their relationship.
# I would like to see how the proper motion varies with position.
plt.scatter(data['ra'], data['pmra'], alpha=0.5, c=np.arange(data['ra'].size), edgecolors='none')
# The points are colored by their index in the data array.
# I suppose the index has some meaning, but that may not be true.
# Setting edgecolors to 'none' makes the individual points easier to see.
plt.colorbar(label='array index')
plt.xlabel('ra (deg)')
plt.ylabel('pmra (mas/year)')
plt.title('pmra vs ra')
plt.show()
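A scatter plot of all 2057050 points is slow to draw, and many points overlap. One common workaround is to plot a random subset. The sketch below picks indices without replacement using NumPy's random generator; `random_subset`, the subset size of 10000, and the fixed seed are assumptions chosen for illustration.

```python
import numpy as np

def random_subset(n_total, n_keep, seed=0):
    """Return n_keep distinct random indices in the range [0, n_total)."""
    rng = np.random.default_rng(seed)
    n_keep = min(n_keep, n_total)
    return rng.choice(n_total, size=n_keep, replace=False)

# 2057050 is the array size reported in Out[13] and Out[14].
idx = random_subset(n_total=2057050, n_keep=10000)
print(idx.size)

# In the notebook, the same indices would select matching rows from each column:
# plt.scatter(data['ra'][idx], data['pmra'][idx], s=1, alpha=0.5, edgecolors='none')
```

Using one index array for both columns keeps each ra value paired with its own pmra value.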


