In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = "retina"
import pandas as pd
sns.set_context("talk")
In [2]:
names = ['byte_range', 'data_type', 'col_ID', 'desc']
# The column-description file gives each field's byte range, data type, ID and description
fwf_cols = pd.read_fwf('../data/synthetic/gum_mw_columns.tsv', names=names)
In [3]:
fwf_cols.head()
In [4]:
col_names = fwf_cols.col_ID.values
I had to modify the raw data to get it to read in conveniently. I try not to modify raw data files (for reproducibility), but there didn't seem to be a convenient way around it. The problem is that the Vtype column is blank for most of the file, so when the fixed-width reader infers the column boundaries it sees no column there, which throws off the last few columns. I simply filled in the first three entries with "ajun" to make it look like there was something there. I don't need the variability columns anyway, so I will just drop them.
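A possible alternative that leaves the raw file untouched: `pd.read_fwf` accepts explicit `colspecs`, so the byte ranges from the column-description file could be used directly instead of letting pandas infer the boundaries. This is a sketch that assumes each `byte_range` entry looks like `"1- 10"` (1-based, inclusive) or a single byte like `"42"`; I haven't checked that against the real description file, so the parser may need adjusting. `byte_range_to_colspec` is a hypothetical helper, and the toy data below is not the real GUMS layout.

```python
from io import StringIO

import pandas as pd


def byte_range_to_colspec(byte_range):
    """Convert a 1-based inclusive byte range like '1- 10' into the
    0-based half-open (start, end) tuple that pd.read_fwf expects.

    Assumes the format '<start>-<end>' (spaces allowed) or a single byte '<n>'.
    """
    parts = str(byte_range).split("-")
    start = int(parts[0])
    end = int(parts[-1])  # a single byte gives start == end
    return (start - 1, end)


# Toy stand-in for the column-description file (hypothetical layout).
byte_ranges = ["1- 3", "4- 8", "9"]
col_ids = ["ID", "Teff", "Vtype"]
colspecs = [byte_range_to_colspec(b) for b in byte_ranges]

# The last field is blank on the first row, which is the case that confuses
# boundary inference; with explicit colspecs it simply becomes NaN.
raw = StringIO(
    "  1 5000 \n"
    "  2 3200A\n"
)
df = pd.read_fwf(raw, colspecs=colspecs, names=col_ids)
print(df)
```

With the real files this would become something like `pd.read_fwf('../data/synthetic/gum_mw.sam', colspecs=[byte_range_to_colspec(b) for b in fwf_cols.byte_range], names=col_names)`, again assuming the byte-range format above.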
In [5]:
gum_mw_alt = pd.read_fwf('../data/synthetic/gum_mw.sam', names=col_names)
In [6]:
# Drop the variability columns (amplitude, period, phase, type); Vtype is blank for most stars
gum_mw_alt.drop(["Vamp", "Vper", "Vphase", "Vtype"], inplace=True, axis=1)
In [7]:
gum_mw_alt.head()
In [8]:
col_names
In [9]:
gum = gum_mw_alt
In [10]:
plt.figure(figsize=[5, 8])
#plt.plot(gum['V-I'], gum.Mbol, '.')
plt.plot(gum.Teff, gum.Mbol, '.')
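plt.xlim(10000, 2000)  # reversed so hotter stars are on the left, as in an HR diagram
plt.ylim(20, -5)       # reversed so brighter (more negative) magnitudes are at the top
plt.xlabel(r"$T_{\mathrm{eff}}$ (K)")
plt.ylabel(r"$M_{\mathrm{bol}}$");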
These are the input stars: bolometric absolute magnitude versus effective temperature, i.e. an HR diagram.
In [11]:
plt.figure(figsize=[8, 8])
#plt.plot(gum['V-I'], gum.Mbol, '.')
sc = plt.scatter(gum.r/1000.0, gum.Gmag, c=gum.Teff, s=20, marker='o', vmin=2000, vmax=10000, cmap="Spectral")
plt.xlabel("$d$ (kpc)")
plt.ylabel("$G$")
plt.hlines(12, 0, 10, colors='b', linestyles='--')
plt.colorbar(sc)
plt.ylim(20, 5)
plt.xlim(0, 10)
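This is sort of a Malmquist bias plot: in a magnitude-limited sample the intrinsically faint stars drop out first as distance increases, so at a given distance you only detect the stars that are bright enough.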
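To make the selection effect concrete: ignoring extinction, a star of absolute magnitude $M$ at distance $d$ has apparent magnitude $m = M + 5\log_{10}(d / 10\,\mathrm{pc})$. Inverting that for a magnitude limit gives the faintest absolute magnitude still visible at each distance. This sketch assumes a limit of $G = 20$ and no extinction (both simplifications), and `faintest_detectable_abs_mag` is a hypothetical helper.

```python
import numpy as np


def faintest_detectable_abs_mag(d_pc, mag_limit=20.0):
    """Faintest absolute magnitude visible at distance d_pc (parsecs)
    for a given apparent-magnitude limit, ignoring extinction."""
    distance_modulus = 5.0 * np.log10(np.asarray(d_pc, dtype=float) / 10.0)
    return mag_limit - distance_modulus


d = np.array([100.0, 1000.0, 10000.0])  # 0.1, 1 and 10 kpc
print(faintest_detectable_abs_mag(d))   # faintest absolute magnitudes: 15, 10, 5
```

Every factor of 10 in distance costs 5 magnitudes, so at 10 kpc only stars with absolute magnitude brighter than about 5 survive a $G = 20$ cut.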
In [12]:
plt.figure(figsize=[8, 8])
plt.plot(gum.pmRA, gum.pmDE, '.', alpha=0.2)
plt.xlabel(r"$\mu_{\mathrm{RA}}$ (mas/yr)")
plt.ylabel(r"$\mu_{\mathrm{Dec}}$ (mas/yr)")
This is a typical proper motion scatter plot.
In [13]:
plt.figure(figsize=[8, 8])
# Total proper motion; assumes pmRA already includes the cos(Dec) factor
pm = np.sqrt(gum.pmRA**2 + gum.pmDE**2)
plt.plot(gum.r/1000.0, pm, '.', alpha=0.5)
plt.xlabel("$d$ (kpc)")
plt.ylabel(r"$\mu$ (mas/yr)")
Large proper-motion sources are generally closer.
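The reason is geometric: for a fixed tangential velocity $v_t$, proper motion falls off as $1/d$, via $v_t = 4.74\,\mu\,d$ with $v_t$ in km/s, $\mu$ in mas/yr and $d$ in kpc (4.74 km/s is one au per year). A quick check, assuming a tangential velocity of 50 km/s purely for illustration; `proper_motion_mas_yr` is a hypothetical helper.

```python
import numpy as np

K = 4.74047  # km/s per (au/yr); converts (mas/yr) * kpc into km/s


def proper_motion_mas_yr(v_t_km_s, d_kpc):
    """Proper motion (mas/yr) for tangential velocity v_t (km/s) at distance d (kpc)."""
    return v_t_km_s / (K * np.asarray(d_kpc, dtype=float))


d = np.array([0.1, 1.0, 10.0])
print(proper_motion_mas_yr(50.0, d))  # roughly 105, 10.5 and 1.05 mas/yr
```

So the same star moves about 100 times more slowly across the sky at 10 kpc than at 100 pc, which is the trend in the plot.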
Overall, I am dissatisfied with the number of sources: with only 1000 there is barely enough to do anything with. I sort of wish it were a mock catalog: real stars with unknown distances, but with distances assigned from some Galactic model and noised up according to Gaia's limitations. Then we could do some cross-matching, etc. As it stands, all I can do is make irrelevant plots.
So an interesting thing to do would be to make dummy clusters, sprinkle them into the data, and see whether I can detect them by clustering. That would let me hone an algorithm to then apply to real data. But the truth is that this is way more complicated: the Galactic potential, extinction, binaries, and variability all start to matter. So modeling this the right way is more than a simple afternoon hack.
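A toy version of that injection-and-recovery idea, ignoring every complication just listed: scatter hypothetical field stars in proper-motion space, inject a tight clump of hypothetical cluster members, and check whether a simple binned overdensity search finds them. The field and cluster parameters are invented for illustration, and a 2D histogram is a deliberately crude stand-in for a real clustering algorithm (something like DBSCAN on proper motion plus parallax would be a natural next step).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical field: 1000 stars with a broad proper-motion distribution (mas/yr).
field_pm = rng.normal(0.0, 10.0, size=(1000, 2))  # columns: (pmRA, pmDE)

# Hypothetical injected cluster: 50 stars sharing one bulk proper motion.
cluster_center = np.array([5.5, -2.5])
cluster_pm = cluster_center + rng.normal(0.0, 0.2, size=(50, 2))

pm_all = np.vstack([field_pm, cluster_pm])

# Crude detection: bin proper-motion space in 1 mas/yr cells and find the most crowded one.
edges = np.arange(-30.0, 31.0, 1.0)
counts, xedges, yedges = np.histogram2d(pm_all[:, 0], pm_all[:, 1], bins=[edges, edges])
i, j = np.unravel_index(np.argmax(counts), counts.shape)
peak = np.array([0.5 * (xedges[i] + xedges[i + 1]),
                 0.5 * (yedges[j] + yedges[j + 1])])

# Compare the peak to the typical (median) count among occupied cells.
background = np.median(counts[counts > 0])
print("peak at", peak, "with", int(counts[i, j]), "stars; background ~", background)
```

With the real catalog, `field_pm` would come from `gum[["pmRA", "pmDE"]].values`; the cell size is a tuning knob that trades sensitivity to tight clusters against noise from the field.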