gully
May 2, 2018
In [1]:
# %load /Users/obsidian/Desktop/defaults.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
In [2]:
! du -hs ../data/dr2/Gaia/gdr2/vari_rotation_modulation/csv
In [3]:
df0 = pd.read_csv('../data/dr2/Gaia/gdr2/vari_rotation_modulation/csv/VariRotationModulation_0.csv.gz')
In [4]:
df0.shape
Out[4]:
The catalog has many columns. What are they?
In [5]:
df0.columns
Out[5]:
Gaia Documentation section 14.3.6 explains that some of the columns are populated with arrays! So this catalog can be thought of as a table-of-tables. The typical lengths of these arrays are small, usually just 3-5 entries.
In [6]:
df0.num_segments.describe()
Out[6]:
In [7]:
df0.loc[1]
Out[7]:
I think the segments further consist of lightcurves, for which merely the summary statistics are listed here, but I'm not sure.
Since all the files together total only about 150 MB, we can just read them all in and concatenate them.
In [8]:
import glob
In [9]:
fns = glob.glob('../data/dr2/Gaia/gdr2/vari_rotation_modulation/csv/VariRotationModulation_*.csv.gz')
In [10]:
n_files = len(fns)
This step only takes a few seconds. Let's use a progress bar to keep track.
In [11]:
from astropy.utils.console import ProgressBar
In [12]:
df_rotmod = pd.DataFrame()
with ProgressBar(n_files, ipython_widget=True) as bar:
    for i, fn in enumerate(fns):
        df_i = pd.read_csv(fn)
        df_rotmod = df_rotmod.append(df_i, ignore_index=True)
        bar.update()
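Note for readers on newer pandas: `DataFrame.append` was removed in pandas 2.0. An equivalent pattern, collecting the frames in a list and concatenating once, looks like this (a sketch using in-memory stand-ins for the gzipped catalog files):

```python
import io
import pandas as pd

# Toy stand-ins for the VariRotationModulation_*.csv.gz files;
# in the real notebook, fns comes from glob.glob().
fns = [io.StringIO("a,b\n1,2\n"), io.StringIO("a,b\n3,4\n")]

# Collect frames in a list and concatenate once -- faster than
# repeatedly appending, and compatible with pandas >= 2.0.
frames = [pd.read_csv(fn) for fn in fns]
df_rotmod_demo = pd.concat(frames, ignore_index=True)
```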
In [13]:
df_rotmod.shape
Out[13]:
We have 147,535 rotationally modulated variable stars. What is the typical number of segments across the entire catalog?
In [14]:
df_rotmod.num_segments.hist(bins=11)
plt.yscale('log')
plt.xlabel('$N_{\mathrm{segments}}$')
plt.ylabel('occurrence');
What are these segments? Are they the arbitrary Gaia segments, or are they something else?
Let's ask our first question: What is the distribution of periods?
best_rotation_period : Best rotation period (double, Time[day])
This field is an estimate of the stellar rotation period and is obtained by averaging the periods obtained in the different segments.
In [15]:
df_rotmod.best_rotation_period.hist(bins=30)
plt.yscale('log')
plt.xlabel('$P_{\mathrm{rot}}$ [days]')
plt.ylabel('$N$')
Out[15]:
Next up: What is the distribution of amplitudes?
We will use the segments_activity_index :

segments_activity_index : Activity Index in segment (double, Magnitude[mag])
This array stores the activity indexes measured in the different segments. In a given segment the amplitude of variability A is taken as an index of the magnetic activity level. The amplitude of variability is measured by means of the equation:
$$A = mag_{95} - mag_{5}$$
where $mag_{95}$ and $mag_{5}$ are the 95-th and the 5-th percentiles of the G-band magnitude values.
In [16]:
df_rotmod.max_activity_index.hist(bins=30)
plt.yscale('log')
plt.xlabel('$95^{th} - 5^{th}$ variability percentile [mag]')
plt.ylabel('$N$');
Wow, $>0.4$ magnitudes is a lot! Most have much lower amplitudes.
The problem with the max activity index is that it may be sensitive to flares. Instead, let's use the $A$ and $B$ coefficients of the $\sin{}$ and $\cos{}$ terms:

segments_cos_term : Coefficient of cosine term of linear fit in segment (double, Magnitude[mag])
If a significant period $T_0$ is detected in a time-series segment, then the points of the time-series segment are fitted with the function
$$mag(t) = mag_0 + A\cos\left(\frac{2\pi t}{T_0}\right) + B \sin\left(\frac{2\pi t}{T_0}\right)$$
Let's call the total amplitude $\alpha$; then we can apply:
$$\alpha = \sqrt{A^2+B^2}$$
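As a quick sanity check of the quadrature sum, with made-up coefficients forming a 3-4-5 triangle in magnitudes:

```python
import numpy as np

# Made-up sinusoid coefficients in magnitudes (not from the catalog).
A, B = 0.03, 0.04

# np.hypot(A, B) is a numerically stable sqrt(A**2 + B**2).
alpha = np.hypot(A, B)
```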
In [17]:
val = df_rotmod.segments_cos_term[0]
In [18]:
val
Out[18]:
Gasp! The arrays are actually stored as strings! We need to first convert them to numpy arrays.
In [19]:
np.array(eval(val))
Out[19]:
In [20]:
NaN = np.NaN  # needed for the NaN tokens in the strings
clean_strings = lambda str_in: np.array(eval(str_in))
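A note of caution: `eval` will execute arbitrary code, so a safer alternative is `json.loads`, which accepts the bare `NaN` token by default. This is a sketch, assuming the stored strings look like bracketed lists with NaN entries (the string below is a hypothetical value, not an actual catalog entry):

```python
import json
import numpy as np

# Example string in the shape the array columns appear to use.
raw = '[0.0123, NaN, 0.0456]'

# Python's json module parses the bare NaN token out of the box,
# avoiding the code-execution risk of eval.
arr = np.array(json.loads(raw))
```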
Only run this once:
In [21]:
if type(df_rotmod['segments_cos_term'][0]) == str:
    df_rotmod['segments_cos_term'] = df_rotmod['segments_cos_term'].apply(clean_strings)
    df_rotmod['segments_sin_term'] = df_rotmod['segments_sin_term'].apply(clean_strings)
else:
    print('Skipping rewrite.')
In [22]:
amplitude_vector = (df_rotmod.segments_sin_term**2 + df_rotmod.segments_cos_term**2)**0.5
In [23]:
df_rotmod['mean_amplitude'] = amplitude_vector.apply(np.nanmean)
Let's compare the max_activity_index
with the newly determined mean amplitude.
The $95^{th}$-to-$5^{th}$ percentile range should be almost-but-not-quite twice the sinusoid amplitude:
In [24]:
amp_conv_factor = 1.97537
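Where does 1.97537 come from? For a pure sinusoid sampled uniformly in phase, the $q$th quantile of the magnitudes is $\alpha\sin(\pi(q - 1/2))$, so the 95th-minus-5th percentile spread is $2\sin(0.45\pi)\,\alpha \approx 1.97538\,\alpha$. A quick numerical check:

```python
import numpy as np

# Dense, uniform phase sampling of a unit-amplitude sinusoid.
t = np.linspace(0, 2 * np.pi, 1_000_001)
y = np.sin(t)

# 95th minus 5th percentile of the samples; analytically this
# approaches 2 * sin(0.45 * pi) ~= 1.97538 for unit amplitude.
spread = np.percentile(y, 95) - np.percentile(y, 5)
```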
In [25]:
x_dashed = np.linspace(0,1, 10)
y_dashed = amp_conv_factor * x_dashed
plt.figure(figsize=(5,5))
plt.plot(df_rotmod.mean_amplitude, df_rotmod.max_activity_index, '.', alpha=0.05)
plt.plot(x_dashed, y_dashed, 'k--')
plt.xlim(0,0.5)
plt.ylim(0,1);
plt.xlabel(r'Mean cyclic amplitude, $\alpha$ [mag]')
plt.ylabel(r'$95^{th} - 5^{th}$ variability percentile [mag]');
The points track the dashed line decently well. There's some scatter, probably due in part to non-sinusoidal behavior.
Let's convert the mean magnitude amplitude to an unspotted-to-spotted flux ratio:
In [26]:
df_rotmod['amplitude_linear'] = 10**(-df_rotmod.mean_amplitude/2.5)
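For intuition, a made-up amplitude of 0.1 mag corresponds to roughly a 9% flux decrement:

```python
# Pogson's relation: a magnitude change dm maps to a flux ratio
# of 10**(-dm / 2.5). For dm = 0.1 mag this is about 0.912.
mean_amplitude = 0.1  # hypothetical value, in magnitudes
flux_ratio = 10 ** (-mean_amplitude / 2.5)
```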
In [27]:
df_rotmod['amplitude_linear'].hist(bins=500)
plt.xlim(0.9, 1.01)
Out[27]:
In [28]:
plt.figure(figsize=(5,5))
plt.plot(df_rotmod.best_rotation_period, df_rotmod.amplitude_linear, '.', alpha=0.01)
plt.xlim(0, 60)
plt.ylim(0.9, 1.04)
plt.xlabel('$P_{\mathrm{rot}}$ [days]')
plt.text(1, 0.92, ' Rapidly rotating\n spot dominated')
plt.text(36, 1.02, ' Slowly rotating\n facular dominated')
plt.ylabel('Flux decrement $(f_{\mathrm{spot, min}})$ ');
Promising!
Let's read in the Kepler data and cross-match! This cross-match with Gaia and K2 data comes from Meg Bedell.
In [29]:
from astropy.table import Table
k2_fun = Table.read('../../K2-metadata/metadata/k2_dr2_1arcsec.fits', format='fits')
In [30]:
len(k2_fun), len(k2_fun.columns)
Out[30]:
We only want a few of the 95 columns, so let's select a subset.
In [31]:
col_subset = ['source_id', 'epic_number', 'tm_name', 'k2_campaign_str']
In [32]:
k2_df = k2_fun[col_subset].to_pandas()
The to_pandas()
method returns byte strings. Arg! We'll have to clean them. Here is a reusable piece of code:
In [33]:
def clean_to_pandas(df):
    '''Cleans a dataframe converted with the to_pandas method'''
    for col in df.columns:
        if type(df[col][0]) == bytes:
            df[col] = df[col].str.decode('utf-8')
    return df
In [34]:
k2_df = clean_to_pandas(k2_df)
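A minimal self-contained check of the decoding step, using a made-up byte-string column of the kind to_pandas() produces for FITS string columns:

```python
import pandas as pd

# Hypothetical frame mimicking the to_pandas() output: one byte-string
# column and one numeric column.
demo = pd.DataFrame({'tm_name': [b'2MASS J1234', b'2MASS J5678'],
                     'val': [1, 2]})

# Decode only the byte-string column, leaving numeric columns alone.
for col in demo.columns:
    if type(demo[col][0]) == bytes:
        demo[col] = demo[col].str.decode('utf-8')
```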
In [35]:
df_rotmod.columns
Out[35]:
In [36]:
keep_cols = ['source_id', 'num_segments', 'best_rotation_period', 'amplitude_linear']
We can merge (i.e., an SQL-style join) these two dataframes on the source_id
key.
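As a toy illustration of the inner join on a shared key (made-up IDs, not real catalog values):

```python
import pandas as pd

# Hypothetical miniature versions of the two catalogs.
left = pd.DataFrame({'source_id': [1, 2, 3],
                     'epic_number': [201, 202, 203]})
right = pd.DataFrame({'source_id': [2, 3, 4],
                      'best_rotation_period': [5.1, 12.3, 2.2]})

# how='inner' keeps only the source_ids present in both frames.
merged = pd.merge(left, right, how='inner', on='source_id')
```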
In [37]:
k2_df.head()
Out[37]:
In [38]:
df_rotmod[keep_cols].head()
Out[38]:
We'll only keep rows (sources) that appear in both catalogs.
In [39]:
df_comb = pd.merge(k2_df, df_rotmod[keep_cols], how='inner', on='source_id')
In [40]:
df_comb.head()
Out[40]:
In [41]:
df_comb.shape
Out[41]:
Only 524 sources appear in both catalogs! Boo! Well, better than nothing!
It's actually even fewer K2 targets, since some targets are single in K2 but have two or more matches in Gaia. These could be background stars or bona-fide binaries. Let's flag them.
In [42]:
multiplicity_count = (df_comb.groupby('epic_number')
                      .source_id.count().to_frame()
                      .rename(columns={'source_id': 'multiplicity'}))
In [43]:
df = pd.merge(df_comb, multiplicity_count, left_on='epic_number', right_index=True)
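An equivalent one-step way to attach the multiplicity column is groupby().transform, which broadcasts the per-group count back onto each row without a separate merge (toy data):

```python
import pandas as pd

# Two Gaia sources matched to the same EPIC target, plus one single match.
toy = pd.DataFrame({'epic_number': [10, 10, 20],
                    'source_id': [1, 2, 3]})

# transform('count') returns a Series aligned to the original rows.
toy['multiplicity'] = toy.groupby('epic_number').source_id.transform('count')
```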
In [44]:
df.head(20)
Out[44]:
Let's cull the list and keep just the "single" stars, which are really the sources for which Gaia did not identify more than one target within 1 arcsecond.
In [45]:
df_single = df[df.multiplicity == 1]
In [46]:
df_single.shape
Out[46]:
A mere 224 sources! Boo hoo!
In [47]:
plt.figure(figsize=(5,5))
plt.plot(df_single.best_rotation_period, df_single.amplitude_linear, '.', alpha=0.1)
plt.xlim(0, 60)
plt.ylim(0.9, 1.04)
plt.xlabel('$P_{\mathrm{rot}}$ [days]')
plt.text(1, 0.92, ' Rapidly rotating\n spot dominated')
plt.text(36, 1.02, ' Slowly rotating\n facular dominated')
plt.ylabel('Flux decrement $(f_{\mathrm{spot, min}})$ ')
plt.title('K2 x Gaia x rotational modulation');
The points look like they were drawn from the same parent population.
In [48]:
df_single.sort_values('amplitude_linear', ascending=True).head(25).style.format(
    {'source_id': "{:.0f}", 'epic_number': "{:.0f}"})
Out[48]:
In [50]:
df_single.to_csv('../data/analysis/k2_gaia_rotmod_single.csv', index=False)
Let's see if we can examine some of these Gaia lightcurves and compare them to K2 lightcurves. We'll do that in the next notebook.