Outline

GC per Galaxy

Harris Data Inspection

  • clean data
    • add iMType
    • exclude 0: Milky Way Galaxy & 356: A1689-BCG with NaN VMag
    • remove duplicate NGC4417(228=NaN), select better result for VCC-1386 (273=NaN)
  • missing data
    • VMag -> KMag: linear Regression for low Magnitude galaxies
* VMag == null: 1 MWG
* KMag == null: 77
* -Ngc == 0: 6
* sigma == null: 148
* Reff == null: 79
* -logMd == null: 166
* -logMGC == null: 1
* logMB == null: 357
  • explore data
    • MType correlation: sigma, KMag, Reff...
    • Ngc correlation: KMag, VMag, Reff...
    • pairplot for different iMType: E & S0 have bimode
      • Harris fig 10 & MType not agree; classification statistics not agree
    • skymap

GWGC Data Inspection

  • clean data
    • duplicate are not affecting
  • missing VMag
    • cross check Galaxy with Harris
    • kde fitting with B-Mag or I-Mag: b_max=-19.6276150479/I_max=-21.0082876063 while v_max=-20.5158289467

Ngc estimation model

  • Model Selection
    • Ngc vs. logMd: jointplot with regression, well fit; however, errorbar very wild
    • Sn vs. VMag: also dirty, but doable
  • LOWESS model
    • prediction plot of Sn vs VMag, Sn is cut to 0 for VMag notin GCpG.VMag.range (-11.17, -24.19)
    • total amount of GC: 649111
    • skymap of Galaxy with GC
    • Ngc vs VMag regression plot with distribution

55 Globular Cluster Data

  • data explore
    • age vs. Z, HBtype, ...
  • GC age estimator
    • random number generator based on kde
    • kde plot for estimation and data

Build GC library

  • relational database?

    • VGG DataFrame
    • GC@G DataFrame
    • BHB@GC DataFrame
  • bhlib: load all BH info

  • GClib: manipulate from VGG.Ngc & assign random Model

  • BHBdata: draw BHB from BHlib, combine host GC & Galaxy info. from GClib

host galaxy info GC info BH info
1-5 6-7 8-14
GC, RA, DEC, Dist, VMag Model, T_GC(GC birth time=13Gyr-Age), [Fe/H, f_b, ...] T_eject, Type1, Type2, M1, M2, Seperation,Ecc, (BHlib.Model=GClib.Model)
1       2   3    4     5     6      7    8        9      10     11     12     13  14
GALAXY, RA, DEC, DIST, VMag, Model, Age, T_eject, Type1, Type2, Mass1, Mass2, AP, ECC
25 0.00792 17.22027 16.222 -16.7982138988 207-5 11.1515151515 0.015 14.0 14.0 22.743 7.4624 7.3712 0.1834
25 0.00792 17.22027 16.222 -16.7982138988 207-5 11.1515151515 0.015 14.0 14.0 23.152 8.137 59.156 0.2299
25 0.00792 17.22027 16.222 -16.7982138988 207-5 11.1515151515 0.0301 14.0 14.0 22.379 6.9613 124.3 0.9227
25 0.00792 17.22027 16.222 -16.7982138988 207-5 11.1515151515 0.0301 14.0 14.0 10.707 7.7546 10.758 0.0
25 0.00792 17.22027 16.222 -16.7982138988 207-5 11.1515151515 0.0301 14.0 14.0 22.894 7.7425 438.42 0.0
25 0.00792 17.22027 16.222 -16.7982138988 207-5 11.1515151515 0.0301 14.0 14.0 22.963 7.8389 113.13 0.0
25 0.00792 17.22027 16.222 -16.7982138988 207-5 11.1515151515 0.0301 14.0 14.0 23.403 8.6012 48.091 0.6402
25 0.00792 17.22027 16.222 -16.7982138988 207-5 11.1515151515 0.0301 14.0 14.0 23.582 8.8515 26.886 0.5168
25 0.00792 17.22027 16.222 -16.7982138988 207-5 11.1515151515 0.0301 14.0 14.0 23.77 9.2126 91.431 0.7603
25 0.00792 17.22027 16.222 -16.7982138988 207-5 11.1515151515 0.0301 14.0 14.0 25.773 11.163 141.72 0.2939

GC model setting

Harris MW GC property

To improve

To explore


In [1]:
%pylab inline
import numpy as np
from datetime import datetime
import random
import pandas as pd
import os


Populating the interactive namespace from numpy and matplotlib

Harris Catalog: Sn vs VMag with LOWESS model


In [2]:
from scipy.interpolate import interp1d
import statsmodels.api as sm

## load Harris catalog
GCpG=pd.read_csv("/Users/domi/Dropbox/Research/Local_universe/data/GCpG.csv") # read csv version data
# GCpG[GCpG.duplicated()]
GCpG.drop_duplicates(['Name'],inplace=True)

## calculate Sn 
Sn=GCpG.Ngc*10**(0.4*(GCpG.VMag+15.0))
sn_fit=GCpG[['VMag']]
sn_fit['Sn']=pd.Series(Sn,index=sn_fit.index)

## LOWESS model to predict value

# introduce some floats in our x-values
x = sn_fit.VMag
y = sn_fit.Sn
# lowess will return our "smoothed" data with a y value for at every x-value
lowess = sm.nonparametric.lowess(y, x, frac=0.36)
# unpack the lowess smoothed points to their values
lowess_x = list(zip(*lowess))[0]
lowess_y = list(zip(*lowess))[1]
# run scipy's interpolation. There is also extrapolation I believe
f = interp1d(lowess_x, lowess_y, bounds_error=False)


/Users/domi/anaconda3/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:12: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

GW Galaxy catalog with $d<30Mpc$ -> VGG


In [3]:
## load Galaxy catalog
Galaxy=pd.read_csv('/Users/domi/Dropbox/Research/Local_universe/data/GWGCCatalog_IV.csv',
                   delim_whitespace=True)

## Convert data with ~
i=0
for column in Galaxy:
    i+=1
    if (i>4 and i<23):
        Galaxy[column]=Galaxy[column].convert_objects(convert_numeric=True)
        
VGG=Galaxy[Galaxy.Dist<30]

## add and convert new VMag for VGG
# VGG['VMag']=pd.Series(VGG.Abs_Mag_B + 4.83 - 5.48,index=VGG.index)
# extra=VGG.Abs_Mag_B.isnull()&VGG.Abs_Mag_I.notnull()
# VGG.VMag[extra]=VGG.Abs_Mag_I[extra] + 4.83 - 4.08


/Users/domi/anaconda3/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:10: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

VGG: convert B/I to VMag based on VGG&Harris


In [4]:
import seaborn as sns

# ## use cross checked galaxy for magnitude convertion
# ind_GCpG=GCpG.Name.isin(VGG.Name)
# ind_VGG=VGG.Name.isin(GCpG.Name)

# f_joint, (ax1, ax2, ax3) = subplots(3, sharex=True, sharey=True, figsize=(8,8))
# xlim([-25,-10])
# p_v=sns.kdeplot(GCpG.VMag[ind_GCpG].get_values(),shade=True,ax=ax1,label='VMag in Harris')
# x,y = p_v.get_lines()[0].get_data()
# v_max=x[y.argmax()]
# ax1.vlines(v_max, 0, y.max())

# p_b=sns.kdeplot(VGG.Abs_Mag_B[ind_VGG].get_values(),shade=True,ax=ax2,label='BMag in White')
# x,y = p_b.get_lines()[0].get_data()
# b_max=x[y.argmax()]
# ax2.vlines(b_max, 0, y.max())

# p_I=sns.kdeplot(VGG.Abs_Mag_I[ind_VGG].get_values(),shade=True,ax=ax3,label='IMag in White')
# x,y = p_I.get_lines()[0].get_data()
# I_max=x[y.argmax()]
# ax3.vlines(I_max, 0, y.max())

## add and convert new VMag for VGG: VMag = V_max + B/I - B/I_max
VGG['VMag']=pd.Series(VGG.Abs_Mag_B -20.515828946699045 + 19.627615047946385,index=VGG.index)
extra=VGG.Abs_Mag_B.isnull()&VGG.Abs_Mag_I.notnull()
VGG.VMag[extra]=VGG.Abs_Mag_I[extra] -20.515828946699045 + 21.008287606264027


/Users/domi/anaconda3/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:25: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/Users/domi/anaconda3/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:27: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/Users/domi/anaconda3/envs/py35/lib/python3.5/site-packages/pandas/core/generic.py:4702: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)
/Users/domi/anaconda3/envs/py35/lib/python3.5/site-packages/IPython/core/interactiveshell.py:2881: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  exec(code_obj, self.user_global_ns, self.user_ns)

Compute Ngc for VGG


In [5]:
## calculate Ngc based on Sn from f(VMag)
VGG['Ngc']=pd.Series((map(lambda x: 0 if isnan(f(x)) else int(f(x)/10**(0.4*(x+15))),VGG.VMag)),index=VGG.index)
# sum(VGG.Ngc)


/Users/domi/anaconda3/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app

In [6]:
GCage=pd.read_csv("/Users/domi/Dropbox/Research/Local_universe/data/55GC_age.csv")

In [7]:
def generate_rand_from_pdf(pdf, x_grid):
    cdf = np.cumsum(pdf) # pdf from kde plot x128
    cdf = cdf / cdf[-1]  # normalization
    values = np.random.rand(sum(VGG.Ngc)) # sample size Ngc
    value_bins = np.searchsorted(cdf, values) # group Ngc into nearest 128 bins
    random_from_cdf = x_grid[value_bins]
    return random_from_cdf  # return Ngc

In [8]:
age_kde=sns.kdeplot(GCage.Age,kernel='gau',bw='silverman')
#x_grid = np.linspace(min(GCage.Age)-1, max(GCage.Age)+1, len(age_kde.get_lines()[0].get_data()[1])) 
age_curve=age_kde.get_lines()[0].get_data()
f2=interp1d(age_curve[0],age_curve[1],kind='cubic')



In [9]:
age_grid=np.linspace(min(age_curve[0]), max(age_curve[0]), 10000)

In [10]:
## original method, age not well spreaded. 

# x_grid = np.linspace(min(GCage.Age)-1, max(GCage.Age)+1, len(age_kde.get_lines()[0].get_data()[1])) 
# # define how many GC need to estimate the age
# GCage_from_kde = generate_rand_from_pdf(age_kde.get_lines()[0].get_data()[1], x_grid) 
# sns.distplot(GCage_from_kde)

In [11]:
## Age better spreaded
GCage_from_kde = generate_rand_from_pdf(f2(age_grid), age_grid) 
# sns.distplot(GCage_from_kde)

In [12]:
# cut off at 13.5 Gyrs
GCage_from_kde[GCage_from_kde>=13.5]=np.random.choice(GCage_from_kde[GCage_from_kde<13.5],sum(GCage_from_kde>=13.5))

In [13]:
# numpy.savetxt("/Users/domi/Dropbox/Research/Local_universe/data/GCage.dat", GCage_from_kde, delimiter=",")

In [14]:
# GCage_from_kde=numpy.loadtxt("/Users/domi/Dropbox/Research/Local_universe/data/GCage.dat",delimiter=",")

Multiple realization

  1. load bhblib
  2. load GClib
  3. build BHBdata

In [15]:
## Load BHLIB
bhlib=pd.DataFrame()
###################################
for i in range(1,325): # model
    for j in range(1,11):  # model id
###################################
        bhe=pd.read_csv('/Users/domi/Dropbox/Research/Local_universe/data/BHsystem/%d-%d-bhe.dat' %(i,j),
                    usecols=[0, 2, 3, 4, 6, 8, 10, 20], names=['T_eject','Type1','Type2','M1','M2','Seperation','Ecc','Model'], 
                    header=None, delim_whitespace=True)
        bhe.Model='%d-%d' %(i,j)
        bhlib=pd.concat([bhlib,bhe],ignore_index=False)

In [16]:
# BHB binary
bhblib=bhlib[bhlib.Type1==14*(bhlib.Type2==14)].copy(deep=True)
bhsys=bhlib[-(bhlib.Type1==14*(bhlib.Type2==14))].copy(deep=True)

In [17]:
bhblib=bhblib.drop(bhblib.columns[[1, 2]], axis=1)

In [18]:
bhblib.to_csv('/Users/domi/Dropbox/Research/Local_universe/data/bhblib.dat',
              index=True, sep=' ', header=True, float_format='%2.6f')

In [19]:
# bhblib=pd.read_csv('/Users/domi/Dropbox/Research/Local_universe/data/bhblib.dat',
#               sep=' ', index_col=0)

In [20]:
print(shape(bhsys),shape(bhblib),shape(bhlib))


(11095, 8) (87364, 6) (98459, 8)

bhlib: T_eject Type1 Type2 M1 M2 Seperation Ecc Model

Build GClib, assign GC Model to each GC in VGG Galaxy, add Age to GC Model

Slow, only run it when necessary


In [23]:
# # index of the Galaxy with GC
# ind_GC=VGG.index[VGG.Ngc>0]
# # initialize the GClib
# GClib=pd.DataFrame()

# for row in ind_GC:
#     GClib=GClib.append([VGG.ix[[row],['RA','Dec','Dist','VMag']]]*VGG.Ngc[row]) # creat GCs in each Galaxy

# GClib=GClib.reset_index()
# GClib=GClib.rename(columns = {'index':'Galaxy'})

# # write to file
# GClib.to_csv('/Users/domi/Dropbox/Research/Local_universe/data/GClib.dat',
#              index=False,sep=' ', header=False, float_format='%2.6f')

In [21]:
GClib=pd.DataFrame()
GClib=pd.read_csv('/Users/domi/Dropbox/Research/Local_universe/data/GClib.dat',
                sep=' ',header=None, names=['Galaxy','RA','Dec','Dist','VMag','Model','Age'])

# assign model to GClib
GClib['Model']=GClib['RA'].apply(lambda x: str(randint(1,325))+'-'+str(randint(1,11)))
# assign Age to GClib
GClib['Age']=np.random.choice(GCage_from_kde,size(GCage_from_kde))

GClib: Galaxy RA Dec Dist VMag Model Age

Galaxy is the index of GC in VGG


In [23]:
# to check pd.merge correctly assign the right BH infor from bhlib to GClib based on Model
display(GClib.ix[[133],GClib.columns],pd.merge(GClib.ix[[133],GClib.columns],bhblib,on='Model'),bhblib[bhblib.Model=='64-9'])


Galaxy RA Dec Dist VMag Model Age
133 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858
Galaxy RA Dec Dist VMag Model Age T_eject M1 M2 Seperation Ecc
0 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 0.1401 24.971 11.4590 15.927 0.6669
1 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 0.2287 37.825 56.7130 64.290 0.1099
2 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 0.3125 14.788 29.0980 85.299 0.9994
3 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 0.3168 27.653 21.9470 228.440 0.6048
4 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 0.3168 27.984 22.5070 123.260 0.8348
5 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 0.3170 26.214 29.0010 232.590 0.8620
6 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 0.4201 81.988 56.9500 42.399 0.4616
7 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 0.4613 18.768 25.1270 38229.000 0.4662
8 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 0.4733 26.200 15.3020 107.360 0.5582
9 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 1.2949 29.481 31.1780 58.136 0.9673
10 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 1.3678 48.780 9.2538 81.883 0.5036
11 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 1.5758 27.536 28.2480 25.898 0.9574
12 133 0.05414 16.14555 13.183 -20.568214 64-9 11.805858 2.0091 21.239 26.4280 110.190 0.2828
T_eject M1 M2 Seperation Ecc Model
0 0.1401 24.971 11.4590 15.927 0.6669 64-9
1 0.2287 37.825 56.7130 64.290 0.1099 64-9
2 0.3125 14.788 29.0980 85.299 0.9994 64-9
3 0.3168 27.653 21.9470 228.440 0.6048 64-9
4 0.3168 27.984 22.5070 123.260 0.8348 64-9
5 0.3170 26.214 29.0010 232.590 0.8620 64-9
6 0.4201 81.988 56.9500 42.399 0.4616 64-9
7 0.4613 18.768 25.1270 38229.000 0.4662 64-9
8 0.4733 26.200 15.3020 107.360 0.5582 64-9
9 1.2949 29.481 31.1780 58.136 0.9673 64-9
10 1.3678 48.780 9.2538 81.883 0.5036 64-9
11 1.5758 27.536 28.2480 25.898 0.9574 64-9
14 2.0091 21.239 26.4280 110.190 0.2828 64-9

In [24]:
# with pd.option_context('display.max_columns', None):
#     display(bhlib.sample(5),VGG.sample(5))

Build BHBdata and write to file


In [25]:
# ## Slow
# BHBdata=pd.merge(GClib,bhblib,on='Model')
# BHBdata.to_csv('/Users/domi/Dropbox/Research/Local_universe/data/BHBdata.dat.gz',
#                index=False,sep=' ', header=False, float_format='%2.6f',compression='gzip')

In [ ]:


In [ ]:


In [23]:
from gcpg import build

In [45]:
reload(build)


Out[45]:
<module 'gcpg.build' from 'gcpg/build.py'>

In [24]:
build.run(1)


Out[24]:
1

In [9]:
from astropy.table import Table

In [30]:
help(Table.read)


Help on method read in module astropy.table.table:

read(cls, *args, **kwargs) method of __builtin__.type instance
    Read and parse a data table and return as a Table.
    
    This function provides the Table interface to the astropy unified I/O
    layer.  This allows easily reading a file in many supported data formats
    using syntax such as::
    
      >>> from astropy.table import Table
      >>> dat = Table.read('table.dat', format='ascii')
      >>> events = Table.read('events.fits', format='fits')
    
    The arguments and keywords (other than ``format``) provided to this function are
    passed through to the underlying data reader (e.g. `~astropy.io.ascii.read`).
    
    The available built-in formats are:
    
    =========================== ==== ===== ============= ==========
               Format           Read Write Auto-identify Deprecated
    =========================== ==== ===== ============= ==========
                          ascii  Yes   Yes            No           
                   ascii.aastex  Yes   Yes            No           
                    ascii.basic  Yes   Yes            No           
                      ascii.cds  Yes    No            No           
         ascii.commented_header  Yes   Yes            No           
                      ascii.csv  Yes   Yes            No           
                  ascii.daophot  Yes    No            No           
                     ascii.ecsv  Yes   Yes            No           
               ascii.fast_basic  Yes   Yes            No           
    ascii.fast_commented_header  Yes   Yes            No           
                 ascii.fast_csv  Yes   Yes            No           
           ascii.fast_no_header  Yes   Yes            No           
                 ascii.fast_rdb  Yes   Yes            No           
                 ascii.fast_tab  Yes   Yes            No           
              ascii.fixed_width  Yes   Yes            No           
    ascii.fixed_width_no_header  Yes   Yes            No           
     ascii.fixed_width_two_line  Yes   Yes            No           
                     ascii.html  Yes   Yes           Yes           
                     ascii.ipac  Yes   Yes            No           
                    ascii.latex  Yes   Yes           Yes           
                ascii.no_header  Yes   Yes            No           
                      ascii.rdb  Yes   Yes           Yes           
               ascii.sextractor  Yes    No            No           
                      ascii.tab  Yes   Yes            No           
                           fits  Yes   Yes           Yes           
                           hdf5  Yes   Yes           Yes           
                        votable  Yes    No            No           
                         aastex  Yes   Yes            No        Yes
                            cds  Yes    No            No        Yes
                            csv  Yes   Yes           Yes        Yes
                        daophot  Yes    No            No        Yes
                           html  Yes   Yes            No        Yes
                           ipac  Yes   Yes            No        Yes
                          latex  Yes   Yes            No        Yes
                            rdb  Yes   Yes            No        Yes
    =========================== ==== ===== ============= ==========
    
    Deprecated format names like ``aastex`` will be removed in a future version.
    Use the full name (e.g. ``ascii.aastex``) instead.


In [17]:
from astropy.time import Time

In [18]:
Time('2015-11-2').gps


Out[18]:
1130457617.0

In [ ]:


In [ ]: