This Jupyter notebook assumes that the R kernel for Jupyter (IRkernel) has been installed; see https://irkernel.github.io/installation/

R code for logistic regression analysis of bar fractions

Requirements

This notebook is meant to be run within the full s4g_barfractions repository, including the associated data files.

In addition, this notebook requires the following R packages:

survey

Setup



In [20]:

    
library(survey)

Set the following so that it points to the directory with the (text) data files:



In [21]:

    
basedir <- "./data/"

Weighted logistic regression for Sample 1: log(M_star) alone

Logistic regression for fraction of galaxies with bars as a function of stellar mass $\log (M_{\star} / M_{\odot})$, using S4G galaxies in Sample 1 (spirals at $D \leq 25$ Mpc) with stellar masses between $\log M_{\star} = 8.5$ and 11, with $V/V_{\rm max}$ weighting to account for S4G angular diameter limit.

Load data into table and then Survey-package design object



In [22]:

    
ff <- paste(basedir, "barpresence_vs_logmstar_for_R_w25_m8.5-11.txt", sep="")
logmstarBarWTable <- read.table(ff, header=TRUE)
logmstarBarWDesign <- svydesign(ids=~0, data=logmstarBarWTable, weights=~weight)
length(logmstarBarWTable$bar)

Standard linear logistic regression: bar fraction versus log of stellar mass



In [23]:

    
logMstarWFit1 <- svyglm(bar ~ logmstar, design=logmstarBarWDesign, family=quasibinomial)
summary(logMstarWFit1)









    





Call:
svyglm(formula = bar ~ logmstar, design = logmstarBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstarBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.4487     1.8675  -0.776    0.438
logmstar      0.1934     0.1897   1.019    0.308

(Dispersion parameter for quasibinomial family taken to be 1.00235)

Number of Fisher Scoring iterations: 4

Quadratic linear logistic regression: bar fraction versus log of stellar mass + square of same



In [24]:

    
logMstarWFit2 <- svyglm(bar ~ logmstar + I(logmstar^2), design=logmstarBarWDesign, family=quasibinomial)
summary(logMstarWFit2)









    





Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2), design = logmstarBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstarBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -83.7853    25.6579  -3.265 0.001160 ** 
logmstar       17.3692     5.2634   3.300 0.001028 ** 
I(logmstar^2)  -0.8911     0.2690  -3.313 0.000984 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.00151)

Number of Fisher Scoring iterations: 4

Comparison of AIC values



In [25]:

    
AIC(logMstarWFit1)
AIC(logMstarWFit2)









    





	eff.p
		2.09143063431733
	AIC
		762.585777790543
	deltabar
		2.09143063431733








    





	eff.p
		3.47717601693876
	AIC
		747.729940370879
	deltabar
		1.73858800846938



In [26]:

    
747.73 - 762.586

Summary

Since the quadratic fit has $\Delta$AIC $= -14.9$ relative to the linear fit, it is clearly preferred.

Weighted logistic regression for Sample 1: f(bar) vs log(M_star) and g-r

Same as previous section, but now we do logistic regression versus both stellar mass and $g - r$ color, using a subsample of Sample 1 galaxies with color data.



In [27]:

    
ff <- paste(basedir, "barpresence_vs_logmstar-gmr_for_R_w25.txt", sep="")
logmstargmrBarWTable <- read.table(ff, header=TRUE)
gmrBarWDesign <- svydesign(ids=~0, data=logmstargmrBarWTable, weights=~weight)
length(logmstargmrBarWTable$bar)

Linear fit of $f_{\rm bar}$ vs just $g - r$



In [29]:

    
gmrWFit_gmr <- svyglm(bar ~ gmr, design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_gmr)









    





Call:
svyglm(formula = bar ~ gmr, design = gmrBarWDesign, family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstargmrBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4544     0.3806   1.194    0.233
gmr          -0.4394     0.7836  -0.561    0.575

(Dispersion parameter for quasibinomial family taken to be 1.003128)

Number of Fisher Scoring iterations: 4

Fit vs just logMstar for same sample: linear, then quadratic



In [30]:

    
# same sample, vs logmstar (linear) only
gmrWFit_logmstar <- svyglm(bar ~ logmstar, design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_logmstar)









    





Call:
svyglm(formula = bar ~ logmstar, design = gmrBarWDesign, family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstargmrBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.2772     2.3387  -0.974    0.331
logmstar      0.2685     0.2333   1.151    0.251

(Dispersion parameter for quasibinomial family taken to be 1.003995)

Number of Fisher Scoring iterations: 3



In [31]:

    
# same sample, vs logmstar (quadratic) only
gmrWFit_logmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2), design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_logmstar2)









    





Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2), design = gmrBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstargmrBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -77.0345    34.3172  -2.245   0.0255 *
logmstar       15.7187     6.9564   2.260   0.0245 *
I(logmstar^2)  -0.7938     0.3512  -2.260   0.0245 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.002662)

Number of Fisher Scoring iterations: 4

Finally, fit vs logMstar (quadratic) and g-r



In [32]:

    
gmrWFit_gmrlogmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2) + gmr, design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_gmrlogmstar2)









    





Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2) + gmr, design = gmrBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstargmrBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -70.3952    34.7265  -2.027   0.0435 *
logmstar       14.2743     7.0501   2.025   0.0437 *
I(logmstar^2)  -0.7111     0.3572  -1.990   0.0474 *
gmr            -1.1542     1.0728  -1.076   0.2828  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.001188)

Number of Fisher Scoring iterations: 4

Comparison of AIC values



In [33]:

    
AIC(gmrWFit_gmr)
AIC(gmrWFit_logmstar)
AIC(gmrWFit_logmstar2)
AIC(gmrWFit_gmrlogmstar2)









    





	eff.p
		1.41668065582433
	AIC
		438.038874982745
	deltabar
		1.41668065582433








    





	eff.p
		2.03774799944452
	AIC
		436.992069103173
	deltabar
		2.03774799944452








    





	eff.p
		3.74054021314305
	AIC
		431.339578820795
	deltabar
		1.87027010657153








    





	eff.p
		5.39543250991863
	AIC
		432.824119755678
	deltabar
		1.79847750330621

Summary

Best fit from AIC standpoint is quadratic logMstar (without $g - r$) -- note that its AIC is actually lower than the AIC for the quadratic logMstar + $g - r$ fit.

Weighted logistic regression for Sample 1: f(bar) vs log(M_star) and log(f_gas)

Same as previous section, but now we do logistic regression versus both log of stellar mass and log of gas mass ratio $f{\rm gas} = M_{\rm HI} / M_{\star}$, using a subsample of Sample 1 galaxies with H I data.



In [34]:

    
basedir <- "/Users/erwin/Documents/Working/Projects/Project_BarSizes/"
ff <- paste(basedir, "barpresence_vs_logmstar-logfgas_for_R_w25.txt", sep="")
logMstarfgasBarWTable <- read.table(ff, header=TRUE)
logMstarfgasBarWDesign <- svydesign(ids=~0, data=logMstarfgasBarWTable, weights=~weight)
length(logMstarfgasBarWTable$bar)

Fit vs just log(f_gas)



In [35]:

    
logMstarlogfgasWFit_fgas <- svyglm(bar ~ logfgas, design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_fgas)









    





Call:
svyglm(formula = bar ~ logfgas, design = logMstarfgasBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logMstarfgasBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.42820    0.16632   2.575   0.0103 *
logfgas      0.04696    0.20902   0.225   0.8223  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.001815)

Number of Fisher Scoring iterations: 4

Fit vs just logMstar: linear, then quadratic



In [36]:

    
logMstarlogfgasWFit_logmstar <- svyglm(bar ~ logmstar, design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_logmstar)









    





Call:
svyglm(formula = bar ~ logmstar, design = logMstarfgasBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logMstarfgasBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.2368     1.8897  -0.654    0.513
logmstar      0.1730     0.1919   0.901    0.368

(Dispersion parameter for quasibinomial family taken to be 1.00228)

Number of Fisher Scoring iterations: 4



In [37]:

    
logMstarlogfgasWFit_logmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2), design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_logmstar2)









    





Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2), design = logMstarfgasBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logMstarfgasBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -84.2791    25.7663  -3.271 0.001139 ** 
logmstar       17.4959     5.2858   3.310 0.000994 ***
I(logmstar^2)  -0.8987     0.2702  -3.327 0.000938 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.001702)

Number of Fisher Scoring iterations: 4

Finally, fit vs logMstar (quadratic) and log(f_gas)



In [38]:

    
logMstarlogfgasWFit_fgaslogmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2) + logfgas, design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_fgaslogmstar2)









    





Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2) + logfgas, design = logMstarfgasBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logMstarfgasBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)   -84.2707    26.0267  -3.238  0.00128 **
logmstar       17.4063     5.3113   3.277  0.00111 **
I(logmstar^2)  -0.8879     0.2701  -3.288  0.00107 **
logfgas         0.2109     0.3453   0.611  0.54162   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.001131)

Number of Fisher Scoring iterations: 4

Comparison of AIC values



In [39]:

    
AIC(logMstarlogfgasWFit_fgas)
AIC(logMstarlogfgasWFit_logmstar)
AIC(logMstarlogfgasWFit_logmstar2)
AIC(logMstarlogfgasWFit_fgaslogmstar2)









    





	eff.p
		2.31228900497439
	AIC
		753.721772291837
	deltabar
		2.31228900497439








    





	eff.p
		2.11682385197635
	AIC
		751.715326944463
	deltabar
		2.11682385197635








    





	eff.p
		3.49151277295186
	AIC
		736.604824165618
	deltabar
		1.74575638647593








    





	eff.p
		7.46475123954232
	AIC
		743.083606530677
	deltabar
		2.48825041318077

Summary

The quadratic fit using logMstar (without log f_gas) is clearly the best model.