This Jupyter notebook assumes that the R kernel for Jupyter (IRkernel) has been installed; see https://irkernel.github.io/installation/

R code for logistic regression analysis of bar fractions

Requirements

This notebook is meant to be run within the full s4g_barfractions repository, including the associated data files.

In addition, this notebook requires the following R packages:

Setup


In [20]:
library(survey)

Set the following so that it points to the directory with the (text) data files:


In [21]:
basedir <- "./data/"

Weighted logistic regression for Sample 1: log(M_star) alone

Logistic regression for fraction of galaxies with bars as a function of stellar mass $\log (M_{\star} / M_{\odot})$, using S4G galaxies in Sample 1 (spirals at $D \leq 25$ Mpc) with stellar masses between $\log M_{\star} = 8.5$ and 11, with $V/V_{\rm max}$ weighting to account for S4G angular diameter limit.

Load data into table and then Survey-package design object


In [22]:
ff <- paste(basedir, "barpresence_vs_logmstar_for_R_w25_m8.5-11.txt", sep="")
logmstarBarWTable <- read.table(ff, header=TRUE)
logmstarBarWDesign <- svydesign(ids=~0, data=logmstarBarWTable, weights=~weight)
length(logmstarBarWTable$bar)


563

Standard linear logistic regression: bar fraction versus log of stellar mass


In [23]:
logMstarWFit1 <- svyglm(bar ~ logmstar, design=logmstarBarWDesign, family=quasibinomial)
summary(logMstarWFit1)


Call:
svyglm(formula = bar ~ logmstar, design = logmstarBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstarBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.4487     1.8675  -0.776    0.438
logmstar      0.1934     0.1897   1.019    0.308

(Dispersion parameter for quasibinomial family taken to be 1.00235)

Number of Fisher Scoring iterations: 4

Quadratic linear logistic regression: bar fraction versus log of stellar mass + square of same


In [24]:
logMstarWFit2 <- svyglm(bar ~ logmstar + I(logmstar^2), design=logmstarBarWDesign, family=quasibinomial)
summary(logMstarWFit2)


Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2), design = logmstarBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstarBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -83.7853    25.6579  -3.265 0.001160 ** 
logmstar       17.3692     5.2634   3.300 0.001028 ** 
I(logmstar^2)  -0.8911     0.2690  -3.313 0.000984 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.00151)

Number of Fisher Scoring iterations: 4

Comparison of AIC values


In [25]:
AIC(logMstarWFit1)
AIC(logMstarWFit2)


eff.p
2.09143063431733
AIC
762.585777790543
deltabar
2.09143063431733
eff.p
3.47717601693876
AIC
747.729940370879
deltabar
1.73858800846938

In [26]:
747.73 - 762.586


-14.856

Summary

Since the quadratic fit has $\Delta$AIC $= -14.9$ relative to the linear fit, it is clearly preferred.

Weighted logistic regression for Sample 1: f(bar) vs log(M_star) and g-r

Same as previous section, but now we do logistic regression versus both stellar mass and $g - r$ color, using a subsample of Sample 1 galaxies with color data.


In [27]:
ff <- paste(basedir, "barpresence_vs_logmstar-gmr_for_R_w25.txt", sep="")
logmstargmrBarWTable <- read.table(ff, header=TRUE)
gmrBarWDesign <- svydesign(ids=~0, data=logmstargmrBarWTable, weights=~weight)
length(logmstargmrBarWTable$bar)


319

Linear fit of $f_{\rm bar}$ vs just $g - r$


In [29]:
gmrWFit_gmr <- svyglm(bar ~ gmr, design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_gmr)


Call:
svyglm(formula = bar ~ gmr, design = gmrBarWDesign, family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstargmrBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4544     0.3806   1.194    0.233
gmr          -0.4394     0.7836  -0.561    0.575

(Dispersion parameter for quasibinomial family taken to be 1.003128)

Number of Fisher Scoring iterations: 4

Fit vs just logMstar for same sample: linear, then quadratic


In [30]:
# same sample, vs logmstar (linear) only
gmrWFit_logmstar <- svyglm(bar ~ logmstar, design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_logmstar)


Call:
svyglm(formula = bar ~ logmstar, design = gmrBarWDesign, family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstargmrBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.2772     2.3387  -0.974    0.331
logmstar      0.2685     0.2333   1.151    0.251

(Dispersion parameter for quasibinomial family taken to be 1.003995)

Number of Fisher Scoring iterations: 3

In [31]:
# same sample, vs logmstar (quadratic) only
gmrWFit_logmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2), design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_logmstar2)


Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2), design = gmrBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstargmrBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -77.0345    34.3172  -2.245   0.0255 *
logmstar       15.7187     6.9564   2.260   0.0245 *
I(logmstar^2)  -0.7938     0.3512  -2.260   0.0245 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.002662)

Number of Fisher Scoring iterations: 4

Finally, fit vs logMstar (quadratic) and g-r


In [32]:
gmrWFit_gmrlogmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2) + gmr, design=gmrBarWDesign, family=quasibinomial)
summary(gmrWFit_gmrlogmstar2)


Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2) + gmr, design = gmrBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logmstargmrBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -70.3952    34.7265  -2.027   0.0435 *
logmstar       14.2743     7.0501   2.025   0.0437 *
I(logmstar^2)  -0.7111     0.3572  -1.990   0.0474 *
gmr            -1.1542     1.0728  -1.076   0.2828  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.001188)

Number of Fisher Scoring iterations: 4

Comparison of AIC values


In [33]:
AIC(gmrWFit_gmr)
AIC(gmrWFit_logmstar)
AIC(gmrWFit_logmstar2)
AIC(gmrWFit_gmrlogmstar2)


eff.p
1.41668065582433
AIC
438.038874982745
deltabar
1.41668065582433
eff.p
2.03774799944452
AIC
436.992069103173
deltabar
2.03774799944452
eff.p
3.74054021314305
AIC
431.339578820795
deltabar
1.87027010657153
eff.p
5.39543250991863
AIC
432.824119755678
deltabar
1.79847750330621

Summary

Best fit from AIC standpoint is quadratic logMstar (without $g - r$) -- note that its AIC is actually lower than the AIC for the quadratic logMstar + $g - r$ fit.

Weighted logistic regression for Sample 1: f(bar) vs log(M_star) and log(f_gas)

Same as previous section, but now we do logistic regression versus both log of stellar mass and log of gas mass ratio $f{\rm gas} = M_{\rm HI} / M_{\star}$, using a subsample of Sample 1 galaxies with H I data.


In [34]:
basedir <- "/Users/erwin/Documents/Working/Projects/Project_BarSizes/"
ff <- paste(basedir, "barpresence_vs_logmstar-logfgas_for_R_w25.txt", sep="")
logMstarfgasBarWTable <- read.table(ff, header=TRUE)
logMstarfgasBarWDesign <- svydesign(ids=~0, data=logMstarfgasBarWTable, weights=~weight)
length(logMstarfgasBarWTable$bar)


556

Fit vs just log(f_gas)


In [35]:
logMstarlogfgasWFit_fgas <- svyglm(bar ~ logfgas, design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_fgas)


Call:
svyglm(formula = bar ~ logfgas, design = logMstarfgasBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logMstarfgasBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.42820    0.16632   2.575   0.0103 *
logfgas      0.04696    0.20902   0.225   0.8223  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.001815)

Number of Fisher Scoring iterations: 4

Fit vs just logMstar: linear, then quadratic


In [36]:
logMstarlogfgasWFit_logmstar <- svyglm(bar ~ logmstar, design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_logmstar)


Call:
svyglm(formula = bar ~ logmstar, design = logMstarfgasBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logMstarfgasBarWTable, weights = ~weight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.2368     1.8897  -0.654    0.513
logmstar      0.1730     0.1919   0.901    0.368

(Dispersion parameter for quasibinomial family taken to be 1.00228)

Number of Fisher Scoring iterations: 4

In [37]:
logMstarlogfgasWFit_logmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2), design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_logmstar2)


Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2), design = logMstarfgasBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logMstarfgasBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -84.2791    25.7663  -3.271 0.001139 ** 
logmstar       17.4959     5.2858   3.310 0.000994 ***
I(logmstar^2)  -0.8987     0.2702  -3.327 0.000938 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.001702)

Number of Fisher Scoring iterations: 4

Finally, fit vs logMstar (quadratic) and log(f_gas)


In [38]:
logMstarlogfgasWFit_fgaslogmstar2 <- svyglm(bar ~ logmstar + I(logmstar^2) + logfgas, design=logMstarfgasBarWDesign, family=quasibinomial)
summary(logMstarlogfgasWFit_fgaslogmstar2)


Call:
svyglm(formula = bar ~ logmstar + I(logmstar^2) + logfgas, design = logMstarfgasBarWDesign, 
    family = quasibinomial)

Survey design:
svydesign(ids = ~0, data = logMstarfgasBarWTable, weights = ~weight)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)   -84.2707    26.0267  -3.238  0.00128 **
logmstar       17.4063     5.3113   3.277  0.00111 **
I(logmstar^2)  -0.8879     0.2701  -3.288  0.00107 **
logfgas         0.2109     0.3453   0.611  0.54162   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.001131)

Number of Fisher Scoring iterations: 4

Comparison of AIC values


In [39]:
AIC(logMstarlogfgasWFit_fgas)
AIC(logMstarlogfgasWFit_logmstar)
AIC(logMstarlogfgasWFit_logmstar2)
AIC(logMstarlogfgasWFit_fgaslogmstar2)


eff.p
2.31228900497439
AIC
753.721772291837
deltabar
2.31228900497439
eff.p
2.11682385197635
AIC
751.715326944463
deltabar
2.11682385197635
eff.p
3.49151277295186
AIC
736.604824165618
deltabar
1.74575638647593
eff.p
7.46475123954232
AIC
743.083606530677
deltabar
2.48825041318077

Summary

The quadratic fit using logMstar (without log f_gas) is clearly the best model.