- People: Risa, Phil, David.
- Date: 2017/04/17
- Major Theme:
**Make model/data/parameters more interesting/relevant for the community. How to best achieve that?**

Assortment of notes from the beginning of the meeting:

- Photometric more relevant than spectroscopic.
- Would be given: central_id, redshift, observed luminosity.
- Cori, Knights Landing machine.
- Could swap the CLF for one that applies to both centrals and satellites.

Notes on future direction:

- Centrals and total accuracy (main result for DESC note).
- Explore satellite relation.
- Run on Knights Landing.

More Discussion Notes:

- 3d dark matter density, $P(L_c| M, z, d)$ where $d$ is distance to nearest cluster.
- might need to go to 1 million objects.
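As a concrete sketch of what a $P(L_c|M, z, d)$ model could look like — the notes don't pin down a functional form, so everything below (the lognormal form, the power-law mean, and all parameter names and values) is a placeholder, not the bigmali model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_central_luminosity(M, z, d, alpha=0.35, L0=1e10, M0=1e12,
                              S=0.15, beta=0.05):
    """Draw L_c from a toy lognormal P(L_c | M, z, d).

    Mean log-luminosity follows a power law in halo mass M, with a weak
    illustrative dependence on redshift z and distance-to-cluster d.
    alpha, L0, M0, S, beta are made-up placeholders, not fitted values.
    """
    mean_logL = (np.log(L0) + alpha * np.log(M / M0)
                 + beta * np.log1p(z) - beta * np.log1p(d))
    return np.exp(rng.normal(mean_logL, S))  # lognormal scatter S in ln L

L_c = sample_central_luminosity(M=1e13, z=0.5, d=2.0)
```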

- Shapes of galaxies in different parts of halos ('galaxy environments') in the 3d mass map.
- Can we tie to assembly bias?
- (A lot of talk about scaling relation M, mass mapping)
- Could push the scatter S(M) down to low mass.
- might not have enough constraining info in our model.

- Risa mentioned the relationship between:
    - halo mass
    - galaxy number density
    - galaxy luminosity

- Number of halos in the field of view: 115919
- Number of samples per halo: 100
- Total number of integrations in one likelihood calculation: $2 \times 115919 = 231838$
- Performance (C++ on a Mac laptop, Apple LLVM version 8.0.0 (clang-800.0.42.1), 2.4 GHz Intel Core i5, 8 GB 1600 MHz DDR3): $$\frac{375 \text{ seconds}}{50 \text{ likelihoods}} \approx 7 \text{ seconds/likelihood}$$ CPU bound $\implies$ should scale close to linearly with the number of cores (might also see a big boost from a GPU, since the work is primarily vectorized sequences of math operations)
- Assuming we need around 10,000 simple Monte Carlo hyper-parameter samples and can utilize 1,000 cores (e.g., 63 16-core nodes) through MPI and OpenMP, the computation should take approximately 1 minute: $$\left(\frac{10{,}000\text{ samples}}{1{,}000 \text{ cores}}\right) \left(\frac{7 \text{ seconds}}{\text{sample}}\right) = 70 \text{ seconds} \approx 1 \text{ minute}$$
- Doing MCMC would inhibit parallelization and could be much slower. On my laptop (utilizing both cores) it would take about 10 hours: $$\frac{10,000 \text{ samples} \times 7 \text{ seconds/sample}}{2 \text{ cores}} = 35{,}000 \text{ seconds} \approx 10 \text{ hours}$$
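The back-of-envelope arithmetic above can be reproduced in a few lines, using the rounded 7 s/likelihood figure from the benchmark:

```python
# Reproduce the scaling estimates from the notes above.
SECONDS_PER_LIKELIHOOD = 7     # rounded from 375 s / 50 likelihoods = 7.5 s
N_HYPER_SAMPLES = 10_000       # simple Monte Carlo hyper-parameter samples

# Embarrassingly parallel simple Monte Carlo spread over ~1,000 cores:
n_cores = 1_000
parallel_seconds = N_HYPER_SAMPLES / n_cores * SECONDS_PER_LIKELIHOOD
print(f"parallel: {parallel_seconds:.0f} s")        # 70 s, about a minute

# Serial-chain MCMC on a dual-core laptop:
mcmc_seconds = N_HYPER_SAMPLES * SECONDS_PER_LIKELIHOOD / 2
print(f"laptop MCMC: {mcmc_seconds / 3600:.1f} h")  # ~9.7 h, call it 10 hours
```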

- Random items:
    - Yashar may have a C++/MPI emcee
    - hyper-parameter [samples/prior/posterior] => hyper-[samples/prior/posterior]
    - ZenHub for project management
    - Recommended MacKay's book and Schneider's Astrophysics and Cosmology book

- Questions Phil Marshall would like to explore:
    - Basic setup: an observer observing a source through a sequence of central and satellite galaxies/halos.
    - Ingredients:
        - halos (shape, concentration)
        - galaxies (centrals, satellites)
        - membership uncertainty
        - filaments

- Constraints:
    - position, galaxy photometry (brightness, color)
    - uncertain redshifts => uncertain, correlated luminosities
    - "observed clustering" => redMaPPer clusters, membership, cluster redshift, luminosity
    - redMaPPer is interesting because it gives us a set of new observables

- Weak lensing:
    - Include weak lensing in a two-step inference:
        - first infer hyper-parameters and generate mass maps
        - use the mass maps to further refine the hyper-parameters

- Potential application: denoising strong lenses
    - CFHTLS, publicly available weak lensing data
    - Rusu, Marshall paper 2017
    - Would need to do inference through the stellar-mass-halo-mass relation, but might be able to reuse the bigmali core
    - Might be worth applying Pangloss to this problem; reread Spencer's thesis with an eye for mass-modelling accuracy

- Stress the scaling:
    - distribute the bigmali computation
    - build the hyper-posterior
    - confirm the hyper-posterior is consistent with the seed hyper-parameters when masses are sampled from the mass prior
    - get familiar with Sherlock/SLURM
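Since the hyper-parameter samples are independent, the "distribute bigmali" step is embarrassingly parallel. A runnable local stand-in for the pattern (the real run would use MPI ranks on Sherlock; the toy `log_likelihood` below is a placeholder for bigmali's ~7 s evaluation, and a thread pool is used only so the sketch runs anywhere):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def log_likelihood(hyper):
    """Placeholder for the bigmali likelihood (~7 s each in the real code);
    a cheap Gaussian stand-in so the pattern is runnable."""
    alpha, S = hyper
    return -0.5 * (((alpha - 0.35) / 0.05) ** 2 + ((S - 0.15) / 0.02) ** 2)

def evaluate_samples(samples, max_workers=4):
    # Each hyper-parameter sample is independent, so the map distributes
    # cleanly across workers (MPI ranks + OpenMP threads in the real run).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(log_likelihood, samples))

rng = np.random.default_rng(1)
hyper_samples = rng.uniform([0.2, 0.05], [0.5, 0.3], size=(100, 2))
logls = evaluate_samples(hyper_samples)
```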

- Mass mapping
    - Use $P(\alpha, S|data)$ to make $P(M_k|data)$ and assess its accuracy. Review the derivation:
      \begin{align*}
      P(M_k|data) &= \int P(M_k,M,\alpha,S|data)\,dM\,d\alpha\,dS\\
      &\propto \int P(M_k|\alpha, S, data_k?)\,P(\alpha,S|data_k?)\,d\alpha\,dS\\
      &\approx \frac{1}{N}\sum_i P(M_k|\alpha_i, S_i, data_k?)
      \end{align*}
      Consider using the median or mean squared error to assess accuracy.
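A numerical sketch of that Monte Carlo approximation, with toy stand-ins for both the hyper-posterior samples and the per-halo conditional (the power-law relation and all numbers below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for N samples from the hyper-posterior P(alpha, S | data).
hyper_samples = np.column_stack([
    rng.normal(0.35, 0.02, 200),   # alpha
    rng.normal(0.15, 0.01, 200),   # S
])

logL_obs = 10.5  # toy observed log-luminosity of halo k

def conditional_logM_draws(alpha, S, logL, n=50):
    """Toy P(M_k | alpha, S, data_k): invert a made-up mean relation
    logL = 10 + alpha * (logM - 12), with the scatter S propagated through."""
    logM = 12 + (logL - 10) / alpha
    return rng.normal(logM, S / alpha, n)

# P(M_k|data) ~ (1/N) sum_i P(M_k | alpha_i, S_i, data_k): pool draws
# from each conditional, one set per hyper-posterior sample.
pooled = np.concatenate([conditional_logM_draws(a, S, logL_obs)
                         for a, S in hyper_samples])
logM_hat = pooled.mean()                   # point estimate of log M_k
mse = np.mean((pooled - logM_hat) ** 2)    # spread, for accuracy assessment
```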

- Read up on the central/satellite luminosity relation in Reddick's thesis.
    - How can we incorporate it into our model?

- Explore the information gain from weak lensing and the mass-luminosity relation
    - KL divergence
    - Can we compare mass-luminosity alone against mass-luminosity plus weak lensing to get a sense of the information gain from weak lensing alone?
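One concrete way to score that comparison: the KL divergence between the hyper-posterior with weak lensing and the one without, which has a closed form when both are approximated as Gaussians. The posterior widths below are invented purely for illustration:

```python
import numpy as np

def kl_gaussian(mu_p, sig_p, mu_q, sig_q):
    """KL(P || Q) in nats for 1-D Gaussians P and Q."""
    return (np.log(sig_q / sig_p)
            + (sig_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sig_q ** 2) - 0.5)

# Made-up posterior widths on alpha: with weak lensing (P) vs. without (Q).
# A positive value means the weak-lensing posterior carries extra information.
info_gain = kl_gaussian(0.35, 0.02, 0.35, 0.05)
```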

- Next meeting Thursday @ 3