| notebook.community

RITs Pseudocode

The following is pseudocode for the Random Intersection Trees (RITs) algorithm.
It is based on the original paper by Shah and Meinshausen.

RITs inputs

M Number of trees to build
D Max Tree Depth
p Probability threshold that controls how many children are sampled at each node (= 0 for no random split; otherwise a uniform (0, 1) random number is drawn at each node and compared with the threshold)
n Min number of children to sample at each node (if p != 0, then at each node: if the drawn split probability is <= p, sample n children at that node, else sample n + 1 children at that node)

For example, if we want just a binary RIT (always 2 children sampled at each node), set p = 0 and n = 2.
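
The rule for p and n can be sketched in a few lines of Python. This is only one reading of the inputs above, not a settled implementation: the function name `children_to_sample` and the `rng` argument are hypothetical, and p = 0 is treated as "always sample exactly n children".

```python
import random


def children_to_sample(n, p, rng):
    """Return how many children to sample at one node.

    Assumed reading of the inputs: p == 0 disables the random choice,
    so every node gets exactly n children. Otherwise a uniform(0, 1)
    draw u gives n children when u <= p, and n + 1 children when u > p.
    """
    if p == 0:
        return n
    u = rng.random()  # uniform draw in [0, 1)
    return n if u <= p else n + 1


rng = random.Random(0)  # fixed seed, for reproducibility
binary = [children_to_sample(2, 0, rng) for _ in range(5)]   # always 2
mixed = [children_to_sample(2, 0.5, rng) for _ in range(5)]  # each 2 or 3
```

With p = 0 and n = 2 every node gets 2 children, which is the binary RIT case.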

RITs outputs

Our version of the RITs should output the following:

The Node class and the RIT class
The list of random numbers used to pick nodes, exposed as a generator function (for reproducibility and testing)
The entire set of RITs (all M trees)
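
One possible shape for the random-number generator function is sketched below. It assumes that "picking a node" means drawing an index into a list of candidate paths; the name `node_index_stream` and its arguments are hypothetical.

```python
import random
from itertools import islice


def node_index_stream(num_paths, seed):
    """Yield an endless stream of random indices in [0, num_paths).

    A private random.Random instance is seeded once, so the same seed
    always replays the same sequence of indices.
    """
    rng = random.Random(seed)
    while True:
        yield rng.randrange(num_paths)


# Take the first five draws twice with the same seed.
first_run = list(islice(node_index_stream(10, seed=42), 5))
second_run = list(islice(node_index_stream(10, seed=42), 5))
```

Because the stream is seeded, `first_run` and `second_run` are identical, which is what makes the trees reproducible in tests.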

RIT Node class

We need to return the rich RIT object
- The authors mention calculating prevalence and sparsity, how should we best calculate these metrics?
- Needs to return clean attributes:
  - IsNode
  - HasChildren
  - NumChildren
  - IsLeafNode
  - getIntersectedPath
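
A minimal sketch of such a node class follows, under the assumption that a path is a set of feature ids and that a child stores its parent's path intersected with a newly sampled path. The class name `RITNode` and the snake_case method names are hypothetical. IsNode is left out because these notes do not say what it should report, and prevalence and sparsity are left out because their calculation is still an open question above.

```python
class RITNode:
    """Hypothetical node of a random intersection tree."""

    def __init__(self, path, parent=None):
        # The intersected path is stored as an immutable set of feature ids.
        self._path = frozenset(path)
        self.parent = parent
        self.children = []

    def add_child(self, sampled_path):
        # A child keeps only the features shared with the sampled path.
        child = RITNode(self._path & frozenset(sampled_path), parent=self)
        self.children.append(child)
        return child

    @property
    def has_children(self):
        return len(self.children) > 0

    @property
    def num_children(self):
        return len(self.children)

    @property
    def is_leaf(self):
        return not self.children

    def get_intersected_path(self):
        return self._path


root = RITNode({1, 2, 3})
child = root.add_child({2, 3, 4})
```

Here `child.get_intersected_path()` holds the features common to both paths, and `child.is_leaf` stays true until it receives children of its own.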

Summary

At its core, the RIT comprises 3 main modules:
FILTERING: Subsetting to either the 1's or the 0's
RANDOM SAMPLING: Sampling the path-nodes in a weighted manner, with/without replacement, within tree/outside tree
INTERSECTION: Intersecting the selected node paths in a systematic manner
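
The three modules can be sketched as three small functions. This is an illustration under stated assumptions rather than the final design: paths are sets of feature ids, each path has a 0/1 label, and sampling is done with replacement only. All function names are hypothetical.

```python
import random


def filter_paths(paths, labels, keep_label):
    """FILTERING: keep only the paths whose label equals keep_label."""
    return [p for p, y in zip(paths, labels) if y == keep_label]


def sample_paths(paths, k, rng, weights=None):
    """RANDOM SAMPLING: draw k paths with replacement, optionally weighted."""
    return rng.choices(paths, weights=weights, k=k)


def intersect_paths(paths):
    """INTERSECTION: features common to every path (needs at least one path)."""
    return frozenset.intersection(*(frozenset(p) for p in paths))


paths = [{1, 2, 3}, {2, 3, 4}, {2, 3, 5}, {7, 8}]
labels = [1, 1, 1, 0]

ones = filter_paths(paths, labels, keep_label=1)  # drop the label-0 path
rng = random.Random(0)                            # fixed seed for testing
picked = sample_paths(ones, k=2, rng=rng)
common = intersect_paths(picked)
```

Every label-1 path in this toy data contains features 2 and 3, so `common` always includes them regardless of which paths are drawn.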


