RITs Pseudocode

RITs inputs

  • M: Number of trees to build
  • D: Maximum tree depth
  • p: Probability threshold controlling how many children are sampled at each node (set p = 0 to always sample exactly n children; otherwise the choice is based on a uniform(0, 1) random draw compared against the threshold)
  • n: Minimum number of children to sample at each node (if p != 0, then at each node draw u ~ uniform(0, 1); if u <= p, sample n children at that node, otherwise sample n + 1 children)

For example, to build a binary RIT, i.e. one that always samples 2 children at each node, set p = 0 and n = 2.
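The children-count rule described by p and n can be sketched as a small helper. This is a minimal sketch that follows the rule exactly as stated in the notes above; the function name `num_children` and the use of Python's `random.Random` as the RNG are assumptions for illustration.

```python
import random


def num_children(p: float, n: int, rng: random.Random) -> int:
    """Return how many children to sample at a node (hypothetical helper).

    With p == 0 every node gets exactly n children. Otherwise draw
    u ~ Uniform(0, 1): sample n children if u <= p, else n + 1.
    """
    if p == 0:
        return n
    u = rng.random()  # uniform draw on [0, 1)
    return n if u <= p else n + 1


rng = random.Random(42)
# Binary RIT: p = 0 and n = 2 always yields 2 children per node.
print([num_children(0, 2, rng) for _ in range(5)])  # [2, 2, 2, 2, 2]
```

Passing the RNG in explicitly, rather than calling the global `random` functions, keeps the draws reproducible from a single seed.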

RITs outputs

Our RIT implementation should output the following:

  • The RIT node class and the RIT class
  • The list of randomly generated nodes, exposed as a generator function (for reproducibility and testing)
  • The full set of RITs (all M trees)
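A reproducible stream of randomly chosen nodes can be exposed as a seeded generator. This is a minimal sketch assuming nodes are identified by integer path indices and drawn uniformly; the name `node_index_generator` is hypothetical, and a real RIT may use weighted draws instead.

```python
import random
from itertools import islice
from typing import Iterator


def node_index_generator(num_paths: int, seed: int) -> Iterator[int]:
    """Yield an endless, reproducible stream of indices in [0, num_paths)."""
    rng = random.Random(seed)  # private RNG so the stream depends only on seed
    while True:
        yield rng.randrange(num_paths)


# Two generators with the same seed produce the same sequence.
first_a = list(islice(node_index_generator(num_paths=10, seed=0), 5))
first_b = list(islice(node_index_generator(num_paths=10, seed=0), 5))
print(first_a == first_b)  # True
```

Because the generator owns its RNG, tests can replay exactly the node sequence a given tree was built from.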

RIT Node class

  • It should return a rich RIT object
    • The authors mention calculating prevalence and sparsity; how should we best calculate these metrics?
    • It needs to expose clean attributes:
      • IsNode
      • HasChildren
      • NumChildren
      • IsLeafNode
      • getIntersectedPath
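The attribute list above could map onto a small node class. This is a minimal sketch under stated assumptions: each node stores the feature path sampled at that node as a `frozenset` of feature indices, and `get_intersected_path` intersects the paths along the root-to-node chain. The class and method names are hypothetical snake_case versions of the attributes listed; `IsNode` is left out because its intended meaning is not specified in these notes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(eq=False)
class RITNode:
    """One node of a random intersection tree (hypothetical sketch)."""

    path: FrozenSet[int]  # feature path sampled at this node
    parent: Optional[RITNode] = field(default=None, repr=False)
    children: List[RITNode] = field(default_factory=list, repr=False)

    def add_child(self, path) -> RITNode:
        """Attach a child that holds a newly sampled path."""
        child = RITNode(frozenset(path), parent=self)
        self.children.append(child)
        return child

    def has_children(self) -> bool:
        return len(self.children) > 0

    def num_children(self) -> int:
        return len(self.children)

    def is_leaf(self) -> bool:
        return not self.children

    def get_intersected_path(self) -> FrozenSet[int]:
        """Intersect this node's path with every ancestor's path."""
        result = self.path
        node = self.parent
        while node is not None:
            result = result & node.path
            node = node.parent
        return result


root = RITNode(frozenset({1, 2, 3, 5}))
child = root.add_child({2, 3, 7})
grandchild = child.add_child({3, 5, 7})
print(sorted(grandchild.get_intersected_path()))  # [3]
```

Recomputing the intersection from the parent chain keeps each node small. Caching the intersected set on the node would trade memory for faster lookups.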

Summary

  • At its core, the RIT is composed of 3 main modules:
    • FILTERING: Subsetting to either the 1's or the 0's
    • RANDOM SAMPLING: Sampling the path nodes in a weighted manner, with or without replacement, within the tree or outside the tree
    • INTERSECTION: Intersecting the selected node paths in a systematic manner
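The three modules can be wired together into a small end-to-end builder for M trees. This is a minimal sketch, not the authors' method, and it rests on several assumptions: feature paths are `frozenset`s of feature indices with one class label and one weight each; sampling is weighted and with replacement; the root sits at depth 0; and branches whose intersection becomes empty are pruned. All function names are hypothetical.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

Path = FrozenSet[int]


def filter_paths(paths: Sequence[Path], labels: Sequence[int],
                 weights: Sequence[float], target: int) -> Tuple[List[Path], List[float]]:
    """FILTERING: keep only paths (and their weights) whose label equals target."""
    kept = [(path, w) for path, y, w in zip(paths, labels, weights) if y == target]
    return [path for path, _ in kept], [w for _, w in kept]


def sample_path(paths: List[Path], weights: List[float], rng: random.Random) -> Path:
    """RANDOM SAMPLING: draw one path with replacement, proportional to its weight."""
    return rng.choices(paths, weights=weights, k=1)[0]


def num_children(p: float, n: int, rng: random.Random) -> int:
    """n children if p == 0; otherwise n if u <= p else n + 1, with u ~ Uniform(0, 1)."""
    if p == 0:
        return n
    return n if rng.random() <= p else n + 1


def build_rit(paths: List[Path], weights: List[float], max_depth: int,
              p: float, n: int, rng: random.Random) -> List[Path]:
    """INTERSECTION: grow one RIT and return the intersected sets at its leaves."""
    leaves: List[Path] = []
    # Each stack entry holds a node's intersected feature set and its depth.
    stack = [(sample_path(paths, weights, rng), 0)]
    while stack:
        current, depth = stack.pop()
        if depth == max_depth:
            leaves.append(current)
            continue
        for _ in range(num_children(p, n, rng)):
            child = current & sample_path(paths, weights, rng)
            if child:  # prune branches whose intersection is already empty
                stack.append((child, depth + 1))
    return leaves


def build_rits(paths, labels, weights, M: int, D: int, p: float, n: int,
               seed: int = 0, target: int = 1) -> List[List[Path]]:
    """Build M RITs from the paths of class `target`, reproducibly via `seed`."""
    rng = random.Random(seed)
    kept_paths, kept_weights = filter_paths(paths, labels, weights, target)
    if not kept_paths:
        raise ValueError("no feature paths left after filtering")
    return [build_rit(kept_paths, kept_weights, D, p, n, rng) for _ in range(M)]


paths = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({1, 2, 5}), frozenset({3, 4})]
labels = [1, 1, 1, 0]
weights = [1.0, 1.0, 1.0, 1.0]
rits = build_rits(paths, labels, weights, M=3, D=2, p=0, n=2, seed=0)
# Every class-1 path contains {1, 2}, so every leaf intersection does too.
print(all(frozenset({1, 2}) <= leaf for tree in rits for leaf in tree))  # True
```

Sampling without replacement, or restricting draws to paths from a single decision tree, would only change `sample_path`. The filtering and intersection steps stay the same.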
