K-d tree

Approximate Nearest Neighborhood(ANN)

Best Bin First (BBF)

Original Paper
- Shape indexing using approximate nearest-neighbour search in high-dimensional spaces (1997)
Reference Source Code
- https://github.com/iwyoo/kd_tree/blob/master/kd_tree.cxx
Current Status

In the backtracking stage, whole branches of the tree can be pruned if the region of space they represent is further from the query point than Dcur (the distance from q to the closest neighbour yet seen).
Issue to solve

This process can be very effective in low-dimensional spaces, but in higher dimensions there are many more bins adjacent to the central one that must be examined, and per- formance degrades rapidly. Interestingly,
Strategy

If we are willing to settle for an ap- proximate NN, then we can avoid prolonged search by lim- iting the number of leaf nodeswe are willing to examine (to Emax ), and settling for the best neighbour found up to that point

Search nearer bins first
- The distance to a q bin is defined to be the minimum distance between and any point on the bin boundary.

BBF VS Previous Version
- In Backtracking stage (Backtracking 단계에서만 다르다.)
  - Previous Version
    - Just Depth First (단순한 Depth First Search인데 구현에 따라서 Left,Right선택시에 순서를 다르게 할 수 있다.)
  - BBF
    - Use Priority Queue, which of priority is defined as distance node from query point
      - There are many version of the definition of distance
        
        Distance from Bounding Box of the node is reasonable.
    - Counting inspected leaf nodes and stop earlier
      - If more leaf nodes than limit is checked, inspection is over
Detailed Description of the picuture below
- In Depth first search in original NN search, tree architecture can affect the order of leaf node inspection. This can cause conflict with distance of real leaf nodes.
  - (Depth first Search시에 Depth가 실제 거리보다도 tree Architecture에 따라서 순서가 결정되는데 이 순서는 실제 가까운 순서가 아니다.)
- Previous Version
  - If a subtree is chosen in depth first search, we inspect all the leaf node under the subtree.
    - (그림에서 Basic NN의 경우 한쪽 subtree의 순서가 결정되면 그 subtree의 하위 Node를 먼저 검토하게 된다. 검사 순서가 실제 거리와 상관없이 상위 Node에 거리 차이에 영향을 받게 된다.)
- BBF
  - Branch Node over node 2,3 will be located prior to the leaf node 4 in the priority queue, so node 4 is the last node to inspect.
    - BBF는 subtree의 구조와는 무관하게 4번 Node가 2,3 번의 상위 Node보다 멀리 있기 때문에 순서가 뒤로 밀리게 되고, 2,3번의 상위 Node를 검사하게 되고, 2,3 번 영역이 역시 4번 Node보다 가깝기 때문에 결국 4번 leaf node를 마지막에 검사하게 된다.

Experiment
- D = 12, N = 100,000
- 2**12 = There are 4096 bins => average 100,000/4,096 points per bin (24.41... points per bin)

Randomized KD-Trees (Tree "s")

Original Paper
- Optimised KD-trees for fast image descriptor matching (2008)
Strategy
- Randomized Mulitple Tree
- Randomized Tree
  - What "Random"ized means?
    
    Several variations in building a tree, including randomisation in selecting the partitioning value, were suggested. Later
  - Random Direction(Random axis)
Issuses to solve
- Even though increasing Searched nodes, error is not decreasing linearly
  - Searched Nodes are dependent on each other
  - Less productive
    On the other hand, if KD-trees are built with different parameters, with different ways to select partitioning value for example, the order of search node and search results on these KD-trees may be different. This leads to the idea of using multiple search-trees to enhance independence.



In [ ]: