In [1]:
# %load ../../preconfig.py
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['axes.grid'] = False
from __future__ import division
#import numpy as np
#import pandas as pd
#import itertools
import logging
logger = logging.getLogger()
B-Trees are balanced search trees, similar to red-black trees, but they are better at minizing disk I/O operations.
We use the number of pages read or written as a approximation of the total time spent accessing the disk.
A B-tree node is usually as large as a whole disk page, and this size limits the number of children a B-tree node can have.
In [2]:
plt.imshow(plt.imread('./res/fig18_3.png'))
Out[2]:
node attributes:
Each internal node has $x.n+1$ points $x.c_1, \dotsc, x.c_{x.n+1}$ to its children.
$x.c_i.keys \leq x.key_i \leq x.c_{i+1}.keys$
All leaves have the same depth $h$.
minimum degree of the B-tree $T$: a fixed integer $t \geq 2$.
The simplest B-tree occurs when $t=2$: 2-3-4 tree.
If $n \geq 1$, then for any $n$-key B-tree $T$ of height $h$ and minimum degree $t \geq 2$.
$$h \leq \log_t \frac{n+1}{2}$$
Proof: The root of a B-tree $T$ contains at least one key, and all other nodes contain at least $t-1$ keys.
In [3]:
plt.figure(figsize=(15,10))
plt.imshow(plt.imread('./res/fig18_4.png'))
Out[3]:
the number $n$ of keys statisfies the inequality: $$ n \geq 1 + (t-1) \sum_{i=1}^h 2t^{i-1} = 2t^h -1$$
B-trees save a factor of about $\lg t$ over red-black trees in the number of nodes examined for most tree operations.
%maybe: exercise
In [4]:
#todo: code
In [5]:
#todo: code
In [6]:
plt.imshow(plt.imread('./res/fig18_5.png'))
Out[6]:
If $T$ is full tree, then create a new root node, and split them. (Height increase 1).
(b)
B-TREE-INSERT(T,K)
if T is FULL:
S = B-TREE-CREATE()
S.children = T
R = B-TREE-SPLIT-CHILD(S)
else
R = T
B-TREE-INSERT-NONFULL(R,K)
In [7]:
plt.imshow(plt.imread('./res/fig18_6.png'))
Out[7]:
(c)
B-TREE-INSERT-NONFULL(T,K)
if T is LEAF
FIND_AND_INSERT(T,K)
else
C = FIND_CHILD(T,K)
if C is FULL
N = B-TREE-SPLIT-CHILD(C)
R = FIND_CHILD(N,K)
else
R = T
B-TREE-INSERT-NONFULL(R, K)
In [8]:
#todo: exercises
When going down, if the node's child has only $t-1$ keys, move its key down to the child.
If the node to delete is a leaf node, then directly remove it.
If the node to delete is a internal node, then
Base on the three priciples, we summary all possible cases as follows:
In [15]:
plt.figure(figsize=(10,20))
plt.subplot(2,1,1)
plt.imshow(plt.imread('./res/fig18_8.png'))
plt.subplot(2,1,2)
plt.imshow(plt.imread('./res/fig18_9.png'))
Out[15]:
In [16]:
#todo: exercise