CS241 -- Lecture Notes: B-Trees

Daisy Tang

Back To Lectures Notes


This lecture covers Section 10.2 of our text book and here.
Introduction to B-Trees
A B-tree is a tree data structure that keeps data sorted and allows searches, insertions, and deletions in logarithmic amortized time. Unlike self-balancing binary search trees, it is optimized for systems that read and write large blocks of data. It is most commonly used in database and file systems.
The B-Tree Rules
Important properties of a B-tree:
  • B-tree nodes have many more than two children.
  • A B-tree node may contain more than just a single element.

The set formulation of the B-tree rules: Every B-tree depends on a positive constant integer called MINIMUM, which is used to determine how many elements are held in a single node.

  • Rule 1: The root can have as few as one element (or even no elements if it also has no children); every other node has at least MINIMUM elements.
  • Rule 2: The maximum number of elements in a node is twice the value of MINIMUM.
  • Rule 3: The elements of each B-tree node are stored in a partially filled array, sorted from the smallest element (at index 0) to the largest element (at the final used position of the array).
  • Rule 4: The number of subtrees below a nonleaf node is always one more than the number of elements in the node.
    • Subtree 0, subtree 1, ...
  • Rule 5: For any nonleaf node:
    1. An element at index i is greater than all the elements in subtree number i of the node, and
    2. An element at index i is less than all the elements in subtree number i + 1 of the node.
  • Rule 6: Every leaf in a B-tree has the same depth. Thus it ensures that a B-tree avoids  the problem of a unbalanced tree.

 

The Set Class Implementation with B-Trees

Remember that "Every child of a node is also the root of a smaller B-tree".

public class IntBalancedSet implements Cloneable
{
    private static final int MINIMUM = 200;
    private static final int MAXIMUM = 2*MINIMUM;
    int dataCount;
    int[] data = new int[MAXIMUM + 1];
    int childCount;
    IntBalancedSet[] subset = new IntBalancedSet[MAXIMUM + 2];

    // Constructor: initialize an empty set
    public IntBalancedSet()

    // add: add a new element to this set, if the element was already in the set, then there is no change.
    public void add(int element)

    // clone: generate a copy of this set.
    public IntBalancedSet clone()

    // contains: determine whether a particular element is in this set
    pubic boolean contains(int target)

    // remove: remove a specified element from this set
    public boolean remove(int target)
}

 

Searching for a Target in a Set

The psuedocode:

  1. Make a local variable, i, equal to the first index such that data[i] >= target. If there is no such index, then set i equal to dataCount, indicating that none of the elements is greater than or equal to the target.
  2. if (we found the target at data[i])
    return true;
    else if (the root has no children)
    return false;
    else return subset[i].contains(target);

See the following example, try to search for 10.

We can implement a private method:

  • private int firstGE(int target), which returns the first location in the root such that data[x] >= target. If there's no such location, then return value is dataCount.
 

Adding an Element to a B-Tree

It is easier to add a new element to a B-tree if we relax one of the B-tree rules.

Loose addition allows the root node of the B-tree to have MAXIMUM + 1 elements. For example, suppose we want to add 18 to the tree:

The above result is an illegal B-tree. Our plan is to perform a loose addition first, and then fix the root's problem.

The Loose Addition Operation for a B-Tree:

private void looseAdd(int element)
{
    1. i = firstGE(element) // find the first index such that data[i] >= element
    2. if (we found the new element at data[i]) return; // since there's already a copy in the set
    3. else if (the root has no children)
           Add the new element to the root at data[i]. (shift array)
    4. else {
           subset[i].looseAdd(element);
           if the root of subset[i] now has an excess element, then fix that problem before returning.
       }
}

private void fixExcess(int i)
// precondition: (i < childCount) and the entire B-tree is valid except that subset[i] has MAXIMUM + 1 elements.
// postcondition: the tree is rearranged to satisfy the loose addition rule

Fixing a Child with an Excess Element:

  • To fix a child with MAXIMIM + 1 elements, the child node is split into two nodes that each contain MINIMUM elements. This leaves one extra element, which is passed up to the parent.
  • It is always the middle element of the split node that moves upward.
  • The parent of the split node gains one additional child and one additional element.
  • The children of the split node have been equally distributed between the two smaller nodes.

 

 

Fixing the Root with an Excess Element:

  • Create a new root.
  • fixExcess(0).

 

Removing an Element from a B-Tree

Loose removal rule: Loose removal allows to leave a root that has one element too few.

public boolean remove(int target)
{
    answer = looseRemove(target);
    if ((dataCount == 0) && (childCount == 1))
        Fix the root of the entire tree so that it no longer has zero elements;
    return answer;
}

private boolean looseRemove(int target)
{
1. i = firstGE(target)
2. Deal with one of these four possibilities:
   2a. if (root has no children and target not found) return false.
   2b. if( root has no children but target found) {
           remove the target
           return true
       }
   2c. if (root has children and target not found) {
           answer = subset[i].looseRemove(target)
           if (subset[i].dataCount < MINIMUM)
               fixShortage(i)
           return true
       }
   2d. if (root has children and target found) {
           data[i] = subset[i].removeBiggest()
           if (subset[i].dataCount < MINIMUM)
               fixShortage(i)
           return true
       }
}

private void fixShortage(int i)
// Precondition: (i < childCount) and the entire B-tree is valid except that subset[i] has MINIMUM - 1 elements.
// Postcondition: problem fixed based on the looseRemoval rule.

private int removeBiggest()
// Precondition: (dataCount > 0) and this entire B-tree is valid
// Postcondition: the largest element in this set has been removed and returned. The entire B-tree is still valid based on the looseRemoval rule.

Fixing Shortage in a Child:

When fixShortage(i) is activated, we know that subset[i] has MINIMUM - 1 elements. There are four cases that we need to consider:

Case 1: Transfer an extra element from subset[i-1]. Suppose subset[i-1] has more than the MINIMUM number of elements.

  1. Transfer data[i-1] down to the front of subset[i].data.
  2. Transfer the final element of subset[i-1].data up to replace data[i-1].
  3. If subset[i-1] has children, transfer the final child of subset[i-1] over to the front of subset[i].

Case 2: Transfer an extra element from subset[i+1]. Suppose subset[i+1] has more than the MINIMUM number of elements.

Case 3: Combine subset[i] with subset[i-1]. Suppose subset[i-1] has only MINIMUM elements.

  1. Transfer data[i-1] down to the end of subset[i-1].data.
  2. Transfer all the elements and children from subset[i] to the end of subset[i-1].
  3. Disconnect the node subset[i] from the B-tree by shifting subset[i+1], subset[i+2] and so on leftward.

Case 4: Combine subset[i] with subset[i+1]. Suppose subset[i+1] has only MINIMUM elements.

We may need to continue activating fixShortage() until the B-tree rules are satisfied.

Removing the Biggest Element from a B-Tree:

private int removeBiggest()
{
    if (root has no children)
        remove and return the last element
    else {
        answer = subset[childCount-1].removeBiggest()
        if (subset[childCount-1].dataCount < MINIMUM)
            fixShortage(childCount-1)
        return answer
    }
}

A more concrete example for node deletion:

 

2-3 Tree (applet)
A 2-3 tree is a type of B-tree where every node with children (internal node) has either two children and one data element (2-nodes) or three children and two data elements (3-node). Leaf nodes have no children and one or two data elements.

Practice: insert the following values in sequence to a 2-3 tree: 50, 19, 21, 66, 84, 29, and 54.

Practice: now, delete 50, 66, 54 from the above B-tree.

Take a look at this 2-3 tree animation.

Exercises:

Build a 2-3 tree with the following ten values: 10, 9, 8, ..., and 1. Delete 10, 8 and 6 from the tree.

Trees -- Time Analysis
The implementation of a B-tree is efficient since the depth of the tree is kept small.

Worst-case times for tree operations: the worst-case time performance for the following operations are all O(d), where d is the depth of the tree:

  1. Adding an element to a binary search tree (BST), a heap, or a B-tree.
  2. Removing an element from a BST, a heap, or a B-tree.
  3. Searching for a specified element in a BST or a B-tree.

Time Analysis for BST

Suppose a BST has n elements. What is the maximum depth the tree could have?

  • A BST with n elements could have a depth as big as n-1.

Worst-Case Times for BSTs:

  • Adding an element, removing an element, or searching for an element in a BST with n elements is O(n).

Time Analysis for Heaps

Remember that a heap is a complete BST, so each level must be full before proceeding to the next level.

Number of nodes needed for a heap to reach depth d is: (1 + 2 + 4 + 8 + ... + 2d-1) + 1 = 2d = n. Thus d = log2n.

Worst-Case Times for Heap Operations:

  • Adding or removing an element in a heap with n elements is O(log n).

Time Analysis for B-Tree

Suppose a B-tree has n elements and M is the maximum number of children a node can have. What is the maximum depth the tree could have? What is the minimum depth the tree could have?

  • The worst-case depth (maximum depth) of a B-tree is: logM/2 n.
  • The best-case depth (minimum depth) of a B-tree is: logM n.

Worst-Case Times for B-Trees:

  • Adding or removing an element in a B-tree with n elements is O(log n).
Learning Objectives
When you complete this section, you will be able to:
  • list the rules for a B-tree and determine whether a tree satisfies these rules.
  • do a simulation by hand of the algorithms for searching, inserting, and removing an element from a B-tree.
  • use the B-tree data structure to implement a set class.
  • use Java's DefaultMutableTreeNode and JTree classes in simple programs that use trees.
  • understand the complexities of each operation.

Last updated: Oct. 2012