last updated 11/6/12

An early technology attempting fast retrieval of individual records and maintained sorted order. Much knowledge of how the data is mapped to the physical data storage device.

ISAM files are generally an integrated storage structure

- Clustered
- Each index entry contains an entire row
- A node contains F keys and F+1 pointers

This node structure is a generalization of a binary search tree node.

Separator entry = **( k_{i }, p_{i})** where

**k _{i }**separates set of search key values in the two subtrees pointed
at by

**k _{i-1 }**<= (all entries in

** F** is determined by the index (or primary key) size of the data,

If

Example B=4096 and S=30, then F = _{⌊}4092/34_{⌋} = 120. Note this is the log base! A node holds up to 120 key value boundaries.

Supports equality and range searches, multiple attribute keys and partial key searches

Either a secondary index (sometimes in a separate file) or the basis for an integrated storage structure

Responds to dynamic changes in the table.

- Growth inserts into non-leaf nodes and if the tree needs to be larger, it will create a new root.
- If a leaf node is full, it splits.

Leaf level nodes consist of a **sorted** linked list of index entries

- all data tuples are represented by the index entry.
- range searches are accommodated, not just specific lookup
- sometimes the whole tuple is stored (the number of entries may then be smaller in a leaf node)
- the pointer may reference leaf nodes in another b-tree! (non-clustered, slower when sequential access is used)

Sibling pointers support range searches in spite of allocation and deallocation of leaf pages (note that leaf pages might not be physically contiguous)

Structure of tree changes to handle row insertion and deletion - *no*
overflow chains ever

Tree remains balanced: all paths from root to index entries have same length

Algorithm guarantees that the number of separator entries in an index page
is between * _{⌊}F / 2_{⌋}* and F (at least 50% full, exception is root which may have as few as one)

- Hence the maximum search cost is
*(**log*_{ F/2}Q) + 1) - where Q is the size of the dataset
- So in our case where F was calculated as 120, then we have nodes containing 60 to 120 key value boundaries. If on average the node is midrange at 90 values, then a 4 level B+ tree has capacity of 90
^{4}=65.6 million records.

In this case the insertion can just fill in a hole in the leaf node .

Insert “vera”: Since there is no room in leaf page:

- Create new leaf page, C
- Split index entries between B and C (but maintain sorted order)
- Add separator entry at parent level

Insert “rob”. Since there is no room in leaf page A:

- Split A into A1 and A2 and divide index entries between the two (but maintain sorted order)
- Split D into D1 and D2 to make room for additional pointer
- Three separators are needed: “sol”, “tom” and “vince”

When splitting a separator page, push a separator up

Repeat process at next level

Height of tree increases by one

Deletion can cause page to have fewer than F */2*
entries

- Entries can be redistributed over adjacent pages to maintain minimum occupancy requirement
- Ultimately, adjacent pages must be merged, and if merge propagates up the tree, height might be reduced

In practice, tables generally grow, and merge algorithm is often not implemented

- Reconstruct tree to compact it