About Data Storage in Databases

Vivek Gupta
3 min readJun 7, 2023

Before getting into other things, let’s understand what is a binary tree?

Binary Tree

Binary Tree is defined as a tree data structure where each node has at most 2 children. Since each element in a binary tree can have only 2 children, we typically name them the left and right child.

A Binary tree is represented by a pointer to the topmost node (commonly known as the “root”) of the tree. If the tree is empty, then the value of the root is NULL. Each node of a Binary Tree contains the following parts:

  1. Data
  2. Pointer to left child
  3. Pointer to right child

Binary Search Tree is a node-based binary tree data structure which has the following properties:

  • The left subtree of a node contains only nodes with keys lesser than the node’s key.
  • The right subtree of a node contains only nodes with keys greater than the node’s key.
  • The left and right subtree each must also be a binary search tree.

B+ Tree & Indexing

B+ tree file organisation is the advanced method of an indexed sequential access method. It uses a tree structure to store records in a file.

  • A B+ tree file consist of a data file which contains LRECs and an index file which contains TLRECs.
  • The tree index file, which consists of blocks (nodes), is maintained internally by the TPFDF product. The fiel has its own ID, DSECT and DBDEF.
  • The prime block of the B+ tree index file (root node) is pointed to > by the header in the prime block of the B+ tree data file.

Indexing types:

  • Primary Index: For Ordered data file on the key field (Primary key usually).
  • Secondary Index: Generated from a field which is a candidate key and has a unique value in every record or a non-key with duplicate values.
  • Clustering Index: Defined on data file ordered on a non-key field.

Indexing Techniques:

  1. In the dense index, there is an index record for every search key value in the database. This makes the search fast but we need extra memory to store index file record.
  2. In the sparse index, index records are not created for every search key. An index record contains a search key and an actual pointer to the data on disk. If data is not found according to index record then sequential search starts.
  3. In multilevel index, index records comprise search-key values and data pointers. The size of index record grows proportional to DB size. If single index is used, then a large size index cannot be kept in memory which leads to multiple disk accesses.
    Multilevel index helps in breaking down the index into several smaller indices in order to make the outermost level so small that it can be saved in a single disk block.

B+ Tree: Multilevel Indexing

It is a BST which follows multi level index format. It is used for large files that have unusual, unknowing, or changing distributions because it reduces I/O processing when files are read.

The leaf nodes of the tree denote the actual data pointers. Since it’s a BST, all the leaf nodes remain at same height. Moreover, the nodes are linked using linked list hence support random & sequential access.

  • Internal (non-leaf) nodes contain at least [n/2] pointers, except the root node.
  • At most, an internal node can contain n pointers.
  • Leaf nodes contain at least [n/2] record pointers ans [n/2] key values.
  • At most, a leaf node can contain n record pointers and n key values.
  • Every leaf node contains one block pointer P to point to next leaf node and forms a linked list.

Also checkout my law blog here: https://qr.ae/pyRsLp

--

--