PostgreSQL Index with B+Tree

Published in

Jongho’s Tech Blog

7 min readJul 31, 2021

PostgreSQL Index는 보통 B+Tree 타입을 사용한다. (MySQL — InnoDB engine도 동일하다) 이 B+Tree에 대한 PostgreSQL의 공식 문서를 읽어보고 이해해본다. RDBMS Index는 실무에서 필수적으로 사용하는만큼, 자료 구조까지는 이해해둘 필요가 있다.

Introduction

아래는 PostgreSQL과 MySQL의 인덱스 타입에 대한 문서들이다.

11.2. Index Types

PostgreSQL provides several index types: B-tree, Hash, GiST, SP-GiST, GIN and BRIN. Each index type uses a different…

www.postgresql.org

MySQL :: MySQL 8.0 Reference Manual :: 8.3.1 How MySQL Uses Indexes

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first…

dev.mysql.com

본 포스트에서 참고할 문서는 아래와 같다.

63.4. Implementation

This section covers B-Tree index implementation details that may be of use to advanced users. See…

www.postgresql.org

68.6. Database Page Layout

This section provides an overview of the page format used within PostgreSQL tables and indexes. Sequences and TOAST…

www.postgresql.org

B-tree - Wikipedia

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches…

en.wikipedia.org

B+ tree - Wikipedia

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root…

en.wikipedia.org

본 포스트의 본문에 앞서 독자가 B-Tree, B+Tree를 이미 이해하고 있다고 가정한다. Tree에서 다루는 key-value 중 key는 indexing한 column value에 해당하고, value는 pointer to the table row에 해당한다.

B-Tree에 대해서는 아래 포스트가 참고가 될 듯하다.

Data Structure - B-tree

B 트리는 자식을 두개만 가질 수 잇는 이진 트리를 확장하여 더 많은 자식을 가질 수 있게 고안한 것이다. 오라클과 같은 상용 DB에서 많이 사용하는 자료구조로 외부 검색(주 저장장치인 메모리 외의 저장장치에서의…

hochulshin.com

B+Tree for the index in PostgreSQL

공식 문서를 한 번 읽어보자. 아래 포인트들을 짚어보았다.

Tree는 Multi-level로 이루어짐
각 level은 page들의 double linked list임
index마다 하나의 metapage가 존재하고, 나머지는 internal page or leaf page임
각 leaf page는 테이블 row를 가리키는 포인터 튜플들을 가짐
각 internal page는 child node를 가리키는 포인터 튜플들을 가짐
page의 99%는 leaf page임
여기서 말하는 page는 아래 문서에 서술된 page format을 의미함

68.6. Database Page Layout

This section provides an overview of the page format used within PostgreSQL tables and indexes. Sequences and TOAST…

www.postgresql.org

각 page의 사이즈 (각 page가 가지는 item의 개수?)도 위 문서를 참고하면 된다.

Add new tuples to the index

새로운 tuple에 대하여 기존 leaf page에 추가할 공간이 없을 경우, 새로운 leaf page를 추가한다. page split 명령은 기존 page에 있던 item들 중 일부를 새로운 page로 옮기는 작업이다. 당연히 parent page에 link도 추가해야 하며, 이로 인하여 순차적으로 parent page도 page split이 일어날 수 있다.

즉 page split은 재귀적으로 상향식으로 cascading한다. 마침내 root page가 item을 추가할 공간이 없을 경우 root page split이 수행된다. 이를 통해 root page의 parent page로 새로운 root page를 추가하게 된다.

Why B+Tree for the index

왜 B+Tree를 index 구현에 많이 사용할까? 아래는 B+Tree wikipedia에서 인용한 내용이다

The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context — in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have very high fanout (number of pointers to child nodes in a node,[1] typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.

binary search tree와 비교할 때, B+Tree는 여러 개의 child node를 가지므로 Filesystem에 발생하는 I/O 빈도를 낮추는 이점을 가진다.

Etc

B+Tree를 관리하다보면 duplication이 자주 발생하므로, deduplication에 관한 내용도 자세히 서술되어 있었다. 내용이 꽤 많으므로 본 포스트에서 거기까지 다루지는 않겠다.

여담

Index 관련하여 기억에 남는 것들

MySQL에는 특별한 index로서 (PK를 key로 하는) Clustered index가 있다. 이는 MySQL 테이블의 row data 자체의 저장 방법이다.

MySQL :: MySQL 8.0 Reference Manual :: 15.6.2.1 Clustered and Secondary Indexes

Each InnoDB table has a special index called the clustered index that stores row data. Typically, the clustered index…

dev.mysql.com

Hash로 구현된 index는 당연하지만 등호 비교만 지원한다. Hash 특성상 부등호 비교를 지원할 수 없다.

PostgreSQL Index with B+Tree

Introduction

11.2. Index Types

PostgreSQL provides several index types: B-tree, Hash, GiST, SP-GiST, GIN and BRIN. Each index type uses a different…

MySQL :: MySQL 8.0 Reference Manual :: 8.3.1 How MySQL Uses Indexes

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first…

63.4. Implementation

This section covers B-Tree index implementation details that may be of use to advanced users. See…

68.6. Database Page Layout

This section provides an overview of the page format used within PostgreSQL tables and indexes. Sequences and TOAST…

B-tree - Wikipedia

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches…

B+ tree - Wikipedia

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root…

Data Structure - B-tree

B 트리는 자식을 두개만 가질 수 잇는 이진 트리를 확장하여 더 많은 자식을 가질 수 있게 고안한 것이다. 오라클과 같은 상용 DB에서 많이 사용하는 자료구조로 외부 검색(주 저장장치인 메모리 외의 저장장치에서의…

B+Tree for the index in PostgreSQL

68.6. Database Page Layout

This section provides an overview of the page format used within PostgreSQL tables and indexes. Sequences and TOAST…

Add new tuples to the index

Why B+Tree for the index

Etc

여담

MySQL :: MySQL 8.0 Reference Manual :: 15.6.2.1 Clustered and Secondary Indexes

Each InnoDB table has a special index called the clustered index that stores row data. Typically, the clustered index…

11.2. Index Types

PostgreSQL provides several index types: B-tree, Hash, GiST, SP-GiST, GIN and BRIN. Each index type uses a different…

Written by Jongho Jeon