[MySQL] 3. MySQL Index
What is an Index?
- Index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.
Index Types
By Data Structure
Hash Index
- Hash Index is based on a hash table data structure. It uses a hash function to map keys to specific locations in the hash table, allowing for very fast data retrieval.
- Algorithm Complexity: O(1)
- Advantages:
- Very fast for equality searches (e.g.,
=
).
- Disadvantages:
- Not suitable for range queries (e.g.,
<
, >
, BETWEEN
). - Hash collisions can occur, leading to performance degradation.
why not support order by
- let us see the algorithm of hash index

- We see that hash index which is k-v structure. So v is not sorted, so it can not support
order by
.
BTree(B-Tree) Index
why not AVL or Red-Black Tree
- As to Tree Index, we can think about balanced search tree like AVL tree or Red-Black tree
- Algorithm Complexity: O(log n)

- Obviously, they are binary balance search tree which is not suitable for database index because:
- High tree height: Binary trees can become tall, leading to increased search times.
- Frequent rebalancing: Insertions and deletions often require tree rotations to maintain balance, which can be costly in terms of performance.
- Poor disk I/O performance: Binary trees do not take advantage of spatial locality, leading to inefficient disk access patterns.
- Disk I/O cost much more than memory access.
Why BTree
- So, to reduce the tree height and disk I/O, BTree comes out naturally.
- Firstly, multi-way tree is used to reduce the tree height.
- Secondly, each node contains entire disk page data to reduce the disk I/O.

- But we still have some problems:
- When we insert or delete data, the tree may become unbalanced.
- It is hard to search in order that we need to use in-order traversal, much cost, to get sorted data.
- So, B+Tree comes out.
B+Tree Index
- Firstly, B+Tree reduce the tree node size by only storing keys in internal nodes, which allows more keys to fit in memory and reduces tree height.
- no value means more keys in one node => less tree height => less disk I/O
- Secondly, B+Tree stores all actual data in leaf nodes, which are linked together in a linked list, so that range queries more efficient.

By Usage
Primary Key Index
- Primary key index is automatically created for primary key columns. If there is no primary key, MySQL will create one.
- Ensures uniqueness and fast access to records, because only one action can get the entire record.
- Primary Key is B+Tree

Composite Index
- We know MySQL support composite index which is index on multiple columns.

Secondary Index (Non-Clustered Index)
- Secondary index is also called non-clustered index, which is different from clustered index (primary key index).
- Key Difference: Secondary index leaf nodes only store index columns + primary key values, not the complete row data.
- Index Structure: B+Tree structure, but leaf nodes contain references to primary key instead of full row data.

Covering Index vs Table Lookup
Covering Index: When query fields are all included in the secondary index, no table lookup needed.
1
2
| -- Index: idx_AB (A, B)
SELECT A, B FROM table WHERE A = 1; -- ✅ Covering index, no lookup
|
Table Lookup: When query needs fields not in the secondary index, must lookup primary key.
1
2
| -- Index: idx_AB (A, B), but need field C
SELECT A, B, C FROM table WHERE A = 1; -- ❌ Need table lookup for field C
|
Table Lookup Process
- Search Secondary Index: Find matching records, get primary key values
- Lookup Primary Index: Use primary key to fetch complete row data from clustered index
- Return Results: Combine data from both indexes
This is why SELECT *
often requires table lookup, while selecting only indexed columns can avoid it.
Full-Text Index
- Full-Text index is designed to solve finding words in large text problems using inverted index structure.

Key Differences from B+Tree Index
Feature | B+Tree Index | Full-Text Index |
---|
Data Structure | Balanced tree | Inverted index (word → documents) |
Storage | Complete row data in leaves | Word-to-document mappings |
Query Type | Exact match, range queries | Text search, relevance ranking |
Complexity | O(log n) | Depends on word frequency |
Inverted Index Mechanism
- Tokenization: Split text into individual words
- Word Mapping: Create word → document list mappings
- Search Process: Find documents containing query words
- Intersection: Combine results for multi-word queries
Supported Data Types
Create Table Grammar
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
| -- Single column full-text index
CREATE TABLE articles (
id INT PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(255),
content TEXT,
FULLTEXT KEY ft_content (content)
) ENGINE=InnoDB;
-- Multi-column full-text index
CREATE TABLE articles (
id INT PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(255),
content TEXT,
FULLTEXT KEY ft_title_content (title, content)
) ENGINE=InnoDB;
|
Search Modes
Natural Language Mode (Default)
1
2
3
4
5
6
7
8
| SELECT * FROM articles
WHERE MATCH(title, content) AGAINST('MySQL optimization');
-- With relevance score
SELECT *, MATCH(title, content) AGAINST('MySQL optimization') as score
FROM articles
WHERE MATCH(title, content) AGAINST('MySQL optimization')
ORDER BY score DESC;
|
Boolean Mode
1
2
3
4
5
6
7
8
9
10
11
| -- Must contain "MySQL", must not contain "Oracle"
SELECT * FROM articles
WHERE MATCH(title, content) AGAINST('+MySQL -Oracle' IN BOOLEAN MODE);
-- Exact phrase search
SELECT * FROM articles
WHERE MATCH(title, content) AGAINST('"index optimization"' IN BOOLEAN MODE);
-- Wildcard search
SELECT * FROM articles
WHERE MATCH(title, content) AGAINST('optim*' IN BOOLEAN MODE);
|
Advanced Index Optimization Concepts
Let’s analyze various SQL queries to understand Covering Index, Index Condition Pushdown (ICP), and Key Lookup (回表) behaviors.
Table Structure and Index
1
2
3
4
5
6
7
8
9
10
| CREATE TABLE articles (
id INT PRIMARY KEY AUTO_INCREMENT,
A BIGINT NOT NULL,
B BIGINT NOT NULL,
C BIGINT NOT NULL,
D BIGINT NOT NULL
) ENGINE=InnoDB;
-- Composite index on columns A, B, C
CREATE INDEX idx_articles_query ON articles (A, B, C);
|
Index Structure: idx_articles_query (A, B, C)
(secondary index) + Primary Key id
(automatically included in secondary indexes)
Covering Index
- Covering Index: When all query columns are included in the index, eliminating the need for table lookup.
- So, if the query need a table lookup, it must not be a
Covering Index
Not Covering Index: table lookup for other columns return
1
2
3
4
5
6
| SELECT * FROM articles WHERE A = 1;
SELECT * FROM articles WHERE A = 1 AND B > 1;
SELECT * FROM articles WHERE A = 1 AND B > 1 AND C = 1;
SELECT * FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
SELECT A, B, C, D FROM articles WHERE A = 1 AND B > 1 AND C = 1;
SELECT A, B, C, D FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
|
Not Covering Index: table lookup for other columns as conditions
1
2
3
| SELECT * FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
SELECT A, B, C FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
SELECT A, B, C, D FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
|
Covering Index success
1
| SELECT A, B, C FROM articles WHERE A = 1 AND B > 1 AND C = 1;
|
Index Condition Pushdown
ICP: Pushes WHERE conditions that can use the index down to the storage engine level, reducing table lookups.
Not ICP : Just one condition
1
| SELECT * FROM articles WHERE A = 1;
|
ICP Success
even skip one index column
1
| SELECT * FROM articles WHERE A = 1 AND C > 1;
|
even has other index column
1
2
3
| SELECT * FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
SELECT A, B, C FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
SELECT A, B, C, D FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
|
normal
1
2
3
4
| SELECT * FROM articles WHERE A = 1 AND B > 1;
SELECT * FROM articles WHERE A = 1 AND B > 1 AND C = 1;
SELECT A, B, C FROM articles WHERE A = 1 AND B > 1 AND C = 1;
SELECT A, B, C, D FROM articles WHERE A = 1 AND B > 1 AND C = 1;
|
Key Lookup
- Secondary index only save
id
that key lookup will occur if there has other column. - Key Lookup is the action we
need to avoid
. Because it cost more.
Key Lookup : return more columns
1
2
3
4
5
6
| SELECT * FROM articles WHERE A = 1;
SELECT * FROM articles WHERE A = 1 AND B > 1;
SELECT * FROM articles WHERE A = 1 AND B > 1 AND C = 1;
SELECT * FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
SELECT A, B, C, D FROM articles WHERE A = 1 AND B > 1 AND C = 1;
SELECT A, B, C, D FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
|
Key Lookup : conditions contain more column
1
2
3
| SELECT * FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
SELECT A, B, C FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
SELECT A, B, C, D FROM articles WHERE A = 1 AND B > 1 AND C = 1 AND D = 1;
|
Not Key Lookup(Good)
1
| SELECT A, B, C FROM articles WHERE A = 1 AND B > 1 AND C = 1;
|
id contains in secondary index leaf
1
| SELECT id, A, B, C FROM articles WHERE A = 1 AND B > 1 AND C = 1;
|
Index Usage Experience
When to Use Hash Index
- Just search
one
picture. - The column content nearly random or unrelated, such as UUID.
Filter rate is the point for index usage
- If reusable, index should be less.
- The index design is point to more filter rate.
Classic Question: (id, begin, end) vs (id, end, begin)
user_id | begin | end |
---|
~ | ~ | ~ |
7 | 1234567890000 | 1234567890100 |
7 | 1234567891000 | 1234567891100 |
7 | 1234567892000 | 1234567892100 |
7 | 1234567893000 | 1234567893100 |
7 | 1234567894000 | 1234567894100 |
7 | 1234567895000 | 1234567895100 |
7 | 1234567896000 | 1234567896100 |
7 | 1234567897000 | 1234567897100 |
7 | 1234567898000 | 1234567898100 |
~ | ~ | ~ |
find the actives coupons.
- Let us analysis if we search the below SQL.
- SELECT * FROM
table
WHERE id = 7 AND begin < 1234567894000 AND end > 1234567894000
Index Condition Pushdown
can be used for this query.- Only
first two column index
can be used which is (id, begin) in (id, begin, end) or (id, end) in (id, end, begin). - But, as to (id, begin) part, begin > 1234567894000 always filter less data, which the time always be
now
.- Because there are always data begin less than now.
- As to (id, end) part, end > 1234567894000 always filter more data, which time always be
now
.
So the better one is (id, end, begin)
Classic Question: types index
nearly average like computer types
- INDEX idx_computer_type(type)
- In this situation, when we use type = 1, just nearly 20% filter rate.
not average like payment status
- INDEX idx_payment_status(status)
- In this situation, almost payment will be status = final.
But
we always query the status != final which just occupy a little data like less than 1% (nearly more than 99% filter rate).
Data Size VS Index Size
- People always think that if data is just a little, there is no need to create index.
- When data is not large, it does not matter if we have many indexes.
- Because indexes will not use much storage.
- When data is large, it does not matter if we have many indexes.
- Because indexes will be important for us to query quickly.
Do not modify index column frequently
- We know the index is balanced and in order. If we modify the index column value, the B+Tree need to rebalance itself which cost much.