MySQL Performance Tuning: From Indexing to Architectural Scaling
This technical document outlines a multi-layered approach to MySQL optimization, ranging from fine-grained query tuning to macro-level architectural strategies.
1. Indexing Strategy: The Foundation of Performance
Optimization begins at the storage engine level. In InnoDB, the goal is to minimize disk I/O by reducing the number of pages read from the B+Tree.
1.1 Covering Indexes
The most effective way to eliminate “Back-to-Table” (Look-up) operations. By including all columns required by a query in the index itself, MySQL can return results directly from the secondary index leaf nodes.
- Strategy: Identify high-frequency queries and create composite indexes that satisfy the
SELECTclause.
1.2 The Leftmost Prefix Rule
Composite indexes (a, b, c) are only effective if the query filters by a, or a and b, etc.
- Note: If you filter by
bandcwithouta, the index is skipped. - Optimization: Order columns in the index based on cardinality (highest uniqueness first) to filter out the maximum number of rows early.
1.3 Avoiding Index Failure
Indexing is bypassed if columns are wrapped in functions or operators.
- Bad:
WHERE YEAR(create_time) = 2026 - Good:
WHERE create_time >= '2026-01-01 00:00:00'
2. Analyzing Execution Plans (EXPLAIN)
The EXPLAIN command (or EXPLAIN ANALYZE in MySQL 8.0+) is the primary diagnostic tool for SQL bottlenecks.
2.1 Critical “Type” Field
The type column indicates how MySQL joins tables or scans data. Rank from best to worst:
- system/const: Primary key or unique index match (O(1)).
- eq_ref: Primary key used in a JOIN.
- ref: Non-unique index match.
- range: Index range scan (e.g.,
BETWEEN,>,IN). - index: Full Index Scan (scanning the whole index tree).
- ALL: Full Table Scan (Performance killer).
2.2 Key “Extra” Indicators
Watch for these red flags in the Extra column:
- Using filesort: MySQL must perform an extra pass to find out how to retrieve the rows in sorted order. Fix: Add an index on the
ORDER BYcolumns. - Using temporary: MySQL creates an intermediate table to hold results (common in
GROUP BY). Fix: Ensure grouping columns are indexed. - Using index: Indicates a covering index is used (Optimal).
3. Read/Write Splitting
When vertical scaling (upgrading CPU/RAM) reaches its limit, horizontal scaling via Read/Write splitting is the next step.
3.1 Master-Slave Replication
- Master Node: Handles all
INSERT,UPDATE,DELETEoperations. - Slave Nodes: Replicate data from the Master to handle
SELECTqueries.
3.2 Implementation Strategy
- Middleware Approach: Use tools like ProxySQL or MyCat to automatically route traffic based on the SQL verb.
- Application Level: Configure two data sources in the application (e.g., via Spring AbstractRoutingDataSource) to manage routing manually.
Challenge: Replication Lag. Applications must be designed to handle cases where a “Read-after-Write” might return stale data from a slave.
4. Database Sharding (Horizontal Partitioning)
Sharding is used when a single table’s volume (typically >20M rows or >100GB) degrades B+Tree depth and maintenance becomes unmanageable.
4.1 Sharding Dimensions
- Vertical Sharding: Splitting a “wide” table with many columns into multiple tables based on business modules (e.g.,
user_basevs.user_profile). - Horizontal Sharding (Sharding): Splitting rows across multiple databases based on a Sharding Key (e.g.,
user_id % 64).
4.2 Sharding Challenges
- Distributed Transactions: Ensuring ACID properties across multiple DB instances (often solved by using Saga patterns or TCC).
- Cross-Shard Joins: Joins become impossible across shards. Data must be denormalized or aggregated at the application layer.
- Global IDs: Standard auto-increment IDs will collide. Use Snowflake IDs or UUIDs.
5. Summary Checklist for Tuning
| Level | Action |
|---|---|
| SQL | Eliminate SELECT *, use LIMIT, avoid functions on indexed columns. |
| Index | Use covering indexes, minimize “back-to-table” lookups. |
| Schema | Use appropriate data types (e.g., INT over VARCHAR for IDs). |
| Arch | Implement Redis caching for hot data; use Read/Write splitting for high QPS. |