TL;DR
- Indexes = direct lookups — milliseconds vs full table scans (seconds).
-
B-tree for most queries — Supports
=,<,>,BETWEEN,LIKE 'prefix%',ORDER BY. -
Index
WHERE/JOIN/ORDER BYcolumns — Otherwise full scan. -
Composite index order matters —
(a, b, c)works fora,a+b,a+b+c— notborcalone. Equality first, then range. -
Partial indexes —
WHERE active = true= smaller, faster. -
Covering indexes with
INCLUDE— Query never touches table. -
Maintenance — Drop unused (
idx_scan = 0), remove duplicates,REINDEX,ANALYZE. -
Common mistakes — Indexing everything, wrong column order, low-selectivity columns (booleans), functions like
YEAR(date) =, skipping maintenance.
Database indexes are the most powerful tool for query optimization. Without indexes, databases scan entire tables to find matching rows. With proper indexes, databases locate data directly. The difference between scanning millions of rows and looking up a handful determines whether queries take seconds or milliseconds.
How Database Indexes Work
Indexes are auxiliary data structures that enable fast data lookup. Think of a book's index: instead of reading every page to find a topic, you look up the topic in the index and go directly to the relevant pages.
| Scenario | Operation | Performance Impact |
|---|---|---|
| Without index | Full table scan — every row read and evaluated | For a million-row table: reads 1 million rows |
| With index | B-tree traversal + fetch matching rows | Traverses 3–4 levels, each = one disk read |
With an index, databases perform index lookups. Finding matching rows requires only the tree traversal plus fetching the actual rows. For queries returning few rows, this is orders of magnitude faster.
Indexes have costs. They consume storage space. They slow down INSERT, UPDATE, and DELETE operations because indexes must be maintained. Index too much, and write performance suffers.
Query Planner Decision Factors:
- Estimates costs for different strategies
- Chooses the cheapest execution plan
- Sometimes full scans are faster than index lookups — particularly when queries return most rows
Types of Indexes
B-tree indexes suit equality and range queries. They efficiently handle =, <, >, BETWEEN, and LIKE 'prefix%' conditions. Most databases use B-tree as the default index type.
-- B-tree index (default)
CREATE INDEX idx_users_email ON users(email);
-- Supports these queries efficiently
SELECT * FROM users WHERE email = 'user@example.com';
SELECT * FROM users WHERE email LIKE 'user%';
SELECT * FROM users WHERE email BETWEEN 'a%' AND 'm%';
Hash indexes provide fast equality lookups only. They don't support range queries. Some databases use hash indexes for specific use cases.
GIN indexes (Generalized Inverted Index) suit full-text search and array columns. PostgreSQL uses GIN for JSONB containment queries and full-text search.
-- GIN index for JSONB
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
-- Supports JSONB containment queries
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';
GiST indexes (Generalized Search Tree) suit geometric data and range types. PostGIS uses GiST for spatial queries.
Partial indexes index only a subset of rows — useful when queries consistently filter on a condition.
-- Partial index for active users only
CREATE INDEX idx_active_users ON users(email) WHERE active = true;
Expression indexes index computed expressions rather than raw columns.
-- Index on lowercase email for case-insensitive matching
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
Choosing What to Index
Index these:
- Columns used in WHERE clauses — If queries filter on a column, it likely needs an index
- Columns used in JOIN conditions — Joining tables on unindexed columns requires scanning
- Columns used in ORDER BY — Indexes can provide pre-sorted data, eliminating sort operations
Analyze query patterns before indexing. Look at actual queries your application runs. Slow query logs reveal what needs optimization.
Selectivity Guidelines:
| Column Type | Selectivity | Index Benefit |
|---|---|---|
| Highly selective (many unique values) | High | Benefits greatly from indexing |
| Low selectivity (few unique values, e.g., boolean) | Low | Rarely benefits from indexing |
- Primary keys — Automatically indexed
- Foreign keys — Often need explicit indexes for join performance
Don't index everything. Each index adds write overhead and storage. Index strategically based on query patterns.
Use EXPLAIN to verify index usage. Query plans show whether indexes are used and how queries execute.
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
-- Look for "Index Scan" vs "Seq Scan" in output
Composite Index Strategy
Composite indexes cover multiple columns. A single index on (a, b, c) can support queries filtering on a, or a and b, or a and b and c.
Column order matters critically. An index on (a, b) supports queries on a alone, but not queries on b alone.
-- Composite index
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
-- Efficiently supports:
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2025-01-01';
-- Does NOT efficiently support:
SELECT * FROM orders WHERE order_date > '2025-01-01'; -- Can't use leading column
Put equality conditions before range conditions. In an index (a, b), if queries use a = value AND b > value, this works well. If queries use a > value AND b = value, the index is less effective for b.
Consider covering indexes. Including all columns a query needs in the index avoids fetching the actual table rows. PostgreSQL's INCLUDE clause adds columns without affecting sort order.
-- Covering index (PostgreSQL)
CREATE INDEX idx_orders_covering ON orders(customer_id) INCLUDE (total, status);
-- Query can be satisfied entirely from the index
SELECT total, status FROM orders WHERE customer_id = 123;
Index Maintenance and Monitoring
Monitor index usage statistics. Databases track how often indexes are used. Unused indexes waste resources.
-- PostgreSQL: Find unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
Remove unused indexes. Indexes not used for queries only slow down writes. Periodically audit and drop unused indexes.
Rebuild fragmented indexes. Over time, indexes become fragmented, reducing efficiency. Periodic rebuilding restores performance.
-- PostgreSQL: Rebuild index
REINDEX INDEX idx_orders_customer;
-- MySQL: Optimize table (rebuilds indexes)
OPTIMIZE TABLE orders;
Monitor index size relative to table size. Indexes larger than expected may indicate problems.
Check for duplicate indexes. Multiple indexes on the same columns waste resources. A composite index (a, b) makes a single-column index on a redundant.
Analyze tables regularly. Statistics help query planners make good decisions. Outdated statistics lead to poor query plans.
-- PostgreSQL
ANALYZE orders;
-- MySQL
ANALYZE TABLE orders;
Common Indexing Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Indexing every column | Wastes resources, adds write overhead | Index only columns in query conditions |
| Ignoring column order in composite indexes | Index on (a, b) doesn't help b alone |
Put equality first, range last |
| Over-indexing for write-heavy workloads | Slow INSERT/UPDATE/DELETE | Balance read vs write performance |
| Not considering index-only scans | Queries need table access | Design covering indexes |
| Indexing low-selectivity columns | Boolean index points to half the table | Rarely worth it |
| Functions preventing index usage |
WHERE YEAR(date) = 2025 can't use index |
Rewrite as date >= '2025-01-01' AND date < '2026-01-01'
|
| Forgetting index maintenance | Fragmentation reduces efficiency | Plan for regular rebuilding |
Database-Specific Considerations
PostgreSQL offers diverse index types — B-tree, Hash, GIN, GiST, BRIN, and more — suited to different data types and access patterns.
MySQL clusters data with the primary key. Secondary indexes include the primary key, affecting size and lookup performance. Choose primary keys carefully.
-- MySQL: Primary key affects all secondary indexes
CREATE TABLE users (
id BIGINT PRIMARY KEY, -- Clustered
email VARCHAR(255),
INDEX idx_email (email) -- Includes id implicitly
);
SQL Server distinguishes clustered and non-clustered indexes. One clustered index per table determines physical row order.
Cloud databases may have specific index features. Amazon Aurora, Google Cloud SQL, and Azure SQL have optimization features beyond their base engines.
Conclusion
Database indexing is both science and art. The science: B-trees, composite column order, covering indexes — clear, measurable behavior. The art: choosing which indexes to create, balancing read vs write overhead, knowing when a full table scan is actually faster.
The process:
- Start with the slowest queries
- Run
EXPLAIN ANALYZE - Look for full table scans (
Seq Scanin PostgreSQL,ALLin MySQL) - Add indexes for columns in
WHERE,JOIN, andORDER BYclauses - Build composite indexes with equality columns first, range columns last
- Monitor index usage — unused indexes waste resources
- Rebuild occasionally. Drop duplicates.
- Always verify with
EXPLAINthat your index is actually being used
An index that isn't used is just wasted storage and slower writes. Done right, indexes transform query performance from unbearable to instant. Done wrong, they add complexity without benefit.
👉 Talk to Our Engineers | See Case Studies
FAQs
1. How do I know if a query needs an index?
Run EXPLAIN ANALYZE before and after adding the index. Look for:
-
Before:
Seq Scan(PostgreSQL) orALL(MySQL) — database reads every row -
After:
Index ScanorIndex Only Scan— database used your index
If a query is slow and EXPLAIN shows a full table scan on a large table, that's your candidate. Also check the rows estimate vs actual — wildly inaccurate estimates often indicate missing statistics or a need for ANALYZE.
2. What's the difference between a composite index and multiple single-column indexes?
| Aspect | Composite Index (a, b, c)
|
Multiple Single-Column Indexes on a, b, c
|
|---|---|---|
| Structure | One B-tree sorted by a, then b, then c
|
Three separate B-trees |
Supports a alone |
✅ Yes (leading column) | ✅ Yes |
Supports a + b
|
✅ Excellent | ⚠️ Can combine (bitmap scans) but less efficient |
Supports b alone |
❌ No | ✅ Yes |
| Storage | One index | Three indexes (more storage) |
| Rule of thumb | Use for columns frequently used together in filters | Use for columns frequently filtered independently |
3. How many indexes is too many?
There's no magic number — it depends on your read/write ratio. Every INSERT, UPDATE, and DELETE must update every index on that table. If your write throughput is high, many indexes become expensive.
Signs of over-indexing:
- Write queries are slow, but
EXPLAINshows they're waiting on index updates - High
idx_blks_writtenrelative toidx_blks_read(PostgreSQL) - Unused indexes (
idx_scan = 0)
Start with: indexes for primary keys, foreign keys, and the top 5–10 slowest queries. Add more as needed. Remove unused indexes quarterly. A lean index set beats a bloated one.
This article was originally published by DEV Community and written by Safdar Wahid.
Read original article on DEV Community
