When “Unused” Indexes Aren’t Unused

A practical example of why large, apparently unused indexes cannot be evaluated safely in isolation, and why schema context matters when making optimisation decisions.

Index usage statistics are often one of the first things examined when reviewing a large database system. In many cases, they provide a clear indication of which indexes are actively supporting query workloads and which are not.

In one recent system, a large PostgreSQL table contained approximately 23 billion rows and occupied around 1.24 TB. The associated indexes were significantly larger, totalling over 2.6 TB.

At that scale, it is natural to ask whether all of those indexes are necessary.

A simple query against the PostgreSQL statistics views showed that several of the indexes had never been scanned.

SELECT
  relname AS table_name,
  indexrelname AS index_name,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'test_results';

Querying index usage statistics for the table

The results reveal three unused indexes on the table:

table_name    | index_name                           | idx_scan | idx_tup_read | idx_tup_fetch 
--------------+--------------------------------------+----------+--------------+---------------
 test_results | test_results_pkey                    |        0 |            0 |             0
 test_results | test_results_part_id_652b9f56        |        0 |            0 |             0
 test_results | test_results_part_result_id_e21d8800 |    54700 |     95149677 |      95149677
 test_results | test_results_session_id_24cb91ea     |        0 |            0 |             0
 test_results | test_results_test_id_f2c8cefb        |   109400 |       109400 |         54700
(5 rows)

The results show three indexes with zero scans.

Three of the indexes, each hundreds of gigabytes in size, showed zero scans.

At first glance, the conclusion seemed straightforward: these indexes were not being used and could potentially be removed, recovering a significant amount of disk space and reducing write overhead.

This is a reasonable line of thought. Large indexes do have a measurable impact:

they increase storage requirements
they add overhead to insert and update operations
they contribute to WAL generation and checkpoint activity

However, index usage statistics only describe one aspect of how an index is used. They show how often an index is used to satisfy queries. They do not indicate whether an index is structurally required.

A second query provided the missing context.

SELECT
    conname AS constraint_name,
    CASE contype
        WHEN 'p' THEN 'primary key'
        WHEN 'f' THEN 'foreign key'
        WHEN 'u' THEN 'unique'
    END AS constraint_type,
    conrelid::regclass AS table_name,
    array_agg(a.attname ORDER BY a.attnum) AS columns,
    confrelid::regclass AS referenced_table
FROM pg_constraint c
JOIN pg_attribute a ON a.attrelid = c.conrelid AND a.attnum = ANY(c.conkey)
WHERE conrelid = 'public.test_results'::regclass
AND contype IN ('p', 'f', 'u')
GROUP BY c.oid, conname, contype, conrelid, confrelid;

Inspecting constraints defined on the table

The results provided the missing context:

constraint_name                                          | constraint_type |  table_name  |     columns      | referenced_table 
---------------------------------------------------------+-----------------+--------------+------------------+------------------
 test_results_pkey                                       | primary key     | test_results | {id}             | -
 test_results_part_id_652b9f56_fk_parts_id               | foreign key     | test_results | {part_id}        | parts
 test_results_part_result_id_e21d8800_fk_part_results_id | foreign key     | test_results | {part_result_id} | part_results
 test_results_session_id_24cb91ea_fk_sessions_id         | foreign key     | test_results | {session_id}     | sessions
 test_results_test_id_f2c8cefb_fk_tests_id               | foreign key     | test_results | {test_id}        | tests
(5 rows)

Constraint names are derived from the index name, constraint type and column names.

This showed that the “unused” indexes were backing primary key and foreign key constraints.

In PostgreSQL, constraints of this type are implemented using indexes. Even if those indexes are never used directly by application queries, they are still required:

the primary key index enforces uniqueness
foreign key constraints rely on supporting indexes to enable efficient referential integrity checks
constraint enforcement relies on indexes to avoid full table scans

In other words, these indexes were not optional optimisation structures. They were part of the integrity model of the database.

The initial conclusion, based purely on scan statistics, was therefore incomplete. Removing those indexes would not simply have been a performance optimisation. It would have required removing or redesigning the constraints.

This example illustrates a broader point.

Operational statistics are valuable, but they must be interpreted in the context of the schema and the workload. An index that appears unused from a query perspective may still be essential for maintaining data integrity or supporting write operations.

At large scale, this distinction becomes more important. Changes that appear to be simple optimisations can have wider consequences:

removing a constraint-backed index may compromise data integrity
altering constraints can have downstream effects on other parts of the system
the cost of correcting a mistaken change grows with the size of the dataset

If the data model itself is not appropriate for the workload, that is a separate discussion. In some systems, particularly analytical or warehouse-style environments, it may be reasonable to revisit the use of foreign keys or to enforce integrity in upstream processes.

However, that is a design decision, not an index clean-up exercise.

The key point is that database objects should be evaluated in terms of their role within the system as a whole. Statistics provide useful signals, but they do not replace an understanding of how the system is structured.

As a general rule, before removing large or apparently unused indexes, it is worth confirming:

whether the index supports a primary key, foreign key, or unique constraint
how data integrity is enforced across the system
what assumptions other parts of the application make about those constraints

This is particularly important when working with very large datasets, where the cost of reversing a change can be significant.

Tools and automated analysis can highlight potential improvements, but they often operate on partial information. They are most effective when used as a starting point for investigation rather than as a substitute for understanding the system.