Postgres Query Optimization: LEFT JOIN vs UNION ALL
Introduction
The PostgreSQL optimizer is an amazing thing, getting only more amazing with each release. It is able to take information about your data definitions, your data distribution, constraints, and the specific queries and come up with the generally most efficient way to return the results of that query.
Since SQL is a declarative language, we're explicitly giving up defining how the database determines the results and trusting it to get the correct results in whatever method it deems most efficient. Sometimes we can structure queries in a certain way to get a better result than we get with the straightforward approach.
The issue
I was working with one of our clients hosted on
Crunchy Bridge, looking
at the top queries in pg_stat_statements
by mean time spent. The top outlier
was a query of the form:
SELECT
"table_a".*,
COALESCE("table_b"."count", 0) AS table_count
FROM
table_a
LEFT JOIN
table_b ON table_a.id = table_b.a_id
WHERE
NOT bool_1 AND NOT bool_2 AND NOT bool_3
ORDER BY
table_count DESC
LIMIT 100
The average time on this query, as reported by pg_stat_statements
was in the
2-3 second range, which for this client was too long. The tables involved here
were decently-sized, with millions of rows per table.
The specific query involved here utilized a table which stored the rollup counts
for an additional table; this query was used to support optimizing a COUNT(*)
for each row in a corresponding table. For this specific table, the data
distribution was a bit lopsided, so for a common case, the underlying COUNT(*)
would be fast; however there were enough outliers in the table that had huge
numbers of counts that the general-purpose plan would not work. For this reason,
we use an unlogged table to store the results of the lookups. (Since this
version of PostgreSQL does not support UNLOGGED MATERIALIZED VIEWS, we opted for
this approach as the generation of this table generated quite a bit of WAL.)
Here is what the plan for the unmodified query looked like:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=978034.80..978036.03 rows=10 width=2934)
-> Gather Merge (cost=978034.80..1430258.92 rows=3680894 width=2934)
Workers Planned: 7
-> Sort (cost=977034.68..978349.28 rows=525842 width=2934)
Sort Key: (COALESCE(table_b.count, '0'::bigint)) DESC
-> Parallel Hash Left Join (cost=87599.95..965671.42 rows=525842 width=2934)
Hash Cond: (table_a.id = table_b.a_id)
-> Parallel Seq Scan on table_a (cost=0.00..874743.68 rows=525842 width=2926)
Filter: ((NOT bool_1) AND (NOT bool_2) AND (NOT bool_3))
-> Parallel Hash (cost=62782.20..62782.20 rows=1985420 width=12)
-> Parallel Seq Scan on table_b (cost=0.00..62782.20 rows=1985420 width=12)
(11 rows)
Since the ORDER BY
clause utilizes an expression, we would not be able to
directly use an index here, and since the whole point of this COALESCE
is to
provide a default value for rows that do not exist in the joined table, we
cannot use some sort of an expression index. As such, we cannot directly use an
index to return the values we are interested in and we will need to resort to an
exernal sort.
So how can we see about improving this? One thing to note here is that we are
substituting in a constant value in our LEFT JOIN
case; basically we are
assuming that if there is a missing row, the count for this is now 0
. Let us
get our light bulbs ready, since we can use this fact as follows:
A LEFT JOIN
for tables A and B is the intersection of A and B plus the rows in
A that do not appear in B. (This is the NULL matching id case.) We cannot index
a left join directly, however the intersection case can be indexed quite
nicely, and we can perhap do something better than the current case for the null
join case. So rather than using a LEFT JOIN
directly, let's create a query
which explicitly calculates each part of the result set and appends them, and
see how we can improve performance.
For the intersection, we have the following:
SELECT
"table_a".*,
"table_b"."count" AS table_count
FROM
table_a
JOIN
table_b ON table_a.id = table_b.a_id
WHERE
NOT bool_1 AND NOT bool_2 AND NOT bool_3
ORDER BY
table_count DESC
For the rows in A not in B we have:
SELECT
"table_a".*,
0 AS table_count
FROM
table_a
LEFT JOIN
table_b ON table_a.id = table_b.a_id
WHERE
NOT bool_1 AND NOT bool_2 AND NOT bool_3
AND table_b.id IS NULL
Combining these two gives us the following query:
SELECT * FROM (
SELECT
"table_a".*,
"table_b"."count" AS table_count
FROM
table_a
JOIN
table_b ON table_a.id = table_b.a_id
WHERE
NOT bool_1 AND NOT bool_2 AND NOT bool_3
ORDER BY
table_count DESC
) intersect
UNION ALL
SELECT * FROM (
SELECT
"table_a".*,
0 AS table_count
FROM
table_a
LEFT JOIN
table_b ON table_a.id = table_b.a_id
WHERE
NOT bool_1 AND NOT bool_2 AND NOT bool_3
AND table_b.id IS NULL
) antijoin
LIMIT 100
Since in our original query there was no secondary ordering, we don't care what
the order of the antijoin rows are; they are always the same constant sort
value, so the natural order returned by the database is sufficient. Also, since
our intersecting set is an indexed column, our ORDER BY count DESC
can be
fulfilled by using a reverse btree index scan, saving us a sort node.
That gives us the following plan, which we can see now is much more performant:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.87..19.30 rows=10 width=6116) (actual time=0.049..0.132 rows=10 loops=1)
-> Append (cost=0.87..6784732.67 rows=3680878 width=6116) (actual time=0.048..0.130 rows=10 loops=1)
-> Nested Loop (cost=0.87..5172687.85 rows=1956032 width=2934) (actual time=0.048..0.119 rows=10 loops=1)
-> Index Scan Backward using table_b_count on table_b (cost=0.43..191262.84 rows=7941680 width=12) (actual time=0.016..0.033 rows=12 loops=1)
-> Index Scan using table_a_pkey on table_a (cost=0.43..0.63 rows=1 width=2926) (actual time=0.006..0.006 rows=1 loops=12)
Index Cond: (id = table_b.a_id)
Filter: ((NOT bool_1) AND (NOT bool_2) AND (NOT bool_3))
Rows Removed by Filter: 0
-> Subquery Scan on "*SELECT* 2" (cost=0.87..1554519.81 rows=1724847 width=2934) (never executed)
-> Merge Anti Join (cost=0.87..1532959.22 rows=1724847 width=2930) (never executed)
Merge Cond: (table_a_1.id = table_b_1.a_id)
-> Index Scan using table_a_pkey on table_a table_a_1 (cost=0.43..1332577.10 rows=3680879 width=2926) (never executed)
Filter: ((NOT bool_1) AND (NOT bool_2) AND (NOT bool_3))
-> Index Only Scan using table_b_a_id on table_b table_b_1 (cost=0.43..152158.13 rows=7941680 width=4) (never executed)
Heap Fetches: 0
Planning Time: 1.107 ms
Execution Time: 0.244 ms
(17 rows)
We went from 2-3 seconds execution time to under 1ms for the exact same result
set. This same technique could come in handy for any time there is a LEFT JOIN
that part of the results can be answered with an index, so rather than using a
heavier ORDER BY
for all total results, you can optimize each part
individually and combine.
Additionally, since we were applying a LIMIT
to the result set, you can see
that the antijoin subselect is not even calculated in this case, since the
number of rows we were requesting had already been fulfilled by the intersecting
set. It goes without saying that skipping a calculation is always faster than
running the calculation, no matter how fast.
I hope this technique comes in handy in your own query optimization journey.