PostgreSQL BRIN Indexes: Big Data Performance With Minimal Storage
Many applications today record data from sensors, devices, tracking systems, and other sources that share a common attribute: a timestamp that is always increasing. This timestamp is very valuable, as it serves as the basis for all kinds of lookups, analytical queries, and more.
PostgreSQL 9.5 introduced a feature called block range indexes (aka BRIN) that is incredibly helpful in efficiently searching over large time series data and has the benefit of taking up significantly less space on disk than a standard B-tree index. A block range index entry covers a range of pages (the page being the atomic unit in which PostgreSQL stores data) and stores two values for the indexed column: the minimum and the maximum value found within that range of pages.
In fact, when used appropriately, a BRIN index will not only outperform a B-tree but will also save over 99% of space on disk!
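To make the idea of a "block range" concrete, here is a small sketch. It assumes the scans table we are about to create has already been loaded and analyzed; with the default of 128 pages per range, the number of BRIN entries is roughly the table's page count divided by 128:
-- How many 8 kB pages does the table occupy? relpages is an estimate
-- refreshed by VACUUM/ANALYZE, so this is only an approximation.
SELECT relpages,
       pg_size_pretty(pg_relation_size('scans')) AS table_size,
       ceil(relpages / 128.0) AS approx_brin_entries  -- default pages_per_range = 128
FROM pg_class
WHERE relname = 'scans';
Each of those entries stores only a min/max pair for created_at, which is why the index stays so small.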
So, how can we use BRIN indexes to save disk space and keep our applications performant, and how does this compare to using PostgreSQL's partitioning system?
Setting Up Our Sensor Reading Application
Let's say we have an application that reads from a sensor every two seconds and we store the value that is read as well as the time it is recorded. We can accomplish this with a table that looks like this:
CREATE TABLE scans (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
scan float NOT NULL,
created_at timestamptz NOT NULL
);
For the purposes of testing, I used UNLOGGED tables, which do not generate any WAL (write-ahead log), in order to speed up loading the data. If you want to follow the same methodology I used, you would create the table like this instead:
CREATE UNLOGGED TABLE scans (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
scan float NOT NULL,
created_at timestamptz NOT NULL
);
All of my tests were run on PostgreSQL 11.2 on my laptop, which has eight cores available, and I have tuned my PostgreSQL configuration file as well. Please treat all of the timings in this article as directional rather than as definitive benchmarks.
Let's insert some data. For the first test, I decided to use 10,000,000 rows (well, 10,000,001) given the guidance on BRIN indexes is to use larger data sets. I calculated a range of data that would accomplish this, and used the below query to generate 10,000,001 scans:
INSERT INTO scans (scan, created_at)
SELECT random(), x
FROM generate_series('2012-01-01 0:00'::timestamptz,
'2018-05-03 20:33:20'::timestamptz, '2 seconds'::interval) x;
SELECT count(*) FROM scans;
count
-----------
 10000001
Let's say I need to look at the average scan value for each day in a given month. I can use a query like this to accomplish this task:
SELECT date_trunc('day', created_at), avg(scan)
FROM scans
WHERE created_at BETWEEN '2017-02-01 0:00' AND '2017-02-28 11:59:59'
GROUP BY 1
ORDER BY 1;
(Note: I know I can do something like created_at >= '2017-02-01' AND created_at < '2017-03-01' as well).
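For reference, that half-open form of the predicate applied to the same query would look like the sketch below. Note that it covers all of February 28, whereas the BETWEEN predicate above stops at 11:59:59 on that day, so the results would differ slightly:
SELECT date_trunc('day', created_at), avg(scan)
FROM scans
WHERE created_at >= '2017-02-01'
  AND created_at < '2017-03-01'
GROUP BY 1
ORDER BY 1;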
For a baseline, let's run this query without an index. PostgreSQL will perform a sequential scan over the data, and if you have query parallelism enabled, the performance may surprise you. For the first run, I disabled parallel query to get a non-parallel baseline:
SET max_parallel_workers = 0;
EXPLAIN ANALYZE SELECT date_trunc('day', created_at), avg(scan)
FROM scans
WHERE created_at BETWEEN '2017-02-01 0:00' AND '2017-02-28 11:59:59'
GROUP BY 1
ORDER BY 1;
which returned the following results:
Finalize GroupAggregate (cost=128919.29..294297.77 rows=1147587 width=16) (actual time=1335.080..1587.770 rows=28 loops=1)
Group Key: (date_trunc('day'::text, created_at))
-> Gather Merge (cost=128919.29..271346.02 rows=1147588 width=40) (actual time=1326.056..1587.715 rows=28 loops=1)
Workers Planned: 4
Workers Launched: 0
-> Partial GroupAggregate (cost=127919.23..133657.17 rows=286897 width=40) (actual time=1325.865..1587.464 rows=28 loops=1)
Group Key: (date_trunc('day'::text, created_at))
-> Sort (cost=127919.23..128636.47 rows=286897 width=16) (actual time=1316.989..1443.944 rows=1188000 loops=1)
Sort Key: (date_trunc('day'::text, created_at))
Sort Method: external merge Disk: 30232kB
-> Parallel Seq Scan on scans (cost=0.00..101911.77 rows=286897 width=16) (actual time=88.985..1026.441 rows=1188000 loops=1)
Filter: ((created_at >= '2017-02-01 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-28 11:59:59-05'::timestamp with time zone))
Rows Removed by Filter: 8812001
Planning Time: 0.098 ms
Execution Time: 1595.763 ms
In my PostgreSQL environment, the query took about 1600ms to execute. Note that even though it planned to launch four parallel workers, none were actually launched. Let's see what happens when we re-enable parallel query:
SET max_parallel_workers = 8;
EXPLAIN ANALYZE SELECT date_trunc('day', created_at), avg(scan)
FROM scans
WHERE created_at BETWEEN '2017-02-01 0:00' AND '2017-02-28 11:59:59'
GROUP BY 1
ORDER BY 1;
which returns:
Finalize GroupAggregate (cost=131287.23..310221.83 rows=1241656 width=16) (actual time=396.036..453.017 rows=28 loops=1)
Group Key: (date_trunc('day'::text, created_at))
-> Gather Merge (cost=131287.23..285388.71 rows=1241656 width=40) (actual time=392.398..458.091 rows=140 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial GroupAggregate (cost=130287.17..136495.45 rows=310414 width=40) (actual time=385.556..435.999 rows=28 loops=5)
Group Key: (date_trunc('day'::text, created_at))
-> Sort (cost=130287.17..131063.21 rows=310414 width=16) (actual time=383.855..401.129 rows=237600 loops=5)
Sort Key: (date_trunc('day'::text, created_at))
Sort Method: quicksort Memory: 24859kB
Worker 0: Sort Method: quicksort Memory: 16919kB
Worker 1: Sort Method: quicksort Memory: 16852kB
Worker 2: Sort Method: quicksort Memory: 17094kB
Worker 3: Sort Method: quicksort Memory: 16830kB
-> Parallel Seq Scan on scans (cost=0.00..101971.47 rows=310414 width=16) (actual time=31.678..334.686 rows=237600 loops=5)
Filter: ((created_at >= '2017-02-01 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-28 11:59:59-05'::timestamp with time zone))
Rows Removed by Filter: 1762400
Planning Time: 0.128 ms
Execution Time: 459.501 ms
In this case, PostgreSQL decided to launch four parallel workers and the overall query performance improved almost 3.5 times.
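As an aside, the "Workers Planned: 4, Workers Launched: 0" pattern in the first plan is expected: max_parallel_workers only caps how many workers can actually be launched at execution time, while the planner's decision to build a parallel plan is governed by max_parallel_workers_per_gather. A sketch of disabling parallelism at plan time instead, so that no Gather node is planned at all, would be:
-- Prevent the planner from considering parallel plans in this session
SET max_parallel_workers_per_gather = 0;

-- ... run the EXPLAIN ANALYZE here; for this query the plan should be a
-- plain, non-parallel Seq Scan with no Gather node ...

-- Restore the default for the rest of the tests
RESET max_parallel_workers_per_gather;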
However, this article is about BRIN indexes, not parallel query. Before we look at BRIN, let's take one more baseline using a standard B-tree index, followed by an aggressive VACUUM:
CREATE INDEX scans_created_at_idx ON scans (created_at);
VACUUM FREEZE ANALYZE;
When we run the query again:
EXPLAIN ANALYZE SELECT date_trunc('day', created_at), avg(scan)
FROM scans
WHERE created_at BETWEEN '2017-02-01 0:00' AND '2017-02-28 11:59:59'
GROUP BY 1
ORDER BY 1;
we get:
GroupAggregate (cost=170490.94..197720.71 rows=1210212 width=16) (actual time=872.049..1128.159 rows=28 loops=1)
Group Key: (date_trunc('day'::text, created_at))
-> Sort (cost=170490.94..173516.47 rows=1210212 width=16) (actual time=861.351..985.978 rows=1188000 loops=1)
Sort Key: (date_trunc('day'::text, created_at))
Sort Method: external merge Disk: 30232kB
-> Index Scan using scans_created_at_idx on scans (cost=0.43..48218.21 rows=1210212 width=16) (actual time=0.044..561.311 rows=1188000 loops=1)
Index Cond: ((created_at >= '2017-02-01 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-28 11:59:59-05'::timestamp with time zone))
Planning Time: 0.145 ms
Execution Time: 1137.303 ms
which does perform better than the baseline without parallel query. Let's also take a look at the size of this index:
SELECT pg_size_pretty(pg_relation_size('scans_created_at_idx'));
pg_size_pretty
----------------
214 MB
Now, let's see what happens when we use a BRIN index. To build a BRIN index on a column, you specify USING brin in the CREATE INDEX statement, like so:
DROP INDEX scans_created_at_idx;
CREATE INDEX scans_created_at_brin_idx ON scans USING brin(created_at);
VACUUM FREEZE ANALYZE;
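The tests in this article use the default BRIN settings, but the granularity is tunable: the pages_per_range storage parameter (default 128) controls how many 8 kB pages each BRIN entry summarizes. A smaller value gives a more precise, and slightly larger, index. A hypothetical variant, not used in these tests, would look like this:
-- Hypothetical alternative: summarize 32 pages per BRIN entry instead of 128
CREATE INDEX scans_created_at_brin_32_idx ON scans
    USING brin (created_at) WITH (pages_per_range = 32);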
Let's look at how the query performs. When we run:
EXPLAIN ANALYZE SELECT date_trunc('day', created_at), avg(scan)
FROM scans
WHERE created_at BETWEEN '2017-02-01 0:00' AND '2017-02-28 11:59:59'
GROUP BY 1
ORDER BY 1;
we get something that looks like:
GroupAggregate (cost=205442.58..232278.17 rows=1192693 width=16) (actual time=702.798..954.675 rows=28 loops=1)
Group Key: (date_trunc('day'::text, created_at))
-> Sort (cost=205442.58..208424.31 rows=1192693 width=16) (actual time=694.105..813.864 rows=1188000 loops=1)
Sort Key: (date_trunc('day'::text, created_at))
Sort Method: external merge Disk: 30232kB
-> Bitmap Heap Scan on scans (cost=316.09..85065.32 rows=1192693 width=16) (actual time=1.834..423.803 rows=1188000 loops=1)
Recheck Cond: ((created_at >= '2017-02-01 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-28 11:59:59-05'::timestamp with time zone))
Rows Removed by Index Recheck: 17760
Heap Blocks: lossy=7680
-> Bitmap Index Scan on scans_created_at_brin_idx (cost=0.00..17.92 rows=1204833 width=0) (actual time=0.392..0.392 rows=76800 loops=1)
Index Cond: ((created_at >= '2017-02-01 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-28 11:59:59-05'::timestamp with time zone))
Planning Time: 0.143 ms
Execution Time: 967.445 ms
Using a BRIN index, the query performs about 40% faster than with no index and parallel query disabled, and about 15% faster than with the B-tree. It is still slower than the parallel sequential scan in this case, but that could be a result of the size of the data set we are scanning. And what about disk utilization: how much space does the BRIN index take up?
SELECT pg_size_pretty(pg_relation_size('scans_created_at_brin_idx'));
pg_size_pretty
----------------
32 kB
Yes, the BRIN index takes up only 32 kilobytes! That is a tiny fraction, far less than 1%, of the space required to store a B-tree index over the same data, and the BRIN index outperforms the B-tree for this type of analytical query.
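If you are curious to see the per-range min/max values that make this possible, the pageinspect extension can decode BRIN pages. This is only a rough sketch: it assumes superuser access, and that page 2 is the first regular BRIN data page, which is typically the case after the metapage and the revmap:
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Each row is one block range, with the summary value (min .. max of
-- created_at) that the range covers
SELECT itemoffset, blknum, value
FROM brin_page_items(get_raw_page('scans_created_at_brin_idx', 2),
                     'scans_created_at_brin_idx')
LIMIT 5;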
One might ask: given that the parallel sequential scan outperformed the BRIN index, why would I need an index at all? I can give you two reasons: 1) it depends on the query, and 2) this data set is still relatively small.
Let's change up the query slightly. Let's say I want to look at the average scan value on an hourly basis over the first twelve hours of February 14, 2017. I would run a query like this:
SELECT date_trunc('hour', created_at), avg(scan)
FROM scans
WHERE created_at BETWEEN '2017-02-14 0:00' AND '2017-02-14 11:59:59'
GROUP BY 1
ORDER BY 1;
With parallel query enabled, I get results like this:
Finalize GroupAggregate (cost=102558.20..105786.81 rows=22403 width=16) (actual time=337.682..338.889 rows=12 loops=1)
Group Key: (date_trunc('hour'::text, created_at))
-> Gather Merge (cost=102558.20..105338.75 rows=22404 width=40) (actual time=337.611..349.192 rows=60 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial GroupAggregate (cost=101558.14..101670.16 rows=5601 width=40) (actual time=330.144..331.137 rows=12 loops=5)
Group Key: (date_trunc('hour'::text, created_at))
-> Sort (cost=101558.14..101572.14 rows=5601 width=16) (actual time=330.029..330.453 rows=4320 loops=5)
Sort Key: (date_trunc('hour'::text, created_at))
Sort Method: quicksort Memory: 251kB
Worker 0: Sort Method: quicksort Memory: 251kB
Worker 1: Sort Method: quicksort Memory: 435kB
Worker 2: Sort Method: quicksort Memory: 391kB
Worker 3: Sort Method: quicksort Memory: 454kB
-> Parallel Seq Scan on scans (cost=0.00..101209.43 rows=5601 width=16) (actual time=55.838..328.854 rows=4320 loops=5)
Filter: ((created_at >= '2017-02-14 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-14 11:59:59-05'::timestamp with time zone))
Rows Removed by Filter: 1995680
Planning Time: 0.115 ms
Execution Time: 349.280 ms
whereas with a BRIN index, I get this:
GroupAggregate (cost=60909.70..61385.39 rows=21142 width=16) (actual time=13.448..16.902 rows=12 loops=1)
Group Key: (date_trunc('hour'::text, created_at))
-> Sort (cost=60909.70..60962.55 rows=21142 width=16) (actual time=13.081..14.367 rows=21600 loops=1)
Sort Key: (date_trunc('hour'::text, created_at))
Sort Method: quicksort Memory: 1781kB
-> Bitmap Heap Scan on scans (cost=21.35..59390.88 rows=21142 width=16) (actual time=1.534..9.303 rows=21600 loops=1)
Recheck Cond: ((created_at >= '2017-02-14 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-14 11:59:59-05'::timestamp with time zone))
Rows Removed by Index Recheck: 18592
Heap Blocks: lossy=256
-> Bitmap Index Scan on scans_created_at_brin_idx (cost=0.00..16.06 rows=40160 width=0) (actual time=0.123..0.123 rows=2560 loops=1)
Index Cond: ((created_at >= '2017-02-14 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-14 11:59:59-05'::timestamp with time zone))
Planning Time: 0.110 ms
Execution Time: 16.982 ms
In this case, using the BRIN index is about 20x faster than the parallel sequential scan. Now let's look at some much bigger data.
BRIN vs. B-Tree with 100 Million Rows
Let's increase our data set tenfold to 100 million rows. After dropping and recreating my scans table, I populated the data using this query:
INSERT INTO scans (scan, created_at)
SELECT random(), x
FROM generate_series('2012-01-01 0:00'::timestamptz,
'2018-05-03 20:33:20'::timestamptz, '2 seconds'::interval) x;
which produces 100,000,001 rows:
SELECT count(*) FROM scans;
count
-----------
100000001
We will use the original query for the next set of experiments, i.e. finding the average scan value for each day in February 2017:
SELECT date_trunc('day', created_at), avg(scan)
FROM scans
WHERE created_at BETWEEN '2017-02-01 0:00' AND '2017-02-28 11:59:59'
GROUP BY 1
ORDER BY 1;
Here is the baseline output with parallel query disabled:
Finalize GroupAggregate (cost=906718.36..1086472.33 rows=1240794 width=16) (actual time=9787.788..10042.382 rows=28 loops=1)
Group Key: (date_trunc('day'::text, created_at))
-> Gather Merge (cost=906718.36..1061656.45 rows=1240794 width=40) (actual time=9777.030..10043.054 rows=28 loops=1)
Workers Planned: 6
Workers Launched: 0
-> Partial GroupAggregate (cost=905718.27..909854.25 rows=206799 width=40) (actual time=9776.767..10041.978 rows=28 loops=1)
Group Key: (date_trunc('day'::text, created_at))
-> Sort (cost=905718.27..906235.26 rows=206799 width=16) (actual time=9765.630..9888.880 rows=1188000 loops=1)
Sort Key: (date_trunc('day'::text, created_at))
Sort Method: external merge Disk: 30232kB
-> Parallel Seq Scan on scans (cost=0.00..887460.12 rows=206799 width=16) (actual time=7496.431..9365.666 rows=1188000 loops=1)
Filter: ((created_at >= '2017-02-01 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-28 11:59:59-05'::timestamp with time zone))
Rows Removed by Filter: 98812001
Planning Time: 3.534 ms
Execution Time: 10051.513 ms
And here is the output with parallel query enabled:
Finalize GroupAggregate (cost=906718.36..1086472.33 rows=1240794 width=16) (actual time=2547.817..2604.061 rows=28 loops=1)
Group Key: (date_trunc('day'::text, created_at))
-> Gather Merge (cost=906718.36..1061656.45 rows=1240794 width=40) (actual time=2545.522..2615.073 rows=196 loops=1)
Workers Planned: 6
Workers Launched: 6
-> Partial GroupAggregate (cost=905718.27..909854.25 rows=206799 width=40) (actual time=2526.779..2575.461 rows=28 loops=7)
Group Key: (date_trunc('day'::text, created_at))
-> Sort (cost=905718.27..906235.26 rows=206799 width=16) (actual time=2525.088..2541.112 rows=169714 loops=7)
Sort Key: (date_trunc('day'::text, created_at))
Sort Method: quicksort Memory: 14277kB
Worker 0: Sort Method: quicksort Memory: 13945kB
Worker 1: Sort Method: quicksort Memory: 14225kB
Worker 2: Sort Method: quicksort Memory: 13945kB
Worker 3: Sort Method: quicksort Memory: 14094kB
Worker 4: Sort Method: quicksort Memory: 14372kB
Worker 5: Sort Method: quicksort Memory: 13840kB
-> Parallel Seq Scan on scans (cost=0.00..887460.12 rows=206799 width=16) (actual time=1910.448..2474.151 rows=169714 loops=7)
Filter: ((created_at >= '2017-02-01 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-28 11:59:59-05'::timestamp with time zone))
Rows Removed by Filter: 14116000
Planning Time: 0.138 ms
Execution Time: 2615.837 ms
Parallel sequential scans certainly help when there are no indexes. Now let's create a B-tree index, this time timing how long it takes to build, and prepare to run the next query:
\timing
CREATE INDEX scans_created_at_idx ON scans (created_at);
Time: 34434.702 ms (00:34.435)
VACUUM FREEZE ANALYZE;
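That roughly 34-second build already benefits from PostgreSQL 11's parallel index creation (in version 11, parallel builds apply only to B-tree indexes, and the default max_parallel_maintenance_workers is 2). If you want to experiment with the build speed yourself, the relevant knobs look like the sketch below; the values here are purely illustrative for an eight-core machine:
-- Allow up to 4 workers for a single CREATE INDEX (B-tree builds only in PostgreSQL 11)
SET max_parallel_maintenance_workers = 4;
-- Give the sort phase of the build more memory; adjust for your hardware
SET maintenance_work_mem = '1GB';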
Let's see what happens when we run the query:
EXPLAIN ANALYZE SELECT date_trunc('day', created_at), avg(scan)
FROM scans
WHERE created_at BETWEEN '2017-02-01 0:00' AND '2017-02-28 11:59:59'
GROUP BY 1
ORDER BY 1;
we get the following results:
GroupAggregate (cost=175102.37..203032.61 rows=1241344 width=16) (actual time=854.269..1118.307 rows=28 loops=1)
Group Key: (date_trunc('day'::text, created_at))
-> Sort (cost=175102.37..178205.73 rows=1241344 width=16) (actual time=845.117..971.251 rows=1188000 loops=1)
Sort Key: (date_trunc('day'::text, created_at))
Sort Method: external merge Disk: 30232kB
-> Index Scan using scans_created_at_idx on scans (cost=0.57..49456.81 rows=1241344 width=16) (actual time=0.020..531.775 rows=1188000 loops=1)
Index Cond: ((created_at >= '2017-02-01 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-28 11:59:59-05'::timestamp with time zone))
Planning Time: 0.094 ms
Execution Time: 1126.357 ms
At 100 million rows, the B-tree index helps this query outperform the parallelized sequential scan. How much space does it take up on disk?
SELECT pg_size_pretty(pg_relation_size('scans_created_at_idx'));
pg_size_pretty
----------------
2142 MB
This index takes up over 2GB on disk. Let's see how this compares to the BRIN index, both in terms of space utilization and performance. First, let's set up our BRIN index, ensuring we time the creation of it this time too:
DROP INDEX scans_created_at_idx;
CREATE INDEX scans_created_at_brin_idx ON scans USING brin(created_at);
Time: 18396.309 ms (00:18.396)
VACUUM FREEZE ANALYZE;
The BRIN index takes roughly half the time to create that the B-tree index does. And given I am running PostgreSQL 11, the B-tree build already has the benefit of parallelized index creation, so you can see just how efficient BRIN index creation can be! Now, let's see how the query performs. When I run:
EXPLAIN ANALYZE SELECT date_trunc('day', created_at), avg(scan)
FROM scans
WHERE created_at BETWEEN '2017-02-01 0:00' AND '2017-02-28 11:59:59'
GROUP BY 1
ORDER BY 1;
my output is:
GroupAggregate (cost=785242.87..810191.48 rows=1108827 width=16) (actual time=703.571..968.501 rows=28 loops=1)
Group Key: (date_trunc('day'::text, created_at))
-> Sort (cost=785242.87..788014.94 rows=1108827 width=16) (actual time=693.621..821.642 rows=1188000 loops=1)
Sort Key: (date_trunc('day'::text, created_at))
Sort Method: external merge Disk: 30232kB
-> Bitmap Heap Scan on scans (cost=362.00..673913.30 rows=1108827 width=16) (actual time=1.424..416.475 rows=1188000 loops=1)
Recheck Cond: ((created_at >= '2017-02-01 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-28 11:59:59-05'::timestamp with time zone))
Rows Removed by Index Recheck: 17760
Heap Blocks: lossy=7680
-> Bitmap Index Scan on scans_created_at_brin_idx (cost=0.00..84.79 rows=1125176 width=0) (actual time=1.146..1.146 rows=76800 loops=1)
Index Cond: ((created_at >= '2017-02-01 00:00:00-05'::timestamp with time zone) AND (created_at <= '2017-02-28 11:59:59-05'::timestamp with time zone))
Planning Time: 0.111 ms
Execution Time: 975.262 ms
The BRIN index helps this query outperform both its B-tree equivalent and the parallel sequential scan. How about its size on disk?
SELECT pg_size_pretty(pg_relation_size('scans_created_at_brin_idx'));
pg_size_pretty
----------------
184 kB
Yes, the first time I read that, I did a double take too. The BRIN index takes up only 184 kB, versus over 2 GB for the B-tree! And the query using the BRIN index also outperforms the B-tree version by around 15%. Wow!
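To put that number in perspective, you can compare the index footprint directly against the table it covers, using the same pg_relation_size function as above:
-- For context: index footprint next to the size of the table it indexes
SELECT pg_size_pretty(pg_relation_size('scans'))                     AS table_size,
       pg_size_pretty(pg_relation_size('scans_created_at_brin_idx')) AS brin_index_size;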
Does This Mean BRIN Indexes Are a Panacea?
The answer to that is: maybe. First, as you may have observed, you really start to see the benefit of BRIN indexes as your tables grow to a significant size. These experiments also demonstrate PostgreSQL's ability to scale vertically: a BRIN index can certainly help you efficiently run queries for many of the problems you are trying to solve, especially around temporal analytics.
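Part of "used appropriately" is that BRIN relies on the physical layout of the table matching the values of the indexed column, which is exactly what an always-increasing timestamp gives you: each block range covers a narrow, non-overlapping slice of time. A quick way to sanity-check that property on your own tables is to look at pg_stats.correlation; this is just a sketch, and a value close to 1 or -1 is what you want to see:
-- Correlation between the physical row order and the values of created_at
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'scans' AND attname = 'created_at';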
The biggest thing is the storage space: being able to reduce your index footprint by over 99% is huge if your data set allows you to take advantage of BRIN indexes.
What about partitioning? PostgreSQL 11 introduced improved support for its internal partitioning system, and it would be interesting to see how partitioning affects your index selection. But that topic is for the next post in this series.