The Integer at the End of the Universe: Integer Overflow in Postgres
Integer overflow occurs when a computer program tries to store an integer but the value being stored exceeds the maximum value that can be represented by the data type being used to store it. We have helped a few Crunchy Data clients navigate this recently and wanted to write up some notes.
In Postgres, there are three integer types:
- smallint: a 2-byte integer, -32768 to 32767
- integer: a 4-byte integer, -2147483648 to 2147483647
- bigint: an 8-byte integer, -9223372036854775808 to 9223372036854775807
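As a quick illustration of what overflow looks like at the type level, arithmetic that exceeds a type's range errors out immediately, while the same value fits comfortably in the next size up (expected psql behavior):
SELECT 2147483647::integer + 1;   -- ERROR:  integer out of range
SELECT 2147483647::bigint + 1;    -- 2147483648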
It is not uncommon to use a 4-byte integer as a primary key when defining a new table. This can cause problems if the value to be stored is larger than 4 bytes can hold. If a sequence’s limit is reached you might see an error in your logs that looks like this:
ERROR: nextval: reached maximum value of sequence "test_id_seq" (2147483647)
Don’t Panic! We have some helpful and intelligible PostgreSQL solutions.
How do you know if you are close to overflowing an integer?
The following query will identify any auto-incrementing columns, the SEQUENCE object each one owns, the data types of both the column and the SEQUENCE object, and what percentage of the sequence's and column's data type range has already been used:
SELECT
    seqs.relname AS sequence,
    format_type(s.seqtypid, NULL) AS sequence_datatype,
    CONCAT(tbls.relname, '.', attrs.attname) AS owned_by,
    format_type(attrs.atttypid, atttypmod) AS column_datatype,
    pg_sequence_last_value(seqs.oid::regclass) AS last_sequence_value,
    TO_CHAR((
        CASE WHEN format_type(s.seqtypid, NULL) = 'smallint' THEN
            (pg_sequence_last_value(seqs.oid::regclass) / 32767::float)
        WHEN format_type(s.seqtypid, NULL) = 'integer' THEN
            (pg_sequence_last_value(seqs.oid::regclass) / 2147483647::float)
        WHEN format_type(s.seqtypid, NULL) = 'bigint' THEN
            (pg_sequence_last_value(seqs.oid::regclass) / 9223372036854775807::float)
        END) * 100, 'fm9999999999999999999990D00%') AS sequence_percent,
    TO_CHAR((
        CASE WHEN format_type(attrs.atttypid, NULL) = 'smallint' THEN
            (pg_sequence_last_value(seqs.oid::regclass) / 32767::float)
        WHEN format_type(attrs.atttypid, NULL) = 'integer' THEN
            (pg_sequence_last_value(seqs.oid::regclass) / 2147483647::float)
        WHEN format_type(attrs.atttypid, NULL) = 'bigint' THEN
            (pg_sequence_last_value(seqs.oid::regclass) / 9223372036854775807::float)
        END) * 100, 'fm9999999999999999999990D00%') AS column_percent
FROM
    pg_depend d
    JOIN pg_class AS seqs ON seqs.relkind = 'S'
        AND seqs.oid = d.objid
    JOIN pg_class AS tbls ON tbls.relkind = 'r'
        AND tbls.oid = d.refobjid
    JOIN pg_attribute AS attrs ON attrs.attrelid = d.refobjid
        AND attrs.attnum = d.refobjsubid
    JOIN pg_sequence s ON s.seqrelid = seqs.oid
WHERE
    d.deptype = 'a'
    AND d.classid = 1259;
To show this query in action, let me set up a test table with an integer primary key, where the sequence has been artificially advanced to 2 billion:
postgres=# create table test(id serial primary key, value integer);
CREATE TABLE
postgres=# select setval('test_id_seq', 2000000000);
   setval
------------
 2000000000
(1 row)
postgres=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+----------------------------------
id | integer | | not null | nextval('test_id_seq'::regclass)
value | integer | | |
Indexes:
"test_pkey" PRIMARY KEY, btree (id)
Now when running the query above to find the integer overflow percent, I can see that the data types for both the column and the sequence are integer, and since the sequence's value is around 2 billion, it is 93% of the way through the acceptable range:
  sequence   | sequence_datatype | owned_by | column_datatype | last_sequence_value | sequence_percent | column_percent
-------------+-------------------+----------+-----------------+---------------------+------------------+----------------
 test_id_seq | integer           | test.id  | integer         |          2000000001 | 93.13%           | 93.13%
(1 row)
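For a lighter-weight spot check, the pg_sequences view (PostgreSQL 10 and later) exposes each sequence's last_value and max_value directly, though unlike the query above it only measures against the sequence's own limit, not the owning column's data type:
SELECT schemaname, sequencename, last_value, max_value,
       round(100.0 * last_value / max_value, 2) AS pct_of_sequence_max
FROM pg_sequences
WHERE last_value IS NOT NULL;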
Changing to negative number sequencing
Since the integer types in Postgres include negative numbers, a simple way to deal with integer overflow is to flip to sequencing with negative numbers. This can be done by giving the sequence a new start value of -1 and converting it to a descending sequence by giving it a negative INCREMENT value:
alter sequence test_id_seq no minvalue start with -1 increment -1 restart;
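A quick sanity check: after the change, nextval should count down from -1 (expected values shown as comments):
select nextval('test_id_seq');  -- -1
select nextval('test_id_seq');  -- -2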
If the purpose of the generated key is purely to create uniqueness, negative values are perfectly acceptable, but in some application frameworks or other use cases negative numbers may be undesirable or not work at all. In those cases we can change the field type entirely.
Keep in mind that the data type will need to be changed for any fields that reference this ID as well, or else they will also be out of bounds. Also, any foreign key constraints will need to be dropped and reapplied after both fields' types have been updated, as in the sketch below.
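Here is what that might look like for a hypothetical orders table that references test.id (the table, column, and constraint names are illustrative); re-adding the foreign key as NOT VALID and validating it afterwards avoids holding a heavy lock for the duration of the validation scan:
ALTER TABLE orders DROP CONSTRAINT orders_test_id_fkey;
-- ...change the types of both test.id and orders.test_id here...
ALTER TABLE orders ADD CONSTRAINT orders_test_id_fkey
    FOREIGN KEY (test_id) REFERENCES test (id) NOT VALID;
ALTER TABLE orders VALIDATE CONSTRAINT orders_test_id_fkey;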
Benefits of the negative number approach:
- No change to the column structure
- Very fast: just change the sequence start number
Drawbacks:
- Negative numbers might not work with your application framework
- You only buy yourself double the number of IDs; you could be in this situation again soon.
In general, this is a buy-you-some-time approach and is seen as a short-term fix.
Changing to bigint
The more complete fix to your sequence exhaustion is changing to the bigint data type.
In order to change the field type of the above test table, we will first create a new ID of type bigint that will eventually replace the current id, and create a unique constraint on it:
alter table test add column id_new bigint;
CREATE UNIQUE INDEX CONCURRENTLY test_id_new ON test (id_new);
The new column will also need a new sequence of type bigint. The sequence needs to start at some point after the latest value that had been recorded.
CREATE SEQUENCE test_id_new_seq START 2147483776 AS bigint;
ALTER TABLE test ALTER COLUMN id_new SET DEFAULT nextval ('test_id_new_seq');
alter sequence test_id_new_seq owned by test.id_new;
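To pick that starting value, one option is to take whichever is larger, the old sequence's last value or the largest id already in the table, plus a little headroom (a sketch; the extra 100 is arbitrary):
SELECT greatest(
           (SELECT last_value FROM test_id_seq),
           (SELECT max(id) FROM test)
       ) + 100 AS suggested_start;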
Now new values can be added to the table, but there are two different sequences being incremented, the old and the new:
postgres=# select * from test;
     id     | value |   id_new
------------+-------+------------
 2000000007 |       |
 2000000008 |       |
 2000000009 |       |
 2000000010 |       |
 2000000011 |       | 2147483776
 2000000012 |       | 2147483777
 2000000013 |       | 2147483778
 2000000014 |       | 2147483779
In a single transaction, we will drop the old ID's primary key constraint and default, rename the columns, and add a NOT VALID "not null" check constraint on the new ID column:
BEGIN;
ALTER TABLE test DROP CONSTRAINT test_pkey;
ALTER TABLE test ALTER COLUMN id DROP DEFAULT;
ALTER TABLE test RENAME COLUMN id TO id_old;
ALTER TABLE test RENAME COLUMN id_new TO id;
ALTER TABLE test ALTER COLUMN id_old DROP NOT NULL;
ALTER TABLE test ADD CONSTRAINT id_not_null CHECK (id IS NOT NULL) NOT VALID;
COMMIT;
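At this point, attempting to insert an explicit NULL id should already be rejected by the new constraint; roughly, we would expect something like:
postgres=# INSERT INTO test (id, value) VALUES (NULL, 42);
ERROR:  new row for relation "test" violates check constraint "id_not_null"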
Now new IDs are being added to the table. Thanks to the NOT NULL constraint on id, new NULL values cannot be added, but since it is also NOT VALID the existing NULL values are permitted. In order to make id back into a primary key, the values from id_old must be backfilled into id so that the constraint can be validated. This can be done in batches:
WITH unset_values AS (
    SELECT id_old
    FROM test
    WHERE id IS NULL
    LIMIT 1000
)
UPDATE test
SET id = unset_values.id_old
FROM unset_values
WHERE unset_values.id_old = test.id_old;
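Since each batch only touches 1000 rows, the statement has to be re-run until it reports UPDATE 0. One way to automate that is a loop in a DO block, a rough sketch that assumes PostgreSQL 11 or newer (for COMMIT inside the block) and that it is run outside an explicit transaction:
DO $$
DECLARE
    rows_updated bigint;
BEGIN
    LOOP
        WITH unset_values AS (
            SELECT id_old FROM test WHERE id IS NULL LIMIT 1000
        )
        UPDATE test
        SET id = unset_values.id_old
        FROM unset_values
        WHERE unset_values.id_old = test.id_old;

        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;

        -- commit each batch so the backfill doesn't hold one giant transaction
        COMMIT;
    END LOOP;
END
$$;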
Once all rows have been backfilled, the NOT NULL constraint can be validated, the UNIQUE index on id can be converted to a primary key, and finally the standalone NOT NULL constraint can be dropped:
ALTER TABLE test VALIDATE CONSTRAINT id_not_null;
ALTER TABLE test ADD CONSTRAINT test_pkey PRIMARY KEY USING INDEX test_id_new;
ALTER TABLE test DROP CONSTRAINT id_not_null;
At any point now the 4-byte id_old column can be dropped, as the bigint has taken its place:
postgres=# ALTER table test drop column id_old;
ALTER TABLE
postgres=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+--------------------------------------
value | integer | | |
id | bigint | | not null | nextval('test_id_new_seq'::regclass)
Indexes:
"test_pkey" PRIMARY KEY, btree (id)
The new 8-byte bigint id should be sufficient for a very, very, very long time:
    sequence     | sequence_datatype | owned_by | column_datatype | last_sequence_value | sequence_percent | column_percent
-----------------+-------------------+----------+-----------------+---------------------+------------------+----------------
 test_id_new_seq | bigint            | test.id  | bigint          |          2147483788 | 0.00%            | 0.00%
Benefits of the bigint approach:
- This is a long term fix and you won't have to worry about running out of sequence numbers for a very long time.
Drawbacks:
- You probably need to update a lot of other things to larger integers
- Requires coordination across the entire database. In our experience, this is a large project.
SERIAL types
In Postgres, the SERIAL data types (smallserial, serial, and bigserial) are shortcuts for creating auto-incrementing identifier columns whose values are assigned the next value from a Postgres SEQUENCE object.
Declaring a column as SERIAL defaults it to type integer, simultaneously creating an integer sequence object owned by the specified table column and making its nextval() the default value for the column.
For new tables, consider using BIGSERIAL instead.
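As a sketch of what the shorthand expands to (the table name here is illustrative), declaring id bigserial primary key behaves roughly like:
CREATE SEQUENCE items_id_seq AS bigint;
CREATE TABLE items (
    id bigint NOT NULL DEFAULT nextval('items_id_seq') PRIMARY KEY
);
ALTER SEQUENCE items_id_seq OWNED BY items.id;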
Summary
- You can check with a query if you’re running out of sequence numbers.
- Changing to negative numbers can be a short-term fix.
- Changing to bigint is the recommended long-term fix.
- When you are setting up a new database that’s likely to have a lot of data in it, instead of using SERIAL, look at BIGSERIAL.
Integer overflow may appear at a glance to be insanely complicated. I have written this to keep Postgres DBAs and intergalactic travelers from panicking.