5 Ways to Get Table Creation Information in Postgres
A question I hear from time to time with Crunchy Data clients and the Postgres community is:
When was my Postgres database table created?
Postgres does not store the creation date of tables, or any other database object. But fear not, there are a plethora of direct and indirect ways to find out when your table creation happened. Let's go through some ways to do this, ranging from easy to somewhat hard. All these solutions apply to indexes and other database objects, but tables are by far the most common request.
1. Logging
The easiest solution is to look in your Postgres logs and see exactly when the
CREATE TABLE
command ran. Unfortunately, this does require that the
log_statement
parameter be set before you create the table! Changing
log_statement
to ddl
is a great idea already, and has many benefits other
than seeing object creation times. Once you have it set, Postgres will log
nearly everything you do to a table that is not a SELECT
, INSERT
, UPDATE
,
or DELETE
. So yes, that covers CREATE TABLE
as well as ALTER TABLE
and
DROP TABLE
. Of course, this also requires that your log_line_prefix
includes
a timestamp, but that is already enabled by default. Let's see what it looks
like in action:
## Change our logging, then ask Postgres to reload the configuration files:
$ psql -c 'alter system set log_statement = ddl' -c 'select pg_reload_conf()'
## Create some test tables with the pgbench program:
$ pgbench -i
## From the current logfile, grab all the lines, split them into rows,
## use a regular expression to find all the CREATE TABLE statements,
## and pull back the latest four
$ psql -Atc 'select * from regexp_split_to_table( pg_read_file( pg_current_logfile() ), chr(10) )
as x where x ~* $$create table$$' | tail -4 | cut -c1-90
2017-03-03 16:17:10 EDT [13] LOG: statement: create table pgbench_history(tid int
2017-03-03 16:17:10 EDT [13] LOG: statement: create table pgbench_tellers(tid int
2017-03-03 16:17:10 EDT [13] LOG: statement: create table pgbench_accounts(aid int
2017-03-03 16:17:10 EDT [13] LOG: statement: create table pgbench_branches(bid int
2. Operating system file
If you are not logging create statements, you may need more indirect methods.
When you create a table in Postgres, it creates a file on disk representing that
table. So it becomes possible to see the file creation time as a proxy for the
creation time in the database. Be warned that there are a few actions that will
rewrite the entire table, which will create a completely new file on disk. The
two most common are the VACUUM FULL
and CLUSTER
commands, actions that are
rarely, if ever, done. So how can we view the table creation time on disk?
Perhaps we can use some of the built in
file tools
Postgres has?
$ psql -c 'CREATE TABLE daruk(id int)'
$ psql -xc "select * from pg_stat_file( pg_relation_filepath( 'daruk' ) )"
-[ RECORD 1 ]+-----------------------
size | 0
access | 2017-03-03 16:56:40-04
modification | 2017-03-03 16:56:40-04
change | 2017-03-03 16:56:40-04
creation | ☃
isdir | f
## Verify the above by checking the actual timestamp:
$ psql -c 'select * from regexp_split_to_table( pg_read_file(
pg_current_logfile()),chr(10)) as x where x ~* $$daruk$$'
x
--------------------------------------------------------------------------
2017-03-03 16:56:40 EDT [146] LOG: statement: CREATE TABLE daruk(id int)
That appears to work, as everything agrees on "2017-03-03 16:56:40", but
unfortunately, those times returned by pg_stat_file
are not useful. Let's see
what happens when we make changes to the table:
$ psql -c 'insert into daruk select 123' -c checkpoint
$ psql -xc "select * from pg_stat_file( pg_relation_filepath( 'daruk' ) )"
-[ RECORD 1 ]+-----------------------
size | 8192
access | 2017-03-03 16:56:40-04
modification | 2017-03-03 17:00:19-04
change | 2017-03-03 17:00:19-04
creation | ☃
isdir | f
As expected, the modification
and change
fields have changed, but the
access
one has not. This is because Postgres has already opened it. If we were
to force Postgres to reopen the file (e.g. by restarting the server), the field
will change and no longer be a useful proxy for the table creation time:
$ pg_ctl -D $(psql -Atc 'show data_directory') restart
## Still the same access time, because Postgres has not looked at it yet:
$ psql -xc "select access,modification,change from pg_stat_file(pg_relation_filepath('daruk'))"
-[ RECORD 1 ]+-----------------------
access | 2017-03-03 16:56:40-04
modification | 2017-03-03 17:00:19-04
change | 2017-03-03 17:00:19-04
## This causes Postgres to open the file and load it into shared buffers:
$ psql -c 'explain analyze select id from daruk'
$ psql -xc "select access,modification,change from pg_stat_file(pg_relation_filepath('daruk'))"
-[ RECORD 1 ]+-----------------------
access | 2017-03-03 17:05:14-04
modification | 2017-03-03 17:04:01-04
change | 2017-03-03 17:04:01-04
The only thing we know here is that the table creation happened no later
than March 3 at 17:04 local time. So what can we do? Well, while
pg_stat_file()
reads some of the metadata for the underlying file, it does not
read all the metadata. We can ask the operating system about the file and get
its "birth date":
$ cd $( psql -Atc 'show data_directory' )
## Check out the "Birth" field:
$ stat $( psql -Atc "select pg_relation_filepath('daruk')" )
File: base/5/1034735
Size: 8192 Blocks: 16 IO Block: 4096 regular file
Device: 820h/2080d Inode: 366999 Links: 1
Access: (0600/-rw-------) Uid: ( 1000/ greg) Gid: ( 1000/ greg)
Access: 2017-03-03 17:05:14.946563872 -0400
Modify: 2017-03-03 17:04:01.726563832 -0400
Change: 2017-03-03 17:04:01.726563832 -0400
Birth: 2017-03-03 16:56:40.926567946 -0400
What if we suspect the table has been fully vacuumed, and thus rewritten on disk? One way is to look at what other objects the database created around the same time. If you know that those have not changed, then you can be sure that the timestamp indicated by "Birth" above is accurate:
## Create four tables, urbosa_1, urbosa_2, etc.
$ for x in 1 2 3 4; do psql -c "create table urbosa_$x(id int)"; done
$ cd $( psql -Atc 'show data_directory' )
$ for x in 1 2 3 4; do stat `psql -Atc "select pg_relation_filepath('urbosa_$x')"`
done | grep Birth
Birth: 2023-05-12 21:49:35.523869139 -0400
Birth: 2023-05-12 21:49:35.543869139 -0400
Birth: 2023-05-12 21:49:35.563869139 -0400
Birth: 2023-05-12 21:49:35.593869139 -0400
## Wait a few hours, then force one of them to get recreated on disk:
$ psql -c 'vacuum full urbosa_2'
$ for x in 1 2 3 4; do stat `psql -Atc "select pg_relation_filepath('urbosa_$x')"`;
done | grep Birth
Birth: 2023-05-12 21:49:35.523869139 -0400
Birth: 2023-05-13 01:18:24.867530999 -0400
Birth: 2023-05-12 21:49:35.563869139 -0400
Birth: 2023-05-12 21:49:35.593869139 -0400
## One of those is no longer like the others!
It may be that we don't know the names of the tables in advance. We can also check the directory which stores the tables and find all objects created around the same time as our target table:
$ ls -r --time=birth /home/greg/pg/15/data/base/5 \
| grep -B4 -A4 `psql -Atc "select pg_relation_filenode('urbosa_3')"` \
| grep -E '^[0-9]+$' \
| xargs -IZ psql -Atc "select pg_filenode_relation(0,Z)"
pgbench_accounts_pkey
daruk
urbosa_1
urbosa_3
urbosa_4
revali
dbd_pg_testschema.dbd_pg_testsequence
pg_toast.pg_toast_1060929_index
pg_toast.pg_toast_1060929
urbosa_2
3. System catalogs
Scenario: you have a bunch of tables, but you do not know the creation time of
all of them. But you do have information on other tables created before or after
your table of interest. All tables have been recreated via VACUUM FULL
on the
database. We can use information in the system catalogs to see when the files
were created relative to other ones. While a full vacuum changes the file on
disk, things like the pg_type table, and the oid column of pg_class remain the
same. Let's create five tables, shuffle them up, and use the pg_class.oid
column to restore the original creation order:
## Create five tables in a specific order (yes, a table can have zero columns!)
$ for x in 5 4 3 2 1 ; do psql -c "create table hestu_$x()"; done
## Rewrite them and shuffle up their order:
$ for x in 3 5 2 1 4 ; do psql -c "vacuum full hestu_$x"; done
## Ordering by relfilenode shows the order most recently recreated:
$ psql -c "select relfilenode, oid, relname from pg_class where relname ~ 'hestu' order by 1"
relfilenode | oid | relname
-------------+---------+---------
1061009 | 1061000 | hestu_3
1061012 | 1060994 | hestu_5
1061015 | 1061003 | hestu_2
1061018 | 1061006 | hestu_1
1061021 | 1060997 | hestu_4
(5 rows)
## Ordering by the oid column reveals the original creation order:
$ psql -c "select relfilenode, oid, relname from pg_class where relname ~ 'hestu' order by 2"
relfilenode | oid | relname
-------------+---------+---------
1061012 | 1060994 | hestu_5
1061021 | 1060997 | hestu_4
1061009 | 1061000 | hestu_3
1061015 | 1061003 | hestu_2
1061018 | 1061006 | hestu_1
(5 rows)
4. Parsing WAL files
As the Write Ahead Log (aka WAL) contains a complete log of every change made to the database, we can scan it to determine the approximate table creation time. Let's say it has been about a week since you created a table named "sales". While it has undergone many rewrites via vacuum full, you need to determine its original creation time.
The first step is to find the WAL files in question, based on their timestamps. WAL files get created, written, and then never changed again, so the modification timestamps on them are immutable. If the WAL file is no longer around because it has been removed/recycled, you may need to grab them from your pgBackRest archive directory (such an action is beyond the scope of this article, but once copied/unzipped, the process is the same).
Once you have the WAL files, you can use the
pg_waldump program to
view the information in the WAL stream, and grep for the relfilenode
- the
name of the file on disk storing that table. After finding the right one, you
can see the WAL creation and last modification times to get a ballpark estimate
of the table's creation time.
# To find a table named 'daruk', first find and store its filenode:
$ FILENODE=`psql -Atc "select pg_relation_filenode('daruk')"`; echo $FILENODE
1061087
# Find the place where WAL is stored:
$ WALDIR=`psql -Atc 'show data_directory'`/pg_wal; echo $WALDIR
/home/greg/pg/15/data/pg_wal
# Scan all the WAL files to find a WAL file referencing that table
$ find $WALDIR -size 16M -exec sh -c "pg_waldump {} 2>/dev/null | grep 'CREATE.*/$FILENODE' && stat {} " \;
rmgr: Storage len (rec/tot): 42/ 42, tx: 0, lsn: 15/1AA89C28, prev 15/1AA89BF0, desc: CREATE base/5/1061087
File: /home/greg/pg/15/data/pg_wal/00000001000000150000001A
Size: 16777216 Blocks: 32768 IO Block: 4096 regular file
Device: 820h/2080d Inode: 1008177 Links: 1
Access: (0600/-rw-------) Uid: ( 1000/ greg) Gid: ( 1000/ greg)
Access: 2023-05-12 10:47:31.931105494 -0400
Modify: 2023-05-12 10:46:33.371107499 -0400
Change: 2023-05-12 10:46:33.371107499 -0400
Birth: 2023-05-12 05:45:33.941842312 -0400
## Do the same, but look at the timestamps in the WAL itself:
$ find $WALDIR -size 16M -exec sh -c "pg_waldump {} 2>/dev/null | grep -E '(CREATE|COMMIT).*$FILENODE' " \;
rmgr: Storage len (rec/tot): 42/ 42, tx: 0, lsn: 15/1B0000F8, prev 15/1B0000D8, desc: CREATE base/5/1061087
rmgr: Transaction len (rec/tot): 373/ 373, tx: 565623, lsn: 15/1B0151C8, prev 15/1B015198, desc: COMMIT 2023-05-12 00:59:25.01634 EDT; inval msgs: catcache 80 catcache 79 ... snapshot 2608 relcache 1061087
5. Backups
If you have a good backup system, like
pgBackRest,
you can look at the saved files to find out the approximate file creation time.
It is possible that the table creation occurred before your earliest backup, but
if the underlying file does NOT exist in a particular full backup, and then
exists in a subsequent backup, you can narrow down the creation time quite a
bit. First, you will need to know the relfilenode
of the table in question:
$ psql -c "SELECT relname, relfilenode FROM pg_class WHERE relname = 'mipha'"
OR:
$ psql -c "SELECT pg_relation_filenode('mipha')"
pg_relation_filenode
----------------------
1055179
This is the file on disk inside the backups, underneath the base
directory.
pgBackRest will compress it by default, but it will have the same name, so we
can grep our pgBackRest repository:
$ pgbackrest info --stanza=mydemo | grep backup:
full backup: 20180517-123005F
full backup: 20180518-123005F
incr backup: 20180518-123005F_20180518-163002I
incr backup: 20180518-123005F_20180519-020002I
incr backup: 20180518-123005F_20180518-080003I
full backup: 20180519-123005F
$ find /var/lib/pgbackrest -printf "%c %p\n" | grep 1055132
Fri May 18 20:42:33 2018 /var/lib/pgbackrest/backup/mydemo/20180518-123005F/pg_data/base/5/1055132.gz
Sat May 19 20:33:56 2018 /var/lib/pgbackrest/backup/mydemo/20180518-123005F_20180518-163002I/pg_data/base/5/1055132.gz
We can tell from the above that the first time we saw the file appear was in the
20180518-123005F
backup, which the name tells us is a full (complete) backup
from May 18, 2018 at 12:30. It also appeared in a subsequent incremental backup,
which means that at some point the file changed (e.g. via insert/update/delete
of rows). We also note that it did NOT appear in the 20180517-123005F
backup.
Note that this is not foolproof, as a VACUUM FULL could have changed the
relfilenode.
Final ways to tell the table creation time
There are other methods to find the table creation time, but we listed some of
the easiest, common ones. For example, if you have log_autovacuum
set, you may
spot an entry in your Postgres logs referencing when the table was first
autovacuumed. The data inside the table itself may provide a clue, or your
application log might reveal it as well.
For the easiest Postgres troubleshooting, make sure you are logging all of your DDL additions, modification, or all of your actions with one of these:
log_statement = 'ddl'
log_satetment = 'mod'
log_statement = 'all'