Control Runaway Postgres Queries With Statement Timeout
Most queries against a database are short-lived. Whether you're inserting a new record or fetching a list of upcoming tasks for a user, you're typically not aggregating millions of records or sending thousands of rows back to the end user. A typical short-lived query in Postgres completes in a few milliseconds or less. For the typical application, this means a well-tuned production Postgres database can easily handle thousands of queries per second, or even hundreds of thousands on a beefy instance.
But lying in wait is the query that can slow everything to a crawl. I've lived through these experiences far too often, and (I hate to admit it) at times I've been the cause. Queries that run too long tend to create cascading effects, and most commonly they take one of four forms:
- An intensive BI/reporting query that is scanning a lot of records and performing some aggregation
- A database migration that inadvertently updates a few too many records
- A miswritten query that wasn't intended to be a reporting query but is now joining a few million records
- A runaway recursive query
Each of the above queries is likely to scan a lot of records and churn the cache within your database. It may even spill from memory to disk while sorting data. In the worst case it can hold locks so that new data can't be written (this isn't always the case, but it can happen). Even without holding locks, long-running queries can create lag on your replicas if they aren't configured correctly.
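If you suspect one of these queries is already in flight, you can spot it in pg_stat_activity. A minimal example, with a one-minute threshold chosen arbitrarily for illustration:

-- find active queries that have been running longer than a minute
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '1 minute'
ORDER BY duration DESC;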
The short moral of the story is to proceed with caution when doing any of the above intentionally. BI/reporting queries are best run against a read replica so they don't impact production. Migrations are best broken into smaller steps that can be run incrementally, as sketched below. But how do you ensure the above doesn't happen accidentally at the hands of some impatient dev (in this case, a past version of me)?
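As a minimal sketch of that incremental approach, assume a hypothetical events table with a processed flag. Each statement touches a bounded batch of rows, so no single statement runs long:

-- repeat until this reports UPDATE 0
UPDATE events
SET processed = true
WHERE id IN (
    SELECT id FROM events
    WHERE processed = false
    LIMIT 1000
);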
Enter your key defense for keeping your PostgreSQL database safe from these disasters:
A statement timeout for your own protection
Postgres allows you to set a statement timeout. You can set it at multiple levels:
- Session
- User
- Database
Setting a default statement timeout for your database is a good starting point. This ensures that any application or person connecting to the database won't, by default, have queries running longer than that. A sane default is 30 or 60 seconds:
ALTER DATABASE mydatabase SET statement_timeout = '60s';
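The database-level setting applies to new sessions, so you can confirm it from a fresh connection. You can also override the default per user where it makes sense; a sketch, assuming a hypothetical reporting_user role:

-- in a new session, confirm the database default
SHOW statement_timeout;

-- give a reporting role more headroom than the default
ALTER ROLE reporting_user SET statement_timeout = '10min';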
Per-session changes
Most of your transactions you can expect to complete within that 60-second timeframe. But there can still be cases for longer-running work: intentional long-running BI queries, or migrations that really do need to update a lot of data. For those specific cases you can intentionally raise the timeout for the session you're in:
SET statement_timeout = '30min';
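If you only need the longer timeout for a single transaction, SET LOCAL scopes the change to that transaction, so it reverts automatically at commit or rollback:

BEGIN;
SET LOCAL statement_timeout = '30min';
-- long-running migration statements here
COMMIT;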
Additional options
Personally, I find myself using seconds and minutes most often for these settings, but you can go smaller or larger with any of the supported units: "us", "ms", "s", "min", "h", and "d".
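One detail worth knowing: if you pass a bare integer with no unit, statement_timeout interprets it as milliseconds:

SET statement_timeout = 500;     -- 500 milliseconds
SET statement_timeout = '500ms'; -- same thing, with an explicit unit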
The other piece you may want to configure is idle_in_transaction_session_timeout. This is helpful for any connections to your database that could be holding a transaction open while not actually doing work. Idle transactions aren't inherently bad to see within your database, but you don't want them lingering any longer than expected.
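It takes the same units as statement_timeout and can be set at the same levels; for example, at the database level:

ALTER DATABASE mydatabase SET idle_in_transaction_session_timeout = '10min';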
Keeping your database healthy
If you haven't already gone and set up your statement_timeout, get to it right away. It's one more piece, alongside proper tuning, that will help ensure your database stays healthy and available.