Working with Time in Postgres
Since humans first started recording data, they’ve been keeping track of time. Time management is one of those absolutely crucial database tasks and Postgres does a great job of it. Postgres has a lot of options for storing and querying time so I wanted to provide an overview of some of the most common needs for storing and retrieving time data.
This blog is also available as a hands on tutorial running in your local browser via our Postgres playground.
If you ask Postgres what time it is,
SELECT now();
You’ll get something like
now
-----------------------------
2023-05-15 18:23:58.5603+00
The default time representation here is a full timestamp string, containing the date, time, and a reference to timezone. In this case, the +00 represents equal with UTC. UTC has long been a standard time measurement following suit from the Greenwich mean time (if you’re as old as I am).
If I want to know the time in my local timezone
SELECT now() AT TIME ZONE 'America/Chicago';
The full list of timezones names you can use is stored in a system table and can
be retrieved with select * from pg_timezone_names;
Data types for time
Postgres has a TIME
data type, with and without a time zone if you want to
store that separately from a date. This is generally not recommended since in
most cases time requires an accompanying date. There’s a TIMESTAMP
datatype.
Adding timezone to TIMESTAMP is TIMESTAMP WITH TIMEZONE
or aliased as the
TIMESTAMPTZ
. Without a doubt TIMESTAMPTZ is going to be the MVP of
Postgres time storage. If you store data in with the full date, time, and
timezone you’ll never have to worry about the server time, what time the user
entered the data, what time it is where you’re querying data, or any of those
crazy calculations. And you or your application can pull out the time and
display it in whatever local user timezone you need.
When working with Postgres, you’ll also see epoch which is how seconds are represented. This is not a timestamp, its an integer (a double precision floating-point number, 64 bits) and it represents the number of seconds since January 1st, 1970. This can be used if you need a specific comparison or need time in that format. Postgres can easily convert back at forth between timestamps and epochs. To find the current epoch:
SELECT EXTRACT (EPOCH FROM now());
Time formats & functions
I’m an American midwesterner so of course, I would write Bastille Day like - July 14th, 1789 or 7-14-1789. Of course all my French friends would write it 14 July 1789 or 14-07-1789. And while I’d love to debate with you all over beers about the best way to do this, ISO has some standards for time formats, namely ISO 8601 which states that dates will be read like this 1789-07-14 17:30:00.000, year-month-day-time. This date format is what used in TIMESTAMP and what you’ll see most often in the database and engineering world.
Time storage has the ISO8601 best practice, however, depending on your end users
or business needs, you may want to change the time format in your queries whey
they’re output. So to change the time format of a query you can use the
TO_CHAR
function which will translate a time string into different characters.
SELECT TO_CHAR(NOW(), 'DY, Mon dd, yyyy HH24:MI:SS OF');
TO_CHAR
let’s you convert the time interval string to text and characters.
Then using some
formatting functions,
I can pull out the day of the week, an American date format, and UTC time. The
result of that query would be:
MON, May 15, 2023 14:22:28 +00
Time intervals
Now that we’re fancy and can get dates in any format we want, how about calculating intervals and lapsed time in different formats?
We’ve loaded in a sample table with some train schedule data, take a peek
SELECT * FROM train_schedule LIMIT 3;
and it looks like this
trip_id | track_number | train_number | scheduled_departure | scheduled_arrival | actual_departure | actual_arrival
---------+--------------+--------------+------------------------+------------------------+------------------------+------------------------
1 | 1 | 683 | 2023-04-29 11:15:00+00 | 2023-04-29 12:35:00+00 | 2023-04-29 11:21:00+00 | 2023-04-29 12:52:00+00
2 | 1 | 953 | 2023-04-29 13:49:00+00 | 2023-04-29 15:10:00+00 | 2023-04-29 13:50:00+00 | 2023-04-29 15:17:00+00
3 | 1 | 140 | 2023-04-29 15:06:00+00 | 2023-04-29 15:23:00+00 | 2023-04-29 15:06:00+00 | 2023-04-29 15:22:00+00
(3 rows)
Let’s say you are storing an update_time fields. To find your the lower and upper bounds of arrival times times in your data set you would do:
SELECT min(actual_arrival) FROM train_schedule;
and
SELECT max(actual_arrival) FROM train_schedule;
To find the interval between them:
SELECT
(SELECT max(actual_arrival) FROM train_schedule)
- (SELECT min(actual_arrival)
FROM train_schedule);
Ok, so we have about 10 days of train schedule information in here.
Taking this a step further, if I want to look at intervals between scheduled time of departure and actual time of departure. I can create an arrival_delta and a subquery that compares actual arrival minus scheduled arrival.
SELECT avg(arrival_delta)
FROM (SELECT scheduled_arrival, actual_arrival,
actual_arrival - scheduled_arrival AS arrival_delta
FROM train_schedule)q;
You can also add a filter to find interval sizes. If we build on the above query but only for departures that were more than 10 minutes later than their original scheduled time we can add this interval > ‘10 minutes`.
SELECT avg(arrival_delta)
FROM (select scheduled_arrival, actual_arrival,
actual_arrival - scheduled_arrival AS arrival_delta
FROM train_schedule WHERE (actual_arrival - scheduled_arrival)
> INTERVAL '10 minutes')q;
Overlapping / intersecting time
What if I wanted to find all of the trains that were running at a specific time - or now. You can use the OVERLAP operator with the INTERVAL.
SELECT count(*) FROM train_schedule
WHERE (actual_departure, actual_arrival)
OVERLAPS (now(), now() - INTERVAL '2 hours');
Time Range Types
Postgres also supports working with time ranges that include both a single
range, and even
multiple ranges.
Single ranges of the timestamptz is called tstzrange
and one for multiple
ranges would be tstzmultirange
For example, if we wanted to create a table in our train database that has some peak travel season fares, we could do:
CREATE TABLE fares
(peak_id int,
peak_name text,
peak_times tstzmultirange,
fare_change numeric);
INSERT INTO fares(peak_id, peak_name, peak_times, fare_change)
VALUES (1, 'holiday', '{[2023-12-24 00:00:, 2023-12-27 00:00],[2023-12-31 00:00, 2024-01-02 00:00]}', 50),
(1, 'peak_summer', '{[2023-05-27 00:00:, 2023-05-30 00:00],[2023-07-03 00:00, 2023-08-30 00:00]}', 30);
And now to query something with the multi-timezone range, Postgres has a special
operator for this, @>
. Let’s see if travel today is during peak time.
SELECT * from fares WHERE peak_times @> now();
Indexing time columns
Anytime you’re querying time a lot, you’ll want to add an index so that time lookups are faster. Timestamps column indexes work will with the traditional B-tree index as well as BRIN. In general, if you have tons of data entered sequentially a BRIN index is probably recommended.
A B-tree would be created like this:
CREATE INDEX btree_actual_departure ON train_schedule (actual_departure);
And a BRIN
CREATE INDEX brin_sequential ON train_schedule USING BRIN (actual_departure);
Roll ups
So let’s say you have quite a bit of time data. Using the date_trunc
function
you can easily pull out timestamp data by day or date and then you can use a
query to count by the date/date.
If I want to find in my train data a count of train trips per day, that would look like this:
SELECT
date_trunc('day', train_schedule.actual_departure) d,
COUNT (actual_departure)
FROM
train_schedule
GROUP BY
d
ORDER BY
d;
Roll ups won’t be the only way to deal with lots and lots of time data. Partitioning can be really helpful once you have lots of time data that can be easing sectioned off. If you’re getting into measuring analytics or metrics, there’s some options for that as well, like hyperloglog.
Summary
Thanks for spending your time learning about time ;) Some takeaways
- store time in UTC +/- values
timestamptz
is your bffto_char
and all of the formatting functions let you query time however you want- Postgres has lots of functions for
interval
andoverlap
so you can look at data that intersects date_trunc
can be really helpful if you want to roll up time fields and count by day or month