Postgres: The batteries included database
At most of the places I've worked, the primary language used was not what I gravitated to naturally. If you're going to ask for a language of choice personally, it's python. I appreciate the explicit nature, that it's often pseudocode that can execute and it has a rich ecosystem of libraries (though that’s most languages these days). But as much as anything I latched onto Django in its early days. If you're not familiar with Django it's a popular framework for web apps written in Python. What I loved about it in particular was one of its taglines early on: batteries included.
For Django batteries included meant it had:
- Built-in ORM
- Authentication
- Admin tooling
- Template engine
- Middleware layer
- and more
Just recently I was on the changelog Podcast talking about "What's so exciting about Postgres?" I talked for a while and probably could have kept going, a lot of these features are top of mind in what we’re building with Crunchy Bridge. In reflecting on things a bit I can see a big parallel personally to what I appreciated in Django in Postgres. To me Postgres is the batteries included database. So what's in the PostgreSQL box:
- Solid database foundations
- Expressive SQL
- Full text search
- Geospatial support
- Extensions
Solid Database Foundations
To me, there are few things more important than data. And knowing my data is safe and secure is really key. Whether it’s my bank balance or my food order, when I transact in some way I want to know the data is there. Same applies for the applications I build and PostgreSQL has had a long sturdy track record here of being a safe and reliable database. At its core Postgres MVCC is a reliable way to help ensure a level of transactional consistency while still operating at large scale. Hundreds of thousands of transactions per second really can be quite manageable in a single PostgreSQL instance.
It has also evolved over the years to be as powerful as it is reliable. Data types are a big piece of that. Data types may on the cover not seem like something to write home about, yet not all databases have support for types that you’re using on a daily basis in your application. Just a few noteworthy ones:
- range types - especially useful for calendaring when you need from and to, which come with an ability to enforce constraints.
- timestamps - working with time in any language is painful, timestamps with timezone make conversion easier, coupled with time intervals getting things like users that signed up in last hour is trivial.
- JSONB - this one has gotten plenty of attention over the years, but it’s still new to many. A binary JSON datatype that you can easily index with GIN indexes.
Expressive SQL
SQL is the lingua franca for data. SQL’s origins in relational algebra and relational calculus make it a well thought out language for working with data - even if it isn’t always the most beautiful language to read. SQL is the standard language for data that every NoSQL database that doesn’t have it eventually recreates their version of SQL because it’s simply the right answer for data.
Within Postgres you’ve got all the standard ways of accessing data, and then you’ve also got the more exotic:
- CTEs (Common Table Expressions) - useful for composing more complex queries, recursion, and making SQL readable.
- Window functions - great for analytics such as ordering results, calculating things like median and percentiles.
- Functions - Postgres comes with a ton of functions already to make common actions like date math, parsing out characters and other things trivial.
Full text search
This one good friend Rach summarized it all in a post far better than I can: “Postgres full-text search is good enough!” - simply give it a read.
The broader sentiment here, is that if you have an existing application and you’re adding search to it, before you go and reach for elastic (which you then have to sync data to, and maintain and support that system) consider using something that you already have supported and contains the data - Postgres.
Geospatial support
This is a whole bucket in and of itself. PostGIS is the most advanced geospatial open source database. It’s an extension to Postgres but I put it in a special class from other extensions. While extensions are becoming more and more advanced, PostGIS has been pretty rich and advanced from day one. PostGIS has its own community that runs a bit parallel to Postgres, releasing new major and minor versions of the extension at a regular cadence.
When you enable PostGIS (simply CREATE EXTENSION postgis;
) you instantly have
a geospatial database. It brings new data types, new spatial predicates for
querying and interacting with those geometries. You can go further and enable
pgRouting which helps for route planning. Looking to figure out how to optimize
10 food deliveries across 3 drivers for optimum route and timing? PostGIS can
help.
In fact we just wrapped up a marathon PostGIS day that started early on the east coast, and wrapped up late on the west coast. While the day has since wrapped up, we’ll be posting those talks in the near future so feel free to sign up if you want to get some updates when those are live.
Extensions
I guess I went a bit out of order here leading with PostGIS. Backing up a bit… extensions are unique to Postgres. Personally the name doesn’t really do much to convey the power and value of them. Extensions allow you to expand and change what Postgres can do, but what’s unique is what Postgres allows. Postgres has these deep low level hooks that allow you to do things like add new data types, new functions, even hooks into the query planner.
Postgres comes with some stock extensions, you may still have to enable them,
but they’re at least available to you. pg_stat_statements
is one of the most
useful extensions that exists, it was written and updated in Postgres 9.2 by my
colleague Peter Geoghegan to be incredibly useful. It will give you an almost
immediate 30,000 foot view on
database performance
without having to go deep diving into your application to understand which
queries are being called.
The list of extensions is pretty long. PGXN, the Postgres Extension Network, has an index of over 200 extensions, some more production ready than others. All could be worthy of their own deep dive but a few particular ones worthy of some highlight:
- pg_partman - An extension that builds on native time partitioning in Postgres to make working with time series data simple and performant. A nice bonus for me, this one is maintained by a Crunchy Data colleague.
- pgsql-http - A simple extension, but potentially incredibly useful one that allows you to make http requests. Want to trigger a web hook? Call out to an API from your PL/pgSQL or PL/Python?
- zhparser, pgroonga - I group these two together despite being two different extensions. If you’re looking for full text search but the native Postgres support doesn’t work (the case for Japanese and Chinese). Here is your answer.
- madlib - Who needs an entire ETL and data science pipeline when you can do it all where the data lives (in Postgres). Madlib isn’t a new hobby library from some weekend dabbling, it’s nearly 10 years old and has rich support for data science broadly. Naive bayes, k-means, sketch estimations - those have all been in there since it’s 1.0 release in 2011. It’s part of the Apache foundation, and continues to evolve and simply make data science in your Postgres accessible.
- postgres_fdw - Foreign Data Wrappers (FDWs) are an interesting class of extension that allow you to query from within Postgres to something else (read: another Postgres, Redis, Mongo, you name it). If any of the FDWs are production ready it’s the Postgres one which ships with Postgres itself.
A big part of the goal of Crunchy Bridge, our multi-cloud PostgreSQL managed service, is to take advantage of the broad ecosystem of extensions so you can do more with your database.
The question isn’t why Postgres, it’s why not Postgres?
Over 8 years ago I wrote a blog post explaining “Why Postgres” was a database you should consider using. Five years later I posted a new version with some updates. I haven’t even gotten to the fascinating future that could exist with extensions or pluggable storage. (Want columnar storage in Postgres… I promise it’ll be here one day, maybe sooner than you think).
But today you don’t need a reason for choosing Postgres, if you do all your answers are above. Today the question has really shifted to what are you not using Postgres for, and how can the community better support that happening natively in Postgres in the future.
This is core to what we do Crunchy Data and what we believe. As a community how can we advance Postgres as a great open source database and continue to include more within it? It’s the reason we’ve spent nearly 10 years now ourselves helping folks run and support production PostgreSQL, whether VM based HA setups, on Kubernetes with our PostgreSQL Operator, or most recently on our database as a service.
As I look at the Django website today I see “batteries included” has mostly faded from the primary pitch for the framework, so I’m taking this opportunity to commandeer the phrase for Postgres, it really is a full batteries included database.