Getting Started with Postgres Functions in PL/Python
Python is one of many procedural languages (PLs) included in standard PostgreSQL distributions. In this post, we'll take a quick look at how to get started with using PL/Python to write Postgres functions.
If you're new to user-defined functions in Postgres, the idea is the same as in general programming: they're blocks of code you've written that are basically saved for use later. Each function carries out a specific task whenever it's called. A powerful feature of Postgres, especially if you're an app or web developer, is that it allows you to write these functions in languages other than plain SQL. A commonly used one is PL/pgSQL, which essentially builds on SQL by adding in more syntax for processing (and not just querying) data.
There are still cases in which it might be better to use yet another language such as Python, R, or Java. For example, Python and R are preferred languages for data analysis. While PL/pgSQL might possibly be able to do some of the type of processing you need, having access to Python or R's suite of specialized tools would likely just make things much easier for you overall.
Why use Python in PostgreSQL?
Python has plenty of open source packages that are widely used across different industries; Python also shines when it comes to scientific computing because it's got entire libraries, communities, and ecosystems built up that are dedicated to scientific computing such as SciPy, NumPy, and Pandas.
So, using PL/Python lets you extend Postgres in ways that PL/pgSQL isn't as well-equipped to do out-of-the-box. Also, if you already know Python and neither want nor have the time to learn one of the trusted languages, PL/Python becomes a convenient option.
I also want to emphasize that PL/Python is specifically for using Python embedded inside of Postgres. Many of you may be connecting Python and Postgres from your Python environment, either via a standard driver like psycopg2, or on a higher level perhaps with an object relational mapper (ORM) such as in Django. PL/Python is used in different scenarios and for different purposes -- if you're newer to Postgres like I am, I hope this helps clear up the distinction.
What you'll need
- I'll assume you already have a PostgreSQL database. While PL/Python is included in core PostgreSQL, you'll still have to make sure that Python is also already installed on the same machine (it may or may not already be shipped with the OS). For convenience, the Postgres container on the Crunchy Developer Portal does include Python. Your functions need to access the Python interpreter which lives on the server.
- You'll need to have (or get someone with) superuser privileges to load PL/Python into your database:
CREATE EXTENSION plpython3u;
plpython3u
will use Python 3; the default version (i.e. if you install using
the generic plpythonu
language name) is Python 2. The official documentation
does say that it is possible to use both Python 2 and 3 in the same database but
in separate sessions; in any case, the
Python 2 vs Python 3
section is helpful to read even if you know you'll just be using the one
variant.
With that said, Python 2.7 (the end of the 2.x series) is
end-of-life, so you'll probably go
ahead and use plpython3u
for now.
PL/Python is an "untrusted" language in Postgres -- the Python code can access the host machine's filesystem and there isn't a way to restrict that. As an untrusted language (denoted by the "u" in the language name), there is more risk with using PL/Python compared to a trusted language like PL/pgSQL.
PL/Python function example
You also need to have superuser privileges to create PL/Python functions. An alternative is to have a separate development environment where PL/Python could be set as a trusted language. In development, other users can create functions which can then be tested and vetted thoroughly before transferring them to production. This kind of workflow might meet your needs, but definitely proceed with caution!
Below is an example of a simple function that uses
urllib.parse
,
which is included in the Python standard library.
CREATE OR REPLACE FUNCTION url_quote (url text)
RETURNS TEXT
AS $$
from urllib.parse import quote
return quote(url)
$$
LANGUAGE 'plpython3u';
SELECT url_quote('https://www.postgresql.org/docs/12/plpython-data.html#id-1.8.11.11.3');
url_quote
--------------------------------------------------------------------------
https%3A//www.postgresql.org/docs/12/plpython-data.html%23id-1.8.11.11.3
Since urllib.parse
is a built-in module, there isn't anything else you need to
do to use it other than include the import statement in your function code.
Hopefully this helps demonstrate how it might be easier and quicker to use
PL/Python instead of trying to accomplish the same result using standard SQL or
PL/pgSQL.
And what about external packages such as Pandas, NumPy, or SciPy? Basically, as long as the package is installed on the same machine as Python, you should be able to use import as you would normally. For data visualization or other resource-intensive processing, you might consider having PL/Python do some of the heavy lifting right where the data itself already lives -- instead of shipping the data off (and potentially having to transform it first to some other format) to be consumed by an external client.
More on procedural languages in Postgres
If you're brand new to PL/Python and want to take it out for a quick spin, check out the Introduction to PL/Python in PostgreSQL class in our Learning Portal. You might also be interested in trying out PL/R, or if you want a refresher on creating functions in Postgres, a basics of functions class is available as well.
Looking for deeper dives into PLs? For Python, I'd recommend Craig Kerstiens' walkthrough on building a recommendation engine with PL/Python and Pandas. For R, check out this great blog series on spatial analysis with PL/R, written by my colleague Joe Conway. Enjoy!