Postgres Query Boost: Using ANY Instead of IN
The IN
operator has to be in the top-3 SQL-specific operators that people
learn, the close competitors are LIKE
and BETWEEN
. Thus, IN
feels familiar
when crafting SQL: you can use it with nested SQL statements, or create your own
list. But, it does have limitations — what if you wanted to send a list of
unknown length? You can’t really use SQL placeholders without also modifying the
SQL statement.
What if I told you there was a less ubiquitous operator that offered all of the
power and less of the drawbacks. In this article we'll talk about using
= ANY(array)
in instead of IN(list)
to improve code and query quality.
We don't give Java enough love in our tutorials, so these examples are written
for JDK with the
Vert.x reactive Postgres library.
Don’t worry, you’ll be able to read it. We chose a direct Postgres client
because ORMs typically default to IN(list)
, and I’m making an argument they
should convert to = ANY(array)
where applicable.
Unforeseenn Consequences
It starts simply enough: I have a collection of items that I want to filter by
id
. Perhaps, this is a user-selectable form that permits users to multi-select
values. The most common query looks like this:
SELECT i.*
FROM item AS i
JOIN shipment AS s ON s.item_id=i.id
WHERE s.status <> 'shipped' -- Grab only Unshipped items
AND i.id IN (1,2,3,4,5,6);
Seems perfectly reasonable. I'll just copy that query into our code and replace the literal list with a placeholder right?
Introduce Complexity
Back in the code window, I drop it into some code approximately like this.
// Prepare Tuple of IDs
Tuple t_list = Tuple.of(1, 2, 3, 4, 5, 6);
// Prepare a parameterized query
pool.preparedQuery("SELECT i.*\\n" +
"FROM item AS i\\n" +
"JOIN shipment AS s ON s.item_id=i.id\\n" +
"WHERE s.status <> 'shipped'\\n" +
" AND i.id IN ($1)")
// Attempt to run it
.execute(t_list)
// Log the IDs returned
.onSuccess(rows -> rows.forEach(
row -> logger.info("id: " + row.getInteger("id"))))
// Failure: Log the complaint
.onFailure(t -> logger.error("QUERY FAIL", t));
But this doesn’t work. It will throw an exception about the wrong number of
parameters. The $1
placeholder expects a single value, but it receives 6
values. Aha, that Tuple
must be packed wrong for the IN
parameter. Maybe
sending an array is better.
// Prepare Tuple of array of IDs
Integer[] items = {1, 2, 3, 4, 5, 6};
Tuple t_list = Tuple.of(items);
// Prepare a parameterized query
pool.preparedQuery("SELECT i.*\\n" +
"FROM item AS i\\n" +
"JOIN shipment AS s ON s.item_id=i.id\\n" +
"WHERE s.status <> 'shipped'\\n" +
" AND i.id IN ($1)")
// Attempt to run it
.execute(t_list)
// Log the IDs returned
.onSuccess(rows -> rows.forEach(
row -> logger.info("id: " + row.getInteger("id"))))
// Failure: Log the complaint
.onFailure(t -> logger.error("QUERY FAIL", t));
This doesn’t work either. It throws a wrong type error. Because Postgres is
strongly typed, when comparing id
SQL expects a Number
, but gets
Integer[]
.
Time to Read the Fine Manual
The exceptions coming back from these attempts are not getting better. Let's head over to the PostgreSQL 15 documentation to see if anything leaps out, regarding this.
9.24.1
IN
The right-hand side is a parenthesized list of scalar expressions.
How do I send a parenthesized list of scalars? Is that even possible?
Well, I tried sending an array and I tried sending multiple scalars, but none of
those matched a parenthesized list of scalar expressions. As it turns out, you
can't prepare a variable list of scalar expressions. If you mean to send 5
scalars, you must write IN ($1, $2, $3, $4, $5)
. This operator is finicky.
The case for ANY
(or SOME
)
Fortunately the solution is easily found on the same PostgreSQL 15 documentation page if I continue scrolling.
9.24.3.
ANY
/SOME
(array)expression operator ANY (array expression) expression operator SOME (array expression)
The right-hand side is a parenthesized expression, which must yield an array value.
Possible match? From the definitions, expression IN (...)
is equivalent to
expression = ANY (...)
but for the parameter type!
Endgame
Let's give it a shot.
// Prepare Tuple of array of IDs
Integer[] items = {1, 2, 3, 4, 5, 6};
Tuple t_list = Tuple.of(items);
// Prepare a parameterized query
// this time replacing IN with =ANY
pool.preparedQuery("SELECT i.*\\n" +
"FROM item AS i\\n" +
"JOIN shipment AS s ON s.item_id=i.id\\n" +
"WHERE s.status <> 'shipped'\\n" +
" AND i.id = ANY($1)")
// Attempt to run it
.execute(t_list)
// Log the IDs returned
.onSuccess(rows -> rows.forEach(
row -> logger.info("id: " + row.getInteger("id") + " name: " + row.getString("name"))))
// Failure: Log the complaint
.onFailure(t -> logger.error("QUERY FAIL", t));
Success! My array binds as a single parameter to ANY
.
Conclusions
Given the constraints on IN
and the need to send parameters securely, it makes
sense use = ANY
where you might otherwise use IN
.
In addition to parameter binding, it's worth mentioning that ANY
works with
all boolean comparison operators:
* foo LIKE ANY ('{"%bar", "%baz"}')
* foo ILIKE ANY ('{"%bar", "%baz"}')
* id <> ANY ('{1, 2, 3}')