Postgres Goodies in Ruby on Rails 7.1
I just spent last week at Rails World in Amsterdam and had a blast digging back into the Rails and Active Record world. In conversations with developers over the week, I had some notable takeaways from the newest version of Ruby on Rails that I just had to get written up.
A quick summary before we dig in:
-
async queries: send long-running queries the background while the code runs along, great for pages with multiple long-running queries that can be run in parallel
-
composite primary keys: native support for using two or more columns as a primary key
-
common table expression (CTEs): native integration for a subquery for use later in the statement
-
unlogged tables: native support for disabling Postgres’ WAL logs on a table (mostly for test environments), so that you get better performance on your tests that use databases
-
value normalization: a native, universal syntax for normalization of values (like downcase of a email column) instead of using
before_validation
Expansion of Async queries
A feature (not a bug IMO) of Ruby is that it has traditionally been used as a blocking (i.e. not-asynchronous) language. While it does have asynchronous capabilities, in the typical use-case people do not have to grok asynchronous workflows to use it effectively.
In the 7.0 release, Active Record added load_async
for loading whole objects.
In 7.1, asynchronous queries have been enabled for aggregations and in full SQL
queries using async_find_by_sql
.
First, you'll need to define the async_query_executor
in your environment
files (config/environments/{development, production}.rb
).
To use a global setting, use something like the following:
config.active_record.async_query_executor = :global_thread_pool
config.active_record.global_executor_concurrency = 5
To use a per-database setting, use something like the following and
min_threads
+ max_threads
the database.yml
:
config/environments/{development, production}.rb
config.active_record.async_query_executor = :multi_thread_pool
config/database.yml
development:
adapter: postgresql
pool: 5
max_threads: 5
min_threads: 5
After setting one of those configurations, we can see how the queries work asynchronously:
irb> u = User.async_find_by_sql("SELECT *, pg_sleep(3) FROM users") # this query will sleep for 3 seconds for each record in your database
=> #<ActiveRecord::Promise status=pending>
Then, sometime later you can use the value
syntax to retrieve the value:
# … sometime later …
irb> u.value
=> {returned results}
Once you call the .value
method, if the query has returned, it will return
instantly. If the query has not returned, processing is blocked until the query
is complete. This is great for dashboards, charts, and reports that generate
more complex queries. Send the query to the background, and let it process while
you complete other elements of the request.
Composite primary keys
Composite primary keys have been noticeably absent from Ruby on Rails for a while -- unless you chose the CPK gem. Rails 7.1 added two native methods for implementing Primary Keys: database level and application level.
Database Level Composite Primary Keys: When defining the table in the
database migration, you can pass an array of column names to the primary_key
attribute. For databases capable of composite primary keys (like Postgres),
Active Record will infer from the schema:
create_table :user_accounts, primary_key: [:user_id, :account_id] do |t|
t.belongs_to(:user, foreign_key: true)
t.belongs_to(:account, foreign_key: true)
t.string :role
t.timestamps
end
After running this migration, if you run \d user_accounts
on your Postgres
database, you’ll see the following line for the composite key:
"user_accounts_pkey" PRIMARY KEY, btree (user_id, account_id)
Then, when querying with the composite keys, you do the following:
UserAccounts.find([1, 2]) # where user_id = 1 and account_id = 2
In the hypothetical use-case above, we use the composite primary keys for a join
table between users and accounts. Typically, in the past I would use an id
column with a unique constraint on the user_id
and account_id
values. Now,
with Postgres, we can use the composite primary key for the row
Application Level Composite Primary Keys: Rails documentation calls this a "virtual primary key". And, you’ll want to know that it’s a bit more restrictive than the composite primary keys above because it enforces an explicit foreign key definition on relationships:
class UserAccounts < ActiveRecord::Base
query_constraints :user_id, :account_id
belongs_to :user, foreign_key: :user_id
belongs_to :account, foreign_key: :account_id
end
You have to be explicit on the foreign_key definitions of the relationship, else
it tries to find user_id
and account_id
on every related model.
My recommendation is to use Postgres and the native composite primary keys in a database.
Native Support for CTEs
A "CTE" is a "Common Table Expression". A CTE is a type of a nested SQL statement that is defined before the SQL. Below is an example using native-SQL that would find the latest event for each user on an account:
WITH latest_event_per_user AS (
SELECT
user_id,
MAX(event_logs.created_at) AS last_created_at
FROM event_logs
WHERE
event_logs.account_id = 1
GROUP BY 1
)
SELECT
event_logs.user_id,
event_logs.name,
event_logs.created_at
FROM event_logs
INNER JOIN latest_event_per_user ON
event_logs.user_id = latest_event_per_user.user_id AND
event_logs.created_at = latest_event_per_user.last_created_at;
When would you use something like this? Above is a query that returns the latest events for each user on an account. Another common usage of CTEs is when generating the data for charts and reports. Most of the time, generating this data at the SQL level is much faster than generating it using application level logic. Application logic would require some type of N + 1 query, which can be avoided with a more expressive SQL query.
To support CTEs, Active Record added .with()
method for chaining queries. The
with()
accepts an Object, which is quite nice when using a large block.
Writing the same query from above in Active Record would look like the
following:
latest_event_per_user = EventLog.
where(account_id: 1).
group(:user_id).
select(:user_id, "max(event_logs.created_at) AS last_created_at")
el = EventLog.
with(my_cte: latest_event_per_user).
joins("JOIN my_cte ON event_logs.user_id = my_cte.user_id AND event_logs.created_at = my_cte.last_created_at")
My recommendation: if you are SQL-nerd enough to use CTEs, consider using
ActiveRecord::Base.connection.execute()
to execute the raw sql. The one time I
could see using this Active Record CTE syntax is if you need the chaining
capabilities due to conditional query creation.
Support for unlogged tables (test env only!)
Postgres in test environments do not need the persistence that Postgres needs in
production. Enter UNLOGGED TABLE
, which does not apply Postgres durability of
a table, but improves performance:
# config/environments/test.rb
ActiveSupport.on_load(:active_record_postgresqladapter) do
self.create_unlogged_tables = true
end
ActiveRecord::Base.normalizes
Most modern databases (including Postgres) are case-sensitive by default. And, users have been known to randomly capitalize values. So, it's best to sanitize values. If you've written a Rails application, you've probably written something like the following:
class User < ApplicationRecord
before_validation do
self.email = self.email.strip.downcase
end
end
Now, we have an native way to do this with normalizes
:
class User < ApplicationRecord
normalizes :email, with: -> given_value { given_value.strip.downcase }
end
For those who don't know about the ->
syntax, this is a function with a single
argument called given_value
.
Summary
The PostgreSQL, Active Record, and Ruby on Rails communities continue to show investment in features to make this a strong stack for data heavy production applications.