How to Solve Advent of Code 2022 Using Postgres - Day 3
Spoiler Alert!
This article will contain spoilers both on how I solved 2022 Day 3's challenge "Rucksack Reorganization" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.
Hands on Tutorial
We've also loaded a tutorial for Day's 3 challenge if you want to try it with a pre-loaded data set.
AOC Day 3
Today's challenge was solved using the following Postgres tech:
- file_fdw, a built-in foreign data wrapper for Postgres
- The functions
length()
,left()
,right()
, andregexp_replace()
- The COLLATE command to allow us to carefully compare characters
- The
window function
lag()
to gather nearby rows - A sequence to allow picking certain rows
For day 3 of the AOC (Advent of Code), we are given a file in which each line
represents items in a rucksack. There are two compartments, divided equally, and
we need to find which item is in common for the sides, generate a score based on
what character it is, then sum up the scores. As before, we are going to use
file_fdw
, so let's set things up and create a simple table with a single
column for each row.
CREATE EXTENSION if not exists file_fdw;
CREATE SERVER if not exists aoc2022 foreign data wrapper file_fdw;
DROP FOREIGN TABLE if exists aoc_day3;
CREATE FOREIGN TABLE aoc_day3 (items text)
SERVER aoc2022 options(filename '/tmp/aoc2022.day3.input');
SELECT * FROM aoc_day3;
items
--------------------------------------------------
lflZfgnSnlmmlgGfjGthQPtLNsQhvbHLLpSS
zrCVDVFMJTCTcCJMwCThWbtbpbWpPbtbHPLQssLsHP
rBFcrwFzFwwVDcDrzTzJfnRGjllBdGZnnZfhqmdn
FjpnFRDmbRtnbJrFJmSTsGShWVhGqGVVsmqs
ZwPvNPdzNZwfzBNLdNNNNcLvhnQhqMTVsTGSWSqGqTdVWhMT
vgLZHfvLffNLPbggnrbFpJnCbC
hzJzGjGfqmGtDQtDSvVV
plpcMBNBcCTlTgCMbvtrsSVsVJDJlrwDQr
McHBMMcTTHgJnWqnRqjzZnnRzR
ppvsGZhDGprrSjSllwfZ
So a random assortment of strings, with the limitation that all characters are
[a-zA-z]
and there are an even number of characters in each row. Our first
step is to divide things into two sections. We will use the built-in SQL
functions length()
, left()
, and right()
to accomplish this:
SELECT *, left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
FROM aoc_day3;
-[ RECORD 1 ]--------------------------------------------
items | lflZfgnSnlmmlgGfjGthQPtLNsQhvbHLLpSS
first | lflZfgnSnlmmlgGfjG
second | thQPtLNsQhvbHLLpSS
-[ RECORD 2 ]--------------------------------------------
items | zrCVDVFMJTCTcCJMwCThWbtbpbWpPbtbHPLQssLsHP
first | zrCVDVFMJTCTcCJMwCThW
second | btbpbWpPbtbHPLQssLsHP
-[ RECORD 3 ]--------------------------------------------
items | rBFcrwFzFwwVDcDrzTzJfnRGjllBdGZnnZfhqmdn
first | rBFcrwFzFwwVDcDrzTzJ
second | fnRGjllBdGZnnZfhqmdn
I'm trying to not repeat any functions or actions when possible during this
challenge, but I think we can let the double length() slide. Now we need to find
letters that are in common to both the first and second sections. This can be
done in many ways, but we'll simply replace any letters in the first section
that also exists in the second with an empty character, using the
regexp_replace()
function:
WITH q AS (
SELECT *, length(items), left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
FROM aoc_day3
)
,r AS (SELECT first, second, regexp_replace(first, '[^'||second||']', '', 'g') AS common FROM q)
SELECT * FROM r;
first | second | common
--------------------------+--------------------------+--------
lflZfgnSnlmmlgGfjG | thQPtLNsQhvbHLLpSS | S
zrCVDVFMJTCTcCJMwCThW | btbpbWpPbtbHPLQssLsHP | W
rBFcrwFzFwwVDcDrzTzJ | fnRGjllBdGZnnZfhqmdn | B
FjpnFRDmbRtnbJrFJm | STsGShWVhGqGVVsmqs | mm
ZwPvNPdzNZwfzBNLdNNNNcLv | hnQhqMTVsTGSWSqGqTdVWhMT | dd
vgLZHfvLffNLP | bggnrbFpJnCbC | g
hzJzGjGfqm | GtDQtDSvVV | GG
plpcMBNBcCTlTgCMb | vtrsSVsVJDJlrwDQr | ll
McHBMMcTTHgJn | WqnRqjzZnnRzR | n
ppvsGZhDGp | rrSjSllwfZ | Z
We will use the left() function again to grab a single character, then do a
simple mapping to get the score for that character. The puzzle states that a-z
are worth 1-26 and A-Z
are worth 27-52. Note that we need to use collation
'C', to ensure our "less than" check works in a case-insensitive manner. The
ascii() function has the curious effect of only converting the first letter of
any string passed to it, so we pass the duplicated letters directly to it.
Finally, we add up all the score to get our final result.
WITH q AS (
SELECT *, left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
FROM aoc_day3
)
,r AS (SELECT regexp_replace(first, '[^'||second||']', '', 'g') AS common FROM q)
,s AS (
SELECT ascii(common) -
(case when left(common,1) < 'a' COLLATE "C" THEN 38 else 96 end) AS score
FROM r)
SELECT sum(score) FROM s;
sum
-----
257
(Note: your sums might be different if you used a unique file)
This puzzle has a second part. Using the same input, we now need to group the
lines into groups of three, and find out which character is in common across all
three lines, then generate a score for that character. The first trick is
grouping things by threes. We are going to approach this with a windowing
function. First, we want to grab our current line, and the values from the two
rows "before" it. I changed it to left(x,5)
just to make the output fit a
little better:
SELECT left(items,5) AS one,
left( LAG(items,1) over(), 5) AS two,
left( LAG(items,2) over(), 5) AS three
FROM aoc_day3;
one | two | three
-------+-------+-------
lflZf | ☃ | ☃
zrCVD | lflZf | ☃
rBFcr | zrCVD | lflZf
FjpnF | rBFcr | zrCVD
ZwPvN | FjpnF | rBFcr
vgLZH | ZwPvN | FjpnF
hzJzG | vgLZH | ZwPvN
plpcM | hzJzG | vgLZH
McHBM | plpcM | hzJzG
ppvsG | McHBM | plpcM
(Do you want to build a snowman? In the example above, the snowman appears as a
symbol for a NULL entry, which I accomplished by adding this to my ~/.psqlrc
file: \pset null ☃
)
When we get to the third row, we have also captured the two rows before it. In other words, we have groups of three that we can manipulate. The next issue is to only pick every third row. As this is coming from a file via file_fdw, we can be sure that the order stays consistent. We will use a sequence to allow counting by threes and filter out the other two rows in every triad. To be on the safe side, we reset the sequence it case it already exists, to force Postgres to start counting from the number one:
CREATE SEQUENCE if not exists aoc;
SELECT setval('aoc', 1, false);
WITH p AS (SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three FROM aoc_day3)
SELECT left(one,5), left(two,5), left(three,5)
FROM p
WHERE 0 = nextval('aoc')%3;
left | left | left
-------+-------+-------
rBFcr | zrCVD | lflZf
vgLZH | ZwPvN | FjpnF
McHBM | plpcM | hzJzG
Next we need to find, for each row, the character that all three share. We will do this similar to the regexp replace before, but this time we need to find all shared characters between two of them, and then check that returned list against the final one. The setval is crucial here, so we add that to our statement as well:
WITH setup AS (SELECT setval('aoc',1,false))
,p AS (
SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three
FROM aoc_day3, setup)
,q AS (SELECT * FROM p WHERE 0=nextval('aoc')%3)
,r AS (
SELECT *,
regexp_replace(regexp_replace(one, '[^'||two||']', '','g'), '[^'||three||']', '','g') AS common
FROM q)
SELECT left(one,5), common FROM r;
left | common
-------+--------
rBFcr | h
vgLZH | nn
McHBM | J
Finally, we simply use our code from before to grab the first character from the common column, assign it a score, and then tally the whole thing up:
WITH setup AS (SELECT setval('aoc',1,false))
,p AS (
SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three
FROM aoc_day3, setup)
,q AS (SELECT * FROM p WHERE 0=nextval('aoc')%3)
,r AS (
SELECT *,
regexp_replace(regexp_replace(one, '[^'||two||']', '','g'), '[^'||three||']', '','g') AS common
FROM q)
,s AS (
SELECT ascii(common) -
(CASE WHEN left(common,1) < 'a' COLLATE "C" THEN 38 ELSE 96 END) AS score
FROM r)
SELECT sum(score) from s;
sum
-----
58
All done! Onwards to Day 4...