How to Solve Advent of Code 2022 Using Postgres - Day 15
This article will contain spoilers both on how I solved 2022 Day 15's challenge "Beacon Exclusion Zone" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.
Hands on Tutorial
We've also loaded a tutorial for Day 15's challenge if you want to try it with a pre-loaded data set.
AOC Day 15
For this day's challenge, we are presented with an input file that contains the locations of many sensors, along with the closest beacon to that sensor. The first goal is to find out how many beacons CANNOT be in a certain row. The distance from the sensor to the beacon is measured in taxi distance also known as the snake distance or the Manhattan distance. See the Wikipedia entry if you are unfamiliar with it, as it will help to understand this problem. The file consists of lines like these:
Sensor at x=1555825, y=18926: closest beacon is at x=1498426, y=-85030
Sensor at x=697941, y=3552290: closest beacon is at x=595451, y=3788543
Sensor at x=3997971, y=2461001: closest beacon is at x=3951198, y=2418718
As always, we are going to employ a handful of Postgres / SQL tools to help us solve this, including:
- file_fdw to read text files in from the system
- sequences to help us do some basic counting
- The
regexp_substr
andregexp_match
functions to help extract numbers from a text string - CTEs (Common Table Expressions) to keep our queries readable and not descending into sub-select chaos.
- The
int4range
range type, to efficiently gather together a lot of numbers along a line - The
range_agg
andunnest
functions to help us sort the int4range output - The
ctid
system column as a quick way to exclude the "current" row from a list of rows - PL/pgSQL functions and some upsert logic
First things first, we need to put everything into a schema for neatness, and use a FDW / foreign table to get the flat file of our inputs into the database:
CREATE EXTENSION if not exists file_fdw;
CREATE SERVER if not exists aoc2022 FOREIGN DATA WRAPPER file_fdw;
DROP SCHEMA if exists aoc_beacon CASCADE;
CREATE SCHEMA aoc_beacon;
SET search_path = aoc_beacon;
DROP FOREIGN TABLE if exists aoc_day15;
CREATE FOREIGN TABLE aoc_day15 (line text)
SERVER aoc2022 options(filename '/tmp/aoc2022.day15.input');
DROP SEQUENCE if exists aoc;
CREATE SEQUENCE aoc;
The input file is a fairly standard form - we want to first extract all the numbers from each line to eventually put them into a table. This is an excellent job for the regexp_substr() function:
WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15)
,y AS (
SELECT id
,regexp_substr(line, '=(\-?\d+)', 1, 1, '', 1)::int AS s1
,regexp_substr(line, '=(\-?\d+)', 1, 2, '', 1)::int AS s2
,regexp_substr(line, '=(\-?\d+)', 1, 3, '', 1)::int AS b1
,regexp_substr(line, '=(\-?\d+)', 1, 4, '', 1)::int AS b2
FROM x)
,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
SELECT * FROM coords LIMIT 10;
However, your version of Postgres may not support regexp_substr (needs to be at least Postgres 15), so older versions will need to use regexp_match instead:
WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15)
,y AS (
SELECT id
,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
SELECT * FROM coords LIMIT 10;
For both, we use a simple regex to grab the number: =(\-?\d+)
The parens
specify what part we want to capture, we put a backslash before the minus sign,
and use a question mark to make it optional (e.g. 0 or 1 occurrences), and end
with a backslash-d to indicate a digit (the plus indicates one or more). We then
use all four numbers to create the snake distance for each line. The query
breaks things into place nicely:
id | s1 | s2 | b1 | b2 | taxi
----+---------+---------+---------+---------+---------
2 | 1555825 | 18926 | 1498426 | -85030 | 161355
3 | 697941 | 3552290 | 595451 | 3788543 | 338743
4 | 3997971 | 2461001 | 3951198 | 2418718 | 89056
5 | 3818312 | 282332 | 4823751 | 1061753 | 1784860
6 | 2874142 | 3053631 | 3074353 | 3516891 | 663471
7 | 1704479 | 2132468 | 1749091 | 2000000 | 177080
8 | 3904910 | 2080560 | 3951198 | 2418718 | 384446
9 | 657061 | 3898803 | 595451 | 3788543 | 171870
10 | 3084398 | 3751092 | 3074353 | 3516891 | 244246
11 | 2582061 | 972407 | 1749091 | 2000000 | 1860563
(10 rows)
For part one, we need to only look at row 2,000,000 and determine how many positions there are which cannot have the beacon we are looking for. We basically need to find all places on the line at 2,000,000 that are within the taxi distance of any sensor. For this, we are going to make use of the int4range() function to build a list of line segments within row 2 million, and then filter out the duplicates. First, we use a CASE() statement to only worry about sensors within range. If they are in range, we pull out which columns intersect row 2,000,000 based on the taxi distance for the sensor. If you have a constant like 2 million, that's also a great use of CTEs:
WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15)
,y AS (
SELECT id
,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
,numbers AS (SELECT 2000000 AS magicrow)
,inty AS (
SELECT id, s1, s2, taxi,
CASE WHEN abs(s2 - magicrow) <= taxi
THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]')
ELSE int4range(0,0)
END AS cannotbe
FROM coords, numbers)
SELECT * FROM inty limit 10;
id | s1 | s2 | taxi | cannotbe
----+---------+---------+---------+-------------------
35 | 1555825 | 18926 | 161355 | empty
36 | 697941 | 3552290 | 338743 | empty
37 | 3997971 | 2461001 | 89056 | empty
38 | 3818312 | 282332 | 1784860 | [3751120,3885505)
39 | 2874142 | 3053631 | 663471 | empty
40 | 1704479 | 2132468 | 177080 | [1659867,1749092)
41 | 3904910 | 2080560 | 384446 | [3601024,4208797)
42 | 657061 | 3898803 | 171870 | empty
43 | 3084398 | 3751092 | 244246 | empty
44 | 2582061 | 972407 | 1860563 | [1749091,3415032)
(10 rows)
Now that we have these into a collection of ranges, we can use the awesome
range_agg()
function to combine all these ranges into a single collection of
non-overlapping ranges. We pass that list to unnest to combine it back into a
single list of values. In our case, this is a single range, but it could have
been many. It looks like this:
WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15)
,y AS (
SELECT id
,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
,numbers AS (SELECT 2000000 AS magicrow)
,inty AS (
SELECT id, s1, s2, taxi,
CASE WHEN abs(s2 - magicrow) <= taxi
THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]')
ELSE int4range(0,0)
END AS cannotbe
FROM coords, numbers)
,aggy AS (SELECT unnest(range_agg(cannotbe)) AS r FROM inty)
SELECT * FROM aggy;
r
-------------------
[-671176,4208797)
(1 row)
At this point, we are ready to solve the puzzle, so we extract all the non-overlapping ranges, figure out how "wide" they are by subtracting their lower bounds from their upper bounds, then add those all up together to get our answer. We also subtract one, as our final combined range is inclusive on one end only.
WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15)
,y AS (
SELECT id
,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
,numbers AS (SELECT 2000000 AS magicrow)
,inty AS (
SELECT id, s1, s2, taxi,
CASE WHEN abs(s2 - magicrow) <= taxi
THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]')
ELSE int4range(0,0)
END AS cannotbe
FROM coords, numbers)
,aggy AS (SELECT unnest(range_agg(cannotbe)) AS r FROM inty)
,rangesums AS (SELECT upper(r)-lower(r) AS mysum FROM aggy)
SELECT sum( upper(r)-lower(r) ) -1 AS partone FROM aggy;
partone
---------
4879972
(1 row)
Part two gets a lot more challenging. We are tasked with finding the spot on the grid that is not within range of any sensor. In other words, there is a spot on the grid somewhere that is not within the taxi range of any of our rows. Sounds pretty simple, right? Yes...but only up to a point. At first I tried a brute force solution of walking through the grid, and then walking through all the sensor ranges, but both of those were quickly ruled out due to the size of the grid. As an example, here is one of the lines from the input file. Look closely at the size of the "diamond" that is generated by the sensor expanding out to the taxi distance:
-- Sensor at x=2582061, y=972407: closest beacon is at x=1749091, y=2000000
id | s1 | s2 | b1 | b2 | taxi
----+---------+---------+----------+---------+---------
11 | 2582061 | 972407 | 1749091 | 2000000 | 1860563
That diamond has a radius of 1.8 million! So this grid is entirely too large to do anything resembling a brute force approach. Not going to lie here - I struggled with this a little, until I found a good approach. The key is to realize that any open space must be bordered by more than one sensor "edge". So, we can walk the edges of each sensor (sensor plus taxi distance along the whole diamond), and look for any spot just outside the edge that is NOT in range of any other sensor. Still not cheap to do in SQL, but way better than any alternative!
Once again, we are going to put things into a grid table with rows and columns. All we care about is the location of the sensors: now that we have the taxi distance, we can discard the beacon's locations. So we will store the location of each sensor, and its taxi distance. Using the SQL above, we can quickly populate the table:
CREATE UNLOGGED TABLE sensorfarm(col int, row int, taxi int, item text);
CREATE UNIQUE INDEX aa on sensorfarm(col,row);
WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15)
,y AS (
SELECT id
,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
INSERT into sensorfarm(col,row,taxi) select s1,s2,taxi FROM coords;
At one point, I tried using arrays, but they were slow and ran into some Postgres limits:
ERROR: array size exceeds the maximum allowed (1073741823)
For the final function, I was able to implement the logic above: walk the outside of the diamond, rule out things that are out of bounds (the problem says it will be in the zero to four million row/column range), and check all the other diamonds for overlap. The full function is at the bottom. Let's look at it section by section:
CREATE or replace FUNCTION diamond_walker()
returns void language plpgsql as $$
DECLARE
maxrange int = 4000000;
myrec record;
item int = 0;
mysensor record;
current_top_row int; current_bottom_row int;
current_left_col int; current_right_col int;
northwest bool; northeast bool; southwest bool; southeast bool;
oob int = 0;
BEGIN
FOR myrec IN
SELECT row_number() OVER (ORDER BY taxi), row, col, taxi FROM sensorfarm
LOOP
...
This is a standard PL/pgSQL function, which takes no arguments, and returns
null, as we are going to throw an exception when we find the value we are
looking for. We DECLARE
all of the variables we are going to use, then start
the main loop, in which we iterate over each sensor in our sensorfarm
table.
The row_number
is not needed to solve things, but helps give us some nice
output as we are searching.
current_top_row = myrec.row;
current_bottom_row = myrec.row;
current_left_col = myrec.col - (myrec.taxi + 1);
current_right_col = myrec.col + (myrec.taxi + 1);
We are going to walk the outside of the diamond formed by the sensor and its taxi distance. We start with the same row as the sensor, but shoot out bot columns as wide as the taxi distance - in other words, the widest part of the diamond.
-- Walk the outside of the diamond for this sensor
FOR t IN 1 .. myrec.taxi+1 LOOP
-- Every once in a while, give a status report
IF (item % 10000) = 0 THEN
RAISE NOTICE '%Trying sensor #% at % / % taxi is % [% candidates] [OOB: %]', E'\033c',
myrec.row_number,
to_char(myrec.col, 'FM999,999,999'), to_char(myrec.row, 'FM9,999,999'),
to_char(myrec.taxi, 'FM9,999,999'), to_char(item, 'FM999,999,999'),
to_char(oob, 'FM999,999,999');
END IF;
item = item + 1;
Now we start the loop in which we will walk the outside of the diamond, checking
each point along the way. As this can take a very long time to run, I added some
output so that clears the screen (that magic 033c
code), then does some pretty
output showing which row we are on, and some totals on about how many places we
have checked, and how many were declared out of bounds (explained below).
-- There are cases where the whole diamond has wandered out of range
-- If that happens, we abandon the entire loop
if current_left_col > maxrange
or current_right_col < 0
or current_top_row > maxrange
or current_bottom_row < 0
THEN exit;
END IF;
It may be that our diamond is no longer in the range that we are considering, so we have a short-circuit here to simply abandon this entire sensor if that should happen.
-- Set each direction as not found
northwest = false; northeast = false; southwest = false; southeast = false;
-- Skip if this candidate is out of range
IF current_left_col < 0
THEN northwest = true; southwest = true; oob = oob + 1;
END IF;
IF current_right_col > maxrange
THEN northeast = true; southeast = true; oob = oob + 1;
END IF;
IF current_top_row < 0
THEN northwest = true; northeast = true; oob = oob + 1;
END IF;
IF current_bottom_row > maxrange
THEN southwest = true; southeast = true; oob = oob + 1;
END IF;
We are going to track each of the four edges of the diamond: northeast
,
northwest
, southeast
, and southwest
. These variables are all booleans, and
start as false. If we find something that disqualifies them as a potential
candidate for "winning", we mark it as true. To start, we do a quick check to
see if any of the four edges are out of range. If they are, we mark the
appropriate boolean and also increment our out of bounds variable.
-- Do any other diamonds contain this spot?
FOR mysensor in SELECT col,row,taxi FROM sensorfarm WHERE ctid <> myrec.ctid LOOP
-- Does this sensor cover our "northwest" pixel?
IF not northwest AND abs(mysensor.col - current_left_col) + abs(mysensor.row - current_top_row) <= mysensor.taxi
THEN northwest = true;
END IF;
-- Does this sensor cover our "northeast" pixel?
IF not northeast AND abs(mysensor.col-current_right_col) + abs(mysensor.row-current_top_row) <= mysensor.taxi
THEN northeast = true;
END IF;
-- Does this sensor cover our "southwest" pixel?
IF not southwest AND abs(mysensor.col - current_left_col) + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
THEN southwest = true;
END IF;
-- Does this sensor cover our "southeast" pixel?
IF not southeast AND abs(mysensor.col - current_right_col) + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
THEN southeast = true;
END IF;
-- Stop checking other sensors if all four directions are covered
IF northwest and northeast and southwest and southeast
THEN exit;
END IF;
END LOOP;
This bit of code is the meat of our function. We want to iterate through all of
the other sensors to see if, for each of our edges, that sensor overlaps the
spot we are looking at. Note that we use the special ctid
column (the physical
location of the row, more or less) to exclude comparing our current sensor to
itself. If we find that another sensor has covered any of the four edges, we
immediately exit the loop as we know this spot is not a winner.
-- If any of our four directions had no coverage, they are the winner
IF not northeast or not northwest or not southeast or not southwest
THEN
mysensor.col = CASE when northeast or southeast then current_right_col else current_left_col END;
mysensor.row = CASE when northeast or northwest then current_top_row else current_bottom_row END;
RAISE 'Winner from row %, edges of sensor at %,%. Spot %,% has a value of %',
myrec.row_number, myrec.col, myrec.row, mysensor.col, mysensor.row,
mysensor.col * maxrange::bigint + mysensor.row;
END IF;
Finally, we can check if we won, as indicated by the fact that one of our edges is still false. If so, we throw an exception and give the information about the winning spot.
current_left_col = current_left_col + 1;
current_right_col = current_right_col - 1;
current_top_row = current_top_row - 1;
current_bottom_row = current_bottom_row + 1;
END LOOP; -- outside of the diamond
END LOOP; -- each sensor
return; END; $$;
The final bit of the function is just cleanup. We move our current positions up, down, left, and right, as we continue to walk the edge of the diamond. Then we end the diamond loop, the sensor, loop, and the function ends. Here is what it looks like as it runs:
And here is the final output, showing the correct solution to part two:
select diamond_walker();
Trying sensor #21 at 3,505,508 / 2,805,537 taxi is 532,165 [4,300,000 candidates] [OOB: 1,865,312]
ERROR: Winner from edges of sensor at 3505508,2805537. Spot 3879585,2647448 has a value of 15518342647448
CONTEXT: PL/pgSQL function diamond_walker() line 92 at RAISE
The complete function:
CREATE or replace FUNCTION diamond_walker()
returns void language plpgsql as $$
DECLARE
maxrange int = 4000000;
myrec record;
item int = 0;
mysensor record;
current_top_row int; current_bottom_row int;
current_left_col int; current_right_col int;
northwest bool; northeast bool; southwest bool; southeast bool;
oob int = 0;
BEGIN
FOR myrec IN
SELECT row_number() OVER (ORDER BY taxi), row, col, taxi FROM sensorfarm
LOOP
current_top_row = myrec.row;
current_bottom_row = myrec.row;
current_left_col = myrec.col - (myrec.taxi + 1);
current_right_col = myrec.col + (myrec.taxi + 1);
-- Walk the outside of the diamond
FOR t IN 1 .. myrec.taxi+1 LOOP
-- Every once in a while, give a status report
IF (item % 10000) = 0 THEN
RAISE NOTICE '%Trying sensor #% at % / % taxi is % [% candidates] [OOB: %]', E'\033c',
myrec.row_number,
to_char(myrec.col, 'FM999,999,999'), to_char(myrec.row, 'FM9,999,999'),
to_char(myrec.taxi, 'FM9,999,999'), to_char(item, 'FM999,999,999'),
to_char(oob, 'FM999,999,999');
END IF;
item = item + 1;
-- There are cases where the whole diamond has wandered out of range
-- If that happens, we abandon the entire loop
if current_left_col > maxrange
or current_right_col < 0
or current_top_row > maxrange
or current_bottom_row < 0
THEN exit;
END IF;
-- Set each direction as not found
northwest = false; northeast = false; southwest = false; southeast = false;
-- Skip if this candidate is out of range
IF current_left_col < 0
THEN northwest = true; southwest = true; oob = oob + 1;
END IF;
IF current_right_col > maxrange
THEN northeast = true; southeast = true; oob = oob + 1;
END IF;
IF current_top_row < 0
THEN northwest = true; northeast = true; oob = oob + 1;
END IF;
IF current_bottom_row > maxrange
THEN southwest = true; southeast = true; oob = oob + 1;
END IF;
-- Do any other diamonds contain this spot?
FOR mysensor in SELECT col,row,taxi FROM sensorfarm WHERE ctid <> myrec.ctid LOOP
-- Does this sensor cover our "northwest" pixel?
IF not northwest AND abs(mysensor.col - current_left_col) + abs(mysensor.row - current_top_row) <= mysensor.taxi
THEN northwest = true;
END IF;
-- Does this sensor cover our "northeast" pixel?
IF not northeast AND abs(mysensor.col-current_right_col) + abs(mysensor.row-current_top_row) <= mysensor.taxi
THEN northeast = true;
END IF;
-- Does this sensor cover our "southwest" pixel?
IF not southwest AND abs(mysensor.col - current_left_col) + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
THEN southwest = true;
END IF;
-- Does this sensor cover our "southeast" pixel?
IF not southeast AND abs(mysensor.col - current_right_col) + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
THEN southeast = true;
END IF;
-- Stop checking other sensors if all four directions are covered
IF northwest and northeast and southwest and southeast
THEN exit;
END IF;
END LOOP;
-- If any of our four directions had no coverage, they are the winner
IF not northeast or not northwest or not southeast or not southwest
THEN
mysensor.col = CASE when northeast or southeast then current_right_col else current_left_col END;
mysensor.row = CASE when northeast or northwest then current_top_row else current_bottom_row END;
RAISE 'Winner from row %, edges of sensor at %,%. Spot %,% has a value of %',
myrec.row_number, myrec.col, myrec.row, mysensor.col, mysensor.row,
mysensor.col * maxrange::bigint + mysensor.row;
END IF;
current_left_col = current_left_col + 1;
current_right_col = current_right_col - 1;
current_top_row = current_top_row - 1;
current_bottom_row = current_bottom_row + 1;
END LOOP; -- outside of the diamond
END LOOP; -- each sensor
return; END; $$;
That one was a doozy, but we got it solved! Stay tuned for more solutions soon.