Advent of Code Day 3, in Raku!
published 2024-12-03
showcasing Raku Grammars, an elegant parsing tool for every programmer
I did Day 3 in Raku again! (Previously: Day 1 in Raku, Day 2 in Raku.)
As usual, I won’t re-describe the puzzles: take a look on Advent of Code.
Today I’ll be using Raku Grammars, which are a great tool for writing parsers quickly.
Part 1
We are supposed to find strings matching mul(left, right)
where left
and right
are integers, multiply left
and right
, and then add up the results of all these mul
calls.
Here’s my solution:
grammar Mul {
token TOP { [ <mul> || . ] +
{ make $<mul>.map: *.made }
}
token mul { 'mul(' <left> ',' <right> ')'
{ make $<left> * $<right> }
}
token left { \d+ }
token right { \d+ }
}
say [+] Mul.parsefile('data.txt').made;
We haven’t discussed Grammars before on this blog. Grammars are a kind of class that group regexes1, not methods. They’re very nice constructs for parsing text.
I’ll walk through the grammar, working our way up from the smallest tokens.
token left { \d+ }
token right { \d+ }
A token is a regex (like other languages’ regexes), but it does not backtrack2 and ignores whitespace in the token grammar. If you need a refresher on regex syntax: \d
means any digit, and +
means “one or more”, so \d+
is “one or more digits”. Thus, left
and right
match numbers.
token mul { 'mul(' <left> ',' <right> ')'
{ make $<left> * $<right> }
}
<left>
means “match the token named left
here”. Ignore the stuff inside the inner curly brackets for a moment; they’re not part of the token. Stuff in quotes is matched literally; this token matches “mul(123,456)” and other valid “mul” calls.
Raku is a successor to Perl, so you can expect it to have very good text manipulation skills. Inside the inner curly brackets is a grammar action. Every time the engine has a successful match on the mul
token, it runs that action. The match itself is available in-scope; < >
is the operator for getting values from a container (e.g. a hashmap), so $<left>
gives us whatever matched the left
token.
All tokens (and regexes, etc.) must return a Match
. However, we want to return an integer! The function make
stores an arbitrary payload inside the returned Match
object, which we will retrieve later.
token TOP { [ <mul> || . ] +
{ make $<mul>.map: *.made }
}
[ ]
is a grouping mechanism in Raku regexes. <mul> || .
means “match either a mul
token or any character”, and +
means “do that one or more times”.
Since we are matching against the mul
token repeatedly, $<mul>
is a list of Matches. We use map
with the Whatever Star again (discussed on Day 1) to call the method made
on each match. .made
gives us the payload we previously stored in the Match
object with make
: the product of the two numbers.
We then store this list inside the match with make
.
say [+] Mul.parsefile('data.txt').made;
parsefile
is a nice helper function that parses the entire file. parse
and parsefile
will automatically use the token named TOP
if none is specified. We then call made
on the match to get our list of products (from the payload).
On Day 1 we discussed the reduction metaoperator [ ]
, which reduces the list with +
as the operator. So this line is summing up the list of products and then printing it.
Part 2
For this part we’re tasked with ignoring mul(...,...)
s that follow don't()s
, until a do()
appears.
grammar Mul {
token TOP { [ <mul> || <do_call> || <dont_call> || . ] +
:my $*paused = False;
{ make $<mul>.map(*.made) }
}
token mul { 'mul(' <left> ',' <right> ')'
{
make $<left> * $<right> * $*paused
}
}
token do_call { 'do()'
{ $*paused = False; }
}
token dont_call { 'don\'t()'
{ $*paused = True; }
}
token left { \d+ }
token right { \d+ }
}
say [+] Mul.parsefile('data.txt').made;
I’ve highlighted the sections that have significantly changed (the TOP
token has also changed to match against do()
and don't()
tokens).
:
in a token lets us declare a new variable. It has to be a dynamic variable (which looks up the value in the caller’s scope) instead of a normal variable (which would look up the value in the current scope). We need this behavior because we need our state (here, whether we “pause” calculating products) to be shared across tokens. We declare this variable as dynamic with the $*
twigil.
We can then use this variable like usual in the other blocks. Our do_call
token has an action that sets paused
to false; the dont_call
token’s action sets paused
to true. The mul
token’s action gives us the product of left and right, multiplied with paused
; since paused
is a boolean, when coerced to a number it becomes 0
if False
and 1
if True.
Over the past 3 days I’ve been trying to make the case that Raku is very powerful for one-off scripts. I hope today’s solution encourages you to use Raku Grammars for all some of your bespoke parsing needs!
- Some of you are already saying “That doesn’t look like Perl-Compatible Regular Expressions and I thought Raku was a successor to Perl!“. I suppose there’s a reason why they changed the name :). More seriously, Raku made these changes to make regexes more maintainable. In my opinion, Raku regexes and grammars are much easier to read and modify.↩
- For example, in all regex engines, if you have “1234” and you want to match with the regex
.*3
, the engine will expand.*
to “123” (*
is greedy unless followed by?
) and compare the regex’s “3” with the string’s “4”; because that fails, if backtracking is enabled (which is typically the case) the engine backtracks and tries “12” as the expansion for.*
, compares “3” with the string’s “3”, which succeeds.↩