Advent of Code Day 3, in Raku!

published 2024-12-03

showcasing Raku Grammars, an elegant parsing tool for every programmer

I did Day 3 in Raku again! (Previously: Day 1 in Raku, Day 2 in Raku.)

As usual, I won’t re-describe the puzzles: take a look on Advent of Code.

Today I’ll be using Raku Grammars, which are a great tool for writing parsers quickly.

Part 1

We are supposed to find strings matching mul(left, right) where left and right are integers, multiply left and right, and then add up the results of all these mul calls.

Here’s my solution:

grammar Mul {
  token TOP { [ <mul> || . ] +
    { make $<mul>.map: *.made }
  }
  token mul { 'mul(' <left> ',' <right> ')'
    { make $<left> * $<right> }
  }
  token left { \d+ }
  token right { \d+ }
}

say [+] Mul.parsefile('data.txt').made;

We haven’t discussed Grammars before on this blog. Grammars are a kind of class that group regexes¹, not methods. They’re very nice constructs for parsing text.

I’ll walk through the grammar, working our way up from the smallest tokens.

  token left { \d+ }
  token right { \d+ }

A token is a regex (like other languages’ regexes), but it does not backtrack² and ignores whitespace in the token grammar. If you need a refresher on regex syntax: \d means any digit, and + means “one or more”, so \d+ is “one or more digits”. Thus, left and right match numbers.

  token mul { 'mul(' <left> ',' <right> ')'
    { make $<left> * $<right> }
  }

<left> means “match the token named left here”. Ignore the stuff inside the inner curly brackets for a moment; they’re not part of the token. Stuff in quotes is matched literally; this token matches “mul(123,456)” and other valid “mul” calls.

Raku is a successor to Perl, so you can expect it to have very good text manipulation skills. Inside the inner curly brackets is a grammar action. Every time the engine has a successful match on the mul token, it runs that action. The match itself is available in-scope; < > is the operator for getting values from a container (e.g. a hashmap), so $<left> gives us whatever matched the left token.

All tokens (and regexes, etc.) must return a Match. However, we want to return an integer! The function make stores an arbitrary payload inside the returned Match object, which we will retrieve later.

  token TOP { [ <mul> || . ] +
    { make $<mul>.map: *.made }
  }

[ ] is a grouping mechanism in Raku regexes. <mul> || . means “match either a mul token or any character”, and + means “do that one or more times”.

Since we are matching against the mul token repeatedly, $<mul> is a list of Matches. We use map with the Whatever Star again (discussed on Day 1) to call the method made on each match. .made gives us the payload we previously stored in the Match object with make: the product of the two numbers.

We then store this list inside the match with make.

say [+] Mul.parsefile('data.txt').made;

parsefile is a nice helper function that parses the entire file. parse and parsefile will automatically use the token named TOP if none is specified. We then call made on the match to get our list of products (from the payload).

On Day 1 we discussed the reduction metaoperator [ ], which reduces the list with + as the operator. So this line is summing up the list of products and then printing it.

Part 2

For this part we’re tasked with ignoring mul(...,...)s that follow don't()s, until a do() appears.

grammar Mul {
  token TOP { [ <mul> || <do_call> || <dont_call> || . ] +
    :my $*paused = False;
    { make $<mul>.map(*.made) }
  }
  token mul { 'mul(' <left> ',' <right> ')'
    {
      make $<left> * $<right> * $*paused
    }
  }
  token do_call { 'do()'
    { $*paused = False; }
  }
  token dont_call { 'don\'t()'
    { $*paused = True; }
  }
  token left { \d+ }
  token right { \d+ }
}

say [+] Mul.parsefile('data.txt').made;

I’ve highlighted the sections that have significantly changed (the TOP token has also changed to match against do() and don't() tokens).

: in a token lets us declare a new variable. It has to be a dynamic variable (which looks up the value in the caller’s scope) instead of a normal variable (which would look up the value in the current scope). We need this behavior because we need our state (here, whether we “pause” calculating products) to be shared across tokens. We declare this variable as dynamic with the $* twigil.

We can then use this variable like usual in the other blocks. Our do_call token has an action that sets paused to false; the dont_call token’s action sets paused to true. The mul token’s action gives us the product of left and right, multiplied with paused; since paused is a boolean, when coerced to a number it becomes 0 if False and 1 if True.

Over the past 3 days I’ve been trying to make the case that Raku is very powerful for one-off scripts. I hope today’s solution encourages you to use Raku Grammars for ~~all~~ some of your bespoke parsing needs!

Some of you are already saying “That doesn’t look like Perl-Compatible Regular Expressions and I thought Raku was a successor to Perl!“. I suppose there’s a reason why they changed the name :). More seriously, Raku made these changes to make regexes more maintainable. In my opinion, Raku regexes and grammars are much easier to read and modify.↩
For example, in all regex engines, if you have “1234” and you want to match with the regex .*3, the engine will expand .* to “123” (* is greedy unless followed by ?) and compare the regex’s “3” with the string’s “4”; because that fails, if backtracking is enabled (which is typically the case) the engine backtracks and tries “12” as the expansion for .*, compares “3” with the string’s “3”, which succeeds.↩

Tags: