I did AoC day 1 using base R.

Spoilers ahead. Also, while this code works, I don’t know how elegant it is or if there are better ways to do parts of it in base. If you have any tips or pointers I’d love to hear them!

Part 1

The problem

— Day 1: Trebuchet?! —

Something is wrong with global snow production, and you’ve been selected to take a look. The Elves have even given you a map; on it, they’ve used stars to mark the top fifty locations that are likely to be having problems.

You’ve been doing this long enough to know that to restore snow operations, you need to check all fifty stars by December 25th.

Collect stars by solving puzzles. Two puzzles will be made available on each day in the Advent calendar; the second puzzle is unlocked when you complete the first. Each puzzle grants one star. Good luck!

You try to ask why they can’t just use a weather machine (“not powerful enough”) and where they’re even sending you (“the sky”) and why your map looks mostly blank (“you sure ask a lot of questions”) and hang on did you just say the sky (“of course, where do you think snow comes from”) when you realize that the Elves are already loading you into a trebuchet (“please hold still, we need to strap you in”).

As they’re making the final adjustments, they discover that their calibration document (your puzzle input) has been amended by a very young Elf who was apparently just excited to show off her art skills. Consequently, the Elves are having trouble reading the values on the document.

The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.

For example:

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet

In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.

Consider your entire calibration document. What is the sum of all of the calibration values?

Solution

lines <- readLines("input.txt")

lines[1:3]

[1] "mxmkjvgsdzfhseightonetwoeight7" "3five4s84four9rtbzllggz"       
[3] "75sevenzdrpkv1onetwo"

matches <- gregexpr(pattern = "\\d", text = lines)
digits <- regmatches(lines, m = matches)

digits[1:3]

[[1]]
[1] "7"

[[2]]
[1] "3" "4" "8" "4" "9"

[[3]]
[1] "7" "5" "1"

concatenated <- lapply(digits, \(x) {
  number_of_digits <- length(x)

  # if only one number in a line it gets counted as both the
  # first AND last number
  ifelse(
    number_of_digits == 1,
    paste0(x, x),
    paste0(x[1], x[number_of_digits])
  )
})

concatenated[1:3]

[[1]]
[1] "77"

[[2]]
[1] "39"

[[3]]
[1] "71"

sum(as.integer(unlist(concatenated)))

[1] 55621

Part 2

The problem

— Part Two —

Your calculation isn’t quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid “digits”.

Equipped with this new information, you now need to find the real first and last digit on each line. For example:

two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen

In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76. Adding these together produces 281.

What is the sum of all of the calibration values?

Solution

This is the point at which I cursed my commitment to doing this in base R. After a few hours I had to look up a hint, and in the process also checked for other solutions in R. Everyone else is just using stringr, dplyr, and/or tidyr. As a result they have reasonably simple regex and are mutating tibbles. I’m out here using functions I don’t fully understand, trying to understand lookarounds, lapplying my way over lists 🥲

Many of us who got stuck encountered the same issue: overlapping digits in strings. For example, the first two digits from zqtwonethreekcz3seven2 are two and one. In this case it wouldn’t really matter, but without getting overlapping strings separated like this, it could make me count a string like mwnineight as just containing nine, and then I’d end up with the answer as 99.

And unfortunately the example doesn’t mention this at all.

I managed to get a working regex on regex101.com but of course it didn’t work with gregexpr. Honestly, at this point I’m out of my depth, so just systematically worked through the small number of functions available to me seeing what looked like it worked.

And…

# Expand the regex to include word versions and a positive lookahead assertion, to accommodate for overlapping matches
pattern <- "(?=(\\d)|(one)|(two)|(three)|(four)|(five)|(six)|(seven)|(eight)|(nine))"
matches <- gregexec(pattern = pattern, text = lines, perl = TRUE)
res <- regmatches(lines, m = matches)

res[[985]]

      [,1]  [,2]  [,3]    [,4] [,5]    [,6]
 [1,] ""    ""    ""      ""   ""      ""  
 [2,] ""    ""    ""      "3"  ""      "2" 
 [3,] ""    "one" ""      ""   ""      ""  
 [4,] "two" ""    ""      ""   ""      ""  
 [5,] ""    ""    "three" ""   ""      ""  
 [6,] ""    ""    ""      ""   ""      ""  
 [7,] ""    ""    ""      ""   ""      ""  
 [8,] ""    ""    ""      ""   ""      ""  
 [9,] ""    ""    ""      ""   "seven" ""  
[10,] ""    ""    ""      ""   ""      ""  
[11,] ""    ""    ""      ""   ""      ""

The ffff??? a MATRIX?

Like most people who use R to work with data but never do statistics, I’ve never. Ever. Interacted with a matrix. I have a vague idea about how to index them.

But it turns out they’re easy to collapse and I also got to use one of my fave base functions, nzchar. It returns TRUE for a non-zero length ‘scalar’ character vector, which makes it excellent for logical subsetting.

# Collapse the results into just the matches
digits <- lapply(res, \(x) x[nzchar(x)])

digits[[985]]

[1] "two"   "one"   "three" "3"     "seven" "2"

From here it’s fairly straightforward.

mapping <- stats::setNames(
  1:9,
  c("one", "two", "three", "four", "five", "six", "seven", "eight", "nine")
)

all_numeric <- lapply(digits, \(x) {
  mapped <- mapping[x]

  # Backfill the NAs with the original numeric values
  numeric_indices <- which(is.na(mapped))
  mapped[numeric_indices] <- x[numeric_indices]

  mapped
})

length(all_numeric[lengths(all_numeric) == 1])

[1] 62

There are still a small number of lines where there is only a single number, so I’ll stick with the same code from before… yes I am going to copy and paste considering this is so trivial.

concatenated <- lapply(all_numeric, \(x) {
  number_of_digits <- length(x)

  ifelse(
    number_of_digits == 1,
    paste0(x, x),
    paste0(x[1], x[number_of_digits])
  )
})

concatenated[1:3]

[[1]]
[1] "87"

[[2]]
[1] "39"

[[3]]
[1] "72"

sum(as.integer(unlist(concatenated)))

[1] 53592