Day 4: Security Through Obscurity

author: Harshvardhan Pandit

license: MIT

link to problem statement

Finally, you come across an information kiosk with a list of rooms. Of course, the list is encrypted and full of decoy data, but the instructions to decode the list are barely hidden nearby. Better remove the decoy data first.

Each room consists of an encrypted name (lowercase letters separated by dashes) followed by a dash, a sector ID, and a checksum in square brackets.

A room is real (not a decoy) if the checksum is the five most common letters in the encrypted name, in order, with ties broken by alphabetization. For example:

aaaaa-bbb-z-y-x-123[abxyz] is a real room because the most common letters are a (5), b (3), and then a tie between x, y, and z, which are listed alphabetically.

a-b-c-d-e-f-g-h-987[abcde] is a real room because although the letters are all tied (1 of each), the first five are listed alphabetically.

not-a-real-room-404[oarel] is a real room.

totally-real-room-200[decoy] is not.

Of the real rooms from the list above, the sum of their sector IDs is 1514.

What is the sum of the sector IDs of the real rooms?

Solution logic

First of all, we need to separate the input into characters, sector ID and the checksum. Since the input tokens are of a fixed format, we'll use regex.

The regular expression would be something of the following form.

^([a-z\-]+)(\d+)\[(\w+)\]$

Which translates to:

- `^` Starting at the line
- (([a-z])-)+ having multiple recurrences of groups containing lowercase characters followed by a dash
- (\d+) followed by a sequence of numbers
- \[(\w+)\] followed by a sequence of alphanumeric characters enclosed in square brackets
- $ End of line

An excellent (online) resource to practice regular expression matching is regex101. Be sure to select python as the language!

Algorithm

- Initialize `sector_id_sum` to `0`
- Read in the input
- For each line:
    - Tokenize using regex
    - Strip dashes from characters
    - Count character count
    - Order by decreasing count, and increasing character for same count values
    - Check if the checksum is a substring at position 0 in the character count string
        - If it is, add its sector ID to the sector ID sum

Counter

Python has a nifty utility in the collections utility called Counter, which counts the occurences of an object as it is added to a dictionary. Basically, it is a modified dictionary, where if an item is not present, it's count is 0, and every time an item is added to the dictionary, it's value is incremented by 1.

We use this utility to count the character occurences in the characters token.


In [1]:
import re
pattern = re.compile(r'^([a-z\-]+)(\d+)\[(\w+)\]$')

Read in the input


In [2]:
with open('../inputs/day04.txt', 'r') as f:
    data = [line.strip() for line in f.readlines()]

Apply the algorithm. The answer will be in sector_id_sum.


In [3]:
from collections import Counter

sector_id_sum = 0

for line in data:
    match = pattern.match(line)
    characters, sector_id, checksum = match.groups()
    characters.replace('-','')
    character_counter = Counter(characters.replace('-', ''))
    character_checksum = ''.join(
        [x[0] for x in sorted(
                sorted(character_counter.items(), key=lambda x: x[0]), 
                key=lambda x: x[1], reverse=True)])
    if character_checksum.startswith(checksum):
        sector_id_sum += int(sector_id)

I thought there is a need to explain this line, as it is quite long, and can be confusing.

character_checksum = ''.join(
        [x[0] for x in sorted(
                sorted(character_counter.items(), key=lambda x: x[0]), 
                key=lambda x: x[1], reverse=True)])

Let's start from the outside:

- We want a checksum made up of characters ordered by the sequence, so the `join` method combines the characters from the inner list comprehensions
- The first `sorted` call sorts the items in a descending order based on the number of character occurences
- The second, or inner `sorted` call sorts the items in an ascending order based on the character themselves. This allows characters with equal occurences to be positioned according to their ASCII value (or position in the alphabet).
- The `character_counter.item()` call gets the character counts from the `Counter (dictionary)` as a list of tuples with the first attribute being the character, and the second being the count.

Part Two

With all the decoy data out of the way, it's time to decrypt this list and get moving.

The room names are encrypted by a state-of-the-art shift cipher, which is nearly unbreakable without the right software. However, the information kiosk designers at Easter Bunny HQ were not expecting to deal with a master cryptographer like yourself.

To decrypt a room name, rotate each letter forward through the alphabet a number of times equal to the room's sector ID. A becomes B, B becomes C, Z becomes A, and so on. Dashes become spaces.

For example, the real name for qzmt-zixmtkozy-ivhz-343 is very encrypted name.

What is the sector ID of the room where North Pole objects are stored?

Solution logic

The number 343 is absurd and high. Considering there are only 26 letters. So no matter how many times we rotate through them, we'll start over again. So it is better to rotate it a maximum of 26 times. Which can be done by using the mod function. The rotate function then simply becomes:

`(ord(x) - ord('a') + (sector_id % 26)) % 26`

Here, ord gives the ASCII value of the character, and we shift it by sector_id (mod 26 to keep it from rotating), and then get the mod again in case the value flows over (goes from z back to a). Note, that we must subtract the ASCII value of a since we assume letter start from 0. When converting from this value back to a character, we have to add the ASCII value of a back.

`chr(x + ord('a'))`

Algorithm

- For every valid room
    - apply the rotate function
        - check if character is a letter
        - otherwise replace with space
    - join the characters to get room name
    - store the room name and room ID

The room name containing the north pole objects starts with northpole, so we search for that particular room, assgin the variable to it, and break out of the loop.


In [4]:
for line in data:
    match = pattern.match(line)
    characters, sector_id, checksum = match.groups()
    characters.replace('-',' ')
    character_counter = Counter(characters.replace('-', ''))
    character_checksum = ''.join(
        [x[0] for x in sorted(
                sorted(character_counter.items(), key=lambda x: x[0]), 
                key=lambda x: x[1], reverse=True)])
    if not character_checksum.startswith(checksum):
#         not a valid room
        continue
    
#   Part Two
    sector_id = int(sector_id)
    room_name = []
    for char in characters:
        if not char.isalpha():
            room_name.append(' ')
            continue
        char = (ord(char) - ord('a') + (sector_id % 26)) % 26
        room_name.append(chr(char + ord('a')))
    if 'northpole' in ''.join(room_name):
        northpole_sector = sector_id
        break

== END ==