Thursday, June 27, 2013

Solving NetHack's Mastermind

Note: This article was originally written in 2006. It is published here for posterity.

NetHack's Mastermind is very similar to the real Mastermind, with three changes:

  • Instead of five colors, there are seven notes (A, B, C, D, E, F, and G).
  • There are five positions, not four.
  • You can give partial solutions (such as AB even though the solution is five notes long).

A gear represents a note in the correct position. A tumbler represents a note in an incorrect position. So, for example, if you're trying to find the tune FFAAB and you guess AFAFB, you'll get three gears and two tumblers. If you guess CA you'll get one tumbler. If you guess BBBBB you'll get one gear. If you guess DDDDD you'll get silence (in other words, no gears and no tumblers).

This document describes my attempts at optimizing a NetHack Mastermind solver (with help from Sean Kelly and papers from Don Knuth and Barteld Kooi). The first, naïve algorithm finds the solution on average within 15 turns, possibly taking up to 21. The last, most effective algorithm finds the solution on average within 4 turns, possibly taking up to only 6. This is an important result because in NetHack, the level where you play Mastermind is one of the most dangerous, so you have a strong incentive (your own survival) to find the solution as quickly as possible.


Algorithm: Naïve 1

The first tune we always test is A. If that's a gear, then we're done with the first position. We don't know if any more As are in the tune, so our next stab is AA. Back to the first play though: if A produces a tumbler, then we skip it for now and move on to B. If we get silence, then we know A is not in the tune at all, so we remove it from any further consideration. Repeat this until we have found the tune.

This basic algorithm is straightforward and works well enough. This is roughly what I do when I am playing NetHack, since it can be done without assistance.

So let's use this algorithm to try to find a particular tune:

  1. A: one tumbler (we know there's an A but not in the first position, so skip A for a while)
  2. B: one tumbler (ditto. skip B for a while)
  3. C: one gear (great! first position is C. Try C again)
  4. CC: one gear (now we know there are no more Cs in this tune)
  5. CD: two gears (second position is D. try D again)
  6. CDD: two gears (now we know there are no more Ds in this tune)
  7. CDE: two gears (now we know there are no Es in this tune)
  8. CDF: three gears (third position is F. try F again)
  9. CDFF: three gears (so we know there are no more Fs in this tune)
  10. CDFG: three gears (so we know there are no Gs in this tune)
  11. CDFA: three gears, one tumbler (we know there's an A but not in the fourth position, so skip A for a while)
  12. CDFB: four gears (fourth position is B. try B again)
  13. CDFBB: four gears (so we know there are no more Bs in this tune)
  14. CDFBA: five gears (nailed it!)

This first algorithm looks like this in pseudocode:

  notes = A, B, C, D, E, F, G
  tune  = ""

  while tune.length < 5
    try tune + notes[0]

    if tumblers
      # move this note to the end of the note list
      # we're guaranteed to see the correct note before we see this one again
      note = notes.delete_at[0]
      notes.push(note)
    elsif gears > tune.length
      # correct note
      tune += notes[0]
    else
      notes.delete_at[0]
  end

Here are its results:

  • Average turns per tune: 14.69 (246880 turns to solve all tunes / 16807 tunes)

Algorithm: Naïve 2

The first optimization we can make is that, if at any time we have exactly one possible note left, the rest of the tune most consist solely of that note.

  notes = A, B, C, D, E, F, G
  tune  = ""

  while tune.length < 5
    if notes.size == 1
      tune += notes[0] while tune.length < 5
      last
    end

    try tune + notes[0]

    if any tumblers
      # move this note to the end of the note list
      # we're guaranteed to see the correct note before we see this one again
      notes.push(notes.delete_at[0])
    elsif gears > tune.length
      # correct note
      tune += notes[0]
    else
      notes.delete_at[0]
  end
  • Average turns per tune: 13.55 (227729 turns / 16807 tunes)
  • Average turn savings: 1.14

Algorithm: Naïve 3

The next savings (which will be minor compared to the previous one) is if we've seen five distinct notes, we can rule out any notes not among those five:

  notes = A, B, C, D, E, F, G
  tune  = ""
  seen  = nil

  while tune.length < 5
    if seen.size == 5
      foreach element in notes
        delete it if it's not in seen
      end
    end

    if notes.size == 1
      tune += notes[0] while tune.length < 5
      last
    end

    try tune + notes[0]

    if any tumblers
      seen.add(notes[0]) unless seen.contains(notes[0])
      # move this note to the end of the note list
      # we're guaranteed to see the correct note before we see this one again
      notes.push(notes.delete_at[0])
    elsif gears > tune.length
      # correct note
      tune += notes[0]
      seen.add(notes[0]) unless seen.contains(notes[0])
    else
      notes.delete_at[0]
  end
  • Average turns per tune: 13.50 (226896 turns / 16807 tunes)
  • Average turn savings: 0.05

Algorithm: Naïve 4

So much for the obvious optimizations. Let's try tweaking some details of the algorithm a bit just to see if it helps or hurts.

You know how when we get a tumbler we move the first note to the last position? Let's try doing that when we get a gear, too. Turns out this saves us almost .25 turns.

  notes = A, B, C, D, E, F, G
  tune  = ""
  seen  = nil

  while tune.length < 5
    if seen.size == 5
      foreach element in notes
        delete it if it's not in seen
      end
    end

    if notes.size == 1
      tune += notes[0] while tune.length < 5
      last
    end

    try tune + notes[0]

    if any tumblers
      seen.add(notes[0]) unless seen.contains(notes[0])
      # move this note to the end of the note list
      # we're guaranteed to see the correct note before we see this one again
      notes.push(notes.delete_at[0])
    elsif gears > tune.length
      # correct note
      tune += notes[0]
      seen.add(notes[0]) unless seen.contains(notes[0])
      notes.push(notes.delete_at[0])
    else
      notes.delete_at[0]
  end
  • Average turns per tune: 13.27 (223060 turns / 16807 tunes)
  • Average turn savings: 0.23

Algorithm: Knuth 1

Due to arcanehl's constant (yet ever friendly) prodding, I picked this code up again. He started implementing Knuth's Mastermind algorithm, which, while slower, is much more effective. The average turns to solve a tune goes down from 13 to just 5!

It's a radical change from the algorithm used above, so allow me to explain. The basic idea is after a guess, we eliminate any possibilities that would not produce the same score. For example, if we try AAABB and get 2 gears, 0 tumblers, then we can eliminate any possibility that does not produce 2 gears and 0 tumblers against AAABB (such as CCCCC which would produce 0 gears and 0 tumblers). The initial guess is hardcoded to be AAABB though it could be something entirely different. The next guess is (currently) taken arbitrarily; we use the first possibility.

I changed from Perl to C during testing for speed: arcanehl's Ruby implementation of this algorithm takes hours, mine takes less than twenty-five seconds. Of course, these are the results of timing every possible tune (there are 16807 of them). Going through the algorithm for just one tune will be fast enough at a hundredth the speed.

    possibilities = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AAABB"
    else
        guess = possibilities[0]
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 5.5 (93103 turns / 16807 tunes)
  • Average turn savings: 7.7

Algorithm: Knuth 2

About the only possible improvement we can make is to generate a better guess than always using the first possible tune. I figured the median tune (or close enough to it) would be better than the first, and I was right, saving over half a turn.

    possibilities = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AAABB"
    else
        guess = possibilities[possibilities.size/2]
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.92 (82643 turns / 16807 tunes)
  • Average turn savings: 0.62

Algorithm: Knuth 3

Mysteriously, we find that the best initial tune is AABBC, not AAABB. (that's called foreshadowing)

This is the algorithm that is used in Rodney3. To save memory, all he remembers for each player is the results of each tune. Naturally there's a fair amount of reprocessing, but a 16807-element array would be kind of large to save for each person (and I'm afraid of Perl bitfields).

    possibilities = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        guess = possibilities[possibilities.size/2]
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.85 (81446 turns / 16807 tunes)
  • Average turn savings: 0.07

Algorithm: Knuth 4

Enough goofing around. Time to implement the rest of Knuth's algorithm: at each step, guess the possibility that would eliminate the most possibilities in the worst case. This means that no matter what the response is for the guess, we'll eliminate a maximum number of possibilities. The way I implement it has time complexity O(n^2) where n is the number of tunes left; I don't know if there's a more efficient solution. For each guess i we score each possible tune j. The worst case for i is equal to the largest number of remaining tunes after playing it (given all possible responses, such as three tumblers and one gear). Then we find the smallest worst case (ie, smallest remaining number of tunes) over all i and we play it!

This takes hours even in my somewhat tuned C implementation. Using the median element as above may be more appropriate if you're playing all 16807 tunes, since this extra processing is probably not worth the turn savings. Otherwise, if you're just finding one tune as in an actual game of NetHack, you'll of course want to use the best algorithm available.

Note: we found the best first tune (AABBC) by running this code without the hardcoded first guess. The code obviously produces the same results without the hardcoded first guess, but there's no reason to do all that processing (which is a lot for all 16807 tunes) when we know the first best guess is always AABBC.

    possibilities = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
          best_worst_case = possibilities.size + 100
        foreach i in possibilities
        remaining = []

        foreach j in possibilities
            score = score(j, i)
            remaining[score] += 1
        end

        worst_case = remaining.max
        if worst_case < best_worst_case
            best_worst_case = worst_case
            guess = i
        end
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.61 (77422 turns / 16807 tunes)
  • Average turn savings: 0.24

Algorithm: Breadth 1

I was talking in #nethack and someone brought up Mastermind. So that revived this whole thing. I was idly searching on the arXiv for Mastermind papers, and only found the one that proves Mastermind is NP-complete, which while good to know, doesn't directly help me improve my solving algorithms. However, I then remembered Google Scholar and searched for Knuth's paper. I couldn't find it, but it did return a few dozen papers that referenced Knuth's; including one by a Mr. Barteld Kooi. The paper, "Yet Another Mastermind Strategy," is available here. The gist of the algorithm is that instead of taking the best worst case scenario (as we do in Knuth's algorithm), take the tune that would produce the largest number of unique responses. So a tune that can return six unique responses is valued over a tune that can only return three unique responses. In the author's words, "[In this strategy, o]ne only looks at the 'breadth' of a partition. On the other side of the spectrum one finds Knuth's worst case strategy, which only looks at the 'depth' of a partition." This next algorithm implements exactly that; looking at only the breadth.

Also, I've rerun the algorithm without the hardcoded first guess, but it guessed AABBC anyway, which is a convenient result.

    possibilities = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        best = 0
        foreach i in possibilities
        responses = []

        foreach j in possibilities
            score = score(j, i)
            responses[score] += 1
        end

        cnt = 0
        foreach response in responses
            if response > 0
            cnt += 1
            end
        end

        if cnt > best
            best = cnt
            guess = i
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.5 (76038 turns / 16807 tunes)
  • Average turn savings: 0.08

Algorithm: Breadth 2

A consistent guess is one that has a chance of being correct; that is, it's consistent with the tunes we've played so far. When figuring out the next tune to play, we discard any inconsistent guesses. Well, let's see what happens when we keep them!

    possibilities = all_tunes = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        best = 0
        foreach i in all_tunes
        responses = []

        foreach j in possibilities
            score = score(j, i)
            responses[score] += 1
        end

        cnt = 0
        foreach response in responses
            if response > 0
            cnt += 1
            end
        end

        if cnt > best
            best = cnt
            guess = i
        end
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.4 (73803 turns / 16807 tunes)
  • Average turn savings: 0.13

Algorithm: Knuth 5

Let's apply the same change to Knuth's algorithm, just in case it ends up being better than Breadth 2. Turns out it isn't. The "average turn savings" below is in relation to the Knuth 4 algorithm, not Breadth 2.

    possibilities = all_tunes = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        best_worst_case = possibilities.size + 100
        foreach i in all_tunes
        remaining = []

        foreach j in possibilities
            score = score(j, i)
            remaining[score] += 1
        end

        worst_case = remaining.max
        if worst_case < best_worst_case
            best_worst_case = worst_case
            guess = i
        end
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.58 (76911 turns / 16807 tunes)
  • Average turn savings: 0.03

Algorithm: Breadth 3

Let's prefer to guess consistently over inconsistently when able. As of Breadth 2 we take the first tune that produces the maximum partition set size, even if that guess is inconsistent. If two tunes produce the same (maximum) partition set size, and one is consistent and the other is not, we should definitely pick the consistent one, because we might stumble upon the answer. One possible area of improvement is listing every tune that produces the maximum partition set size and picking the median one. Also, say an inconsistent tune produces a partition set size of 9 while a consistent guess produces a partition set size of 8. Should we play the consistent one?

    possibilities = all_tunes = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        best = 0
        best_is_consistent = false

        foreach i in all_tunes
        responses = []

        foreach j in possibilities
            score = score(j, i)
            responses[score] += 1
        end

        cnt = 0
        foreach response in responses
            if response > 0
            cnt += 1
            end
        end

        if cnt > best or (!best_is_consistent and i_is_consistent and cnt == best)
            best = cnt
            guess = i
            best_is_consistent = i_is_consistent
        end
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.38 (73622 turns / 16807 tunes)
  • Average turn savings: 0.01

Algorithm: Knuth 6

Let's again apply the same change to Knuth's algorithm, just on the off-chance it ends up being better than Breadth 3. Turns out it isn't. The "average turn savings" below is in relation to the Knuth 5 algorithm, not Breadth 3.

    possibilities = all_tunes = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        best_worst_case = possibilities.size + 100
        best_is_consistent = false

        foreach i in all_tunes
        remaining = []

        foreach j in possibilities
            score = score(j, i)
            remaining[score] += 1
        end

        worst_case = remaining.max
        if worst_case < best_worst_case or (!best_is_consistent and i_is_consistent and worst_case == best_worst_case)
            best_worst_case = worst_case
            guess = i
            best_is_consistent = i_is_consistent
        end
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average tunes per turn: 4.51 (75864 turns / 16807 tunes)
  • Average turn savings: 0.06

Results

Here are the exact results of each algorithm.

  1. Naïve 1: Guess notes in turn, remove note if silence, keep track of tune so far
  2. Naïve 2: If one note left, fill rest of tune with it
  3. Naïve 3: If five notes seen, eliminate others
  4. Naïve 4: Rotate notes on gear
  5. Knuth 1: Eliminate impossible tunes, guess first possibility
  6. Knuth 2: Guess median possibility, not first
  7. Knuth 3: Start with AABBC not AAABB
  8. Knuth 4: Greedily guess best possibility
  9. Knuth 5: Allow inconsistent guesses
  10. Knuth 6: Prefer consistent guesses
  11. Breadth 1: Guess tune that would give the most unique answers
  12. Breadth 2: Allow inconsistent guesses
  13. Breadth 3: Prefer consistent guesses

Naïve

turns Naïve 1 Naïve 2 Naïve 3 Naïve 4
1
2
3
4
5 1 1 1 1
6 5 6 6 7
7 15 21 21 28
8 35 77 77 85
9 70 238 242 275
10 126 721 750 825
11 210 1596 1632 1934
12 1519 2576 2617 2903
13 2709 3164 3196 3364
14 3395 3066 3076 2972
15 3241 2422 2405 2127
16 2527 1575 1539 1287
17 1610 840 799 638
18 840 364 331 261
19 364 119 100 85
20 119 21 15
21 21
total 246880 227729 226896 223060
average 14.69 13.55 13.50 13.28

Knuth

turns Knuth 1 Knuth 2 Knuth 3 Knuth 4 Knuth 5 Knuth 6
1 1 1 1 1 1 1
2 19 17 29 29 17 19
3 260 420 463 491 387 423
4 1337 4053 4656 6149 6429 7324
5 6975 8935 8740 9534 9839 8980
6 5812 3208 2752 597 134 60
7 2052 171 166 6
8 336 2
9 13
10 2
total 93103 82643 81446 77422 76911 75864
average 5.54 4.92 4.85 4.61 4.58 4.51

Breadth

turns Breadth 1 Breadth 2 Breadth 3
1 1 1 1
2 33 41 40
3 653 838 846
4 7215 8608 8770
5 8288 7140 6977
6 607 179 173
7 10
total 76038 73803 73622
average 4.52 4.39 4.38

Tuesday, June 18, 2013

We're back!

The TAEB development team is excited to announce that we are back and developing TAEB again!

TAEB site's is now http://taeb.github.io. We're also providing the Interhack site at http://taeb.github.io/interhack. Finally, the canonical repositories of all our NetHack projects are are hosted in the TAEB organization.

If you had a fork of TAEB or Interhack (or NetHack::Item or or or...) we'd love for you to try to get your patches back into the mainline branch. Please create an issue or better yet, send us a pull request.


In terms of improvements to TAEB we don't have much to announce yet. We're busy modernizing TAEB to work with the four years of both NAO patches and changes to Perl modules since it was last run regularly. One brand new feature is that TAEB will now use the _ travel command (with exponential backoff in case he gets stuck – one oscillation problem solved!). There's of course plenty more to come!