The Tactical Amulet Extraction Bot

Thursday, June 27, 2013

Solving NetHack's Mastermind

Note: This article was originally written in 2006. It is published here for posterity.

NetHack's Mastermind is very similar to the real Mastermind, with three changes:

  • Instead of five colors, there are seven notes (A, B, C, D, E, F, and G).
  • There are five positions, not four.
  • You can give partial solutions (such as AB even though the solution is five notes long).

A gear represents a note in the correct position. A tumbler represents a note in an incorrect position. So, for example, if you're trying to find the tune FFAAB and you guess AFAFB, you'll get three gears and two tumblers. If you guess CA you'll get one tumbler. If you guess BBBBB you'll get one gear. If you guess DDDDD you'll get silence (in other words, no gears and no tumblers).

This document describes my attempts at optimizing a NetHack Mastermind solver (with help from Sean Kelly and papers from Don Knuth and Barteld Kooi). The first, naïve algorithm finds the solution on average within 15 turns, possibly taking up to 21. The last, most effective algorithm finds the solution on average within 4 turns, possibly taking up to only 6. This is an important result because in NetHack, the level where you play Mastermind is one of the most dangerous, so you have a strong incentive (your own survival) to find the solution as quickly as possible.


Algorithm: Naïve 1

The first tune we always test is A. If that's a gear, then we're done with the first position. We don't know if any more As are in the tune, so our next stab is AA. Back to the first play though: if A produces a tumbler, then we skip it for now and move on to B. If we get silence, then we know A is not in the tune at all, so we remove it from any further consideration. Repeat this until we have found the tune.

This basic algorithm is straightforward and works well enough. This is roughly what I do when I am playing NetHack, since it can be done without assistance.

So let's use this algorithm to try to find a particular tune:

  1. A: one tumbler (we know there's an A but not in the first position, so skip A for a while)
  2. B: one tumbler (ditto. skip B for a while)
  3. C: one gear (great! first position is C. Try C again)
  4. CC: one gear (now we know there are no more Cs in this tune)
  5. CD: two gears (second position is D. try D again)
  6. CDD: two gears (now we know there are no more Ds in this tune)
  7. CDE: two gears (now we know there are no Es in this tune)
  8. CDF: three gears (third position is F. try F again)
  9. CDFF: three gears (so we know there are no more Fs in this tune)
  10. CDFG: three gears (so we know there are no Gs in this tune)
  11. CDFA: three gears, one tumbler (we know there's an A but not in the fourth position, so skip A for a while)
  12. CDFB: four gears (fourth position is B. try B again)
  13. CDFBB: four gears (so we know there are no more Bs in this tune)
  14. CDFBA: five gears (nailed it!)

This first algorithm looks like this in pseudocode:

  notes = A, B, C, D, E, F, G
  tune  = ""

  while tune.length < 5
    try tune + notes[0]

    if tumblers
      # move this note to the end of the note list
      # we're guaranteed to see the correct note before we see this one again
      note = notes.delete_at[0]
      notes.push(note)
    elsif gears > tune.length
      # correct note
      tune += notes[0]
    else
      notes.delete_at[0]
  end

Here are its results:

  • Average turns per tune: 14.69 (246880 turns to solve all tunes / 16807 tunes)

Algorithm: Naïve 2

The first optimization we can make is that, if at any time we have exactly one possible note left, the rest of the tune most consist solely of that note.

  notes = A, B, C, D, E, F, G
  tune  = ""

  while tune.length < 5
    if notes.size == 1
      tune += notes[0] while tune.length < 5
      last
    end

    try tune + notes[0]

    if any tumblers
      # move this note to the end of the note list
      # we're guaranteed to see the correct note before we see this one again
      notes.push(notes.delete_at[0])
    elsif gears > tune.length
      # correct note
      tune += notes[0]
    else
      notes.delete_at[0]
  end
  • Average turns per tune: 13.55 (227729 turns / 16807 tunes)
  • Average turn savings: 1.14

Algorithm: Naïve 3

The next savings (which will be minor compared to the previous one) is if we've seen five distinct notes, we can rule out any notes not among those five:

  notes = A, B, C, D, E, F, G
  tune  = ""
  seen  = nil

  while tune.length < 5
    if seen.size == 5
      foreach element in notes
        delete it if it's not in seen
      end
    end

    if notes.size == 1
      tune += notes[0] while tune.length < 5
      last
    end

    try tune + notes[0]

    if any tumblers
      seen.add(notes[0]) unless seen.contains(notes[0])
      # move this note to the end of the note list
      # we're guaranteed to see the correct note before we see this one again
      notes.push(notes.delete_at[0])
    elsif gears > tune.length
      # correct note
      tune += notes[0]
      seen.add(notes[0]) unless seen.contains(notes[0])
    else
      notes.delete_at[0]
  end
  • Average turns per tune: 13.50 (226896 turns / 16807 tunes)
  • Average turn savings: 0.05

Algorithm: Naïve 4

So much for the obvious optimizations. Let's try tweaking some details of the algorithm a bit just to see if it helps or hurts.

You know how when we get a tumbler we move the first note to the last position? Let's try doing that when we get a gear, too. Turns out this saves us almost .25 turns.

  notes = A, B, C, D, E, F, G
  tune  = ""
  seen  = nil

  while tune.length < 5
    if seen.size == 5
      foreach element in notes
        delete it if it's not in seen
      end
    end

    if notes.size == 1
      tune += notes[0] while tune.length < 5
      last
    end

    try tune + notes[0]

    if any tumblers
      seen.add(notes[0]) unless seen.contains(notes[0])
      # move this note to the end of the note list
      # we're guaranteed to see the correct note before we see this one again
      notes.push(notes.delete_at[0])
    elsif gears > tune.length
      # correct note
      tune += notes[0]
      seen.add(notes[0]) unless seen.contains(notes[0])
      notes.push(notes.delete_at[0])
    else
      notes.delete_at[0]
  end
  • Average turns per tune: 13.27 (223060 turns / 16807 tunes)
  • Average turn savings: 0.23

Algorithm: Knuth 1

Due to arcanehl's constant (yet ever friendly) prodding, I picked this code up again. He started implementing Knuth's Mastermind algorithm, which, while slower, is much more effective. The average turns to solve a tune goes down from 13 to just 5!

It's a radical change from the algorithm used above, so allow me to explain. The basic idea is after a guess, we eliminate any possibilities that would not produce the same score. For example, if we try AAABB and get 2 gears, 0 tumblers, then we can eliminate any possibility that does not produce 2 gears and 0 tumblers against AAABB (such as CCCCC which would produce 0 gears and 0 tumblers). The initial guess is hardcoded to be AAABB though it could be something entirely different. The next guess is (currently) taken arbitrarily; we use the first possibility.

I changed from Perl to C during testing for speed: arcanehl's Ruby implementation of this algorithm takes hours, mine takes less than twenty-five seconds. Of course, these are the results of timing every possible tune (there are 16807 of them). Going through the algorithm for just one tune will be fast enough at a hundredth the speed.

    possibilities = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AAABB"
    else
        guess = possibilities[0]
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 5.5 (93103 turns / 16807 tunes)
  • Average turn savings: 7.7

Algorithm: Knuth 2

About the only possible improvement we can make is to generate a better guess than always using the first possible tune. I figured the median tune (or close enough to it) would be better than the first, and I was right, saving over half a turn.

    possibilities = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AAABB"
    else
        guess = possibilities[possibilities.size/2]
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.92 (82643 turns / 16807 tunes)
  • Average turn savings: 0.62

Algorithm: Knuth 3

Mysteriously, we find that the best initial tune is AABBC, not AAABB. (that's called foreshadowing)

This is the algorithm that is used in Rodney3. To save memory, all he remembers for each player is the results of each tune. Naturally there's a fair amount of reprocessing, but a 16807-element array would be kind of large to save for each person (and I'm afraid of Perl bitfields).

    possibilities = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        guess = possibilities[possibilities.size/2]
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.85 (81446 turns / 16807 tunes)
  • Average turn savings: 0.07

Algorithm: Knuth 4

Enough goofing around. Time to implement the rest of Knuth's algorithm: at each step, guess the possibility that would eliminate the most possibilities in the worst case. This means that no matter what the response is for the guess, we'll eliminate a maximum number of possibilities. The way I implement it has time complexity O(n^2) where n is the number of tunes left; I don't know if there's a more efficient solution. For each guess i we score each possible tune j. The worst case for i is equal to the largest number of remaining tunes after playing it (given all possible responses, such as three tumblers and one gear). Then we find the smallest worst case (ie, smallest remaining number of tunes) over all i and we play it!

This takes hours even in my somewhat tuned C implementation. Using the median element as above may be more appropriate if you're playing all 16807 tunes, since this extra processing is probably not worth the turn savings. Otherwise, if you're just finding one tune as in an actual game of NetHack, you'll of course want to use the best algorithm available.

Note: we found the best first tune (AABBC) by running this code without the hardcoded first guess. The code obviously produces the same results without the hardcoded first guess, but there's no reason to do all that processing (which is a lot for all 16807 tunes) when we know the first best guess is always AABBC.

    possibilities = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
          best_worst_case = possibilities.size + 100
        foreach i in possibilities
        remaining = []

        foreach j in possibilities
            score = score(j, i)
            remaining[score] += 1
        end

        worst_case = remaining.max
        if worst_case < best_worst_case
            best_worst_case = worst_case
            guess = i
        end
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.61 (77422 turns / 16807 tunes)
  • Average turn savings: 0.24

Algorithm: Breadth 1

I was talking in #nethack and someone brought up Mastermind. So that revived this whole thing. I was idly searching on the arXiv for Mastermind papers, and only found the one that proves Mastermind is NP-complete, which while good to know, doesn't directly help me improve my solving algorithms. However, I then remembered Google Scholar and searched for Knuth's paper. I couldn't find it, but it did return a few dozen papers that referenced Knuth's; including one by a Mr. Barteld Kooi. The paper, "Yet Another Mastermind Strategy," is available here. The gist of the algorithm is that instead of taking the best worst case scenario (as we do in Knuth's algorithm), take the tune that would produce the largest number of unique responses. So a tune that can return six unique responses is valued over a tune that can only return three unique responses. In the author's words, "[In this strategy, o]ne only looks at the 'breadth' of a partition. On the other side of the spectrum one finds Knuth's worst case strategy, which only looks at the 'depth' of a partition." This next algorithm implements exactly that; looking at only the breadth.

Also, I've rerun the algorithm without the hardcoded first guess, but it guessed AABBC anyway, which is a convenient result.

    possibilities = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        best = 0
        foreach i in possibilities
        responses = []

        foreach j in possibilities
            score = score(j, i)
            responses[score] += 1
        end

        cnt = 0
        foreach response in responses
            if response > 0
            cnt += 1
            end
        end

        if cnt > best
            best = cnt
            guess = i
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.5 (76038 turns / 16807 tunes)
  • Average turn savings: 0.08

Algorithm: Breadth 2

A consistent guess is one that has a chance of being correct; that is, it's consistent with the tunes we've played so far. When figuring out the next tune to play, we discard any inconsistent guesses. Well, let's see what happens when we keep them!

    possibilities = all_tunes = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        best = 0
        foreach i in all_tunes
        responses = []

        foreach j in possibilities
            score = score(j, i)
            responses[score] += 1
        end

        cnt = 0
        foreach response in responses
            if response > 0
            cnt += 1
            end
        end

        if cnt > best
            best = cnt
            guess = i
        end
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.4 (73803 turns / 16807 tunes)
  • Average turn savings: 0.13

Algorithm: Knuth 5

Let's apply the same change to Knuth's algorithm, just in case it ends up being better than Breadth 2. Turns out it isn't. The "average turn savings" below is in relation to the Knuth 4 algorithm, not Breadth 2.

    possibilities = all_tunes = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        best_worst_case = possibilities.size + 100
        foreach i in all_tunes
        remaining = []

        foreach j in possibilities
            score = score(j, i)
            remaining[score] += 1
        end

        worst_case = remaining.max
        if worst_case < best_worst_case
            best_worst_case = worst_case
            guess = i
        end
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.58 (76911 turns / 16807 tunes)
  • Average turn savings: 0.03

Algorithm: Breadth 3

Let's prefer to guess consistently over inconsistently when able. As of Breadth 2 we take the first tune that produces the maximum partition set size, even if that guess is inconsistent. If two tunes produce the same (maximum) partition set size, and one is consistent and the other is not, we should definitely pick the consistent one, because we might stumble upon the answer. One possible area of improvement is listing every tune that produces the maximum partition set size and picking the median one. Also, say an inconsistent tune produces a partition set size of 9 while a consistent guess produces a partition set size of 8. Should we play the consistent one?

    possibilities = all_tunes = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        best = 0
        best_is_consistent = false

        foreach i in all_tunes
        responses = []

        foreach j in possibilities
            score = score(j, i)
            responses[score] += 1
        end

        cnt = 0
        foreach response in responses
            if response > 0
            cnt += 1
            end
        end

        if cnt > best or (!best_is_consistent and i_is_consistent and cnt == best)
            best = cnt
            guess = i
            best_is_consistent = i_is_consistent
        end
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average turns per tune: 4.38 (73622 turns / 16807 tunes)
  • Average turn savings: 0.01

Algorithm: Knuth 6

Let's again apply the same change to Knuth's algorithm, just on the off-chance it ends up being better than Breadth 3. Turns out it isn't. The "average turn savings" below is in relation to the Knuth 5 algorithm, not Breadth 3.

    possibilities = all_tunes = ... # each possible tune
    while possibilities.size > 1
    if possibilities.size == 16807  # first guess
        guess = "AABBC"
    else
        best_worst_case = possibilities.size + 100
        best_is_consistent = false

        foreach i in all_tunes
        remaining = []

        foreach j in possibilities
            score = score(j, i)
            remaining[score] += 1
        end

        worst_case = remaining.max
        if worst_case < best_worst_case or (!best_is_consistent and i_is_consistent and worst_case == best_worst_case)
            best_worst_case = worst_case
            guess = i
            best_is_consistent = i_is_consistent
        end
        end
    end

    real_score = try guess

    possibilities.delete_if {|possibility| score(possibility, guess) != real_score}
    end
  • Average tunes per turn: 4.51 (75864 turns / 16807 tunes)
  • Average turn savings: 0.06

Results

Here are the exact results of each algorithm.

  1. Naïve 1: Guess notes in turn, remove note if silence, keep track of tune so far
  2. Naïve 2: If one note left, fill rest of tune with it
  3. Naïve 3: If five notes seen, eliminate others
  4. Naïve 4: Rotate notes on gear
  5. Knuth 1: Eliminate impossible tunes, guess first possibility
  6. Knuth 2: Guess median possibility, not first
  7. Knuth 3: Start with AABBC not AAABB
  8. Knuth 4: Greedily guess best possibility
  9. Knuth 5: Allow inconsistent guesses
  10. Knuth 6: Prefer consistent guesses
  11. Breadth 1: Guess tune that would give the most unique answers
  12. Breadth 2: Allow inconsistent guesses
  13. Breadth 3: Prefer consistent guesses

Naïve

turns Naïve 1 Naïve 2 Naïve 3 Naïve 4
1
2
3
4
5 1 1 1 1
6 5 6 6 7
7 15 21 21 28
8 35 77 77 85
9 70 238 242 275
10 126 721 750 825
11 210 1596 1632 1934
12 1519 2576 2617 2903
13 2709 3164 3196 3364
14 3395 3066 3076 2972
15 3241 2422 2405 2127
16 2527 1575 1539 1287
17 1610 840 799 638
18 840 364 331 261
19 364 119 100 85
20 119 21 15
21 21
total 246880 227729 226896 223060
average 14.69 13.55 13.50 13.28

Knuth

turns Knuth 1 Knuth 2 Knuth 3 Knuth 4 Knuth 5 Knuth 6
1 1 1 1 1 1 1
2 19 17 29 29 17 19
3 260 420 463 491 387 423
4 1337 4053 4656 6149 6429 7324
5 6975 8935 8740 9534 9839 8980
6 5812 3208 2752 597 134 60
7 2052 171 166 6
8 336 2
9 13
10 2
total 93103 82643 81446 77422 76911 75864
average 5.54 4.92 4.85 4.61 4.58 4.51

Breadth

turns Breadth 1 Breadth 2 Breadth 3
1 1 1 1
2 33 41 40
3 653 838 846
4 7215 8608 8770
5 8288 7140 6977
6 607 179 173
7 10
total 76038 73803 73622
average 4.52 4.39 4.38

Tuesday, June 18, 2013

We're back!

The TAEB development team is excited to announce that we are back and developing TAEB again!

TAEB site's is now http://taeb.github.io. We're also providing the Interhack site at http://taeb.github.io/interhack. Finally, the canonical repositories of all our NetHack projects are are hosted in the TAEB organization.

If you had a fork of TAEB or Interhack (or NetHack::Item or or or...) we'd love for you to try to get your patches back into the mainline branch. Please create an issue or better yet, send us a pull request.


In terms of improvements to TAEB we don't have much to announce yet. We're busy modernizing TAEB to work with the four years of both NAO patches and changes to Perl modules since it was last run regularly. One brand new feature is that TAEB will now use the _ travel command (with exponential backoff in case he gets stuck – one oscillation problem solved!). There's of course plenty more to come!

Saturday, August 22, 2009

Planar - a TAEB AI

TAEB::AI::Planar is a NetHack AI, written as a plugin for TAEB. (TAEB the framework does not contain AI-specific logic itself; instead, it aims to be a general platform that can be used both by AIs and by other programs that want to interact with NetHack; for instance, it would be plausible to use TAEB to implement an Interhack-like program.) The first TAEB AI, historically, is the one that is now called Behavioral (and which was originally just part of TAEB itself before it was split out); most TAEBs that you will see on nethack.alt.org, for instance, run that AI. Planar is a newer attempt to create an AI, by me, Alex Smith (ais523) and Stefan O’Rear (sorear); at present, it’s used by the bots TAEB523 and sortaeb523 (which run from much the same codebase, but are generally configured differently). Planar’s best score so far is 34179, in the large room at the end of the second level of Sokoban, which was scored locally on my computer; it died walking over a cockatrice corpse when blind and gloveless, the dangers of which it hadn’t been told about.

Planar treats NetHack as a pathfinding exercise; just as a computer would use a pathfinding algorithm to determine the shortest route from one room to another, Planar uses pathfinding algorithms to determine the best ‘route’ from its current situation to its eventual goal. The AI is based around resources, which are things that the AI can spend, can use, or has to be careful not to run out of (for instance, gold or hitpoints), threats, which are things that can be mitigated or fixed, but could cause plans to be more dangerous or which prevent them working correctly if not fixed (for instance, monsters or a trap on the current square), and plans, which are specific goals that it aims for, along with an idea of how to accomplish them.

Here is how Planar goes about deciding what to do on any given turn. (Note that although Planar remembers what it was thinking on previous turns, it recalculates most things every turn to allow for new information, rather than blindly continuing with its previous plan.) Each stage of the calculation is accompanied by a picture showing an actual situation from a game of NetHack that Planar has played (in fact, they’re all from the same game), and the text explains how Planar went about dealing with it.

Threats

Threats

The ‘start’ of Planar’s main loop (that is, when it starts dealing with the current turn, rather than the previous turn), is checking for threats that might make plans more risky. In the picture above, TAEB (the @ sign which appears inverse-video in this screenshot because it has a blinking cursor on it) has decided that it wants to eat the corpse (the % sign) on the ground, probably because it’s relatively hungry and wants to get to the corpse before it rots. Although in theory it could walk directly to the corpse and eat it, there are two potential problems; to the west is a nymph (a lowercase n) who would steal its items if given a chance, and to the east is a golem (an apostrophe) who, being a hostile monster, wants to beat the poor TAEB up. (Incidentally, Planar always asks for more information, if it’s available and doesn’t cost a turn, in order to analyse the situation as well as possible; for instance, if the material the golem was made of (relevant because it determines how dangerous it is) were undeducible from its colour on-screen, it would ask the framework to request NetHack to give more details, in this case by sending a farlook command to examine it remotely in detail.) The two monsters here are threats; the nymph is a threat to the TAEB’s items (which will be a source of resources such as nutrition, damage-potential, and armour), whereas the golem will be a threat to the TAEB’s hitpoints (a resource in themselves). Note that all threats, as well as everything else in Planar, are quantified so that it knows their exact value. (There are two ways to represent the value of things inside Planar; one is as a list of amounts of resources, the other is as a number that attempts to calculate the total value of all the resources in question by using exchange rates. These alter according to the amounts of the resources in question; for instance, normally one hitpoint is considerably more valuable than one unit of nutrition, but that rate would be rather different if the TAEB were healthy but fainting from hunger.) In this case, it would estimate the amount of theft the nymph would likely carry out (and as any NetHack player knows, that’s generally more than you can easily replace); and it would calculate the amount of damage the golem could deal per turn in mêlée combat (taking the worst-case scenario; Planar is just as scared of the Random Number Generator as any human would be).

The threat-check will quickly try to work out how quickly, on average, the monsters could get to each square on the map, and assign a threat function to those squares that indicates how damaging the threats could be if we tried to route to those squares at a particular moment. This is shown in the tactical map above; colder colours represent safer squares, with warmer colours being squares that are more threatening. So, for instance, going to the west past the nymph without dealing with it first is likely to be suicidal, thus the whole area there is filled with magenta (the worst possible colour); likewise, going to the east past the golem without dealing with it first is also a bad idea. The amount of threat that actually matters depends on when we finish business on the square in question; the map above shows the amount of threat (together with the amount of tactical cost, which is insignificant on that diagram) on each square at the first moment we can reach it, but carrying through time-consuming plans on those squares would give the monsters more time to catch up. As a result, the nearest unexplored corridor is a safe blue, as the TAEB could outrun the monsters along it (golems are rather slow, and the TAEB has a head-start over the equally speedy nymph); but the second-nearest unexplored corridor ends in a red square, because although the TAEB could get away from the golem eventually, the golem would get a few hits in as it walked past. So, if it was going exploring, it would use the safer first path.

However, just now the TAEB isn’t exploring; it’s hungry. The corpse shows up as brown on the above map, but that map (which was generated from the tactical planning stage) is just taking into account the length of time the corpse takes to walk to; eating the corpse will also take some time, and the threat function will give a higher threat value for a plan involving eating the corpse once the time taken is known (which will be calculated in the strategic planning stage).

There’s one other important task carried out by threat checking; for each square on the map (to be precise, the current level; as routing for other levels is cached, any threat calculation for them would be ignored anyway), it tags the square with information on how to mitigate the threat or threats that can get there. So for instance, there will be a plan to Mitigate the golem (i.e. kill it, eliminate it, or drive it away somehow) in order to make it safer to eat the corpse. (After it’s killed the golem, it’ll likely see that the nymph is also problematic, and it will try to get rid of her too, probably by throwing sharp pointy objects at her. This is another outcome of the threat check; it’s rather dangerous to attack a nymph in mêlée if you can help it, and the squares further away from her would show a much lower risk as a result, meaning that projectiles would be favoured over melee weapons.)

The bottom of the screen shows the decisions that the other stages of the loop have made; strategic planning has decided that it’s still worth eating the corpse, that getting rid of the golem first is a good idea, and that mêlée is the best way to do it; and tactical planning has decided that the best way to route near the golem is to walk east of it until the square immediately to its west. The chain of plans involved can be seen on the penultimate line of the screen; the final (l) on the screen is added by the framework, which knows that pressing l is the correct way to walk to the east (whereas the AI would just have returned a “walk east” action to the framework).

Tactical planning

Tactics

Once all the threats have been placed on the map, the next step is to work out tactics for the turn. Again, this is done by working out how expensive (in terms of resources) it is to move to any given square on the map; but this time, it’s our own routing we worry about, rather than that of the monsters.

The diagram above shows our TAEB just after it’s killed another monster, again wanting to eat it. This time, the monsters around are less threatening; there’s just an iguana (the nearby colon) to worry about. Although this particular turn came to the same conclusion as the last one analysed (kill the monster so you can eat the corpse in peace), tactical planning does not care about what the TAEB eventually ends up doing, but rather, how best to get to each square on the map.

This is done in terms of tactical plans; a single tactical plan contains a suggestion of how to get from one square to another square, together with information on how risky it would be (taken from the threat map and from information on the cost of the action required). Plans in Planar are very specific; as opposed to Behavioral’s behaviours, which are few in number and general (such as FixStatus and Defend), a plan in Planar contains information about exactly what is to be done (such as ‘move from (12,10) to (13,10) by opening the door on (13,10) then walking there’, which would have a cost in time based on how much time it would likely take to force the door open). It would be entirely common for multiple plans to be considered; for instance, in that example with the door, Planar would also be considering kicking that door down, which could be faster if, say, it were approaching the door from the diagonal (as a broken doorway can be entered diagonally, but one with a door in can’t be).

Tactical planning is a form of routing, in physical NetHack space; however, unlike most other AIs, a large amount of AI logic is done during tactical planning. For instance, Behavioral has a behaviour to open doors so that they don’t block routing later; in Planar, opening doors is only done if they’re in the way. Likewise, Planar will push boulders (trying not to block corridors in the process) and even tunnel through walls if that’s necessary to go where it wants to go, or if that’s the fastest way; the tactical planning stage will have a plan all ready for any strategic plan that needs to move to the square, for whatever reason. The actual routing is done via Dijkstra’s algorithm, in order to gain routing information for the whole map at once.

The diagram above shows routing costs, as determined by tactical planning, to the whole level; as opposed to the previous example, TAEB523 now has a pickaxe, and thus can route into the walls if it wants to. Note that many of the nearby walls are considerably more expensive to route to than the corridors and rooms near them; but the distant walls have comparable costs to the nearby rooms. The reason for this is that digging nearby takes a sufficiently long time that the iguana will catch up to the TAEB and attack; but digging distantly would happen in the future, after the iguana had already been outrun.

At the end of the tactical planning stage, therefore, it’s possible to quickly and efficiently answer any query from the strategic planning stage about how to get somewhere, and how risky doing so would be. (Incidentally, in Planar, ‘risky’ is a technical term meaning ‘difficult, expensive, or both’.) As a result, huge numbers of strategic scenarios can be tried, to determine which one works best.

Strategic planning

Strategy

Strategic planning is the heart of Planar; threat checks and tactical planning are simply calculating information so that strategic planning can refer to it quickly, whereas strategic planning decides what, overall, the bot is actually going to do.

To start with, there are several standard plans that work on improving Planar’s resource situation; for instance, eating when hungry, collecting ammo, and picking up useful items. (An item is useful enough to be picked up if its value to us, say as a source of food or as a backup armour in case the one we’re already carrying around turns out to be cursed, is higher than the drawbacks, such as its weight.) This gradual improvement of the character is known as resource conversion; if the TAEB can manage a net gain in resources, it does so, ignoring the main plan for a while (and going for the largest gains first). Therefore, there’s no need for every major plan to take the details of things like feeding and shopping into account; it’ll happen automatically without a specific request to suppress it. If none of the resource-conversion plans can produce an improvement, though, the main plan takes over; this is a user-configurable overall goal that the bot is trying to accomplish. In the example above, for instance, the main plan is SlowDescent, which explores the dungeon from the top down, thoroughly exploring each level before going onto the next one.

In order to determine how to accomplish the goal in question, strategic planning basically does routing in plan-space, again using Dijkstra’s algorithm. In this case, the steps on the route are plans that make other plans possible. So, for instance, the chain of plans shown in the picture above are SlowDescent | by exploring [this level] | by exploring [a specific square] – by walking towards it; the final plan there is a tactical plan, the rest are strategic. In order to come to this decision, Planar will start by asking SlowDescent if it can be accomplished directly; the answer in this case is always ‘no’, because there is no associated action (it’s a “metaplan”, which is defined in terms of plans rather than in terms of actions). As a result, it’ll look for plans that make up parts of the slow descent, or make it easier to accomplish; in this case, the plans in question will be exploring level 1, exploring level 2, exploring level 3, and exploring level 4, in that preference order. (There’s no problem with more than one plan having the same preference; in that case, Planar would simply take the cheapest, leading to a “nearest first” exploration of the dungeon. However, SlowDescent is specifically trying to explore from the top downwards.)

Each of the level exploration plans are likewise metaplans, which suggest exploring particular locations on the level in question. Although Planar would like to explore on the first or second floors, if possible, those floors were apparently explored out at the time; so as the next-best option, it would consider the exploration spots on the third floor. (The picture above is showing the cache of exploration spots on the level, cached for speed; warm colours show squares in need of exploration, cool colours showing squares that have already been explored.) This time, all the exploration spots have the same preference level; so the least risky exploration is taken, which in this case, with no visible threats, is the nearest. (The tactical planning stage will already have the risk of reaching each of the exploration spots, as well as everywhere else in the dungeon, ready-calculated so that it can be given very quickly when requested; so despite the large number of exploration spots shown, strategic planning will not take much time.)

Before enacting the plan in question, various checks are done. Plans are skipped if they require more of any resource than is available to spend; so, for instance, exploring one square is normally fine, but if that square is next to a demon prince and we’re low on hitpoints, it will be considered far too risky in terms of hitpoints to attempt. (A different plan would be favoured, therefore; probably running away to carry out business in another part of the dungeon.) Likewise, something can be too risky in terms of anything else. The alternative plans suggested by threat check will also be considered, to see if eliminating the threat first and then carrying out the plan will be a better idea than just carrying out the plan directly. A plan can be suppressed by the previous turn’s plan, to avoid losing sight of a slightly-longer-term goal in favour of a short-term goal (so, for instance, it will unequip a shield in order to wield a two-handed digging tool, even though normally improving its armour would be more urgent than digging through a wall). Finally, a plan might not be enacted due to success measurement deciding that it is doomed to fail, or create an oscillation, based on observations rather than on known information.

Success measurement

Success

Once Planar has decided on a plan, it will find out what action is associated with it, and tell the framework that that’s the desired action for the turn. After the framework has processed the results of the action and asked for the next one, though, Planar does after-the-fact processing on the current turn before moving on to consider the next turn’s plan. In particular, it’s trying to determine whether the plan worked as expected or not, and whether it’s stuck in an oscillation.

Each plan comes with a test to determine whether or not it accomplished what it set out to do. For instance, a plan to open a door will check to see if the door is now open; if it isn’t, it will check to see if it was for a known reason (doors in NetHack sometimes randomly fail to open, which is not a problem), or if it was for an unknown reason (for instance, Planar doesn’t yet know it can’t open a door if it’s been forced into jackal form by a werejackal, but it can determine that it tried to open a door, it didn’t work, and it doesn’t know why). If a plan fails unexpectedly, it won’t be retried for some time, which increases sharply the more times the plan in question has failed in a row. (There’s an exception to this; if something happens to make Planar think that the situation has changed, for instance if a tile relevant to the plan in question changes glyph or one of the plans marked as part of or an alternative to the plan succeeds, it will try again instantly.) This happens for both strategic and tactical plans; as always, failure will simply cause it to seek alternatives. In this way, Planar can get around deficiencies in either its own, or the framework’s, understanding of NetHack; this ability to determine what works by experiment will helpfully aid it in handling unexpected situations (whereas Behavioral normally responds to such situations with an infinite loop as it tries a doomed action over and over again).

More subtly, a plan can seem to be working correctly, but leave TAEB stuck in an infinite loop; this is the dreaded “oscillation” that’s so common in NetHack-playing AIs. The picture above shows TAEB near a throne room (full of letters representing monsters); it looks like a good place to explore from afar, but upon approaching more closely, it’s possible to see that it’s full of monsters. Planar tends to be cautious around monsters, especially when there are alternative plans available; so it would walk away from the throne room, forget the location of the monsters (neither the TAEB framework nor Planar currently has any way to track the locations of monsters that are out-of-sight), walk towards it again, and repeat. The repeated indecision in which plan to accomplish is detected in success measurement; and as a result, the success measurement deliberately makes up its mind to try one plan or the other, by temporarily interfering with both the strategic planning and tactical planning stages to help force a particular decision. In the map above, therefore, most of the routing map is unexpectedly grey, rather than the usual blue; this is because success measurement has detected an oscillation and decided to allow only plans that move towards the throne room for a few turns (by causing routing to fail when moving away from it). As a result, the TAEB has decided to look at the items on the floor in the room instead; and has concluded that the safest way to do that is to slaughter the inhabitants of the throne room, picking them off at range using projectiles. (The messages at the top show that the previous turn, it threw something at a monster and killed it; it wants to do that again, but needs to reposition itself first.)

Friday, June 12, 2009

Synchronizing with NetHack

One of the nastiest problems TAEB has to solve is also one of the most mundane. If you've ever played a game over a network or on a heavily loaded computer, you are probably familiar with lag and the problems it creates. While NetHack, as a turn-based game, does not suffer from directly lag-caused deaths, it is still very difficult to make moves sensibly when the responses are delayed.

Bots like TAEB also suffer from lag, indeed much worse than people do. Twenty milliseconds is quite unnoticeable for a human but is an eternity for a bot, and even at that timescale issues of finding the end of a turn are normal. Furthermore, bots are much less able to recover from partial updates than people are, due to the lack of sensible error-recovery.

Presumably, the player just exited a full-screen menu or changed dungeon level. A full screen redraw for NetHack is larger than the MTU of most networks, and due to lag, the second packet has been delayed significantly. (This doesn't require weird conditions; TCP normally waits until data has been successfully received before sending more.) As a result, the lower half of the current frame is black. TAEB, on seeing this frame, makes a couple of damaging inferences:

  • The lower half of the level has turned into solid rock. This is an especially nasty case if the redraw was caused by stairs, as TAEB interprets large areas of new stone to indicate a brand new unexplored level, corrupting the dungeon graph.
  • The bottom line is all spaces and unparsable, resulting in error spam and no stats.

Clearly, bots need a reliable way to tell when NetHack is done sending output for the turn. There have been several approaches to this problem over the years.

Consistently slow

One of the oldest and simplest approaches is simply to sleep for one second or so after every move, to wait for NetHack. This works, most of the time. It has two major flaws; first it is very slow, second it is unreliable - if lag spikes above 1 second, the bot may see an incomplete frame. Nevertheless, it is very simple to implement and has been used in several efforts.

Avoiding the issue

Another approach which has been historically used, most famously by the Angband Borg and Rog-o-matic, is to link the bot into the game's input routines. In theory, this is foolproof, but it carries dangers of a different sort, in addition to the obvious restriction to local games.

If done incautiously, the linking can affect game mechanics or leak information, thus compromising the validity of the bot effort. Modifying the game requires a very in-depth knowledge of NetHack's implementation, which is very unclean and non-transparent in places. Furthermore, requiring the bot to be linked into NetHack greatly complicates the build process; the bot cannot be built or distributed as a self-contained object unless all of NetHack is duplicated, prohibitive effort for many starting projects. As such, the linking approach is very rarely used.

We want TAEB to play in tournaments on public servers, which precludes linking.

Inline pings

One approach which is often proposed, but sadly does not work, is the inline ping. Simply send an invalid command after every real command, and wait for the error. Unfortunately, there is no command which is invalid in every context in NetHack. Not even control-r to redraw the screen; it fails in prompts and menus.

Telnet ping/pong

Invented by Sartak for NetHack fairly early on, and the most common method in use today (in particular, TAEB uses this), this relies on telnet as a side channel. A bot connects to a telnet server using a raw TCP socket, and implements the protocol itself; after every command is sent, the bot attempts to negotiate an invalid telnet stream option (DO 99).

When a telnet daemon receives the packet containing the command, it sees an option it does not support and responds in failure (WONT 99), then passes the command off to NetHack. Shortly afterward, NetHack's response is responded to the bot. Thus, the bot receives NetHack's response very shortly after the "pong". In effect, the telnet options are used as a side channel to dynamically measure network lag. Since under normal conditions network lag is the dominant component of total lag, this gives the bot a very good idea of when the output should be expected.

The reliability of this approach is improved by kernel packet coalescing, also known as Nagle's algorithm. When the pong is sent, the network implementation at the server waits a short time to potentially send a larger packet; if packet sending is low-priority, this will additionally tend to be delayed until NetHack is done calculating. With the pong and the response generally in the same packet, they are guaranteed to arrive at the bot at exactly the same time.

While good enough for bots, this approach is not quite foolproof. If NetHack takes a long time to reply to a command, due to system load, slow execution (there are several quadratic algorithms in the game, most notably monster movement and item stack sorting), or simply being swapped out, it is possible for the kernel to time out the attempt to coalesce, and the pong will arrive significantly before the output. This has been the source of several unreproducible errors in TAEB.

This approach is also limited to telnet servers. It does not help at all with lag as it occurs on heavily-loaded local systems, and does not work well with other network protocols. In principle SSH could be used, however the vastly higher complexity of the SSH protocol makes a custom implementation prohibitively difficult.

Expectations and Confusion

Will Noble (futilius)'s bridey works on an entirely different principle from the other bots. Instead of relying on out-of-band data to find complete frames, bridey is more like a human in that it waits for a frame that looks complete. The code which generates actions also provides a predicate for a sensible-looking result; fully filled bottom line, cursor on the character, etc. When an action is run, bridey waits for a frame that satisfies the predicate. To guard against programming oversights, the predicates are made partially dependant on wait time, and become more lenient as more data has had a chance to arrive.

Actually, bridey combines this with other mechanisms above; the time, as it is used in predicates, is measured in telnet ping/pong cycles. For instance, the move action has a final timeout of 4 cycles. This allows the reliability of the expect system to partially stack on the reliability of ping/pong. bridey, like TAEB, uses simple sleeps on the local interface.

Abusing POSIX Job Control for Fun and Profit

In 2008 I developed a novel method, which is currently being adopted by TAEB for the local interface. NetHack is single threaded and makes very little use of asynchronous calls, so when it is blocked on tty input it cannot possibly generate output until more keys arrive. So, if we could somehow find out when NetHack blocks, we would solve the synchronization problem permanently - unlike the methods above, this is immune to system load.

On the surface, finding out when a process blocks would seem to require low-level, privileged, and generally highly nonportable code. However, we are not the first user of the information. It is possible in UNIX to move processes between the foreground and background at will; if it were possible for background processes to read keyboard input, chaos would result. The traditional solution is for the kernel to stop any process when it attempts to read in the background. Stops, like deaths, result in wait() events; this is necessary in order that the shell may continue them.

With this insight, the solution seems fairly obvious - simply put NetHack in the background, then when it stops, let it read. There are a few complications in this, though. First, it's not possible to simply shovel data into a stopped reader; it has to be briefly continued in the foreground. How long is briefly? Depends on... system load. Fortunately, there is a trick we can use to eliminate inaccuracy - check for data on the virtual keyboard, with ourselves in the foreground. If we see anything, the slave needs to read more, and we re-continue it, using exponential backoff. Of course, data typed is not always immediately available due to line-buffering and various forms of special-key processing - but that applies to us too, so we still get the right answer.

Even with that settled, POSIX only allows a process to use job control if it lives inside the pseudo-terminal. That means, we need to fork a slave in order to do the actual manipulation, and communicate using pipes.

This is very unusual activity terminal driver, and it has exposed two bugs in the BSD kernel so far. First, in all known versions of BSD, a process which reads in the background does not actually receive the stop signal until it returns to user mode; but it returns to user mode only once the wait for terminal data completes! In regular usage, this is masked, because when you type bg all waiting threads are woken, but it was a killer for us; the eventual workaround I found was to briefly change the terminal blocking mode, thus forcing the kernel to reconsider all active blocks. Finding this bug, and the workaround for it, took an annoyingly long amount of time.

Another bug the project found was that the BSD kernel does a one-second sleep whenever a process that is already in the background tries to read. The reasons for this are quite unclear, but it has the effect of killing performance of this approach. Fortunately, recent versions of Darwin (OS 10.5.7 is known to work) have a rewritten sleep system, and this call is no more.

A comparatively minor issue was found in Linux; attempting to read from an idle psuedoterminal master fails with a "No such device" error, instead of the EOF return that would be expected of devices with pipe semantics. I'm not sure if this is a bug, but it did require workaround code.

This method is implemented in the IO::Pty::HalfDuplex perl module.

Variations on the theme

Job control abuse works pretty well on Linux and Darwin, but what if you're on a different platform? Well, there's always low-level, privileged, and generally non-portable code. As seems to often be the case with hacks like this, it usually takes the form of a debugger API abuse; set breakpoints on the functions that can read from the terminal, and arrange to capture output somehow. These approaches share a common liability; debugger interfaces are highly security-sensitive and are often in some way disabled.

  • On most UNIXes (including the BSDs) there is a system call ptrace which fits the bill perfectly. Unfortunately, while the style of ptrace is virtually identical everywhere, no two UNIXes implement the call in exactly the same way. ptrace implements security by refusing to trace processes of other users and denying user and group changes while active; this is a real bother for us, as NetHack is normally installed setgid - it escalates its group to one that can write to the saves and scoreboard.
  • Windows has a debugger interface which is well documented and consistent across versions. Nice? Well, Windows also has, last time I counted, 30 different documented ways to read from the console. Writing the correct traps to catch them all would be interesting, to say the least.
  • Ironically, the best chances seem to exist on DOS. Programs are debugged by directly hooking the operating system, allowing for a fully consistent interface, and there are very few paths for keyboard reading. Unfortunately, CPAN on DOS is non-functional, and installing any large Perl system without it is extremely tedious.
  • Departing from the trend of debuggers, it is also possible on the BSDs to simply read the process status. Compared to the ptrace method, this has the advantage of working on segid processes, but requires polling.

I plan to implement most of these in IO::Pty::HalfDuplex.

Conclusion

In the field of bot writing, even the simplest things can prove almost insurmountably complex. This is very evident in dealing with lag, both network and loading. Lag-related problems are a major cause of unreproducable faults in even the most sophisticated NetHack bots. Many of the simplest approaches to controlling lag are deeply flawed and will make debugging difficult. Let this be a lesson to everyone who wants to start from scratch. Fortunately, you don't have to; TAEB is designed to be a framework, to support additional AIs easily.

Sunday, June 7, 2009

Anatomy of a Step

Several people have asked to see more articles about TAEB's architecture. Even after working with the codebase for a year and a half, I am still very happy with the design. My previous attempts at writing a NetHack bot lasted only a month or two before collapsing under their limitations. However, I think we finally got it right with TAEB, so I enjoy sharing what we have created.

To introduce the important components of TAEB, I am going to walk through TAEB's "main loop". We call each iteration of the main loop a "step". Each step vaguely resembles iterations of NetHack's own main loop. Note that this is completely unrelated to NetHack's turn counter. Due to the speed system, command repetition ("search for twenty turns"), and even paralysis, steps do not correspond with turns. The only similarity is that both counters increase monotonically. Today, we care only about steps, not turns.

Input

The first thing TAEB does is read input from NetHack. This can involve reading from a socket (as in TAEB::Interface::Telnet) or reading from a pseudo-terminal (as in TAEB::Interface::Local). I will eventually write a post about the mechanics of simply deciding when NetHack is done sending data. It is not at all trivial. (update: here's that post)

NetHack prints a stream of characters. Since it provides a full text user interface with two-dimensional maps and colors, NetHack prints escape sequences. These escape sequences encode commands such as "change the pen color to red" (\e[31m) or "go to cell (5, 12)" (\e[12;5H). We run all input through Term::VT102 to parse and handle these escape sequences. Term::VT102 then lets us ask high-level questions like "what is the text of row 23?" or "what color is cell (59, 6)"? We have a subclass TAEB::VT that lets us ask additional questions, such as "is the text 'Hit return to continue:' present on the terminal?"

We now have something resembling what the player would see (a two-dimensional, colored block of text). The next task is to understand what is on the screen. Consider the following NetHack terminal:

As a human, you can figure out a lot of what is going on, even if you have never played NetHack. You can see that the player is in combat with a goblin. From the bottom lines, you can guess that the player is stuffed with food, and probably the character's name. If you have played NetHack, you can recognize several more things. The character is a wizard (evidenced by the Evoker title). There is a general store in the top left room. The character is currently on the bottom of the screen, with a goblin just north of him. You can identify the seven rooms of the level, and the features of the dungeon (like stairs, a fountain, and many doors).

Such analysis is the job of two components, TAEB::ScreenScraper and TAEB::World::Cartographer. The Cartographer analyzes the map to populate internal level and tile information, while the ScreenScraper handles other input (mostly English text). The ScreenScraper runs first, parsing messages to publish announcements. The rest of the system does not have to worry about the content of messages, each component just listens for particular high-level events.

Thankfully, English text appears in fixed places on the screen; TAEB never has to guess which text is prose and which represents the map. Most of the time, text only appears on the top and bottom lines. However, menus can appear in more places on the screen.

Here we have an "identify" menu. The ScreenScraper can detect that there is a menu by looking at the text preceding the cursor. If it is "(end)" or "(# of #)" then TAEB knows a menu is on the screen.

Menus are interesting because they demand input from the rest of the system. The ScreenScraper does not know what items should be identified — it has to ask the AI component. Since there can be many menus in a particular step (such as identifying several items, one at a time), the ScreenScraper has its own loop for parsing different kinds of input. The various kinds of input include menus, prompts ("Eat what?"), location requests ("To what position do you want to be teleported?"), and ordinary top-line messages.

The ScreenScraper and NetHack may communicate several times before the next step is started. What is vital is that the ScreenScraper leaves NetHack in "action mode". There are no unresolved prompts, menus, --More-- messages, etc. This means that the Cartographer can assume that the cursor is on TAEB. This invariant makes TAEB cope perfectly with having a different character glyph due to polymorph or invisibility. Other bots (such as "moomaster") have struggled with this, since they guessed that the white @ on the screen was the character.

Since the ScreenScraper leaves NetHack in action mode, the Cartographer knows that every character between lines 2 and 22 inclusive are the map. If there were a menu on screen, TAEB would get massively confused, thinking there are suddenly hundreds of monsters floating in a void on the right side of the screen.

The Cartographer looks at each cell on the virtual terminal to update each tile in TAEB's internal map. It uses cell glyph and color to determine what is on each tile. For example, a gray { is a sink.1 The sink is mapped to a TAEB::World::Tile::Sink object. This sink object of course knows things every tile knows, such as how many times it has been stepped on by the character, what items are on the tile, and what engraving is on the tile. In addition, sinks know whether they have produced a black pudding, ring, or foocubus through kicking.

One curiosity is that the map is updated after we publish messages from the ScreenScraper. If TAEB moves and receives the message "You see here an opal ring", the ScreenScraper sees that and announces that there is an opal ring on the current tile. However, since the Cartographer has not run, the current tile has not been updated yet, so the item would be incorrectly added to the previous tile. The crude solution we currently use is to freeze the publisher until after the map has been updated. We are not yet sure of a better solution to this.

Output

At this point, the map has been updated, all input from NetHack has been parsed, and the appropriate announcements have been published. We then redraw the screen for the users watching the bot play. We also check if the user typed a key. TAEB has many debug commands, such as ~ to open up an interactive REPL to inspect and change TAEB's internal state.

Finally, it is time to involve the AI. TAEB simply asks the AI "what now?" This can involve arbitrarily complex calculations. One of my favorite features of TAEB's design is that its AI is pluggable. TAEB requires only a few methods be implemented by the AI, and we are working to trim that down to just "what now?"

There are currently two AIs being developed: Behavioral (by most of the core team) and Planar (by ais523). Both are worthy of many posts. We hope to entice more developers to work on TAEB AIs! One of our future projects will be Interhack on top of TAEB. The intelligence of a TAEB::AI::Interhack would not actually be "artificial", of course.

Since TAEB is a framework, the AI will inevitably be tightly-bound to TAEB (though, as explained, the converse is not true). The AI is expected to call all sorts of methods on TAEB objects, such as "am I currently blind?" (TAEB->is_blind), "do I know the identify spell?" (TAEB->find_spell("identify")), "is there a fountain on this level?" (TAEB->current_level->has_type("fountain")), and so on. We think this is a huge boon to developers seeking to write NetHack bots — they can focus solely on the interesting bits of artificial intelligence. We handle the programmatic NetHack.

The AI tells TAEB what to do each step through TAEB::Action objects. These abstract away the details of interacting with prompts and menus. The AI can say:

    sub next_action {
        # ...
        if (my $lizard_corpse = TAEB->find_item("lizard corpse")) {
            return TAEB::Action::Eat->new(
                food => $lizard_corpse,
            );
        }
        # ...
    }

The action will take care of responding to the "Eat what?" prompt with $lizard_corpse's inventory letter. The action will also respond to unexpected prompts, such as "There is a fox corpse here; eat it?". This declarative nature pervades TAEB's design to great effect.

Finally, we send the keystrokes for the current action (such as "e" for eat) to NetHack.

The action will participate in the next iteration of the main loop to respond to prompts and listen for events caused by the action. For example, if TAEB receives the message "You stop eating the lizard corpse", then the action will know not to remove the lizard corpse from the inventory.

Conclusion

The design of TAEB's main loop has had a significant impact on its (relative) longevity. The design of handling all of the input from NetHack as soon as possible drove much of the system's design in very positive ways. It encouraged the reification of Actions, which has been immeasurably useful. It even encouraged the addition of the pubsub system and its later enhancement to announcements. Though there are still some lingering quirks caused by this main loop design (such as teleportation traps), none seem insurmountable.


Footnotes

  1. TAEB remaps some dungeon features to avoid conflicts; in core NetHack, sinks are gray #, but so are corridors. Eventually we may be able to remove some remappings due to increasing Cartographer sophistication, but there's no rush. TAEB does not change too many characters as to render its NetHack screens unfamiliar. (back)

Saturday, March 14, 2009

TAEB 0.03

registering upload with PAUSE web server
POSTing upload for TAEB-0.03.tar.gz
PAUSE add message sent ok [200]

This is the first version that is actually going to be on the CPAN. I decided to put up with Module::Install for a little while longer. :)

TAEB, pubsub, and announcements

TAEB is a decently-sized, componentized program. Components need to be able to communicate with each other. For example, the Senses component (which tracks the state of TAEB's character) needs to know when we are indebted to a shopkeeper so that it can respond to the AI when it asks if TAEB is in debt, and for how much. The Cartographer component (which tracks the state of the dungeon map) needs to know when TAEB becomes indebted so it can mark the current room as a shop. In general, we want any component to be able to listen for any update. This will let us remain flexible and extensible; the completely-separate AI can listen for any update that the framework can.

We use the publish-subscribe pattern to control this complexity. Pubsub decouples those who generate updates (publishers) from those who listen for updates (subscribers). TAEB has been using pubsub for a long time now (since 2008-01-06, according to darcs trackdown). To publish a message, anything in the program can call TAEB->enqueue_message("foo" => @arguments). This will call the msg_foo method on all subscribers with arguments @arguments. We have a component, Publisher, that acts as message broker.

Pubsub has been a great tool for letting TAEB understand messages from NetHack. We have a component that is devoted entirely to figuring out what the current state of the screen is telling us: the ScreenScraper. The ScreenScraper deals mostly with transforming characters on the screen into published messages. For example, when the NetHack prints the message "You owe (somebody) (some amount of) zorkmids.", the ScreenScraper publishes a debt message with one argument: the number of gold pieces TAEB owes. The Senses has a msg_debt method that stores this amount of debt in an attribute. The Cartographer has a msg_debt method that floodfills the current room's tiles with the shop bit. The ScreenScraper does not need to know who cares about debt. The ScreenScraper publishes a lot of messages that no component subscribes to yet, and that's okay!

I'm happy with this, but there is room for improvement. Simple methods are not great message handlers. There are painful bugs lurking when subscriptions interact with inheritance. Suppose the AI defines a generic "go to a specific tile" behavior. This behavior would need to handle walked so it can track TAEB's progress. This GotoTile behavior is then subclassed to produce behaviors such as GoUpstairs and GotoCorpse. GotoCorpse might need to handle walked so it can age the corpse (so that TAEB doesn't eat rotten corpses). When the publisher sends the walked message to the GotoCorpse behavior, it does not invoke GotoTile's msg_walked method, because GotoCorpse didn't invoke it. While you may argue that GotoCorpse should have known to invoke GotoTile's msg_walked as that is part of its public interface, it certainly sucks for usability. It's especially painful when GotoTile begins subscribing to walked months after GotoCorpse is written!

The potential fixes for this are easy. The one I like best is providing some "subscribe to a message" sugar. Where we previously wrote sub msg_walked { ... }, we would now write message walked => sub { ... }. This sugar would handle publishing to parent classes if there are any. It would still use method calls behind the scene; message would install a msg_walked method. This gives us maximum flexibility. A subscriber could define an unusual message that would only conditionally be published to its parents.

The arguments we pass to each method could be improved as well. It's not immediately apparent what arguments the msg_debt method would receive. Currently we pass the amount of debt. However, we should also be providing the name of the shopkeeper TAEB is indebted to. This would help the Cartographer resolve ambiguities when there are multiple rooms and shops in sight. Should we rewrite every msg_debt method (including those potentially written by third-party, unknown AI hackers) to take named parameters? Should we pass in the shopkeeper's name as the second parameter? Should we pass in the name as the first parameter? After all, the name does come first in the message NetHack prints. We could provide one positional parameter (amount) and one named parameter (shopkeeper). Hey wait, it would be useful to also pass which items we're buying as well...

The best answer is to make each message an object. We would have a TAEB::Message::Status::Debt class with attributes amount and shopkeeper. This class could have a method to ask the inventory what items are currently in TAEB's shopping cart. If we're feeling cute, we could even overload this message to stringify to the amount of debt.

Pubsub with objects as messages is generally called Announcements. The concept of announcements is more general than pubsub. Announcements lets subscribers of a message communicate with the other subscribers, and even with the publisher.

Currently, when NetHack presents TAEB with a menu to select which items to pick up, TAEB will ask the AI whether it wants to pick up each item. The AI is asked about the item without the context of the other items that it can pick up. This really sucks! If TAEB is toeing the burden line, it needs to be pretty strict about what items it will pick up. This means it may refuse to pick up a useful item because there just might be an even more useful item further down the list.

Instead, TAEB should publish a "pick up items" announcement. This announcement, TAEB::Message::Query::PickupItems, would have the list of items. Subscribers would select which items they want by invoking methods on the announcement. When all subscribers have had a chance at it, the ScreenScraper would select the items in NetHack's menu accordingly. The selection doesn't have to be binary either; each subscriber could assign a numeric "desire" to each item. The ScreenScraper would then select items that have a sum desire greater than some cutoff (which would be another decision that some subscriber could set). Using announcements would better decouple the AI from the framework.

This is not the first time that turning plain strings into classes has been a major improvement for TAEB. Previously, the AI would return a string, a NetHack command, as the action to perform next. We then reified actions into classes. Letting the current action subscribe to messages, and respond to NetHack prompts, vastly improved TAEB's interactions with NetHack. Problems with unexpected prompts (such as "Eat this corpse on the ground?") quickly and completely vanished. Actions subscribing to messages lets us handle ambiguous messages better; if we just applied a unicorn horn, then we can figure out that "You feel sick." means the unicorn horn was cursed and TAEB can mark it so.

TAEB does not have announcements yet, but that's my next big project. I'm excited by the many possibilities here.