Searching for the best strategy (II)

Ignasi Andres
8 min read · May 15, 2021

How to use automated planning to play a Warhammer 40k game


Hello!

In the first part of this series of posts, we saw what a planning problem is. Remember that we want to model an agent able to play a game of Warhammer 40K using different Artificial Intelligence techniques. This is the second post of the series, and we will continue developing the topic we started by describing planners and how they solve a planning problem.

This post is structured as follows: first, we describe what a solution to a planning problem is, and what makes one solution better than another. Then, we will look at a general planner and the variations it would need to solve different types of planning problems. Finally, we will create a problem for Warhammer 40k (experiments at last!) and use a state-of-the-art planner to solve it.

Planners

A planner works by exhaustively searching the set of all possible combinations of variables. This set is called the state space. Do you remember the definition of state we gave in the previous post? A search algorithm considers every different state to be a node, and every action an edge connecting two states, as in a graph. Every action modifies the environment by adding or deleting information.

(left) Graphical representation of a state with one Death Guard against 5 Ultramarines; (right) Representation of a state and its two successor states after expanding two different actions.

Consider the image above. A Death Guard legionnaire (the most similar I could draw without mutations) can choose between two actions: (a) move north so he can target the hated Ultramarines; or (b) stay behind the wall and hold the objective. Since we adopted a set representation, where every proposition or variable represents a fact of the world and whatever is not in the set is assumed to be false (remember the Closed World Assumption from the previous post?), expanding a state is simply a matter of removing the propositions rendered false by the action (the delete list) and adding the propositions made true by it (the add list). This is the state transition function, which, given a state and an action, returns the next state.
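As a tiny worked example, with illustrative propositions rather than the exact ones from my domain, suppose the legionnaire decides to move north:

; current state
(at legionnaire wall) (covered legionnaire)
; the action move-north has:
;   delete list = {(at legionnaire wall), (covered legionnaire)}
;   add list    = {(at legionnaire north)}
; successor state = (current state - delete list) + add list
(at legionnaire north)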

So, if we are given an initial state and a set of actions, we are good to go. We can use any search algorithm (BFS, DFS or Dijkstra, for instance) to find a solution…BUT…the exponential cost of expanding all states makes it almost impossible to find such a solution (if one exists). To avoid this exponential explosion, planners use informed search algorithms such as A*, or local searches that rely heavily on the heuristic, like Hill-Climbing.

In fact, heuristics are one of the main focuses of the Automated Planning community. A heuristic is simply a way to tell the algorithm how good a state is, in terms of how close it is to the goal, without having to compute the full cost.

(left) The same representation as above; (right) Representation of a state and its two successors with their heuristic values. Notice that exposing the soldier to enemy fire is worse than holding an objective.

The Fast-Forward planner (FF) was an innovative planner when it first appeared. Its heuristic was something new and consisted of solving a relaxed version of the same problem: to estimate how good a state is, solve an optimistic version of the problem from that state…easy! And this is something we humans are familiar with. In this relaxed version, only the add lists of the effects are considered. Wait!…but isn't solving a relaxed version going to be expensive? Yes…a bit; however, it makes it possible to prune states and expand only those that are really promising, resulting in more efficiency.
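For example, a move action like the one we will define later normally deletes the unit's old position; in the relaxed version only its add effect is kept, so a unit is (optimistically) considered to be in both places at once. This is only an illustrative sketch of the idea, not FF's actual implementation:

; original move effect
:effect (and (not (at ?who ?from)) (at ?who ?to))
; relaxed move effect: the delete list is ignored
:effect (at ?who ?to)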

Another modern planner is the Fast Downward planner, considered the state of the art. In this planner, before any state is expanded, a graph with the dependencies between propositions is computed. The heuristic then analyses these relations to estimate how many more propositions need to be achieved to reach a goal state.

Well, enough theory for now. It is time to show some cool planners in action.

Experiments

We used the Python Arcade Library to create a small representation of a wargame where two armies face each other. There are also four different objectives and some obstacles where a unit may hide to avoid being targeted. Each army has 3 units (assembled using older posts!): 2 troop units and an HQ. On the Ultramarines side:

  • An Intercessor squad of 5 men: rank-and-file troops.
  • A Hellblaster squad of 5 men: an elite troop with superior weapons.
  • A Captain: an HQ choice.

For the Death Guard, we have:

  • 2 Plague Marine units of 5 men each.
  • A Daemon Prince: a special HQ unit able to cast special attacks in the Psychic Phase.

We used the FD planner in our experiments. Modern planners take 2 files as input, written in a language called the Planning Domain Definition Language (PDDL), whose syntax is based on LISP: a domain file and a problem file. The domain file contains the definition of the problem, such as the list of variables (or predicates) and the allowed actions. The problem file, on the other hand, contains the definition of a specific instance of the problem, such as the size or the number of initial troops.
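As a rough sketch of the shape of these two files (the predicate, location and requirement names below are only illustrative; the real files differ in details), a domain and problem pair looks like this:

(define (domain w40k)
  (:requirements :strips :conditional-effects)
  (:predicates (at ?who ?where) (conquer ?who ?objective)
               (healthy ?who) (wounded ?who) (killed ?who))
  ; actions such as move, hold, shot, smite and kill are declared here
)

(define (problem w40k-skirmish)
  (:domain w40k)
  (:objects plaguemarines1 plaguemarines2 daemonprince
            intercessors hellblasters captain
            dg-deployment um-deployment o-1 o-2 o-3 o-4)
  (:init (at plaguemarines1 dg-deployment)
         (healthy intercessors))            ; ...plus the rest of the initial facts
  (:goal (and (killed intercessors)))       ; the full goal is shown below
)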

To start solving a W40KI problem, we created a first version of the domain file, with the following definition of the actions and predicates:

Domain for the W40K problem

It is out of the scope of this post to explain the LISP-like syntax of PDDL, but this version of the domain file is quite straightforward. An action has :parameters that indicate the variables that can instantiate the action; for example, the move action takes 3 parameters:

  1. ?who: the troop moving
  2. ?from: the initial position
  3. ?to: the final position

The planner will create a ground action for each combination of these parameters, even if some movement actions are never applicable because their preconditions are never met.
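A sketch of what this move action could look like (the connected predicate is my own illustrative name; the actual domain file may encode movement differently):

(:action move
  :parameters (?who ?from ?to)
  :precondition (and (at ?who ?from)        ; the troop is currently at ?from
                     (connected ?from ?to)) ; the two positions are adjacent on the board
  :effect (and (not (at ?who ?from))        ; delete list: the old position
               (at ?who ?to)))              ; add list: the new position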

The actions in the domain file are move and hold for the Movement Phase, shot for the Shooting Phase, smite for the Psychic Phase, and kill for the Combat Phase. I left some phases out because they can be fully automated (for example, when deciding to attack, a troop will perform a charge first).

So we defined the goal of the problem as follows:

(:goal
  (and
    (or (conquer plaguemarines1 o-1)
        (conquer plaguemarines1 o-2)
        (conquer plaguemarines1 o-3)
        (conquer plaguemarines1 o-4))
    (or (conquer plaguemarines2 o-1)
        (conquer plaguemarines2 o-2)
        (conquer plaguemarines2 o-3)
        (conquer plaguemarines2 o-4))
    (or (conquer daemonprince o-1)
        (conquer daemonprince o-2)
        (conquer daemonprince o-3)
        (conquer daemonprince o-4))
    (killed hellblasters)
    (killed intercessors)
    (killed lieutenant)
  )
)

So, we fed these files to the planner, and this is the plan it computed:

Computed plan for the W40K problem

Can you see that something is wrong with this plan? Well, the actions are not ordered! I ordered some actions according to the phase they belong to before executing the plan. And there is another problem, as we will see. But first, let us begin with the first part of the goal: moving to conquer the objectives:

Movement phase of the Death Guard.

To conquer an objective, the hold action must first be applied, and for this action to be applicable, the unit must be in the same location as the objective. In the image above, we can see that the 3 units spread over the board to hold different objectives. In this first version I did not consider movement limits or the ranges of the enemies, so the obtained plans may be unrealistic.
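A minimal sketch of what this hold action could look like (treating objectives as locations here; the actual domain may encode it differently):

(:action hold
  :parameters (?who ?objective)
  :precondition (and (at ?who ?objective))  ; the unit stands on the objective
  :effect (conquer ?who ?objective))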

The second phase is the Psychic Phase, where smite is available, but it is restricted to psykers only. There is only one psyker in this game, the Daemon Prince of the Death Guard, so the planner chooses to cast smite on the Captain (quite epic, in my opinion):

Psychic phase of the Death Guard

The next phase is the Shooting Phase, in which the Death Guard fires at the Ultramarines. This phase is straightforward, and the interesting thing is how the planner chooses a target for each unit. It considered that the smite had already damaged the Captain, so one unit should finish him off, and the other unit should fire on a different unit. So it chose to engage the Intercessors:

Shooting phase of the Death Guard

Why is this? Because the problem considers only two situations in combat: either the troop is at full health, so shooting wounds it, or it is already wounded, so another attack will kill it. We can express this using conditional effects in the action's declaration. Notice the conditional effects in the shooting action:

(:action shot
  :parameters (?who ?target)
  :precondition (and
    (ranged_weapon ?who)      ; the unit carries a ranged weapon
    (in_range ?who ?target)   ; the target is within range
    (not (shooted ?who))      ; the unit has not shot yet this turn
  )
  :effect (and
    (shooted ?who)
    ; a healthy target becomes wounded
    (when (and (healthy ?target)) (and (wounded ?target) (not (healthy ?target))))
    ; a target that was already wounded is killed
    (when (wounded ?target) (killed ?target))
  )
)

So where is the problem I mentioned before? Well, I could not code (yet) the sprites for the combat actions, but the plan contains these combat orders:

(kill daemonprince hellblasters)
(kill plaguemarines1 hellblasters)
(kill plaguemarines2 intercessors)

On the map, that would translate into the following attacks (including the charge):

Charges and attacks for the Combat phase.

This is obviously suboptimal, because each Death Guard unit is attacking the troop on the opposite side instead of the nearest enemy. How is this possible? Well, we did not consider distance, or the Charge Phase, so the planner assigned an enemy to each troop regardless of the distance.
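One way to rule out this kind of plan, which we will explore properly in the next post, is to add a distance-related precondition to the combat action. A hedged sketch, using an adjacent predicate that does not exist in the current domain and mirroring the conditional effects of the shot action:

(:action kill
  :parameters (?who ?target)
  :precondition (and
    (adjacent ?who ?target)   ; illustrative: the attacker must have charged into contact
  )
  :effect (and
    (when (and (healthy ?target)) (and (wounded ?target) (not (healthy ?target))))
    (when (wounded ?target) (killed ?target))
  )
)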

It is important, then, to model the actions and states of a domain carefully, to avoid such problems. In the following post we will see how to express more restrictions in the domain file to produce better plans.

Conclusions

In this post we have seen how to model an automated planning problem and how to solve it using a planner. We have also seen some of the problems that may arise from bad modelling. In the next post, we will see how to encode more restrictions in the domain file to express ordering between actions and other constraints we have not considered in this version.

Stay safe!

