Searching for the best strategy (III)
Reasoning about states in a planning problem
Hello!
In the previous posts we modeled a game of Warhammer 40K as a planning problem and solved it. Modeling the rules is an important part, because a bad model yields bad plans that might be unrealistic. As we add richer expressions to the problem description we may obtain better solutions, relying on the reasoning abilities of the planner.
In this post we will see different extensions to a planning problem: enriching action effects, adding reasoning to actions, and handling uncertainty in the effects of actions. We will see how to use more features of the PDDL language to enable these extensions, and we will then run the experiments again to see if our army plays better.
Enriching action effects
As we have seen previously, in automated planning an action can be represented by three lists of boolean variables: the preconditions, the add effects, and the del effects. The preconditions list contains the variables that must be true before applying an action; if one of these variables is false, the action cannot be applied. The add and del effects contain the variables that become, respectively, true and false after applying the action.
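The three lists can be sketched in a few lines of Python. This is only an illustration of the semantics, not how a planner is implemented: the Action class, the apply function and the variable names such as "at unit1 a" are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """An action as three sets of boolean variables (hypothetical names)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def applicable(state, action):
    # Every precondition must be true (present) in the current state.
    return action.preconditions <= state


def apply(state, action):
    # Refuse to apply an action whose preconditions do not hold.
    if not applicable(state, action):
        raise ValueError(f"{action.name} is not applicable")
    # Del effects become false, add effects become true.
    return (state - action.del_effects) | action.add_effects


# Illustrative move action: unit1 goes from position a to position b.
move = Action(
    name="move unit1 a b",
    preconditions=frozenset({"at unit1 a"}),
    add_effects=frozenset({"at unit1 b"}),
    del_effects=frozenset({"at unit1 a"}),
)

state = frozenset({"at unit1 a", "healthy unit1"})
new_state = apply(state, move)
```

After the move, the unit is at b and no longer at a, so the same action cannot be applied a second time.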
But what if there is a particular variable that, even though it is not mandatory as a precondition, can affect the result of an action? For example, consider a unit shooting another unit in a W40K game and causing X damage. Will the target be eliminated, or just wounded? The answer is: it depends on the number of wounds the target had before the shot. So how can we express this kind of effect when describing an action? The answer is Conditional Effects.
As the name indicates, these effects have conditions, and when these conditions are met the variable is set to true (or false). In our last experiment some actions already contained at least one conditional effect; for example, the kill action has the following effects (among others):
    ; Conditional effect
    (when
        ; condition
        (and (healthy ?target))
        ; effect
        (and (wounded ?target) (not (healthy ?target)))
    )
    ; Conditional effect
    (when
        ; condition
        (wounded ?target)
        ; effect
        (killed ?target)
    )
These effects state that if a target is healthy, that is, it has not taken any damage, it becomes wounded. On the other hand, if the target is already wounded, it is killed.
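A small Python sketch may help to see how these two effects behave. It assumes the usual PDDL reading, in which every condition is checked against the state before the action is applied, so a healthy target is only wounded by one attack and is not killed in the same step. The function names and the "captain" target are hypothetical.

```python
def apply_conditional_effects(state, conditional_effects):
    """Apply a list of (condition, add, delete) triples to a state.

    All conditions are evaluated on the state *before* the action,
    and the collected effects are applied together afterwards.
    """
    adds, dels = set(), set()
    for condition, add, delete in conditional_effects:
        if condition <= state:
            adds |= add
            dels |= delete
    return frozenset((state - dels) | adds)


def kill_effects(target):
    # The two conditional effects shown above, for a given target.
    return [
        ({f"healthy {target}"}, {f"wounded {target}"}, {f"healthy {target}"}),
        ({f"wounded {target}"}, {f"killed {target}"}, set()),
    ]


state = frozenset({"healthy captain"})
after_one = apply_conditional_effects(state, kill_effects("captain"))
after_two = apply_conditional_effects(after_one, kill_effects("captain"))
```

The first attack only wounds the captain; the second one, applied to the wounded captain, kills it.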
Conditional effects let us model problems in more detail with a more concise representation: without them, the same action would have to be replaced by a set of actions with more preconditions and more effects, one for each case that the conditional effects distinguish.
However, one disadvantage of conditional effects is their contribution to uncertainty [Wow!… I used it, the magic word for problems in automated planning]. Let us see a bit more of what uncertainty means in planning.
Planning under Uncertainty
Planning under uncertainty is a huge field of research on its own, but it can be expressed as the answer to the following question. How do you plan when you do not know:
- the initial situation?; or
- the effects of the actions?
In the first case, there are variables in the initial state whose truth value is unknown. For example, in a W40K game it is possible to deploy units after the first turn (they are assumed to be in a teleport chamber, and can be placed anywhere on the board at a certain distance from the enemy), so the value of the variable holding their location is unknown.
On the other hand, uncertainty in the effects of the actions represents actions whose outcome is not known in advance. The most common example is simulating the roll of a six-sided die: there is a 50% chance that the result is higher than 3 and a 50% chance that it is 3 or lower.
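The die example can be checked by enumerating the six faces. The sketch below also shows one possible way to write an uncertain action as a list of (probability, outcome) pairs; the shoot outcomes "target wounded" and "shot missed" are invented for illustration and are not taken from the W40K rules.

```python
from fractions import Fraction


def probability(event, faces=range(1, 7)):
    """Probability that a fair die roll satisfies the given event."""
    faces = list(faces)
    favourable = sum(1 for face in faces if event(face))
    return Fraction(favourable, len(faces))


p_high = probability(lambda face: face > 3)   # faces 4, 5, 6
p_low = probability(lambda face: face <= 3)   # faces 1, 2, 3

# A hypothetical shooting action with two possible outcomes.
shoot_outcomes = [
    (p_high, "target wounded"),
    (p_low, "shot missed"),
]

# The probabilities of all outcomes of an action must add up to one.
total = sum(p for p, _ in shoot_outcomes)
```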
As a simplified example of uncertainty in the initial state, imagine the following situation: around the corner, is there a hidden troop ready for an ambush? In this case we have a state with uncertainty about the value of a variable p (the location of an enemy troop), and the action that is going to be applied will have a different outcome for each possible initial state! This is uncertainty at its best (or worst!):
The image above shows one such case. And why is this situation bad, you may be asking? Well, consider that you now have two different states that must be solved. Exactly, the problem has doubled! Now consider that every action will have to deal with this situation. But this is material for another post.
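The doubling can be made concrete with a tiny Python sketch that keeps a set of possible states (often called a belief state) and applies the action to each of them. Everything here is an assumption for illustration: the variable names, the advance action and the possible_worlds helper are not part of the W40K model used in the experiments.

```python
from itertools import combinations


def possible_worlds(known, unknown):
    """Every state consistent with the known facts: each unknown
    variable may be true or false, giving 2 ** len(unknown) states."""
    unknown = sorted(unknown)
    worlds = set()
    for size in range(len(unknown) + 1):
        for true_vars in combinations(unknown, size):
            worlds.add(frozenset(known) | frozenset(true_vars))
    return frozenset(worlds)


def advance(world):
    # Hypothetical action: the unit walks around the corner.
    if "enemy behind corner" in world:
        return world | {"unit ambushed"}
    return world | {"unit safe"}


def apply_to_belief(belief, action):
    # The action must be applied in every state we might be in.
    return frozenset(action(world) for world in belief)


belief = possible_worlds({"unit at corner"}, {"enemy behind corner"})
after = apply_to_belief(belief, advance)
```

With one unknown variable there are two states to track; with three unknown variables there would already be eight.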
Reasoning inside the search
Let us recall that solving a planning problem is just searching the state space, where states are the nodes and actions are the edges between them. That is, the only way to change the value of a state is to apply an action, right? However, an agent must also be able to deduce the values of some variables after applying an action, without applying any further action. That is reasoning.
This is just another way for an agent to obtain information. If the agent has rules about how the world works, called Domain Axioms, these rules must be enforced in some way so that they always hold. These rules can be expressed in the form of implications:
variable_1 -> variable_2
Do you remember truth tables? In an implication, if variable_1, also called the body, is true, it follows that variable_2 must necessarily be true.
As an example, consider the W40K problem. How do we know that a unit has finished moving and it is time for the shooting phase? There are several ways, but a clear one is that a unit that has already performed its movement phase should no longer be able to move, and must instead pass to the Shooting phase. No action can represent that, because an action may or may not be chosen by the planner, but this kind of situation can be expressed automatically as:
(unit with ranged_weapon) and (unit stopped) -> (phase unit shooting)
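One way to picture how such a rule is enforced is to recompute, for every state, all the variables that follow from the axioms until nothing new can be derived. The Python sketch below does exactly that; the close_under_axioms function and the unit1 names are hypothetical, and a real planner may evaluate axioms differently.

```python
def close_under_axioms(state, axioms):
    """Add every variable that the axioms allow us to deduce.

    Each axiom is a (body, head) pair: if all variables in the body
    are true, the head is true as well. The loop repeats until no
    axiom adds anything new (a fixpoint).
    """
    derived = set(state)
    changed = True
    while changed:
        changed = False
        for body, head in axioms:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return frozenset(derived)


# The rule above, written for one illustrative unit.
axioms = [
    (frozenset({"ranged_weapon unit1", "stopped unit1"}), "shooting unit1"),
]

moving = frozenset({"ranged_weapon unit1"})
stopped = frozenset({"ranged_weapon unit1", "stopped unit1"})

closed_moving = close_under_axioms(moving, axioms)
closed_stopped = close_under_axioms(stopped, axioms)
```

No action sets the shooting variable: it becomes true as soon as the unit has a ranged weapon and has stopped, and it stays false while the unit is still moving.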
In the following section we will see how we can add reasoning effects inside a problem.
Experiments
We will continue the experiments we started in the last post, but we will use these new features to enrich our definition of the W40K problem. First of all, let us see the new definition of the actions of the problem:
Notice two differences from the previous definition:
- Conditional effects: on the actions, using the when keyword. For example, in the smite action, if the target is healthy it will be wounded, and if the target is already wounded it will be killed (and removed from play). (Note: these effects were already present in the previous post.)
- Derived predicates: these predicates are tested every time an action is expanded. As an example, consider the predicate shooting, which indicates that it is currently the shooting phase; this predicate becomes true if the unit stops moving (predicate stopped) and possesses a ranged weapon (ranged_weapon).
The problem definition is also modified to include these additional predicates. Since we have added an additional action (charge), our plan is now longer:
The derived predicates indicate the phase, so the solution plan now is an ordered sequence of actions. We will execute the actions with the simulator to see if the solution plan is better than the plan obtained with the previous definitions.
First we start with the movement phase. As in the previous versions, we did not limit (yet) the number of movements each unit can perform. The main difference is that now the Daemon Prince (the miniature that starts first on the Death Guard side) moves first. This seems a small difference, but later in the game it will make a great difference.
Now is the time of the psyker shenanigans. This phase did not change since the only unit that is able to cast a smite attack is the Daemon Prince:
Notice the target of the smite attack, the Ultramarines Hellblaster unit (top right unit). This unit will become wounded after the attack, leaving it as easy prey for the shooting phase:
Now the plague marines finish off the Hellblaster unit that is already wounded and open fire against the Ultramarines Intercessors unit (bottom right), leaving the Captain as the only unit alive and healthy at this moment.
We have defined the charging phase as a new movement phase, and again we have placed no restriction on the number of movements. But observe how the units are able to hit the enemy to finish it: a Daemon Prince and a unit of plague marines finish the Ultramarines Captain, and a unit of plague marines finishes the Intercessors.
Conclusions
In this post we have seen how to add more expressions to a problem. Now we are able to express conditional effects and derive predicates without having to execute new actions. We have tested the new problem and shown our results. In the next post, I would like to complete the problem so that it can finish the combat automatically. Then we could automatically generate scores of combats, even with different profiles, and test different approaches, like teaching a computer to play Warhammer by using the examples.
Stay safe!