The Evolution of Planning as SAT
A Walk Through Encodings and Solvers
The Planning as Satisfiability approach aims to solve classical deterministic planning problems by converting them into instances of the Boolean satisfiability (SAT) problem. This method stands out by establishing a bridge between two well-established areas of artificial intelligence: automated planning and SAT solving.
1. SatPlan 1992
In their seminal work, Kautz and Selman (1992) proposed the construction of a propositional formula that encodes the initial state and the goal, as well as the state transitions allowed by actions. The proposed formulation is composed of four main components: (i) variables representing the facts that are true at each time step, (ii) the actions executed, (iii) the preconditions and effects of the actions, and (iv) the mutual exclusion (mutex) constraints. These constraints help maintain the consistency of the plan by preventing, for instance, the simultaneous execution of conflicting actions.
A key parameter in this formulation is the time horizon \(t\), which defines the maximum number of discrete time steps considered when searching for a valid plan. Each time step represents a point at which an action can take place. Thus, setting a time horizon \(t\) amounts to asking whether a plan of exactly \(t\) steps (or fewer, under some encoding strategies) can achieve the transition from the initial state to the goal.
In SAT-based planning, this horizon determines the structure of the propositional encoding: variables and clauses must be generated for all facts and actions at each level from time \(0\) to time \(t\). If no satisfying assignment is found, the horizon is incremented, and a new SAT instance is generated for the extended time window. In this way, the planning problem is effectively reduced to a SAT problem.
The algorithm operates iteratively, incrementing the time horizon and generating a new SAT formula at each step. An efficient SAT solver, for example one based on DPLL, is employed to assess the satisfiability of each formula. If a satisfying assignment is found, it corresponds directly to a valid sequence of actions, that is, a feasible plan. If not, the process continues with an expanded horizon and a new formula, repeating until either a solution is found or the problem is deemed unsolvable within the imposed constraints.
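The outer loop described above can be sketched as follows. The one-fluent domain (a single action a that adds a goal fluent g), the clause representation, and the enumeration-based satisfiability check standing in for DPLL are all illustrative simplifications, not SATPLAN's actual data structures:

```python
from itertools import product

def satisfiable(clauses, variables):
    # Naive model enumeration; a real planner would call DPLL/CDCL here.
    # A clause is a list of (variable, polarity) literals.
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[v] == pos for v, pos in clause) for clause in clauses):
            return model
    return None

def encode(horizon):
    """Encode a toy problem: fluent g is initially false, action a adds g."""
    variables = [("g", i) for i in range(horizon + 1)] + \
                [("a", i) for i in range(horizon)]
    clauses = [[(("g", 0), False)],            # initial state: g is false
               [(("g", horizon), True)]]       # goal: g holds at the horizon
    for i in range(horizon):
        # effect: executing a at step i makes g true at step i+1
        clauses.append([(("a", i), False), (("g", i + 1), True)])
        # frame: g can only become true if a was executed
        clauses.append([(("g", i + 1), False), (("g", i), True),
                        (("a", i), True)])
    return clauses, variables

def satplan(max_horizon=10):
    # SATPLAN's outer loop: grow the horizon until a model (a plan) is found.
    for t in range(max_horizon + 1):
        model = satisfiable(*encode(t))
        if model is not None:
            return [v for v in model if v[0] == "a" and model[v]]
    return None

print(satplan())  # -> [('a', 0)]: a one-step plan executing a at time 0
```

The horizon-0 formula is unsatisfiable (the goal contradicts the initial state), so the loop retries with horizon 1, where the solver's model directly yields the plan.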
2. SatPlan 1996
The article by Kautz, McAllester, and Selman (1996) deepens the Planning as Satisfiability approach by introducing, analyzing, and comparing various propositional encodings for STRIPS-style planning problems. The main focus of the work is the efficiency and conciseness of reductions from planning instances to propositional logic formulas, with the aim of improving SAT solver performance.
2.1. Linear Encoding
Linear encoding is based on the temporal indexing of fluents and actions. Each action is represented by a proposition indicating its execution at a specific time step, while fluents are indexed for each state layer. In this setting, fluents are formalised as state variables whose values may change over time as actions are executed.
Formally, let \(F = \{f_1, \dots, f_n\}\) be a finite set of fluents, where each fluent
\[ f_i : S \rightarrow \{\mathsf{true}, \mathsf{false}\} \]
is a Boolean-valued function over the set of world states \(S\). A state \(s \in S\) is completely characterised by the subset of fluents that hold in \(s\), that is, \(s \subseteq F\). This view of fluents originates in logical formalisations of action and change, where fluents represent properties of the world whose truth values depend on the current situation or state (McCarthy and Hayes, 1981; Fikes and Nilsson, 1971).
Actions are defined by their preconditions and effects over fluents. Given an action \(a\), with preconditions \(\text{Pre}(a) \subseteq F\), positive effects \(\text{Add}(a) \subseteq F\), and negative effects \(\text{Del}(a) \subseteq F\), the application of \(a\) in a state \(s\) produces a successor state
\[ s' = (s \setminus \text{Del}(a)) \cup \text{Add}(a), \]
provided that \(\text{Pre}(a) \subseteq s\). Hence, fluents are precisely the components of the state whose truth values may be modified by actions (Ghallab, Nau, and Traverso, 2004).
Under a linear or temporal encoding, fluents are further interpreted as functions of discrete time,
\[ f_i : \mathbb{N} \rightarrow \{\mathsf{true}, \mathsf{false}\}, \]
where \(f_i(t)\) denotes the truth value of fluent \(f_i\) at time step \(t\). This temporal interpretation underlies propositional and constraint-based encodings of planning, in which each pair \((f_i, t)\) corresponds to a distinct variable representing the value of a fluent at a given state layer (Blum and Furst, 1997; Kautz and Selman, 1996).
The resulting formula includes clauses that (i) ensure the correspondence between actions and their effects; (ii) implement classical frame axioms, preserving unchanged fluents; (iii) enforce the mutual exclusivity of actions at each time step.
To illustrate, consider a simple scenario with a planning horizon \(T =
1\) and the action MOVE(A,B), which moves block A onto block
B. We introduce variables such as MOVE(A,B,0) for the action at
time 0, and ON(A,B,1) to indicate the resulting state at time 1.
The encoding includes clauses that:
- assert the initial state, e.g., CLEAR(B,0) and ON(A,Table,0);
- enforce preconditions, such as CLEAR(B,0) and ON(A,Table,0) for MOVE(A,B,0);
- define effects, like MOVE(A,B,0) implying ON(A,B,1) and \(\neg\)ON(A,Table,1);
- preserve unaffected fluents via frame axioms;
- restrict the plan to one action per time step.
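A minimal sketch of generating these clause families for the single-action example is given below. The clause syntax ("-" for negation, one list per clause) and the STRIPS description of MOVE(A,B) are illustrative, not SATPLAN's actual input format; CLEAR(A) is included as an example of a fluent untouched by the action:

```python
# Illustrative STRIPS description of the one action in the example.
ACTIONS = {
    "MOVE(A,B)": {
        "pre": ["CLEAR(B)", "ON(A,Table)"],
        "add": ["ON(A,B)"],
        "del": ["ON(A,Table)", "CLEAR(B)"],
    },
}
FLUENTS = ["CLEAR(A)", "CLEAR(B)", "ON(A,Table)", "ON(A,B)"]

def at(name, t, neg=False):
    """Timestamp a fluent or action: at('ON(A,B)', 1) -> 'ON(A,B,1)'."""
    return ("-" if neg else "") + name[:-1] + f",{t})"

def linear_clauses(horizon=1):
    clauses = []
    for t in range(horizon):
        for act, d in ACTIONS.items():
            a = at(act, t)
            for p in d["pre"]:        # action implies its preconditions
                clauses.append(["-" + a, at(p, t)])
            for e in d["add"]:        # positive effects at t+1
                clauses.append(["-" + a, at(e, t + 1)])
            for e in d["del"]:        # negative effects at t+1
                clauses.append(["-" + a, at(e, t + 1, neg=True)])
            for f in FLUENTS:         # classical frame axioms, both polarities
                if f not in d["add"] and f not in d["del"]:
                    clauses.append([at(f, t, neg=True), "-" + a, at(f, t + 1)])
                    clauses.append([at(f, t), "-" + a, at(f, t + 1, neg=True)])
    return clauses

for clause in linear_clauses():
    print(" | ".join(clause))
```

Even this tiny instance makes the growth problem visible: the frame axioms contribute two clauses per action per untouched fluent per time step.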
However, the number of variables and clauses grows rapidly with the domain size and operator arity. To mitigate this growth, the authors propose two extensions: operator splitting and explanatory frame axioms.
2.2. Operator Splitting and Explanatory Frame Axioms
Operator splitting decomposes actions of arity \(k\) into \(k\) unary
sub-actions, reducing the combinatorial complexity of arguments.
Instead of encoding MOVE(A,B,C,t) directly, we split it into unary
components SRC(A,t), OBJ(B,t), DST(C,t). Then, the action
MOVE(A,B,C) at time \(t\) is represented by:
\[ \texttt{SRC(A,t)} \wedge \texttt{OBJ(B,t)} \wedge \texttt{DST(C,t)} \]
Explanatory frame axioms, on the other hand, state that changes in
fluents occur only if caused by some explicit action, thereby reducing
the number of clauses compared to classical frame axioms. Suppose at
time \(t\) CLEAR(B) is true, but at time \(t+1\), it is false. The
explanatory frame axiom ensures that this change must be caused by
some action that deletes CLEAR(B) at time \(t\).
\[ \neg \texttt{CLEAR(B,}t{+}1) \wedge \texttt{CLEAR(B,}t) \rightarrow \bigvee_{a \in \mathcal{A}_{\text{del}}} \texttt{a}(t) \]
where \(\mathcal{A}_{\text{del}} = \{ a \mid \texttt{CLEAR(B)} \in \text{Del}(a) \}\) is the set of actions that delete CLEAR(B).
If CLEAR(B) becomes false, then at least one action that deletes it must have occurred at that time; otherwise, the fluent remains true by default. Contrapositively, if CLEAR(B) holds at time \(t\) and no deleting action occurs at \(t\), the fluent persists:
\[ \texttt{CLEAR(B,}t) \wedge \neg \left( \bigvee_{a \in \mathcal{A}_{\text{del}}} \texttt{a}(t) \right) \rightarrow \texttt{CLEAR(B,}t{+}1) \]
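Generating explanatory frame axioms is mechanical once the deleters of each fluent are known. The sketch below emits one CNF clause per fluent per time step; the action names and the DELETERS map are illustrative assumptions:

```python
# Hypothetical map from each fluent to the actions that delete it.
DELETERS = {
    "CLEAR(B)": ["MOVE(A,Table,B)", "MOVE(C,Table,B)"],
}

def at(name, t):
    """Timestamp a fluent or action: at('CLEAR(B)', 0) -> 'CLEAR(B,0)'."""
    return name[:-1] + f",{t})"

def explanatory_frame_axioms(fluents, horizon):
    clauses = []
    for t in range(horizon):
        for f in fluents:
            # "f true at t and false at t+1 implies some deleter at t",
            # written as the single CNF clause  f(t+1) | -f(t) | deleters(t).
            clause = [at(f, t + 1), "-" + at(f, t)]
            clause += [at(a, t) for a in DELETERS.get(f, [])]
            clauses.append(clause)
    return clauses

print(explanatory_frame_axioms(["CLEAR(B)"], 1))
```

One clause per changed fluent replaces the per-action classical frame axioms, which is the source of the size reduction.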
2.3. Parallel and Planning Graph-Based Encodings
Kautz, McAllester, and Selman also propose a planning graph-based encoding that allows for the parallel execution of actions. In this representation, conflicting actions are made mutually exclusive through additional clauses, and each fluent is supported by the actions that maintain or change it.
Let us consider two actions occurring at time step \(t\): MOVE(A,Table,B,t), which moves block A from the table onto block B, and MOVE(C,Table,D,t), which moves block C from the table onto block D. Assume that A, B, C, and D are all distinct blocks, and that these actions do not interfere (e.g., they access disjoint resources). The SAT encoding introduces the following types of clauses:
- Action Preconditions: An action implies its preconditions at the previous layer:
\[ \texttt{MOVE(A,Table,B,t)} \rightarrow \texttt{CLEAR(B,t)} \wedge \texttt{ON(A,Table,t)} \wedge \texttt{CLEAR(A,t)} \]
- Action Effects: The fluent is true at time \(t+1\) if it was added by an action or maintained:
\[ \texttt{ON(A,B,t+1)} \rightarrow \texttt{MOVE(A,Table,B,t)} \vee \texttt{MAINTAIN(ON(A,B),t)} \]
- Mutex Constraints: Actions that conflict are made mutually exclusive. For instance, if MOVE(A,Table,B,t) deletes a fluent required by MOVE(A,B,C,t), they cannot co-occur:
\[ \texttt{MOVE(A,Table,B,t)} \rightarrow \neg \texttt{MOVE(A,B,C,t)} \]
- Persistence (No-Op): For fluents that remain unchanged, a MAINTAIN action is used:
\[ \texttt{MAINTAIN(ON(A,B),t)} \rightarrow \texttt{ON(A,B,t)} \wedge \texttt{ON(A,B,t+1)} \]
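The mutex clauses above come from a simple interference test between STRIPS actions: two actions conflict if either deletes a precondition or an add effect of the other. A minimal sketch, with action descriptions following the running example (the precondition sets are assumptions where the text omits them):

```python
def interfere(a, b):
    """Two STRIPS actions are mutex at a time step if either deletes a
    precondition or an add effect of the other."""
    return bool(a["del"] & (b["pre"] | b["add"]) or
                b["del"] & (a["pre"] | a["add"]))

move_a = {"pre": {"CLEAR(A)", "CLEAR(B)", "ON(A,Table)"},   # MOVE(A,Table,B)
          "add": {"ON(A,B)"},
          "del": {"ON(A,Table)", "CLEAR(B)"}}
move_c = {"pre": {"CLEAR(C)", "CLEAR(D)", "ON(C,Table)"},   # MOVE(C,Table,D)
          "add": {"ON(C,D)"},
          "del": {"ON(C,Table)", "CLEAR(D)"}}
move_abc = {"pre": {"ON(A,B)", "CLEAR(A)", "CLEAR(C)"},     # MOVE(A,B,C)
            "add": {"ON(A,C)", "CLEAR(B)"},
            "del": {"ON(A,B)", "CLEAR(C)"}}

print(interfere(move_a, move_c))    # False: independent, may run in parallel
print(interfere(move_a, move_abc))  # True: yields a binary mutex clause
```

For each interfering pair at each step, the encoding emits one binary clause of the form shown above, e.g. \(\neg\)MOVE(A,Table,B,t) \(\vee\) \(\neg\)MOVE(A,B,C,t).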
In this encoding, planning proceeds through layers (levels), alternating between fluent layers and action layers. Each layer encodes a moment in time, and the SAT solver is responsible for selecting a consistent set of actions that achieve the goal fluents at the final layer, obeying mutex constraints and causal support.
The advantage of this encoding is the potential reduction of the temporal depth of the SAT instance, since multiple independent actions can be applied simultaneously. However, in the worst case, this encoding is not necessarily more compact than the linear encoding with explanatory axioms.
2.4. Lifted Causal Encodings
The encoding is based on lifted causal plans, derived from the approach of McAllester and Rosenblatt (1991). This encoding avoids the full enumeration of ground operators in the domain by handling actions symbolically (lifted), using variables that are instantiated during the search for a solution.
Each step in the plan is represented by a parameterized copy of an operator. The encoding introduces causal links that connect a producer action to a consumer of a fluent, along with ordering constraints to ensure the validity of those links in the presence of actions that could threaten them (known as deletion threats).
To illustrate, consider a simple planning problem in the Blocks
World domain. The goal is to achieve the fluent ON(A,B) starting
from an initial state where both blocks A and B are on the table,
i.e., ON(A,Table) and ON(B,Table). Suppose the plan involves the
following symbolic steps:
- \(s_1\): a step executing MOVE(A,Table,B), which adds the fluent ON(A,B).
- \(s_2\): a later step (e.g., the goal step) that requires ON(A,B) as a precondition.
A causal link is established from \(s_1\) to \(s_2\):
\[ s_1 \xrightarrow{\texttt{ON(A,B)}} s_2 \]
- \(s_3\): a potential threat, such as MOVE(C,B,Table), which deletes ON(A,B) if \(C = A\).
Since \(s_3\) could invalidate the effect of \(s_1\) before \(s_2\) is able to use it, it represents a threat to the causal link. To preserve the correctness of the plan, one of the following ordering constraints must be enforced:
\[ s_3 \prec s_1 \quad \text{or} \quad s_2 \prec s_3 \]
These constraints guarantee that \(s_3\) cannot intervene between \(s_1\) and \(s_2\), thereby protecting the causal link.
This approach has a higher asymptotic complexity compared to earlier formulations: the number of variables is \(\mathcal{O}(n^2)\) and the number of literals is \(\mathcal{O}(n^3)\), where \(n\) is proportional to the number of plan steps. Moreover, there is no explicit need for frame axioms, as the persistence of fluents is inherently ensured by the structure of the causal links.
3. SatPlan 1999 - BlackBox
The evolution of the Planning as Satisfiability approach, initiated by Kautz and Selman (1992, 1996), culminated in the development of the Blackbox system (Kautz and Selman, 1999). This system unified propositional encodings of planning problems with the plan graph structure introduced by Blum and Furst (1997). The goal was to combine the structural efficiency of Graphplan with the solving power of modern SAT solvers.
3.1. Blackbox Architecture
Blackbox operates through five main phases:
- It converts the STRIPS problem description into a plan graph with a given horizon \(T\).
- During graph expansion, it computes mutexes (mutual exclusions) using dedicated algorithms.
- It translates the graph into a CNF formula, encoding mutual exclusions as binary negative clauses.
- It applies general propositional simplification algorithms (e.g., the failed literal rule).
- It solves the resulting formula using SAT solvers. If no solution is found, the time horizon \(T\) is incremented and the process restarts.
This iterative cycle enables the incremental application of propositional encodings while retaining the structural compactness of Graphplan. Compared to direct STRIPS to CNF translation, the plan graph leads to fewer variables. Meanwhile, SAT solvers, both systematic and stochastic, can benefit from this compactness and from pruning the search space using mutex constraints.
3.2. Improvements over SATPLAN-1996
The CNF encoding resulting from the plan graph is significantly smaller than those produced by direct STRIPS translations. By including only essential mutexes (e.g., conflicts between preconditions and effects), the system drastically reduces the number of clauses, especially in sequential domains such as blocks world.
Blackbox applies propositional inference techniques to simplify the CNF before solving, including:
- Unit Propagation – inference based on unit clauses, see Example 1.
- Failed Literal Rule – testing the implications of assigning a literal and checking for inconsistency, see Example 2.
- Binary Failed Literal Rule – inference over literal pairs, see Example 3.
Example 1 (Unit Propagation). Consider the CNF formula:
\[ \Phi = (\texttt{A}) \land (\neg \texttt{A} \lor \texttt{B}) \land (\neg \texttt{B} \lor \texttt{C}) \land (\neg \texttt{C}) \]
Applying unit propagation step by step:
- \((\texttt{A})\) implies \(\texttt{A} = \text{True}\);
- From \((\neg \texttt{A} \lor \texttt{B}) \Rightarrow \texttt{B} = \text{True}\);
- From \((\neg \texttt{B} \lor \texttt{C}) \Rightarrow \texttt{C} = \text{True}\);
- Contradicts \((\neg \texttt{C})\) \(\Rightarrow\) conflict.
Conclusion: unit propagation derives a conflict, so the formula is unsatisfiable.
Example 2 (Failed Literal Rule). Let the formula be:
\[ \Phi = (\neg \texttt{A} \lor \texttt{B}) \land (\neg \texttt{B} \lor \texttt{C}) \land (\neg \texttt{C}) \]
To apply the failed literal rule, we temporarily assign \(\texttt{A} = \text{True}\) and propagate:
- \(\texttt{A} = \text{True} \Rightarrow \texttt{B} = \text{True}\);
- \(\texttt{B} = \text{True} \Rightarrow \texttt{C} = \text{True}\);
- Contradiction with \((\neg \texttt{C})\).
Result: Since assuming \(A = \text{True}\) leads to a conflict, we deduce that \(A = \text{False}\).
Example 3 (Binary Failed Literal Rule). Consider the formula:
\[ \Phi = (\texttt{A} \lor \texttt{B}) \land (\neg \texttt{A} \lor \texttt{C}) \land (\neg \texttt{B} \lor \texttt{C}) \land (\neg \texttt{C}) \]
Test the literal pair \(\texttt{A} = \text{True}, \texttt{B} = \text{True}\):
- From the second and third clauses, deduce \(\texttt{C} = \text{True}\);
- This contradicts \((\neg \texttt{C})\).
Result: The pair \((\texttt{A}, \texttt{B})\) is invalid. We can safely add the following clause to exclude this combination from future consideration:
\[ \neg \texttt{A} \lor \neg \texttt{B} \]
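The three rules can be sketched with one small propagation routine. The clause representation, pairs of (variable, polarity), is illustrative; Blackbox's actual simplifier operates on its internal CNF:

```python
def unit_propagate(clauses, assignment):
    """Repeatedly assign forced unit clauses; return the extended
    assignment, or None on a conflict (an empty clause)."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for var, pos in clause:
                if var in assignment:
                    satisfied |= assignment[var] == pos
                else:
                    unassigned.append((var, pos))
            if satisfied:
                continue
            if not unassigned:
                return None                      # conflict
            if len(unassigned) == 1:             # unit clause: forced literal
                var, pos = unassigned[0]
                assignment[var] = pos
                changed = True
    return assignment

def failed_literal(clauses, var):
    """Failed literal rule: if assuming var=True propagates to a
    conflict, deduce var=False."""
    if unit_propagate(clauses, {var: True}) is None:
        return (var, False)
    return None

# Example 1: unit propagation alone derives a conflict (unsatisfiable).
phi1 = [[("A", True)], [("A", False), ("B", True)],
        [("B", False), ("C", True)], [("C", False)]]
print(unit_propagate(phi1, {}))        # -> None (conflict)

# Example 2: no unit clause, but the failed literal rule fixes A = False.
phi2 = [[("A", False), ("B", True)], [("B", False), ("C", True)],
        [("C", False)]]
print(failed_literal(phi2, "A"))       # -> ('A', False)

# Example 3: testing the pair (A, B) jointly fails, justifying -A | -B.
phi3 = [[("A", True), ("B", True)], [("A", False), ("C", True)],
        [("B", False), ("C", True)], [("C", False)]]
print(unit_propagate(phi3, {"A": True, "B": True}))  # -> None
```

The binary rule simply runs the same propagation under a two-literal assumption and, on conflict, records the blocking clause shown above.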
These inference rules have proven especially effective in sequential domains, where binary techniques alone can determine up to 100% of variable assignments. In addition to logical inference, Blackbox also incorporates randomization and restart strategies. After a fixed number of backtracks, the solver resets and restarts the search with a randomized variable ordering. This approach significantly reduces average solving time and mitigates the heavy-tail behaviour typical of systematic search algorithms.
4. SatPlan 2004
This new version introduces improvements in both the encoding of planning problems and the efficiency of the SAT solving process, while maintaining compatibility with domains expressed in the STRIPS subset of the PDDL language.
4.1. SatPlan 2004 Architecture
Like Blackbox, SATPLAN-2004 follows a multi-phase pipeline:
- Constructs a Graphplan-style plan graph up to a depth \(k\).
- Translates the planning graph into a propositional formula in CNF.
- Applies a SAT solver to attempt to satisfy the formula.
- If the formula is unsatisfiable or a timeout occurs, \(k\) is incremented and the process restarts.
- Otherwise, the satisfying assignment is translated into a valid plan.
- A post-processing step removes redundant actions from the solution.
This final step is critical, as the encoding does not guarantee that all activated actions are necessary to achieve the goal. Filtering out redundant actions improves both the quality and the readability of the resulting plan.
4.2. Encoding Styles
SATPLAN-2004's encodings draw on five classes of clauses: (i) actions imply their preconditions; (ii) facts imply disjunctions of the actions that could have produced them in the previous step (including no-ops); (iii) actions imply disjunctions of the actions that establish each of their preconditions; (iv) actions with conflicting preconditions or effects are mutually exclusive; (v) mutexes inferred via propagation through the planning graph. Based on these classes, SATPLAN-2004 implements four distinct propositional encoding styles:
- Action-based: based on regressive inference over action preconditions (iii), (iv) and (v).
- Graphplan-based: rooted in the classical Graphplan structure, encoding both direct and indirect causal relationships (i), (ii), (iv) and (v).
- Skinny Action-based and Skinny Graphplan-based: optimized variants that drop clause class (v), keeping only the directly computed mutex clauses rather than those inferred through constraint propagation.
The skinny encodings include only essential clauses, which reduces the total number of variables and clauses in the resulting CNF. Empirically, the skinny action-based encoding proved to be particularly effective in large-scale domains where efficient memory usage is crucial.
5. SatPlan 2006
The updated version, SATPLAN-2006, introduces strategic improvements aimed at balancing the expressiveness of propositional encodings with practical scalability for solving complex planning problems.
As with its predecessors, SATPLAN-2006 seeks to find plans of minimal parallel length, allowing multiple non-interfering actions to be executed simultaneously at each time step. Its primary innovation lies in the way mutexes are handled and encoded, as well as in the direct representation of fluents as Boolean variables—enhancing both memory efficiency and performance.
SATPLAN-2006 maintains the resolution structure of SATPLAN-2004 and Blackbox: it constructs a planning graph up to level \(k\), propagates mutexes among actions and fluents during graph expansion, generates a CNF formula, and invokes an external SAT solver. If no solution is found, \(k\) is incremented and the process repeats.
5.1. Improvements over SATPLAN-2004
The main motivation behind the changes introduced in SATPLAN-2006 was to mitigate the exponential growth of mutex clauses observed in SATPLAN-2004. While SATPLAN-2004 avoided mutex propagation entirely during graph construction to conserve memory, SATPLAN-2006 adopts a more balanced and selective approach:
- Mutex propagation is enabled, but only a subset of mutual exclusions is encoded.
- Only mutexes between fluents are translated into binary negative clauses; action mutexes are omitted.
- Both actions and fluents are represented as propositional variables, unlike earlier versions that modeled only actions.
This selective encoding significantly reduces the number of clauses generated, improving scalability without compromising the ability to solve planning problems.
The differences among Blackbox, SATPLAN-2004, and SATPLAN-2006 are summarized in Table 1.
| Feature | Blackbox | SATPLAN-2004 | SATPLAN-2006 |
|---|---|---|---|
| Mutexes in plan graph | Yes (actions + fluents) | Not used | Yes (only fluents) |
| Propositional variables | Actions only | Actions only | Actions + Fluents |
| Compact encoding | Partial | Yes (skinny variants) | Yes (optimised) |
| Solver modularity | Partial | Yes | Yes |
| PDDL support | STRIPS | STRIPS + types | STRIPS + types |
6. MADAGASCAR - Parallel Encodings of Classical Planning as Satisfiability
Rintanen, Heljanko, and Niemelä (2004) investigate different semantics for parallel plans in the context of classical planning as satisfiability. The authors revisit the traditional notion of parallelism, known as step semantics, and propose two alternative variations: process semantics and 1-linearization semantics. These proposals are accompanied by propositional encodings designed to reduce formula complexity and improve practical performance.
6.1. Step Semantics
The step semantics, used in works such as Kautz and Selman (1996), allows the simultaneous execution of operators provided that:
- Their preconditions are satisfied in the current state;
- Their effects do not conflict with one another;
- The execution order does not affect the final state.
This semantics is implemented through constraints that prevent interference between operators. The propositional encoding defines variables for fluents and actions at each time step, with axioms modelling action preconditions, effects, and fluent persistence.
6.2. Process Semantics
Process semantics is a variation of step semantics with an additional restriction: an action must be executed at the earliest possible time step admitted by the step semantics criteria. In other words, actions should not be delayed if they can be moved earlier without causing conflicts.
The propositional encoding introduces clauses to prevent unnecessary delays. If an action \(o\) is scheduled for execution at time \(t+1\), the formula requires that there exists at least one action \(o_i\) at time \(t\) whose presence prevents \(o\) from being executed earlier:
\[ o_{t+1} \rightarrow (o_1^t \vee o_2^t \vee \ldots \vee o_n^t) \]
The experiments showed that this semantics did not yield consistent improvements in solving time. The increased number of clauses introduced by the added constraints may hinder the SAT solver’s performance.
6.3. 1-Linearization Semantics
1-linearization semantics is less restrictive than step semantics. It allows the simultaneous execution of operators as long as there exists at least one total ordering of those actions that results in a valid sequential plan. Unlike step semantics, it does not require that all possible orderings be valid.
To ensure that concurrently applied actions are linearisable, the authors introduce disabling graphs. In these graphs, an edge between two actions indicates that one may invalidate the other if executed first. The presence of cycles in such graphs indicates that the concurrent execution of those actions is not feasible.
Two encoding methods are proposed:
- Cubic encoding (\(O(n^3)\)): Uses auxiliary variables to detect the absence of cycles. While more expressive, this encoding can generate large formulas in domains with many operators.
- Fixed ordering: Establishes an arbitrary order among actions and forbids the concurrent execution of those that would violate this order. This encoding is lighter and achieves good practical performance in experimental evaluations.
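The linearisability check behind 1-linearization can be sketched as a topological sort of the disabling graph: if action a disables action b, then b must be ordered before a, and a cycle means no valid total order exists. The `disabling` map and action names below are illustrative assumptions:

```python
from collections import defaultdict, deque

def linearise(actions, disabling):
    """Given the actions chosen at one time step and a map
    disabling[a] = actions that a would disable if executed first,
    return a valid execution order, or None if the set is not
    linearisable (the disabling subgraph contains a cycle)."""
    # If a disables b, b must precede a: reverse the edges, then run
    # Kahn's algorithm.
    indeg = {a: 0 for a in actions}
    succ = defaultdict(list)
    for a in actions:
        for b in disabling.get(a, []):
            succ[b].append(a)          # b must come before a
            indeg[a] += 1
    queue = deque(a for a in actions if indeg[a] == 0)
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in succ[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                queue.append(y)
    return order if len(order) == len(actions) else None

print(linearise(["a1", "a2"], {"a1": ["a2"]}))                # -> ['a2', 'a1']
print(linearise(["a1", "a2"], {"a1": ["a2"], "a2": ["a1"]}))  # -> None (cycle)
```

The fixed-ordering encoding sidesteps this check at solve time by committing to one arbitrary total order up front and forbidding concurrent pairs that would violate it.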
Rintanen (Rintanen, Jussi, 2004) proposes new strategies for evaluating sequences of propositional formulas in the context of SAT-based planning. The traditional strategy consists of evaluating these formulas sequentially in order to find the shortest plan length that yields a satisfiable formula.
Although the sequential strategy ensures plans with minimal length, it may result in high execution times, particularly when proving the unsatisfiability of shorter formulas is computationally expensive. Rintanen’s key observation is that, in many cases, evaluating a satisfiable formula corresponding to a slightly longer plan may be significantly faster than proving the unsatisfiability of previous shorter formulas. This is due to the increasing complexity of formulas as the plan length grows and to the fact that less constrained satisfiable formulas are often easier to solve.
6.4. Algorithm S: Sequential Evaluation
The traditional method (Algorithm S) evaluates formulas in ascending order of plan length, starting from \(\phi_0\), and proceeds until the first satisfiable formula is found. This method guarantees the shortest possible plan, but may require the complete evaluation of several unsatisfiable formulas before reaching a solution.
6.5. Algorithm A: Parallel Evaluation with \(n\) Processes
Algorithm A distributes the evaluation of formulas among \(n\) concurrent processes. Each process evaluates a formula \(\phi_i\) until its satisfiability is determined. When a process finishes, it is assigned the next unevaluated formula. The algorithm terminates as soon as one satisfiable formula is found. Algorithm A exhibits the following characteristics:
- Allows concurrent evaluation of formulas with different plan lengths.
- Avoids full evaluation of formulas that may later be solved by other processes.
- The choice of parameter \(n\) directly affects performance, requiring a balance between parallelism and resource usage.
- Relative to Algorithm S, the performance of Algorithm A with \(n\) processes is guaranteed to be within a factor of \(1/n\), i.e., it is at most \(n\) times slower.
In some cases, multiple processes may end up evaluating satisfiable formulas beyond the first one. The performance gain depends on the distribution of evaluation costs across the sequence.
6.6. Algorithm B: Geometric CPU Time Distribution
Algorithm B proposes an interleaved distribution of CPU time across formulas, without requiring a fixed number of processes. Each formula \(\phi_i\) receives a fraction of time proportional to a geometric factor \(\gamma^i\), where \(\gamma \in (0, 1)\). Algorithm B exhibits the following characteristics:
- Enables evaluation of many formulas in parallel with decreasing priority.
- Avoids the need to select a fixed value of \(n\), as required in Algorithm A.
- More adaptable in scenarios where the optimal plan length is unknown.
- Relative to Algorithm S, the performance of Algorithm B with parameter \(\gamma\) is guaranteed to be within a factor of \(1 - \gamma\).
This algorithm can begin evaluating promising formulas without waiting for the full resolution of earlier ones, potentially leading to significant savings in total solving time.
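Algorithm B's allocation can be sketched as normalised geometric weights: formula \(\phi_i\) receives a CPU-time share proportional to \(\gamma^i\), so shorter horizons keep priority while longer ones still advance. The parameter values below are illustrative:

```python
def time_shares(num_formulas, gamma=0.8):
    """Fraction of CPU time given to each formula phi_0..phi_{k-1},
    proportional to gamma**i and normalised to sum to 1."""
    raw = [gamma ** i for i in range(num_formulas)]
    total = sum(raw)
    return [r / total for r in raw]

shares = time_shares(4, gamma=0.5)
print([round(s, 3) for s in shares])   # -> [0.533, 0.267, 0.133, 0.067]
```

With \(\gamma = 0.5\), each formula gets half the time of its predecessor; smaller \(\gamma\) concentrates effort on short plans, larger \(\gamma\) hedges toward longer ones.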
7. Efficient Encoding of Cost Optimal Delete-Free Planning as SAT
The work of Rankooh and Rintanen (2022) proposes a novel encoding for solving delete-free planning problems with minimum cost using propositional satisfiability. The core idea is to represent relaxed plans through partial causal functions that associate propositions with actions. This encoding was implemented in the Madagascar planner and compared with linear and integer programming approaches.
Delete-free problems are those in which actions only add propositions to the state without removing any. Every STRIPS problem can be relaxed into a delete-free problem, defining the \(h^+\) heuristic, widely used as an admissible estimate of optimal cost.
Example 4. Consider a planning problem with the following elements:
- Propositions: \(P = \{p, q\}\);
- Actions:
- \(a_1\): preconditions \(\emptyset\), effects \(\{p\}\), cost \(1\);
- \(a_2\): preconditions \(\{p\}\), effects \(\{q\}\), cost \(2\).
- Initial state: \(I = \emptyset\);
- Goal: \(G = \{q\}\).
In this problem, the only way to achieve goal \(q\) is by first making \(p\) true via \(a_1\), followed by executing \(a_2\).
- The corresponding plan is \([a_1, a_2]\);
- The total cost of the plan is \(1 + 2 = 3\).
7.1. Causal Representations of Relaxed Plans
The main theoretical structure used is the partial causal function \(f: P \setminus I \to A\), which maps propositions to actions that add them. This function must satisfy:
- Every proposition in the plan must be added by some action;
- The preconditions of each action must be either in the initial state or caused by other actions;
- The derived causal graph \(G_f\) must be acyclic.
Based on this formalism, minimising the total cost of actions in the range of \(f\) is equivalent to computing \(h^+\). The relaxed plan can be extracted via a topological ordering of the acyclic graph \(G_f\).
Returning to Example 4, we have \(f(p) = a_1\) and \(f(q) = a_2\). Action \(a_1\) adds \(p\), which satisfies the precondition of \(a_2\), which in turn adds \(q\). The causal graph \(G_f\) contains two actions and an edge from \(a_1\) to \(a_2\). This graph is acyclic, and the resulting plan is \([a_1, a_2]\).
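Plan extraction from the causal function can be sketched as a depth-first topological ordering of \(G_f\), using the data of Example 4; acyclicity is assumed here, since the encoding guarantees it. The dictionary layout is illustrative:

```python
# Example 4's actions and causal function f (who adds each proposition).
actions = {
    "a1": {"pre": set(),  "add": {"p"}, "cost": 1},
    "a2": {"pre": {"p"},  "add": {"q"}, "cost": 2},
}
f = {"p": "a1", "q": "a2"}

def extract_plan(f, actions, initial=frozenset()):
    """Order the actions in the range of f so that every producer
    precedes its consumers (a topological order of G_f), and sum
    their costs. Assumes G_f is acyclic."""
    used = set(f.values())
    order, visited = [], set()
    def visit(a):
        if a in visited:
            return
        visited.add(a)
        for p in actions[a]["pre"]:
            if p not in initial:
                visit(f[p])          # the producer of p must come first
        order.append(a)
    for a in used:
        visit(a)
    return order, sum(actions[a]["cost"] for a in used)

plan, cost = extract_plan(f, actions)
print(plan, cost)    # -> ['a1', 'a2'] 3
```

The cost of the extracted plan, 3, is exactly the \(h^+\) value minimised by the encoding.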
7.2. SAT Encoding
The SAT encoding introduces propositional variables to represent:
- Whether a proposition \(p\) has an associated action in \(f\);
- Whether an action \(a\) adds \(p\) and \(f(p) = a\).
The formulas encode the constraints for \(f\) to be a valid partial function, maintaining dependency relations between propositions and ensuring goal coverage.
For proposition \(q\), the encoding sets \(f(q) = a_2\); hence, action \(a_2\) must be present, and \(p\) (a precondition of \(a_2\)) must be available.
This dependency is translated into SAT clauses that ensure a valid causal chain up to the goal proposition \(q\).
To ensure that the causal graph \(G_f\) is acyclic, the authors examine two methods:
- Transitive closure: encodes paths and directly detects cycles.
- Vertex elimination: uses a node ordering to simulate vertex removal and prevent cycles, based on directed elimination width.
If the causal graph contains actions \(a_1 \rightarrow a_2 \rightarrow a_3 \rightarrow a_1\), a cycle is present, rendering \(f\) invalid. The vertex elimination method tries to define an order \(a_1 \prec a_2 \prec a_3\) and ensures that no edge violates this ordering.
7.3. Cost Propagation and Minimisation
To determine the precise value of the \(h^+\) heuristic, i.e., the minimum cost of a plan devoid of deletions, a SAT encoding approach is employed, incorporating constraints on the overall cost of causal actions. The search proceeds incrementally, using a parameter \(u\) as an upper limit on the total cost. The SAT formulation is satisfiable only if there exists a causal function \(f\) such that the sum of the costs associated with its domain actions does not exceed \(u\).
The goal is to find the smallest value of \(u\) such that the corresponding SAT formula is satisfiable. This equates to identifying an acyclic subset of actions sufficient to achieve all goals from the initial state with minimum cost.
Propagation via Propositions (P-SAT)
In this approach, each proposition \(p\) receives a variable representing the minimum cost required to make it true, based on the causal function \(f\).
The propagation rule is:
\[ \text{cost}(p) = \min_{a \in \text{add}(p)} \left[ \text{cost}(a) + \max_{q \in \text{pre}(a)} \text{cost}(q) \right] \]
This rule indicates that the cost to reach \(p\) is computed by considering all actions that produce \(p\) and summing the cost of each with the maximum cost of their preconditions. Thus, the cost propagation proceeds as:
\[ \text{cost}(p) = 1 \]
\[ \text{cost}(q) = 2 + \text{cost}(p) = 3 \]
This propagation is implemented as a set of SAT clauses enforcing the cost variable of \(q\) to reflect the accumulated dependencies.
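The propagation rule can be sketched as a fixed-point computation over the data of Example 4; the dictionary layout is illustrative, and the SAT clauses mentioned above would enforce the same equations symbolically:

```python
import math

def propagate_costs(propositions, actions, initial=frozenset()):
    """Fixed point of cost(p) = min over actions a adding p of
    cost(a) + max over preconditions q of cost(q)."""
    cost = {p: (0 if p in initial else math.inf) for p in propositions}
    changed = True
    while changed:
        changed = False
        for a in actions.values():
            pre_cost = max((cost[q] for q in a["pre"]), default=0)
            for p in a["add"]:
                new = a["cost"] + pre_cost
                if new < cost[p]:
                    cost[p] = new
                    changed = True
    return cost

actions = {"a1": {"pre": set(), "add": {"p"}, "cost": 1},
           "a2": {"pre": {"p"}, "add": {"q"}, "cost": 2}}
print(propagate_costs({"p", "q"}, actions))   # cost of p is 1, of q is 3
```

The result reproduces the propagation shown above: cost(p) = 1 and cost(q) = 2 + cost(p) = 3.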
Propagation via Actions (A-SAT)
Here, control is applied directly to the actions included in the domain of \(f\). Each time an action is selected, its cost is added to a global total cost variable.
The upper bound \(u\) is enforced by adding a clause that blocks assignments exceeding it:
\[ \sum_{a \in A} \text{cost}(a) \cdot \mathbf{1}_{a \in \text{dom}(f)} \leq u \]
The SAT encoding uses binary variables for each action and a counter system for summing costs, similar to pseudo-Boolean constraints encoded in CNF.
Combining Strategies
The AP-SAT version combines both propagation methods, the local estimates from proposition-based propagation and the direct control from action-based propagation. This hybrid approach improves robustness across domains, as the effectiveness of each method depends on the problem’s structure and the relative number of propositions and actions.
Minimum Cost Search
The value of \(u\) is determined using binary search:
- Start with interval \([u_{\text{min}}, u_{\text{max}}]\);
- At each iteration, a new formula is built with \(u = \lfloor (u_{\text{min}} + u_{\text{max}})/2 \rfloor\);
- If the formula is satisfiable, try a lower value; otherwise, increase the limit;
- The process ends when the smallest satisfiable \(u\) is found.
In the earlier example, the optimal cost is known to be 3. If we start with \(u_{\text{max}} = 5\) and \(u_{\text{min}} = 0\), the first test with \(u = 2\) is unsatisfiable, forcing an increase. When \(u = 3\), the formula is satisfiable, and the process terminates with the optimal value.
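The search can be sketched as follows, with a placeholder `is_satisfiable` standing in for building and solving the cost-bounded SAT formula of Example 4 (whose optimum is 3):

```python
def is_satisfiable(u):
    # Placeholder for the SAT call with cost bound u; for Example 4,
    # the formula is satisfiable exactly when u >= 3.
    return u >= 3

def minimum_cost(u_min, u_max):
    """Binary search for the smallest satisfiable cost bound.
    Assumes the formula is satisfiable at u = u_max."""
    best = None
    while u_min <= u_max:
        u = (u_min + u_max) // 2
        if is_satisfiable(u):
            best, u_max = u, u - 1    # feasible: try a lower bound
        else:
            u_min = u + 1             # infeasible: increase the limit
    return best

print(minimum_cost(0, 5))   # -> 3
```

The trace matches the worked example: the first probe at \(u = 2\) is unsatisfiable, and the search converges on \(u = 3\).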
References
Blum, Avrim L. and Furst, Merrick L. (1997). Fast planning through planning graph analysis, Elsevier.
Fikes, Richard E. and Nilsson, Nils J. (1971). STRIPS: A new approach to the application of theorem proving to problem solving, Elsevier.
Ghallab, Malik and Nau, Dana and Traverso, Paolo (2004). Automated Planning: theory and practice, Elsevier.
Kautz, Henry A. and Selman, Bart (1992). Planning as Satisfiability.
Kautz, Henry, McAllester, David, and Selman, Bart (1996). Encoding plans in propositional logic, KR.
Kautz, Henry and Selman, Bart (1996). Pushing the envelope: Planning, propositional logic, and stochastic search, AAAI.
Kautz, Henry and Selman, Bart (1999). Unifying SAT-based and graph-based planning.
McAllester, David and Rosenblatt, David (1991). Systematic nonlinear planning.
McCarthy, John and Hayes, Patrick J (1981). Some philosophical problems from the standpoint of artificial intelligence, Elsevier.
Rankooh, Masood Feyzbakhsh and Rintanen, Jussi (2022). Efficient encoding of cost optimal delete-free planning as SAT.
Rintanen, Jussi (2004). Evaluation strategies for planning as satisfiability.
Rintanen, Jussi, Heljanko, Keijo, and Niemelä, Ilkka (2004). Parallel encodings of classical planning as satisfiability.