
1. Classical Planning

Classical Planning

State Model $S(P)$

  • Finite and discrete state space $S$
  • a known initial state $s_0 \in S$
  • a set $S_G \subseteq S$ of goal states
  • actions $A(s) \subseteq A$ applicable in each $s \in S$
  • a deterministic transition function $s' = f(a,s)$ for $a \in A(s)$
  • positive action costs $c(a,s)$

There can be more than one goal state.

Search Algorithm

  • b: maximal branching factor
  • d: goal depth
  • m: the maximal depth reached

Breadth First Search

Data structure: FIFO (Queue)

Breadth First Search explores nodes level by level, and a queue (FIFO) processes nodes in the order that they are discovered, which ensures all nodes at a given depth are expanded before moving to the next level.
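The level-by-level behaviour can be sketched in a few lines (the function name and toy graph are illustrative, not from the notes); the FIFO queue guarantees the returned depth is the shallowest one at which the goal appears:

```python
from collections import deque

def bfs(start, goal, succ):
    """BFS: a FIFO queue always expands the shallowest unexpanded node."""
    queue = deque([(start, 0)])        # (node, depth)
    seen = {start}
    while queue:
        node, depth = queue.popleft()  # FIFO: oldest (shallowest) first
        if node == goal:
            return depth
        for m in succ.get(node, []):
            if m not in seen:
                seen.add(m)
                queue.append((m, depth + 1))
    return None                        # goal unreachable
```
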

Depth First Search

Data structure: LIFO (Stack)

Depth First Search explores as deep as possible along each branch before backtracking, and a stack (LIFO) enables this by always processing the most recently discovered node that still has unexplored successors.
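Swapping the queue for a stack is the only structural change needed (again a minimal sketch with an illustrative toy graph); the visit order shows the search diving down one branch before touching its siblings:

```python
def dfs(start, goal, succ):
    """DFS: a LIFO stack always expands the most recently discovered node."""
    stack = [start]
    seen = set()
    order = []                          # visit order, to make LIFO visible
    while stack:
        node = stack.pop()              # LIFO: newest first -> goes deep
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        if node == goal:
            return order
        # reversed() so the left-most successor is popped (visited) first
        stack.extend(reversed(succ.get(node, [])))
    return None
```
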

Uniform Cost Search

Data structure: Priority Queue

Uniform Cost Search expands nodes in order of lowest cumulative path cost, and a priority queue efficiently retrieves the minimal-cost node at each step, which allows UCS to find the optimal path.
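A minimal sketch using Python's binary-heap priority queue (names and the toy graph are mine); note how the direct but expensive edge A–C is beaten by the cheaper two-step path:

```python
import heapq

def ucs(start, goal, succ):
    """UCS: a priority queue ordered by cumulative path cost g."""
    frontier = [(0, start)]             # (g, node)
    best = {start: 0}                   # cheapest g found so far per node
    while frontier:
        g, node = heapq.heappop(frontier)   # cheapest node first
        if node == goal:
            return g
        if g > best.get(node, float("inf")):
            continue                    # stale queue entry, skip
        for m, cost in succ.get(node, []):
            ng = g + cost
            if ng < best.get(m, float("inf")):
                best[m] = ng
                heapq.heappush(frontier, (ng, m))
    return float("inf")

succ = {"A": [("B", 1), ("C", 5)], "B": [("C", 1)]}
# A -> B -> C costs 1 + 1 = 2, cheaper than the direct edge A -> C (5)
```
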

|          | DFS   | BFS   | ID    | A*    | HC  | IDA*  |
| -------- | ----- | ----- | ----- | ----- | --- | ----- |
| Complete | No    | Yes   | Yes   | Yes   | No  | Yes   |
| Optimal  | No    | Yes*  | Yes*  | Yes   | No  | Yes   |
| Time     | $b^m$ | $b^d$ | $b^d$ | $b^d$ | –   | $b^d$ |
| Space    | $bm$  | $b^d$ | $bd$  | $b^d$ | $b$ | $bd$  |

A search algorithm is complete if it is guaranteed to find a solution whenever one exists.

(*) BFS and IDS are optimal only if all edges have equal (uniform) costs.

$f(n) = g(n) + h(n)$
$g(n)$: the actual cost from the initial state to the current state
$h(n)$: the heuristic estimate of the cost from the current state to a goal state

$h^*(s)$: the actual optimal cost from the current state $s$ to a goal state

Properties of heuristics

  1. Admissible: never over-estimates. $h(s) \leq h^*(s)$ for all $s \in S$
  2. Safe: $h(s) = \infty \rightarrow h^*(s) = \infty$ for all $s \in S$
  3. Goal-aware: $h(s) = 0$ for all $s \in S_G$
  4. Consistent: $h(s) \leq c(a) + h(s')$ for all $s' = f(a,s)$

consistent + goal-aware $\rightarrow$ admissible
admissible $\rightarrow$ goal-aware + safe

Optimality of A* search

  • A* search without a re-open mechanism requires the heuristic to be both consistent and admissible to guarantee an optimal path.
  • With a re-open mechanism, it only requires admissibility.

Example of a non-optimal solution found by A* without re-opening nodes, using an inconsistent but admissible heuristic:

        A (8)
       /     \
   +1 /       \ +3
     /         \
    B --- C --- D
   (7) +1 (0) +6 (0)

Reference: Why does A* with admissible non consistent heuristic find non optimal solution?
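Tracing A* without re-opening on this graph reproduces the failure (a minimal sketch; function and variable names are mine). C is expanded early with the suboptimal cost $g = 3$ via the direct edge, and once closed it is never revisited with the cheaper cost through B:

```python
import heapq

def astar_no_reopen(start, goal, succ, h):
    """A* that never re-expands nodes already in the closed set."""
    open_list = [(h[start], 0, start)]      # entries: (f, g, node)
    closed = set()
    while open_list:
        f, g, node = heapq.heappop(open_list)
        if node == goal:
            return g                        # cost of the path found
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost in succ.get(node, []):
            if nxt not in closed:           # no re-open: closed stays closed
                g2 = g + cost
                heapq.heappush(open_list, (g2 + h[nxt], g2, nxt))
    return float("inf")

# The graph above: heuristic values in parentheses, edge costs on the arcs.
succ = {"A": [("B", 1), ("C", 3)], "B": [("C", 1)], "C": [("D", 6)]}
h = {"A": 8, "B": 7, "C": 0, "D": 0}        # admissible, but inconsistent at B
cost = astar_no_reopen("A", "D", succ, h)   # returns 9; the optimum is 8
```
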

Greedy best-first search

  • Using a priority queue, expand nodes ordered by $h(s)$
  • Completeness: yes, for safe heuristics
  • Optimality: No

STRIPS And Relaxation

A STRIPS problem $P = \langle F, O, I, G \rangle$: Facts (propositions), Operators (actions), Initial state, Goal.

$h^+$: The Optimal Delete Relaxation Heuristic

Default STRIPS model: $s' = (s \cup \text{add}(a)) \setminus \text{del}(a)$
After delete relaxation: $s' = s \cup \text{add}(a)$
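The two transition rules side by side, as a minimal sketch with actions encoded as (pre, add, del) triples of fact sets (the encoding is mine, not from the notes):

```python
def apply_action(s, action):
    """STRIPS successor: s' = (s ∪ add(a)) \\ del(a)."""
    pre, add, delete = action
    assert pre <= s, "action not applicable in s"
    return (s | add) - delete

def apply_relaxed(s, action):
    """Delete relaxation: the delete list is ignored, s' = s ∪ add(a)."""
    pre, add, delete = action
    assert pre <= s, "action not applicable in s"
    return s | add          # states only ever grow
```

Because relaxed states only grow, anything achievable once stays achievable, which is what makes the relaxation easier to solve than the original task.
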

Definition of $h^{add}$

$$h^{add}(s, g) = \begin{cases} 0 & g \subseteq s \\ \min_{a \in A,\, g \in \text{add}_a} c(a) + h^{add}(s, \text{pre}_a) & |g| = 1 \\ \sum_{g' \in g} h^{add}(s, \{g'\}) & |g| > 1 \end{cases}$$

If the goal contains more than one fact, $h^{add}$ equals the sum of the $h^{add}$ values of the individual subgoals.

Definition of $h^{max}$

$$h^{max}(s, g) = \begin{cases} 0 & g \subseteq s \\ \min_{a \in A,\, g \in \text{add}_a} c(a) + h^{max}(s, \text{pre}_a) & |g| = 1 \\ \max_{g' \in g} h^{max}(s, \{g'\}) & |g| > 1 \end{cases}$$

If the goal contains more than one fact, $h^{max}$ equals the maximum of the $h^{max}$ values of the individual subgoals.

$h^{add}$ / $h^{max}$: if an action has multiple preconditions, treat them as a conjunctive goal with $|g| > 1$.
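Both definitions can be computed by the same Bellman-Ford-style fixpoint over fact costs, differing only in how subgoal values are combined; a minimal sketch on a toy task (all names are illustrative):

```python
def h_relaxed(state, goal, actions, combine):
    """Fixpoint computation of h^max (combine=max) / h^add (combine=sum).

    actions: list of (pre, add, cost) with pre/add given as sets of facts.
    """
    INF = float("inf")
    facts = set(state) | set(goal)
    for pre, add, _ in actions:
        facts |= pre | add
    val = {p: (0 if p in state else INF) for p in facts}
    changed = True
    while changed:                       # iterate until a fixpoint is reached
        changed = False
        for pre, add, cost in actions:
            c_pre = combine([val[p] for p in pre]) if pre else 0
            if c_pre == INF:
                continue                 # preconditions not yet reachable
            for p in add:
                if cost + c_pre < val[p]:
                    val[p] = cost + c_pre
                    changed = True
    return combine([val[p] for p in goal]) if goal else 0

acts = [({"a"}, {"b"}, 1), ({"b"}, {"c"}, 1), ({"b"}, {"d"}, 1)]
# from state {a} to goal {c, d}:  h^max = max(2, 2) = 2,  h^add = 2 + 2 = 4
```

The example also shows why $h^{add}$ over-counts: the shared step achieving `b` is paid once per subgoal.
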

Bellman-Ford Table

$h^{max}$


$h^{add}$

  • $h^{max}$ is admissible, but is typically far too optimistic

  • $h^{add}$ is not admissible, but is typically a lot more informed than $h^{max}$

Relaxed Plan Extraction $h^{FF}$

A heuristic function is called a relaxed plan heuristic, denoted $h^{FF}$, if, given a state $s$, it returns $\infty$ if no relaxed plan exists, and otherwise returns $\sum_{a \in RPlan} c(a)$, where $RPlan$ is the action set returned by relaxed plan extraction on a closed well-founded best-supporter function for $s$.

  • $RPlan$ is a set, so it contains no duplicate actions.
  • $h^{FF}$ never over-counts sub-plans, due to $\text{RPlan} := \text{RPlan} \cup \{ b_s(g) \}$
  • $h^{FF}$ may be inadmissible, just like $h^{add}$, but for more subtle reasons.
  • In practice, $h^{FF}$ typically does not over-estimate $h^*$

For relaxed plan extraction to make sense, it requires a closed well-founded best-supporter function:

The $h^{max}$ supporter function:

$$b^{max}_s(p) := \arg\min_{a \in A,\, p \in \text{add}_a} c(a) + h^{max}(s, \text{pre}_a).$$

The $h^{add}$ supporter function:

$$b^{add}_s(p) := \arg\min_{a \in A,\, p \in \text{add}_a} c(a) + h^{add}(s, \text{pre}_a).$$
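A minimal sketch of the extraction loop on a toy task (the action names and the hard-coded supporter table are illustrative); it shows how the set union keeps $h^{FF}$ from over-counting a shared sub-plan the way $h^{add}$ does:

```python
# Toy task: op1 achieves b, which supports both goal facts c and d.
actions = {
    "op1": ({"a"}, {"b"}, 1),   # name -> (pre, add, cost)
    "op2": ({"b"}, {"c"}, 1),
    "op3": ({"b"}, {"d"}, 1),
}
bs = {"b": "op1", "c": "op2", "d": "op3"}   # assumed best supporter per fact

def extract_relaxed_plan(state, goal, actions, bs):
    """Backchain from the open goal facts through the best-supporter function."""
    rplan = set()
    open_goals = [g for g in goal if g not in state]
    while open_goals:
        g = open_goals.pop()
        a = bs[g]
        if a in rplan:
            continue                    # RPlan := RPlan ∪ {b_s(g)} is a set
        rplan.add(a)                    # union, so op1 is counted only once
        open_goals += [p for p in actions[a][0] if p not in state]
    return rplan

plan = extract_relaxed_plan({"a"}, {"c", "d"}, actions, bs)
h_ff = sum(actions[a][2] for a in plan)  # 3, whereas h^add would give 4
```
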

Proposition ($h^{FF}$ is Pessimistic and Agrees with $h^{+}$ on $\infty$). For all STRIPS planning tasks $\Pi$, $h^{FF}(s) \geq h^{+}(s)$ for all states $s$, and $h^{+}(s) = \infty$ if and only if $h^{FF}(s) = \infty$. There exist STRIPS planning tasks $\Pi$ and states $s$ such that $h^{FF}(s) > h^{*}(s)$.

novelty w(s)