AI Planning

Classical Planning

State Model $S(P)$

  • Finite and discrete state space $S$
  • a known initial state $s_0 \in S$
  • a set $S_G \subseteq S$ of goal states
  • actions $A(s) \subseteq A$ applicable in each $s \in S$
  • a deterministic transition function $s' = f(a, s)$ for $a \in A(s)$
  • positive action costs $c(a, s)$

There can be more than one goal state.

Search Algorithms

  • b: maximal branching factor
  • d: goal depth
  • m: the maximal depth reached

Breadth First Search

Data structure: FIFO (Queue)

Breadth First Search explores nodes level by level, and a queue (FIFO) processes nodes in the order they are discovered, which ensures all nodes at a given depth are expanded before moving to the next level.
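The queue-based expansion described above can be sketched as follows (the toy graph and the `successors` callback are illustrative, not from the notes):

```python
from collections import deque

def bfs(start, goal, successors):
    """Breadth-first search. `successors` maps a state to its neighbours.
    Returns the shallowest path from start to goal, or None."""
    frontier = deque([[start]])      # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()    # oldest node first -> level by level
        state = path[-1]
        if state == goal:
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(bfs('A', 'D', graph.get))  # ['A', 'B', 'D']
```

Because the queue is FIFO, the depth-1 nodes B and C are both expanded before any depth-2 node, so the first path to reach the goal is a shallowest one.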

Depth First Search

Data structure: LIFO (Stack)

Depth First Search explores as deep as possible along each branch before backtracking, and a stack (LIFO) enables this by always processing the most recently discovered node that still has unexplored successors.
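The same search skeleton with the queue swapped for a stack gives an iterative DFS; the toy graph is again made up for illustration:

```python
def dfs(start, goal, successors):
    """Iterative depth-first search using an explicit LIFO stack."""
    stack = [[start]]                # stack of paths
    visited = set()
    while stack:
        path = stack.pop()           # most recently discovered node first
        state = path[-1]
        if state == goal:
            return path
        if state in visited:
            continue
        visited.add(state)
        for nxt in successors(state):
            if nxt not in visited:
                stack.append(path + [nxt])
    return None

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(dfs('A', 'D', graph.get))  # ['A', 'C', 'D']
```

Note that DFS dives into C (pushed last, popped first) and finds a path immediately, but there is no guarantee this path is shallowest or cheapest.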

Uniform Cost Search

Data structure: Priority Queue

Uniform Cost Search expands nodes in order of lowest cumulative cost, and a priority queue efficiently retrieves the node with minimal cost at each step, which allows UCS to find the optimal path.
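A minimal UCS sketch with `heapq` as the priority queue; here `successors(s)` yields `(next_state, step_cost)` pairs, and the weighted graph is an invented example:

```python
import heapq

def ucs(start, goal, successors):
    """Uniform cost search: always pop the cheapest frontier node."""
    frontier = [(0, start, [start])]   # priority queue keyed on path cost g
    best = {start: 0}                  # cheapest known g per state
    while frontier:
        g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        for nxt, cost in successors(state):
            if nxt not in best or g + cost < best[nxt]:
                best[nxt] = g + cost
                heapq.heappush(frontier, (g + cost, nxt, path + [nxt]))
    return None

graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 1)], 'C': []}
print(ucs('A', 'C', lambda s: graph[s]))  # (2, ['A', 'B', 'C'])
```

The direct edge A→C costs 4, but UCS pops B (g=1) first and discovers the cheaper route through it, returning the optimal cost 2.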

|          | DFS | BFS  | ID   | A*  | HC | IDA* |
| -------- | --- | ---- | ---- | --- | -- | ---- |
| Complete | No  | Yes  | Yes  | Yes | No | Yes  |
| Optimal  | No  | Yes* | Yes* | Yes | No | Yes  |
| Time     | b^m | b^d  | b^d  | b^d | ∞  | b^d  |
| Space    | b·m | b^d  | b·d  | b^d | b  | b·d  |

A search algorithm is considered complete if it is guaranteed to find a solution whenever one exists.

(*) BFS and ID are optimal only if all edges have equal (uniform) cost.

A* Search

$f(n) = g(n) + h(n)$

$g(n)$: the actual cost from the initial state to the current state
$h(n)$: the heuristic value from the current state to a goal state

$h^*(s)$: the actual optimal cost from the current state $s$ to a goal state
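A* is UCS with the frontier ordered by $f(n) = g(n) + h(n)$ instead of $g(n)$ alone. A minimal sketch; the graph and heuristic values are invented for illustration (the heuristic shown is admissible):

```python
import heapq

def astar(start, goal, successors, h):
    """A*: pop the frontier node with the smallest f = g + h."""
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        for nxt, cost in successors(state):
            if nxt not in best_g or g + cost < best_g[nxt]:
                best_g[nxt] = g + cost
                heapq.heappush(frontier,
                               (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None

graph = {'A': [('B', 1), ('C', 3)], 'B': [('D', 5)], 'C': [('D', 2)], 'D': []}
h = {'A': 4, 'B': 5, 'C': 2, 'D': 0}
print(astar('A', 'D', lambda s: graph[s], h.get))  # (5, ['A', 'C', 'D'])
```

With $h \equiv 0$ this degenerates to UCS; a good heuristic steers expansion toward the goal (here C with $f = 5$ is expanded before B with $f = 6$).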

Property of heuristic

  1. Admissible: never over-estimates: $h(s) \leq h^*(s)$ for all $s \in S$
  2. Safe: $h(s) = \infty \Rightarrow h^*(s) = \infty$ for all $s \in S$
  3. Goal aware: $h(s) = 0$ for all $s \in S_G$
  4. Consistent: $h(s) \leq c(a) + h(s')$ for all transitions $s' = f(a, s)$

consistent + goal aware $\rightarrow$ admissible
admissible $\rightarrow$ goal aware + safe

Optimality of A* search

  • Without a re-open mechanism, A* search requires a heuristic that is both consistent and admissible to guarantee an optimal path.
  • With a re-open mechanism, admissibility alone suffices.

Example of a non-optimal solution found by A* without re-opening nodes, using an inconsistent but admissible heuristic:

     (8)
      A
     / \
+1  /   \ +3
   /     \   
  B ----- C ----- D
(7)  +1  (0)  +6  (0)

Reference: Why does A* with admissible non consistent heuristic find non optimal solution?
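The failure in the diagram can be reproduced with a short sketch (graph edges and heuristic values copied from the figure; everything else is illustrative). Closing C at $g = 3$ before the cheaper route through B is found locks in the suboptimal path:

```python
import heapq

# Edge costs and heuristic values from the diagram above.
# h is admissible (h* : A=8, B=7, C=6, D=0) but inconsistent:
# h(A)=8 > c(A,B) + h(B) = 1 + 7 is fine, yet h(B)=7 > 1 + h(C)=1 fails.
edges = {'A': [('B', 1), ('C', 3)], 'B': [('C', 1)], 'C': [('D', 6)], 'D': []}
h = {'A': 8, 'B': 7, 'C': 0, 'D': 0}

def astar_no_reopen(start, goal):
    """A* that never re-expands a node once it is closed."""
    frontier = [(h[start], 0, start, [start])]
    closed = set()
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if state in closed:
            continue                 # never reopened, even if g improved
        closed.add(state)
        for nxt, cost in edges[state]:
            if nxt not in closed:
                heapq.heappush(frontier,
                               (g + cost + h[nxt], g + cost, nxt, path + [nxt]))
    return None

print(astar_no_reopen('A', 'D'))  # (9, ['A', 'C', 'D'])
```

C is popped first with $f = 3$ and closed with $g = 3$; when B later reaches C with $g = 2$, the improvement is discarded, so the returned cost is 9 although the optimal plan A→B→C→D costs 8.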

Greedy best-first search

  • Expands nodes from a priority queue ordered by $h(s)$
  • Completeness: yes, for safe heuristics
  • Optimality: no

STRIPS And Relaxation

A STRIPS problem $P = \langle F, O, I, G \rangle$: Facts (propositions), Operators (actions), Initial state, Goal.

$h^+$: The Optimal Delete Relaxation Heuristic

Default STRIPS model: $s' = (s \cup \text{add}(a)) \setminus \text{del}(a)$
After delete relaxation: $s' = s \cup \text{add}(a)$
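The two successor rules can be written directly with Python sets; the `move` action and its fact names are a made-up example:

```python
def apply_action(state, action):
    """Default STRIPS successor: s' = (s ∪ add(a)) \\ del(a)."""
    pre, add, delete = action
    assert pre <= state, "action not applicable"
    return (state | add) - delete

def apply_relaxed(state, action):
    """Delete-relaxed successor: s' = s ∪ add(a); facts only accumulate."""
    pre, add, _ = action
    assert pre <= state, "action not applicable"
    return state | add

# Hypothetical action: move from A to B (pre, add, delete lists).
move = (frozenset({'at-A'}), frozenset({'at-B'}), frozenset({'at-A'}))
s = frozenset({'at-A'})
print(sorted(apply_action(s, move)))   # ['at-B']
print(sorted(apply_relaxed(s, move)))  # ['at-A', 'at-B']
```

Under the relaxation the agent is "at" both locations at once: facts, once true, stay true, which is what makes relaxed planning tractable.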

Definition of $h^{add}$:

$$h^{add}(s, g) = \begin{cases} 0 & g \subseteq s \\ \min_{a \in A,\, g \in \text{add}_a} c(a) + h^{add}(s, \text{pre}_a) & |g| = 1 \\ \sum_{g' \in g} h^{add}(s, \{g'\}) & |g| > 1 \end{cases}$$

If the goal contains more than one fact, $h^{add}$ is the sum of the $h^{add}$ values of the individual subgoals.

Definition of $h^{max}$:

$$h^{max}(s, g) = \begin{cases} 0 & g \subseteq s \\ \min_{a \in A,\, g \in \text{add}_a} c(a) + h^{max}(s, \text{pre}_a) & |g| = 1 \\ \max_{g' \in g} h^{max}(s, \{g'\}) & |g| > 1 \end{cases}$$

If the goal contains more than one fact, $h^{max}$ is the maximum of the $h^{max}$ values of the individual subgoals.

For both $h^{add}$ and $h^{max}$: if an action has multiple preconditions, they are treated as a goal set $g$ with $|g| > 1$.
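Both definitions can be computed by the same Bellman-Ford-style fixed point, differing only in how precondition costs are combined (max for $h^{max}$, sum for $h^{add}$). A sketch; the three-action task at the bottom is an invented example:

```python
import math

def h_values(facts, actions, state, combine):
    """Fixed-point computation of h^max (combine=max) or h^add (combine=sum).
    `actions` is a list of (pre, add, cost) over the given facts."""
    h = {p: (0 if p in state else math.inf) for p in facts}
    changed = True
    while changed:                   # iterate until no fact value improves
        changed = False
        for pre, add, cost in actions:
            # cost of achieving all preconditions, combined per the heuristic
            pre_cost = combine(h[p] for p in pre) if pre else 0
            for p in add:
                if cost + pre_cost < h[p]:
                    h[p] = cost + pre_cost
                    changed = True
    return h

facts = {'a', 'b', 'g'}
actions = [(set(), {'a'}, 1),        # achieve a, cost 1
           (set(), {'b'}, 1),        # achieve b, cost 1
           ({'a', 'b'}, {'g'}, 1)]   # needs both a and b
hmax = h_values(facts, actions, set(), max)
hadd = h_values(facts, actions, set(), sum)
print(hmax['g'], hadd['g'])  # 2 3
```

For the goal g, $h^{max}$ gives $1 + \max(1, 1) = 2$ while $h^{add}$ gives $1 + (1 + 1) = 3$: $h^{add}$ double-counts nothing here, but in general it counts shared sub-plans once per subgoal, which is why it is inadmissible.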

Bellman-Ford Table

$h^{max}$ (table figure omitted)

$h^{add}$ (table figure omitted)

  • $h^{max}$ is admissible, but is typically far too optimistic

  • $h^{add}$ is not admissible, but is typically a lot more informed than $h^{max}$

Relaxed Plan Extraction $h^{FF}$

A heuristic function is called a relaxed plan heuristic, denoted $h^{FF}$, if, given a state $s$, it returns $\infty$ if no relaxed plan exists, and otherwise returns $\sum_{a \in RPlan} c(a)$, where $RPlan$ is the action set returned by relaxed plan extraction on a closed well-founded best-supporter function for $s$.

  • "Action set" means no duplicate actions.
  • $h^{FF}$ never over-counts sub-plans, because $RPlan := RPlan \cup \{ b_s(g) \}$ adds each supporter at most once.
  • $h^{FF}$ may be inadmissible, just like $h^{add}$, but for more subtle reasons.
  • In practice, $h^{FF}$ typically does not over-estimate $h^*$.

For relaxed plan extraction to make sense, it requires a closed well-founded best-supporter function:

The $h^{max}$ supporter function:

$$b_s^{max}(p) := \argmin_{a \in A,\, p \in \text{add}_a} c(a) + h^{max}(s, \text{pre}_a)$$

The $h^{add}$ supporter function:

$$b_s^{add}(p) := \argmin_{a \in A,\, p \in \text{add}_a} c(a) + h^{add}(s, \text{pre}_a)$$
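A compact sketch of the whole pipeline using the $h^{max}$ supporter: compute the fixed point, pick the cheapest achiever of each open goal, and chain backwards collecting supporters into a duplicate-free set. The three-action mini-task is invented for illustration:

```python
import math

def hff(state, goal, actions):
    """Relaxed plan extraction with the h^max best-supporter function.
    `actions` is a list of (name, pre, add, cost) with pre/add as tuples."""
    # 1. Bellman-Ford fixed point for h^max over all facts
    facts = set(state) | set(goal)
    for _, pre, add, _ in actions:
        facts |= set(pre) | set(add)
    h = {p: (0 if p in state else math.inf) for p in facts}
    changed = True
    while changed:
        changed = False
        for _, pre, add, cost in actions:
            pc = max((h[q] for q in pre), default=0)
            for p in add:
                if cost + pc < h[p]:
                    h[p] = cost + pc
                    changed = True
    if any(h[g] == math.inf for g in goal):
        return math.inf              # no relaxed plan exists
    # 2. best supporter of p: the cheapest achiever under h^max
    def bs(p):
        return min((a for a in actions if p in a[2]),
                   key=lambda a: a[3] + max((h[q] for q in a[1]), default=0))
    # 3. backward chaining: each supporter enters RPlan at most once
    rplan = {}                       # action name -> cost (a *set* of actions)
    open_goals = [g for g in goal if g not in state]
    while open_goals:
        name, pre, _, cost = bs(open_goals.pop())
        if name not in rplan:
            rplan[name] = cost
            open_goals += [q for q in pre if q not in state]
    return sum(rplan.values())

actions = [('get-a', (), ('a',), 1),
           ('get-b', (), ('b',), 1),
           ('get-g', ('a', 'b'), ('g',), 1)]
print(hff(set(), {'g'}, actions))  # 3
```

Here all three actions enter $RPlan$ exactly once, so $h^{FF} = 3 = h^+$; keyed storage (`rplan` by action name) is what realises the "no duplicate actions" rule from the definition.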

Proposition ($h^{FF}$ is pessimistic and agrees with $h^*$ on $\infty$): For all STRIPS planning tasks $\Pi$, $h^{FF} \geq h^{+}$ for all states, and $h^{+}(s) = \infty$ if and only if $h^{FF}(s) = \infty$. There exist STRIPS planning tasks $\Pi$ and states $s$ such that $h^{FF}(s) > h^{*}(s)$.

Width-based search

Novelty $w(s)$

Contributors: pingzhihe