
Game Theory

Normal Form Game: a model for $N$ players who all take their actions at the same time.

A normal form game is a tuple $G = (N, A, u)$

  • $N$ is a set of $n$ players
  • $A = A_1 \times \ldots \times A_n$ is the set of action profiles, where $A_i$ is the set of actions for player $i$. Thus, an action profile $a = (a_1, \ldots, a_n)$ describes the simultaneous moves of all players
  • $u : A \rightarrow \mathbb{R}^n$ is a reward function that returns an $n$-tuple specifying the payoff each player receives for action profile $a$. This is called the utility of an action
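The tuple $G = (N, A, u)$ can be sketched in Python. This minimal example uses the Prisoner's Dilemma payoffs discussed later as a concrete two-player game; the variable names mirror the notation above:

```python
# A two-player normal form game G = (N, A, u).
# Action profiles map to a payoff tuple (one entry per player).
N = [1, 2]                                  # the set of players
A = [("Admit", "Deny"), ("Admit", "Deny")]  # A_i: actions for each player i
u = {
    ("Admit", "Admit"): (-2, -2),
    ("Admit", "Deny"):  (0, -4),
    ("Deny",  "Admit"): (-4, 0),
    ("Deny",  "Deny"):  (-1, -1),
}

# Utility of the profile where player 1 admits and player 2 denies:
print(u[("Admit", "Deny")])  # (0, -4)
```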

A pure strategy for an agent is when the agent selects a single action and always plays it. If the agent were to play the game multiple times, they would choose the same action every time.

A mixed strategy is when an agent selects the action to play according to a probability distribution. For example, choose action $a$ with probability 0.8 and action $b$ with probability 0.2. If the agent were to play the game an infinite number of times, it would select $a$ 80% of the time and $b$ 20% of the time.

A pure strategy can be treated as a mixed strategy that assigns 100% probability to a single action. That is, a pure strategy is a special case of a mixed strategy.
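Playing a mixed strategy can be sketched with the standard library's `random.choices`. The action names `a`/`b` and the 0.8/0.2 split follow the example above; the pure strategy is the degenerate distribution:

```python
import random

actions, probs = ["a", "b"], [0.8, 0.2]

# Sample many plays of the mixed strategy; the empirical frequency
# of action "a" approaches its probability 0.8.
random.seed(0)
plays = random.choices(actions, weights=probs, k=10_000)
print(plays.count("a") / len(plays))  # close to 0.8

# A pure strategy as a mixed strategy: probability 1 on one action.
pure = random.choices(actions, weights=[1.0, 0.0], k=5)
print(pure)  # ['a', 'a', 'a', 'a', 'a']
```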

weakly dominant and strongly dominant

A strategy $s_i$ is weakly dominant if, for every combination of the other players' actions, the utility received by the agent for playing $s_i$ is greater than or equal to the utility received for playing any other strategy $s'_i$. It is strongly (strictly) dominant if that inequality is strict.
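The dominance condition can be checked by brute force over the opponent's actions. In this sketch the `weakly_dominates` helper is an illustrative assumption, and the payoff dictionary `u` uses the Prisoner's Dilemma values from the next section:

```python
# Prisoner's Dilemma payoffs: profiles are (player 1 action, player 2 action).
u = {
    ("Admit", "Admit"): (-2, -2),
    ("Admit", "Deny"):  (0, -4),
    ("Deny",  "Admit"): (-4, 0),
    ("Deny",  "Deny"):  (-1, -1),
}

def weakly_dominates(i, s, s_prime, opponent_actions):
    """True if u_i(s, o) >= u_i(s_prime, o) for every opponent action o.

    i is the 0-based player index; profiles are ordered (player 1, player 2).
    """
    def profile(own, other):
        return (own, other) if i == 0 else (other, own)
    return all(u[profile(s, o)][i] >= u[profile(s_prime, o)][i]
               for o in opponent_actions)

# Admit weakly (in fact strictly) dominates Deny for player 1:
print(weakly_dominates(0, "Admit", "Deny", ["Admit", "Deny"]))  # True
```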

NASH EQUILIBRIA FOR PRISONER’S DILEMMA

Prisoner's Dilemma payoffs (player 1, player 2):

| Player 1 \ Player 2 | Admit | Deny |
| --- | --- | --- |
| **Admit** | (-2, -2) | (0, -4) |
| **Deny** | (-4, 0) | (-1, -1) |

  • If player 2 chooses Admit, player 1 receives a greater reward by choosing Admit: $-2 > -4$. If player 2 chooses Deny, player 1 again receives a greater reward by choosing Admit: $0 > -1$.
  • Symmetrically, if player 1 chooses Admit, player 2 receives a greater reward by choosing Admit: $-2 > -4$. If player 1 chooses Deny, player 2 receives a greater reward by choosing Admit: $0 > -1$.

Nash Equilibrium occurs in a game when each player’s strategy maximizes their own payoff, given the strategies chosen by all other players. No player can gain a higher payoff by unilaterally deviating from their current strategy, assuming the other players maintain their strategies.
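The no-profitable-unilateral-deviation condition can be verified by brute force over all pure strategy profiles. This sketch assumes the Prisoner's Dilemma payoffs from the bullets above:

```python
from itertools import product

actions = ["Admit", "Deny"]
u = {
    ("Admit", "Admit"): (-2, -2),
    ("Admit", "Deny"):  (0, -4),
    ("Deny",  "Admit"): (-4, 0),
    ("Deny",  "Deny"):  (-1, -1),
}

def is_nash(profile):
    """True if no player can improve their payoff by deviating alone."""
    for i in range(2):
        for dev in actions:
            deviated = list(profile)
            deviated[i] = dev
            if u[tuple(deviated)][i] > u[profile][i]:
                return False
    return True

equilibria = [p for p in product(actions, actions) if is_nash(p)]
print(equilibria)  # [('Admit', 'Admit')] — mutual confession is the unique pure NE
```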

Nash's existence theorem

Nash proved that if mixed strategies (where a player chooses probabilities of using various pure strategies) are allowed, then every game with a finite number of players in which each player can choose from finitely many pure strategies has at least one Nash equilibrium, which might be a pure strategy for each player or might be a probability distribution over strategies for each player.

Key definitions recap:

Interleaved action selection (planning) and action execution is known as online planning.

Contributors: pingzhihe