GMM

Cluster assignment of a point

  • $P(Z=j)$ is described by $P(C_j) = w_j \geq 0$ with $\sum_{j=1}^k w_j = 1$

Each cluster has its own Gaussian distribution

  • $P(X \mid Z=j) = \mathcal{N}(\mu_j, \Sigma_j)$ is the class-conditional density

Gaussian mixture distribution:

$$P(x) \equiv \sum_{j=1}^k w_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)$$
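This is exactly the two-stage generative process above: draw a cluster $Z$ with probability $w_j$, then draw $x$ from that cluster's Gaussian. A minimal NumPy sketch (illustrative only; `sample_gmm` and the example parameters are not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm(n, weights, means, covs):
    """Two-stage generative process: Z ~ Categorical(w), then x ~ N(mu_Z, Sigma_Z)."""
    k = len(weights)
    z = rng.choice(k, size=n, p=weights)          # draw cluster assignments Z
    x = np.stack([rng.multivariate_normal(means[j], covs[j]) for j in z])
    return x, z

# e.g. a 2-component mixture in 2D (made-up parameters, purely illustrative)
w = [0.3, 0.7]
mu = [np.zeros(2), np.array([3.0, 3.0])]
sigma = [np.eye(2), 0.5 * np.eye(2)]
X, Z = sample_gmm(500, w, mu, sigma)
```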

Modelling the data points as independent, the aim is to find $P(C_j), \mu_j, \Sigma_j$ for $j = 1, \ldots, k$ that maximise

$$P(x_1, \dots, x_n) = \prod_{i=1}^n \sum_{j=1}^k P(C_j) \, P(x_i \mid C_j)$$

where $P(x \mid C_j) \equiv \mathcal{N}(x \mid \mu_j, \Sigma_j)$

Try the usual log trick

$$\log P(x_1, \dots, x_n) = \sum_{i=1}^n \log \left( \sum_{j=1}^k P(C_j) \, P(x_i \mid C_j) \right)$$
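The log stays outside the inner sum, so the maximisation has no closed form; that is what motivates EM below. The log-likelihood can still be evaluated numerically. A small sketch (assuming NumPy/SciPy; `gmm_log_likelihood` is my own name, not from the notes), using log-sum-exp for stability:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, weights, means, covs):
    """log P(x_1,...,x_n) = sum_i log( sum_j w_j N(x_i | mu_j, Sigma_j) )."""
    k = len(weights)
    # log( w_j * N(x_i | mu_j, Sigma_j) ) for every point i and component j, shape (n, k)
    log_terms = np.column_stack([
        np.log(weights[j]) + multivariate_normal.logpdf(X, means[j], covs[j])
        for j in range(k)
    ])
    # log-sum-exp over components, then sum over the independent points
    return logsumexp(log_terms, axis=1).sum()
```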

EM

MLE is a frequentist principle that suggests that, given a dataset, the "best" parameters to use are the ones that maximise the probability of the data

  • MLE is a way to formally pose the problem

EM is an algorithm

  • EM is a way to solve the problem posed by MLE
  • Especially convenient under unobserved data; in general the MLE can also be found by other methods, such as gradient descent

EM is a general approach that goes beyond GMMs

  • Purpose: implement MLE under latent variables $Z$

Variables in GMMs:

  • Variables: point locations $X$ and cluster assignments $Z$
  • Parameters: $\theta$ are the cluster locations and scales

What is EM really doing?

  • Coordinate ascent on a lower bound on the log-likelihood (the bound is written out after this list)

    • M-step: ascent in the model parameters $\theta$
    • E-step: ascent in the distribution $P(z)$ over the latent variables
  • Each step moves towards a local optimum

  • Can get stuck, so random restarts may be needed
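The lower bound itself is not written out in these notes; one standard form, for any distribution $q$ over the latent assignments, is

$$\log P(x; \theta) \;\geq\; \sum_{z} q(z) \log \frac{P(x, z; \theta)}{q(z)}$$

The E-step sets $q(z) = P(z \mid x; \theta)$, which makes the bound tight at the current $\theta$; the M-step then maximises the right-hand side over $\theta$ with $q$ held fixed.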

EM for fitting the GMM

Initialisation Step:

  • Initialise $K$ clusters $C_1, \ldots, C_k$ with $(\mu_j, \Sigma_j)$ and $P(C_j)$ for each cluster $j$.

Iteration Step:

  • Estimate the cluster of each datum: $p(C_j \mid x_i) \rightarrow$ Expectation
  • Re-estimate the cluster parameters $(\mu_j, \Sigma_j)$ and $p(C_j)$ for each cluster $j$ $\rightarrow$ Maximisation (a sketch of both steps follows this list)
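A minimal NumPy/SciPy sketch of these two steps (illustrative only; the names `e_step` and `m_step` and the vectorised layout are my own choices, not from the notes):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    """Expectation: responsibilities r[i, j] = p(C_j | x_i) under the current parameters."""
    k = len(weights)
    r = np.column_stack([
        weights[j] * multivariate_normal.pdf(X, means[j], covs[j])
        for j in range(k)
    ])
    return r / r.sum(axis=1, keepdims=True)

def m_step(X, r):
    """Maximisation: re-estimate (mu_j, Sigma_j) and p(C_j) from the responsibilities."""
    n, d = X.shape
    nk = r.sum(axis=0)                       # effective number of points per cluster
    weights = nk / n                         # new p(C_j)
    means = (r.T @ X) / nk[:, None]          # responsibility-weighted means
    covs = np.empty((r.shape[1], d, d))
    for j in range(r.shape[1]):
        diff = X - means[j]
        covs[j] = (r[:, j, None] * diff).T @ diff / nk[j]
    return weights, means, covs
```

Alternating `e_step` and `m_step` until the log-likelihood above stops improving gives the usual EM loop, starting from the parameters chosen in the initialisation step.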