
1. Linear regression, Logistic regression, MLE

Linear Regression

Model: H = a + bT

  • Criterion: minimise the sum of squared errors

Expressed in matrix form:

y = X\beta + \epsilon

where \epsilon \sim \mathcal{N}(0, \sigma^2),
thus y \sim \mathcal{N}(x'w, \sigma^2)

Maximum-likelihood estimation: Choose parameter values that maximise the probability of observed data

p(y_1, ..., y_n | x_1, ..., x_n) = \prod_{i=1}^n p(y_i | x_i)

Maximising the log-likelihood as a function of 𝒘 is equivalent to minimising the sum of squared errors.
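A short derivation sketch, under the Gaussian noise assumption above:

\log p(y_1, ..., y_n | x_1, ..., x_n) = \sum_{i=1}^n \log \mathcal{N}(y_i | x_i'w, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - x_i'w)^2

The first term does not depend on 𝒘, so maximising the log-likelihood is the same as minimising \sum_i (y_i - x_i'w)^2.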

In matrix form the sum of squared errors is:

L = y'y - 2y'Xw + w'X'Xw

Solving for 𝒘 yields:

\hat{w} = (X^T X)^{-1} X^T y

X' (equivalently X^T) denotes the transpose of X.
X'X must be invertible, i.e. X must have full column rank (its column vectors are linearly independent).
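A minimal NumPy sketch of the closed-form solution on toy data (variable names and sizes are illustrative; solving the linear system is numerically safer than forming the inverse explicitly):

```python
import numpy as np

# Toy data: n samples, d features, plus a column of ones for the intercept.
rng = np.random.default_rng(0)
n, d = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])
true_w = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ true_w + rng.normal(scale=0.1, size=n)

# Normal-equations estimate: w_hat = (X'X)^{-1} X'y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to true_w when X'X has full rank
```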

Basis expansion for linear regression: transform the inputs (e.g. into polynomial features) so that linear regression becomes polynomial regression and fits the data better.
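A small illustration of the idea, using a cubic feature map chosen just for this sketch, so plain linear regression can fit a curved relationship:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 50)
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)

# Basis expansion: map scalar x to [1, x, x^2, x^3], then run ordinary linear regression.
Phi = np.vander(x, N=4, increasing=True)
w_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
y_fit = Phi @ w_hat  # a cubic fit obtained with the linear-regression machinery
```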

Logistic regression

Logistic regression is a model for binary classification problems: it assumes that Y given x follows a Bernoulli distribution with parameter P(Y=1|x).

P(Y=1|\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{x}'w)}
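Writing p_i = P(Y=1|\mathbf{x}_i), the Bernoulli assumption gives the log-likelihood that MLE maximises:

\ell(w) = \sum_{i=1}^n \left[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \right], \qquad p_i = \frac{1}{1 + \exp(-\mathbf{x}_i'w)}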

MLE for logistic regression has no analytical (closed-form) solution.

We therefore use approximate iterative solutions:

  • Stochastic Gradient Descent (SGD)
  • Newton-Raphson Method

Stochastic Gradient Descent (SGD)
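A minimal sketch of SGD on the logistic-regression negative log-likelihood; the step size, number of epochs, and toy data below are illustrative assumptions, not prescribed values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=100, seed=0):
    """SGD on the negative log-likelihood, one randomly ordered sample at a time."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            p = sigmoid(X[i] @ w)
            grad = (p - y[i]) * X[i]   # per-sample gradient (p_i - y_i) x_i
            w -= lr * grad
    return w

# Usage on toy data
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
w_true = np.array([0.5, 2.0, -1.0])
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)
w_hat = sgd_logistic(X, y)
```

Each update uses the gradient (p_i − y_i)x_i of the per-sample negative log-likelihood; averaging over mini-batches is a common variant.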

Newton-Raphson Method

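A hedged sketch of the Newton-Raphson (IRLS) update for logistic regression, w ← w + (X'RX)^{-1} X'(y − p) with R = diag(p_i(1 − p_i)); the variable names and iteration count are illustrative:

```python
import numpy as np

def newton_logistic(X, y, iters=10):
    """Newton-Raphson / IRLS for logistic regression."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        R = p * (1.0 - p)                  # diagonal of the weight matrix
        H = X.T @ (X * R[:, None])         # Hessian of the negative log-likelihood: X'RX
        g = X.T @ (y - p)                  # gradient of the log-likelihood
        w += np.linalg.solve(H, g)         # Newton step
    return w
```

Newton-Raphson typically converges in far fewer iterations than SGD, at the cost of solving a d×d linear system per step.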

Regularization

Normal equations solution of linear regression:

\hat{w} = (X^T X)^{-1} X^T y

With irrelevant/multicollinear features, the matrix X'X has no inverse. Regularisation: introduce an additional condition into the system:

\|y - Xw\|_2^2 + \lambda\|w\|_2^2 \text{ for } \lambda > 0

Adding λ to the eigenvalues of X'X makes it invertible; this is ridge regression, with solution \hat{w} = (X'X + \lambda I)^{-1} X'y.
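A minimal NumPy sketch of the ridge estimator (the value of λ is an illustrative choice):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge regression: adding lam to the eigenvalues of X'X makes the system invertible."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```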


Lasso (L1 regularisation) encourages solutions to sit on the axes:
some of the weights are set exactly to zero → the solution is sparse.
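A small sketch of the sparsity effect using scikit-learn's `Lasso`; the regularisation strength `alpha` and the toy data are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]      # only 3 of the 10 features matter
y = X @ w_true + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)                 # many coefficients are exactly zero (sparse solution)
```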