
Part of Speech Tagging

What is a POS tag?

  • A label assigned to a word in a sentence to indicate its grammatical role.

Assumptions that go into a Hidden Markov Model

  • Markov assumption: The current state (POS tag) depends only on the previous state.
  • Output independence: The observed word depends only on the current state (tag), not on other words or states.
  • Stationarity: Transition and emission probabilities do not change over time. Together, these assumptions let the joint probability factor into a simple product, as shown below.
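
Concretely, under these assumptions the joint probability of a tag sequence t_1, \dots, t_T and word sequence w_1, \dots, w_T factors as:

P(w_1, \dots, w_T, t_1, \dots, t_T) = \pi(t_1)\,p(w_1|t_1)\prod_{i=2}^{T} p(t_i|t_{i-1})\,p(w_i|t_i)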

Time complexity of the Viterbi algorithm

O(T \times N^2), where T = length of the sequence (number of words) and N = number of possible tags.

It's practical for POS tagging since the number of tags N is small (N < 50 for most languages).
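
To make the T \times N^2 structure concrete, here is a minimal Viterbi sketch in Python (a sketch, assuming NumPy and ordinary rather than log-space probabilities; the function and argument names are illustrative, not from the original):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely tag path for a sequence of word indices.

    pi: (N,) initial distribution; A: (N, N) transition matrix;
    B: (N, V) emission matrix; obs: length-T list of word indices.
    """
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))             # best path probability ending in each state
    back = np.zeros((T, N), dtype=int)   # predecessor pointers for backtracking
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):                # T time steps ...
        for s in range(N):               # ... times N current states ...
            scores = delta[t - 1] * A[:, s]   # ... times N predecessors: O(T * N^2)
            back[t, s] = scores.argmax()
            delta[t, s] = scores.max() * B[s, obs[t]]
    path = [int(delta[-1].argmax())]     # best final state
    for t in range(T - 1, 0, -1):        # follow the pointers backwards
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```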

Example

(Figure: HMM example)

Answer

Initial state distribution:

\pi(\text{JJ})=0.3,\quad \pi(\text{NNS})=0.4,\quad \pi(\text{VBP})=0.3

The transition probability matrix:

A=\begin{pmatrix} p(\text{JJ}\to\text{JJ}) & p(\text{JJ}\to\text{NNS}) & p(\text{JJ}\to\text{VBP})\\ p(\text{NNS}\to\text{JJ}) & p(\text{NNS}\to\text{NNS}) & p(\text{NNS}\to\text{VBP})\\ p(\text{VBP}\to\text{JJ}) & p(\text{VBP}\to\text{NNS}) & p(\text{VBP}\to\text{VBP}) \end{pmatrix} = \begin{pmatrix} 0.4 & 0.5 & 0.1\\ 0.1 & 0.4 & 0.5\\ 0.4 & 0.5 & 0.1 \end{pmatrix}

The emission probability matrix (with the observed vocabulary \{\text{silver, wheels, turn}\}) is:

\begin{aligned} &p(\text{silver}|\text{JJ})=0.8,\quad p(\text{wheels}|\text{JJ})=0.1,\quad p(\text{turn}|\text{JJ})=0.1,\\ &p(\text{silver}|\text{NNS})=0.3,\quad p(\text{wheels}|\text{NNS})=0.4,\quad p(\text{turn}|\text{NNS})=0.3,\\ &p(\text{silver}|\text{VBP})=0.1,\quad p(\text{wheels}|\text{VBP})=0.3,\quad p(\text{turn}|\text{VBP})=0.6. \end{aligned}
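
As a sanity check on the worked example below, the same parameters can be encoded directly (a sketch; the tag order JJ, NNS, VBP and word order silver, wheels, turn are conventions of this encoding, not part of the original):

```python
import numpy as np

tags = ["JJ", "NNS", "VBP"]
words = ["silver", "wheels", "turn"]

pi = np.array([0.3, 0.4, 0.3])    # initial distribution over tags

A = np.array([[0.4, 0.5, 0.1],    # row = from-tag, column = to-tag
              [0.1, 0.4, 0.5],
              [0.4, 0.5, 0.1]])

B = np.array([[0.8, 0.1, 0.1],    # row = tag, column = word
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
```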

Viterbi Algorithm for “silver wheels turn”

Let the observation sequence be

O = (\text{silver}, \text{wheels}, \text{turn}),

and let the hidden state sequence be

(s_1, s_2, s_3) \in \{\text{JJ}, \text{NNS}, \text{VBP}\}^3.

Step 1: Initialization (t=1, word “silver”)

\begin{align} \delta_1(\text{JJ}) &= \pi(\text{JJ}) \times p(\text{silver}|\text{JJ}) = 0.3 \times 0.8 = 0.24,\\ \delta_1(\text{NNS}) &= \pi(\text{NNS}) \times p(\text{silver}|\text{NNS}) = 0.4 \times 0.3 = 0.12,\\ \delta_1(\text{VBP}) &= \pi(\text{VBP}) \times p(\text{silver}|\text{VBP}) = 0.3 \times 0.1 = 0.03. \end{align}
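
With the arrays above, this initialization is a single elementwise product (reusing the illustrative pi, B, and words from the earlier sketch):

```python
delta1 = pi * B[:, words.index("silver")]
print(delta1)   # [0.24 0.12 0.03]
```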

Step 2: Recursion (t=2, word “wheels”)

\delta_2(s) = \max_{s'\in\{\text{JJ, NNS, VBP}\}} \left[\delta_1(s') \times p(s' \to s)\right] \times p(\text{wheels}|s).

1. \delta_2(\text{JJ})

\begin{align} &\delta_1(\text{JJ}) \times p(\text{JJ} \to \text{JJ}) \times p(\text{wheels}|\text{JJ}) = 0.24 \times 0.4 \times 0.1 = 0.0096 \\ &\delta_1(\text{NNS}) \times p(\text{NNS} \to \text{JJ}) \times p(\text{wheels}|\text{JJ}) = 0.12 \times 0.1 \times 0.1 = 0.0012 \\ &\delta_1(\text{VBP}) \times p(\text{VBP} \to \text{JJ}) \times p(\text{wheels}|\text{JJ}) = 0.03 \times 0.4 \times 0.1 = 0.0012 \end{align}

Choose the max value 0.0096, from \text{JJ} \to \text{JJ}:

\delta_2(\text{JJ}) = 0.0096

2. \delta_2(\text{NNS})

\begin{align} &0.24 \times 0.5 \times 0.4 = 0.048 \\ &0.12 \times 0.4 \times 0.4 = 0.0192 \\ &0.03 \times 0.1 \times 0.4 = 0.006 \end{align}

Choose the max value 0.048, from \text{JJ} \to \text{NNS}:

\delta_2(\text{NNS}) = 0.048

3. \delta_2(\text{VBP})

\begin{align} &0.24 \times 0.1 \times 0.3 = 0.0072 \\ &0.12 \times 0.5 \times 0.3 = 0.018 \\ &0.03 \times 0.1 \times 0.3 = 0.0009 \end{align}

Choose the max value 0.018, from \text{NNS} \to \text{VBP}:

\delta_2(\text{VBP}) = 0.018
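
The three maximizations above collapse into one loop in code (continuing the earlier sketch):

```python
delta2 = np.empty(3)
for s in range(3):                         # candidate tag at t = 2
    scores = delta1 * A[:, s]              # one score per predecessor tag
    delta2[s] = scores.max() * B[s, words.index("wheels")]
print(delta2)   # [0.0096 0.048  0.018 ]
```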

Step 3: Recursion (t=3, word “turn”)

The recursion is the same as in the previous step:

\delta_3(s) = \max_{s'\in\{\text{JJ, NNS, VBP}\}} \left[\delta_2(s') \times p(s' \to s)\right] \times p(\text{turn}|s).

\begin{align} &\delta_3(\text{JJ}) = 0.00072 \\ &\delta_3(\text{NNS}) = 0.00576 \\ &\delta_3(\text{VBP}) = 0.0144 \end{align}
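
The same loop with delta2 in place of delta1 reproduces these numbers:

```python
delta3 = np.empty(3)
for s in range(3):
    scores = delta2 * A[:, s]
    delta3[s] = scores.max() * B[s, words.index("turn")]
print(delta3)   # [0.00072 0.00576 0.0144 ]
```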

Step 4: Termination and Backtracking

At t=3, we compare:

\delta_3(\text{JJ})=0.00072,\quad \delta_3(\text{NNS})=0.00576,\quad \delta_3(\text{VBP})=0.0144

The highest probability is 0.0144, for state VBP. Backtracking through the recorded predecessor states:

  • At t=3, the best state is VBP.
  • At t=2, the best transition leading to VBP came from NNS.
  • At t=1, the best transition leading to NNS came from JJ.

which corresponds to the following tagging:

  • silver → JJ
  • wheels → NNS
  • turn → VBP
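
Putting everything together, the viterbi sketch from earlier recovers the same tagging end to end:

```python
obs = [words.index(w) for w in ["silver", "wheels", "turn"]]
best_path = viterbi(obs, pi, A, B)
print([tags[s] for s in best_path])   # ['JJ', 'NNS', 'VBP']
```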