CC-BY
Fabian M. Suchanek
POS-Tagging
“Alizée wrote a really great song by herself”
Alizée/Proper Name  wrote/Verb  a/Determiner  really/Adverb  great/Adjective  song/Noun  by/Preposition  herself/Pronoun

A Part-of-Speech (also: POS, POS-tag, word class, lexical class, lexical category) is a set of words with the same grammatical role.
Def: Part of Speech
• Proper nouns (PN): Alizée, Elvis, Obama, ...
• Nouns (N): music, sand, ...
• Adjectives (ADJ): fast, funny, ...
• Adverbs (ADV): fast, well, ...
• Verbs (V): run, jump, dance, ...
• Pronouns (PRON): he, she, it, this, ... (what can replace a noun)
• Determiners (DET): the, a, these, your, ... (what goes before a noun)
• Prepositions (PREP): in, with, on, ... (what goes before determiner + noun)
• Subordinators (SUB): who, whose, that, which, because, ... (what introduces a sub-sentence)
Def: POS-Tagging
POS-Tagging is the process of assigning to each word in a sentence its POS.

Alizée/PN  sings/V  a/DET  wonderful/ADJ  song/N.

Alizée/PN, who/SUB has/V a/DET wonderful/ADJ voice/N, sang/V at/PREP the/DET concert/N in/PREP Moscow/PN.
(Photo: © Ronny Martin Junnilainen)
Probabilistic POS-Tagging
Probabilistic POS tagging is an algorithm for automatic POS tagging that works by introducing random variables for words and for POS-tags:
• W1, W2, ... range over the words of the sentence (visible), e.g. “Alizée”, “sings”, “runs”
• T1, T2, ... range over the POS-tags (hidden), e.g. PN, Verb, Adj, Prep
Each joint assignment of words and tags (a “world”) has a probability.
Probabilistic POS-Tagging
Given a sentence w1, ..., wn, we want to find the most likely tag sequence, i.e., the t1, ..., tn that maximize P(t1, ..., tn | w1, ..., wn).
Markov Assumption 1
Every word depends just on its tag.
The probability that the 4th word is “song” depends just on the tag of that word:
P(song | Elvis, sings, a, PN, V, D, N) = P(song | N)
Markov Assumption 2
Every tag depends just on its predecessor.
The probability that PN, V, D is followed by a noun is the same as the probability that D is followed by a noun:
P(N | PN, V, D) = P(N | D)
Homogeneity Assumption 1
The tag probabilities are the same at all positions.
The probability that a Det is followed by a Noun is the same at position 7 and at position 2. Let's denote this probability by P(Noun|Det), the “transition probability”.
Homogeneity Assumption 2
The word probabilities are the same at all positions.
The probability that a PN is “Elvis” is the same at position 7 and at position 2. Let's denote this probability by P(Elvis|PN), the “emission probability”.
Def: HMM
A (homogeneous) Hidden Markov Model (also: HMM) is a sequence of random variables T1, ..., Tn (the POS-tags, hidden) and W1, ..., Wn (the words of the sentence, visible), such that
P(w1, ..., wn, t1, ..., tn) = ∏i P(ti | ti-1) × P(wi | ti)
... with transition probabilities P(ti | ti-1) and emission probabilities P(wi | ti).
HMMs as graphs
Transition probabilities can be drawn as a graph over the tags:
P(Adj|Start) = 50%, P(Noun|Start) = 50%
P(Noun|Adj) = 100%
P(Verb|Noun) = 80%, P(End|Noun) = 20%
P(End|Verb) = 100%
HMMs as graphs
Emission probabilities annotate each tag with the words it can produce:
P(nice|Adj) = 50%, P(sound|Adj) = 50%
P(sound|Noun) = 50%, P(sounds|Noun) = 50%
P(sounds|Verb) = 50%, P(sound|Verb) = 50%
P(.|End) = 100%
HMMs as graphs
P(nice, sounds, ., Adj, Noun, End)
= P(Adj|Start) × P(nice|Adj) × P(Noun|Adj) × P(sounds|Noun) × P(End|Noun) × P(.|End)
= 50% × 50% × 100% × 50% × 20% × 100% = 2.5%
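This product can be checked mechanically. A minimal sketch, where `TRANS`, `EMIT`, and `joint_probability` are my own names for the transition and emission probabilities of the running example:

```python
# Example HMM with probabilities read off the graph in the slides.
TRANS = {
    "Start": {"Adj": 0.5, "Noun": 0.5},
    "Adj":   {"Noun": 1.0},
    "Noun":  {"Verb": 0.8, "End": 0.2},
    "Verb":  {"End": 1.0},
}
EMIT = {
    "Adj":  {"nice": 0.5, "sound": 0.5},
    "Noun": {"sound": 0.5, "sounds": 0.5},
    "Verb": {"sounds": 0.5, "sound": 0.5},
    "End":  {".": 1.0},
}

def joint_probability(words, tags):
    """P(words, tags): one transition and one emission factor per token."""
    p, prev = 1.0, "Start"
    for w, t in zip(words, tags):
        p *= TRANS.get(prev, {}).get(t, 0.0) * EMIT.get(t, {}).get(w, 0.0)
        prev = t
    return p

print(joint_probability(["nice", "sounds", "."], ["Adj", "Noun", "End"]))  # 0.025
print(joint_probability(["sound", "sounds", "."], ["Noun", "Verb", "End"]))  # 0.1
```

A tagging that uses an impossible transition or emission simply gets probability 0.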
HMM Question
What is the most likely tagging for “sound sounds .” ?
POS tagging with HMMs
Adj + Noun + End: 50% × 50% × 100% × 50% × 20% × 100% = 2.5%
Noun + Verb + End: 50% × 50% × 80% × 50% × 100% × 100% = 10%
Finding the most likely sequence of tags that generated a sentence is POS tagging (hooray!).
Where do we get the HMM?
Estimate the probabilities from a manually annotated corpus:
Elvis/PN sings/Verb ... Elvis/PN ./End ... Priscilla/PN laughs/Verb ...
For example, P(Verb|PN) is estimated as the fraction of PN tokens that are followed by a Verb.
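Such estimates are just relative frequencies of tag bigrams and word/tag pairs. A sketch on a toy two-sentence corpus (the corpus and all names here are my own illustration, so the resulting numbers are not those of the deck's example HMM):

```python
from collections import Counter, defaultdict

# Toy annotated corpus in the deck's word/TAG notation.
corpus = [
    [("Elvis", "PN"), ("sings", "Verb"), (".", "End")],
    [("Priscilla", "PN"), ("laughs", "Verb"), (".", "End")],
]

trans_counts = defaultdict(Counter)  # tag -> following tag -> count
emit_counts = defaultdict(Counter)   # tag -> emitted word -> count
for sentence in corpus:
    prev = "Start"
    for word, tag in sentence:
        trans_counts[prev][tag] += 1
        emit_counts[tag][word] += 1
        prev = tag

def transition_prob(t_prev, t):
    return trans_counts[t_prev][t] / sum(trans_counts[t_prev].values())

def emission_prob(t, w):
    return emit_counts[t][w] / sum(emit_counts[t].values())

print(transition_prob("PN", "Verb"))  # 1.0: every PN is followed by a Verb
print(emission_prob("PN", "Elvis"))   # 0.5: half of the PN tokens are "Elvis"
```

Real taggers additionally smooth these counts so that unseen words and tag bigrams do not get probability 0.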
The HMM as a Trellis
Task: compute the probability of every possible path from left to right, find the maximum.
[Trellis: one column of states (Noun, Verb, Adj, End) per token “sound”, “sounds”, “.”; edges between columns carry the transition probabilities, and each state carries the emission probability of the token below it.]
The HMM as a Trellis
Example: the path Start → Noun → Verb → End for “sound sounds .”:
(50% × 50%) · (80% × 50%) · (100% × 100%) = 10%
The HMM as a Trellis
In general, each step of a path contributes a factor a × b (a transition probability a times an emission probability b).
Complexity of trying all paths: exponential in the sentence length — same as before!
The HMM as a Trellis
Idea: Store at each node the probability of the maximal path that leads there (e.g., V 10% at “sounds”).
Def: Viterbi Algorithm
• For each word w (left to right)
  • for each tag t
    • for each preceding tag t'
      • compute P(t') × P(t|t') × P(w|t)
    • store the maximal probability at (t, w)
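The nested loops above can be sketched in Python. `TRANS`, `EMIT`, and `viterbi` are my own names; the probabilities are those of the running “sound sounds .” example:

```python
# Example HMM with probabilities read off the graph in the slides.
TRANS = {
    "Start": {"Adj": 0.5, "Noun": 0.5},
    "Adj":   {"Noun": 1.0},
    "Noun":  {"Verb": 0.8, "End": 0.2},
    "Verb":  {"End": 1.0},
}
EMIT = {
    "Adj":  {"nice": 0.5, "sound": 0.5},
    "Noun": {"sound": 0.5, "sounds": 0.5},
    "Verb": {"sounds": 0.5, "sound": 0.5},
    "End":  {".": 1.0},
}
TAGS = ["Noun", "Verb", "Adj", "End"]

def viterbi(words):
    # best[t] = (probability of the maximal path ending in tag t, that path)
    best = {"Start": (1.0, [])}
    for w in words:
        new_best = {}
        for t in TAGS:
            emit = EMIT.get(t, {}).get(w, 0.0)
            # choose the predecessor that maximizes P(prev) * P(t|prev) * P(w|t)
            new_best[t] = max(
                ((p * TRANS.get(prev, {}).get(t, 0.0) * emit, path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda cand: cand[0],
            )
        best = new_best
    return max(best.values(), key=lambda cand: cand[0])

print(viterbi(["sound", "sounds", "."]))  # (0.1, ['Noun', 'Verb', 'End'])
```

Column by column, this stores exactly the node values computed on the following slides (N 25%, A 25%, then N 13%, V 10%, then E 10%); keeping the path alongside each probability replaces the explicit backward pointers.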
Viterbi Algorithm
Start left to right, compute and store the probability of arriving at each node.
Column “sound”: N 25%, A 25%, V 0%, E 0%
Viterbi Algorithm
Column “sounds”, node Noun: find the prev that maximizes P(prev) × P(t|prev) × P(w|t):
• prev=Noun: 25% × 0% × 50% = 0%
• prev=Verb: 0% × 0% × 50% = 0%
• prev=End: 0% × 0% × 50% = 0%
• prev=Adj: 25% × 100% × 50% ≈ 13%
Store N 13% at this node (best predecessor: Adj).
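The four candidate products for this node can be verified in a few lines (the variable names are mine; the numbers are those of the trellis):

```python
# P(prev) for each predecessor node in column "sound",
# P(Noun|prev) from the transition graph, and P(sounds|Noun) from the emissions.
P_prev = {"Noun": 0.25, "Verb": 0.0, "End": 0.0, "Adj": 0.25}
P_noun_given_prev = {"Noun": 0.0, "Verb": 0.0, "End": 0.0, "Adj": 1.0}
P_sounds_given_noun = 0.5

candidates = {prev: P_prev[prev] * P_noun_given_prev[prev] * P_sounds_given_noun
              for prev in P_prev}
best_prev = max(candidates, key=candidates.get)
print(best_prev, candidates[best_prev])  # Adj 0.125
```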
Viterbi Algorithm
The completed trellis for “sound sounds .”:
Column “sound”: N 25%, V 0%, E 0%, A 25%
Column “sounds”: N 13%, V 10%, E 0%, A 0%
Column “.”: N 0%, V 0%, E 10%, A 0%
Read the best path by following the arrows backwards: Start N V E
Def: Probabilistic POS Tagging
Given a sentence and transition and emission probabilities, Probabilistic POS Tagging computes the sequence of tags that has maximal probability (in an HMM).
Elvis sings
P(Elvis, sings, PN, N) = 0.01
P(Elvis, sings, V, N) = 0.01
P(Elvis, sings, PN, V) = 0.1   ← winner
...
POS Tagging in Python
POS-Tagging can be done in Python, e.g., with the library NLTK:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
tokens = nltk.word_tokenize("Elvis lives")
tagged = nltk.pos_tag(tokens)
print(tagged)
# -> [('Elvis', 'NNP'), ... ]

(NLTK uses the Penn Treebank tagset, where NNP corresponds to a proper noun.)
Summary: Probabilistic POS Tagging
POS Tagging is the task of annotating the words in a sentence with their part-of-speech tags (lexical categories such as Noun, Verb, etc.).
• Probabilistic POS tagging uses Hidden Markov Models
• General performance is very good (>95% accuracy)
• Several POS taggers are available:
  • Stanford POS tagger
  • SpaCy.io (open source)
  • NLTK (Python library)
  • ...
(HMMs and the Viterbi algorithm serve a wide variety of other tasks, in particular NLP at all levels of granularity, e.g., in speech processing.)
Research Questions
How can we deal with
• evil cases?
  • the word “blue” has 4 letters.
  • pre- and post-secondary
  • look it up
  • The Duchess was entertaining last night.
• unknown words?
• new languages?
POS Tagging helps pattern IE
We can choose to match the placeholders with only nouns or proper nouns.
Pattern: “X was born in Y”
Without a POS constraint on Y: Alizée was born in the beautiful Corsica → wasBornIn(Alizée, the)
With Y restricted to (proper) nouns: Alizée was born in Corsica → wasBornIn(Alizée, Corsica)
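One simple way to enforce such a constraint is to run the pattern over word/TAG-encoded tokens. This is a toy sketch of my own (the encoding and regex are illustrative, not the deck's actual system), using the deck's tagset:

```python
import re

# A POS-tagged sentence, as a list of (word, tag) pairs.
tagged = [("Alizée", "PN"), ("was", "V"), ("born", "V"),
          ("in", "PREP"), ("Corsica", "PN")]

# Encode each token as word/TAG, then require the placeholders to be PN.
text = " ".join(f"{w}/{t}" for w, t in tagged)
m = re.search(r"(\S+)/PN was/V born/V in/PREP (\S+)/PN", text)
if m:
    print(f"wasBornIn({m.group(1)}, {m.group(2)})")  # wasBornIn(Alizée, Corsica)
```

On “Alizée was born in the beautiful Corsica”, the same regex finds no match at the position of “the/DET”, so the spurious extraction is avoided.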
POS-tags can generalize patterns
The pattern “X was born in the ADJ Y” matches both:
Alizée was born in the wonderful Corsica
Alizée was born in the beautiful Corsica
References
Daniel Ramage: “HMM Fundamentals”, CS229 Section Notes