Formal Grammars
CC-BY
Fabian M. Suchanek
55
Overview
2
Formal Languages
Regular languages
Context‐free languages
Context‐sensitive languages
Recursively enumerable languages
>alphabet
Alphabet, Word
An
alphabet
is a set of symbols.
We will always implicitly use the set of all (simple) Unicode characters as the alphabet.
3
A={a,b,c,d,e,f,0,1,2,3,4,5,6,7,8,9}
A={0,...,9, a,...,z, A,...,Z, !, ?, ...}
A
word
over an alphabet A is a sequence of symbols from A.
Since we use the alphabet of unicode characters, words are just strings.
hello!, 42, 3.141592, Douglas and Sally
also with spaces!
Language
A
language
over an alphabet S is a set of words over S.
4
The language of French words: {aimer, chat, sur, ...}
The language of French sentences: {J’aime les chats, Le chat chante,...}
The language of valid Java programs: {public static void main(...)..., ...}
The language of terminating Java programs: {public...,...}
The language of algebraic expressions: {(1+2)*3, (7-2)+(7+2), ...}
The language of prime numbers: {2, 3, 5, 7, 11, ...}
The language of person names: {Thomas Paine, Bertrand Russell,...}
We want to do two things with languages:
1) decide whether a given word belongs to the language
2) generate the words of the language
It turns out that the decision problem is more difficult for certain types of languages than for others.
Languages are grouped as follows:
5
Def: The Chomsky Hierarchy
Regular languages (recognizable by regular expressions)
Context‐free languages (e.g., algebraic expressions)
Context‐sensitive languages (e.g., most natural languages)
Decidable languages (any language that can be recognized by a terminating algorithm)
Semi‐decidable languages (e.g., the language of programs that terminate)
All languages
>details
There are several formalisms to recognize/generate the words of a language:
• grammars (e.g., context-free grammars)
• simple automata (e.g., finite state automata)
• other formalisms (Turing machines, regular expressions)
The main properties are:
• each level in the Chomsky hierarchy has its own type of grammar
(the decidable and semi‐decidable ones have the same type)
• for each type of grammar, there is usually an equivalent automaton
and equivalent other formalisms
• most formalisms can not only recognize the languages, but also generate them
6
Recognizing Languages
A
(formal) grammar
is a tuple of
• A finite set N of nonterminal symbols (we write them in upper case)
• A start symbol s ∈ N
• A finite set ∑ of terminal symbols that is disjoint from N (lower case)
• A finite set P of production rules, each rule of the form
(∑ ∪ N)* N (∑ ∪ N)* → (∑ ∪ N)*
where * is the Kleene star, which allows repetition of the elements.
7
Def: Formal Grammars
8
Def: Formal Grammars
Example grammar:
N={S,NP,VP}
s=S
∑ ={they, run, love, you}
P ={ S→ NP VP, NP→ they, NP→ you, VP→ love, VP→ run }
A
(formal) grammar
is a tuple of
• A finite set N of nonterminal symbols (we write them in upper case)
• A start symbol s ∈ N
• A finite set ∑ of terminal symbols that is disjoint from N (lower case)
• A finite set P of production rules, each rule of the form
(∑ ∪ N)* N (∑ ∪ N)* → (∑ ∪ N)*
where * is the Kleene star, which allows repetition of the elements.
A
string of a grammar
(N, s, ∑, P) is an element of (∑ ∪ N)*
(we use Greek letters for strings).
The grammar
derives
a string αβγ from a string αδγ in one step,
if there is a production rule δ → β .
It
derives a word
(of terminal symbols) if there is a sequence of one step derivations
from the start symbol to the word.
The
language of a grammar
is the set of all derivable words.
9
Def: Language of a Grammar
The grammar can both generate
the words of the language
(by derivation) and recognize
them (by parsing, i.e., by finding
the derivations for the word).
Example grammar:
N={S,NP,VP}
s=S
∑ ={they, run, love, you}
P ={ S→ NP VP, NP→ they, NP→ you, VP→ love, VP→ run }
Example derivation: S → NP VP → you VP → you run
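To make the generation direction concrete, here is a minimal Python sketch (not part of the original slides) that derives random words from the example grammar by repeatedly applying production rules:

import random

# example grammar from above, with the productions grouped by left-hand side
N = {'S', 'NP', 'VP'}
P = {'S':  [['NP', 'VP']],
     'NP': [['they'], ['you']],
     'VP': [['love'], ['run']]}

def derive(symbol='S'):
    # terminal symbols are emitted as they are
    if symbol not in N:
        return [symbol]
    # nonterminals: pick one production rule and derive its parts recursively
    rhs = random.choice(P[symbol])
    return [t for part in rhs for t in derive(part)]

print(' '.join(derive()))   # e.g. "you run" or "they love"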
Overview
10
Formal Languages
Regular languages
Context‐free languages
Context‐sensitive languages
Recursively enumerable languages
A
(right) regular grammar
has only productions of the following form:
• B→ a , where B is a non-terminal and a is a terminal
• B→ aC , where B and C are non-terminals and a is a terminal
• B→ ε , where B is a non-terminal and ε is the empty string
11
Def: Regular Grammar
12
Def: Regular Grammar
Example grammar:
N={S,A,B}
s=S
∑ ={a,b}
P ={ S→ aA, S→ bB, A→ aA, B→ bB, A→ε, B→ε }
Example derivation:
S → aA → aaA → aa
A
(right) regular grammar
has only productions of the following form:
• B→ a , where B is a non-terminal and a is a terminal
• B→ aC , where B and C are non-terminals and a is a terminal
• B→ ε , where B is a non-terminal and ε is the empty string
-> A Gentle Introduction to Regular Expressions
Def: Regular expression
There is an equivalent way to express regular languages:
A
regular expression
(regex) over an alphabet S is one of the following:
• the empty string
• an element of S
• a string of the form XY
• a string of the form (X|Y)
• a string of the form (X)*
where X and Y are regexes
13
A regex is associated to a language:
L(x) = {x} for symbols x
L(E | F) = L(E) ∪ L(F)
(∅ is a special regex that cannot appear as part of other regexes. It represents the empty language and does not match any string.)
Regular expression cheat sheet
L(a) = {a}   (simple symbol)
L(ab) = {ab}   (concatenation)
L(a | b) = {a, b}   (disjunction)
L(a*) = {ε, a, aa, aaa, ...}   (Kleene star)
L(a+) := L(aa*)   (shorthand for “one or more”)
L([a-z]) := L(a|b|...|z)   (shorthand for a range)
L(a{2,4}) := L(aa|aaa|aaaa)   (shorthand for a given number of repetitions)
L(a{3}) := L(aaa)   (shorthand for a given number of repetitions)
L(a?) := L(ε | a)   (shorthand for “optional”)
L(.) := {a, b, ..., A, ..., 0, ..., !, $, ...}   (shorthand for “any symbol”)
L(\.) := {.}   (escape sequence for special symbols)
14
->Regexes in practice
How can we match regexes?
Douglas Adams’ fictional character names:
[A-Z][a-z]*ch[a-z]*
The Hitchhiker meets the Golgafrinchan civilisation,
but falls in love with a girl named Fenchurch.
15
?
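In practice, such a pattern can be tried out directly with a regex library. A minimal Python sketch (the matches are the capitalized words that contain “ch”):

import re

text = ("The Hitchhiker meets the Golgafrinchan civilisation, "
        "but falls in love with a girl named Fenchurch.")

# a capitalized word that contains "ch" somewhere after the first letter
pattern = r"[A-Z][a-z]*ch[a-z]*"
print(re.findall(pattern, text))
# ['Hitchhiker', 'Golgafrinchan', 'Fenchurch']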
>FSMs
=> We can use finite state machines.
Def: Finite State Machine
A
finite state machine
(aka FSM, Finite state automaton, FSA) is a directed multi-graph,
where each edge is labeled with a symbol or the empty symbol ε .
One node is labeled “start” (by an incoming arrow).
Zero or more nodes are labeled “final” (by a double circle).
16
[Example FSM diagram: a start node and a final node, with edges labeled a, b, c, d]
See a more formal definition in Wikipedia.
The empty transition ε can be walked without accepting a symbol.
Multiple edges are abbreviated: an edge labeled “a, b” := two separate edges labeled a and b.
FSM Recognizing Words
An FSM
recognizes
(also: generates, accepts) a string, if there is a path
from the start node to a final node whose edge labels are the string.
17
ad
bccd
bb
[The same example FSM as above, with edges labeled a, b, c, d]
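As an illustration, here is a minimal Python sketch of this acceptance test for a (possibly nondeterministic) FSM given as a transition table; the automaton at the bottom is the one for the example regular grammar from earlier (language a+ | b+), not the machine drawn on the slide:

from collections import deque

def accepts(transitions, start, finals, word):
    # breadth-first search over pairs (state, position in the word);
    # the symbol '' stands for an epsilon transition
    queue, seen = deque([(start, 0)]), set()
    while queue:
        state, i = queue.popleft()
        if (state, i) in seen:
            continue
        seen.add((state, i))
        if i == len(word) and state in finals:
            return True
        for nxt in transitions.get((state, ''), ()):            # epsilon moves
            queue.append((nxt, i))
        if i < len(word):
            for nxt in transitions.get((state, word[i]), ()):   # consuming moves
                queue.append((nxt, i + 1))
    return False

T = {('S', 'a'): {'A'}, ('A', 'a'): {'A'},
     ('S', 'b'): {'B'}, ('B', 'b'): {'B'}}
print(accepts(T, 'S', {'A', 'B'}, 'aaa'))   # True
print(accepts(T, 'S', {'A', 'B'}, 'ab'))    # False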
>tasks
Task: FSM
Find strings generated by the following FSM (Betelgeuse 7 language):
18
[FSM diagram with a start node, two final nodes, and edges labeled c, l, x, j]
>tasks
Task: FSM
19
Draw an FSM that accepts the following strings:
br, kbr, brbr, kbrbr, brbrbr, kbrbrbr, ...
Draw an FSM that accepts the following strings:
ling, lingping, long, pong, lingpong, ping, pingpong
Regex ≡ FSM
20
Every regex can be transformed into an equivalent FSM (and vice versa):
1. Simplify the regex to contain only concatenation, alternation, and the Kleene star
2. Start with this configuration: a single edge from the start node to a final node, labeled with the entire REGEX
3. Handle alternation: replace an edge labeled X|Y by two parallel edges, one labeled X and one labeled Y
4. Handle concatenation: replace an edge labeled XY by two consecutive edges labeled X and Y (with a new node in between)
5. Handle Kleene star: replace an edge labeled (X)* by an ε‐edge (for zero repetitions) together with a loop labeled X (for further repetitions)
6. Proceed recursively, until all edges are single symbols
21
Regex ≡ FSM
>details
FSM ≡ Right Regular Grammar
Every Right Regular Grammar can be transformed into an equivalent FSM as follows:
• introduce a state for every non‐terminal
• let the grammar start symbol be the initial state of the automaton
• introduce a final state F
• for every rule A→a , add a transition from A to F with symbol a
• for every rule A→aC , add a transition from A to C with symbol a
• for every rule A→ε , add an ε‐transition from A to F
22
>details
Every Right Regular Grammar can be transformed into an equivalent
FSM as follows:
• introduce a state for every non‐terminal
• let the grammar start symbol be the initial state of the automaton
• introduce a final state F
• for every rule A→a , add a transition from A to F with symbol a
• for every rule A→aC , add a transition from A to C with symbol a
• for every rule A→ε , add an ε‐transition from A to F
23
Vice versa, every FSM can be transformed
to a grammar. [e.g.,
Zhang&Qian, 2013
]
N={S,A,B}, s=S, ∑ ={a,b}
P ={ S→ aA, S→ bB, A→ aA, B→ bB, A→ε, B→ε }
[Resulting FSM: states S, A, B and a final state F; transitions S -a-> A, S -b-> B, A -a-> A, B -b-> B, A -ε-> F, B -ε-> F]
>details
FSM ≡ Right Regular Grammar
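A minimal Python sketch of this construction (assuming single‐character terminals and nonterminals, and reusing the accepts() helper sketched earlier):

def grammar_to_fsm(rules, start='S'):
    # rules: list of (lhs, rhs) pairs, where rhs is '' (epsilon),
    # a single terminal 'a', or a terminal followed by a nonterminal 'aC'
    transitions, final = {}, 'F'
    for lhs, rhs in rules:
        if rhs == '':                                          # A -> epsilon
            transitions.setdefault((lhs, ''), set()).add(final)
        elif len(rhs) == 1:                                    # A -> a
            transitions.setdefault((lhs, rhs), set()).add(final)
        else:                                                  # A -> aC
            transitions.setdefault((lhs, rhs[0]), set()).add(rhs[1])
    return transitions, start, {final}

rules = [('S', 'aA'), ('S', 'bB'), ('A', 'aA'), ('B', 'bB'), ('A', ''), ('B', '')]
T, s, F = grammar_to_fsm(rules)
# together with accepts() from above: accepts(T, s, F, 'aaa') == True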
Runtime of FSMs
Given a word of length l and an FSM with n states,
determining whether the FSM accepts the word
• takes O(l) time if no state has several outgoing edges with the same label
• otherwise takes time polynomial in l and n (by determinizing the automaton on the fly)
However, certain regexes can run very slowly on certain inputs, which poses a
security risk
.
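For instance, a backtracking regex engine such as Python’s re can take exponential time on patterns with nested quantifiers (a classic “ReDoS” sketch; the exact timing depends on the machine):

import re, time

evil = re.compile(r"^(a+)+$")     # nested quantifiers
text = "a" * 26 + "b"             # almost matches, forcing massive backtracking

start = time.time()
evil.match(text)
print(f"took {time.time() - start:.1f} s")
# already takes seconds; each additional 'a' roughly doubles the time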
24
There is a looooot more to say about FSMs, e.g.:
• making them deterministic
• compressing them
• making them more powerful
• learning FSMs from examples
The languages that can be generated/recognized by a regular grammar are the
regular languages of the Chomsky Hierarchy.
25
Summary: Regular Languages
Examples for regular languages:
• search/replace patterns in editors
• simple named entities
• numbers, dates, quantities
Regular languages are closed under
• union
• intersection
• complement
• concatenation
• iteration (Kleene star)
Regular languages can be recognized/generated by regular grammars,
regular expressions, or finite state machines (FSMs).
>pumping (need FSMs)
Not all languages are regular
Take the language of algebraic expressions:
L={ (1+2)*3, 4*(5+(6/7)), ...}
Assume there were an automaton A with p states that recognizes this language.
Consider the following word w ∈ L:
w = (((...( 42 )...)))   with p+1 opening parentheses and p+1 closing ones
When A recognizes w, it has to pass through at least one state q twice while reading the opening
parentheses. Let n be the number of parentheses read between two subsequent passes
through q. Then A recognizes the following word w' ∉ L as well:
w' = (((...((...( 42 )...)))   with p+1+n opening parentheses but only p+1 closing ones
26
This proof is known as the pumping lemma.
Overview
27
Formal Languages
Regular languages
Context‐free languages
Context‐sensitive languages
Recursively enumerable languages
Def: Context‐free grammar
28
A
context‐free grammar
has only productions where the left‐hand‐side is a single non‐terminal.
Example:
N={S,NP,VP}
s=S
∑ ={they, run, love, you}
P ={ S→ NP VP, NP→ they, NP→ you, VP→ love, VP→ run }
Phrase Structure Grammars
Alizée is the mother of the little Annily.
29
[Parse tree: the words are tagged PN, V, DET, N, PREP, DET, ADJ, PN, and grouped into NPs, a PP, a VP, and the sentence S]
S -> NP VP
NP -> PN
VP -> V
NP -> DET ADJ N
PN -> Alizée
DET -> the
...
try it out
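As an illustration, the sentence can be parsed with the NLTK library (assuming it is installed); the grammar below is a hypothetical completion of the rules sketched above, not the exact grammar from the slide:

import nltk

grammar = nltk.CFG.fromstring("""
S    -> NP VP
NP   -> PN | DET N PP | DET ADJ PN
VP   -> V NP
PP   -> PREP NP
PN   -> 'Alizée' | 'Annily'
DET  -> 'the'
ADJ  -> 'little'
N    -> 'mother'
PREP -> 'of'
V    -> 'is'
""")

parser = nltk.ChartParser(grammar)
tokens = "Alizée is the mother of the little Annily".split()
for tree in parser.parse(tokens):
    tree.pretty_print()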
>phrase structure
(Some approaches build only binary trees)
Alizée, who has a beautiful voice, is the mother of Annily.
30
[Parse tree for the sentence; the relative clause “who has a beautiful voice” is parsed as a subordinate phrase (SUB ... COMP) attached to the first NP]
The grammar identifies a subordinate phrase.
Phrase Structure Grammars
Context‐free grammars are a standard way to model natural language superficially.
To model all details, we need context‐sensitive grammars (see the next section of this lecture).
Some natural languages are not context‐free,
most notably Swiss German.
This is due to
• cross‐serial dependencies
• scrambling (also in High German)
31
Natural language
>Dep&PDAs
Def: Dependency grammar
Context‐free grammars are equivalent to dependency grammars.
A
dependency grammar
is a set of rules of the form X -L- Y, i.e., an edge with label L between X and Y (one of them the head, the other the dependent),
where X and Y are POS tags, L is a label.
32
[Rule diagrams: a labeled edge L between X and Y (in one variant X is the dependent and Y the head); example rules with the labels subj, obj, det between Verb, Noun/PN, and Det]
[Example dependency parse of “Elvis sings a song.” with edges labeled subj, obj, and det]
[
Joakim Nivre
] There are variants, where X and Y are words, or rules have more edges or no label.
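In practice, dependency parses can be obtained with an off‐the‐shelf parser such as spaCy (a sketch, assuming spaCy and its small English model are installed; its label set differs slightly from the one on the slide):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Elvis sings a song.")
for token in doc:
    # print each word, its dependency label, and its head
    print(token.text, token.dep_, token.head.text)
# e.g.  Elvis nsubj sings / sings ROOT sings / a det song / song dobj sings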
>PDAs
->Dependency-parsing
33
Def: Pushdown Automaton
Given an input alphabet ∑ , a stack alphabet Γ , and a start symbol Z ∈ Γ ,
a
pushdown automaton
(PDA) is a directed multi‐graph, where each edge is labeled with
• a symbol ∈ ∑ to read from the input (or ε )
• a symbol ∈ Γ to pop from the stack (or ε )
• a sequence of symbols ∈ Γ* to push onto the stack
One node is labeled “start”, and zero or more nodes are labeled “final”.
[Example PDA for nested parentheses, with edges: read (, pop ε, push 1; read ), pop 1, push ε; read ε, pop Z, push ε]
34
Def: Running a PDA
A configuration of a PDA is a triple 〈q, w, β〉 of a node q, an input string w ∈ ∑*, and a stack β ∈ Γ*.
The step relation ⊦ is defined for all c ∈ ∑ ∪ {ε}, t ∈ Γ ∪ {ε}, γ ∈ Γ*, w ∈ ∑*, β ∈ Γ*,
as the following relation between configurations:
〈q, cw, tβ〉 ⊦ 〈q', w, γβ〉
if there is an edge from q to q' that is labeled with “read c , pop t , push γ ”.
A word w is recognized by the PDA if 〈q₀, w, Z〉 ⊦* 〈q, ε, γ〉 for some stack γ,
where q₀ is the start node and q is a final node.
The language of a PDA is the set of recognized words.
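A minimal Python sketch of running the parenthesis PDA from the example (it is deterministic, so a simple loop suffices):

def pda_accepts(word):
    stack = ['Z']                      # start symbol on the stack
    for c in word:
        if c == '(':
            stack.append('1')          # read (, pop nothing, push 1
        elif c == ')':
            if not stack or stack[-1] != '1':
                return False
            stack.pop()                # read ), pop 1, push nothing
        else:
            return False
    return stack == ['Z']              # the final epsilon-edge pops Z and accepts

print(pda_accepts('(()())'))   # True
print(pda_accepts('(()'))      # False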
35
Def: Context‐free grammars & PDAs
For every context‐free grammar there is an equivalent pushdown
automaton and vice versa. To construct the PDA for a context‐free
grammar, normalize the rules so that they are all of the following forms:
A→ε , A→ a , A→ BC
Let S be the start symbol. Then construct the (non‐deterministic) PDA as follows:
• for every rule A→ a: an edge “read a , pop A , push ε”
• for every rule A→ BC: an edge “read ε , pop A , push BC”
• for every rule A→ε: an edge “read ε , pop A , push ε”
• an initial edge “read ε , pop ε , push S” and a final edge “read ε , pop Z , push ε”
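A sketch of simulating such a PDA in Python: the stack holds terminals and nonterminals, a nonterminal on top is replaced by the right‐hand side of one of its rules, and a terminal on top must match the next input symbol. The grammar below (nested parentheses in the normalized form) is our own example:

def cfg_pda_accepts(word, rules, start='S'):
    # rules: {nonterminal: list of right-hand sides}, each rhs a string over
    # terminals and nonterminals ('' stands for epsilon)
    def step(i, stack):
        if not stack:
            return i == len(word)              # accept iff the input is used up
        top, rest = stack[0], stack[1:]
        if top in rules:                       # nonterminal: pop it, push a rule body
            return any(step(i, rhs + rest) for rhs in rules[top])
        return i < len(word) and word[i] == top and step(i + 1, rest)
    return step(0, start)

# Dyck language of nested parentheses: S -> P S | eps, P -> L Q, Q -> S R, L -> (, R -> )
dyck = {'S': ['PS', ''], 'P': ['LQ'], 'Q': ['SR'], 'L': ['('], 'R': [')']}
print(cfg_pda_accepts('(()())', dyck))   # True
print(cfg_pda_accepts('())', dyck))      # False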
36
Limitations of context‐free grammars
Context‐free grammars need extensive duplication of non-terminal symbols to model
dependencies between words:
Sentence -> FirstPersonSingularSentence | SecondPerson...
FirstPersonSingularSentence -> “I” FirstPersonSingularVerbPhrase
FirstPersonSingularVerbPhrase -> “go” | “love” | ...
Various remedies have been devised, e.g., using arguments that can be compiled away:
Sentence(p,n) -> NounPhrase(p,n) VerbPhrase(p,n)
NounPhrase(1,s) -> “I”
...or devising “weakly context-sensitive grammars”. However, no context‐free grammar
can recognize
L = { aⁿ bⁿ cⁿ : n > 0 }
Summary: Context‐free languages
37
The languages that can be generated/recognized by a context‐free
grammar are the
context‐free languages
of the Chomsky Hierarchy.
Examples for context‐free languages:
• algebraic expressions
• simple programming languages
• simplistic models of natural language sentences
...or anything with nested structures
Context‐free languages are closed under
• union
• concatenation
• Kleene star
...but not under intersection or complement.
Context‐free languages can be recognized/generated by a context‐free grammar or a pushdown
automaton in polynomial time. Every regular language is a context‐free language.
Context‐free grammars are equivalent to Dependency-grammars.
Overview
38
Formal Languages
Regular languages
Context‐free languages
Context‐sensitive languages
Recursively enumerable languages
Def: Context‐sensitive grammar
39
A
context‐sensitive grammar
has only productions of the form
αAβ→ αγβ
...where A is a non‐terminal, γ is a non‐empty string, and α and β are strings
(consisting of terminal and/or non-terminals). The languages that can be generated/recognized
by a context‐sensitive grammar are the
context‐sensitive languages
of the Chomsky Hierarchy.
Example:
N={S,NP,VP, 1s, 2s, 3s, 1p, 2p, 3p}
s=S
∑ ={they, run, love, you}
P ={ S→ NP 3p VP 3p, NP 3p→ they 3p, VP 3p→ run 3p, 3p→ε ,...}
>non‐contracting
To permit the empty word, one also allows S → ε, provided that S does not appear on the right‐hand side of any rule.
Non‐Contracting grammar
40
An
essentially non‐contracting grammar
is a grammar where the
left‐hand‐side of a production is never longer than its right‐hand‐side,
except if the right‐hand‐side is the empty word.
Every essentially non‐contracting grammar can be transformed into a
context‐sensitive grammar and vice versa.
Def: Turing Machine
41
Given a tape alphabet Γ , a blank symbol b ∈ Γ , and an input alphabet ∑ ⊆ Γ \ {b} ,
a
Turing machine
is a labeled directed multi‐graph, where each edge is labeled with
• a symbol ∈ Γ to read
• a symbol ∈ Γ to write
• a direction (left or right) to move the read‐write head
One node is labeled “start”, and zero or more nodes are labeled “final”.
[Example Turing machine with edges: read e, write e, move R; read h, write f, move R; read m, write m, move R; read o, write e, move R]
An FSM is a Turing machine that can just read and move right.
A Pushdown Automaton is a Turing machine that can just read and move right, and that stores a stack on the tape.
Def: Running a Turing Machine
42
A configuration of a Turing machine is a tuple 〈q, α, β〉 of
• a current node q
• a left string α (the tape content to the left of the head)
• a right string β (the tape content under and to the right of the head)
The step relation ⊦ is defined between two configurations for all a, c, c' ∈ Γ and all strings α, β as
〈q,α,cβ〉 ⊦ 〈q',αc',β〉
if there is an edge from q to q' labeled with “read c , write c' , move R”, and as
〈q,αa,cβ〉 ⊦ 〈q',α,ac'β〉 if the move is L. A word w is recognized if 〈q₀, ε, w bbb...〉 ⊦* 〈q, α, β〉
such that q is a final state, where q₀ is the start node and bbb... is an infinite sequence of blanks.
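A minimal Python sketch of this step relation for a deterministic Turing machine, together with a hypothetical encoding (states q0, q1, q2, halt) of the example machine from the figure:

def run_tm(edges, start, finals, word, blank='_', max_steps=10_000):
    # edges: dict mapping (state, read symbol) -> (next state, write symbol, 'L' or 'R')
    tape = dict(enumerate(word))               # sparse tape, blank everywhere else
    state, head = start, 0
    for _ in range(max_steps):
        if state in finals:
            return True
        key = (state, tape.get(head, blank))
        if key not in edges:
            return False                       # no applicable edge: reject
        state, write, direction = edges[key]
        tape[head] = write
        head += 1 if direction == 'R' else -1
    return False                               # give up (Turing machines need not halt)

# hypothetical states: read h (write f), then o (write e), then m's, then a final e
edges = {('q0', 'h'): ('q1', 'f', 'R'),
         ('q1', 'o'): ('q2', 'e', 'R'),
         ('q2', 'm'): ('q2', 'm', 'R'),
         ('q2', 'e'): ('halt', 'e', 'R')}
print(run_tm(edges, 'q0', {'halt'}, 'homme'))   # True
print(run_tm(edges, 'q0', {'halt'}, 'house'))   # False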
Turing Machines and Computers
43
A Turing machine can simulate multiple tapes, registers, and a Random Access Memory.
=>
A Turing machine has the same computational capability as a computer
(if the finite memory is ignored)
=>
A programming language is called Turing complete, if it can simulate a Turing machine
(which most languages can)
Summary: Context‐sensitive grammars
44
The following are equivalent:
• a language L is context‐sensitive
• L can be recognized by a Turing machine with a bounded tape
(whose length is a constant factor of the length of the input)
• L can be recognized by a context‐sensitive grammar
All regular and context‐free languages are context‐sensitive, but not all languages are
context‐sensitive. Consider, e.g., the language of pairs of equivalent regular expressions
with exponentiation.
Overview
45
Formal Languages
Regular languages
Context‐free languages
Context‐sensitive languages
Recursively enumerable languages
Unrestricted Grammars
46
An
unrestricted grammar
is a grammar with arbitrary productions. For every unrestricted grammar,
there exists a Turing Machine that recognizes the words of the language, and vice versa.
[The example Turing machine from above]
As a grammar:
S -> hoMe
M -> mM
M -> ε
Def: Decidable language
47
A
decidable language
is a language for which there exists a
total Turing Machine
(= one that halts on all inputs) that recognizes it.
All regular, context‐free, and context‐sensitive languages are decidable.
Def: Semi‐decidable language
48
A
semi‐decidable
(recursively enumerable) language is a language L for which there exists
a Turing Machine M so that
• if w ∈ L , then M halts and accepts w
• if w ∉ L , then M either does not halt, or does not accept w
Examples of semi‐decidable, but not decidable problems:
• Halting problem: Does the program terminate?
• Language equality: Do two context-free grammars generate the same language?
• Hilbert’s 10th problem: Does a given polynomial equation with integer coefficients (a Diophantine equation) have an integer solution?
>Halting&FOL
Def: Halting Problem
49
The
halting problem
is the decision problem of all Turing Machines (equivalently: all computer
programs) that terminate (= do not loop forever). This problem is not decidable.
Proof: Assume that there were a Turing Machine H
so that H(p)=true iff program p terminates.
Then construct the program A with
A:
if H(A) then loop
Then ask H(A). If H(A)=true, then A loops forever and thus does not terminate; if H(A)=false, then A terminates. In both cases, H gives the wrong answer, a contradiction.
The problem is not decidable, but semi‐decidable.
Proof: Construct a Turing Machine that runs the program and accepts it when it terminates.
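The diagonalization can be sketched in code, assuming a hypothetical oracle halts(p) that returns True iff the program p terminates (no such function exists, which is the point):

def paradox():
    if halts(paradox):     # hypothetical oracle, only assumed for the argument
        while True:        # if the oracle predicts termination, loop forever
            pass
    # otherwise terminate immediately

# Whatever halts(paradox) would return, the answer is wrong - hence no Turing
# machine (and no program) can implement halts for all inputs.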
>FOL
The
Entscheidungsproblem
is the decision problem of all first‐order‐logic (FOL) formulae that
are valid. This problem is semi‐decidable.
Proof: We can systematically enumerate all valid formulae and check if the input
is one of them:
∀ x: p(x) ⇒ p(x)
∀ x: p(x) ⇒ p(x) ∨ q(x)
∀ x: p(x) ∧ q(x) ⇒ p(x) ∨ q(x)
...
50
Def: Entscheidungsproblem
>proof
The problem is not decidable.
Proof idea: We can construct a formula that is valid iff a Turing Machine halts.
Every Turing Machine M can be transformed to a FOL formula φ(M), such that
φ(M) holds ⇔ M halts
For the example machine (if read 'h', write 'f', move right; if read 'o', write 'e', move right; if read 'm' or 'e', move right):
φ(M) = ∃ t, i: pos(t,i) ∧ ¬ field(t,i,'h') ∧ ...
∧ (∀ t', i': pos(t',i') ∧ field(t',i','h') ⇒ field(t'+1, i', 'f')) ...
51
Turing Machine in FOL
[Tape of the machine M, containing h o m m e]
Proof: Assume that the Entscheidungsproblem were decidable,
i.e. assume that there is an algorithm A such that
A(φ)=1 iff φ is valid
Then A can be used to decide whether φ(M) is valid for a Turing Machine M.
=> A decides the halting problem (impossible)
52
Why FOL is not decidable
The Entscheidungsproblem is decidable if we consider only certain subsets of FOL.
Examples:
• Propositional logic (i.e., no quantifiers)
• Horn rules
• Bernays‐Schönfinkel formulae, i.e., formulae of the form ∃x₁ ... ∃xₙ ∀y₁ ... ∀yₘ : φ (with φ quantifier‐free)
without function symbols
• Description logic
53
Decidable subsets of FOL
->description-logics
Any decision problem that is effectively computable (= can be decided by a human with paper
and pencil) can be computed by
• a Turing Machine,
or, equivalently, by
• λ calculus
• general recursive functions.
54
Church‐Turing Hypothesis
->Dependency-parsing
->Knowledge-representation
-> overview