Fabian M. Suchanek
Fact Extraction
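Semantic IE
[Figure: the Semantic IE pipeline with the stages Source Selection and Preparation, Entity Recognition, Entity Disambiguation, Fact Extraction, KB construction, and Entity Typing; a "You are here" marker shows the current stage; example labels: "singer", "singer Elvis"]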
Overview
• Fact Extraction
• DIPRE Algorithm
• Classification Methods
• General Considerations
Def: Fact Extraction
Fact extraction is the extraction of facts about entities from a corpus.

Bertrand Russell was a British philosopher, mathematician, political activist, and Nobel Laureate. Russell co‐wrote the “Principia Mathematica” and was a member of the India League and the British Humanist Association.
(Wikipedia: Russell; photo: [Anefo])
Facts extracted from this text:
<Bertrand_Russell, type, philosopher>
<Bertrand_Russell, type, mathematician>
<Bertrand_Russell, type, political_activist>
<Bertrand_Russell, won, Nobel_Prize>
<Bertrand_Russell, wrote, Principia_Mathematica>
<Bertrand_Russell, memberOf, India_League>
<Bertrand_Russell, memberOf, British_Humanists>
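The extracted facts are subject-relation-object triples. As a minimal sketch (the `Fact` type and the `objects_of` helper are illustrative names, not part of the lecture), they could be stored and queried like this:

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One <subject, relation, object> triple (hypothetical helper type)."""
    subject: str
    relation: str
    obj: str


# The triples from the Bertrand Russell example.
facts = {
    Fact("Bertrand_Russell", "type", "philosopher"),
    Fact("Bertrand_Russell", "type", "mathematician"),
    Fact("Bertrand_Russell", "type", "political_activist"),
    Fact("Bertrand_Russell", "won", "Nobel_Prize"),
    Fact("Bertrand_Russell", "wrote", "Principia_Mathematica"),
    Fact("Bertrand_Russell", "memberOf", "India_League"),
    Fact("Bertrand_Russell", "memberOf", "British_Humanists"),
}


def objects_of(facts, subject, relation):
    """Return all objects that stand in `relation` to `subject`, sorted."""
    return sorted(
        f.obj for f in facts if f.subject == subject and f.relation == relation
    )


print(objects_of(facts, "Bertrand_Russell", "memberOf"))
```

A set is used here so that the same fact extracted from several sentences is stored only once.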
Assumptions
In the following, we make a number of assumptions:
1) NERC has been run
2) We have a KB with the entities: <Bertrand_Russell>, <person>
3) Disambiguation is already done
4) We focus on a single given relation: <wrote>
5) This relation has a known domain and range: <wrote, domain, person>, <wrote, range, work>

Example text (Wikipedia: Russell; photo: [Anefo]):
Bertrand Russell was a British philosopher, mathematician, political activist, and Nobel Laureate. Russell co‐wrote the “Principia Mathematica” and was a member of the India League and the British Humanist Association.
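Assumption 5 means that a candidate fact can be checked against the domain and range of the relation. The following is only a sketch under stated assumptions: the dictionaries `TYPES` and `SCHEMA` are a hypothetical toy KB, and the type `organization` for `India_League` is an illustrative guess that does not come from the slides.

```python
# Hypothetical toy KB: entity -> set of types.
TYPES = {
    "Bertrand_Russell": {"person"},
    "Principia_Mathematica": {"work"},
    "India_League": {"organization"},  # assumed type, for illustration only
}

# <wrote, domain, person>, <wrote, range, work>
SCHEMA = {
    "wrote": {"domain": "person", "range": "work"},
}


def is_type_consistent(subject, relation, obj):
    """True if the subject has the relation's domain type and the object its range type."""
    schema = SCHEMA[relation]
    return (
        schema["domain"] in TYPES.get(subject, set())
        and schema["range"] in TYPES.get(obj, set())
    )


print(is_type_consistent("Bertrand_Russell", "wrote", "Principia_Mathematica"))
print(is_type_consistent("Bertrand_Russell", "wrote", "India_League"))
```

Such a check can only rule out candidates with the wrong types. It cannot confirm that a type-correct candidate is actually true.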
Why is it difficult?
For the machine, the text is just a sequence of unintelligible symbols.

Text: Bertrand Russell написал “Principia Mathematica”.
(a sequence of symbols in the source language, with no predefined semantics)

KB: <Bertrand_Russell>, <person>, <Principia_Mathematica>
Target: <Bertrand_Russell> wrote <Principia_Mathematica> ?
(a predefined relation with a defined domain and range)
Def: Extraction Pattern
An extraction pattern for a binary relation is a phrase that contains two place‐holders X and Y, and that indicates that X and Y stand in that relation.

Extraction patterns for the relation wrote:
• X ist der Autor von Y. (X is the author of Y.)
• X schrieb Y. (X wrote Y.)
• Xs Werk Y (X's work Y)
• X schrieb das Buch Y. (X wrote the book Y.)
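To illustrate the definition, here is a minimal sketch of how such patterns could be applied with regular expressions. It assumes, as in the assumptions above, that entity recognition and disambiguation are already done; the `MENTIONS` dictionary stands in for their output, and the function names are hypothetical.

```python
import re

# Hypothetical output of NERC and disambiguation: surface form -> KB entity.
MENTIONS = {
    "Bertrand Russell": "Bertrand_Russell",
    "Russell": "Bertrand_Russell",
    "Principia Mathematica": "Principia_Mathematica",
}

# Extraction patterns for the relation "wrote", with place-holders X and Y.
PATTERNS = [
    "X ist der Autor von Y",
    "X schrieb Y",
    "X schrieb das Buch Y",
]


def compile_pattern(pattern):
    """Turn 'X ... Y' into a regex whose place-holders match known mentions."""
    # Longest mentions first, so "Bertrand Russell" is preferred over "Russell".
    mention = "|".join(
        re.escape(m) for m in sorted(MENTIONS, key=len, reverse=True)
    )
    regex = ""
    for part in re.split(r"\b(X|Y)\b", pattern):
        if part in ("X", "Y"):
            regex += f"(?P<{part}>{mention})"
        else:
            regex += re.escape(part)
    return re.compile(regex)


def extract(text, patterns, relation):
    """Return the set of (subject, relation, object) triples found in text."""
    found = set()
    for pattern in patterns:
        for m in compile_pattern(pattern).finditer(text):
            found.add((MENTIONS[m.group("X")], relation, MENTIONS[m.group("Y")]))
    return found


text = "Bertrand Russell schrieb das Buch Principia Mathematica."
print(extract(text, PATTERNS, "wrote"))
```

The matching here is purely literal: a sentence yields a fact only if the words between two known mentions agree exactly with one of the patterns.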
Where do we get the patterns?
• Option 1: Manually compile patterns.
  => This is what you do in the lab
• Option 2: Pattern deduction