CC-BY
Fabian M. Suchanek
Fact Extraction
Semantic IE
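[Figure: the Semantic IE pipeline, with the stages Source Selection and Preparation, Entity Recognition, Entity Disambiguation, Entity Typing, Fact Extraction, and KB construction, the example labels "singer" and "singer Elvis", and a "You are here" marker.]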
Fact Extraction
• Definitions
• without background knowledge
• by extraction patterns
• by Large Language Models
• by Natural Language Inference
• with background knowledge
• by the DIPRE Algorithm
• by classification
• General Considerations
• Semantic Representations
Def: Open Information Extraction
Open Information Extraction (Open IE) extracts a triple of a subject, verb, and object from a sentence without canonicalizing these components.
It has been said that man is a rational animal. All my life I have been searching for evidence that could support this. [Bertrand Russell]
Example triple: 〈man, is, rational animal〉
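To make the definition concrete, here is a minimal sketch of an Open IE step in Python. It is not the method of any particular Open IE system: the `COPULA` pattern, the `Triple` type, and the `extract` function are hypothetical names introduced for illustration, and the pattern only covers sentences of the form "X is a Y". The point is that the extracted components stay plain strings, with no canonicalization.

```python
import re
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """A surface-level triple: all three components are plain strings."""
    subject: str
    verb: str
    obj: str


# Hypothetical toy pattern (an assumption, not a real Open IE system):
# "<word> is [a|an|the] <phrase>", up to the next punctuation mark.
COPULA = re.compile(r"\b(\w+)\s+(is)\s+(?:(?:an?|the)\s+)?([\w ]+?)\s*[.,;!?]")


def extract(sentence: str) -> Optional[Triple]:
    """Return the first 'X is Y' triple found in the sentence, or None."""
    match = COPULA.search(sentence)
    if match is None:
        return None
    subject, verb, obj = match.groups()
    # No canonicalization: the strings are returned exactly as they occur.
    return Triple(subject, verb, obj)


sentence = "It has been said that man is a rational animal."
triple = extract(sentence)
print(triple)
```

Real Open IE systems cover far more sentence structures than this single pattern; the sketch only shows the shape of the output.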
Open IE Uses
Example:
“Who built the pyramids?”
〈 ?, built, pyramids 〉
62 answers from 584 sentences
Egyptians (132)
Ancient Egypt (123)
aliens (44)
the people (38)
slaves (29)
Khufu (23)
the Pharaohs (17)
the men (16)
the kings (11)
the ones (9)
The question (8)
In many applications, Open IE is fully sufficient (e.g., natural language question answering).
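The question answering use case can be sketched as a lookup over Open IE triples: the question becomes a triple with a gap, and the answers are the strings that fill the gap, ranked by frequency. The triples below are a small hypothetical sample, not the data behind the counts on this slide, and `answer` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical sample of Open IE triples (surface strings, not canonicalized).
triples = [
    ("Egyptians", "built", "pyramids"),
    ("Ancient Egypt", "built", "pyramids"),
    ("Egyptians", "built", "pyramids"),
    ("aliens", "built", "pyramids"),
    ("Khufu", "built", "pyramids"),
    ("Romans", "built", "aqueducts"),
]


def answer(triples, verb, obj):
    """Count the subjects of all triples that match <?, verb, obj>."""
    return Counter(s for s, v, o in triples if v == verb and o == obj)


# "Who built the pyramids?" becomes the pattern <?, built, pyramids>.
answers = answer(triples, "built", "pyramids")
for subject, count in answers.most_common():
    print(f"{subject} ({count})")
```

Note that "Egyptians" and "Ancient Egypt" are counted as separate answers, because the matching works on strings only.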
Open IE Weaknesses
Open IE is less useful for tasks that require knowing which strings denote the same entity or relation, such as:
• counting facts
• counting entities
• logical reasoning
• fact checking
Def: Canonical knowledge base
After canonicalization:
<Ancient_Egypt, built, Giza_pyramids>
<Great_Giza_Pyramid, partOf, Giza_pyramids>
<Pyramid_of_Khafre, partOf, Giza_pyramids>
<Pyramid_of_Menkaure, partOf, Giza_pyramids>
In a canonical knowledge base, every entity and every relation exists exactly once, and has a unique identifier.
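As a rough illustration of canonicalization, the sketch below maps surface strings to unique identifiers and stores the resulting facts in a set, so that each fact exists only once. The alias tables `ENTITY_IDS` and `RELATION_IDS` are hypothetical and hand-written; in particular, mapping "Egyptians" to `Ancient_Egypt` is a simplifying assumption. In practice, this mapping is the hard part and is the job of entity disambiguation and relation mapping.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]

# Hypothetical alias tables (assumptions for illustration only).
ENTITY_IDS = {
    "Egyptians": "Ancient_Egypt",
    "Ancient Egypt": "Ancient_Egypt",
    "pyramids": "Giza_pyramids",
    "the pyramids": "Giza_pyramids",
}
RELATION_IDS = {
    "built": "built",
    "constructed": "built",
}


def canonicalize(triple: Fact) -> Optional[Fact]:
    """Map a surface triple to identifiers; return None if a part is unknown."""
    s, v, o = triple
    if s not in ENTITY_IDS or v not in RELATION_IDS or o not in ENTITY_IDS:
        return None
    return (ENTITY_IDS[s], RELATION_IDS[v], ENTITY_IDS[o])


surface_triples = [
    ("Egyptians", "built", "pyramids"),
    ("Ancient Egypt", "constructed", "the pyramids"),
    ("the ones", "built", "pyramids"),  # no known entity: dropped
]

# A set stores every canonical fact exactly once.
kb: Set[Fact] = set()
for t in surface_triples:
    fact = canonicalize(t)
    if fact is not None:
        kb.add(fact)

print(kb)
```

Two differently worded surface triples collapse into one canonical fact, which is what makes counting and reasoning over the knowledge base meaningful.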
Def: Fact Extraction
Fact extraction is the extraction of canonicalized facts about entities from a corpus.
Bertrand Russell was a British philosopher, mathematician,
political activist, and Nobel Laureate. Russell co‐wrote the
“Principia Mathematica”.
[via TheFamousPeople]