The NoRDF Project
Fabian Suchanek
Amazing! This talk is free
of the Corona virus!
(about the speaker, we don’t know...)
Knowledge Bases
For us, a knowledge base (KB) is a graph, where
the nodes are entities and the edges are relations.
[Figure: example knowledge graph with the nodes 1935, Singer, and Person and the edges type, born, and subclassOf.]
(We do not distinguish T-Box and A-Box.)
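A minimal sketch of this graph view, assuming plain Python tuples as edges. The relation and class names are the labels on this slide; the entity name "Elvis" is an assumption borrowed from the question on the next slide, and the helper names are hypothetical.

```python
from collections import defaultdict

# Facts as (subject, relation, object) edges of the graph.
# "Elvis" is an assumed entity name; the other labels come from the slide.
FACTS = [
    ("Elvis", "type", "Singer"),
    ("Elvis", "born", "1935"),
    ("Singer", "subclassOf", "Person"),
]

def build_graph(facts):
    """Index the edges by subject: node -> relation -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in facts:
        graph[subject][relation].add(obj)
    return graph

def classes_of(graph, entity):
    """Return all classes of an entity, following subclassOf transitively."""
    found = set()
    todo = list(graph[entity]["type"])
    while todo:
        cls = todo.pop()
        if cls not in found:
            found.add(cls)
            todo.extend(graph[cls]["subclassOf"])
    return found

graph = build_graph(FACTS)
print(graph["Elvis"]["born"])              # {'1935'}
print(sorted(classes_of(graph, "Elvis")))  # ['Person', 'Singer']
```

Instance facts and class facts live in the same edge list here, which mirrors the remark that T-Box and A-Box are not distinguished.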
Cool knowledge‐based applications
• Apple Siri: “When was Elvis born?” “1935”
• IBM Watson: discovered 6 kinase proteins that relate to cancer
• Amazon Echo: “How long was the Thirty Years’ War?”
These applications feed on knowledge bases.
There are plenty of knowledge bases
NELL
TextRunner
Plus industrial projects at
Sponsored Message: YAGO
We develop YAGO, one of the largest open general-purpose KBs.
The newest version, YAGO4,
• combines Wikidata and schema.org
• contains 50 million entities and 2 billion facts
• is so clean that it allows for automated reasoning
https://yago-knowledge.org
What’s in a knowledge base?
Essentially binary facts (“triples”) in the knowledge format “RDF”.
[Figure: example triples from YAGO.]
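To make the triple format concrete, here is a small sketch that writes binary facts in the N-Triples syntax of RDF. The namespace `http://example.org/` and the facts themselves are illustrative assumptions, not actual YAGO data; only the `rdf:type` IRI is the standard one.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, not YAGO's

def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"

def literal(value):
    """Write a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

# Two illustrative binary facts (assumed, not taken from YAGO).
triples = [
    (iri(EX + "Elvis"), iri(RDF_TYPE), iri(EX + "Singer")),
    (iri(EX + "Elvis"), iri(EX + "born"), literal("1935")),
]
print(to_ntriples(triples))
```

Each line carries exactly one subject, one predicate, and one object, which is why facts with more than two participants need extra machinery.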
What’s in the real world?
In February 1998, Andrew Wakefield published a paper in the medical
journal The Lancet, which reported on twelve children with
developmental disorders. The parents were said to have linked the start
of behavioral symptoms to vaccination. The resulting controversy
became the biggest science story of 2002. As a result, vaccination rates
dropped sharply. In 2011, the BMJ detailed how Wakefield had faked
some of the data behind the 1998 Lancet article.
Beliefs
Claims
Events
Reasons
Stories
Falsifications
...none of which is in a knowledge base!
The NoRDF Project: Go Beyond Triples
If we want tomorrow’s intelligent applications to be really intelligent,
we have to extend their knowledge bases by beliefs, claims, events,
reasons, stories, and falsifications.
1) We have to be able to extract complex knowledge from text
(a process called “Information Extraction”, “IE”)
2) We have to be able to represent such knowledge and to reason on it
IE: What is possible already
Several cool approaches can extract non‐binary information:
- FRED
- K-Parser
- Document spanners
- ClausIE
- StuffIE
- OpenIE
- HighLife
- Abstract Meaning Representation (AMR)
[Figure: the sentence “Andrew Wakefield published in The Lancet in 1998.” mapped to a Publication_event frame with the roles author, venue, and time.]
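The publication example on this slide is a non-binary fact: one event with three roles. A minimal sketch, assuming a Python dataclass as the frame; the class name `PublicationEvent`, the identifier `event1`, and the helper `frame_to_triples` are hypothetical. It also shows the usual workaround of hanging binary triples off an event node.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class PublicationEvent:
    """Frame with the three roles named on the slide."""
    author: str
    venue: str
    time: int

def frame_to_triples(frame_id, frame):
    """Decompose an n-ary frame into binary triples around an event node."""
    triples = [(frame_id, "type", type(frame).__name__)]
    for role, filler in asdict(frame).items():
        triples.append((frame_id, role, filler))
    return triples

event = PublicationEvent(author="Andrew Wakefield", venue="The Lancet", time=1998)
for triple in frame_to_triples("event1", event):
    print(triple)
```

The decomposition keeps the roles together only through the shared identifier `event1`; none of the individual triples states the full fact on its own.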
IE: What we need
“Wakefield published a paper that reported on children. Their parents
were said to have linked the start of behavioral symptoms to vaccination.
The resulting controversy caused vaccination rates to fall. ...”
[Figure: frame graph extracted from the text above. Nodes: Publication, RateChange, Wakefield, paper, Claim, symptoms, children, vaccination, Link, parents, vaccinationRate, “-”. Edge labels: caused, of, direction, author, pub., content, about, by.]
Cross‐sentence analysis, advanced co‐reference resolution, standardized
types of frames, relationships between events, negation, hypothetical
stances, storylines, ...
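One of the requirements above, relationships between events, can be sketched with frames that refer to each other. The frame and slot names are labels from this slide; the identifiers `f1` and `f2`, the helper names, and the exact wiring (the publication caused the rate change) are assumptions made for illustration.

```python
import json

# Frames keyed by hypothetical identifiers; slot names follow the slide labels.
FRAMES = {
    "f1": {"frame": "Publication", "author": "Wakefield", "pub": "paper"},
    "f2": {"frame": "RateChange", "of": "vaccinationRate", "direction": "-"},
}

# Relations between whole frames (assumed wiring, see lead-in).
LINKS = [("f1", "caused", "f2")]

def causes_of(frame_id, links):
    """Return the identifiers of all frames linked to frame_id by 'caused'."""
    return [source for source, relation, target in links
            if relation == "caused" and target == frame_id]

def describe(frame_id, frames):
    """Render one frame as JSON with a stable key order."""
    return json.dumps(frames[frame_id], sort_keys=True)

for cause in causes_of("f2", LINKS):
    print(describe(cause, FRAMES))
```

Answering "what caused the rate change?" then becomes a lookup over links between frames rather than over individual entities.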
IE: Why Deep Learning is not enough
“Wakefield published a paper that reported on children. Their parents
were said to have linked the start of behavioral symptoms to vaccination.
The resulting controversy caused vaccination rates to fall. ...”
Did Wakefield publish a paper?
Who published a paper?
Were vaccination rates higher before the publication?
What caused the controversy?
Does vaccination cause autism?
What nationality is the person who caused the vaccine controversy?
?
Reasoning: What we have
[Figure: frame graph with the nodes Publication, Wakefield, paper, RateChange, vaccinationRate, and “-” and the edges author, pub., caused, of, and direction.]
As knowledge representation:
- Frames, JSON
- complex objects
- object-relational databases
- Fact identifiers
- RDF*
- Reification
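The last three items share one idea: a fact gets a handle, so that other facts can talk about it. A minimal sketch of fact identifiers, assuming a plain dictionary as the store; the identifiers, relation names, and helpers are hypothetical, and neither the RDF* syntax nor the RDF reification vocabulary is reproduced here.

```python
# Each fact has an identifier, so another fact can use it as an argument.
FACTS = {
    "#1": ("Wakefield", "published", "paper"),
    "#2": ("#1", "venue", "The Lancet"),
    "#3": ("#1", "time", "1998"),
    "#4": ("symptoms", "linkedTo", "vaccination"),
    "#5": ("parents", "claim", "#4"),  # a claim about fact #4, not an endorsement of it
}

def expand(term, facts):
    """Replace a fact identifier by the (recursively expanded) fact it names."""
    if term in facts:
        return tuple(expand(part, facts) for part in facts[term])
    return term

def statements_about(fact_id, facts):
    """Return all facts that mention the given fact identifier."""
    return {fid: fact for fid, fact in facts.items() if fact_id in fact}

print(expand("#5", FACTS))
print(sorted(statements_about("#1", FACTS)))
```

Note that fact #4 sits in the same store as the asserted facts; keeping claimed and asserted statements apart is left open in this sketch.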
For reasoning:
- RDFS, OWL DL, SHACL
- Description Logic
- Context logics
- Modal logics
- Epistemic logics
- Formal argumentation
- Belief revision
- Provenance and annotated logics
Cannot represent
- “All clients believe that the company delivers a good service”
- “the loss of value on the stock market happened because the
public learned of a fraudulent activity by the company”
- “Mary believes everything Paul says, Paul says
Mary believes
”
... or if they can, they are undecidable
Reasoning: What we need
1) a very simple logic inside a context
2) a very simple logic about contexts
=> a moderately simple logic in combination
First‐order logic without
?
OWL EL?
Datalog?
Horn Rules?
Datalog?
You have a great idea? Let me know!
(?)
(?)
Vagueness, fuzziness, and probability: orthogonal topics
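The slide leaves the choice of logic open. Purely as an illustration of the two layers, and not as the project's answer, the sketch below uses one-premise Horn-style rules inside each context and inclusion rules between contexts (in the spirit of “Mary believes everything Paul says”). The context names, facts, rule formats, and function names are all assumptions.

```python
# Contexts are named sets of ground facts (assumed example content).
CONTEXTS = {
    "says(Paul)": {("Elvis", "type", "Singer")},
    "believes(Mary)": set(),
}

# Logic inside a context: r(x, a) -> r2(x, b), written as ((r, a), (r2, b)).
INSIDE_RULES = [(("type", "Singer"), ("type", "Person"))]

# Logic about contexts: every fact of the source also holds in the target.
CONTEXT_RULES = [("says(Paul)", "believes(Mary)")]

def apply_inside(facts, rules):
    """Apply the one-premise rules to a fact set until nothing changes."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for (body_rel, body_obj), (head_rel, head_obj) in rules:
            for subject, relation, obj in list(facts):
                new = (subject, head_rel, head_obj)
                if relation == body_rel and obj == body_obj and new not in facts:
                    facts.add(new)
                    changed = True
    return facts

def saturate(contexts, inside_rules, context_rules):
    """Alternate both rule layers until a fixpoint is reached."""
    contexts = {name: set(facts) for name, facts in contexts.items()}
    changed = True
    while changed:
        changed = False
        for name in contexts:
            closed = apply_inside(contexts[name], inside_rules)
            if closed != contexts[name]:
                contexts[name] = closed
                changed = True
        for source, target in context_rules:
            if not contexts[source] <= contexts[target]:
                contexts[target] |= contexts[source]
                changed = True
    return contexts

result = saturate(CONTEXTS, INSIDE_RULES, CONTEXT_RULES)
print(sorted(result["believes(Mary)"]))
```

Both layers only ever add ground facts over a finite vocabulary, so the loop terminates. This covers none of the harder cases listed on the previous slide, such as nested or quantified beliefs.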
Applications
• Analysis of fake news / fact checking:
understand an article about a controversial topic, allow reasoning
(who said what when and why, what is the evidence, ...)
• Analysis of the e-reputation of a company:
extract controversy or beliefs with reasons and supporters,
for companies or their products
• Modeling of controversies:
detect a controversial topic on the Web (in blogs, forums, Twitter),
extract opinions, and model different views
Understanding the arguments of the other side
is a prerequisite for refuting them.
Applications
• Flagging of potentially fraudulent activity:
Detect claims that contradict knowledge, or violate rules.
• Modeling of processes:
Model sequences of actions, causal relationships, and suggestions.
• Smarter chatbots:
Allow dialogues that go beyond single-shot questions.
• Legal text understanding:
Analyze a law, a regulation, or a contract, and derive
what is permitted and what is obligatory for which party.
Our project “NoRDF”