CC-BY
Fabian M. Suchanek and Nitisha Jain
Neuro-Symbolic Methods for Fact Prediction
Lecturer 1: Fabian Suchanek
Professor at Télécom Paris, France
https://suchanek.name
I work on several topics broadly related to AI:
• Natural Language Processing
• Data Integration
• Knowledge Bases
• Automated Reasoning
Projects:
• YAGO: a large knowledge base (https://yago-knowledge.org)
• NoRDF: extracting information from natural language (https://nordf.telecom-paris.fr)
• AMIE: mining rules in knowledge bases (https://github.com/lajus/amie)
Lecturer 2: Nitisha Jain
PhD Student at Hasso Plattner Institute
https://nitishajain.github.io/
Working on knowledge bases and natural language processing,
in particular embeddings and domain‐specific knowledge bases.
Joint work with
Armand Boschin
PhD student at Télécom Paris
Gurami Keretchashvili
Master’s student at Télécom Paris
Armand Boschin, Nitisha Jain, G. Keretchashvili, Fabian Suchanek:
“Combining Embeddings and Rules for Fact Prediction”
Summer School on Artificial Intelligence in Bergen (AIB) 2022
Knowledge Base
A knowledge base (KB) is a machine‐readable collection of knowledge
about the real world, which typically has to fulfill semantic constraints.
A KB often takes the form of a graph.
[Figure: example KB graph with the classes Person, Singer, and Country, connected by type and nationality edges]
Constraints:
• countries and persons disjoint
• only people have nationalities
• every singer is also a person
• ...
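The constraints above can be checked mechanically once the KB is stored as triples. The following is only a minimal sketch: the triples, the entity name USA, and the helper names classes_of and violations are illustrative assumptions, not part of YAGO or of any library.

```python
# Toy KB as a set of (subject, predicate, object) triples.
# All entities and helper names are illustrative assumptions.
KB = {
    ("Elvis", "type", "Singer"),
    ("Elvis", "nationality", "USA"),
    ("USA", "type", "Country"),
}
SUBCLASS_OF = {"Singer": "Person"}              # every singer is also a person
DISJOINT = {frozenset({"Person", "Country"})}   # countries and persons disjoint


def classes_of(kb, entity):
    """Return the declared classes of an entity plus all their superclasses."""
    found = {o for s, p, o in kb if s == entity and p == "type"}
    frontier = list(found)
    while frontier:
        parent = SUBCLASS_OF.get(frontier.pop())
        if parent is not None and parent not in found:
            found.add(parent)
            frontier.append(parent)
    return found


def violations(kb):
    """List descriptions of the constraints that the KB violates."""
    problems = []
    for entity in sorted({s for s, _, _ in kb}):
        cls = classes_of(kb, entity)
        for pair in DISJOINT:
            if pair <= cls:
                problems.append(f"{entity} belongs to disjoint classes {sorted(pair)}")
    for s, p, _ in sorted(kb):
        # only people have nationalities
        if p == "nationality" and "Person" not in classes_of(kb, s):
            problems.append(f"{s} has a nationality but is not a person")
    return problems


print(violations(KB))
print(violations(KB | {("USA", "nationality", "USA")}))
```

The first call reports no problems; the second flags the added triple, because USA is a Country and not a Person.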
Example: YAGO KB about Elvis
Try it out!
Fact Prediction
[Figure: example of a fact to be predicted, with the property schema:homeLocation]
Neuro‐Symbolic Fact Prediction
1. Fact prediction by symbolic means (Rule Mining)
2. Fact prediction by neural means (Link Prediction)
3. Fact prediction by combining both
Neuro‐Symbolic Fact Prediction
Rule Mining
+ produces patterns that can be understood by humans
+ can work together with the schema of the KB, axioms
+ can deal with literals and numerical values
+ is typically evaluated under the open world assumption
Link Prediction
+ predicts facts jointly from all available evidence
+ is not limited to the properties that appear in a rule
+ predicts facts with high confidence
=> Can we combine the two?
1st Group: Impose constraints on NN
The first group of neuro‐symbolic approaches uses a neural network for
link prediction, but constrains it by the semantic constraints from the KB.
KB triples                        f
Elvis, married, Priscilla         0.8
Elvis, knows, Priscilla           0.9
Elvis, knows, Elizabeth           0.7
Clara, married, Sara              0.6
Clara, knows, Sara                0.7
Elizabeth, married, Philip        0.8
Elizabeth, knows, Philip (?)      ?
Here, f is the scoring function of the network that scores a triple.
Assume there is a constraint married(x, y) ⇒ knows(x, y).
Does knows(Elizabeth, Philip) hold? Yes, it is implied by the constraint
married(x, y) ⇒ knows(x, y), because married(Elizabeth, Philip) is in the KB.
With the constraint, the missing score respects the implication:
f(〈Elizabeth, knows, Philip〉) = 0.9 ≥ f(〈Elizabeth, married, Philip〉) = 0.8
This can be enforced by
f(〈x, r, y〉) ≥ f(〈x, s, y〉) for r = knows, s = married,
or alternatively by [formula not preserved], where [symbol not preserved] is a positive learned vector.
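One way to picture the score inequality is as a penalty on violations, or as a projection that repairs them. The sketch below is an assumption-laden illustration, not the method of any particular paper: the score 0.3 for knows(Elizabeth, Philip) is a hypothetical unconstrained output, and implication_penalty and project are made-up helper names.

```python
# Scores taken from the example; the last one is a hypothetical
# unconstrained network output (assumption for illustration).
scores = {
    ("Elvis", "married", "Priscilla"): 0.8,
    ("Elvis", "knows", "Priscilla"): 0.9,
    ("Clara", "married", "Sara"): 0.6,
    ("Clara", "knows", "Sara"): 0.7,
    ("Elizabeth", "married", "Philip"): 0.8,
    ("Elizabeth", "knows", "Philip"): 0.3,
}


def implication_penalty(scores, premise, conclusion):
    """Soft variant: sum of max(0, f(x, premise, y) - f(x, conclusion, y))."""
    total = 0.0
    for (x, r, y), f_premise in scores.items():
        if r == premise:
            f_conclusion = scores.get((x, conclusion, y), 0.0)
            total += max(0.0, f_premise - f_conclusion)
    return total


def project(scores, premise, conclusion):
    """Hard variant: raise each conclusion score to at least its premise score."""
    fixed = dict(scores)
    for (x, r, y), f_premise in scores.items():
        if r == premise:
            key = (x, conclusion, y)
            fixed[key] = max(fixed.get(key, 0.0), f_premise)
    return fixed


print(implication_penalty(scores, "married", "knows"))
fixed = project(scores, "married", "knows")
print(fixed[("Elizabeth", "knows", "Philip")])
print(implication_penalty(fixed, "married", "knows"))
```

In a real model the penalty would be added to the training loss instead of editing scores directly; the projection is shown only to make the inequality concrete.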
We have seen this for constraints of the form married(x, y) ⇒ knows(x, y).
Assume there is a constraint human ⊆ animal.
... then we can model this as human(x, true) ⇒ animal(x, true).
Assume there is a constraint married(x, y) ⇔ spouse(x, y).
... then we can impose [formula not preserved].
Assume there is a constraint husbandOfWife(x, y) ⇔ wifeOfHusband(y, x).
... then we can impose [formula not preserved]. In TransE, this entails [formula not preserved].
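The formulas for the inverse constraint are not preserved here, so the following is only one plausible reading. It assumes that the constraint is imposed as f(〈x, r, y〉) = f(〈y, s, x〉) and that TransE scores a triple by the negative distance between head + relation and tail. Under these assumptions, choosing s = -r makes the two scores coincide, which the sketch checks numerically.

```python
import math
import random


def transe_score(head, rel, tail):
    """TransE score: negative Euclidean distance between head + rel and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))


random.seed(0)
dim = 4
x = [random.uniform(-1, 1) for _ in range(dim)]   # embedding of entity x
y = [random.uniform(-1, 1) for _ in range(dim)]   # embedding of entity y
r = [random.uniform(-1, 1) for _ in range(dim)]   # e.g. husbandOfWife
s = [-v for v in r]                               # assumed tie: s = -r

forward = transe_score(x, r, y)    # f(<x, r, y>)
backward = transe_score(y, s, x)   # f(<y, s, x>)
print(forward, backward)
```

The two printed values agree up to floating-point rounding, since y + s - x is the negation of x + r - y when s = -r.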
2nd Group: Complex constraints
The second group of approaches considers more complex logical
constraints on the embeddings. It finds predictions that are logically
inconsistent and uses them as negative samples in the retraining step.
Assume there is a prediction
locatedIn(Samsung, EmmanuelMacron)
1) A reasoner can be used to flag this prediction as inconsistent
(companies are located in places, not in people)
2) Next, more negative samples with the same inconsistency
pattern can be generated, e.g.,
locatedIn(Samsung, JoeBiden)
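The two steps can be sketched with a very simple stand-in for the reasoner. This is an illustration under stated assumptions: the RANGE and TYPES tables are hypothetical, and a real system would use an ontology reasoner instead of the type lookup shown here.

```python
# Hypothetical schema and type assertions (assumptions for illustration).
RANGE = {"locatedIn": "Place"}   # locatedIn expects a place as its object
TYPES = {
    "Samsung": "Company",
    "SouthKorea": "Place",
    "EmmanuelMacron": "Person",
    "JoeBiden": "Person",
}


def is_inconsistent(triple):
    """Step 1: flag a triple whose object type contradicts the relation's range."""
    _, relation, obj = triple
    expected = RANGE.get(relation)
    return expected is not None and obj in TYPES and TYPES[obj] != expected


def same_pattern_negatives(triple):
    """Step 2: more negatives, swapping in other objects of the same wrong type."""
    subject, relation, obj = triple
    wrong_type = TYPES[obj]
    return [
        (subject, relation, other)
        for other, entity_type in sorted(TYPES.items())
        if entity_type == wrong_type and other != obj
    ]


prediction = ("Samsung", "locatedIn", "EmmanuelMacron")
if is_inconsistent(prediction):
    print([prediction] + same_pattern_negatives(prediction))
```

Objects with unknown type are not flagged, so that missing type information alone does not produce a negative sample.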
(ReasonKGE framework)
One can also use ontological properties to add negative samples
and add constraints on the scoring function.
Ontological property: disjointWith(person, location)
For locatedIn(Samsung, South Korea), negative samples could be obtained by
replacing South Korea by Sheryl Sandberg, Arianna Huffington, etc.
3rd Group: Learn rules & embeddings
The third group of approaches to neuro‐symbolic fact prediction takes
as input the KB and rules, and learns to score both rules and facts,
jointly or iteratively.
from KB: married(Priscilla, Elvis), with score f(〈x, r, y〉)
from rule mining: married(x, y) ∧ livesIn(x, z) ⇒ livesIn(y, z), with score f(p ⇒ q)
The score of a rule is derived from the scores of its parts:
f(p ⇒ q) = f(¬p ∨ q) = f(¬(p ∧ ¬q))
= 1 - f(p) × (1 - f(q))
= 1 - f(p) + f(p) × f(q)
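The derivation can be checked with a few lines of code. The sketch assumes that negation is scored as 1 - f and conjunction as a product, which is what the derivation uses; the fact scores and the entity Memphis are hypothetical values chosen for illustration.

```python
def score_not(p):
    """Soft negation: f(not p) = 1 - f(p)."""
    return 1.0 - p


def score_and(p, q):
    """Product conjunction: f(p and q) = f(p) * f(q)."""
    return p * q


def score_implies(p, q):
    """f(p => q) = f(not (p and not q)) = 1 - f(p) + f(p) * f(q)."""
    return score_not(score_and(p, score_not(q)))


# Hypothetical fact scores for one grounding of the rule
# married(x, y) and livesIn(x, z) => livesIn(y, z)
f_married = 0.8   # married(Priscilla, Elvis)
f_lives_x = 0.9   # livesIn(Priscilla, Memphis)
f_lives_y = 0.5   # livesIn(Elvis, Memphis)

body = score_and(f_married, f_lives_x)
rule = score_implies(body, f_lives_y)
print(round(rule, 3))
```

With these numbers the rule score comes out at about 0.64. A body that is fully true with a head that is fully false gives 0, and a false body gives 1, as expected for an implication.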
The fact scores f(〈x, r, y〉) and the rule scores
f(p ⇒ q) = 1 - f(p) + f(p) × f(q)
are learned jointly.
[Figure: iterative variant. The rules from rule mining (e.g., married(x, y) ∧ livesIn(x, z) ⇒ livesIn(y, z)) and the facts from the KB (e.g., married(Priscilla, Elvis)) are used to learn Embeddings, which yield Predictions.]
4th Group: Speed up Rule Mining
The fourth group of approaches to neuro‐symbolic fact prediction uses
embeddings to sample the KB so as to speed up rule mining, e.g.,
• by identifying interesting classes
Class hierarchy: person ⊇ singer ⊇ rock singer
Should we learn rules about people, about singers, about rock singers,
or about all? (all is too expensive)
Embeddings will cluster instances not by their declared class,
but by the attributes they share.
=> learn rules about singers, not rock singers
• by sampling relation paths
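In the RESCAL link prediction method, a fact is scored as
f(〈x, r, y〉) = xᵀ M_r y
where x and y are the embedding vectors of the entities and M_r is a learned matrix that represents r.
The quality of a rule [rule not preserved] can then be estimated by checking [formula not preserved].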
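Since the rule and the check are not preserved here, the following sketch shows only one plausible reading. It assumes a path rule r1(x, z) ∧ r2(z, y) ⇒ r(x, y) and estimates its quality by how close the product of the relation matrices M_r1 M_r2 is to M_r. The relation names and the matrices are hypothetical, chosen so that the rule fits perfectly.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def rescal_score(x, m_r, y):
    """RESCAL score of <x, r, y>: x^T M_r y."""
    return sum(xi * vi for xi, vi in zip(x, matvec(m_r, y)))


def frobenius_distance(a, b):
    """Frobenius norm of a - b; small values mean the matrices are similar."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) ** 0.5


# Hypothetical learned relation matrices for 2-dimensional embeddings.
M_bornIn = [[0.0, 1.0], [1.0, 0.0]]
M_cityOf = [[2.0, 0.0], [0.0, 1.0]]
M_nationality = [[0.0, 1.0], [2.0, 0.0]]

# Candidate path rule: bornIn(x, z) and cityOf(z, y) => nationality(x, y)
composed = matmul(M_bornIn, M_cityOf)
print(frobenius_distance(composed, M_nationality))
```

A distance near zero would suggest that the rule is worth passing to the rule miner for exact evaluation on the KB; a large distance would suggest skipping it.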
Neuro‐Symbolic Fact Prediction
Rule Mining
+ produces patterns that can be understood by humans
+ can work together with the schema of the KB, axioms
+ can deal with literals and numerical values
+ is typically evaluated under the open world assumption
Link Prediction
+ predicts facts jointly from all available evidence
+ is not limited to the properties that appear in a rule
+ predicts facts with high confidence
Promising approaches to marry the two
• adding logical constraints to link prediction
• learning rule weights and fact scores jointly
• using embeddings to speed up rule mining
References in our article