PARIS: Probabilistic Alignment of Relations, Instances, and Schema Fabian M. Suchanek (Max Planck Institute for Informatics) Serge Abiteboul (INRIA Saclay, Webdam team) Pierre Senellart (Télécom ParisTech) 1 CC-BY   Fabian M. Suchanek . This talk was originally given at VLDB 2012. To obtain a PDF version of these slides, open them in Chrome, and print to PDF.
RDF Ontologies Singer plays type born 2 Person An RDF ontology can be seen as a graph of entities subclassOf 1935
RDF Ontologies Singer plays type born 3 Person An RDF ontology can be seen as a graph of entities subclassOf classes relations / properties instances literals 1935
RDF Ontologies instances relations / properties classes literals Singer plays type born 4 Person An RDF ontology can be seen as a graph of entities subclassOf intelligent search, QA, machine translation... Ontologies serve all kinds of purposes: 1935
Existing Ontologies There are literally hundreds of ontologies on the Web Singer type born plays 5 Person subclassOf 1935 ]>
Overlapping Data birthDate 1935 type RockSinger marriedTo YAGO ElvisPedia born Singer type 6 Many ontologies contain similar or overlapping entities and facts. plays 1935
Problem: Elvis is lonely ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 7 plays Who is the spouse of the guitar player? ? 1935 marriedTo
Problem: Elvis is lonely ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 8 plays Who is the spouse of the guitar player? ? 1935 marriedTo Complementary  information cannot be used
Solution: Unify Entities ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 9 plays 1935 marriedTo identical
Goal: Merging Ontologies ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 10 plays To merge two ontologies, we have to identify 1935
Goal: Merging Ontologies ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 11 To merge two ontologies, we have to identify • equivalent instances plays sameAs 1935
Goal: Merging Ontologies subClassOf ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 12 To merge two ontologies, we have to identify • equivalent instances • equivalent or subsuming classes sameAs plays 1935
Goal: Merging Ontologies sameAs subClassOf ElvisPedia birthDate type Singer marriedTo type YAGO 1935 born 1935 RockSinger 13 To merge two ontologies, we have to identify • equivalent instances • equivalent or subsuming classes • equivalent or subsuming relations subPropertyOf plays
Previous work Previous work • uses hard logical constraints, which may be inadequate • requires parameter tuning • has not been tried on large ontologies birthDate marriedTo 1935 RockSinger type 14 type Singer born 1935 [Gracia 2009, Jean-Mary 2009, Isaac 2007, Aumueller 2005, Wang 2008, Noessner 2010,  Sais 2007/2009, Arasu 2009, Volz 2009,   Bhattacharya 2007, Hogan 2007/2010, Hu 2011, Li 2009, Udrea 2007, and more]
Previous work Previous work • uses hard logical constraints, which may be inadequate • requires parameter tuning • has not been tried on large ontologies birthDate marriedTo 1935 RockSinger type 15 type Singer born (1) • has mostly focused on     (1) instance matching or     (2) schema alignment,    ... but not both (2) (1) (2) 1935 [Gracia 2009, Jean-Mary 2009, Isaac 2007, Aumueller 2005, Wang 2008, Noessner 2010,  Sais 2007/2009, Arasu 2009, Volz 2009,   Bhattacharya 2007, Hogan 2007/2010, Hu 2011, Li 2009, Udrea 2007, and more]
PARIS: Aligning Everything At Once born There is a synergy between equality of instances, properties and classes! => Compute all together! birthDate marriedTo 1935 type RockSinger type Singer 16 PARIS 1935
Probabilistic Model We chose a probabilistic model marriedTo 1935 type RockSinger born type birthDate Singer 17  the probability that   is a subclass of    the probability that   =    the probability that  is a sub-property of  1935
Probabilistic Model 18 ... All possible alignments between the ontologies ... Worlds:
Probabilistic Model 19 ... All possible alignments between the ontologies ... Probabilities: ... Worlds:
Probabilistic Model 20 Events: We care here mainly about equality and subsumption events. Each event can be true or false  in a particular world. ... ... Probabilities: ... Worlds:
Probabilistic Model 21 Marginals: The marginal probability of an event is given by the sum of the probabilities of the worlds where the event holds. ... ... Probabilities: ... Worlds:
23 We do not care about these values. We only care about these marginals! (it is the product measure) We are interested in marginals that fulfill certain properties. For any set of marginals, there exists a probability distribution. Probabilistic Model Marginals: ... ... Probabilities: ... Worlds:
Literals birthDate RockSinger type 1935 Singer type born marriedTo 25 ? 1936 The probability that two literals are equal shall reflect the likelyhood that the two literals are intended to refer to the same thing.
Literals birthDate RockSinger type 1935 Singer type born marriedTo 26 ? 1936 The probability that two literals are equal shall reflect the likelyhood that the two literals are intended to refer to the same thing. • for strings:       string distance • for numbers:       numeric distance • for other literals:       domain-specific
Equality of Instances For instances, relations give a hint: Elvis Presley label label born 1935 born 1935 Elvis Presley marriedTo ? 28
Equality of Instances For instances, relations give a hint: Elvis Presley label label born 1935 born 1935 Elvis Presley marriedTo ? 29 Not many people are called Elvis => highly indicative
Equality of Instances For instances, relations give a hint: Not many people are called Elvis => highly indicative Elvis Presley label label born 1935 born 1935 Elvis Presley marriedTo ? 30 Many people are born in 1935 => less indicative
Not many people are called Elvis => highly indicative Many people are born in 1935 => less indicative Elvis Presley label label born 1935 born 1935 Elvis Presley marriedTo ? 31 The local inverse functionality of a relation with an argument is one over the number of incoming links: Local Inverse Functionality
Elvis Presley 1935 marriedTo label 1935 born label born Elvis Presley Only one person is called Elvis ? 32 The local inverse functionality of a relation with an argument is one over the number of incoming links: Local Inverse Functionality
Elvis Presley 1935 marriedTo label 1935 born label born Elvis Presley Only one person is called Elvis ? 33 10 people are born in 1935 The local inverse functionality of a relation with an argument is one over the number of incoming links: Local Inverse Functionality
Probability of Inverse Functionality Elvis Presley Elvis Presley marriedTo label born 1935 1935 label born ? 34 Many people share their year of birth Few people share the same name We define the probability of a relation being inverse functional as the harmonic mean of the local inverse functionalities:
Equality of Instances marriedTo Elvis Presley Elvis Presley label label born born 1935 1935 ? 35  and   shall be matched if
Equality of Instances marriedTo Elvis Presley Elvis Presley label label born born 1935 1935 ? 36
Equality of Instances marriedTo Elvis Presley Elvis Presley label label born born 1935 1935 ? 37
Equality of Instances marriedTo Elvis Presley Elvis Presley label label born born 1935 1935 ? 38 This evaluates to 1 iff • There is at least one     highly inverse functional r • There is one shared     argument 
Equality of Instances • Literals: precomputed • Others: recursive label marriedTo 1935 1935 born born label Elvis Presley Elvis Presley ? 39
Equality of Classes Singer type RockSinger If all instances of one class are instances of the other then the former subsumes the latter 40 type
type 41 subclassOf type Singer RockSinger Equality of Classes If all instances of one class are instances of the other then the former subsumes the latter
type subclassOf 42 type Singer RockSinger Equality of Classes If all instances of one class are instances of the other then the former subsumes the latter
type subclassOf 43 type Singer RockSinger Equality of Classes If all instances of one class are instances of the other then the former subsumes the latter
type subclassOf 44 type Singer RockSinger Equality of Classes If all instances of one class are instances of the other then the former subsumes the latter
Equality of Relations knows marriedTo knows If every pair of one relation is a pair of another relation, then the first is a sub-property of the second. 45
Equality of Relations knows marriedTo knows If every pair of one relation is a pair of another relation, then the first is a sub-property of the second. 46 marriedTo subpropertyOf knows
Equality of Relations knows marriedTo knows If every pair of one relation is a pair of another relation, then the first is a sub-property of the second. 47 knows subpropertyOf marriedTo Can be solved analogously to the subsumption of classes.
Algorithm 1. Fix equalities for literals 48 Literals:
Instances: Relations: 1. Fix equalities for literals 2. Set equalities for relations to a small initial value 3. Iterate the estimations for relations and instances to convergence (*) (*) We have no proof for convergence, but it seems to happen 50 Iterate Algorithm Literals:
Algorithm Iterate 1. Fix equalities for literals 2. Set equalities for relations to a small initial value 3. Iterate the estimations for relations and instances to convergence (*) 4. Compute the estimations for classes (*) We have no proof for convergence, but it seems to happen 51 final step Classes: Instances: Relations: