PARIS: Probabilistic Alignment of Relations, Instances, and Schema Fabian M. Suchanek (Max Planck Institute for Informatics) Serge Abiteboul (INRIA Saclay) Pierre Senellart (Télécom ParisTech) 1
RDF Ontologies Singer plays type born 2 Person An RDF ontology can be seen as a graph of entities subclassOf 1935
RDF Ontologies Singer plays type born 3 Person An RDF ontology can be seen as a graph of entities subclassOf classes relations / properties instances literals 1935
RDF Ontologies instances relations / properties classes literals Singer plays type born 4 Person An RDF ontology can be seen as a graph of entities subclassOf intelligent search, QA, machine translation... Ontologies serve all kinds of purposes: 1935
Existing Ontologies There are literally hundreds of ontologies on the Web Singer type born plays 5 Person subclassOf 1935
Overlapping Data birthDate 1935 type RockSinger marriedTo YAGO ElvisPedia born Singer type 6 Many ontologies contain similar or overlapping entities and facts plays 1935
Problem: Elvis is lonely ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 7 plays Who is the spouse of the guitar player? ? 1935 marriedTo
Problem: Elvis is lonely ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 8 plays Who is the spouse of the guitar player? 1935 marriedTo Complementary  information cannot be used ?
Solution: Unify Entities ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 9 plays 1935 marriedTo identical
Goal: Merging Ontologies ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 10 plays To merge two ontologies, we have to identify 1935
Goal: Merging Ontologies ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 11 To merge two ontologies, we have to identify • equivalent instances sameAs 1935 plays
Goal: Merging Ontologies subClassOf ElvisPedia birthDate type Singer marriedTo type YAGO born 1935 RockSinger 12 To merge two ontologies, we have to identify • equivalent instances • equivalent or subsuming classes sameAs plays 1935
Goal: Merging Ontologies sameAs subClassOf ElvisPedia birthDate type Singer marriedTo type YAGO 1935 born 1935 RockSinger 13 To merge two ontologies, we have to identify • equivalent instances • equivalent or subsuming classes • equivalent or subsuming relations subPropertyOf plays
Previous work Previous work • uses hard logical constraints, which may be inadequate • requires parameter tuning • has not been tried on large ontologies birthDate marriedTo 1935 RockSinger type 14 type Singer born 1935 [Gracia 2009, Jean-Mary 2009, Isaac 2007, Aumueller 2005, Wang 2008, Noessner 2010,  Sais 2007/2009, Arasu 2009, Volz 2009,   Bhattacharya 2007, Hogan 2007/2010, Hu 2011, Li 2009, Udrea 2007, and more]
Previous work Previous work • uses hard logical constraints, which may be inadequate • requires parameter tuning • has not been tried on large ontologies birthDate marriedTo 1935 RockSinger type 15 type Singer born (1) • has mostly focused on     (1) instance matching or     (2) schema alignment,    ... but not both (2) (1) (2) 1935 [Gracia 2009, Jean-Mary 2009, Isaac 2007, Aumueller 2005, Wang 2008, Noessner 2010,  Sais 2007/2009, Arasu 2009, Volz 2009,   Bhattacharya 2007, Hogan 2007/2010, Hu 2011, Li 2009, Udrea 2007, and more]
PARIS: Aligning Everything At Once born There is a synergy between equality of instances, properties and classes! => Compute all together! birthDate marriedTo 1935 type RockSinger type Singer 16 PARIS 1935
Probabilistic Model We chose a probabilistic model the probability that x=y the probability that  is a sub‐property of  the probability that  is a sub‐class of  marriedTo 1935 type RockSinger born type birthDate Singer 17 1935
Probabilistic Model 18 All possible alignments between the ontologies ... Worlds: ...
Probabilistic Model 19 All possible alignments between the ontologies ... Worlds: Probabilities: ... ...
Probabilistic Model 20 All possible alignments between the ontologies We care here mainly about equality and subsumption events. Each event can be true or false  in a particular world. ... Worlds: Probabilities: ... Marginals: ...
Probabilistic Model 21 The marginal probability of an event is given by the sum of the probabilities of the worlds where the event holds. ... Worlds: Probabilities: ... Marginals: ...
Probabilistic Model 22 We do not care about these values. ... Worlds: Probabilities: ... Marginals: ...
Probabilistic Model 23 We only care about these marginals! We do not care about these values. ... Worlds: Probabilities: ... Marginals: ...
Probabilistic Model 24 ... Worlds: Probabilities: ... Marginals: ... (it is the product measure) We are interested in marginals that fulfill certain properties. For any set of marginals, there exists a probability distribution. We only care about these marginals!
Literals 25 The probability that two literals are equal shall reflect the likelihood that the two literals are intended to refer to the same thing. birthDate RockSinger type 1935 Singer type born marriedTo ? 1936
Literals 26 The probability that two literals are equal shall reflect the likelihood that the two literals are intended to refer to the same thing. • for strings:       string distance • for numbers:       numeric distance • for other literals:       domain-specific birthDate RockSinger type 1935 Singer type born marriedTo ? 1936
Literals 27 We chose a particularly simple equality: ? • for strings:       string distance • for numbers:       numeric distance • for other literals:       domain-specific birthDate RockSinger type 1935 Singer type born marriedTo ? 1936
Equality of Instances For instances, relations give a hint: ? 28 birthDate RockSinger type 1935 Singer type born marriedTo 1936
Equality of Instances For instances, relations give a hint: 29 Not many people are called Elvis => highly indicative ? birthDate RockSinger type 1935 Singer type born marriedTo 1936
Equality of Instances For instances, relations give a hint: Not many people are called Elvis => highly indicative 30 Many people are born in 1935 => less indicative ? birthDate RockSinger type 1935 Singer type born marriedTo 1936
Not many people are called Elvis => highly indicative Many people are born in 1935 => less indicative 31 one over the number of incoming links: The local inverse functionality of a relation with an argument is Local Inverse Functionality ? birthDate RockSinger type 1935 Singer type born marriedTo 1936
Local Inverse Functionality Only one person is called Elvis The local inverse functionality of a relation with an argument is one over the number of incoming links: 32 ? birthDate RockSinger type 1935 Singer type born marriedTo 1936
Local Inverse Functionality Only one person is called Elvis The local inverse functionality of a relation with an argument is one over the number of incoming links: 33 10 people are born in 1935 ? birthDate RockSinger type 1935 Singer type born marriedTo 1936
Probability of Inverse Functionality 34 Many people share their year of birth Few people share the same name We define the probability of a relation being inverse functional as the harmonic mean of the local inverse functionalities: ? birthDate RockSinger type 1935 Singer type born marriedTo 1936
Equality of Instances 35  and   shall be matched if ? birthDate RockSinger type 1935 Singer type born marriedTo 1936
Equality of Instances 36 ? birthDate RockSinger type 1935 Singer type born marriedTo 1936
Equality of Instances 37 ? birthDate RockSinger type 1935 Singer type born marriedTo 1936
Equality of Instances 38 This evaluates to 1 iff • There is at least one     highly inverse functional r • There is one shared    argument y=y' ? birthDate RockSinger type 1935 Singer type born marriedTo 1936