Disambiguation
Fabian M. Suchanek
Semantic IE
[Figure: pipeline with the stages Source Selection and Preparation, Entity Recognition, Entity Disambiguation (you are here), Fact Extraction, Reasoning, Instance Extraction.]
Def: Disambiguation
Given an ambiguous name in a corpus and its meanings, disambiguation is the task of determining the intended meaning.
Example: "Homer eats a doughnut." Which Homer? [The Economist, 2018-12-22]
Disambiguation Setting
Usually Named Entity Recognition (NER) runs first, and the goal is to map the names to entities in a Knowledge Base (KB).
[Figure: the NER'ed corpus contains "Homer eats a doughnut."; the Knowledge Base contains two entities with the label "Homer", with the types American and poet.]
Def: Context of a word
The context of a word in a corpus is the multi-set of the words in its vicinity without the stopwords. (The definition may vary depending on the application)
Example: "Homer eats a doughnut." Context of "Homer": {eats, doughnut}
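The definition of a word context can be sketched in a few lines of Python. This is only an illustration under assumptions that are not on the slide: the function name `word_context`, the tiny `STOPWORDS` list, and the reading of "vicinity" as a fixed window of `window` tokens on each side are all hypothetical choices.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "of", "for", "in", "to"}

def word_context(sentence, name, window=5):
    """Multi-set of non-stopword tokens within `window` tokens of each mention of `name`."""
    tokens = re.findall(r"\w+", sentence)
    context = Counter()
    for pos, token in enumerate(tokens):
        if token != name:
            continue
        lo, hi = max(0, pos - window), pos + window + 1
        # Neighbours on both sides of the mention, excluding the mention itself.
        for neighbour in tokens[lo:pos] + tokens[pos + 1:hi]:
            if neighbour != name and neighbour.lower() not in STOPWORDS:
                context[neighbour] += 1
    return context

ctx = word_context("Homer eats a doughnut.", "Homer")
print(ctx)
```

A `Counter` is used because the definition asks for a multi-set: a word that appears twice near the name is counted twice.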
Def: Context of an entity
The context of an entity in a KB is the set of all labels of all entities in its vicinity. (The definition may vary depending on the application)
[Figure: in the KB, Homer likes doughnut and livesIn USA; doughnut has the label "doughnut", and USA has the labels "USA" and "America".]
Context of Homer: {doughnut, USA, America}
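The entity context can be sketched the same way. The sketch below assumes that "vicinity" means directly connected entities (one hop) and stores the toy KB as triples plus a label dictionary; the names `FACTS`, `LABELS`, and `entity_context` are hypothetical and chosen only for this illustration.

```python
# Toy KB from the slide, as (subject, relation, object) triples.
FACTS = [
    ("Homer", "likes", "doughnut"),
    ("Homer", "livesIn", "USA"),
]

# Labels of each entity (an entity may have several labels).
LABELS = {
    "doughnut": {"doughnut"},
    "USA": {"USA", "America"},
}

def entity_context(entity, facts, labels):
    """Set of all labels of all entities directly connected to `entity`."""
    neighbours = set()
    for subject, _relation, obj in facts:
        if subject == entity:
            neighbours.add(obj)
        if obj == entity:
            neighbours.add(subject)
    neighbours.discard(entity)  # ignore self-loops
    context = set()
    for neighbour in neighbours:
        context |= labels.get(neighbour, set())
    return context

print(entity_context("Homer", FACTS, LABELS))
```

Unlike the word context, the result is a plain set, as in the definition: each label counts once, however many facts lead to it.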
Def: Context-based disambiguation
Context-based disambiguation (also: bag of words disambiguation) maps a name in a corpus to the entity in the KB whose context has the highest overlap with the context of the name. (The definition may vary depending on the application)
Example: "For USA Today, Homer is among the top 25 most influential people of the past 25 years."
Context of "Homer" in corpus: {USA, Today, top, influential, people, past, years}
Context of Homer Simpson in KB: {USA, doughnut}
Context of Homer the poet in KB: {poet, Greece, Odyssey}
Highest overlap -> Homer Simpson wins
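The USA Today example can be replayed with a minimal sketch. It assumes that "overlap" is measured as the number of shared words; other measures are possible. The function name `disambiguate` and the candidate keys are hypothetical.

```python
def disambiguate(name_context, candidates):
    """Return the candidate entity whose context shares the most words with `name_context`.

    `candidates` maps each candidate entity to its context (a set of labels).
    Ties go to the first candidate in the dictionary.
    """
    def overlap(entity):
        return len(set(name_context) & candidates[entity])
    return max(candidates, key=overlap)

# Contexts taken from the slide's example.
corpus_context = {"USA", "Today", "top", "influential", "people", "past", "years"}
candidates = {
    "Homer Simpson": {"USA", "doughnut"},
    "Homer (poet)": {"poet", "Greece", "Odyssey"},
}

best = disambiguate(corpus_context, candidates)
print(best)
```

Here the only shared word is "USA", so Homer Simpson wins with an overlap of 1 against 0. A single shared word decides the outcome, which shows how fragile the method is when contexts are small.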
What if there is little context?
Example: "This is very important for the Simpsons."
One candidate meaning: The Robert Simpson Department Store, defunct since 1990.