Mining Incompleteness of Knowledge Bases
The problem of incompleteness
A knowledge base (KB) is a computer-processable collection of knowledge about the real world. An example of such a KB is YAGO, which we co-develop at Télécom Paris University together with the Max Planck Institute for Informatics. The content of these KBs is usually correct, i.e., it corresponds to reality: If the KB says that Elvis Presley was married to Priscilla Presley, then this is most likely the case. However, KBs are usually not complete: There may be information missing from the KB. For example, the KB may not contain all albums of Elvis Presley. This is a problem for downstream applications of the KB, which will receive query answers that do not correspond to reality (e.g., when asking how many albums Elvis released, or when asking whether Elvis released a particular album that happens to be missing from the KB).
The problem is exacerbated by the fact that KBs usually do not store negative information. For example, a KB will not store the fact that Elvis did not (co-)release the album “Brothers in Arms”. The absence of a statement from the KB, however, does not mean that the statement is false: KBs operate under an open world assumption, which means that we may not conclude that statements that are absent from the KB are false. The reason is that even the KB creators usually only know (or can extract) the true statements; they do not know which statements are false, or which parts of the KB are incomplete. Thus, there is no way to detect from the KB alone whether a statement that is not in the KB is true in reality.
It is relatively easy to compute a score that tells us what percentage of entities have a certain attribute in the KB. But that alone does not tell us whether, in reality, a higher percentage of entities have that attribute. In our research team, we develop methods that can estimate where the KB is incomplete with respect to the real world. Our methods can estimate the incompleteness of a KB from the KB itself, without the use of external information.
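The in-KB score mentioned above can be sketched in a few lines. The following is a minimal illustration, assuming a toy KB stored as (subject, relation, object) triples; the relation names `type`, `bornOnDate`, and `marriedTo` and the helper `attribute_coverage` are hypothetical and chosen for this example only.

```python
# Toy KB as (subject, relation, object) triples. The schema is an assumption
# made for this sketch; it is not the schema of any particular KB.
TRIPLES = [
    ("Elvis_Presley", "type", "Person"),
    ("Priscilla_Presley", "type", "Person"),
    ("Mark_Knopfler", "type", "Person"),
    ("Elvis_Presley", "marriedTo", "Priscilla_Presley"),
    ("Elvis_Presley", "bornOnDate", "1935-01-08"),
    ("Priscilla_Presley", "bornOnDate", "1945-05-24"),
]


def attribute_coverage(triples, cls, relation):
    """Fraction of instances of `cls` that have at least one `relation` fact in the KB."""
    instances = {s for s, r, o in triples if r == "type" and o == cls}
    if not instances:
        return 0.0
    with_attribute = {s for s, r, _ in triples if r == relation and s in instances}
    return len(with_attribute) / len(instances)


# Coverage inside the KB: 2 of 3 people have a birth date, 1 of 3 has a spouse.
coverage_birth = attribute_coverage(TRIPLES, "Person", "bornOnDate")
coverage_spouse = attribute_coverage(TRIPLES, "Person", "marriedTo")
```

Both numbers describe the KB only. The score by itself cannot tell whether the missing birth date is a gap in the KB (it is, since every person has one) or whether the missing spouses simply do not exist in reality.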
- Simon Razniewski, Hiba Arnaout, Shrestha Ghosh, Fabian M. Suchanek:
“Completeness, Recall, and Negation in Open-World Knowledge Bases: A Survey”,
ACM Computing Surveys (ACM CSUR), 2023
Detecting missing facts
The KB may be missing facts. For a given subject (say, Elvis Presley) and a given relation (say, releasedAlbum), we can detect whether objects are missing (i.e., whether there are albums in reality that are not in the KB). We use the rule mining system AMIE for this purpose:
- Luis Galárraga, Simon Razniewski, Antoine Amarilli, Fabian M. Suchanek:
“Predicting Completeness in Knowledge Bases” (pdf)
Full paper at the International Conference on Web Search and Data Mining (WSDM), 2017
Mining missing facts
If the KB is missing facts, we can try to predict them automatically with the help of rule mining. For example, we can automatically mine the rule that people usually live in the same city as their spouse. If the city of residence of Priscilla Presley is missing, we can then predict that she lives in the same city as Elvis Presley.
- Jonathan Lajus, Luis Galárraga, Fabian M. Suchanek:
“Fast and Exact Rule Mining with AMIE 3” (pdf)
Full paper at the Extended Semantic Web Conference (ESWC), 2020
See also: AMIE project
- Luis Galárraga, Christina Teflioudi, Katja Hose, Fabian M. Suchanek:
“Fast Rule Mining in Ontological Knowledge Bases with AMIE+” (pdf)
Journal article in the VLDB Journal (VLDBJ), 2015
See also: AMIE Web page
- Luis Galárraga, Christina Teflioudi, Katja Hose, Fabian M. Suchanek:
“AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases” (pdf)
Full paper at the International World Wide Web Conference (WWW), 2013
Best student paper award
See also: technical report
See also: AMIE Web site
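The spouse rule from this section, marriedTo(x, y) ∧ livesIn(y, c) ⇒ livesIn(x, c), can be illustrated with a small sketch. This is not AMIE's implementation: the function `apply_spouse_rule`, the data layout, and the entities `Person_A` to `Person_D` are hypothetical. The sketch only shows how one fixed rule can be scored on the KB and then used to predict missing facts. The confidence is computed only over people for whom the KB knows some city, so that facts merely missing from the KB are not counted as counterexamples.

```python
def apply_spouse_rule(married, lives_in):
    """Score and apply: marriedTo(x, y) AND livesIn(y, c) => livesIn(x, c).

    married:  set of (x, y) pairs, treated as symmetric.
    lives_in: dict mapping an entity to its known city of residence.
    Returns (support, confidence, predictions).
    """
    # Make the marriage relation symmetric.
    pairs = set(married) | {(y, x) for x, y in married}
    support = 0      # rule body holds and the predicted fact is in the KB
    body_known = 0   # rule body holds and the KB knows some city for x
    predictions = {}
    for x, y in sorted(pairs):
        if y not in lives_in:
            continue  # rule body does not hold for this pair
        if x in lives_in:
            body_known += 1
            if lives_in[x] == lives_in[y]:
                support += 1
        else:
            # No city known for x: the rule proposes a new fact.
            predictions[x] = lives_in[y]
    confidence = support / body_known if body_known else 0.0
    return support, confidence, predictions


# Toy data; Person_A to Person_D are made-up entities.
married = {
    ("Elvis_Presley", "Priscilla_Presley"),
    ("Person_A", "Person_B"),
    ("Person_C", "Person_D"),
}
lives_in = {
    "Elvis_Presley": "Memphis",
    "Person_A": "Paris",
    "Person_B": "Paris",
    "Person_C": "Lyon",
    "Person_D": "Nice",
}

support, confidence, predictions = apply_spouse_rule(married, lives_in)
# predictions == {"Priscilla_Presley": "Memphis"}
```

In this toy KB the rule holds for one of the two couples whose cities are both known, so it would be applied with a confidence of 0.5. A rule miner searches over many candidate rules and keeps those whose scores pass a threshold.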
Detecting obligatory attributes
Some attributes are obligatory for a class, i.e., all instances of that class have the attribute in the real world. For example, all people must have a birth date. Other attributes are not obligatory. For example, not all people have a spouse. In the KB, these attributes cannot easily be distinguished, because both of them may be highly incomplete. We have developed a probabilistic method that can guess whether an attribute is obligatory or not, even if the KB is highly incomplete.
- Jonathan Lajus, Fabian M. Suchanek:
“Are All People Married? Determining Obligatory Attributes in Knowledge Bases” (pdf)
Full paper at the Web Conference (WWW), 2018
See also: Project Web page
Detecting missing entities
The KB may be missing entities. For example, the KB may not contain all cities in France. Using statistical methods, we can give a lower bound on the number of missing entities per class.
- Arnaud Soulet, Arnaud Giacometti, Béatrice Markhoff, Fabian M. Suchanek:
“Representativeness of Knowledge Bases with the Generalized Benford’s Law” (pdf)
Full paper at the International Semantic Web Conference (ISWC), 2018
Best Paper Nominee
See also: Project Web page
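As background for the paper title above, the sketch below compares the first-digit distribution of a numeric attribute with the classic Benford's law, P(d) = log10(1 + 1/d). This is only the textbook first-digit law. It is neither the generalized law nor the lower-bound estimator of the paper, and the assumption that the attribute should follow Benford's law in reality is made purely for illustration. The helper names and the sample values are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Classic Benford first-digit probabilities for the digits 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_distribution(values):
    """Observed distribution of leading digits over the non-zero integer values."""
    digits = [int(str(abs(int(v)))[0]) for v in values if int(v) != 0]
    if not digits:
        return {d: 0.0 for d in range(1, 10)}
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def total_variation(p, q):
    """Total variation distance between two distributions over the digits 1..9."""
    return 0.5 * sum(abs(p[d] - q[d]) for d in range(1, 10))


# Hypothetical population values for the cities present in a KB.
populations = [2 ** k for k in range(1, 41)]
observed = first_digit_distribution(populations)
deviation = total_variation(observed, benford_expected())
```

Under the stated assumption, a large deviation would suggest that the entities in the KB are not a representative sample of the real-world class, which is the kind of signal from which a number of missing entities could then be estimated.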