Discovery of Complex Schemas for RDF Knowledge Bases
Grant
This is the Web page of the grant “Discovery of Complex Schemas for RDF Knowledge Bases” (“DICOS”) of the French National Research Agency.
- Fabian Suchanek (coordinator)
- Pierre Senellart (partner)
- Nicoleta Preda (collaborator)
- Julien Romero (PhD student)
- Camille Bourgaux (Postdoc, now at ENS Ulm)
Context
Recent years have seen the rise of large knowledge bases such as DBpedia, YAGO, Freebase, and Google’s knowledge graph. The advance of the Linked Open Data project, which now contains thousands of knowledge bases, is a case in point. These knowledge bases use RDF and are thus inherently schema-less. We propose to use rule mining to deduce schema constraints automatically from the data. Building on recent advances in the field, we propose to enlarge the scope of automated rule mining to numerical and existential rules. The resulting constraints could be used to spot errors in the data or even to predict missing pieces of knowledge. The particular challenge in the context of knowledge bases is the absence of counterexamples, which requires a new approach to mining rules.
The project has 3 work packages:
- Mining Dependencies
- Mining Rules with Existential Quantifiers
- Numerical Rule Mining
Results
The work progressed along the following axes:
- Our idea is to spot mistakes in the data of Wikidata, to learn how the contributors of Wikidata corrected these mistakes in the past, and to propose corrections for similar mistakes in the current data. This falls under Work Package 1 of the DICOS project, “Mining Dependencies”. Our postdoc, Camille Bourgaux, worked with another PhD student on this project. The work led to the publication of a full paper at WWW 2019, the main conference in the area of the Web (Rank A*). Camille also obtained a permanent position at the CNRS directly after her postdoc.
- Dynamic Knowledge Bases
- We work on schema mining on dynamic knowledge bases (i.e., knowledge bases that are accessible only through Web services). The idea is to pin down all queries that can be answered given the services. This amounts to a characterization of the part of the knowledge base that is accessible from the outside. This is a special case of Work Package 1 that we decided to treat because it has a clear use case. Our PhD student, Julien Romero, currently works on this topic.
- Conditional Key Mining
- The idea is to mine constraints that identify an entity uniquely in a certain context. For example, a German PhD student can have only a single advisor. Thus, the student uniquely identifies the advisor, but only in Germany. This work falls under Work Package 1 as well. I have worked on this project with two colleagues, one PhD student, and one postdoc. The work has led to a publication at ISWC 2017, the most important conference in the domain of the Semantic Web. The DICOS project financed my trip to the conference, and is thus acknowledged on that publication.
- Mining of obligatory attributes
- We work on mining obligatory attributes in knowledge bases. The idea is to find out whether an attribute (e.g., “hasNationality” or “isMarried”) applies to all instances of a class in the real world (i.e., whether all people are married in the real world), given only the incomplete knowledge of the knowledge base. This falls under Work Package 2 of the DICOS project, “Mining Rules with Existential Quantifiers”. We worked with a PhD student, Jonathan Lajus, on this problem. This work has led to a publication at WWW 2018 (Rank A*, as previously mentioned). The DICOS project financed Jonathan’s trip to the conference, and is thus acknowledged on that publication.
- Reasoning on data with constraints
- We have not just worked on mining constraints, but also on using the constraints. In particular, Camille Bourgaux investigated the inconsistencies that may result from the addition of logical constraints. She worked with three colleagues on inconsistency-tolerant query answering over temporal data, and on querying knowledge bases whose facts are annotated with meta-data, such as the temporal validity of a fact or its source. These works led respectively to a publication in the Semantic Web Journal (to appear in March 2019) and to a publication at AAAI 2019 (Rank A*).
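The conditional key idea described above (a student identifies the advisor uniquely, but only in Germany) can be illustrated with a small sketch. It is a toy check on invented data, not the VICKEY mining algorithm: the records, the names, and the function `identifies_uniquely` are all hypothetical and serve only to show what "unique in a certain context" means.

```python
from collections import defaultdict

# Hypothetical toy data: (student, advisor, country) records.
RECORDS = [
    ("anna", "prof_meier", "Germany"),
    ("ben", "prof_schmidt", "Germany"),
    ("chloe", "prof_durand", "France"),
    ("chloe", "prof_martin", "France"),
]


def identifies_uniquely(records, country=None):
    """Return True if every student has exactly one advisor.

    If `country` is given, only records from that country are considered,
    which turns the check into a conditional one.
    """
    advisors = defaultdict(set)
    for student, advisor, record_country in records:
        if country is None or record_country == country:
            advisors[student].add(advisor)
    return all(len(found) == 1 for found in advisors.values())


# Without a condition, the constraint is violated (chloe has two advisors).
print(identifies_uniquely(RECORDS))
# Restricted to Germany, the constraint holds on this toy data.
print(identifies_uniquely(RECORDS, country="Germany"))
```

A real miner would have to search the space of candidate conditions and properties rather than test a single one, which this sketch does not attempt.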
- Thomas Pellissier Tanon, Camille Bourgaux, Fabian M. Suchanek:
“Learning How to Correct a Knowledge Base from the Edit History” (pdf)
Full paper at The Web Conference (WWW), 2019
- Jonathan Lajus, Fabian M. Suchanek:
“Are All People Married? Determining Obligatory Attributes in Knowledge Bases” (pdf)
Full paper at the Web Conference (WWW), 2018
See also: Project Web page
- Danai Symeonidou, Luis Galárraga, Nathalie Pernelle, Fatiha Saïs, Fabian M. Suchanek:
“VICKEY: Mining Conditional Keys on Knowledge Bases” (pdf)
Full paper at the International Semantic Web Conference (ISWC), 2017
See also: VICKEY Web page
- Fabian M. Suchanek:
“Extraction d’informations” (pdf)
Book chapter in Les Big Data à découvert, 2017
- Camille Bourgaux, Patrick Koopmann, and Anni-Yasmin Turhan:
“Ontology-Mediated Query Answering over Temporal and Inconsistent Data” (pdf)
(How to query an inconsistent temporal knowledge base)
Semantic Web Journal, 2019
- Camille Bourgaux and Ana Ozaki:
“Querying Attributed DL-Lite Ontologies Using Provenance Semirings” (pdf technical report)
(How to query a knowledge base with annotated facts and constraints that may depend on these annotations)