CC-BY Fabian M. Suchanek

Discovery of Complex Schemas for RDF Knowledge Bases


This is the Web page of the grant “Discovery of Complex Schemas for RDF Knowledge Bases” (“DICOS”) of the French National Research Agency.

Project members:


Recent years have seen the rise of large knowledge bases such as DBpedia, YAGO, Freebase, and Google’s knowledge graph. The advance of the Linked Open Data project, which now contains thousands of knowledge bases, is a case in point. These knowledge bases use RDF and are thus inherently schema-less. We propose to use rule mining to deduce schema constraints automatically from the data. Building on recent advances in the field, we propose to enlarge the scope of automated rule mining to numerical and existential rules. The resulting constraints could be used to spot errors in the data or even to predict missing pieces in the knowledge. The particular challenge in the context of knowledge bases is the absence of counterexamples, which requires a new approach to mining rules.

The project has 3 work packages:

  1. Mining Dependencies
  2. Mining Rules with Existential Quantifiers
  3. Numerical Rule Mining


The work progressed on the following axes:
Our idea is to spot mistakes in the data of Wikidata, to learn how the contributors of Wikidata corrected these mistakes in the past, and to propose corrections for similar mistakes in the current data. This falls under Work Package 1 of the DICOS project, “Mining Dependencies”. Our postdoc, Camille Bourgaux, works with a PhD student on this project. The work led to the publication of a full paper at WWW 2019, the main conference in the area of the Web (Rank A*). Camille also obtained a permanent position at the CNRS directly after her postdoc.
Dynamic Knowledge Bases
We work on schema mining on dynamic knowledge bases (i.e., knowledge bases that are accessible only through Web services). The idea is to pin down all queries that can be answered given the services. This amounts to a characterization of the part of the knowledge base that is accessible from the outside. This is a special case of Work Package 1 that we decided to treat because it has a clear use case. Our PhD student, Julien Romero, works on this topic, and we have published a full paper at ESWC 2020 (Rank A).
Conditional Key Mining
The idea is to mine constraints that identify an entity uniquely in a certain context. For example, a German PhD student can have only a single advisor. Thus, the student uniquely identifies the advisor, but only in Germany. This work falls under Work Package 1 as well. I have worked on this project with two colleagues, one PhD student, and one postdoc. The work has led to a publication at ISWC 2017, the most important conference in the domain of the Semantic Web. The DICOS project financed my trip to the conference, and is thus acknowledged on that publication.
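
The notion of a key that holds only in a certain context can be illustrated with a small sketch. This is not the mining algorithm from the ISWC 2017 publication; it only checks whether a given candidate key holds under a given condition. The function name `is_conditional_key`, the record layout, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def is_conditional_key(records, key_attrs, condition):
    """Return True if key_attrs identify a record uniquely among the
    records that satisfy every attribute=value pair in condition.

    Hypothetical helper for illustration; an empty condition means
    "in every context"."""
    # Keep only the records that lie in the given context.
    in_context = [r for r in records
                  if all(r.get(a) == v for a, v in condition.items())]
    # Count how often each combination of key values occurs.
    counts = Counter(tuple(r.get(a) for a in key_attrs) for r in in_context)
    # The attributes form a key if no combination occurs more than once.
    return all(n == 1 for n in counts.values())

# Hypothetical toy data: one record per advisor relationship.
advisorships = [
    {"student": "Anna", "advisor": "Prof. Berg", "country": "Germany"},
    {"student": "Ben", "advisor": "Prof. Berg", "country": "Germany"},
    {"student": "Chloe", "advisor": "Prof. Durand", "country": "France"},
    {"student": "Chloe", "advisor": "Prof. Martin", "country": "France"},
]

key_in_germany = is_conditional_key(advisorships, ["student"], {"country": "Germany"})
key_everywhere = is_conditional_key(advisorships, ["student"], {})
print(key_in_germany, key_everywhere)
```

On this toy data, the student is a key for the records from Germany but not for the data as a whole, because one student in France has two advisors. An actual miner would have to search the space of candidate keys and conditions instead of testing a single pair.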
Mining of Obligatory Attributes
We work on mining obligatory attributes in knowledge bases. The idea is to find out whether an attribute (e.g., “hasNationality” or “isMarried”) applies to all instances of a class in the real world (e.g., whether all people are married in the real world), given only the incomplete knowledge of the knowledge base. This falls under Work Package 2 of the DICOS project, “Mining Rules with Existential Quantifiers”. We worked with a PhD student, Jonathan Lajus, on this problem. This work has led to a publication at WWW 2018 (Rank A*, as previously mentioned). The DICOS project financed Jonathan’s trip to the conference, and is thus acknowledged on that publication.
Reasoning on Data with Constraints
We have not just worked on mining constraints, but also on using the constraints. In particular, Camille Bourgaux investigated the inconsistencies that may result from the addition of logical constraints. She worked with three colleagues on inconsistency-tolerant query answering over temporal data, and on querying knowledge bases whose facts are annotated with meta-data, such as the temporal validity of a fact or its source. These works led respectively to a publication in the Semantic Web Journal (to appear in March 2019) and to a publication at AAAI 2019 (Rank A*).