Knowledge Base Construction

Course in Autumn 2016 at Télécom ParisTech
in the M2 Data & Knowledge
© 2016 Fabian M. Suchanek

Content

In this class, we will give an overview of Information Extraction for Knowledge Base Construction. This is the process of deriving structured information (such as alive(Elvis)) from digital text (such as the sentence "Elvis is alive"). The lecture covers named entity recognition, entity disambiguation, instance extraction, fact extraction, and ontological information extraction. We will then see how this data can be mined for correlations. We will also touch upon applications of Information Extraction, such as Google's Knowledge Graph and IBM's Watson question answering system, as well as academic projects such as YAGO, DBpedia, and NELL.
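To make the alive(Elvis) example concrete, here is a minimal sketch of pattern-based fact extraction. It assumes a single hypothetical surface pattern "<Name> is <adjective>" and is only a toy illustration, not the method taught in the lectures; real extractors handle far more varied text.

```python
import re

# Hypothetical toy pattern: a capitalized name, "is", then a lowercase word.
PATTERN = re.compile(r"^([A-Z][a-z]+) is ([a-z]+)$")

def extract_fact(sentence):
    """Turn 'Elvis is alive' into 'alive(Elvis)'; return None if no match."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return None
    subject, predicate = match.groups()
    return f"{predicate}({subject})"

print(extract_fact("Elvis is alive"))  # alive(Elvis)
```

A sentence such as "The King sings" does not fit the pattern, so the function returns None.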

The grades are now available here.

Schedule

This class is free of OWL!
Day | Session 1 (normally Amphi Rubis) | Session 2 (normally C45)
---------------------------------------------------------------------
2016-11-22 | Intro, Motivation, Knowledge Representation | Research topics* (Amphi Rubis)
2016-11-29 | Corpus, Character Encodings, Named Entity Recognition (Amphi Grenat) | Lab 1: Tries (C129)
2016-12-06 | Evaluation, Named Entity Annotation | Lab 2: Type Extraction (C129)
2016-12-13 | Disambiguation, Instance Extraction | Lab 3: Disambiguation
2017-01-03 | Fact Extraction, POS Tagging | Lab 4: POS Tagging
2017-01-10 | Dependency Parsing, Decidability* | Lab 5: IE with POS tags
2017-01-17 | Semantic Web | Lab 6: Entity Mapping
2017-01-31 | Exam, 13:30-15:00, Amphi Estaunié
2017-03-29 | Re-Exam, 18:00-19:30, C49

Both the exam and the re-exam are "closed-book". Paper is provided. Bring a pen and a brain, the rest is on us!
Supplementary material for those interested: Rule Mining, IE by Reasoning, Markov Logic, Wrapper Induction.

Schedule entries beyond the current date are tentative. The PDF slides are provided for convenience only; the SVG slides are authoritative.

Grading: in the first session, 50% labs + 50% exam; in the second session, 50% original labs + 50% re-exam.

* Allergy warning: may contain traces of description logics