Knowledge Base Construction


This class gives an overview of Information Extraction for Knowledge Base Construction, the process of deriving structured information (such as alive(Elvis)) from digital text (such as the sentence "Elvis is alive"). We will first look at applications of information extraction, notably in question answering systems, chatbots, and personal assistants. Then, we will cover the technical steps of knowledge base construction: natural language processing, named entity recognition, entity disambiguation, instance extraction, fact extraction, and knowledge cleaning.
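To make the example above concrete, here is a minimal sketch of how a statement like alive(Elvis) could be derived from the sentence "Elvis is alive". It assumes a single hand-written pattern of the form "<Name> is <adjective>"; the pattern and the function names extract_facts and format_fact are illustrative assumptions, not the method taught in the course, which covers far more robust techniques.

```python
import re

# Assumed toy pattern: a capitalized name, the word "is", and a lowercase adjective.
# Real information extraction systems do not rely on one pattern like this.
PATTERN = re.compile(r"\b([A-Z][a-z]+) is ([a-z]+)\b")


def extract_facts(text):
    """Return a list of (predicate, subject) pairs matched in the text."""
    return [(m.group(2), m.group(1)) for m in PATTERN.finditer(text)]


def format_fact(fact):
    """Render a (predicate, subject) pair in the notation predicate(subject)."""
    predicate, subject = fact
    return f"{predicate}({subject})"


if __name__ == "__main__":
    for fact in extract_facts("Elvis is alive"):
        print(format_fact(fact))  # prints alive(Elvis)
```

Such a pattern breaks as soon as the sentence is phrased differently ("Elvis lives", "The King is alive"), which is why the later sessions on named entity recognition, disambiguation, and fact extraction are needed.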


The course is graded on the basis of 6 labs and a final exam.


The course takes place on Monday afternoons, 13:30-16:30, in a hybrid format. The schedule is as follows:
2021-09-13: Introduction (Amphi 5)
  1. Information Extraction: What is it?
  2. Information Extraction: Why do we want to do it?
  3. Information Extraction: Who does it?
  4. Information Extraction: How does it work?
  5. Knowledge Representation (until Slide 38, “schema”)
Please install Python 3.6 (or later) on your machine for the labs! See here for help.
Then please update pip by running: python3 -m pip install --upgrade pip
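If you are unsure which Python version you have, a short check like the following may help. It is only a sketch: the helper name meets_minimum is an assumption made for this illustration, and the minimum of 3.6 is taken from the requirement stated above.

```python
import sys

# Minimum version required for the labs, as stated above.
MIN_VERSION = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_minimum():
        print("Python version OK:", sys.version.split()[0])
    else:
        print("Please upgrade: Python 3.6 or later is required.")
```

Save it to a file and run it with the same python3 command you intend to use for the labs, so that the check applies to the right interpreter.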
2021-09-20: Background refresher (Room 1A222)
  1. Introduction
  2. Perceptrons
  3. Feed-forward neural networks
  4. Word embeddings (only “word2vec”)
  5. Architectures (only “RNNs”)
  6. Transformers (not done, optional material)
  7. Lab: Classification. Deadline: midnight of October 3rd.
2021-09-27: Named Entity Recognition and Classification (Room 1A222)
  1. Named Entity Recognition and Classification (without Deep Learning Methods)
  2. Lab: NERC. Deadline: Midnight of October 3rd (= the night before the next lecture).
2021-10-04: Typing and Disambiguation (Room 1A222)
  1. Extractive Entity Typing (without “Taxonomy Induction”)
  2. Entity Disambiguation
  3. Lab: Disambiguation. Deadline: Midnight of October 17th (= the night before the next lecture).
2021-10-11: No course
This session does not take place. Use the time to prepare for the exam.
2021-10-18: Fact extraction (Room 1A222)
  1. Fact Extraction
  2. Regular expressions (recap)
  3. POS Tagging (without Viterbi Algorithm)
  4. Lab: Instance Extraction. Deadline: midnight before the next lecture.
2021-10-25: Fact Extraction by Reasoning (Room 1A222)
  1. Information Extraction by Reasoning (60)
  2. Lab: Max Sat. Deadline: midnight before the next lecture.
2021-11-08: Rule Mining (Room 1A222)
  1. Rule Mining (without “bottom-up rule mining”)
  2. Lab: Fact prediction. Deadline: midnight on the evening of November 28th (2 weeks after the exam).
2021-11-15: Exam (13:30-15:00)
at Télécom Paris (in person)
The exam is closed-book (no materials are allowed except for a pen).