Knowledge Base Construction

Content

In this class, we will give an overview of Information Extraction for Knowledge Base Construction. This is the process of deriving structured information (such as alive(Elvis)) from digital text (such as the sentence “Elvis is alive”). We will first look at applications of information extraction, notably question answering systems, chatbots, and personal assistants. Then, we will cover the technical steps of knowledge base construction: natural language processing, named entity recognition, entity disambiguation, instance extraction, fact extraction, and knowledge cleaning.
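To make the alive(Elvis) example concrete, here is a deliberately naive sketch in Python, the language used in the labs. It assumes a single hypothetical pattern, "X is Y", and turns each match into a unary fact Y(X). The names PATTERN and extract_facts are illustrative only; this is not the pipeline taught in the course, which replaces the one regex with the steps listed above.

```python
import re
from typing import List

# Hypothetical toy pattern (an assumption for illustration): a capitalized word,
# the word "is", then a lowercase word. "Elvis is alive" -> ("Elvis", "alive").
PATTERN = re.compile(r"\b([A-Z][a-z]+) is ([a-z]+)\b")


def extract_facts(text: str) -> List[str]:
    """Turn every 'X is Y' match in the text into a unary fact Y(X)."""
    facts = []
    for subject, prop in PATTERN.findall(text):
        facts.append("{}({})".format(prop, subject))
    return facts


if __name__ == "__main__":
    print(extract_facts("Elvis is alive"))  # ['alive(Elvis)']
```

The same pattern also yields alive(King) from “The King is alive”, with no way of knowing that “the King” refers to Elvis; resolving such references is what the entity disambiguation step is for.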

Grading

The grade is based on 6 labs and a final exam.
Modalities:
Teachers:

Schedule

The course takes place on Thursday afternoons, 13:30-16:30.
2022-09-15: Introduction (Room 0D17)
  1. Information Extraction: What is it?
  2. Information Extraction: Why do we want to do it?
  3. Information Extraction: How does it work?
  4. Knowledge Representation (until Slide 39, “Schema”)
2022-09-22: Background refresher (Room 1A226)
  1. Introduction
  2. Perceptrons
  3. Feed-forward neural networks
  4. Lab: Entity Classification (deadline: the night before the next lecture at 5:00am)
Please install Python 3.6 (or later) on your machine for the labs! See here for help.
Then update pip by running: python3 -m pip install --upgrade pip
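If you are unsure which Python version you have, a small script like the following can check it against the 3.6 minimum. This is a minimal sketch; the function name is_supported is hypothetical, and the script deliberately avoids newer syntax so that it still runs, and reports the problem, on an older interpreter.

```python
import sys

# Minimum version the labs require, according to the course page.
MINIMUM_VERSION = (3, 6)


def is_supported(version):
    """Return True if a (major, minor, ...) version tuple meets the minimum."""
    return tuple(version[:2]) >= MINIMUM_VERSION


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    if is_supported(current):
        print("Python %d.%d.%d is recent enough for the labs." % current)
    else:
        print("Python %d.%d.%d is too old; please install 3.6 or later." % current)
```

Run it with the same python3 command you will use for the labs, so that you check the interpreter that pip installs packages into.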

Students with an M1 Mac should proceed as described here.

2022-09-29: Named Entity Recognition and Classification (Room 1A226)
  1. Word embeddings
  2. Named Entity Recognition and Classification
  3. Lab: NERC (deadline: the night before the next lecture at 5:00am)
    Do not use (1) word embeddings (use only features), (2) other architectures, or (3) libraries that can do NERC (such as spacy.io).
2022-10-06: Typing and Disambiguation (Room 1A226)
  1. Extractive Entity Typing
  2. Entity Disambiguation
  3. Dependency Parsing
  4. Lab: Disambiguation (deadline: the night before the next lecture at 5:00am)
2022-10-13: Fact Extraction (Room 1A226)
  1. Fact Extraction
  2. Regular expressions (recap)
  3. POS Tagging (without the Viterbi algorithm)
  4. Lab: Instance Extraction (deadline: the night before the next lecture at 5:00am)
2022-10-20: Fact Extraction by Reasoning (Room 1A226), starting 15 min late, at 13:45
  1. Information Extraction by Reasoning
  2. Lab: MAX-SAT (deadline: 30th of October)
2022-10-27: Rule Mining (Room 1D23)
At the students' request, this lecture is not held in person. It is available as a video recording.
  1. Rule Mining
  2. Lab: Fact Prediction (deadline: 13th of November)
We offer office hours on November 2nd (16:00-17:30) and November 8th (17:00-18:00) in office 4C20 at Télécom Paris.
2022-11-10: Exam (15:00-16:30, Room 0D20, in person)
The exam is “closed-book”: no materials are allowed except for a pen.