Knowledge Base Construction

Content

This class gives an overview of Information Extraction, i.e., the process of deriving structured information (such as alive(Elvis)) from digital text (such as the sentence “Elvis is alive”). We will cover the technical steps of information extraction: named entity recognition, entity disambiguation, and fact extraction. For each of them, we will see different methods: fine-tuning language models, prompt engineering, and training-free procedures. Finally, we will talk about techniques for knowledge cleaning: link prediction, entity alignment, and rule mining.
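To make the example concrete, here is a minimal sketch of how the sentence “Elvis is alive” could be turned into the fact alive(Elvis). It is only a toy illustration, not one of the methods taught in the class: the pattern and the function names extract_facts and format_fact are hypothetical, and the pattern assumes sentences of the form "<Name> is <adjective>".

```python
import re

# Hypothetical toy pattern (an assumption for illustration only):
# a capitalized word, the verb "is", and a lowercase word.
PATTERN = re.compile(r"\b([A-Z][a-z]+) is ([a-z]+)\b")


def extract_facts(text):
    """Return (predicate, subject) pairs for every pattern match in text."""
    return [(m.group(2), m.group(1)) for m in PATTERN.finditer(text)]


def format_fact(fact):
    """Render a (predicate, subject) pair in the notation predicate(subject)."""
    predicate, subject = fact
    return f"{predicate}({subject})"


facts = extract_facts("Elvis is alive")
print([format_fact(f) for f in facts])  # ['alive(Elvis)']
```

A hand-written pattern like this breaks on almost any variation of the sentence, which is why the class covers more robust methods.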

Grading

The grade is based on 6 labs and a final exam. Students may discuss assignments together, but each student must write their own solution. Sharing code is not allowed; plagiarism entails a grade of 0 for the lab or exam concerned.

Teachers:

Schedule

Please install Python 3.6 (or later) on your machine for the labs! See here for help.
Then please update pip by running: python3 -m pip install --upgrade pip
Students with an M1 Mac should proceed as described here.
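To check that the installed interpreter meets the requirement, a short script along the following lines may help. It is a sketch only: the function name python_is_recent_enough is hypothetical, and the minimum version is the 3.6 stated above.

```python
import sys

MINIMUM = (3, 6)  # minimum version stated in the instructions above


def python_is_recent_enough(version=None, minimum=MINIMUM):
    """Return True if the (major, minor) part of version is at least minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum


if python_is_recent_enough():
    print("Python version is sufficient for the labs.")
else:
    print("Please install Python 3.6 or later.")
```

Save it to a file and run it with python3 to test the interpreter that the labs will use.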

The class takes place at Telecom Paris.

Introduction (2023-11-23, room 1C39)
  1. Introduction to Information Extraction
  2. Knowledge Bases
  3. Formal Grammars, including Regular Expressions
Named Entity Recognition and Classification (2023-11-30, room 1A222)
  1. Knowledge Representation
  2. Named Entity Recognition and Classification
  3. Lab: training-free NERC
Typing and Disambiguation (2023-12-07, room 1A222)
  1. Prompt Engineering
  2. Entity Disambiguation
  3. Lab: Disambiguation with prompt engineering
Fact Extraction (2023-12-14, room 1A222)
  1. Extractive Entity Typing
  2. Fact Extraction
  3. Lab: Fact extraction by fine-tuning BERT
Fact Extraction by Reasoning (2023-12-21, room 1A222)
  1. Information Extraction by Reasoning
  2. Lab: Max Sat
Rule Mining (2024-01-11, room 1A222)
  1. Rule Mining
  2. Lab: Fact prediction
Semantic Web (2024-01-18, room 1A222)
  1. Semantic Web
  2. Lab: Entity alignment or link prediction
Exam
The exam is “closed-book”: no materials are allowed except for a pen.