Knowledge Base Construction

Content

Language models have revolutionized natural language processing. Yet they can state wrong things in a very convincing way: they hallucinate. One solution to this problem can come from structured data such as knowledge graphs, which can serve to correct and inform the model. In this class, we will see how to bridge the gap between natural language (the sentence “Elvis is alive”) and structured information (the statement alive(Elvis)). We will cover the technical steps of information extraction: named entity recognition, entity disambiguation, and fact extraction. For each of these, we will see different methods: fine-tuning language models, prompt engineering, and training-free procedures. Finally, we will discuss techniques for knowledge cleaning: link prediction, entity alignment, and rule mining.
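To make the natural-language-to-structured-information gap concrete, here is a deliberately naive sketch that maps a sentence like “Elvis is alive” to the statement alive(Elvis). It is a toy regular-expression illustration, not any of the course's actual methods; the function name and pattern are invented for this example.

```python
import re

def extract_statement(sentence: str):
    """Toy extractor: match sentences of the form '<Entity> is <predicate>'
    and emit a structured statement predicate(Entity). Real extraction
    pipelines use NERC, disambiguation, and fact extraction instead."""
    m = re.match(r"^([A-Z][a-z]+) is (\w+)\.?$", sentence)
    if m is None:
        return None
    entity, predicate = m.groups()
    return f"{predicate}({entity})"

print(extract_statement("Elvis is alive"))  # → alive(Elvis)
```

Of course, such a pattern breaks on almost any real text (multi-word entities, synonyms, ambiguity), which is exactly why the course covers the extraction steps listed above.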

Grading

The course is graded by 6 labs and a final exam. Discussing assignments together is allowed, but each student must write their own solution. Sharing code or plagiarizing entails a grade of 0 for the lab or exam. Unless otherwise mentioned, labs are due before the next lecture.

Teachers:

Schedule

The class takes place at Telecom Paris.
Introduction (2024-11-15, room 1C47)
  1. Introduction to Knowledge Base Construction
  2. Knowledge Graphs
  3. Knowledge Representation
Named Entity Recognition and Classification (2024-11-22, room 1D22)

Snow has fallen, and local tradition dictates that public services come to a standstill. The class will take place in person for those who can attend, and will also be streamed here for those joining remotely.

  1. Continuation of Knowledge Representation
    (We covered all slides except Reification and Canonicalization)
  2. Named Entity Recognition and Classification
    (Except conditional random fields)
  3. Lab: training-free NERC
Supplementary material: A quick refresher of Regular Expressions
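In the spirit of the regex refresher and the training-free NERC lab, the sketch below tags maximal runs of capitalized words as candidate named entities. This is a hypothetical illustration, not the lab's solution; the pattern and function name are made up for this example.

```python
import re

# Candidate named entities: one or more consecutive capitalized words.
# A training-free baseline like this needs no model, only a pattern.
ENTITY = re.compile(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b")

def candidate_entities(text: str) -> list[str]:
    """Return all maximal capitalized-word spans in the text."""
    return ENTITY.findall(text)

print(candidate_entities("Elvis Presley was born in Tupelo, Mississippi."))
# → ['Elvis Presley', 'Tupelo', 'Mississippi']
```

Such a baseline over-generates (sentence-initial words) and under-generates (lowercase entities), which motivates the classification part of NERC.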
Typing and Disambiguation (2024-11-29, room 1A222)
  1. Entity Disambiguation
  2. Prompt Engineering
  3. Lab: Disambiguation with prompt engineering (without Fabian)
Fact extraction (2024-12-06, room 1A222)
  1. Fact Extraction
    (Without semantic representations)
  2. Constrained Decoding
  3. Lab: Fact extraction by constrained decoding
Fact Extraction by Reasoning (2024-12-13, room 1A222)
  1. Information Extraction by Reasoning
  2. Lab: Max Sat
Rule Mining (2024-12-20, room 1A222)
  1. Rule Mining (recording)
  2. Lab: Rule mining

Looking for an internship? Check here!

Looking for an internship + PhD thesis? Check here; we will be hiring soon!

Semantic Web (2025-01-10, room 1A222)
  1. Semantic Web
    (Without OWL, SPARQL, RDFa)
  2. Lab: KB cleanup (deadline: 2025-01-26)
Exam (2025-01-17, room 0C04)
The exam is “closed-book”: no materials are allowed except for a pen.