Knowledge Base Construction
This class gives an overview of Information Extraction for Knowledge Base Construction. This is the process of deriving structured information (such as alive(Elvis)) from digital text (such as the sentence "Elvis is alive"). We will first see applications of information extraction, notably in question answering systems, chatbots, and personal assistants. Then, we will cover the technical steps of knowledge base construction: natural language processing, named entity recognition, entity disambiguation, instance extraction, fact extraction, and knowledge cleaning.
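To make the "Elvis is alive" example concrete, here is a minimal sketch of a pattern-based extractor. It is only a toy illustration, not the method taught in the course: the names `PATTERN`, `extract_fact`, and `format_fact` are hypothetical, and the single pattern "<Subject> is <word>" is an assumption chosen to match the example sentence.

```python
import re

# Hypothetical toy pattern: "<Subject> is <word>" -> word(Subject).
# The subject must start with a capital letter; a final period is optional.
PATTERN = re.compile(r"^\s*([A-Z]\w*)\s+is\s+(\w+)\s*\.?\s*$")


def extract_fact(sentence):
    """Return a (predicate, argument) pair, or None if the pattern does not match."""
    match = PATTERN.match(sentence)
    if match is None:
        return None
    subject, predicate = match.groups()
    return (predicate, subject)


def format_fact(fact):
    """Render a (predicate, argument) pair in the notation predicate(argument)."""
    predicate, argument = fact
    return f"{predicate}({argument})"


if __name__ == "__main__":
    fact = extract_fact("Elvis is alive")
    if fact is not None:
        print(format_fact(fact))
```

A sentence such as "Elvis is not alive" does not match this pattern and yields no fact, which hints at why real systems need the later steps of the course (disambiguation, parsing, and knowledge cleaning) rather than a single surface pattern.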
The course is graded on 6 labs and a final exam.
- Final grade: 50% labs + 50% exam
- Final grade at re-take: 50% original labs + 50% re-exam
- You may not use code from other sources (other students, the Web, libraries, etc.) that was designed to solve the problem of the lab. Example: if the task is POS-tagging, you may not use a library or a piece of code from the Web that performs POS-tagging for this lab.
- Discussing assignments together is allowed, but each student must write their own solution.
No sharing of code!
- Plagiarism entails a grade of 0 for the lab/exam.
The course takes place on Thursday afternoons, 13:30-16:30. Due to the Corona pandemic, the course will take place online on Zoom.
- 2020-11-26: Introduction
- Information Extraction: What is it?
- Information Extraction: Why do we want to do it?
- Information Extraction: Who does it?
- Information Extraction: How does it work?
- Knowledge Representation (until Slide 46, “equality”)
- 2020-12-03: Background refresher
- Feed-forward neural networks
- Word embeddings (only “word2vec”)
- Transformers (not done, optional material)
- Lab: Classification (deadline: midnight on the 11th of December 2020)
- 2020-12-10: Named Entity Recognition and Classification
- Named Entity Recognition and Classification (without Deep Learning Methods)
- Lab: NERC (deadline: midnight on the 20th of December 2020)
- 2020-12-17: Typing and Disambiguation
- Extractive Entity Typing (without “Taxonomy Induction”)
- Entity Disambiguation (without “only textual features”)
- Lab: Disambiguation (deadline: midnight on the 6th of January 2021)
- 2021-01-07: Fact extraction
- Fact Extraction (40)
- Formal Grammars (30, only Formal Languages, Regular languages, FSM)
- Regular expressions (26)
This lecture might take longer than 1.5h, in which case we will shorten the lab accordingly.
- Lab: Instance Extraction, without dependency parsing or grammatical parsing (deadline January 13th midnight)
- 2021-01-14: Formal Grammars
- Formal Grammars (30, context-free languages, PDA, Turing Machine, Halting Problem)
- Dependency Parsing (optional material)
- Lab: Instance Extraction with dependency parsing (deadline 20th of January midnight)
- 2021-01-21: Fact Extraction by Reasoning
- Information Extraction by Reasoning (70)
- Lab: Max Sat (Deadline Sunday January 31st at midnight)
- 2021-02-04: Exam (13:30-15:00)
- Room “Amphi 7” at Télécom Paris (in Palaiseau)
The exam is closed-book (no materials are allowed except for a pen).