This class gives an overview of Information Extraction for Knowledge Base Construction. This is the process of deriving structured information (such as alive(Elvis)) from digital text (such as the sentence “Elvis is alive”). We will first look at applications of information extraction, notably in question answering systems, chatbots, and personal assistants. Then, we will cover the technical steps of knowledge base construction: natural language processing, named entity recognition, entity disambiguation, instance extraction, fact extraction, and knowledge cleaning.
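As a rough illustration of what "deriving structured information from text" means, the sketch below turns the example sentence into the fact alive(Elvis). The regular expression and the function names extract_fact and format_fact are assumptions made for this illustration only. Real systems rely on the processing steps listed above, not on a single hand-written pattern.

```python
import re

# Hypothetical pattern for sentences of the form "<Name> is <adjective>".
# It is an illustrative assumption, not a method taught in the course.
PATTERN = re.compile(r"^(?P<subject>[A-Z]\w*) is (?P<property>[a-z]+)[.!]?$")


def extract_fact(sentence):
    """Return a (predicate, argument) pair, or None if the pattern does not match."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return (match.group("property"), match.group("subject"))


def format_fact(fact):
    """Render a (predicate, argument) pair in the notation predicate(argument)."""
    predicate, argument = fact
    return f"{predicate}({argument})"


fact = extract_fact("Elvis is alive")
print(format_fact(fact))
```

A sentence such as “Elvis is not alive” does not match this pattern and yields no fact, which hints at why the later steps of the pipeline are needed.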
Grading
The course is graded on 6 labs and a final exam.
Final grade: 50% labs + 50% exam
Final grade at re-take: 50% original labs + 50% re-exam
Modalities:
You may not use code from other sources (other students, the Web, libraries, etc.) that was designed to solve the problem of the lab. Example: if the task of the lab is POS-tagging, you may not use a library or a piece of code from the Web that does POS-tagging.
Discussing assignments together is allowed, but each student must write their own solution.
No sharing of code!
Lab: Entity Classification (deadline: the night before the next lecture at 5:00am)
Please install Python 3.6 (or later) on your machine for the labs!
See here for help.
Then please update pip by running: python3 -m pip install --upgrade pip
Students with an M1 Mac should proceed as described here.
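To confirm that the installed interpreter meets the version requirement, a short check like the following can be run with python3. This is only a sketch: the helper name is_supported and the constant MIN_VERSION are made up for this example, and the labs themselves may check the version differently or not at all.

```python
import sys

# Minimum version stated in the installation instructions.
MIN_VERSION = (3, 6)


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if is_supported():
    print("Python", sys.version.split()[0], "is recent enough for the labs.")
else:
    print("Please install Python 3.6 or later.")
```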
2022-09-29: Named Entity Recognition and Classification (Room 1A226)
Lab: NERC (deadline: the night before the next lecture at 5:00am). Do not use (1) word embeddings (use only features), (2) other architectures, or (3) libraries that can do NERC (e.g., spacy.io).
2022-10-06: Typing and Disambiguation (Room 1A226)