Conditions of usage: Creative Commons License BY-NC Share

Information Extraction (TPT33 / INF393)

Course in Spring 2014 at Télécom ParisTech in the ATHENS program
© 2014 Fabian M. Suchanek, with Luis Galárraga


In this class, we will give an overview of Information Extraction techniques and the Semantic Web. Information Extraction is the process of deriving structured information (such as alive(Elvis)) from digital text (such as the sentence "Elvis is alive"). The first part of this lecture will focus on factual and semantic information extraction, i.e., we will cover named entity recognition, entity disambiguation, instance extraction, fact extraction, and ontological information extraction. The Semantic Web is the little brother of the Web: it aims to represent information in a machine-readable form. So, after having learned how to extract information from text documents, we will learn how to represent it in a semantic way. We will cover the standards RDF/S, URIs, and RDFa, as well as recent advances in the field. We will also touch upon applications of both Information Extraction and the Semantic Web, such as Google's Knowledge Graph, IBM's Watson question answering system, Facebook's Open Graph, and academic projects such as YAGO, DBpedia, and NELL.


Course title: Information Extraction
Course id in ATHENS: TPT33
Course id at Télécom: INF393
Language: English
Time: Monday 2014-03-17 to Friday 2014-03-21
Schedule: every day
Rooms:

The class will be evaluated by work in the labs. The labs will be a combination of practical work (programming) and exam-like exercises. Every student works on their own. Depending on the exercise, the results are to be handed in either after the lab session or at the beginning of the next lecture.


The grades are available here, listed by the secret key that each student chose.


Day        Session 1                       Session 2                 Lab rooms
Monday     Motivation                      Knowledge Representation  C126 & C127
Tuesday    Knowledge Representation        Named Entity Recognition  C126 & C127
Wednesday  Named Entity Annotation         Instance Extraction       C124 & C125
Thursday   Structured sources (Room B555)  Unstructured sources      C127 & C128
Friday     Reasoning                       Semantic Web              C130
The schedule beyond the current date is tentative. The PDF slides are provided for convenience only; the authoritative versions are the SVG slides.

Supplementary material: