CC-BY
Fabian M. Suchanek
Knowledge Representation
Goal: Make computers understand
We want to represent knowledge about the world in a computer.
This is no easy feat.
Knowledge representation is a difficult skill to learn on
the job. [...] The importance of knowledge representation
in diverse industry settings [...] should reinforce the
idea that knowledge representation should be a
fundamental part of a computer science curriculum, as
fundamental as data structures and algorithms.
- industry experts from Google, Microsoft, Facebook, Amazon, IBM
Here, we look at the knowledge representation formalism
that has evolved as a standard in the domain.
Overview
• Entities
• Classes
• Relations
• The gory details
• Reification
• Canonicalization
• The Open World Assumption
• Reality
An entity (also: resource) is anything that may be an object of thought.
Entity
Digression: Entities
Is this an entity?
Or this?
How many entities are there?
Isn’t everything just atoms?
Digression: Identity
Over time, all parts of a ship are replaced.
Then, is it still the same ship?
see: Theseus’s ship on Wikipedia
New York Times
Humans are commonly said to replace their cells every 7 years.
We consider only a finite set of entities that are of interest, and we
assume them to be atomic and identifiable.
An identifier for an entity is a string of characters that represents the entity uniquely in a knowledge base.
Def: Identifiers
Examples for identifiers:
• unique names, as in YAGO or DBpedia:
Rowan_Atkinson_(actor)
• abstract identifiers, as in Wikidata or Freebase:
/m/02jq1
We sometimes say “entity” when we mean the identifier. We sometimes use images for the identifiers.
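The uniqueness requirement can be sketched as follows. This is a minimal illustration, assuming a knowledge base is modeled as a dictionary keyed by identifier. The class name `KnowledgeBase`, its methods, and the placeholder descriptions are hypothetical; only the two identifier strings come from the examples above.

```python
class KnowledgeBase:
    """Toy knowledge base: each identifier refers to exactly one entity."""

    def __init__(self):
        # identifier -> description (the description is a hypothetical payload)
        self._entities = {}

    def add_entity(self, identifier, description):
        # An identifier must be unique within one knowledge base.
        if identifier in self._entities:
            raise ValueError(f"identifier already in use: {identifier}")
        self._entities[identifier] = description

    def __contains__(self, identifier):
        return identifier in self._entities


# A name-style identifier (as in YAGO or DBpedia) and an abstract one
# (as in Wikidata or Freebase), kept in two separate toy knowledge bases.
names_kb = KnowledgeBase()
names_kb.add_entity("Rowan_Atkinson_(actor)", "placeholder description")

abstract_kb = KnowledgeBase()
abstract_kb.add_entity("/m/02jq1", "placeholder description")
```

Registering the same identifier twice in one knowledge base raises an error, which is one simple way to enforce that an identifier represents its entity uniquely.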
A label for an entity is a human-readable string that names the entity.
Labels that refer to the same entity are called synonyms.
Entities that have a label are called named entities.
Def: Labels
[Figure: Atkinson on Wikidata, showing the identifier (for a named entity, Rowan Atkinson) and its labels (all synonymous)]
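The definitions of labels, synonyms, and named entities can be sketched in a few lines. This is an illustration only: the dictionary layout and the function names are assumptions, and the label strings are illustrative rather than copied from Wikidata.

```python
# identifier -> set of labels (label strings are illustrative assumptions)
labels_by_id = {
    "Rowan_Atkinson_(actor)": {"Rowan Atkinson", "Rowan Sebastian Atkinson"},
    "unnamed_entity_1": set(),  # hypothetical entity without any label
}


def are_synonyms(label_a, label_b, labels_by_id):
    """Two labels are synonyms if some entity carries both of them."""
    return any(
        label_a in labels and label_b in labels
        for labels in labels_by_id.values()
    )


def is_named_entity(identifier, labels_by_id):
    """An entity is a named entity if it has at least one label."""
    return bool(labels_by_id.get(identifier))
```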
A label that refers to several entities is called ambiguous.
Def: Ambiguity
Example: “Paris” is an ambiguous label, as it can refer to
several cities, a Greek hero, or people with that name.
Paris reads about Paris in Paris.
Identifier: Paris_Hilton
Labels: “Paris Hilton”, “Paris Whitney Hilton”, “Paris”, ...
Identifier: Paris_(Greek_myth)
Labels: “Paris the Hero”, “Paris”, “Alexander”, ...
Identifier: Paris_(city)
Labels: “Paris”, “City of Light”, “Parigi”, ...
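Ambiguity can be detected by inverting the mapping from identifiers to labels. The sketch below uses only the identifiers and labels listed above; the function names and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict

# identifier -> labels, taken from the example above
labels_by_id = {
    "Paris_Hilton": {"Paris Hilton", "Paris Whitney Hilton", "Paris"},
    "Paris_(Greek_myth)": {"Paris the Hero", "Paris", "Alexander"},
    "Paris_(city)": {"Paris", "City of Light", "Parigi"},
}


def entities_by_label(labels_by_id):
    """Invert the mapping: label -> set of identifiers carrying that label."""
    index = defaultdict(set)
    for identifier, labels in labels_by_id.items():
        for label in labels:
            index[label].add(identifier)
    return index


def ambiguous_labels(labels_by_id):
    """A label is ambiguous if it refers to more than one entity."""
    index = entities_by_label(labels_by_id)
    return {label for label, ids in index.items() if len(ids) > 1}


print(ambiguous_labels(labels_by_id))  # {'Paris'}
```

Within this small data set, “Paris” is the only label shared by several entities; all other labels resolve to a single identifier.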
A literal is a fixed value that takes the form of a string of characters.
(It is an entity that is identical to its identifier, in all knowledge bases.)
Def: Literals
"1955-01-06"
"Hello world"
"42"
This is the number 42, and any knowledge base will use the identifier "42" to refer to this entity.
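The special status of literals can be sketched as a comparison rule. This is a minimal illustration under two assumptions that are not part of the definition: literals are recognized by their surrounding double quotes, and knowledge bases are represented by plain name strings. Both function names are hypothetical.

```python
def is_literal(identifier):
    """Assumed convention: a literal is written in double quotes."""
    return (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )


def same_entity(id_a, kb_a, id_b, kb_b):
    """Decide whether two identifiers denote the same entity."""
    if is_literal(id_a) or is_literal(id_b):
        # A literal is identical to its identifier in all knowledge bases,
        # so comparing the strings is enough.
        return id_a == id_b
    # Other identifiers are only unique within one knowledge base. Across
    # knowledge bases the strings alone cannot decide the question, so this
    # sketch conservatively answers False.
    return kb_a == kb_b and id_a == id_b
```

For example, `same_entity('"42"', "kb1", '"42"', "kb2")` holds, while two equal non-literal identifiers from different knowledge bases are not assumed to match.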
A class (also: concept) is a set of similar entities. Each entity in the set is an instance of (also: has the type of, belongs to) the class.
Other classes:
- Scientists
- Cars
- Cities
- Rivers
- Universities
- Theories
- ...
Class
(The exact definition of “class” is a philosophical conundrum. See later in this lecture.)
Instances
“It's a bit disconcerting being treated like Madonna.”
— Rowan Atkinson
Class “Entertainers”: Rowan Atkinson, Madonna
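Since a class is a set of entities, it can be sketched directly with Python sets. The dictionary layout and the function names are assumptions for illustration; the class and its two instances come from the example above.

```python
# class name -> set of instances, taken from the example above
classes = {
    "Entertainers": {"Rowan Atkinson", "Madonna"},
}


def is_instance_of(entity, class_name, classes):
    """An entity is an instance of a class if it is a member of that set."""
    return entity in classes.get(class_name, set())


def instances_of(class_name, classes):
    """Return the instances of a class in sorted order."""
    return sorted(classes.get(class_name, set()))
```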
An instance can belong to several classes.
Multiple Classes
“The best way to increase society’s resistance to insulting or offensive
speech is to allow a lot more of it. As with childhood diseases, you can
better resist those germs to which you have been exposed.”
— Rowan Atkinson
Class “Free Speech Activists”
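Membership in several classes can be sketched by looking an entity up in every class set. The function name `classes_of` is hypothetical, and the data is an assumption built from the two slides: Rowan Atkinson is placed in both classes, and Madonna is listed only under “Entertainers” for the sake of the example.

```python
# class name -> set of instances (membership data assumed for illustration)
classes = {
    "Entertainers": {"Rowan Atkinson", "Madonna"},
    "Free Speech Activists": {"Rowan Atkinson"},
}


def classes_of(entity, classes):
    """Return, in sorted order, all classes that the entity is an instance of."""
    return sorted(
        name for name, members in classes.items() if entity in members
    )


print(classes_of("Rowan Atkinson", classes))
# ['Entertainers', 'Free Speech Activists']
```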