CC-BY
Fabian M. Suchanek
Knowledge Representation
Goal: Make computers understand
We want to represent knowledge about the world in a computer. No easy feat:
“Knowledge representation is a difficult skill to learn on the job. [...] The importance of knowledge representation in diverse industry settings [...] should reinforce the idea that knowledge representation should be a fundamental part of a computer science curriculum, as fundamental as data structures and algorithms.”
(industry experts from Google, Microsoft, Facebook, Amazon, IBM)
Here, we look at the knowledge representation formalism that has evolved as a standard.
Overview
• Entities
• Classes
• Relations
• The gory details
• Reification
• Canonicalization
• The Open World Assumption
• Reality
Entity
An entity (also: resource) is anything that may be an object of thought.
Digression: Entities
Is this an entity? Or this?
How many entities are there?
Isn’t everything just atoms?
Digression: Identity
Over time, all parts of a ship are replaced at some point.
Then, is it still the same ship?
see: Theseus’s ship on Wikipedia
New York Times
It is often said that humans replace their cells every 7 years.
We consider only a finite set of entities that are of interest, and we assume them to be atomic and identifiable.
Def: Identifiers
An identifier for an entity is a string of characters that represents the entity uniquely.
Examples of identifiers:
• unique names, as in YAGO or DBpedia: Rowan_Atkinson_(actor)
• abstract identifiers, as in Wikidata or Freebase: /m/02jq1
We sometimes say “entity” when we mean the identifier. We sometimes use images for the identifiers.
Def: Labels
A label for an entity is a human-readable string that names the entity.
Labels that refer to the same entity are called synonyms.
Entities that have a label are called named entities.
[Atkinson on Wikidata: the identifier and the labels (all synonymous)]
Def: Ambiguity
A label that refers to several entities is called ambiguous.
Example: “Paris” is an ambiguous label, as it can refer to several cities, a Greek hero, or people with that name.
Paris reads about Paris in Paris.
Identifier: Paris_Hilton
Labels: “Paris Hilton”, “Paris Whitney Hilton”, “Paris”, ...
Identifier: Paris_(Greek_myth)
Labels: “Paris the Hero”, “Paris”, “Alexander”, ...
Identifier: Paris_(city)
Labels: “Paris”, “City of Light”, “Parigi”, ...
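The relationship between identifiers, labels, synonyms, and ambiguity can be sketched in a few lines of Python. This is only an illustration, not how any particular knowledge base stores its data: the dictionary `labels_of` and the helper functions are hypothetical names, and the data is the Paris example above with the elided labels ("...") left out.

```python
from collections import defaultdict

# Hypothetical toy data: identifier -> labels, taken from the Paris example.
labels_of = {
    "Paris_Hilton": ["Paris Hilton", "Paris Whitney Hilton", "Paris"],
    "Paris_(Greek_myth)": ["Paris the Hero", "Paris", "Alexander"],
    "Paris_(city)": ["Paris", "City of Light", "Parigi"],
}


def entities_by_label(labels_of):
    """Invert the mapping: label -> set of identifiers it can refer to."""
    index = defaultdict(set)
    for identifier, labels in labels_of.items():
        for label in labels:
            index[label].add(identifier)
    return dict(index)


def ambiguous_labels(labels_of):
    """A label is ambiguous if it refers to more than one entity."""
    index = entities_by_label(labels_of)
    return {label for label, ids in index.items() if len(ids) > 1}


def synonyms(labels_of, identifier):
    """All labels of the same entity are synonyms of each other."""
    return set(labels_of[identifier])


print(ambiguous_labels(labels_of))
print(synonyms(labels_of, "Paris_(city)"))
```

The identifier is the key because it is unique by definition, while a label may appear under several keys. The inverted index is what a system would consult to list the candidate entities for a given name.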
Def: Literals
A literal is a fixed value that takes the form of a string of characters.
(It is an entity that is identical to its identifier, in all knowledge bases.)
"1955-01-06"
"Hello world"
"42"
This is the number 42, and any knowledge base
will use the identifier "42" to refer to this entity.
Literals can have a datatype:
"1955-01-06"^^xsd:date
"Hello world"^^xsd:string
"42"^^xsd:int
or a language tag:
"Hello world"@en
see example
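As a minimal sketch, a literal with an optional datatype or language tag could be modeled as follows. The class `Literal` and its field names are assumptions made for illustration. The sketch treats datatype and language tag as mutually exclusive, following the "datatype or language tag" wording above, and it does not escape quotes inside the value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of a literal: a string value with an optional
    datatype (e.g. "xsd:date") or an optional language tag (e.g. "en")."""
    value: str
    datatype: Optional[str] = None
    language: Optional[str] = None

    def __post_init__(self):
        # Assumption of this sketch: a literal carries one or the other, not both.
        if self.datatype is not None and self.language is not None:
            raise ValueError("use either a datatype or a language tag")

    def __str__(self):
        # Render in the notation used on the slide.
        text = f'"{self.value}"'
        if self.datatype is not None:
            return f"{text}^^{self.datatype}"
        if self.language is not None:
            return f"{text}@{self.language}"
        return text


print(Literal("1955-01-06", datatype="xsd:date"))
print(Literal("Hello world", language="en"))
print(Literal("42"))
```

Because the dataclass is frozen and compares by value, two literals with the same content are equal, which mirrors the idea that a literal is identical to its identifier.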
Class
A class (also: concept) is a set of similar entities.
Each entity in the set is an instance of (also: has the type of, belongs to) the class.
Class “Entertainers”
Instances: Rowan Atkinson, Madonna
Other classes:
- Scientists
- Cars
- Cities
- Rivers
- Universities
- Theories
- ...
(The exact definition of “class” is a philosophical conundrum. See later in this lecture.)
Multiple Classes
An instance can belong to several classes.
“The best way to increase society’s resistance to insulting or offensive speech is to allow a lot
more of it. As with childhood diseases, you can better resist those germs to which you have been
exposed.” — Rowan Atkinson
Class “Free Speech Activists”
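Class membership, including membership in several classes, can be sketched as a mapping from each entity to the set of its classes. The names `classes_of`, `is_instance`, and `instances_of`, as well as the identifiers `Rowan_Atkinson` and `Madonna`, are hypothetical and chosen only for this illustration.

```python
# Hypothetical toy data: entity identifier -> set of classes it belongs to.
classes_of = {
    "Rowan_Atkinson": {"Entertainers", "Free Speech Activists"},
    "Madonna": {"Entertainers"},
}


def is_instance(entity, cls):
    """True if the entity is an instance of the class."""
    return cls in classes_of.get(entity, set())


def instances_of(cls):
    """A class seen as a set: all entities that are instances of it."""
    return {entity for entity, classes in classes_of.items() if cls in classes}


print(instances_of("Entertainers"))
print(is_instance("Rowan_Atkinson", "Free Speech Activists"))
```

Storing a set of classes per entity makes multiple membership the normal case and not a special one.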