Summary Artificial Intelligence
(c) 2002-01-28 Fabian M. Suchanek
http://suchanek.name
This is a summary of the course "Methods of Artificial Intelligence"
held by Prof. Claus Rollinger and Dr. Kai-Uwe Kuehnberger in the
WS 2001 at the University of Osnabrueck. Since the official script
barely gives clear and consistent definitions, most definitions are
just developed by induction and made consistent.
By reading the following text, you accept that the author does not
accept any responsibility for the correctness or completeness of
this text. If you have any corrections or remarks (especially
concerning the statements marked with a (?)), please
send me a mail. This is the only way to make the publication of this
summary useful for me, too.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Problem Solving
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Artificial Intelligence (AI):
... studies computations that realize cognitive functions. It's
human _competence_ which serves as a model for AI.
Dimensions of AI:
Types of Rationality: Cognitive approach or Engineering approach
Types of Competence: Acting or Reasoning
Short history of AI:
AI had a strong co-evolution with psychology
1940-1970: Neural networks, heuristic search (Dartmouth conference)
1960-1980: Domain knowledge: microworlds, expert systems
1980-1990: Distributed processing, neural networks
1990-????: Multiagent systems, artificial social organizations
Production systems:
A production system is a tuple (S,T,C) of
* States (all possible states)
* Transitions (all operations available)
* A Control structure (the unit which executes the transitions (?))
Search:
Solving a problem can be seen as searching for the solution by trying
out operations that might lead to the goal. Generally speaking, we
have an "initial state", lots of possible states and "actions"/
"operators" to transform the current state into another state. We aim
to reach a "goal state" by applying a sequence of operators. We
have a function (a goal test) which tells us if the current state is
a goal state.
Graphs:
A graph is the drawing of a net. It consists of "nodes" (dots),
connected by "edges" (lines). The edges may be "directed"
(indicated by arrows), meaning that they can only be passed in one
direction, and "weighted" (indicated by numbers), meaning that it
costs something to pass this edge. A sequence of connected nodes is
called a path. The sum of the weights along the edges of a path is
called the "path cost" of that path.
State space:
The state space is a graph consisting of:
* S: all possible states S (drawn as nodes of the graph)
* A: actions leading from one state to another (drawn as the edges
connecting the nodes).
* i: an initial state
* P: a goal test (a function P:S->{true,false})
Solutions to the problem are paths in the state space, beginning at
i and ending at any node for which P returns "true".
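The state-space formulation above can be sketched as a small program. The graph, state names, and the use of breadth-first search below are invented for illustration; any complete strategy would do.

```python
from collections import deque

def search(initial, successors, is_goal):
    """Breadth-first search through a state space given as a successor
    function; returns the first path from `initial` to a goal state."""
    frontier = deque([[initial]])
    visited = {initial}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None  # no goal state is reachable

# Invented toy state space: states as letters, actions as directed edges.
graph = {"i": ["a", "b"], "a": ["c"], "b": ["c", "g"], "c": [], "g": []}
path = search("i", lambda s: graph[s], lambda s: s == "g")
```

The returned path is a solution in the sense defined above: it begins at i and ends at a node for which the goal test returns true.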
Tree:
A tree (in the sense of computer science :-) is a parent node (a dot)
with zero, one, two, or more children. Each of the children is a tree
again. Nodes without children are called "leaves". In this course, we
refer to the average number of children as "b" (breadth) and to the
number of generations (levels) as "d" (depth).
Search tree:
A search tree visualizes the nodes visited by a search algorithm
(see below). Its nodes are the nodes of the state space.
Costs:
There are two kinds of costs:
* search costs (offline costs): Resources used to find the solution
* path costs (online costs): Resources used in carrying out the
solution, i.e. path costs as defined in "state space"
Complexity (O-calculus):
The complexity of a function roughly describes the growth of this
function. For instance, f(x)=2x^2+3x-10 roughly behaves like x^2.
One writes: f E O(x^2).
(Here, "E" means "is an element of" and "^" means "to the power of")
Mathematically speaking, the complexity of a function f is
another function g such that:
There is a constant c and
there is a starting value x0 such that
for all x>x0: f(x) <= c*g(x)
Depth-limited search:
Un-informed depth-first search which cuts off the search at a fixed
depth limit n.
Complete: yes, if the depth limit n >= solution_depth
Optimal: no
Time complexity: O(b^n)
Space complexity: O(b*n)
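As an illustration of the O-calculus entry above, the claim f E O(x^2) for f(x)=2x^2+3x-10 can be checked empirically. The constants c=3 and x0=3 are one possible witness pair, not the only one:

```python
def f(x):
    return 2 * x**2 + 3 * x - 10

# One witness pair (c, x0) for f E O(x^2): f(x) <= c*x^2 for all x > x0.
c, x0 = 3, 3
assert all(f(x) <= c * x * x for x in range(x0 + 1, 10000))
```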
Bidirectional search:
Un-informed search which runs breadth-first search simultaneously from
the initial state and from the goal until the two searches meet.
Advantage: The search depth of each search is d/2, the time and
space complexity b^d are thus reduced drastically
Disadvantages: The goal state has to be known explicitly, and it must
continually be checked whether the two searches have met
Complete: yes
Optimal: yes
Time complexity: O(b^(d/2))
Space complexity: O(b^(d/2))
Best-first search:
Informed search which can estimate the costs from a current node
to the goal node. It first expands the most promising nodes.
Greedy search:
A best-first search, which always first expands the node
which appears closest (i.e. cheapest) to the goal. It does not take
into account the costs of the path from the initial state to the
current state.
Complete: yes, provided that the tree is finite and that cycles are
checked
Optimal: no
Time complexity: O(b^d)
Space complexity: O(b^d)
A* search:
A best-first search which first expands the node which appears
to be on the best path from the initial node to the goal node. A* thus
takes into account:
The real cost from the initial node to the current node n: g(n)
The estimated cost from current node n to the goal node: h(n)
A*'s guess about the usefulness of the current node n is thus:
f(n)=g(n)+h(n)
Complete: yes
Optimal: yes
Time complexity: O(b^d) (?)
Space complexity: O(b^d) (?)
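A minimal A* sketch using f(n)=g(n)+h(n) as defined above; the toy weighted graph and the heuristic values are invented for illustration (the heuristic is chosen to be admissible):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search: neighbors(n) yields (successor, step_cost) pairs;
    h(n) estimates the remaining cost from n to the goal."""
    frontier = [(h(start), 0, start, [start])]  # entries: (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

# Invented weighted graph and admissible heuristic estimates.
edges = {"i": [("a", 1), ("b", 4)], "a": [("g", 5)], "b": [("g", 1)], "g": []}
h = {"i": 3, "a": 4, "b": 1, "g": 0}.get
path, cost = a_star("i", "g", lambda n: edges[n], h)
```

Because h never overestimates the real remaining cost here, the first path popped at the goal is optimal (i -> b -> g with cost 5, beating i -> a -> g with cost 6).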
Admissible heuristics:
In order to work properly, A* needs a heuristic h which fulfills
certain conditions. Suppose h*(n) are the real costs from node n to
the goal; then it must hold: h(n) <= h*(n) for all nodes n, i.e. h
never overestimates the real costs (h is "admissible").
Dominating heuristics:
A heuristic h2 dominates a heuristic h1 if h2(n) >= h1(n) for all
nodes n. As a result, h2 is more realistic, without overestimating
the costs. If A* is run with h2, it will expand at most as many
nodes as with h1.
Pathmax-search:
Sub-type of A* which uses the pathmax lemma. In ordinary A*, it
may occur that the costs f(n) decrease along a path, i.e. the
total path costs become cheaper the more you walk (it is a
non-monotonic heuristic). To avoid this, do not assign
f(n)=g(n)+h(n), but use f(n)=max( g(n)+h(n) , f(predecessor(n)) )
(pathmax lemma). As a result, f will always be non-decreasing along
the path, taking care that A* focuses on expanding the optimal path.
Hill-climbing:
Informed search which just always chooses the node which appears best.
No stepping back is done. When all nodes around appear worse than the
current node, terminate. Analogy: Searching the top of the Mount
Everest by just always choosing the direction where it seems to go up.
Complete: no
Optimal: no, in Osnabrueck, you would end up on the Westerberg
Time complexity: O(d) where d is the number of steps to the next hill
Space complexity: O(1), nothing is kept in memory
Problems: Lots. The algorithm only finds a local maximum (a hill) and
can get lost in a plateau
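A hill-climbing sketch on an invented 1-D landscape; it illustrates both the algorithm and the local-maximum problem named above:

```python
def hill_climb(start, neighbors, value):
    """Move to the best neighbor as long as it improves; stop at a
    local maximum. No backtracking, O(1) memory."""
    current = start
    while True:
        best = max(neighbors(current), key=value, default=current)
        if value(best) <= value(current):
            return current  # local maximum (possibly not the global one)
        current = best

# Invented landscape with a local hill at x=2 and the global top at x=8.
height = [0, 1, 3, 2, 1, 2, 4, 6, 7, 5]
nbrs = lambda x: [i for i in (x - 1, x + 1) if 0 <= i < len(height)]
assert hill_climb(0, nbrs, lambda x: height[x]) == 2   # stuck on local hill
assert hill_climb(5, nbrs, lambda x: height[x]) == 8   # reaches the top
```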
Iterative improvement search:
Take a single complete but suboptimal solution and try to improve it
by local modifications.
Space complexity: O(1), just keeps one single path in memory.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Constraints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Constraint satisfaction problems (CSPs):
Unlike in search strategies, something is known about the goal
test in CSPs: The goal fulfills certain conditions (constraints)
and we can calculate to which extent (fully, partially, not at all)
the current state already fulfills these conditions.
A CSP consists of a number of variables which have to be assigned
values such that the constellation fulfills the goal conditions.
In the initial state, all variables are unassigned. Each variable has
a set of possible values. When solving the problem, we will decrease
the number of possible values for each variable until there is only
one possible value for each variable (or more, if there are more
solutions to the problem).
Constraints:
Constraints can be
* unary, i.e. of the form predicate(variable). This means that all
values which could ever be assigned to this variable must fulfill
a constraint. The set of values satisfying these unary constraints
is called the "domain" of this variable.
* binary, i.e. of the form predicate(var1,var2). This means that for
every possible value of var1, there must be a possible value of
var2, such that a relation is fulfilled.
Consistency:
* A variable is node-consistent if its set of possible values only
contains those which fulfill the unary constraints for that variable
* A pair of variables is arc-consistent if all their binary
constraints p are fulfilled, i.e.: For every val1 out of the first
variable's set there is a val2 out of the second variable's set
such that p(val1,val2).
* The problem is path-consistent if for each variable, a value can be
chosen out of its set of possible values such that all binary
constraints are fulfilled at the same time.
This translates to: For every 3-tuple (A,B,C) of variables, the
following holds: If there is a pAC(a,c) with a and c
out of the values of A and C, then there is a value b out of B
such that pAB(a,b) and pBC(b,c).
Consistency algorithm:
1. Make all variables node-consistent by kicking out the values which
do not fulfill the unary constraints (domain restriction operation)
2. Make all pairs arc-consistent by kicking out the values which
have no corresponding value in the other variable such that the binary
constraints are fulfilled (arc consistency domain restriction
operation)
3. Choose a value for each variable out of the sets of remaining
values such that the problem is path-consistent.
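Steps 1 and 2 can be sketched as domain-restriction operations. The CSP below (X even, X < Y) is a made-up example, and the arc-consistency loop is a simplified version of the idea, not an optimized algorithm:

```python
def node_consistent(domains, unary):
    """Step 1: drop every value which violates a unary constraint."""
    return {v: {x for x in dom if all(p(x) for p in unary.get(v, []))}
            for v, dom in domains.items()}

def arc_consistent(domains, binary):
    """Step 2: repeatedly drop values with no supporting value in the
    partner variable; binary maps (v1,v2) -> predicate p(val1,val2)."""
    domains = {v: set(dom) for v, dom in domains.items()}
    changed = True
    while changed:
        changed = False
        for (v1, v2), p in binary.items():
            supported = {a for a in domains[v1]
                         if any(p(a, b) for b in domains[v2])}
            if supported != domains[v1]:
                domains[v1] = supported
                changed = True
    return domains

# Made-up CSP: X,Y in {1..4}, unary "X is even", binary "X < Y".
doms = node_consistent({"X": {1, 2, 3, 4}, "Y": {1, 2, 3, 4}},
                       {"X": [lambda x: x % 2 == 0]})
doms = arc_consistent(doms, {("X", "Y"): lambda x, y: x < y,
                             ("Y", "X"): lambda y, x: x < y})
```

After both steps, X is restricted to {2} and Y to {3,4}; step 3 then only has to pick from these reduced sets.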
Path consistency:
Establishing path-consistency is non-trivial: There are multiple
possible combinations of variable and values, but only some fulfill
all binary constraints at the same time. How can we pick the one
which does? Here are some observations:
* The order of variable assignments does not change the solution
* A violated constraint cannot be corrected by further assignments.
This leads to a simple cutoff criterion
Forward checking:
A strategy for establishing path-consistency.
Path consistent assignments can be found by keeping track of the
possible values for each variable which have not yet been tried out.
When all possible values of one variable have been tried out and no
solution has been found, there is none.
Most constrained variable heuristic strategy:
A strategy for establishing path consistency, extending the forward
checking strategy.
The variable with the fewest possible values is chosen next to have a
value assigned.
Most constraining variable heuristic strategy:
A strategy for establishing path consistency, extending the Most
constrained variable heuristic strategy.
The variable involved in the largest number of constraints on other
unassigned variables is chosen to have a value assigned.
Least constraining value heuristic strategy:
A strategy for establishing path consistency, extending the Most
constraining variable heuristic strategy.
Choose a value that kicks out the smallest number of values for the
connected variables.
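The most-constrained-variable heuristic combined with forward checking can be sketched on a toy map-colouring CSP (the constraint is simply "neighboring variables get different values"; all names are invented):

```python
def solve(domains, neighbors, assignment=None):
    """Backtracking with the most-constrained-variable heuristic and
    forward checking for 'neighboring variables differ' constraints."""
    assignment = assignment or {}
    if len(assignment) == len(domains):
        return assignment
    unassigned = [v for v in domains if v not in assignment]
    # most constrained variable: fewest remaining possible values
    var = min(unassigned, key=lambda v: len(domains[v]))
    for val in domains[var]:
        # forward checking: remove val from each neighbor's domain
        pruned = {v: domains[v] - ({val} if v in neighbors[var] else set())
                  for v in domains}
        pruned[var] = {val}
        if all(pruned[v] for v in unassigned):  # no domain wiped out
            result = solve(pruned, neighbors, {**assignment, var: val})
            if result:
                return result
    return None  # cutoff: a violated constraint cannot be repaired later

# Toy map colouring: three mutually adjacent regions, three colours.
doms = {v: {"r", "g", "b"} for v in "ABC"}
nbrs = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
sol = solve(doms, nbrs)
```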
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Game Playing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Game playing:
A game in the sense of computer science is an initial state, a set of
legal moves transforming the current state to another state and a
valuation function which tells us if the current state is a game
over state and who won (?). Playing a game means that two parties
alternately perform moves. The game playing problem is finding good
moves in order to win the game. This problem differs from
classical search since there is an opponent whose moves cannot
be predicted. Just for fun, we'll call the starting player MAX and
the other one MIN.
Types of games:
Randomness dimension: Is the game deterministic or is chance involved
(as in games with dice)?
Information dimension: Does the player have perfect information or
imperfect information (like in card games where
the opponent's cards are unknown)?
Searching the optimal move:
A game tree can be unfolded with the initial game state in its top
node and all possible states after MAX's move in the 1st level.
The next level is made up by all possible states after MIN's
move etc. We just consider legal moves. A leaf is reached when the
game is over.
MiniMax Algorithm:
An algorithm for determining which moves in a deterministic, perfect-
information game would be most useful for MAX in order to win. First,
the whole tree has to be unfolded for every possible sequence of
moves. Then, the final states are examined and given a "utility"
(a valuation): +1 if MAX won, -1 if MIN won and 0 in
case of a draw. Other values could be imagined, depending on how well
MAX won or how badly he lost. Now these values are propagated up:
Beginning at the last but one level, we go up. When we are on one of
MAX's levels, we choose the maximum value of the node's children; when
we are on one of MIN's levels, we choose the minimum value of the
node's children. Thus, we assume that each player always chooses the
optimal move for himself: MIN wants to reach a -1-leaf and MAX heads
towards a +1-leaf.
Complete: yes, if the tree is finite
Optimal: yes, assuming an optimal opponent
Time complexity: O(b^d), all nodes are visited
Space complexity: O(b*d), since for every level, one node with all its
siblings is kept in memory (?)
Problem: For really tricky games (chess), it is impossible to unfold
the whole game tree since b=35 and d=100.
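The propagation of utilities described above can be sketched directly. The hand-made game tree below encodes inner nodes as tuples of children and leaves as their utility from MAX's point of view:

```python
def minimax(state, is_max, moves, utility):
    """Return the minimax value of `state`: MAX picks the child with the
    highest value, MIN the one with the lowest; leaves get their utility."""
    children = moves(state)
    if not children:
        return utility(state)
    values = [minimax(c, not is_max, moves, utility) for c in children]
    return max(values) if is_max else min(values)

# Invented two-level game tree: MAX chooses a branch, MIN a leaf within it.
tree = ((3, 5), (2, 9))
value = minimax(tree, True,
                lambda s: s if isinstance(s, tuple) else (),
                lambda s: s)
```

MIN reduces the branches to 3 and 2 respectively; MAX then picks the branch worth 3.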
Minimax-Cutoff Algorithm:
Simplification of the Minimax Algorithm: The game tree is not
unfolded to the leaves, but only to a given level. The nodes of
the last expanded level are then given an estimated utility
instead of a real one.
Alpha-beta principle:
If you have an idea that is surely bad, do not take time to see how
truly awful it is. (Nilsson)
Alpha-Beta Algorithm:
Improvement of the Minimax Algorithm, driven by the alpha-beta
principle. That is: If a player has the opportunity of choosing a
good move (i.e. a high value for MAX or a low value for MIN), it is
not necessary to also unfold the worse alternatives. Imagine we are
on MAX's level and he can choose between a node with the utility 15
and another node with an unknown utility in the underlying MIN-level.
We expand the unknown node and its first child has the utility 3.
Then we know that MIN will either choose this 3 or expand the other
children. In any case, his choice will be =< 3. Since we already have
a good alternative (the 15), there is no use figuring out what MIN
might have chosen if we had taken the unknown alternative. Cutting
away these unnecessary alternatives is called alpha-beta pruning.
We call MAX's choices alpha-values and MIN's choices beta-values.
The efficiency of alpha-beta pruning also depends on the order in
which we evaluate the alternatives (move ordering).
Complete: yes, if the tree is finite
Optimal: yes, against an optimal opponent
Time complexity: O(b^(d/2)) with perfect move ordering, since the
effective branching factor is reduced to sqrt(b)
Space complexity: (?)
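A sketch of minimax with alpha-beta pruning; the toy tree mirrors the 15-versus-unknown example above, so the second MIN node is cut off after its first child (the 3) is seen:

```python
def alphabeta(state, is_max, moves, utility,
              alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning: stop expanding a node's children
    as soon as its value cannot influence the choice above (alpha >= beta)."""
    children = moves(state)
    if not children:
        return utility(state)
    value = float("-inf") if is_max else float("inf")
    for c in children:
        v = alphabeta(c, not is_max, moves, utility, alpha, beta)
        if is_max:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:
            break  # remaining siblings cannot change the result
    return value

# MAX to move; MIN chooses within each tuple. The 8 is never evaluated.
tree = ((15, 7), (3, 8))
value = alphabeta(tree, True,
                  lambda s: s if isinstance(s, tuple) else (),
                  lambda s: s)
```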
Nondeterministic Minimax Algorithm:
Adjustment of the Minimax Algorithm for non-deterministic games:
Each move by MIN or MAX is followed by a "chance": The result state R
of a player's move has again n children (possible successive states,
with a dice, n would be 6) with a probability assigned to them. The
Minimax algorithm now multiplies the utility of each of these
successive states with their respective probabilities. The sum of
these products is then the utility of node R.
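The probability-weighted propagation can be sketched as follows; the tagged-tuple tree encoding is an assumption of this example:

```python
def expectiminimax(node):
    """Minimax for games with chance nodes: a chance node's value is the
    probability-weighted sum of its children's values."""
    kind, children = node[0], node[1:]
    if kind == "leaf":
        return children[0]
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)
    vals = [expectiminimax(c) for c in children]
    return max(vals) if kind == "max" else min(vals)

# MAX chooses between a sure 3 and a fair coin flip between 0 and 10.
tree = ("max", ("leaf", 3),
               ("chance", (0.5, ("leaf", 0)), (0.5, ("leaf", 10))))
value = expectiminimax(tree)  # max(3, 0.5*0 + 0.5*10) = 5
```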
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cognitive Search
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Theories on the productiveness of human thought:
* Behaviorist: The human mind works by trial and error
* Gestaltist: Solutions are found by data-driven
perceptual organization (insight)
* Newell & Simon (1972): The mind searches for the solution in a
state space (problem space hypothesis)
Weak strategy: An uninformed, but complete strategy for searching
in a state space.
Strong strategy: An informed, but incomplete strategy for searching
in a state space.
Means-End-Analysis: A strategy for solving a problem which works
as follows:
* If the current state is the goal state, we're done
* Choose the operator which covers the largest distance to the goal
* In order to make this operator available, the necessary
preconditions of this operator need to be fulfilled. Make these
preconditions a subgoal and achieve this subgoal by applying
Means-End-Analysis again
GPS, General Problem Solver: A program written by Newell & Simon in
1972 which solved a problem by means-end-analysis.
Expert chess players and novices:
Expert chess players do not unfold more nodes of the state space
(check out more possible moves) and they do not unfold more levels
of the state space (check out more possible consequences) than novice
chess players. Experts just choose the better moves and thus cut down
their search tree.
Visual ability of expert chess players:
Expert chess players are better at memorizing and reproducing a
constellation of chess pieces on the board. But this only works
for chess game constellations -- both experts and novices fail in
memorizing and reproducing random constellations. Experts have
just learned to memorize common configurations of pieces as
one single perceptual unit.
Chunk theory: The human working memory can hold about 7
units of information. These units may be words, numbers, letters
or concepts. Years of practice establish these units.
Sequential access to lists:
Sternberg found out in 1969 that when people are to memorize lists,
they can access the first element fastest ("anchor effect"). The
more remote the required item is from the beginning, the longer
it takes to access it.
Sequential access to images:
Kosslyn found in 1981 that when people memorized a map and
are asked to shift their attention from one point in the map to
another, the time needed is proportional to the distance in the
map.
Isomorphic problems: Two problems are isomorphic if they are
essentially the same.
Example: The two-player game of choosing a number from the
set {1,2,3,4,5,6,7,8,9} alternately in order to get three
numbers which sum up to 15 is isomorphic to TicTacToe:
The numbers can be arranged in a 3x3 matrix and the number-
choosing game can be reduced to getting 3 numbers in a row
-- like in TicTacToe.
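The isomorphism can be verified mechanically: the eight lines of a 3x3 magic square are exactly the eight 3-element subsets of {1..9} summing to 15, so "three in a row" and "three numbers summing to 15" coincide:

```python
from itertools import combinations

# A 3x3 magic square: every row, column and diagonal sums to 15.
magic = [[2, 7, 6],
         [9, 5, 1],
         [4, 3, 8]]
lines = ([set(row) for row in magic] +
         [set(col) for col in zip(*magic)] +
         [{magic[i][i] for i in range(3)},
          {magic[i][2 - i] for i in range(3)}])
sums15 = [set(c) for c in combinations(range(1, 10), 3) if sum(c) == 15]
assert sorted(map(sorted, lines)) == sorted(map(sorted, sums15))
```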
Cognitive treatment of isomorphic problems: As analyzed by
Kotovsky, Hayes & Simon in 1985, two isomorphic problems
are represented differently in the human brain and may be of different
cognitive complexity. The working memory load and the ease
of learning and applying state operators is highly
dependent on the chosen representation of the problem.
Modus ponens: The inference that if it holds that p implies q and it
holds that p, then q follows.
Example: When it rains, the street gets wet. It is raining.
Hence the street gets wet.
Modus tollens: The contrapositive counterpart of the modus ponens.
If it holds that p implies q and it holds that q is false, then it
follows that p must be false, too.
Example: When it rains, the street gets wet. The street is not
wet. Hence it does not rain. (Because if it rained, the street
would be wet.)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Knowledge Representation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The most important representation formalisms:
* PROLOG
* Predicate logic (PL)
* KL-ONE language family
* Semantic nets
* Frames
* Terminological systems
* Production systems
Object, entity, member, instance: Anything.
Relation: A relation on two sets is a subset of the cartesian
product of these sets. In terms of functions, a relation is
a function which gets two objects and returns a truth value.
Possible writings for a relation R and two objects a and b:
a R b,
R(a,b),
R(a,b)=true,
(a,b) E R,
<a,b> E R,
a is a R of b.
Concept, category, class: A collection of objects which are in some
way similar.
Membership: The fact that an object belongs to a category. Possible
writings for object o and category C:
o is a member of C,
o is an instance of C,
o realizes C,
o is a C,
o is in C,
o is an element of C,
o belongs to C
C includes o.
Ontology: A system of categories.
Importance of an ontology: An ontology can make information tools of
the future "understand" with what they are dealing. Domain
information would be encoded precisely and queries could be
formulated in a more precise way than by keywords. Requirements
for an ontology:
* It must be extendible to fit other domains and jargons
* It must have a simple language to enable users to understand it
* It must be HTML compliant in order to be integrated into the Web
Artificial world: AI scientists' playground.
Agent: An actor in an artificial world. He usually gets sensory
inputs from the world and performs actions changing the state of
the world. He has (or builds up) knowledge about the world,
including the current state, the results of his actions and
condition-action rules. Upon this knowledge and sensory inputs he
decides which action to take.
Properties of an artificial world environment:
An artificial world environment is called
* Accessible, if the sensors detect all aspects of the environment
that are relevant to the choice of action
* Deterministic, if the next state is completely determined by
current state and the action selected
* Static, if the environment does not change while the agent is
deliberating
* Discrete, if there is a limited number of distinct, clearly
defined percepts and actions
Wumpus world: An artificial world. Consists of a 2-dimensional matrix
with an agent, a monster ("wumpus"), several pits and a
bunch of gold, each entity occupying one field of the matrix.
Goal: The agent has to find the gold and avoid the pits and the
wumpus.
Actions: Go, turn, shoot (once)
Inputs: A "breeze" when an adjacent field contains a pit, a "smell"
when an adjacent field contains the wumpus and a "glitter" when
he is on the field with the gold.
Environment:
* not fully accessible, the agent can only get local inputs
* deterministic, all results of actions are specified
* discrete, there are no fuzzy states
* static, the wumpus and the pits do not move
Procedural approach: Telling the agent how to act. This implies a
conventional imperative programming style.
Declarative approach: Telling the agent what to do. The agent will
then infer the way to the goal by his domain-specific knowledge, his
sensory inputs and a built-in generic inference engine, which is
domain-independent. This implies a logic programming style
(PROLOG).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Predicate Logic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Syntax: A set of rules defining the well-formed expressions of a formal
language.
Semantics: A mapping from formal language expressions to real world
facts.
Proposition: An abstract representation of a fact.
Propositional logic: A special formal language which can work on
propositions.
Propositional logic formula:
* An uppercase letter (standing for a proposition) OR
* A string of the form ~X where X is a formula OR
* A string of the form A /\ B where A and B are formulas OR
* A string of the form A \/ B where A and B are formulas
First order logic, FOL, predicate logic, PL: A special formal
language which extends propositional logic.
FOL variable: An undetermined component in FOL, written as a
lowercase letter (usually x,y,z).
FOL function: A function from terms to terms, written as a
lowercase letter (usually f,g,h) or with a name.
Examples: sin, f, sqrt, max, ...
FOL constant: A 0-place FOL function, usually given a proper
name.
Examples: wumpus, rollinger, 3, 5.0, ...
FOL term:
* a FOL variable OR
* a FOL function call
Example: sin(sqrt(max(x,y)))
FOL predicate: A relation on FOL terms, written
as an uppercase letter (usually P,Q,R).
Examples: P(x), Greater(7,sqrt(3))
FOL formula:
* A string of the form P(t1,t2,...tN) where P is a predicate and ti
are terms OR
* A string of the form ~X where X is a formula OR
* A string of the form A /\ B where A and B are formulas OR
* A string of the form A \/ B where A and B are formulas OR
* A string of the form "All x: v" where x is a variable and v is a
formula OR
* A string of the form "Ex x: v" where x is a variable and v is a
formula
Bound FOL variable: A variable is bound in a formula Ex x: V if it
occurs in V. (Same with All x: V)
Free FOL variable: A variable is free if it is not bound.
FOL sentence: A formula without free variables.
Prenex form: A FOL formula with all its occurring quantifiers at the
beginning. For any FOL formula, there exists an equivalent formula
in prenex form. A FOL formula can be transformed to prenex form as
follows:
1. Rename all bound variables such that they are unique in the
formula
2. Move all quantifiers to the beginning. With outer-level AND- and
OR-expressions, this can be done by just wiping out the quantifier
and adding it at the beginning. All complex formulas have to be
transformed to simple AND- and OR- formulas.
Example: (Ex x: v) => a ~~~~~~> All x: ~v \/ a
because: (Ex x: v)=> a <=> ~(Ex x: v) \/ a <=> (All x: ~v) \/ a
Skolem form: A prenex form without any existential quantifiers. For
every prenex form there is a Skolem form, which is not semantically
equivalent, but equivalent with respect to satisfiability.
Existential quantifiers in a prenex form v can be removed as
follows:
1. If there is no existential quantifier in v, we're done
2. Scan the formula up to the first existential quantifier, keep
track of all universally quantified variables so far
3. Choose a function symbol which does not yet occur in the formula
4. Replace all occurrences of the existentially quantified variable
by f(x1,x2,...xN), where f is the new function symbol and
x1,...xN are the preceding universally quantified variables
5. Wipe out the existential quantifier
6. Go to 1
Literal: A string of the form P(t1, t2, ...tN) or its negation
~P(t1, t2, ...tN), where P is a FOL predicate and ti are FOL terms.
Conjunctive normal form, CNF: A FOL formula is in conjunctive normal
form if it is of the form
(A1 \/ A2 \/... AN) /\ (B1 \/ B2 \/... BM) /\ ...
where all symbols are literals.
Clause: A set of literals. It stands for the disjunction of these
literals.
Example: (a \/ b \/ c) ~~~~~> {a, b, c}
Clause form: A set of clauses. The clause form is another
way of writing a conjunctive normal form.
Example: ( a \/ b) /\ (c) ~~~~> { {a,b}, {c} }
Substitution pair: A pair of a variable and an identifier. The
variable is supposed to be replaced by the identifier in a formula.
Example: [variable/identifier]
Substitution: Applying a substitution pair to a formula, i.e.
replacing the variable indicated in the substitution pair by the
identifier throughout the formula.
Example: (a /\ b \/ x) [x/c] = (a /\ b \/ c)
Unifier: A set of substitutions is a unifier of a set of literals, if
applying the substitutions to the literals makes all literals equal.
Example:
Literals: L1 = P(x,a), L2 = P(b,y),
Unifier: { [x/b], [y/a] },
result: P(b,a)
Unifiable: A set of literals is unifiable if there is a unifier for it.
Most general unifier, MGU: A unifier of a set of literals is the most
general unifier if for any other unifier, there is a substitution
which turns the result of the MGU into the result of this other
unifier. Carelessly speaking, the MGU is the unifier which
eliminates the fewest variables.
Example: For { P(x), P(f(y)) }, the unifier { [x/f(y)] } is more
general than {[x/f(a)], [y/a] }, because the result of the first
(P(f(y))) can be turned into the result of the second (P(f(a))) by
applying [y/a].
Warning: This seems to be explained wrongly in the script: The
script talks of applying the substitution to the unifiers instead of
to the resulting literals!
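A simplified unification sketch; the term encoding (tuples for compound terms, strings for constants, "?"-prefixed strings for variables) is an assumption of this example, and the occurs check is omitted:

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def unify(x, y, subst=None):
    """Return an MGU (as a dict of substitution pairs) unifying x and y
    under the substitutions in `subst`, or None if not unifiable."""
    if subst is None:
        subst = {}
    if x == y:
        return subst
    if is_var(x):
        return bind(x, y, subst)
    if is_var(y):
        return bind(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):          # unify argument by argument
            subst = unify(xi, yi, subst)
            if subst is None:
                return None
        return subst
    return None  # clash between two distinct constants/functors

def bind(var, term, subst):
    if var in subst:                      # variable already bound
        return unify(subst[var], term, subst)
    return {**subst, var: term}

# Unifying P(x,a) with P(b,y) yields the MGU { [x/b], [y/a] }:
mgu = unify(("P", "?x", "a"), ("P", "b", "?y"))
```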
Resolvent: A clause R is the resolvent of two clauses C1' and C2' if
* C1' and C2' can be put through substitutions such that the results C1
and C2 do not contain common variables
(Concretely: Rename the variables in C1 and C2 to make the names
unique)
* There exist literals L1,L2,...LN in C1 and ~L1', ~L2', ...~LN' in
C2 such that { L1, L2,...LN, L1', L2', ... LN' } is unifiable. Let
s be the MGU of this set.
(Concretely: Find literals in C1 that have negated counterparts in
C2 or vice versa)
* R = (C1 - {L1, L2, ...LN}) U (C2-{~L1', ~L2', ... ~LN'}) s
(Concretely: Substitute variables in C1 and C2 to make L1 look
like L1' etc., then wipe out all these literals in both C1 and C2,
what you get is the resolvent)
Resolution: Finding a resolvent for two clauses. If the resolvent is
an empty clause, then this indicates that the two clauses are not in
all cases both true, i.e. ~ ( All x1: All x2:... All xN: C1 /\ C2).
This in turn means that there is an instantiation of the variables
such that C1 /\ C2 becomes false. Imagine you knew that C1 is true in
all cases and Q=~C2. Since there is a case where C1 /\ C2 is false,
in this case C2 must be false, which is equivalent to saying that
there is a case where Q is true. If C1 is your knowledge base (true
in all cases) and Q is a query, this is the way to prove that there
is an answer to Q.
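Resolution in the propositional (ground) case can be sketched as follows; the clause/literal encodings (frozensets of strings, "~" for negation) are assumptions of this example:

```python
def resolvents(c1, c2):
    """All resolvents of two propositional clauses: for each literal in
    c1 whose negation occurs in c2, merge the remaining literals."""
    out = []
    for lit in c1:
        neg = lit[1:] if lit.startswith("~") else "~" + lit
        if neg in c2:
            out.append((c1 - {lit}) | (c2 - {neg}))
    return out

def refutes(clauses):
    """Saturate the clause set under resolution; True iff the empty
    clause is derivable, i.e. the set is unsatisfiable."""
    clauses = {frozenset(c) for c in clauses}
    while True:
        new = {frozenset(r) for a in clauses for b in clauses
               for r in resolvents(a, b)}
        if frozenset() in new:
            return True
        if new <= clauses:
            return False  # nothing new derivable: satisfiable
        clauses |= new

# KB: p and (p => q), i.e. {~p, q}; the query q is proved by refuting ~q.
assert refutes([{"p"}, {"~p", "q"}, {"~q"}])
assert not refutes([{"p"}, {"~p", "q"}])
```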
Resolution strategy: Heuristic rules which specify the order of
resolving the clauses. Can be used to speed up resolution.
Unit clause resolution rule: Resolve those clauses which consist of
one literal first.
Linear resolution proof: A linear resolution proof for a set S of
clauses is a sequence of clauses ending in the empty clause where
each clause is
* a resolvent of the preceding clause and a clause of S OR
* a resolvent of the preceding clause and any other preceding clause
Restricting resolution: Avoiding certain resolution steps in
order to restrict the search space. As a disadvantage, the
resolution may become incomplete.
Horn clause: A clause which contains at most one positive literal.
Example: { ~a, ~b, c } is the same as ~a \/ ~b \/ c,
which is a /\ b => c.
Definite clause, program clause: A Horn clause with exactly one
positive literal; these are the entries of a knowledge base.
SLD resolution: Linear resolution of horn clauses with a selection
function for definite clauses.
Strategy of coercion: Choose an action which makes the resulting state
safe no matter what the state was before. (?)
Diagnostic rule: A rule inferring the cause from the effect.
Example: The street is wet. Hence it is raining.
Causal rule: A rule inferring the effect from the cause.
Example: It is raining. Hence the street is wet.
Situation calculus: An extension of FOL which can represent changes
of facts by adding a situation argument to each predicate.
Example: unhappy(bob,tue2pm), where tue2pm denotes a situation.
Result function: A function receiving an action and a situation and
returning another situation, which is the result of applying the
action in the original situation.
Example: result(eatChocolate,s) = s1, where it holds that
happy(agent, s1)
Effect axiom: A rule describing the consequence of an action.
Example: All s: hasChocolate(agent,s) =>
happy(agent,result(eatChocolate,s))
Frame axiom: A rule describing the unchanged environment.
Example:
All s: stillHasToShootBloedenWumpus(agent,s) =>
stillHasToShootBloedenWumpus(agent,result(eatChocolate,s))
Frame problem: The problem of finding an elegant way to handle non-
change. This includes avoiding the writing down of numerous frame
axioms and the copying of frame inferences in every step.
Qualification problem: The problem of exhaustively formalizing all
preconditions and exceptions of an action.
Ramification problem: The problem of including all possible
consequences of an action (?).
Successor state axiom: A rule stating the truth conditions of a
predicate in a specific situation by listing the ways in which this
predicate might have become true or might have remained true.
Successor state axioms solve the frame problem.
Example: All s: All a: happy(agent,result(a,s)) <=>
(hasChocolate(agent,s) /\ a = eatChocolate)
\/ happy(agent,s).
This states that the agent becomes happy by eating chocolate or by
being happy in the preceding situation.
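As a sketch (Python, with hypothetical names and the made-up assumption that the agent starts out unhappy but holding chocolate), the successor state axiom can be evaluated by recursing over the situation term:

```python
# Situations are nested terms: "s0" or ("result", action, situation).
# All names and base facts here are hypothetical illustrations.

def has_chocolate(s):
    # Assumed successor state axiom: the agent has chocolate in s0
    # and keeps it unless it is eaten.
    if s == "s0":
        return True
    _, action, prev = s
    return has_chocolate(prev) and action != "eatChocolate"

def happy(s):
    # All a,s: happy(result(a,s)) <=>
    #   (hasChocolate(s) /\ a = eatChocolate) \/ happy(s)
    if s == "s0":
        return False  # assumption: initially unhappy
    _, action, prev = s
    return (has_chocolate(prev) and action == "eatChocolate") or happy(prev)

s1 = ("result", "eatChocolate", "s0")
print(happy("s0"), happy(s1))  # False True
```

Once happy, the agent stays happy: the second disjunct copies the truth
value from the preceding situation, which is exactly how the axiom
avoids a separate frame axiom for happiness.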
Problem solving with a knowledge base: The knowledge base contains
the definite clauses about the world. Then a query of the form
"goalAchieved(s)" is asked, i.e it is asked whether there exists a
situation s in which the predicate "goalAchieved" holds. The
knowledge base now finds such an s. Since we always defined
situations by the result function applied to a preceding situation,
s will be of the form
result(action1, result(action2, ... result(actionN, s0) ... ))
and thus implicitly specifies a sequence of actions.
Problem solving by plans: In the above description, the result of the
query is quite unwieldy to read. That's why one defines a function
planResult, which gets a sequence of actions and a situation and
returns the situation, which is the result of applying all the
actions.
//Applying no action results in the same situation
All s: planResult([],s) = s
//Apply the rest of the plan to the result of the 1st action
All s,p,a: planResult([a|p],s) = planResult(p,result(a,s))
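The two axioms amount to folding result over the action list; a minimal Python sketch (the term encoding is made up for illustration):

```python
def result(action, situation):
    # A situation is represented as a nested term (hypothetical encoding).
    return ("result", action, situation)

def plan_result(plan, situation):
    # planResult([],s)    = s
    # planResult([a|p],s) = planResult(p, result(a,s))
    for action in plan:
        situation = result(action, situation)
    return situation

print(plan_result(["goForward", "grabGold"], "s0"))
# ('result', 'grabGold', ('result', 'goForward', 's0'))
```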
Plan: A sequence of action steps.
And-Introduction: The inference that "a /\ b" when it holds that "a"
and it holds that "b".
Example: big(bob), big(bill) ~~~~~~> big(bob) /\ big(bill)
Universal elimination: The inference that a universally quantified
proposition holds for a specific constant.
Example: All x: toBe(x) \/ ~toBe(x) ~~~~~~~~>
toBe(cogsci) \/ ~toBe(cogsci)
Proof: A sequence of literals for which it holds that
* the last literal is the proposition to be proved
* every literal has been gained by applying inferences to preceding
literals.
Proving as searching: A proof can be found by searching a state
space. Its nodes are sets of literals and its operators are
inferences. The goal test just checks whether the desired literal is
present in the current state. Problem: There is a high branching
factor, especially for universal elimination.
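A toy sketch of this view (propositional facts only, and-introduction as the single operator; universal elimination is omitted, and the encoding is invented for illustration):

```python
def prove(facts, goal, max_steps=3):
    # State = a set of formulas; the only operator is and-introduction,
    # which turns any two formulas a, b into the conjunction (a, "and", b).
    state = set(facts)
    for _ in range(max_steps):
        if goal in state:
            return True
        state |= {(a, "and", b) for a in state for b in state}
    return goal in state

facts = {"big(bob)", "big(bill)"}
print(prove(facts, ("big(bob)", "and", "big(bill)")))  # True
```

Even this crippled operator squares the number of formulas in every
step, which illustrates the branching problem mentioned above.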
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
KL-ONE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
KL-ONE: A formalism for representing knowledge. It consists of
KL-ONE membership propositions and KL-ONE concepts.
KL-ONE membership proposition: A string of the form
o E C
where o is an object and C is a KL-ONE concept.
In PL: C(o)
KL-ONE concept:
* a concept symbol (mostly an uppercase letter) OR
* an intersection of concepts OR
* a union of concepts OR
* a negation of a concept OR
* a value restricted concept OR
* an existential value restricted concept
Intersection of concepts C1 and C2: The concept which includes all
those objects which belong to both C1 and C2.
In KL-ONE: C1 AND C2, where AND is a square-shaped logical "and"
In PL: The concept of those x for which C1(x) /\ C2(x)
Example: Male AND Parent = Father
Union of concepts C1 and C2: The concept which includes all
objects which belong to C1, to C2 or to both.
In KL-ONE: C1 OR C2, where OR is a square-shaped logical "or"
In PL: The concept of those x for which C1(x) \/ C2(x)
Example: Boy OR Girl = Child
Negation of concept C: The concept which includes all objects
which are not in C.
In KL-ONE: -C
In PL: The concept of those x for which ~C(x)
Example: -Alive = Dead
Value restricted concept: A concept consisting of objects o
for which all those objects which fulfill a given relation with o
belong to another given category.
In KL-ONE: All R:C, meaning those objects o, for which all c with
R(o,c) belong to C.
In PL: The concept of those x for which All y: R(x,y) => C(y)
Example: All Child:Male = Those_with_only_male_children
Existential value restricted concept: A concept of objects o
for which there is another object of a given category which
fulfills a given relation.
In KL-ONE: Ex R:C, meaning those objects o, for which there exists
a c of C, such that R(o,c).
In PL: The concept of those x for which Ex y: R(x,y) /\ C(y)
Example: Ex Child:Male = Those_who_have_at_least_one_son
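Over a finite model the concept formers can be sketched as plain set operations (the model, its objects, and the Child relation are made-up examples):

```python
DOMAIN = {"ann", "bob", "carl", "dora"}
Male   = {"bob", "carl"}
Parent = {"ann", "bob"}
# Child(x,y): y is a child of x (hypothetical relation)
Child  = {("ann", "carl"), ("ann", "dora"), ("bob", "carl")}

def AND(c1, c2): return c1 & c2          # C1 AND C2
def OR(c1, c2):  return c1 | c2          # C1 OR C2
def NOT(c):      return DOMAIN - c       # -C

def ALL(r, c):   # All R:C - every R-successor belongs to C
    return {x for x in DOMAIN
            if all(y in c for (x2, y) in r if x2 == x)}

def EX(r, c):    # Ex R:C - at least one R-successor belongs to C
    return {x for x in DOMAIN
            if any(y in c for (x2, y) in r if x2 == x)}

print(AND(Male, Parent))          # {'bob'} - the "Father" concept
print(sorted(EX(Child, Male)))    # ['ann', 'bob']
print(sorted(ALL(Child, Male)))   # ['bob', 'carl', 'dora']
```

Note that ALL is vacuously true for childless objects, which mirrors
the universally quantified PL reading All y: R(x,y) => C(y).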
Terminology T: A formalism extending KL-ONE.
Two more signs are used:
* C1 < C2 (a square-shaped less-equal sign), indicating that all
objects of C1 also belong to C2 (partial definition)
In PL: All x: C1(x) => C2(x)
* C1 = C2 (an equal sign with a dot above it), indicating that
all objects of C1 also belong to C2 and vice versa (total
definition)
In PL: All x: C1(x) <=> C2(x)
Overall aim: Finding an algorithm which can decide whether
one concept is a sub-concept of another concept. In order to do so,
we will first simplify T definitions and then transform the problem
to a constraint problem.
Terminology T*: A restriction of the terminology T in which no
partial definitions are used.
Transforming T into T*: Turning all partial definitions into
total definitions. If the partial definition is "C1 < C2", then
the total definition would be "C1 = C2 AND C1*", indicating that
something C1-specific (namely C1*) picks all C1's out of the C2's.
C1* remains undefined.
Example:
// all CogScis are students
CogSci < Student
// There is something CogSci specific CogSci*, such that
CogSci = Student AND CogSci*
Expansion: A KL-ONE concept defined solely in terms of undefined
(primitive) concepts.
Getting an expansion by the help of T* definitions:
Plugging the T* definitions together to get an expansion.
The expansion is well-defined if there are no cycles in the T*
definitions and each concept occurs only once on the right side
of the definitions.
Example:
// Partial definition of "Student" by undefined concept "Human"
Student < Human
// Partial definition of "CogSci"
CogSci < Student
// Transformation to total definitions
Student = Human AND Student*
CogSci = Student AND CogSci*
// Transformation to an expansion
E(CogSci) = Human AND Student* AND CogSci*
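The plugging-together of T* definitions can be sketched as a recursive substitution (hypothetical encoding: each total definition is stored as its list of conjuncts, and the definitions are assumed acyclic):

```python
TSTAR = {
    "Student": ["Human", "Student*"],
    "CogSci":  ["Student", "CogSci*"],
}

def expand(concept):
    # Replace every defined conjunct by its own expansion until only
    # undefined (primitive) concepts remain.
    if concept not in TSTAR:
        return [concept]
    conjuncts = []
    for part in TSTAR[concept]:
        conjuncts.extend(expand(part))
    return conjuncts

print(expand("CogSci"))  # ['Human', 'Student*', 'CogSci*']
```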
Inconsistency: A concept is inconsistent if no object can belong
to it.
Subsumption: A relation between two concepts indicating that all
objects belonging to the first concept also belong to the
second concept (the first is a sub-concept of the second). The
relation is written down as an oddly-shaped less-equal sign, here
represented by "<=".
Constraint system S: A set of constraints of the form (x:C) ("the
object x belongs to the concept C") or (xRy) ("x stands in the
relation R to y").
Constraint propagation rules: Rules which make the implicit content
of a constraint system S explicit:
* S U { (x:(C1 AND C2)) } ~~~> S U { (x:(C1 AND C2)),(x:C1),(x:C2) }
intersective constraints are split up
* S U { (x:C1 OR C2) } ~~~> S U { (x:C1) } or S U { (x:C2) }
union constraints result in two possible constraint systems
* S U { (x:(Ex R:C)) } ~~~> S U { (x:(Ex R:C)), (xRy), (y:C) }
existential constraints add a new constrained variable
* S U { (x:(All R:C)), (xRy) } ~~~> S U { (x:(All R:C)), (xRy), (y:C) }
value restrictions propagate C to every existing R-successor y
Satisfiability: A constraint system is satisfiable if there
is an instantiation of variables such that all constraints
are fulfilled. This translates to: There is no elementary
contradiction (a pair (x:C) and (x:-C)) within the constraint system.
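A minimal sketch of the satisfiability test and of reducing subsumption to it, restricted to the propositional fragment (AND, OR, and negation of concept names only; no roles). The concept names and the encoding are assumptions for illustration:

```python
def satisfiable(S):
    # S: a set of constraints (x, concept), where a concept is a name,
    # ("not", name), ("and", c1, c2) or ("or", c1, c2).
    for (x, c) in S:
        if isinstance(c, tuple) and c[0] == "and":
            add = {(x, c[1]), (x, c[2])} - S
            if add:                       # split intersective constraints
                return satisfiable(S | add)
        if isinstance(c, tuple) and c[0] == "or":
            if (x, c[1]) not in S and (x, c[2]) not in S:
                # union constraints yield two possible systems
                return (satisfiable(S | {(x, c[1])}) or
                        satisfiable(S | {(x, c[2])}))
    # no rule applies: satisfiable iff there is no elementary
    # contradiction (x:A) together with (x:-A)
    return not any((x, ("not", a)) in S
                   for (x, a) in S if isinstance(a, str))

def subsumed(c1, c2):
    # C1 <= C2 iff C1 AND -C2 is inconsistent (c2 atomic here)
    return not satisfiable({("x", ("and", c1, ("not", c2)))})

Father = ("and", "Male", "Parent")
print(subsumed(Father, "Parent"))  # True
print(subsumed(Father, "Female"))  # False
```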
Subsumption algorithm: An algorithm deciding whether C1 is a
sub-concept of C2.
0. The problem is C1 <= C2.
1. Expand C1 and C2 by the help of the T* definitions.
2. Set up the constraint system { (x:(E(C1) AND -E(C2))) }.
3. Apply the constraint propagation rules.
4. C1 <= C2 holds iff the resulting constraint system is
unsatisfiable.
Reflexive relation: A relation R on a set S for which it holds that
All a E S: R(a,a)
Transitive relation: A relation R on a set S for which it holds that
All a,b,c E S: R(a,b) /\ R(b,c) => R(a,c)
Symmetric relation: A relation R on a set S for which it holds that
All a,b E S: R(a,b) => R(b,a)
Asymmetric relation: A relation R on a set S for which it holds that
All a,b E S: R(a,b) => ~R(b,a)
Antisymmetric relation: A relation R on a set S for which it holds
that
All a,b E S: R(a,b) /\ R(b,a) => a=b
Complete relation: A relation R on a set S for which it holds that
All a,b E S: R(a,b) \/ R(b,a)
This means that any two elements of the set are comparable.
Partial relation: A relation which is not complete.
Order relation: A reflexive, antisymmetric and transitive relation.
Partial order relation: An order relation which is not complete.
See also: statistik.txt at "Eigenschaften von Relationen" for a
complete German list of relations and examples.
Supremum: A function which receives two elements of a partially
ordered set and returns an element of this set, which is the least
upper bound of the two. If there is no minimum among the upper bounds,
the function fails.
Example:
If the set is the set of natural numbers, then sup(4,19) = 19
If the set is the set of characters and the relation is the lexical
sorting relation, then sup('a','A') fails: Both 'b' and 'B' are
"least upperbounds" and we cannot compare them in order to find out
which one is the minimum.
Infimum: A function which receives two elements of a partially
ordered set and returns an element of this set, which is the greatest
lower bound of the two. If there is no maximum among the lower bounds,
the function fails.
Lattice: A partially ordered set where for all possible pairs of
elements, sup(a,b) and inf(a,b) exist.
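A concrete example: the divisors of 12 ordered by divisibility form a lattice (sup = least common multiple, inf = greatest common divisor). A sketch of sup computed directly from the definition:

```python
S = {1, 2, 3, 4, 6, 12}            # divisors of 12

def leq(a, b):                     # a <= b iff a divides b
    return b % a == 0

def sup(a, b):
    # least upper bound: the upper bound below all other upper bounds
    upper = [u for u in S if leq(a, u) and leq(b, u)]
    least = [u for u in upper if all(leq(u, v) for v in upper)]
    return least[0] if least else None   # None: no least upper bound

print(sup(4, 6))  # 12
print(sup(2, 3))  # 6
```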
Complete lattice: A partially ordered set where every subset has an
infimum and a supremum.
Lattice graph: A graph whose nodes are concepts. It has a top element
(standing for a most general concept comprising all objects) and
a bottom element (standing for a most restrictive concept with all
attributes but no objects). A connection from an upper node to a
lower node means that the lower concept is a sub-concept of the
higher concept. Such lattices are complete.
Formal concept analysis: Determining extensions and intensions of
concepts by a lattice graph.
Semantic net: A graph whose nodes are concepts, instances and
properties. Concepts are connected by "is_a"-connections,
indicating that one concept (usually the lower one) is a subconcept
of the other. Instances (usually at the bottom) are connected
to their concept by an "instance"-connection. Attributes are
connected to concepts by a "has_attribute"-connection.
Inference in a semantic net: Following the links. There are two
ways: Intersection search and inheritance search.
Intersection search: Finding the relations between two objects by
starting at both nodes in the semantic net and trying to meet the
two searches.
Advantages:
* entity-based organization
* fast parallel organization
Inheritance search: Finding the properties of an object by following
up the concept links and collecting all attributes.
Monotonic logic: A reasoning where every concept has the properties of
all its super-concepts.
Non-monotonic logic: A reasoning where a concept can miss a property
which one of its super concepts has.
Default-reasoning: A way of handling non-monotonic logic. Properties
are attached to concepts in a semantic net as usual, but subconcepts
which miss a property of a superconcept are explicitly marked.
When an inference is to be drawn, the links are followed upwards
from subconcepts to superconcepts and the first occurrence of a
property overrides all later occurrences of that property.
Example:
Birds can usually fly.
Penguins are birds.
Penguins can't fly.
Tweety is a Penguin.
Hence: Tweety cannot fly.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Conceptual Dependency and Scripts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Object-Attribute-Value representation: A way of stating that the
attribute of an object has a specific value, namely by writing:
Object -> Attribute -> Value
OAV-representations are more general than representations of the
form attribute(object,value) and can be arranged in a net.
Event representation: A way of describing a sentence by OAV-
representations. The event itself is seen as an instance of the
verb (i.e. -> "instance" -> ). All participating
instances stand in a certain relation to the event, e.g.
"agent", "receiver", ...
Surface structure of a sentence: The phrase marker of a sentence.
Deep structure of a sentence, case frame: A list of the following
variables with their values for that sentence:
* Predicate P - the action
* Time T - the time of the event
* Agent A - the actor of the event
* Counter Agent C - force against which the action is carried out
* Object O - the entity that moves or changes
* Result R - the entity that comes into existence as a result
* Instrument I - immediate physical cause of an event
* Source S - the place from which something moves
* Goal G - the place to which something moves
* Experiencer E - the entity that experiences the effect
(invented by Fillmore)
Conceptual dependency, CD: A formalism by Roger Schank to represent
the meaning of a sentence independent of its surface structure. The
event described can be built up from primitive components and can
be arranged in a CD graph.
CD graph:
Mostly a graph of the form
<=> <----