Markov Random Fields

Roadmap (from "Introduction to probabilities"):

                                                  only visible variables    visible and invisible variables
  Chains                                          Markov Chains             Hidden Markov Models
  Complex dependencies and/or feature functions   Markov Random Fields      Conditional Random Fields
Def: Markov Random Field

A Markov Random Field (MRF) is a set of random variables $X_1, \ldots, X_n$ that satisfies

  $P(X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) \;=\; P(X_i \mid N(X_i))$

where $N(X_i)$ is the set of random variables that are neighbors of $X_i$. That is: the probability of $X_i$ taking a certain value, given the values of the other variables, depends only on the values of the variables in the neighborhood of $X_i$.
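The definition above can be checked numerically on a toy model. The sketch below uses hypothetical potentials (not the lecture's Shrek example): a 3-node chain $X_1 - X_2 - X_3$ whose joint distribution factorizes over the two edges, so $N(X_1) = \{X_2\}$ and $P(X_1 \mid X_2, X_3)$ must equal $P(X_1 \mid X_2)$.

```python
# Numeric sanity check of the Markov property on a 3-node chain X1 - X2 - X3
# (hypothetical potentials, chosen arbitrarily for illustration).
from itertools import product

phi12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0}
phi23 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

# Unnormalized joint over the two edges, then normalize.
raw = {(a, b, c): phi12[(a, b)] * phi23[(b, c)]
       for a, b, c in product([0, 1], repeat=3)}
Z = sum(raw.values())
p = {w: v / Z for w, v in raw.items()}

def cond_x1(x1, x2, x3=None):
    """P(X1=x1 | X2=x2) if x3 is None, else P(X1=x1 | X2=x2, X3=x3)."""
    if x3 is None:
        num = sum(p[(x1, x2, c)] for c in [0, 1])
        den = sum(p[(a, x2, c)] for a in [0, 1] for c in [0, 1])
    else:
        num = p[(x1, x2, x3)]
        den = sum(p[(a, x2, x3)] for a in [0, 1])
    return num / den

# Conditioning on X3 in addition to the neighbor X2 changes nothing:
for x1, x2, x3 in product([0, 1], repeat=3):
    assert abs(cond_x1(x1, x2, x3) - cond_x1(x1, x2)) < 1e-12
```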
Example: Markov Random Field

[Figure: a table of worlds w1..w5 with their probabilities (P(w1)=0.1, P(w2)=0.1, P(w3)=0.02, P(w4)=0.02, P(w5)=0.01), each world being a sentence over the variables X1 (Shrek/Puss), X2 (purrs/shouts), X3 ("ferociously" or empty), X4 ("with" or empty), X5 ("pleasure" or empty).]

X1, X2, and X3 depend on each other (shouting always comes with "ferociously", purring never does, and shouting is more likely to be done by Shrek).
X4 and X5 depend on each other (either both are the empty string, or X4 = "with" and X5 = "pleasure").
(X1, X2, X3) and (X4, X5) are independent.
Example: Markov Random Field

[Figure: the same table of worlds w1..w5 as on the previous slide.]

Neighbor sets:
  N(X1) = {X2, X3}, N(X2) = {X1, X3}, N(X3) = {X1, X2}
  N(X4) = {X5}, N(X5) = {X4}
Def: MRF graph

We depict the neighborhood as an undirected graph, where the nodes are the variables, and $\{X_i, X_j\}$ is an edge if $X_j \in N(X_i)$.
MRFs and Markov Chains

Markov Chains:
  $P(X_i \mid X_1, \ldots, X_{i-1}) = P(X_i \mid X_{i-1})$
  The probability of $X_i$ depends only on the value of the predecessor of $X_i$.

Markov Random Fields:
  $P(X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) = P(X_i \mid N(X_i))$
  The probability of $X_i$ depends only on the values of the neighbors of $X_i$; but neighborhood is symmetric, and the probability is also conditioned on the "future" $X_{i+1}, \ldots, X_n$.
Syntax: Projection

For a vector $\vec{x} = (x_1, \ldots, x_n)$ and a set of indices $C = \{i_1, \ldots, i_k\}$ with $i_1 < \ldots < i_k$, we define the projection

  $\vec{x}_{|C} = (x_{i_1}, \ldots, x_{i_k})$

Example: $(a, b, c, d)_{|\{1,3\}} = (a, c)$
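A minimal sketch of projection in code: restricting an assignment tuple to the indices of a clique (using 1-based indices to match the slide's notation; the function name `project` is an assumption, not from the lecture).

```python
# Projection of an assignment vector onto an index set (1-based indices).
def project(x, C):
    """Return the sub-tuple of x at the positions in the sorted index set C."""
    return tuple(x[i - 1] for i in sorted(C))

# Projecting (a, b, c, d) onto the index set {1, 3} yields (a, c):
assert project(("a", "b", "c", "d"), {1, 3}) == ("a", "c")
```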
Special case: Factorizable MRFs

If all probabilities in an MRF are $> 0$, then there exist functions $\phi_1, \ldots, \phi_m$ such that

  $P(X_1 = x_1, \ldots, X_n = x_n) \;\propto\; \prod_{i=1}^{m} \phi_i(\vec{x}_{|C_i})$

where the $C_i$ are the maximal cliques in the MRF graph. Every $\phi_i$ is a function that takes as input only the variables of the $i$-th clique. (Wikipedia: Hammersley-Clifford theorem)

(We consider only factorizable MRFs.)
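The factorization can be sketched as follows. The cliques and potentials below are hypothetical placeholders (not from the lecture); the point is that each potential sees only its own clique's variables, and the unnormalized score of a full assignment is the product over all cliques.

```python
# Sketch of a factorizable MRF: the unnormalized score of an assignment is
# the product of one potential per maximal clique (hypothetical example).
def unnormalized_p(x, cliques, potentials):
    """x: full assignment tuple; cliques: list of 1-based index sets;
    potentials: list of functions phi_i over the projected sub-tuple."""
    score = 1.0
    for C, phi in zip(cliques, potentials):
        score *= phi(tuple(x[i - 1] for i in sorted(C)))
    return score

# Two maximal cliques {1,2} and {2,3} over binary variables:
cliques = [{1, 2}, {2, 3}]
potentials = [lambda t: 2.0 if t[0] == t[1] else 0.5,  # favors agreement
              lambda t: 3.0 if t == (1, 1) else 1.0]   # favors (1, 1)

print(unnormalized_p((1, 1, 1), cliques, potentials))  # 2.0 * 3.0 = 6.0
```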
Example: Factorizable MRF

[Figure: the worlds table from before, with clique potentials assigned to (X1, X2, X3) and (X4, X5).]

The product of the clique potentials assigns a score to every world, but these scores might not sum to 1.
Normalization

To obtain a value in $[0,1]$, we normalize by $Z$:

  $P(X_1 = x_1, \ldots, X_n = x_n) \;=\; \frac{1}{Z} \prod_{i} \phi_i(\vec{x}_{|C_i})$

where $Z$ is simply the sum over the products for all possible sequences of values $x_1, \ldots, x_n$:

  $Z \;=\; \sum_{x_1, \ldots, x_n} \prod_{i} \phi_i(\vec{x}_{|C_i})$

Yes, run over all possible sequences $x_1, \ldots, x_n$ and sum up the products! This ensures $\sum_{\vec{x}} P(\vec{x}) = 1$.
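The brute-force computation of $Z$ can be sketched directly (hypothetical two-clique model over three binary variables; the potentials are made up for illustration): literally enumerate every assignment and sum the products, as the slide says.

```python
# Brute-force normalization constant Z for a small factorizable MRF
# (hypothetical potentials).
from itertools import product

def phi1(t):  # potential on clique (X1, X2)
    return 2.0 if t[0] == t[1] else 1.0

def phi2(t):  # potential on clique (X2, X3)
    return 3.0 if t[0] != t[1] else 1.0

# Z = sum of the product of potentials over ALL possible assignments.
Z = sum(phi1((x1, x2)) * phi2((x2, x3))
        for x1, x2, x3 in product([0, 1], repeat=3))

def p(x1, x2, x3):
    return phi1((x1, x2)) * phi2((x2, x3)) / Z

# After normalization, the probabilities sum to 1:
total = sum(p(*w) for w in product([0, 1], repeat=3))
assert abs(total - 1.0) < 1e-12
```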
Special case: MRFs with Features

For each clique $C_i$, we define feature functions $f_{i1}, \ldots, f_{ik_i}$, and weights $\lambda_{i1}, \ldots, \lambda_{ik_i}$ for these features. The weights form a vector $\vec{\lambda}$, and we define the vector $\vec{f}(\vec{x})$ of all feature values likewise. Then we define the potentials as:

  $\phi_i(\vec{x}_{|C_i}) \;=\; \exp\Big(\sum_{j} \lambda_{ij}\, f_{ij}(\vec{x}_{|C_i})\Big)$
MRFs with features

With $\phi_i(\vec{x}_{|C_i}) = \exp\big(\sum_j \lambda_{ij} f_{ij}(\vec{x}_{|C_i})\big)$, we have

  $P(\vec{x}) \;=\; \frac{1}{Z} \prod_i \phi_i(\vec{x}_{|C_i}) \;=\; \frac{1}{Z} \exp\Big(\sum_i \sum_j \lambda_{ij}\, f_{ij}(\vec{x}_{|C_i})\Big) \;=\; \frac{1}{Z} \exp\big(\vec{\lambda} \cdot \vec{f}(\vec{x})\big)$
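A minimal sketch of an MRF with features, assuming hypothetical features and weights (one clique over two binary variables): each potential is the exponential of a weighted sum of feature values, so the joint probability has the form $\exp(\vec{\lambda} \cdot \vec{f}(\vec{x})) / Z$.

```python
# Feature-based MRF on one clique (X1, X2) with two binary features
# (hypothetical features and weights, for illustration only).
from itertools import product
from math import exp

features = [lambda t: 1.0 if t[0] == t[1] else 0.0,  # f1: agreement
            lambda t: 1.0 if t == (1, 1) else 0.0]   # f2: both are 1
weights = [0.5, 1.2]                                  # the lambda vector

def phi(t):
    """Potential = exp of the weighted sum of feature values."""
    return exp(sum(w * f(t) for w, f in zip(weights, features)))

Z = sum(phi((x1, x2)) for x1, x2 in product([0, 1], repeat=2))

def p(x1, x2):
    return phi((x1, x2)) / Z

# The normalized probabilities sum to 1, and higher feature activations
# mean higher probability:
assert abs(sum(p(x1, x2) for x1, x2 in product([0, 1], repeat=2)) - 1.0) < 1e-12
assert p(1, 1) > p(0, 1)
```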
Def: Log-likelihood

The log-likelihood of an MRF is

  $\log P(\vec{x}) \;=\; \vec{\lambda} \cdot \vec{f}(\vec{x}) - \log Z$

It is the logarithm ("log") of a function whose exponent is linear ("linear") in the weights $\vec{\lambda}$, hence the name "log-linear model".
Markov Random Fields

Roadmap (recap):

                                                  only visible variables    visible and invisible variables
  Chains                                          Markov Chains             Hidden Markov Models
  Complex dependencies and/or feature functions   Markov Random Fields      Conditional Random Fields