# The Universal Replacement System

This essay introduces a Turing-complete formalism that aims to be as intuitive as possible.

By reading this essay, you acknowledge that the author does not accept any responsibility for the completeness or correctness of this essay. If you have comments, the author would be happy about a mail to fabian@domain.

## Short Introduction

The Universal Replacement System (URS) is a formalism that allows describing other formalisms. URS is based on rules. A rule says that any string that matches its left hand side shall be replaced by the right hand side. Here is an example:
`Bob loves Alice (Alice ~> Clara)`
... becomes...
`Bob loves Clara (Alice ~> Clara)`
If you want to try that out, have a look at the online demo.
If you want to play with it offline, download the Java code.
If you want to understand the details, read on!

## Introduction

### Background

In science and in mathematics, we use formal languages to make formal statements. Some of these languages are
• First Order Logic
• Predicate Logic
• Set theory
• Lambda calculus
• Programming languages

These languages are by themselves an object of study. In particular, these languages are defined formally. For example, to define a β-reduction in the λ-calculus, one writes:

((λ V.E) F) ~> E[V := F]
The goal of URS is to formalize this language, i.e., to formalize the language that we use to describe other formal languages.

### Goal

The goal of URS is to formalize the meta-language in which we describe other formal languages. In particular, the formalization shall be
1. as close as possible to the language we usually employ
2. powerful enough to describe as many languages or formalisms as possible
3. expressive enough to deal with different types of syntaxes
4. general enough to deal with the co-existence of different formalisms at the same time

Most formalisms are Turing-complete, and so will URS. Thus, the goal cannot be to be more expressive in terms of the Turing hierarchy. Rather, the goal is to provide a convenient means of formalization for such languages.

The study of formal languages is a very established field in mathematics and computer science. Yet, none of the popular languages is expressive enough to define the other languages in a convenient way:
Grammars
Grammars can generate arbitary formal languages. Yet, grammars do not know the concept of variables. For example, we cannot write things like
x*(y+z) → x*y + x*z
It is very inconvenient to express that an x on the left-hand side of a rule shall be filled with the same value as the x on the right-hand side of the rule.
Term Rewriting Systems
Term Rewriting Systems were designed to express transformation on prefix terms. Yet, term rewriting systems cannot deal with languages that are not prefix, such as algebra or set theory. In a term rewriting system, we cannot say
x ∈ {x, ...} = true
Furthermore, we cannot deal with lists of things, unless we rewrite them in a cons-notation.
Prolog
Prolog is a programming language that can be used check and generate formal languages. Yet, much like term rewriting systems, Prolog is limited to a prefix notation of terms. Furthermore, it does not allow using non-latin symbols such as ∈ or {}. Last, its rules can be counter-intuitive. Consider, e.g., the following rules:
replace(X,Y,[X|Rest1],[Y|Rest2]) :- replace(X,Y,Rest1, Rest2).
replace(X,Y,[Z|Rest1],[Z|Rest2]) :- replace(X,Y,Rest1, Rest2).
replace(X,Y,[],[]).
These rules define a replacement system. Yet, the second result of `replace(a,z,[a,b,c],L)` will be `L=[a,b,c]`. This is because the second rule does not explicitly exclude that `Z=X`. This is counter-intuitive.
None of these formalisms (except maybe Prolog) handles the concept of locality, meaning that certain transformations apply in certain formal languages, but not in others.

## Definition of URS

This section will introduce a new formalism, the Universal Replacement System (URS). The goal of this formalism is to be as intuitive as possible, while at the same time being powerful enough to subsume and express most existing formalisms.

### URS Terms

URS terms are defined as follows:
• a parenthesis is one of ( ) {} [] “”
• a term is one of the following:
• a character that is not a parenthesis
• an opening parenthesis, followed by a term sequence, followed by the corresponding closing parenthesis (called a sub-term)
• a rule
• a term sequence is a sequence of terms
• a rule is a sequence of the form
`( term sequence ~> term sequence )`

With this definition, all of the following are terms:

• a
• A
• {1, 2, 3}
• [Elvis, Priscilla]
• (5 + (4-3))

Note that URS sees single characters as terms. Thus, a string such as "Elvis" is actually a term sequence. This interpretation allows URS to parse arbitrary strings into characters, and see arbitrary sequences of characters as terms. The following lines are all term sequences:

• bob
• hello world!
• happy(peter)
• (loves peter mary)
• 7.0+5.3
• 4 ∈ {1,2,3,4}
• If X lies then X is a liar .

Note that the statements can be classical logic (prefix) statements, algebraic (infix) expressions, and natural language sentences. Since symbols can be terms, statements can contain commata, mathematical operators, and special symbols (such as λ or ∈).

### Levels

Term sequences can contain other terms and term sequences. Consider the following example:
This is a long sentence (with a parenthesis (and another one)).
The level of a term is the number of preceding opening brackets (of any type) minus the number of preceding closing brackets (of any type). We say that a term that is contained in another term is an inner term or a nested term. The containing term is the parent term.

### Rules

As defined above (Terms), an URS rule is a sequence of the form
(term ... term ~> term ... term)
The arrow is a space character, followed by a tilde, followed by a greater-than symbol, followed by a space. The part of the rule that is to the left of the arrow is called the antecedent of the rule. The right part is called the succedent. If the left part of a rule contains an uppercase character, that character becomes a variable. Here are some examples of rules:
• (X*(Y+Z) ~> X*Y+X*Z)
• (bob ~> mary)
• (X ∈ {X, Y} ~> true)
• (nonsense ~> )

### Substitutions

A substitution is the process of replacing a variable in an URS term by a term sequence (Terms). For example, consider the following statement:
If X loves Y, then X should marry Y.
Substituting X by "Bob" yields
If Bob loves Y, then Bob should marry Y.
A substitution has to replace all occurrences of the variable, and all occurrences by the same term sequence.

### Rule matching

An URS rule matches a term sequence, if there is a substitution that makes the antecedent of the rule equal to the term sequence. Here are examples:
• The `rule (rockstar(X) ~> rich(X))` matches the terms
`rockstar(elvis), rockstar({Elvis, Madonna}), rockstar(mother(elvis))`
• The rule `({X, Y} ~> {Y})` matches the terms
`{a, b, c}, {a, }, {M, N}`
• The rule `(Bob loves X. ~> Carl loves X.)` matches the term sequences
`Bob loves Mary., Bob loves Anna., Bob loves Walther.`

Note that variable match term sequences, i.e., a string of characters and sub-terms. Thus, a variable will always match a sequence of characters and sub-terms. It will never match an incomplete sub-term, or parentheses.

The text items of a variable occurrence in the antecedent are the terms that follow the variable occurrence in the antecedent. For example, in the rule `(aXbc ~> z)`, the text item of the variable X is `bc`. The variable will match the shortest term sequence in the input that is followed by this text item. For example, for the input `a123bc456bc`, we will get `X=123`, and not `X=123bc456`. This restriction entails that the match can fail even if there is a satisfying substitution. The restriction serves to avoid a combinatorial explosion when parsing the input.

If a variable has no text item in the antecedent (or a text item of length 0), the variable will match exactly 1 term. This boils down to 1 character in the extreme case. This is why we had to put a dot at the end of the sentence in the rule `(Bob loves X. ~> ...)`. Otherwise, X would have matched just the first letter of Mary, Anna, and Walther.

## Rule Application

### Rule application

If a rule matches a term sequence, then the rule can be applied to that term sequence. Applying a rule means replacing the term sequence by the succedent of the rule, and then applying the substitution to it. Here is an example:
Rule: (X loves Y. ~> X has to love Y's dog.)
Term sequence: Bob loves Mary.
Substitution: X->Bob, Y->Mary
Rule applied: Bob has to love Mary's dog.

### General and special rules

A rule is more general than another rule, if the first matches the antecedent of the second. Here are examples:
• `(loves(X,Y) ~> X is an altruist)`
is more general than
`(loves(X,X) ~> X is an egoist)`
• `(XYZ ~> X)`
is more general than
`(sad ~> happy)`

### Rules and Terms

Rules are themselves terms (Terms). They can appear in term sequences. URS does not distinguish between an input string and rules. Everything is just term sequences. Thus, rules act within their own term sequence:
Bob hates Mary (hates ~> loves)
... becomes ...
Bob loves Mary (hates ~> loves)
Rules can also apply inside other terms:
Bob hates (whatever Mary does) (does ~> eats)
... becomes ...
Bob hates (whatever Mary eats) (does ~> eats)
Modifying a term sequence this way is called reducing the term sequence. However, rules can only be applied within the same term sequence or within its nested terms. The following term sequence does not allow us to apply a rule:
Bob hates (whatever Mary does (hates ~> loves))

### Choice of rules

There can be different rules in a term sequence, and different rules can match at different places. Thus, we have the choice which rule to apply where. The result can be completely different. We establish the following precedence principles:
1. Match in inner terms first
2. Match the left-most term sequence first
3. Match the most inner rule first
4. Match the most specific rule first
5. Match the rule first that was defined first
Rule applications never walk into rules.

### Algorithm

The choice of rules (RuleChoice) leads to the following algorithm for term reduction:
reduce(TermSequence t)
for(Subterm s in t) {
reduce(s)
}
R = rules in t
Sort R, most specific rules first
applyRules(R, t)

applyRules(Rules R, TermSequence t)
for(Subterm s in t) applyRules(R, t)
for(Rule r in R) {
for(i=0 to length of t) {
if(r matches t at position i) apply r to t at position i
}
}

### Sample Implementation

The URS algorithm (Algorithm) has been implemented in Java and is available here:
http://suchanek.name/texts/urs/java.zip

You can also run the program directly in your browser by clicking here: http://suchanek.name/texts/urs/app.html

### Termination

A term sequence terminates, if there is no rule that can be applied. Not all term sequences terminate. Consider, e.g., the following term sequence:
a b (X Y ~> Y X)
This term sequence will swap the `a` and the `b`, then swap it again, and so forth.

Term sequences can also generate infinite sequences. Consider the following example:

a (X ~> X X)
This term sequence will reduce to an infinite sequence of `a`.

## Application of URS

### Inclusion

We imagine the entire set of text documents in the universe to be one single URS term sequence. Every document is part of a rule, where the antecedent is the unique name of the document in the form `(~>...)`, and the succedent is the document itself:
((~> http://suchanek.name/texts/urs/index.html) ~> The Universal Replacement system. This essay introduces a Turing-complete formalism that aims to be as intuitive as possible. ...)
This means that our own document is just the succedent of a rule in the universe term sequence. We can refer to other documents by sequences of the form `(~>...)`. Since these match the antecedent of the corresponding document rule, they will be reduced to the content of the referred document. This allows us to include term sequences from other documents in our own document. All documents have to be UTF-8 encoded.

The domain of the present document contains some useful term sequences that users can refer to.

### Boolean Logic

We can use URS to define boolean logic, i.e., rules that work on terms of `true` and `false`. For example, consider the following rules
((true and A) ~> A)
((false or A) ~> A)
These rules will reduce a term as follows:
(true and (false or (true and true)))
(true and (false or true))
(true and true)
true
The boolean functions are available here: They can be included in any term sequence (Inclusion) by stating
(~> http://suchanek.name/texts/urs/boolean.txt)

### Set theory

We can use URS to define the basics of set theory with rules. For example, we can define the function "∈", which determines whether an item is an element of a set:
((X ∈ {X, K}) ~> true)
((X ∈ {Y, K}) ~> (X in {K}))
((X ∈ {Y}) ~> false)
((X ∈ {X}) ~> true)
((X ∈ {}) ~> false)
Note how the variable `K` will match a sequence of terms with commas. The rules will reduce the following term as follows:
banana ∈ {apple, banana, cherry}
banana ∈ {banana, cherry}
true

These rules are available here:

They can be included in any term sequence (Inclusion) by stating
(~> http://suchanek.name/texts/urs/set.txt)

### Lambda Calculus

The Universal Replacement system can simulate the reduction rules of the untyped Lambda-Calculus. Such rules will reduce a term as follows:
((λ x. (λ x.x) x) 42)
((λ x . x [x : = x]) 42)
((λ x . x) 42)
(x [x : = 42])
42
Note the parenthesis around the entire term sequence. These are necessary to bound the left-hand side of the variables of the URS rule `((X[X := R]) ~> R)`. The rules for lambda reduction are available here: They can be included in any term sequence (Inclusion) by stating
(~> http://suchanek.name/texts/urs/lambda.txt)
These rules implement beta-reductions with the necessary alpha-renamings.

### Term Rewriting

A term rewriting system can be simulated in URS by introducing, for each term rewriting rule A→B a corresponding URS rule (A ~> B). If one rewriting rule A→B is more general than another rule A'→C, then we have to introduce an URS rule (A'~>B), because otherwise URS will always give preference to the more specific rule.

### Grammars

A grammar can be simulated in URS by introducing, for each grammar rule α→β, the corresponding URS rule (α~>β). Non-terminals have to be written with lower case letters in order not to be mistaken for variables in URS.

### SKI Calculus

The SKI combinator calculus can be simulated in a very straight-forward way in the Universal Replacement System, by introducing the following rules:
(i X) ~> X
((k X) Y) ~> X
(((s X) Y) Z) ~> X Z (Y Z)
These rules can be included by stating
(~> http://suchanek.name/texts/urs/ski.txt)

### Turing Machines

A Turing machine can be simulated in URS by introducing, for each right-going transition
State s, Input i → Output o
an URS rule
*s*i ~> o*s*
and for each left-going transition
State s, Input i → Output o
the URS rules
c*s*i ~> *s*co
where c are all characters of the alphabet. The Turing machine is started by prepending `*s0*` before the input string, where s0 is the initial state. The asterisk is a character that may not appear in the alphabet of the Turing machine.