The Universal Replacement System
By reading this essay, you acknowledge that the author does not accept any responsibility for the completeness or correctness of this essay. If you have comments, the author would be happy about a mail to fabian@domain.
Short Introduction
The Universal Replacement System (URS) is a formalism that allows describing other formalisms. URS is based on rules. A rule says that any string that matches its left hand side shall be replaced by the right hand side. Here is an example:Bob loves Alice (Alice ~> Clara)
... becomes...
Bob loves Clara (Alice ~> Clara)
If you want to play with it offline, download the Java code.
If you want to understand the details, read on!
Introduction
Background
In science and in mathematics, we use formal languages to make formal statements. Some of these languages are First Order Logic
 Predicate Logic
 Set theory
 Lambda calculus
 Programming languages
These languages are by themselves an object of study. In particular, these languages are defined formally. For example, to define a βreduction in the λcalculus, one writes:
Goal
The goal of URS is to formalize the metalanguage in which we describe other formal languages. In particular, the formalization shall be as close as possible to the language we usually employ
 powerful enough to describe as many languages or formalisms as possible
 expressive enough to deal with different types of syntaxes
 general enough to deal with the coexistence of different formalisms at the same time
Most formalisms are Turingcomplete, and so will URS. Thus, the goal cannot be to be more expressive in terms of the Turing hierarchy. Rather, the goal is to provide a convenient means of formalization for such languages.
Related Approaches
The study of formal languages is a very established field in mathematics and computer science. Yet, none of the popular languages is expressive enough to define the other languages in a convenient way: Grammars

Grammars can generate arbitary formal languages. Yet, grammars do not know the concept of variables. For example, we cannot write things like
x*(y+z) → x*y + x*zIt is very inconvenient to express that an x on the lefthand side of a rule shall be filled with the same value as the x on the righthand side of the rule.
 Term Rewriting Systems

Term Rewriting Systems were designed to express transformation on prefix terms. Yet, term rewriting systems cannot deal with languages that are not prefix, such as algebra or set theory. In a term rewriting system, we cannot say
x ∈ {x, ...} = trueFurthermore, we cannot deal with lists of things, unless we rewrite them in a consnotation.
 Prolog

Prolog is a programming language that can be used check and generate formal languages. Yet, much like term rewriting systems, Prolog is limited to a prefix notation of terms. Furthermore, it does not allow using nonlatin symbols such as ∈ or {}. Last, its rules can be counterintuitive. Consider, e.g., the following rules:
replace(X,Y,[XRest1],[YRest2]) : replace(X,Y,Rest1, Rest2).These rules define a replacement system. Yet, the second result of
replace(X,Y,[ZRest1],[ZRest2]) : replace(X,Y,Rest1, Rest2).
replace(X,Y,[],[]).replace(a,z,[a,b,c],L)
will beL=[a,b,c]
. This is because the second rule does not explicitly exclude thatZ=X
. This is counterintuitive.
Definition of URS
This section will introduce a new formalism, the Universal Replacement System (URS). The goal of this formalism is to be as intuitive as possible, while at the same time being powerful enough to subsume and express most existing formalisms.URS Terms
URS terms are defined as follows:
a parenthesis is one of ( ) {} [] “”

a term is one of the following:
 a character that is not a parenthesis
 an opening parenthesis, followed by a term sequence, followed by the corresponding closing parenthesis (called a subterm)
 a rule
 a term sequence is a sequence of terms
 a rule is a sequence of the form
( term sequence ~> term sequence )
With this definition, all of the following are terms:
 a
 A
 {1, 2, 3}
 [Elvis, Priscilla]
 (5 + (43))
Note that URS sees single characters as terms. Thus, a string such as "Elvis" is actually a term sequence. This interpretation allows URS to parse arbitrary strings into characters, and see arbitrary sequences of characters as terms. The following lines are all term sequences:
 bob
 hello world!
 happy(peter)
 (loves peter mary)
 7.0+5.3
 4 ∈ {1,2,3,4}
 If X lies then X is a liar .
Note that the statements can be classical logic (prefix) statements, algebraic (infix) expressions, and natural language sentences. Since symbols can be terms, statements can contain commata, mathematical operators, and special symbols (such as λ or ∈).
Levels
Term sequences can contain other terms and term sequences. Consider the following example:Rules
As defined above (Terms), an URS rule is a sequence of the form (X*(Y+Z) ~> X*Y+X*Z)
 (bob ~> mary)
 (X ∈ {X, Y} ~> true)
 (nonsense ~> )
Substitutions
A substitution is the process of replacing a variable in an URS term by a term sequence (Terms). For example, consider the following statement:Rule matching
An URS rule matches a term sequence, if there is a substitution that makes the antecedent of the rule equal to the term sequence. Here are examples:
The
rule (rockstar(X) ~> rich(X))
matches the terms
rockstar(elvis), rockstar({Elvis, Madonna}), rockstar(mother(elvis))

The rule
({X, Y} ~> {Y})
matches the terms
{a, b, c}, {a, }, {M, N}

The rule
(Bob loves X. ~> Carl loves X.)
matches the term sequences
Bob loves Mary., Bob loves Anna., Bob loves Walther.
Note that variable match term sequences, i.e., a string of characters and subterms. Thus, a variable will always match a sequence of characters and subterms. It will never match an incomplete subterm, or parentheses.
The text items of a variable occurrence in the antecedent are the terms that follow the variable occurrence in the antecedent. For example, in the rule (aXbc ~> z)
, the text item of the variable X is bc
. The variable will match the shortest term sequence in the input that is followed by this text item. For example, for the input a123bc456bc
, we will get X=123
, and not X=123bc456
. This restriction entails that the match can fail even if there is a satisfying substitution. The restriction serves to avoid a combinatorial explosion when parsing the input.
If a variable has no text item in the antecedent (or a text item of length 0), the variable will match exactly 1 term. This boils down to 1 character in the extreme case. This is why we had to put a dot at the end of the sentence in the rule (Bob loves X. ~> ...)
. Otherwise, X would have matched just the first letter of Mary, Anna, and Walther.
Rule Application
Rule application
If a rule matches a term sequence, then the rule can be applied to that term sequence. Applying a rule means replacing the term sequence by the succedent of the rule, and then applying the substitution to it. Here is an example:Term sequence: Bob loves Mary.
Substitution: X>Bob, Y>Mary
Rule applied: Bob has to love Mary's dog.
General and special rules
A rule is more general than another rule, if the first matches the antecedent of the second. Here are examples:
(loves(X,Y) ~> X is an altruist)
is more general than
(loves(X,X) ~> X is an egoist)

(XYZ ~> X)
is more general than
(sad ~> happy)
Rules and Terms
Rules are themselves terms (Terms). They can appear in term sequences. URS does not distinguish between an input string and rules. Everything is just term sequences. Thus, rules act within their own term sequence:Choice of rules
There can be different rules in a term sequence, and different rules can match at different places. Thus, we have the choice which rule to apply where. The result can be completely different. We establish the following precedence principles: Match in inner terms first
 Match the leftmost term sequence first
 Match the most inner rule first
 Match the most specific rule first
 Match the rule first that was defined first
Algorithm
The choice of rules (RuleChoice) leads to the following algorithm for term reduction:for(Subterm s in t) {
reduce(s)
}
R = rules in t
Sort R, most specific rules first
applyRules(R, t)
applyRules(Rules R, TermSequence t)
for(Subterm s in t) applyRules(R, t)
for(Rule r in R) {
for(i=0 to length of t) {
if(r matches t at position i) apply r to t at position i
}
}
Sample Implementation
The URS algorithm (Algorithm) has been implemented in Java and is available here:http://suchanek.name/texts/urs/java.zip
You can also run the program directly in your browser by clicking here: http://suchanek.name/texts/urs/app.html
Termination
A term sequence terminates, if there is no rule that can be applied. Not all term sequences terminate. Consider, e.g., the following term sequence:a
and the b
, then swap it again, and so forth.
Term sequences can also generate infinite sequences. Consider the following example:
a
.
Application of URS
Inclusion
We imagine the entire set of text documents in the universe to be one single URS term sequence. Every document is part of a rule, where the antecedent is the unique name of the document in the form(~>...)
, and the succedent is the document itself:
(~>...)
. Since these match the antecedent of the corresponding document rule, they will be reduced to the content of the referred document. This allows us to include term sequences from other documents in our own document. All documents have to be UTF8 encoded.
The domain of the present document contains some useful term sequences that users can refer to.
Boolean Logic
We can use URS to define boolean logic, i.e., rules that work on terms oftrue
and false
. For example, consider the following rules
((false or A) ~> A)
(true and (false or true))
(true and true)
true
Set theory
We can use URS to define the basics of set theory with rules. For example, we can define the function "∈", which determines whether an item is an element of a set:((X ∈ {Y, K}) ~> (X in {K}))
((X ∈ {Y}) ~> false)
((X ∈ {X}) ~> true)
((X ∈ {}) ~> false)
K
will match a sequence of terms with commas. The rules will reduce the following term as follows:
banana ∈ {banana, cherry}
true
These rules are available here:
They can be included in any term sequence (Inclusion) by statingLambda Calculus
The Universal Replacement system can simulate the reduction rules of the untyped LambdaCalculus. Such rules will reduce a term as follows:((λ x . x [x : = x]) 42)
((λ x . x) 42)
(x [x : = 42])
42
((X[X := R]) ~> R)
. The rules for lambda reduction are available here:
They can be included in any term sequence (Inclusion) by stating
Term Rewriting
A term rewriting system can be simulated in URS by introducing, for each term rewriting rule A→B a corresponding URS rule (A ~> B). If one rewriting rule A→B is more general than another rule A'→C, then we have to introduce an URS rule (A'~>B), because otherwise URS will always give preference to the more specific rule.Grammars
A grammar can be simulated in URS by introducing, for each grammar rule α→β, the corresponding URS rule (α~>β). Nonterminals have to be written with lower case letters in order not to be mistaken for variables in URS.SKI Calculus
The SKI combinator calculus can be simulated in a very straightforward way in the Universal Replacement System, by introducing the following rules:((k X) Y) ~> X
(((s X) Y) Z) ~> X Z (Y Z)
Turing Machines
A Turing machine can be simulated in URS by introducing, for each rightgoing transition*s0*
before the input string, where s0 is the initial state. The asterisk is a character that may not appear in the alphabet of the Turing machine.