In this blog post, we give an insight into the confluence of the sequent calculus, a property mainly considered since the discovery of the so-called *Curry-Howard correspondence* [1]. Confluence is often thought to be part of the well-understood folklore. For people already aware of the links between intuitionism and computation, it may feel natural and intuitive to think that LJ (intuitionistic sequent calculus [2]) should “behave well”, hence be both terminating and confluent. However, as we will show in the post, it appears that the subject is more subtle than expected.

## Confluence and cut-elimination

First, we make clear what we mean by confluence. Cut-elimination evaluates a proof by eliminating an occurrence of cut-rule while preserving the hypothesis sequents (possibly with a different number of occurrences) and the concluding sequent.

Sometimes, it happens that there are several sequences of reductions possible: . If all paths lead to the same result, i.e. we have , we say that the system is confluent. The possibility of reducing to two different proofs is computationally problematic: imagine a program which can have two different results depending on how we evaluate it.

A crucial point of confluence is what we mean when we say that two proofs are equal. We first naively consider that two proofs are equal when they are identical.

## The failure of Classical Logic

Cut-elimination for classical sequent calculus is known to have non-deterministic behaviour. We illustrate this with a simple example from Girard’s Proofs and Types [3]. In order to keep track of the location of formulas, we set . We have:

This can be reduced to one of these two proofs:

We obtain two structurally different proofs (because different rules are used but also because and can be different). Through the Curry-Howard interpretation, it corresponds to the fact that a program can be reduced to two different results depending on the evaluation strategy considered, which is something we usually do not want.

Two natural solutions are suggested by Lafont [3]:

- forbid the right-contraction rule and limit the right-hand side of sequents to at most one formula. This leads to
**intuitionistic logic**(LJ) where this problem does not appear; - more brutally, forbid all structural rules except exchange and obtain the linear fragment of
**linear logic**.

## The success of Intuitionistic Logic

Notice that the previous example does not apply in LJ since LJ limits the conclusion of sequents to at most one formula, i.e. sequents are of the form such that is either empty or a single formula. It is not even possible to use the same trick. Consider the same example as before with the on the left instead of the right (since contraction is only allowed on the left). We have the following cut-elimination:

If we think of the cut rule as using a lemma, we notice that the lemma introduced is erased by a weakening rule in the right-hand side. Since the only purpose of was to prove that lemma , it shows that is not needed in the proof and can be erased. Therefore, the cut-elimination yields a unique proof.

## Confluence still fails with the commutation of rules

Although LJ has a better behaviour than LK, it is still easy to find counter-examples related to the commutation of rules:

This proof can be reduced as follows:

However one can reduce it in another way as follows:

So, we ended up with two proofs that are same up to the permutation of rules and . The same happens with a very trivial proof with cut and playing with exponentials (where ):

This proof can be reduced either to the left or right branch. The problem is that applying weakening on a contracted formula does nothing, so the two proofs are essentially the same but still make confluence fail.

## Solution for irrelevant differences: refining proof equivalence

The previous failure of confluence for LJ shows that the sequent calculus is actually not convenient to speak about confluence. However, it is not a sufficient reason to abandon the pursuit of confluence. Two proofs differing only from the order of application of rules or some irrelevant use of structural rules are essentially the same, logically speaking. Particularly, if we consider those proofs in the natural deduction, we will have again the same proofs since the translation from NJ to LJ is not injective, hence NJ yields a quotienting of LJ proofs [3]. And indeed, this idea of quotienting proofs is one of the reason to work with proof-nets of linear logic [4, 5] which are non-sequential representation of proofs where the the order of rules is irrelevant. Other representation of proofs attempted at capturing the “essence” of proofs such as expansion trees [6] or deep inference [7]. Denotational or categorical semantics can also be consider in order to quotient proofs [8, 9].

However, this approach has a dramatic consequence for most interpretations (especially for LK). Consider a semantic interpretation associating a mathematical object to a proof . If we would like to have the preservation of the semantics by cut-elimination, i.e. if implies , then in the case of the example above for LK where we have a reduction into two different proofs, we need to have . That means any proofs of are associated to the same object: they all collapse.

However, this is not the case in LJ because we could have two really different interpretations of the same sequent. If we follow the Curry-Howard correspondence between proofs and terms of the -calculus, one can consider two terms and . Although coming from NJ, their associated NJ proofs come from the following different LJ proofs:

## Solution for non-determinism in LK: restricting rules

It has been shown that it is still possible to keep classical logic by using more sophisticated and fine-grained restrictions on the interaction between formulas in a proof (typically, restricting the allowed uses of the cut rule). These restrictions use the idea of *polarisation *and *focussing* respectively associated to Girard [10] and Andreoli [11].

The idea of polarisation that formulas are divided into two classes, positive and negative, and that rules take polarity into account. Some early polarised systems are Girard’s system [10], Danos et al.’s and systems [12] and Danos et al.’s system [13].

Recently, Liang and Miller’s defined a new system system [14] from which both and can be retrieved. It includes the following features in order to solve the non-determinism caused by structural rules:

- restriction of weakening on the axiom rule so that it cannot interfere with a cut rule;

- the cut rule can only take place between a positive and negative formula;

- contraction is limited to positive formulas.

As stated by Liang and Miller [14], the system , in particular, has the advantage to keep a presentation very close to the usual sequent calculus while obtaining determinism. The irrelevancy of rule order is handled with the idea of focussing which enforces an order while preserving provability (rules are divided into reversible and irreversible ones then in a bottom-up reading, it is always possible to focus on the reversible ones when possible).

*We would like to thank Olivier Laurent for his useful opinion on the subject.*

### References

[Bibtex]

```
@article{howard1980formulae,
title={The formulae-as-types notion of construction},
author={Howard, William A},
journal={To HB Curry: essays on combinatory logic, lambda calculus and formalism},
volume={44},
pages={479--490},
year={1980}
}
```

[Bibtex]

```
@book{dragalin1988mathematical,
title={Mathematical intuitionism},
author={Dragalin, Albert Grigorevich and Mendelson, Elliott},
year={1988},
publisher={American Mathematical Society}
}
```

[Bibtex]

```
@book{girard1989proofs,
title={Proofs and types},
author={Girard, Jean-Yves and Taylor, Paul and Lafont, Yves},
volume={7},
year={1989},
publisher={Cambridge university press Cambridge}
}
```

[Bibtex]

```
@incollection{girard2017proof,
title={Proof-nets: the parallel syntax for proof-theory},
author={Girard, Jean-Yves},
booktitle={Logic and Algebra},
pages={97--124},
year={2017},
publisher={Routledge}
}
```

[Bibtex]

```
@article{girard1987linear,
title={Linear logic},
author={Girard, Jean-Yves},
journal={Theoretical computer science},
volume={50},
number={1},
pages={1--101},
year={1987},
publisher={Elsevier}
}
```

[Bibtex]

```
@article{miller1987compact,
title={A compact representation of proofs},
author={Miller, Dale A},
journal={Studia Logica},
volume={46},
number={4},
pages={347--370},
year={1987},
publisher={Springer}
}
```

[Bibtex]

```
@article{guglielmi2007system,
title={A system of interaction and structure},
author={Guglielmi, Alessio},
journal={ACM Transactions on Computational Logic (TOCL)},
volume={8},
number={1},
pages={1--es},
year={2007},
publisher={ACM New York, NY, USA}
}
```

[Bibtex]

```
@article{lambek1968deductive,
title={Deductive systems and categories},
author={Lambek, Joachim},
journal={Mathematical Systems Theory},
volume={2},
number={4},
pages={287--318},
year={1968},
publisher={Springer}
}
```

[Bibtex]

```
@article{lambek1974functional,
title={Functional completeness of cartesian categories},
author={Lambek, Joachim},
journal={Annals of Mathematical Logic},
volume={6},
number={3-4},
pages={259--292},
year={1974},
publisher={North-Holland}
}
```

[Bibtex]

```
@article{girard1991new,
title={A new constructive logic: classic logic},
author={Girard, Jean-Yves},
journal={Mathematical structures in computer science},
volume={1},
number={3},
pages={255--296},
year={1991},
publisher={Cambridge University Press}
}
```

[Bibtex]

```
@article{andreoli1992logic,
title={Logic programming with focusing proofs in linear logic},
author={Andreoli, Jean-Marc},
journal={Journal of logic and computation},
volume={2},
number={3},
pages={297--347},
year={1992},
publisher={Dov Gabbay}
}
```

[Bibtex]

```
@article{danos1995lkq,
title={LKQ and LKT: sequent calculi for second order logic based upon dual linear decompositions of classical implication},
author={Danos, Vincent and Joinet, Jean-Baptiste and Schellinx, Harold},
journal={Advances in Linear Logic},
volume={222},
pages={211--224},
year={1995}
}
```

[Bibtex]

```
@article{danos1997new,
title={A new deconstructive logic: Linear logic},
author={Danos, Vincent and Joinet, Jean-Baptiste and Schellinx, Harold},
journal={The Journal of Symbolic Logic},
volume={62},
number={3},
pages={755--807},
year={1997},
publisher={Cambridge University Press}
}
```

```
@article{liang2020focusing,
title={Focusing Gentzen’s LK proof system},
author={Liang, Chuck and Miller, Dale},
journal={Submitted, April},
year={2020}
}
```

Hi Boris, Farzad, thanks for the nice post! Let me kick things off with some naive questions, since I have not played so much with this stuff:

Hi Anupam, thanks for your questions.

Re your first question, I am not completely sure if I understand the mix rule you are mentioning. For instance, I think the cut rule is not drivable using this mix and the other inferences rules.

Do you mean having mix instead of cut, and try to do mix-elimination?

If yes, there is a non-symmetrical mix elimination rule by G. Gentzen, which is the following:

And I think even this system is not confluent.

let us take the following proof:

And this proof can be reduced to two different proofs:

1)

2)

We can find this kind of example (not exactly this one) in the paper “Strong Normalisation Proofs for Cut Elimination in Gentzen’s Sequent Calculi” by Elias Tahhan.

About your second question, what do you mean by multi-conclusion calculus for intuitionistic logic?

No I do not mean the (full) mix of Gentzen that you describe, but a comment on your example: the two derivations (1) and (2) are still equivalent up to structural rules. It is not as extreme as the non-equivalence of Lafont’s wkR-wkL counterexample that you describe in your post. Can we say something formal here about confluence ‘up to an equivalence’ with respect to Gentzen’s mix-elimination? Such an equivalence might induce a reasonable computational interpretation associated with mix-elimination.

Now, what I meant was really adding my simplified mix rule to the system

with cut. I believe that cut is internally eliminable in this system, but the wkR-wkL example can now be made to reduce to a unique derivation that ‘merges’ the two. It seems like this system might be confluent up to rule permutations (e.g. for reducing a cut between two mixes) and perhaps some structural rules. Do you know if this is true?Regarding multi-conclusion version of LJ, this is usually attributed to Dummett I believe. See the following note by Valeria de Paiva: https://www.cs.bham.ac.uk/~vdp/publications/fil.pdf .

For your example of a cut between wR and wL in LJ, I believe it makes sense to erase π₁ when C is negative but π₂ when C is positive. Indeed, this is how polarised LK solves the dilemma, and the same cut-elimination strategy works for LJ. Consequently it makes sense to me to see this example as being non-confluent a priori. Does your argument for confluence on this counter-example assumes a particular computational interpretation?

Regarding the solution for non-determinism in LKF based on restricting rules, it looks like you are referring to the claim at the end of §2.1 in Liang and Miller (2021) (p.5 of http://www.lix.polytechnique.fr/Labo/Dale.Miller/papers/lkf-2021-03-31.pdf). I find this remark cryptic, because indeed those problematic examples cannot be written in LKF due to the restricted shape of proofs, but in order to obtain a proof of the restricted shape, one has to first normalise it in a certain way, and how this reshaping is done is hidden in the admissibility proofs of many principles (notably cuts and structural rules). So once a restricted proof is obtained, it is possible that some arbitrary choices have been made already.

(Girard’s LC and Danos-Joinet-Schellinx’s LK^η, mentioned by Liang and Miller, solve the non-determinism by determining the reduction of proofs rather than by restricting proof rules, in the sense that rules of LK are derivable instead of merely admissible. This makes it easier to properly talk about confluence and preservation of a denotation.)

But! If we try and see where the choices might be made in Liang and Miller’s proof of cut-elimination, I do not think it appears in an obvious manner. There is a subtlety in how (and to which extent) the restrictions “solve” the problem of non-determinism.

Fortunately, there are some results that help us understand what is going on. Danos-Joinet-Schellinx (1997) remarked that the drive to reduce proofs in a way compatible η-expansions orients the cut-elimination in a certain way; more precisely that:

– the strategy that apparently arbitrarily favours one side according to the polarity (including by revealing cuts hidden in irreversible rules), introduced by LC,

– a strategy that applies reversion on-the-fly when there would be a dilemma, and

– a strategy that requires the least amount of modalities when translating into linear logic

coincide, and that in the proof system thus determined, η-equivalence is preserved by normalisation.

Consequently, they note, one is tempted to only work with maximally reversed proofs (as is done in LFK), which could be seen as representatives for a “quotient” of classical proofs up to reversion, and in which this strategy is enforced implicitly. Such a restriction was first studied by Quatrini and Tortora de Falco (“Polarisation des preuves classiques et renversement”, 1996), and a correspondence with (Krivine-style) negative translations into intuitionistic logic was shown by Laurent (“Étude de la polarisation en logique”, 2001). (Also, as mentioned in these works, this is a trick that breaks down with 2nd order quantifications.)

Girard’s proof sketch for cut-elimination in LK in the Blind Spot (2006, §3.4) seems to have been influenced by these works, and interestingly it has many similarities with Liang and Miller’s proof. Girard comments that such restrictions make the classical logic “locally” intuitionistic, which is one way to see polarisation.

As the Eng & Jafarrahmani article points out, there are two broad ways to make cut elimination more deterministic.

1) Lafont suggested forbidding structural rules—either on the right – yielding intuitionistic logic or on the left and right – yielding linear logic—can make cut-elimination deterministic.

2) We can use polarization in classical logic using (something like) the LKF proof system. Since the cut rule is applied to a positive and negative formula, and since only positive formulas are subjected to structural rules, non-determinism is also avoided.

There is a big difference between these two approaches. In the first approach, a global solution is enforced (structural rules are forbidden uniformly on the full proof), and classical logic is, essentially, forbidden. In the second approach, there is non-determinism in how one can polarize a classical formula, but once that choice is made, the rest is deterministic.

The latter approach has the merit of leaving the resolution of the non-determinism to the person writing the classical logic specification. For example, the programmer must say, for every occurrence of a conjunction and disjunction, whether or not it is additive or multiplicative. This approach should appeal to functional programmers trying to write classical logic specifications.

Dale’s distinction does not convince me, for two reasons.

The first reason is that how you polarise a classical formula is not enough to determine cut-elimination in the second approach, as I explain below, contrary to what the previous comment claims. The second reason is that the way cut-elimination becomes determined with focusing in the second approach, in fact, coincides with certain translations into intuitionistic and linear logic, as arising from the first approach. This does not seem compatible to me with the idea that there is a “big” difference between the approaches as described.

***

1) How you polarise a classical formula is not enough to determine cut-elimination of proofs of this formula.

The polarisation theory typically gives two counter-examples.

– The first example is Lafont’s critical pair, cutting Γ⊢A,Δ against Γ’,A⊢Δ’ where A on both sides is introduced immediately by a weakening. The order in which cuts are eliminated is indeed determined by how you polarise A. But, A does not appear in the end sequent, and therefore how you polarise _the whole proof with cuts_, not merely formulas in the end sequent, is what matters. The analogy with a person writing a classical logic specification breaks down there. (This example is mentioned in Liang & Miller (2021).)

– The second example, generalizing the first one, is the generic left-introduction rule of ⊃ from LK (or equivalently the right-introduction rule of ∧). Assuming one wants to take the polarisation (+,−)→− (resp. (+,+)→+), focalisation forces the identification of these unrestricted rules with the corresponding focused rules, together with two cut rules, each one used to plug into it one of the two branches of the derivation. But there is an arbitrary choice in the order in which the two cuts are introduced. The two possible choices lead again to completely unrelated proofs, like in the Lafont critical pair, for instance if the two branches end with a weakening. But this instance of undeterminacy is still present _after_ the whole proof with cuts has been assigned polarities.

Liang & Miller (2021) does not mention the arbitrary choice in the order of these two cuts (e.g. in the proof of its Theorem 5), nor its consequence for indeterminacy of LK cut-elimination, both of which were correctly observed in Girard (1991).

***

2) Both approaches hide the same global transformation into linear logic.

Liang & Miller (2021) calls LKF a syntax for normal and maximally eta-expanded proofs in Girard’s/Danos et al.’s polarised classical logic. When obtaining such normal forms from a LK derivation which has been suitably polarised (picking an arbitrary order for the rules (⊃L) and (∧R) regarding the previous caveat), what one does is a global proof transformation. Using global proof transformations, we can as well translate classical proofs into intuitionistic or linear proofs. Many efforts over the past three decades have shown the correspondence between ways to eliminate cuts, ways to do negative translations into intuitionistic logic, and ways to do polarised translations into linear logic (where in particular it is not appreciated enough that maximally eta-expanding proofs amounts to a Krivine-style negative translation). Moreover corresponding transformations determine the same focused forms.

As an illustration, the rules of LKF are essentially derivations of linear logic in disguise (as the careful readers of, for instance, Liang & Miller (2009) know). This is explained by the fact that the main part of the work of translating LK into LL has already been done by the time one has obtained an LKF derivation. The “reasons” (to quote Liang and Miller (2021)) why “the problems with the nondeterminism in cut elimination caused by the use of structural rules in classical logic disappear in LKF” are obvious from the perspective that it is a subset of linear logic. Alas the paper makes it sound deeper and more complicated than it really is.

The actual work of translating LK into linear logic/LKF is responsible for the choices made during cut-elimination, but Liang & Miller (2021) hide this translation in the meta-theory, where it is mixed with the proof of cut admissibility. Thus it is by design unable to do a good job at explaining how and where choices are made.

***

As I explained in my previous message, Danos et al. (1997) observed that eta-expansion forces a certain cut-elimination order, and with this observation they reconstructed Girard’s LC. Their result is stronger and deeper, for instance they are able to properly state a confluence property for the ensuing proof reduction. To achieve this, they study sequent calculi where the cut rule is primitive. They also describe translations into linear logic that inspired their analysis of confluent LK cut-elimination strategies.

How polarisation makes cut-elimination in LK deterministic is a thoroughly beaten subject: the bar to propose explanations that deviate from the past literature is high. In addition, Liang & Miller (2021) seems to be more on the side of working to make a synthesis of existing results than to prove new ones. Nevertheless there appears to be confusions in the original post and in the follow-up discussion about classic results from the literature, and one reason might be that Liang & Miller (2021) does not do a great job of contextualizing these results and synthesizing them with LKF.

Part of the problem might be that Liang & Miller (2021) wants to take a description of focused normal forms, and advocate it as a proof system for classical logic. It conflates the problem of finding a convenient representation of classical proofs, with the one of finding a canonical representatives for proof search. But it is clear that a syntax of normal forms is of no use for comparing expressiveness between proof systems (because normal forms blur the lines between proof systems), for talking about translations between proof systems (where you would like to be able to discuss the preservation of some reductions or equalities between proofs), nor for talking about the determinism of proof reduction (where for instance you would like to be able to state its confluence).

This is evident when the paper tries to somehow find in the shape of LKF derivations reasons why polarisation solves the non-determinism in LK, while obviating many works that have explained the links between polarised proof systems, negative translations and translations into linear logic.

I sympathise with some of the goals of Liang & Miller (2021), such as deriving well-known results about the logic easily using one’s formalism of choice. So far the example of the Herbrand theorem is given. It seems, though, that the proof is the usual one by inspection of a cut-free proof followed by hand-waving building of the proof of the disjunction. I do not find that an important role is played by its particular choice of polarisation. The claimed advantage seems to be that it avoids tedious permutation arguments, but this is a low bar for a good formalism. More generally the paper did not convince me that you want to give a name to the focused normal forms and treat it as a proof system.