In Anupam’s recent post, he introduced the linear inferences *switch* and *medial*. By computational search it had been shown that all linear inferences with 6 or fewer variables are derivable from these two inferences. He also gave an 8-variable linear inference which is not derivable from switch and medial, and posed the question of whether it is minimal. Here we answer this question by showing that this 8-variable inference is indeed minimal.

A linear formula on a set of variables V is a formula constructed from the connectives ∧ and ∨ that uses each variable in V exactly once. A linear inference is then a tautology of the form φ → ψ, where both φ and ψ are linear formulae on the same set of variables V. In this post we assume these formulae do not contain negation. Of particular interest are the following two linear inferences:

- Switch: x ∧ (y ∨ z) → (x ∧ y) ∨ z
- Medial: (w ∧ x) ∨ (y ∧ z) → (w ∨ y) ∧ (x ∨ z)

We can then compose these inferences by rewriting, and by applying associativity and commutativity rules. More details on this can be found in Anupam’s post or various papers [1] [2] [3].

Using a correspondence between linear formulae and graphs, we wrote a search algorithm that verified the following theorem:

Theorem 1. All linear inferences with 7 or fewer variables are derivable from switch and medial.

Corollary 2. The minimal switch and medial independent linear inference has 8 variables.

### Linear formulae as graphs

The key insight that allowed searching linear inferences with 7 variables (as opposed to the 6 variables that previous work [4] was able to check) was using a link between linear formulae and graphs. This quotients out associativity and commutativity and so greatly reduces the size of the search space.

For any linear formula φ, we can associate a (non-directed) graph G(φ) to it as follows:

- Let the nodes of G(φ) be the variables of φ.
- Let there be a unique edge between (distinct) nodes x and y if the top-level connective of the minimal subformula of φ containing both x and y is ∧, and no edge otherwise.

This second point is clearer with an example. Take the linear formula φ = x ∧ (y ∨ (z ∧ w)):

G(φ) will have 4 nodes. There is an edge between z and w, as the smallest subformula containing z and w is z ∧ w, and the top-level connective of this is ∧. There is no edge between y and z, as the smallest subformula containing them is y ∨ (z ∧ w), and the topmost connective of this is ∨. The smallest subformula containing x and y is the whole formula, and so there is an edge between them.

Continuing for the rest of the pairs, we get the graph with edges xy, xz, xw and zw.

The following theorem makes these graphs useful:

Theorem 3 (Proposition 4.4/Theorem 4.6 in [1]). If φ and ψ are two linear formulae on the same set of variables, then the inference φ → ψ is valid if and only if for every maximal clique W of G(φ) there is a W' ⊆ W such that W' is a maximal clique of G(ψ).

#### P4-free graphs

Since every linear formula has a corresponding graph, the natural question is whether every graph has a corresponding linear formula. The answer to this question is in fact no. The graphs that correspond to linear formulae are called cographs, which are graphs built inductively from single nodes, disjoint unions, and the dual of disjoint union (taking the union of two graphs and inserting every possible edge between the two halves). Cographs have the following characterization (Proposition 3.6 from [1]).

Definition 4. A graph is P4-free if none of its induced subgraphs are isomorphic to P4, the path graph on four vertices.

Proposition 5. A graph is P4-free if and only if it is a cograph, and hence the graphs that correspond to linear formulae are exactly the P4-free graphs.

This means that we are able to reason about all linear formulae without ever working with the formulae themselves, instead working only with P4-free graphs.

### The algorithm

The algorithm is explained in this section step by step. The code is written in Haskell and should be semi-readable even without any coding knowledge. The full (commented) code can be found here.

#### Representing graphs

We represent nodes as integers, and then represent a linear graph as a function from two nodes (integers) to a boolean which tells us whether there is an edge between the two nodes.

We also define a second version where the data of a linear graph is packed into a single integer. This is defined as `LinGraphC` in the code.
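As a sketch of what these two representations might look like (illustrative only — `pairIndex`, `pack` and `unpack` are names invented here, and the definitions in the real code may differ):

```haskell
-- An n-node graph as an edge predicate on nodes 0..n-1.
type LinGraph = Int -> Int -> Bool

-- Packed form: one bit per unordered pair of nodes.
type LinGraphC = Integer

-- Position of the pair (i, j), with i < j, in a triangular enumeration of pairs.
pairIndex :: Int -> Int -> Int
pairIndex i j = j * (j - 1) `div` 2 + i

-- Pack an n-node graph into a single integer, one bit per potential edge.
pack :: Int -> LinGraph -> LinGraphC
pack n g = sum [2 ^ pairIndex i j | j <- [0 .. n - 1], i <- [0 .. j - 1], g i j]

-- Recover the edge predicate from the packed integer.
unpack :: LinGraphC -> LinGraph
unpack c i j
  | i == j    = False
  | otherwise = odd (c `div` 2 ^ pairIndex (min i j) (max i j))
```

The packed form makes graphs cheap to store and to compare for equality.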

#### Collecting all P4-free graphs

The first step is to collect a list of all P4-free graphs. Since there are 7 choose 2 (= 21) pairs of nodes, there are 2^21 = 2097152 total graphs with 7 nodes. This is too many to run through, checking in turn whether each one is P4-free. Therefore we use the following result.

Remark 6. Any induced subgraph of a P4-free graph is P4-free.

Now to generate all P4-free graphs of size n, we can first recursively generate a list of all P4-free graphs of size n−1, and then for each of these smaller graphs, check whether each extension to n nodes is a P4-free graph. The base case is that there is precisely one P4-free graph with no nodes.

This means that large numbers of non-P4-free graphs can be discarded at once by noticing that they share a non-P4-free induced subgraph.
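A sketch of this generation step under a simple edge-predicate representation (all names here are illustrative, not those of the real code): when node n−1 is added, only the 4-subsets containing it need to be checked for an induced P4, since the rest of the graph is already known to be P4-free.

```haskell
import Data.List (permutations, subsequences)

type LinGraph = Int -> Int -> Bool

-- k-element subsets of a list.
choose :: Int -> [a] -> [[a]]
choose 0 _ = [[]]
choose _ [] = []
choose k (x : xs) = map (x :) (choose (k - 1) xs) ++ choose k xs

-- Does the induced subgraph on these four nodes form a path P4?
inducesP4 :: LinGraph -> [Int] -> Bool
inducesP4 g = any isPath . permutations
  where
    isPath [a, b, c, d] =
      g a b && g b c && g c d && not (g a c) && not (g a d) && not (g b d)
    isPath _ = False

-- All P4-free graphs on nodes 0..n-1, built by extending smaller ones.
-- Only 4-subsets containing the new node n-1 can introduce a P4.
p4FreeGraphs :: Int -> [LinGraph]
p4FreeGraphs 0 = [\_ _ -> False]
p4FreeGraphs n =
  [ g'
  | g <- p4FreeGraphs (n - 1)
  , nbrs <- subsequences [0 .. n - 2]        -- neighbours of the new node
  , let g' i j
          | i == j           = False
          | max i j == n - 1 = min i j `elem` nbrs
          | otherwise        = g i j
  , not (any (inducesP4 g') [q ++ [n - 1] | q <- choose 3 [0 .. n - 2]])
  ]
```

Counting the results for n = 0, 1, 2, … gives 1, 1, 2, 8, 52, 472, …, consistent with the 78416 graphs reported below at n = 7.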

There are 78416 P4-free graphs with 7 variables.

#### Finding all linear inferences

The next step is determining for which pairs of graphs (G, H) there is a linear inference G → H. Theorem 3 is used to determine this.

To get all the maximal cliques of a graph, the Bron–Kerbosch algorithm is used. To avoid having to regenerate the maximal cliques of each graph, we first run through the entire list of P4-free graphs and store the maximal cliques for each of these in the following data structure:
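The stored data structure itself is not reproduced here; as a sketch of the clique computation, a basic Bron–Kerbosch without pivoting over an illustrative edge-predicate representation might look like this (the real code may be more refined):

```haskell
type LinGraph = Int -> Int -> Bool

-- All maximal cliques of the graph on nodes 0..n-1 (Bron-Kerbosch, no pivoting).
-- r: the clique built so far, p: candidates that extend r, x: already-processed nodes.
maximalCliques :: Int -> LinGraph -> [[Int]]
maximalCliques n g = bk [] [0 .. n - 1] []
  where
    bk r [] [] = [r]          -- r cannot be extended: it is a maximal clique
    bk r p x = loop p x
      where
        loop [] _ = []
        loop (v : vs) xs =
          bk (v : r) (filter (g v) vs) (filter (g v) xs) ++ loop vs (v : xs)
```

Computing the cliques once per graph and caching them pays off because each graph participates in a huge number of inference checks.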

Then given two graphs along with their sets of maximal cliques, it is simple to test if there is a linear implication:
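Given two such maximal-clique sets, Theorem 3 becomes a direct subset check; a minimal sketch (illustrative names, not the real code):

```haskell
-- Is w' a subset of w?
subsetOf :: Eq a => [a] -> [a] -> Bool
subsetOf w' w = all (`elem` w) w'

-- Theorem 3: phi -> psi holds iff every maximal clique of G(phi)
-- contains some maximal clique of G(psi).
implies :: [[Int]] -> [[Int]] -> Bool
implies cliquesPhi cliquesPsi =
  all (\w -> any (`subsetOf` w) cliquesPsi) cliquesPhi
```

For example, encoding switch x ∧ (y ∨ z) → (x ∧ y) ∨ z with x, y, z as 0, 1, 2: the premise has maximal cliques {0,1} and {0,2}, the conclusion has {0,1} and {2}, and the check succeeds in that direction but fails in the reverse one.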

There are 525109922 total linear inferences between graphs with 7 variables.

#### Trivial inferences

The next step in the algorithm is to filter out trivial implications.

Definition 7. A linear implication φ → ψ is trivial at a variable x if φ[⊤/x] → ψ[⊥/x] is a linear implication. A linear implication is trivial if it is trivial at any of its variables.

It is not necessary to check trivial inferences because of the following theorem.

Theorem 8 (Theorem 34 in [2]). If a linear inference φ → ψ is trivial somewhere, then there is a linear inference on fewer variables that is not trivial anywhere and from which φ → ψ is derivable using switch, medial and units.

As any linear inference with 6 or fewer variables is derivable using switch and medial [4], it is immediate from the above theorem that any trivial linear inference with 7 variables is derivable from switch, medial and units.

Further, we have the following lemma:

Lemma 9 (Lemma 28 from [2]). If a linear implication is not trivial and can be derived from switch and medial, then it can be derived from switch and medial without introducing units.

This will prove helpful when checking whether a given inference is a single switch or medial.

Lastly, we need a way to determine if an inference is trivial using only the graphs. Luckily this is easy given that we already have all the maximal cliques.

Lemma 10 (Proposition 5.7 in [1]). A linear implication φ → ψ of graphs is trivial at a node v if for all maximal cliques W of G(φ), there is a set W' ⊆ W \ {v} such that W' is a maximal clique of G(ψ).

and so we can easily check if an inference is trivial with the following code (note that unlike the code for checking an inference, checking triviality crucially depends on the number of variables):
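A sketch of such a check, following Lemma 10 (illustrative names; the explicit number of variables `n` is exactly where that dependence comes in):

```haskell
-- Lemma 10: trivial at v iff every maximal clique W of G(phi) contains
-- a maximal clique W' of G(psi) that avoids v.
trivialAt :: Int -> [[Int]] -> [[Int]] -> Bool
trivialAt v cliquesPhi cliquesPsi =
  all (\w -> any (\w' -> v `notElem` w' && all (`elem` w) w') cliquesPsi)
      cliquesPhi

-- Trivial anywhere: iterate over all n variables.
trivial :: Int -> [[Int]] -> [[Int]] -> Bool
trivial n cliquesPhi cliquesPsi =
  any (\v -> trivialAt v cliquesPhi cliquesPsi) [0 .. n - 1]
```

For instance x ∧ y → x ∨ y (cliques {{0,1}} and {{0},{1}}) is trivial at both variables, while switch is trivial at none.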

There are 12339852 non-trivial linear inferences between graphs with 7 variables.

#### Minimal inferences

The next stage of the algorithm is to remove any transitive inferences.

Definition 11. Given graphs G and H, an inference G → H is minimal if for all graphs K such that G → K and K → H are inferences, we have that K = G or K = H.

There are three reasons for excluding any transitive inferences:

- If there exists an inference which does not derive from switch and medial, then there exists a minimal inference which does not derive from switch and medial. This is because if G → K and K → H both derive from switch and medial, then so does G → H.
- This reduces the number of inferences we have left to check.
- A minimal inference must either be a single application of switch or medial, or not be derivable from switch and medial at all, which is easier to check than whether an arbitrary inference derives from switch and medial.

The minimal inferences can be found by first grouping all (non-trivial) inferences by their premise. Then for each graph G, the minimal inferences with premise G can be found by taking all graphs inferred from G and removing any graph H for which there is a graph K with G → K → H a valid chain of inferences.
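A sketch of this filtering, assuming graphs are packed into values supporting equality and that `succs` maps a premise to all of its non-trivial conclusions (both illustrative assumptions, not the real code):

```haskell
-- Keep only the minimal conclusions of g: drop any h that is also reachable
-- through an intermediate conclusion k, i.e. g -> k -> h with k distinct
-- from both g and h.
minimalSuccs :: Eq g => (g -> [g]) -> g -> [g]
minimalSuccs succs g =
  [ h | h <- hs, not (any (\k -> h `elem` succs k) (filter (/= h) hs)) ]
  where
    hs = filter (/= g) (succs g)   -- ignore the reflexive inference g -> g
```

For a toy relation 0 → 1 → 2 (with 0 → 2 also present), only 0 → 1 survives as a minimal inference from 0.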

There are 524188 (non-trivial) minimal inferences between graphs with 7 variables.

#### Checking remaining inferences

All that remains is to check the remaining inferences. Because all of these inferences are minimal, it suffices to check whether each one is a single medial or a single switch, and because they are non-trivial, we do not need to worry about units.

To check if an inference is an instance of medial we use the following criterion from [5]:

Proposition 12 (Theorem 5.1 in [5]). A linear inference φ → ψ is derivable from medial iff:

- Whenever there is an edge between u and v in G(φ), there is also an edge between u and v in G(ψ);
- Whenever there is no edge between u and v in G(φ) but there is one in G(ψ), there are further vertices for which G(φ) and G(ψ) contain a particular induced configuration (pictured in Theorem 5.1 of [5]).

Although this criterion checks whether an inference is derivable from arbitrarily many medial rewrites, this is sufficient: if a minimal inference is not derivable from medial, then it is either derivable by a single switch or not derivable from switch and medial at all.

Therefore all that remains is to check whether any remaining inferences are a single instance of switch. This is done by iterating through all possible cograph decompositions of the premise and conclusion of the inference and checking whether any of them have the right form for a single switch rewrite.

After doing this, it turned out that 428428 of the inferences were switches and 95760 were medials, with no inferences left over, which completes the proof of Theorem 1.

### Checking 8 variables

It would be nice if we were able to take this code and apply it to 8 variables to check that it finds a minimal inference that is not a switch or medial. This would provide some evidence that the code is correct as well as possibly finding more inferences that are not derivable.

However, the computational power needed to check 8 variables is vastly greater than what was needed for 7. As written above there are 78416 P4-free graphs with 7 nodes, but we were able to calculate that there are 1320064 P4-free graphs with 8 nodes. Assuming we could calculate all the maximal cliques for all the graphs and store them, we would still run into trouble with the second step of the algorithm, which involves checking every pair of graphs for an inference. Even assuming that we could check for an inference between two 8-node graphs in the same amount of time as between two 7-node graphs (which is not true, as 8-node graphs will in general have more maximal cliques), this step would take at least 250 times as long, since the number of pairs grows by a factor of (1320064/78416)² ≈ 283. There are also many more stages of the algorithm after this.

The current algorithm takes around 10 minutes to run on a relatively powerful desktop, and so 8 variables was out of our reach, though it is possible that someone with a more powerful computer or more time could attempt to run the code. It is also possible that there are some easy time savings, for instance quotienting out by graph isomorphism in the premise of each inference, though we could not work out how to implement this nicely.

### References

[1]

```
@article{Das_2017,
title={On linear rewriting systems for Boolean logic and some applications to proof theory},
volume={12},
ISSN={1860-5974},
url={http://dx.doi.org/10.2168/LMCS-12(4:9)2016},
DOI={10.2168/lmcs-12(4:9)2016},
number={4},
journal={Logical Methods in Computer Science},
publisher={Logical Methods in Computer Science e.V.},
author={Das, Anupam and Strassburger, Lutz},
editor={Fernandez, Maribel},
year={2017},
month={Apr}
}
```

[2]

```
@InProceedings{das:LIPIcs:2013:4060,
author = {Anupam Das},
title = {{Rewriting with Linear Inferences in Propositional Logic}},
booktitle = {24th International Conference on Rewriting Techniques and Applications (RTA 2013)},
pages = {158--173},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-939897-53-8},
ISSN = {1868-8969},
year = {2013},
volume = {21},
editor = {Femke van Raamsdonk},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2013/4060},
URN = {urn:nbn:de:0030-drops-40609},
doi = {10.4230/LIPIcs.RTA.2013.158},
annote = {Keywords: proof theory, propositional logic, complexity of proofs, deep inference}
}
```

[3]

```
@misc{calk2020formulasascographs,
title={Beyond formulas-as-cographs: an extension of Boolean logic to arbitrary graphs},
author={Cameron Calk and Anupam Das and Tim Waring},
year={2020},
eprint={2004.12941},
archivePrefix={arXiv},
primaryClass={cs.LO}
}
```

[4]

```
@article{Sip12:automated-search,
title={An automated search of linear inference rules},
author={Alvin \v{S}ipraga},
year={2012},
journal={Research project. {S}upervised by {A}lessio {G}uglielmi and {A}nupam {D}as},
url={http://arcturus.su/mimir/autolininf.pdf}
}
```

[5]

```
@InProceedings{10.1007/978-3-540-73449-9_26,
author="Stra{\ss}burger, Lutz",
editor="Baader, Franz",
title="A Characterization of Medial as Rewriting Rule",
booktitle="Term Rewriting and Applications",
year="2007",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="344--358",
abstract="Medial is an inference rule scheme that appears in various deductive systems based on deep inference. In this paper we investigate the properties of medial as rewriting rule independently from logic. We present a graph theoretical criterion for checking whether there exists a medial rewriting path between two formulas. Finally, we return to logic and apply our criterion for giving a combinatorial proof for a decomposition theorem, i.e., proof theoretical statement about syntax.",
isbn="978-3-540-73449-9"
}
```

### Comments

Even though we discussed this before, I feel compelled to comment… Well done Alex! This question has been bugging me for years so, along with my previous post, it is nice to finally have a resolution :).

As you point out, a previous attempt [4] hit computational barriers at 6 variables. It is nice to see that some of the graph theoretic work done in the intermediate years was useful for this algorithm to become feasible at 7 variables.