How good is LBD?
SAT solvers are the programs to decide if a given propositional logic formula is satisfiable or not (SAT is an abbreviation of ‘satisfiability’ in this context, not a kind of exam.) A solver returns an assignment set if the given formula is satisfiable, or an empty set otherwise.
Since we can encode constraint problems emerged in various fields into logic formulas, development of efficient SAT solvers is important. You can find more on SAT problem and SAT slover at Wikipedia:
In computer science, the Boolean satisfiability problem (sometimes called Propositional Satisfiability Problem and…en.wikipedia.org
In the last two decades, researchers have been developing many solvers, and the performance of them has improved gradually and drastically. Among them, GLUCOSE, released in 2009, is one of the most import solvers. It archived a good speedup by using a new criterion to select useful clauses called LBD :
Definition 1 (Literals Blocks Distance (LBD)) Given a clause C, and a partition of its literals into n subsets accord- ing to the current assignment, s.t. literals are partitioned w.r.t their decision level. The LBD of C is exactly n.
(cited from G. Audemard and L. Simon, “Predicting Learnt Clauses Quality in Modern SAT Solvers,” in Proc. of IJCAI-09, 2009, pp. 399–404.)
Since the paper shows its great improvement against MiniSat (the epoch-making solver in 2004), and LBD seemed a pretty good criterion for many researchers, most solvers after GLUCOSE have adopted LBD.
So at first I thought all old metrics like ‘clause size’ or ‘clause activity’ (MiniSat used them) should be eliminated from GLUCOSE completely. No. Actually, it still uses all of them. Why do they exist? No answer on papers. So I examined the goodness of LBD by comparing the performances of the modified MiniSat implementations.
We can invent a lot of clause ordering functions and database reduction policies (by removing useless clauses) based on the size, LBD, or activity of clauses. Here is the list I used:
o#0— picking up all binary clauses, then ordering by clause activity (used in MiniSat).
o#1— by only LBD
o#2— by LBD, then clause size
glu— picking up all binary clauses as MinSat does, then ordering by LBD, and activity lastly (used in real GLUCOSE)
clause filtering for database reduction
d#0— keep a better half, then remove(filter) all clauses that neither are locked nor have smaller activities than a threshold.
d#1— keep a better half, then remove all unlocked clauses
By combining these two criteria, I built 8 variants of MiniSat:
- E00 = (
d#0) ; identical to MiniSAT
- E01 = (
- E10 = (
- E11 = (
- E20 = (
- E21 = (
- G00 = (
d#0); almost GLUCOSE
- G01 = (
In this experimentation, I picked up 131 problems from SAT-Race 2015 main track, which MiniSat-2.2 can solve within 1200 sec. execution time on my Linux box.
The following figure shows the result of the first 6 variants.
- As shown by 3 small arrows, all variations without clause filtering policy got poor results. Since the largest gap appears between the activity based orders E00 and E01, keeping some clauses with small activities would be important. It probably means there are pretty many clauses that are useful but have small activities (hypothesis 1: activity ordering is a good picker and a bad filter).
- From now, for simpilicy, notation ‘Δ
b’ means the gap between
b. Since ΔE10-E11 (and ΔE20-E21) is much smaller compared with ΔE00-E01, LBD seems a better criterion for DB reduction than clause activity. Against my expectation, however, none of the LBD-based orderings is the best criterion. Just E10 is very close to the best E00. We couldn’t expect much from LBD (there are papers supporting this). So there might be pretty many clauses that have good LBD values but are useless (hypothesis 2: LBD ordering is a good filter and a bad picker).
- Both of E20 and E21, using LBD and clause size, got worse results. A possible interpretation is that clause size wasn’t a good criterion (hypothesis 3: Don’t use clause size).
What if we combine LBD with clause activity instead of clause size? This is the way to GLUCOSE. The next figure shows the result.
GL0 is over E00 and E01. It might be because LBD is used with other metrics that GLUCOSE works pretty well. It’s the conclusion for now.
But I have to add more. Hypothesis 3 might be wrong; a recent paper  concludes:
As a summary, we demonstrate that the size of the clause remains the best metric to quantify the usefulness of a given clause. […] In our opinion, the performance of the LBD measure can be explained by the fact that it is really related to the size of the clauses. Indeed, we have 2 ≤LBD(c) ≤|c|. The clause size is an upper bound of the LBD measure. For instance, the LBD of binary clauses is 2. Hence, the strategy defined in Glucose favors in some sens maintaining short clauses.
 S. Jabbour, J. Lonlac, L. Sais, and Y. Salhi, “Revisiting the Learned Clauses Database Reduction Strategies,” 2013.
Well, my hypothesis could be illustrated like this.
So neither is optimal, certainly. I need further examination to validate their metrics.
It will be another story.