Yet Another SAT Solver Benchmarks
About clause deletion criterion and restart policy on SAT solvers again.
After a month later of the previous story on SAT solver, I’ve done another benchmark evaluation on good old SAT solvers and my solver.
SAT solvers are the programs to decide if a given propositional logic formula is satisfiable or not (SAT is an…medium.com
The problems which I used previously might be a biased set for a specific solver because they are a set of problems that MiniSat 2.2 can solve in a short period.
The most reliable way to draw solid conclusions is using the identical benchmark set that the newest SAT competition uses. One problem is its evaluation cost; 350 problems with a 5000-second timeout. I should need a several weeks to draw an informative graph.
So I decided to modify the rule drastically. I ran two settings as follow.
1. solving each problem sequentially with a 400-second timeout.
2. solving several problems in parallel with an 800-second timeout. (Results at, or near, T=800 would be affected by other threads.)
The Cactus plots
One benchmark run lasts about a day in both settings. Here are the results. (some data were omitted for simplicity).
- Glucose-3.0 (2012) is better than MiniSat-2.2 (2009) at T=400, but this is reversed at T=780–800.
- Literal Block Distance or LBD is better than the clause-size based criterion as we thought.
- Luby is not good. Yes, we knew: “local restart” has been proposed, and many solvers adopt it. Glucose uses … well, I don’t know. What’s “budget”? I have to investigate it.
- Interestingly, the modified MiniSat which uses LBD but not Luby is the best solver in this evaluation. Probably this result matches a claim in a paper I have read recently. And these tendencies are also held on my solver.
- About my solver, the new version is better than the previous one certainly at T=400. But no notable difference exists at T=780–800.
- And both versions of my solver beats MiniSat and Glucose at T=780–800. Oh, is it possible? What a suspcious result…
So many confusing results stem from the latter configuration.
Honestly, I ran only the first configuration at first. Just before publishing this story, I tried the latter on a whim. And I had to rewrite this completely.
From this experimentation, I learnt that to change experiment setting is very problematic. I need a lot of faster computers for now. 😩
The following is about my solver variants.
- We can see the same tendencies: LBD is good, and Luby is not. But I have no idea which is better, no LBD without Luby or LBD with Luby.
Base on this evaluation, I will implement ‘local restart’ or ‘quick restart’ in the next version.