Yet Another SAT Solver Benchmarks

About clause deletion criterion and restart policy on SAT solvers again.

After a month later of the previous story on SAT solver, I’ve done another benchmark evaluation on good old SAT solvers and my solver.

The problems which I used previously might be a biased set for a specific solver because they are a set of problems that MiniSat 2.2 can solve in a short period.

The most reliable way to draw solid conclusions is using the identical benchmark set that the newest SAT competition uses. One problem is its evaluation cost; 350 problems with a 5000-second timeout. I should need a several weeks to draw an informative graph.

So I decided to modify the rule drastically. I ran two settings as follow.

1. solving each problem sequentially with a 400-second timeout. 
2. solving several problems in parallel with an 800-second timeout. (Results at, or near, T=800 would be affected by other threads.)

The Cactus plots

One benchmark run lasts about a day in both settings. Here are the results. (some data were omitted for simplicity).

The timeout is 400 seconds instead of 5000.
The timeout is 800 seconds.
  • Glucose-3.0 (2012) is better than MiniSat-2.2 (2009) at T=400, but this is reversed at T=780–800.
  • Literal Block Distance or LBD is better than the clause-size based criterion as we thought.
  • Luby is not good. Yes, we knew: “local restart” has been proposed, and many solvers adopt it. Glucose uses … well, I don’t know. What’s “budget”? I have to investigate it.
  • Interestingly, the modified MiniSat which uses LBD but not Luby is the best solver in this evaluation. Probably this result matches a claim in a paper I have read recently. And these tendencies are also held on my solver.
  • About my solver, the new version is better than the previous one certainly at T=400. But no notable difference exists at T=780–800.
  • And both versions of my solver beats MiniSat and Glucose at T=780–800. Oh, is it possible? What a suspcious result…

So many confusing results stem from the latter configuration.
Honestly, I ran only the first configuration at first. Just before publishing this story, I tried the latter on a whim. And I had to rewrite this completely.
From this experimentation, I learnt that to change experiment setting is very problematic. I need a lot of faster computers for now. 😩


Footnote

The following is about my solver variants.

  • We can see the same tendencies: LBD is good, and Luby is not. But I have no idea which is better, no LBD without Luby or LBD with Luby.

Base on this evaluation, I will implement ‘local restart’ or ‘quick restart’ in the next version.

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.