SAT solving – An alternative to brute force bitcoin mining
03 February 2013
A Bitcoin mining program essentially performs the following (ter pseudo-code):
The task is to find a nonce which, spil part of the bitcoin block header, hashes below a certain value.
This is a brute force treatment to something-like-a preimage attack on SHA-256. The process of mining consists of finding an input to a cryptographic hash function which hashes below or equal to a immobilized target value. It is brute force because at every iteration the content to be hashed is slightly switched te the hope to find a valid hash, there’s no brainy choice ter the nonce. The choice is essentially random spil this is the best you can do on such hash functions.
Ter this article I propose an alternative mining algorithm which does not perform a brute force search but instead attacks this problem using a number of contraptions used ter the program verification domain to find bugs or prove properties of programs, see spil example . Namely, a specimen checker backed by a SAT solver are used to find the onberispelijk nonce or prove the absence of a valid nonce. Ter tegenstelling to brute force, which actually executes and computes many hashes, my treatment is only symbolically executing the hash function with added constraints which are inherent ter the bitcoin mining process.
The main results besides the recipe for building a SAT-based miner, are:
- Elementary parameter tuning of the used instruments leads to overheen 1000% show improvement.
- The proposed algorithm potentially gets more efficient with enlargening bitcoin difficulty.
This is not the very first time SAT solvers are used to analyseren a cryptographic hash. Mate Soos et ofschoon have done interesting research on extending SAT solvers for cryptographic problems , Iilya Mironov and Lintao Zhang generated hash collisions using off-the-shelf SAT solvers [Two], and many others, e.g. [Three, Four]. However, to the best of my skill, this is the very first description of an application of SAT solving to bitcoin mining.
I do not eis that it is a quicker treatment than brute force, however it is at least theoretically more appealing.
To aid understanding, I will introduce some basic ideas behind SAT solving and monster checking. Please see the references for a better introduction to SAT solving  and bounded proefje checking .
SAT Solving and Proefje Checking
Boolean Satisfiability (SAT) is the problem of finding an assignment to a boolean formula such that the entire formula evaluates to true. Spil effortless spil it may sound, it is one of the hard, outstanding problems te pc science to efficiently response this decision problem. There is a large and thriving community around building algorithms which solve this problem for hard formulas. Actually, each year there is a competition held where the latest, improved algorithms challenge against each other on common problems. Thanks to a large number of competitors, a standard input format (DIMACS), and the effortless way of benchmarking the vertoning of SAT solvers there have bot massive improvements overheen the last Ten years. Today, SAT solvers are applied to many problem domains which were unthinkable a few years ago (for example they are used te commercial instruments [Five, 7] to verify hardware designs).
At the core of SAT solvers is a backtracking algorithm with the long name Davis–Putnam–Logemann–Loveland algorithm (DPLL) which describes a general way of finding a solution to a propositional formula ter CNF format. Wikipedia summarises the algorithm well:
The basic backtracking algorithm runs by choosing a literal, assigning a truth value to it, simplifying the formula and then recursively checking if the simplified formula is satisfiable, if this is the case, the original formula is satisfiable, otherwise, the same recursive check is done assuming the opposite truth value.
A CNF formula, which is the input to a SAT solver, is made up of literals and clauses. A literal is simply a variable or its negation. A clause is a disjunction of literals. CNF is then any formula which purely consists of conjunctions of clauses.
DPLL then consists of a depth-first search of all possible variable assignments by picking an unassigned variable, inferring values of further variables which logically voorwaarde go after from the current assignment, and resolving potential conflicts ter the variable assignments by backtracking.
For example, given the formula $$(\lnot a \vod b) \land (\lnot a \vod c) \land (\lnot b \vod \lnot c \vod d)$$
And a partial solution so far consists of a=true then from the very first two clauses wij find that b=true and c=true because each clause has to be true, this implies that the last clauses has d=true spil well, which yields a satisfiable solution of all variables being true.
A common application of SAT solving is (bounded) prototype checking , which involves checking whether a system preserves or violates a given property, such spil mutual sensational access to a specific state te the system. Prototype checkers such spil CBMC [Five] directly translate programming languages like C into CNF formulas, ter such a way that the semantics of each language construct (such spil pointers arithmetic, memory proefje, etc) are preserved. Clearly, this is fairly involved and is done ter a number of steps: variables are seen spil bitvectors, function calls are expanded, loops are unrolled up to a given depth, the program is transformed into static single assignment form , etc.
A elementary example of the transformation is visible te the following figure from paper :
Spil visible te the figure, the property which should be checked for violations is voiced spil an assertion. The bitvector formula C and P above are additionally encoded spil $$C \land \lnot P$$ which is then checked by a SAT solver (after converting it to CNF using the Tseitin transformation). If the formula is satisfiable then there is a variable assignment ter the original program which violates the property P (because of the `not`). If it is not possible to make the formula true then the property is assured to hold.
Most importantly, ter case of satisfiability, the proefje checker can reconstruct the variable assignment and execution trace (called counterexample) which leads to the disturbance using the truth variable assignments provided by the solver.
Using the above devices wij can attack the bitcoin mining problem very differently to brute force. Wij take an existing C implementation of sha256 from a mining program and stripverhaal away everything but the actual hash function and the basic mining proces of sha(sha(block)). This C opstopping is going to be the input to CBMC. The aim of this is that with the right assumptions and assertions added to the implementation, wij ongezouten the SAT solver to find a nonce. Instead of a loop which executes the hash many times and a proces which checks if wij computed a onberispelijk hash, wij add constraints that when sated implicitly have the keurig nonce te its solution.
The assumptions and assertions can be violated down to the following ideas:
- The nonce is modelled spil a non-deterministic value
- The known structure of a valid hash, i.e. leading zeros, is encoded spil assumptions te the specimen checker
- An assertion is added stating that a valid nonce does not exist
Let’s explain thesis ideas te more detail.
Instead of a loop that continuously increases the nonce, wij proclaim the nonce spil a non-deterministic value. This is a way of abstracting the proefje. Te specimen checking, non-determinism is used to monster outward user input or library functions (e.g. a library function comes back any possible value), te our case it is a way of moving the search from outside of the actual computation (a brute force loop) to be part of the SAT solver search. The nonce can be seen spil the only “free variable” ter the specimen.
Encoding the structure
Bitcoin mining programs always have to have a function which checks whether the computed hash is below the target (see here for an example). Wij could do the same and just translate this function straight to CNF, however there is a much better and more declarative solution than that te our case. Instead, wij can just assume values which wij know are motionless te the output of the hash. This will restrict the search space to discard any execution paths where the assumptions would not be true anymore. Because wij are not te a brute force setting, but a constraint solving setting this is very ordinary to express. Wij assume the following: Only compute hashes which have N bytes [N depends on the target] of leading zeros.
Te CBMC this is ordinary to achieve and looks about spil goes after:
where state is the array containing the resulting hash. It might seem unintuitive to “fix” output variables to certain values, however reminisce that the code is not executed ter a regular style but translated spil a big formula of constraints. Assumptions on the outputs will result te confinements of the input — ter our case this means only valid nonces will be considered.
This serves three purposes: it encodes the specification of a valid hash, it drives the symbolic execution only along paths which wij are actually interested ter, and most importantly it cuts down the CNF formula. Ter this case, the assumptions liquidate about 100’000 (out of 900k) clauses from the DIMACS verkeersopstopping.
Again, ter comparison, brute force just blindly computes hashes with no way of specifying what wij are looking for. The SAT-based solution only computes hashes that conform with the mining specification of a valid hash.
The most significant part is defining the assertion, or the property P spil it is called ter the section above. The key idea here is that the counterexample produced by the prototype checker will contain a valid nonce given a clever enough assertion.
Why is that? A bounded monster checker is primarily a bug finding device. You specify the invariant of your system, which should always hold, and the monster checker will attempt to find an execution where this invariant is violated (i.e. where there wasgoed a satisfiable solution). That is why the P above is negated te the formula.
Thus, the invariant, our P, is set to “No valid nonce exists”. This is naturally voiced spil the assertion
Which the monster checker will encode to its negation spil “a valid nonce does exist”, i.e. the negation of the above assertion. If a satisfiable solution is found, wij will get an execution path to a valid nonce value.
Te reality, this is encoded more elegantly. Since the leading zeros of a hash are already assumed to be true, all that remains to be asserted is that the value of the very first non-zero byte ter the valid hash will be below the target at that position. Again, wij know the position of the non-zero byte for certain because of the target. For example, if our current target is the following:
Then the following assertion states that a certain byte ter state of the hash has to be above 0x04.
Spil the assertion is negated, the SAT solver will be instructed to find a way to make the flag equal to 0. The only way this can be done is by playing with the only free variable te the prototype — the nonce. Ter that way, wij just translated the bitcoin mining problem into SAT solving land.
Producing a Counterexample
Combining the ideas from the above sections results te a conceptual SAT-based bitcoin mining framework. Te pseudo C code this looks spil goes after:
This program is the input to CBMC, which will encode it to CNF and pass it on to a built-in SAT solver, or alternatively output it spil DIMACS to use a different solver. The advantage of using the built-in solver is that, te case of satisfiability, the prototype checker can lightly retrieve a counterexample from the solution which consists of all variable assignments ter the solution.
A disturbance of the assertion implies a hash below the target is found. Let us inspect a counterexample when run on the genesis block spil input. At state 4730 below, the flag wasgoed found to be 0 which violates the assertion. Moving upwards ter the execution trace wij find a valid hash ter state 4719. Eventually, the value of the non-deterministically chosen nonce is recovered ter state 1604.
Two lines of python verify that the hash is keurig (when compared with the genesis block):
The implementation of the above program generates a large CNF formula with about 250’000 variables and 850’000 clauses. Ter order to evaluate its spectacle I generated two benchmark files where one has a satisfiable solution and the other does not. I restricted the nonce range (the possible values to be chosen) to 1000 values for each verkeersopstopping. The files are available on the following github project.