This article was written by Akvelon Data Scientist/Machine Learning Researcher Daniel Diroff. It was originally published on Medium.
Table of Contents:
Introduction: A common problem/point of discussion in the Bitcoin community has been that of “coin selection”, or the process of selecting the right UTXOs to fund transactions. Currently, there are many different techniques being implemented by exchanges and wallet providers. Some, of course, are better than others depending on the ultimate goals one has in mind. An excellent breakdown of some specific algorithms, as well as an in-depth discussion of the topic can be found in the master’s thesis of M. Erhardt. There, Erhardt in addition suggests and develops a new algorithm designed to improve coin selection for Bitcoin Core. Since publication in 2016, Erhardt’s new algorithm has been implemented on the Bitcoin Core client.
A main difficulty in choosing a coin selection algorithm is the inherent nature of various contradictory goals in this context. These goals include increased privacy, cost savings, confirmation times, and considerations of the UTXO set. Some are user specific, while others are more community wide goals (such as addressing the rampant growth of the UTXO set). This means that the problem of coin selection is a complex one, and one where it is likely the case that no algorithm can be considered best across all situations.
Nevertheless, by stating some assumptions beforehand and making clear how to evaluate the effectiveness of one algorithm versus another, progress can and has been made. We believe that with the focus on cost savings (on a transaction by transaction basis) there is still some room for improvement. Taking the goal of cost savings to be the main focus of this article, we wish to present a new approach which aims at offering improvements to a standard technique to coin selection. The full formulation and details of this new approach, for which we are calling Bitcoin coin selection with leverage, can be found in our technical paper here.
To establish some context let us recall some basics of Bitcoin.
Review of Bitcoin Transaction Basics:
Scenario we consider as playing the role of a Bitcoin Wallet Provider/Exchange
We imagine the following scenario: We play the role of a Bitcoin exchange or wallet provider and are given some specific, what we’ll call payment requests, to process. That is, a collection of desired exchanges of Bitcoin from one address to another requested by our customers. To process these requests, a transaction (or multiple) would be created with each payment request occupying an output. At this point appropriate input UTXOs would be selected to make the transaction valid. The transaction is then pushed out to be mined and added to the blockchain.
The requirement for a transaction to be valid is a natural one, the sum total of the input funds need to cover the sum total of outputs. Due to the Bitcoin protocol, individual UTXOs can “only be spent in their entirety”, so this means very frequently the input funds are more than enough to cover the outputs. In this case usually an additional output is added to the transaction sending the surplus back to the original sender. This is usually referred to as change.
An extra layer of complexity which has become increasing relevant in the past few years is the fee associated to a transaction. A transaction fee is attached to every transaction in an implicit form. Any leftover funds from the inputs after all outputs are accounted for is considered the transaction fee and is given to the miner who confirms the transaction. Thus, the onus is on the creator of the transaction to set the fee. Fees then act as an incentive for the miner to pick up unconfirmed transactions. In years past, it was not unusual for transactions to have a zero fee, miners picked up transactions based on a priority metric which took into account both the value and wait time. Nowadays however, it is expected that every transaction comes with a fee.
As block space is limited (1 Mb) miners now usually prioritize picking up transactions by fee-per-byte. A given transaction, of course, takes up a certain number of bytes to record the exchange of Bitcoin. Hence, transactions have a notion of size simply measured in bytes. This means larger transactions (in byte size) have higher fees associated to them to achieve the same confirmation times. The fee-per-byte rate needed to achieve certain confirmation times is then set by a market, and this value can fluctuate wildly. At the time this article was written in November 2019, the market fee-per-byte rate fluctuated around 20–35 Satoshi per byte (1 Satoshi = 0.00000001 BTC), where as in the extreme market conditions from late 2017 — early 2018, the fee-per-byte rate rose to above 950 Satoshi per byte. A great look at the historical averages can be seen here.
Thus, with cost savings (at least on a transaction by transaction basis) in mind, it is in the user’s best interest to minimize the size of each transaction, as smaller transactions lead to lower fees. Bitcoin allows for many different transaction formats, but in this article we focus on P2PKH. Transaction sizes are then straight forward to calculate; the transaction header information is 10 bytes, each input UTXO occupies 148 bytes whereas each output occupies 34. Thus, for example, it is more cost effective to use fewer input UTXOs if one can get away with it.
A final important notion to mention is that of Bitcoin dust. Although what is considered “dust” isn’t entirely agreed upon by the Bitcoin community, Bitcoin dust is generally small values of Bitcoin which are financially irrational to spend, i.e. they cost more to spend than their value. A natural constraint for coin selection is then to avoid the creation of dust in the change output. In other words, if a transaction is constructed with a change output, its value should be above a dust threshold. Bitcoin dust is its own issue with its unique set of challenges. An interesting discussion on the topic is available here.
The Knapsack Problem and Coin Selection: As a first attempt to minimize cost, besides using the minimal number of UTXOs required, one might try to simply avoid the creation of a change output by choosing the input UTXOs just right. Change-free transactions not only have the benefit in being smaller in size, but help address the problem of the rampant growth of the UTXO set.
This idea of avoiding the creation of change is commonly viewed as a type of “Knapsack Problem”. The “Knapsack Problem” is a classic optimization problem where one is given a set of items, each with a value and weight. The goal is to optimally fill a knapsack with a specified weight capacity, with these items so as to maximize the total value. There are many variants of this problem which include allowing for repeated items and the one we will present relevant for coin selection. For a nice description of knapsack problems in general see the Wikipedia article.
Indeed, coin selection can be thought of as a variant of a knapsack problem, however perhaps it is cleaner to think of it as an example of something known as an integer linear program. The precise definition of an integer linear program is not important for our purposes but for those interested can see here.
Coin selection can be thought of as a variant knapsack problem in the following way: The “knapsack” in this context will be the transaction itself, to be filled with UTXOs which each carry with them a weight (their value in BTC) and cost (148 = the number of bytes required to record an input). Unlike the standard problem however, the knapsack will have both a minimum and maximum capacity, with the minimum capacity being the minimum required funds for the transaction to be valid (sum total of outputs plus the transaction fee). The maximum capacity is chosen to be the minimum capacity plus the dust threshold, the idea being that it is better to overpay the miner rather than create dust with a change output. Furthermore, this means that the knapsacks capacity is dynamic as the transaction fee depends directly on the byte size of the transaction and hence on the number of input UTXOs chosen.
We then ask for the knapsacks final weight to be somewhere in this desired range while optimizing for minimal total cost of the transaction (rather than maximize value as in the classic knapsack problem). We will refer to these problems as the standard knapsack coin selection problem, or when it is clear from context, simply the standard knapsack problem.
Implementing algorithms to solve such knapsack like problems for coin selection can fail for several reasons. For example, depending on one’s internal UTXO bank, it may not be possible to find an optimal match, forcing one into using a change output. This of course may commonly happen with small wallets. Second, integer linear programs are NP-complete, meaning the current algorithms available do not run in polynomial time. Hence, it’s reasonable to imagine an optimal solution existing but the algorithm failing to find it in an allowable period of time. For such situations, coin selection would likely fallback to utilize some other solution with a change output that would “get the job done”.
The algorithms typically utilized to solve integer linear programs are usually referred to as branch and bound algorithms, or more accurately in this context, branch and cut. The details of these algorithms are not important for the purposes of this article but the interested reader can learn more here and here.
Coin Selection with Leverage: The new idea we’d like to present is to offer an improvement for to the fallback step, to implement what we call coin selection with leverage. The main idea is to exploit the fact that one has some control on the change outputs being sent back to them, and therefore some control over their own UTXO set distribution. In particular, some change outputs might be a perfect fit, in the sense that, the new UTXO is exactly the one needed to make some subsequent transaction change-free. This is exactly what coin selection with leverage attempts to do.
If the algorithm fails to solve the standard knapsack problem, i.e. no change-free solution was found, coin selection with leverage then prepares a second transaction concurrently. It is designed to smartly choose UTXOs for the original transaction, UTXOs for the new transaction and additional payment requests to process so that the second transaction is change-free once the (inevitable) change output from the first transaction is utilized as an input UTXO for second.
In other words, coin selection with leverage prepares two transactions simultaneously by searching for:
- A set of input UTXOs to fund the first transaction.
- A set of additional payment requests to process. These are the outputs for the second transaction.
- A set of input UTXOs to fund the second transaction.
These 3 items are produced so that:
- The change output from the first transaction is used as an input for the second.
- The second transaction is change-free.
In this way we are “leveraging” or exploiting our control of the change outputs which can be created. Indeed, one may consider a specific change output as being valuable if it can be used in some subsequent transaction to avoid change. We remark that, of course, to utilize the change output from the first transaction as an input to the second, the first transaction must be confirmed. Thus, it is reasonable to imagine that the additional payment requests to be processed would come from a pool of lower priority requests, so the possible extended wait time is not too much of an issue.
The precise formulation of this problem is also an integer linear program. It is a bit more complex than that of the standard knapsack coin selection problem, but still an integer linear program nonetheless. Hence, the same solvers can be utilized to solve the leverage problem.
A full precise write-up of the formulation can be viewed here. Also contained in the paper is a collection of simulation results done to show the potential advantages for utilizing this approach. The technique itself was formulated and solved utilizing the Python library PuLP and its default solver from the open source project COIN-OR. The default solver implements a branch and cut algorithm to solve integer linear programs. The simulations were also run in Python and the code is publicly available on GitHub.
Glance at Simulations and Implementation Details: As mentioned, the utility of this new approach was tested by running several simulations using Python. The main ingredients to the simulations were 3 things:
- A UTXO Pool — the internal bank of UTXOs accessible by the algorithm to construct transactions. Each simulation was done using a subsample of 2,500 UTXOs from the true UTXO set from October 1st, 2019. Values are stored in Satoshi.
- A Payment Request Pool — the fictitious queue of pending payment requests to be processed. For the purposes of the simulations, these are stored simply as raw Satoshi values, abstracting away technical details such as destination address etc. For each simulation the pool is a subsample of 250 values from true credit card transaction data obtained from the IEEE-CIS Fraud Detection Kaggle competition dataset provided by the Vesta Corporation.
- Coin Selection Algorithm — The technique used to construct transactions to process payment requests from the payment requests pool. This is one of two algorithms:
— a. Standard Knapsack Coin Selection: This is the baseline technique for which we compare our new approach. It consists of two stages: Given payment requests,
— i. Try solving the standard knapsack coin selection problem to find UTXOs to create a change-free transaction.
— ii. If (i) fails, utilize a fallback solution. Here the fallback solution of choice was to sort the UTXO pool by value and take the minimal number of UTXOs required from the top.
— b. Coin Selection with Leverage: Our new approach consisting of three stages: Given payment requests,
— — i. Try solving the standard knapsack coin selection problem to find UTXOs to create a change-free transaction.
— — ii. If (i) fails, utilize new leverage technique by creating a second change-free transaction concurrently.
— — iii. If (ii) fails, utilize the fallback solution.
Simulations are run by repetitive iterations, where each iteration consists of:
- Isolating/selecting a fixed number of payment requests for which to process. The exact number is a parameter for the algorithm, but for the sake of brevity let us say it is 5. These 5 payment requests are taken from the top of the payment request pool. In this way the payment request pool/queue can be thought of as ordered by urgency. In any case, the goal of the iteration is to process at least these 5 pay requests.
- Depending of the chosen algorithm, step 2 is:
— a. Standard Knapsack Coin Selection: The current 5 payment requests are processed either by finding a perfect match for a change-free transaction or by using the fallback solution. Once appropriate UTXOs are found, the UTXO and payment requests pools are updated. Used UTXOs are removed and the same for the payment requests.
— b. Coin Selection with Leverage : At least the current 5 payment are processed. More could be processed by virtue of them appearing in the second transaction during the leverage step. Once appropriate UTXOs are found, the UTXO and payment requests pools are updated accordingly.
Each simulation has a variety of parameters which can be set at the user’s discretion (e.g. the number of payment requests to process for each transaction). The simulations were done across a wide range of choices for these parameters, each one consisting of 5 iterations.
Throughout all of the simulations, we saw a consistent 1–2% savings with the new leverage approach compared to the standard knapsack technique. This may not sound like much but can add up quickly for large operations, especially if market conditions return to what they were in late 2017 — early 2018.
To see what this could mean at scale let’s walk through some numbers. There are approximately 300,000 Bitcoin transactions processed each day. For a fictitious wallet provider/exchange that is responsible for say ~2% of this volume, we can estimate they would process about 6,000 of these. From here let us roughly estimate that these 6,000 transactions cover about 12,000–15,000 payment requests each day. In this scenario the simulations show the following:
Thus, while the savings for mild market conditions may be marginal, the benefits can be drastic if conditions are akin to what they were in early 2018. More details on the simulation results can be found in Section 4 of our technical paper.
As is intuitive, these numbers grow proportionally with the number of payment requests. Furthermore, for larger operations with access to UTXO pools with size in excess of our chosen size of 2,500, it may be reasonable to assume that one could first subsample a pool of 2,500 UTXOs to be used for coin selection. This may be done, for example, in the interest of speed. In this case, the simulations done may prove useful.
Final Remarks: This new leverage technique aims to improve upon standard knapsack like approaches to coin selection with the goal of cost savings in mind. The simulations done show the potential utility of this approach, and it seems to fit in nicely as a replacement for a standard knapsack step in any full coin selection procedure currently being implemented. As change-free transactions seem to be beneficial all around, we believe it is only natural to try and build change-free transactions to the “second-order” if indeed a change-output is required.
Of course, as mentioned above, there are many other factors to consider in the context of coin selection. We hope that the parameters available to tune in the leverage technique, allow the user to align this approach with the many other goals to be considered.
Daniel Diroff is a data scientist/machine learning researcher at Akvelon. Read his original paper on Bitcoin Coin Selection with Leverage here.