On Bitcoin transaction sizes

As intermediate steps towards the goal of building an analytic transaction-throughput model, two previous articles investigated script and witness sizes for the most common Bitcoin transaction types as well as the not yet implemented but highly anticipated Pay-to-Taproot transaction type.

This article builds on previous findings and investigates the sizes of inputs, outputs, and witnesses of different transaction types. To this end, first-principles-based estimates are derived and verified using empirical data. The findings are distilled into libtxsize, a library to give size, weight, and virtual size estimates for transactions with arbitrary inputs, outputs, and witnesses.

This article is structured as follows. First, the general formats of transaction inputs, outputs, and witnesses are discussed. The results of this discussion are then integrated with findings from previous articles to derive size estimates for inputs, outputs, and witnesses of different transaction types; with the exception of Pay-to-Taproot, for which no empirical data is available, all estimates are verified using data from Bitcoin's blockchain. Next, the format of Bitcoin transactions is discussed, and findings about input, output, and witness sizes are integrated to create an analytic model to estimate the size of arbitrary Bitcoin transactions. The article concludes by presenting libtxsize, a library that incorporates all previous insights to automate transaction-size estimates, and a short summary.

Transaction input, output, and witness formats

In the following, the general formats of inputs, outputs, and witnesses are discussed, and the formats' implications on transaction size are investigated.

Transaction input format

From a high-level viewpoint, each transaction input comprises two key pieces of information: a reference to an unspent transaction output (UTXO) and an unlocking script to satisfy the locking script of the referenced UTXO.

To reference a UTXO, inputs include a 32-byte transaction hash, which is used to identify a previous transaction, and a 4-byte integer, which specifies the position of an output in that transaction.

Unlocking scripts have a variable size and will be discussed in the following section. For now, it is sufficient to note that, because their size varies, it must be encoded explicitly. This is done using a variable-length integer, a data type devised to minimize the encoded size of non-negative integers of up to eight bytes. A variable-length integer requires one byte to encode values from 0 to 252; three bytes for larger values that fit into 16 bits; five bytes for larger values that fit into 32 bits; and nine bytes for larger values that fit into 64 bits. In practice, most unlocking scripts are smaller than 253 bytes, so the encoding of the script's length typically requires one byte.
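
The size rule above can be written down in a few lines. The following is a minimal sketch; the function name varint_size is chosen here for illustration and is not taken from libtxsize.

```python
def varint_size(value: int) -> int:
    """Number of bytes a variable-length integer needs to encode `value`."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    if value <= 252:
        return 1  # the value itself fits into a single byte
    if value <= 0xFFFF:
        return 3  # one-byte prefix followed by a 16-bit integer
    if value <= 0xFFFFFFFF:
        return 5  # one-byte prefix followed by a 32-bit integer
    if value <= 0xFFFFFFFFFFFFFFFF:
        return 9  # one-byte prefix followed by a 64-bit integer
    raise ValueError("value does not fit into 64 bits")


for length in (0, 252, 253, 65535, 65536):
    print(f"{length:>6} -> {varint_size(length)} byte(s)")
```

The jump from one to three bytes at a value of 253 matters in practice: it reappears below for 2-of-3 P2SH-multisig inputs.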

Finally, each input contains a 4-byte sequence number, which is currently used for the replace-by-fee mechanism that allows updating a transaction's fee to increase the likelihood of it being included in a block.

To summarize, the input size is determined by a fixed component of 40 bytes, comprising the 32-byte hash, the 4-byte position, and the 4-byte sequence number; and a variable one, comprising the unlocking script and the encoding of its length.
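
As a minimal sketch of this summary, the input size can be computed from the length of the unlocking script alone. The function names are illustrative and not part of libtxsize.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode `value` as a variable-length integer."""
    if value <= 252:
        return 1
    if value <= 0xFFFF:
        return 3
    if value <= 0xFFFFFFFF:
        return 5
    return 9


def input_size(unlocking_script_len: int) -> int:
    """Serialized size of an input whose unlocking script has the given length."""
    fixed = 32 + 4 + 4  # transaction hash, output position, sequence number
    return fixed + varint_size(unlocking_script_len) + unlocking_script_len


print(input_size(0))    # empty unlocking script
print(input_size(252))  # largest script with a one-byte length field
print(input_size(253))  # the length field grows to three bytes
```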

Transaction output format

Transaction outputs include two key pieces of information: the amount and the locking script.

The amount of Bitcoin associated with the output is encoded using an 8-byte integer. Like unlocking scripts, locking scripts have a variable size, and will be discussed in the following section. Moreover, outputs include a variable-length integer that encodes the size of the locking script.

In conclusion, the output size is made up of two components: a fixed contribution of eight bytes, corresponding to the encoding of the amount; and a variable contribution, determined by the locking script and the encoding of its length.
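
The same reasoning gives a one-line size calculation for outputs. This is a sketch with illustrative function names; the script lengths in the usage lines are the ones quoted later in this article.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode `value` as a variable-length integer."""
    if value <= 252:
        return 1
    if value <= 0xFFFF:
        return 3
    if value <= 0xFFFFFFFF:
        return 5
    return 9


def output_size(locking_script_len: int) -> int:
    """Serialized size of an output whose locking script has the given length."""
    amount = 8  # the amount is always an 8-byte integer
    return amount + varint_size(locking_script_len) + locking_script_len


for name, script_len in (("P2PK", 35), ("P2PKH", 25), ("P2SH", 23), ("P2WPKH", 22)):
    print(name, output_size(script_len))
```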

Transaction witness format

Witnesses can serve as alternative stores for data to unlock outputs. Each witness contains a variable-length integer to indicate the number of items it contains. The items are of arbitrary size, so each item's length is encoded using a variable-length integer as well.
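
The witness layout translates directly into a size calculation. The sketch below assumes that the lengths of all witness items are known; witness_size is an illustrative name, not a libtxsize function.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode `value` as a variable-length integer."""
    if value <= 252:
        return 1
    if value <= 0xFFFF:
        return 3
    if value <= 0xFFFFFFFF:
        return 5
    return 9


def witness_size(item_lens) -> int:
    """Serialized size of a witness given the lengths of its items."""
    item_lens = list(item_lens)
    size = varint_size(len(item_lens))  # number of items
    for item_len in item_lens:
        size += varint_size(item_len) + item_len  # length prefix and item
    return size


print(witness_size([71, 33]))  # a 71-byte signature and a 33-byte public key
print(witness_size([64]))      # a single 64-byte signature
```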

Estimating the sizes of different input, output, and witness types

In the following, analytic estimates for input, output, and, where applicable, witness types are derived and compared to empirical data, whenever the latter is available. Note that sizes for locking and unlocking scripts as well as witnesses are based on previous findings (in general, see the table under “Results and Conclusion” of this article; for Pay-to-Taproot, see this article).

Pay-to-Public-Key

Pay-to-Public-Key (P2PK) outputs have a fixed size of 44 bytes: eight bytes to encode the value, a one-byte variable-length integer encoding the locking script's size, and the 35-byte locking script.

This analytic estimate is validated by the empirical data shown in the figure below, which contains a histogram of the sizes of all P2PK outputs as of block 637,302. Discounting 76-byte outputs, which are an artifact from Bitcoin's early days when public keys were encoded using the uncompressed SEC format, all outputs have a size of 44 bytes. (For a detailed discussion of the uncompressed SEC format, see “Encoding of Public Keys” in this article.)

[Figure: histogram of the sizes of all P2PK outputs as of block 637,302]

The average size of P2PK inputs is about 113.5 bytes: 40 bytes for referencing a UTXO and the sequence number, a one-byte variable-length integer to encode the unlocking script's size, and the 72.5-byte unlocking script. Note, however, that using the recently introduced low-r optimization, a consistent size of 113 bytes can be achieved for P2PK inputs.

As before, the analytic estimate is validated by empirical data. The figure below shows a histogram of the sizes of all P2PK inputs as of block 637,302. As expected, more than 90% of all inputs have a size of 113 or 114 bytes. Deviations are caused by the DER encoding of ECDSA signatures, which can lead to the addition or subtraction of bytes depending on the (random) signature values. (See “Encoding of Signatures” in this article for a detailed explanation.)

[Figure: histogram of the sizes of all P2PK inputs as of block 637,302]

Pay-to-Public-Key-Hash

Pay-to-Public-Key-Hash (P2PKH) outputs have a fixed size of 34 bytes: eight bytes to encode the value, a one-byte variable-length integer encoding the locking script's size, and the 25-byte locking script.

Again, the analytic estimate is corroborated by empirical data, shown in the figure below. The data in the histogram of the size of all P2PKH outputs as of block 637,302 indicates that all such outputs have a size of 34 bytes.

[Figure: histogram of the sizes of all P2PKH outputs as of block 637,302]

The average size of P2PKH inputs is about 147.5 bytes: 40 bytes for referencing a UTXO and the sequence number, a one-byte variable-length integer to encode the unlocking script's size, and the 106.5-byte unlocking script. Again, note that using the low-r optimization, a consistent size of 147 bytes can be achieved for P2PKH inputs.

The estimate is supported by empirical data shown in the figure below, which contains a histogram of the sizes of all P2PKH inputs as of block 637,302. As expected, the majority of inputs have a size of 147 or 148 bytes. As before, deviations around the estimate are side effects caused by the DER signature-encoding standard. Moreover, the second cluster around 180 bytes can be neglected, for it is an artifact from Bitcoin's early days, when public keys in P2PKH inputs were encoded using the uncompressed SEC format.

[Figure: histogram of the sizes of all P2PKH inputs as of block 637,302]
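
The P2PKH input sizes discussed above, including the second cluster around 180 bytes, can be reproduced with a short sketch. It assumes that the unlocking script consists of a one-byte push, the signature, another one-byte push, and the public key, and that an uncompressed public key is 65 bytes long; the function name is illustrative.

```python
def p2pkh_input_size(signature_len: float, pubkey_len: int = 33) -> float:
    """Size of a P2PKH input for a given signature and public-key length."""
    # Unlocking script: push, signature, push, public key
    script_len = 1 + signature_len + 1 + pubkey_len
    # 40 fixed bytes plus a one-byte script length (the script is below 253 bytes)
    return 40 + 1 + script_len


print(p2pkh_input_size(71.5))      # average signature size
print(p2pkh_input_size(71))        # low-r signature
print(p2pkh_input_size(71.5, 65))  # uncompressed public key
```

The last line yields 179.5 bytes, 32 bytes above the estimate for compressed keys.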

Bare Multi-Signature

The size of bare Multi-Signature (multisig) outputs depends on n, the size of the set of accepted keys. The overall size is made up of two components. The first is a fixed 12-byte contribution comprising the eight-byte amount, a one-byte variable-length integer encoding the locking script's size, and three bytes for the OP_m, OP_n, and OP_CHECKMULTISIG instructions that are part of the locking script. The second is a dynamic contribution of 34n bytes, for each of the public-key encodings in the locking script requires a one-byte push instruction to indicate the key's size and the 33-byte key itself. The overall size estimate for m-of-n-multisig outputs is thus 12+34n bytes.

In the following, this analytic estimate is verified for 1-of-2 and 1-of-3 multisig outputs, which together account for more than 98% of all multisig outputs. The estimate for the former is 80 bytes; for the latter it is 114 bytes. The empirical data in the figure below, which contains histograms of the sizes of all 1-of-2 and 1-of-3 multisig outputs as of block 637,302, supports these estimates: the bulk of 1-of-2 and 1-of-3 multisig outputs have a size of 80 and 114 bytes, respectively. In each case, there is a smaller number of outputs that are 32 bytes larger than the estimate. As before, these are artifacts that can be attributed to the early days of Bitcoin, when public keys were encoded using the uncompressed SEC format.

[Figure: histograms of the sizes of all 1-of-2 and 1-of-3 multisig outputs as of block 637,302]
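
The output sizes above, including those that are 32 bytes larger than the estimate, follow from the key lengths. The sketch below is illustrative only; it assumes a one-byte script-length field, which holds as long as the locking script does not exceed 252 bytes, and 65-byte uncompressed keys.

```python
def multisig_output_size(key_lens) -> int:
    """Size of a bare multisig output given the lengths of its n public keys."""
    # OP_m, one push plus key per public key, OP_n, OP_CHECKMULTISIG
    script_len = 3 + sum(1 + key_len for key_len in key_lens)
    if script_len > 252:
        raise ValueError("script too long for a one-byte length field")
    return 8 + 1 + script_len  # amount, script length, locking script


print(multisig_output_size([33, 33]))      # 1-of-2, compressed keys
print(multisig_output_size([33, 33, 33]))  # 1-of-3, compressed keys
print(multisig_output_size([65, 33]))      # 1-of-2, one uncompressed key
```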

The size of multisig inputs depends on m, the number of signatures required to unlock the referenced output. The overall size is made up of two components. The first is a fixed 42-byte contribution comprising the 32-byte hash, the 4-byte position, the 4-byte sequence number, one byte for the variable-length integer encoding the unlocking script's length, and one byte for the OP_0 instruction that is part of the unlocking script. The second is a dynamic contribution of 72.5m bytes, for each of the signature encodings in the unlocking script includes a one-byte push instruction to indicate the signature's size and the signature itself, which, on average, has a size of 71.5 bytes. The estimate for the overall size of m-of-n-multisig inputs is thus 42+72.5m bytes.

As before, this analytic estimate is verified for 1-of-2 and 1-of-3 multisig variants. Since m=1 for both variants, they share the same estimate of 114.5 bytes. The empirical data in the figure below, which contains a histogram of the sizes of all 1-of-2 and 1-of-3 multisig inputs as of block 637,302, supports this estimate: as expected, the bulk of all 1-of-2 and 1-of-3 multisig inputs have a size of either 114 or 115 bytes.

[Figure: histogram of the sizes of all 1-of-2 and 1-of-3 multisig inputs as of block 637,302]
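
Summing the components listed above (40 bytes of fixed input data, a one-byte script length, the OP_0 instruction, and one length byte plus the signature for each of the m signatures) reproduces both the estimate and the observed sizes. This is a sketch with an illustrative function name; it assumes the unlocking script stays below 253 bytes.

```python
def multisig_input_size(signature_lens) -> float:
    """Size of a bare multisig input given the lengths of its m signatures."""
    # OP_0 followed by one push plus signature per required signature
    script_len = 1 + sum(1 + sig_len for sig_len in signature_lens)
    if script_len > 252:
        raise ValueError("script too long for a one-byte length field")
    return 40 + 1 + script_len


print(multisig_input_size([71.5]))  # estimate for m=1
print(multisig_input_size([71]))    # 71-byte signature
print(multisig_input_size([72]))    # 72-byte signature
```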

Null Data

The size of Null-Data outputs depends on s, the size of the user's payload included in the output. The overall output size is made up of two components. The first is a fixed ten-byte contribution comprising the 8-byte amount, one byte for the variable-length integer encoding the locking script's length, and one byte for the OP_RETURN instruction that is part of the locking script. The second is a dynamic contribution that includes the encoding of the length of the user's payload and the actual payload. If the payload is no larger than 75 bytes, its length can be explicitly encoded in Bitcoin script and requires only one byte; if the payload is larger than 75 bytes, an additional OP_PUSHDATA1 instruction is required, increasing the total requirement to encode the length of the payload to two bytes. The overall estimate for Null-Data outputs is thus 11+s bytes when including a payload with a size of up to 75 bytes, and 12+s bytes when including a larger payload.

This analytic estimate is verified for 20- and 80-byte Null-Data outputs, which together account for more than 90% of all Null-Data outputs. The estimate for the former is 31 bytes; for the latter it is 92 bytes. The empirical data in the figure below, which contains histograms of the sizes of all Null-Data outputs as of block 637,302, supports these estimates.

[Figure: histograms of the sizes of all Null-Data outputs as of block 637,302]
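
The two cases of the Null-Data estimate can be captured in a small sketch. The function name is illustrative, and the sketch is deliberately restricted to payloads for which the locking script still fits a one-byte length field.

```python
def null_data_output_size(payload_len: int) -> int:
    """Size of a Null-Data output carrying a payload of `payload_len` bytes."""
    if not 1 <= payload_len <= 249:
        raise ValueError("this sketch covers payloads of 1 to 249 bytes")
    # Up to 75 bytes: one-byte push; above: OP_PUSHDATA1 plus one length byte
    push_overhead = 1 if payload_len <= 75 else 2
    script_len = 1 + push_overhead + payload_len  # OP_RETURN, push, payload
    return 8 + 1 + script_len  # amount, script length, locking script


for payload_len in (20, 75, 76, 80):
    print(payload_len, null_data_output_size(payload_len))
```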

Pay-to-Script-Hash

Pay-to-Script-Hash (P2SH) outputs have a fixed size of 32 bytes: eight bytes to encode the value, a one-byte variable-length integer encoding the locking script's size, and the 23-byte locking script.

This analytic estimate is corroborated by the empirical data in the figure below, which contains a histogram of all P2SH outputs as of block 637,302, and confirms that all P2SH outputs have a size of 32 bytes.

[Figure: histogram of the sizes of all P2SH outputs as of block 637,302]

In contrast to fixed-size P2SH outputs, the size of P2SH inputs varies significantly depending on the type of redeem script included in the input. The most relevant redeem-script use cases are discussed in the following.

Pay-to-Script-Hash-Multi-Signature

The size of P2SH-multisig inputs depends on m, the number of signatures required to satisfy the redeem script; and n, the size of the set of accepted keys. The overall size of P2SH-multisig inputs is made up of two components: a fixed 40-byte contribution comprising the 32-byte hash, the 4-byte position, and the 4-byte sequence number; and a variable contribution including the size of the unlocking script and the encoding of the script's size. The unlocking script, in turn, comprises two components: a redeem script, which is interpreted as a locking script and whose hash matches the hash in the referenced UTXO's locking script; and data to satisfy the redeem script.

The redeem script is identical to a bare multisig locking script and comprises a fixed contribution of three bytes for the OP_m, OP_n, and OP_CHECKMULTISIG script instructions, and a dynamic contribution of 34n bytes, for each of the n encoded public keys in the redeem script contributes a one-byte push instruction to encode the key's size and the 33-byte key itself. The redeem script's contribution is thus 3+34n bytes in total. The overhead of the Bitcoin script instruction(s) pushing the redeem script onto the stack depends on the redeem script's size: if it is no larger than 75 bytes, its length can be explicitly encoded using a one-byte Bitcoin script instruction; if the redeem script is larger than 75 bytes, an additional OP_PUSHDATA1 instruction is required, increasing the contribution by one byte.

In addition, the unlocking script contains data to satisfy the redeem script. This data is identical to a bare multisig unlocking script and includes a fixed one-byte contribution for the OP_0 instruction and a dynamic contribution of 72.5m bytes to encode m 71.5-byte ECDSA signatures and their lengths. The data's contribution is thus 1+72.5m bytes.

The total size of a P2SH-multisig input is thus given by the sum of the fixed 40-byte contribution, the redeem script's size of 3+34n bytes, the one- or two-byte overhead of pushing the redeem script onto the stack, the 1+72.5m bytes of data to satisfy the redeem script, and the variable-length integer encoding the unlocking script's length.

This analytic estimate is verified for 2-of-2 and 2-of-3 P2SH-multisig inputs, which together account for more than 90% of P2SH-multisig inputs. For the former, n=2, so the estimate for the redeem script's size is 3+34n=71 bytes. Because the redeem script is smaller than 75 bytes, the overhead of pushing it onto the stack is only one byte. Moreover, m=2, so the contribution of the data to satisfy the redeem script is 146 bytes. The unlocking script's size is thus 218 bytes. Because the script is smaller than 253 bytes, the variable-length integer encoding its size requires only one byte. Together with the fixed contribution of 40 bytes, the estimate for a 2-of-2 P2SH-multisig input is thus 259 bytes.

This estimate is corroborated by empirical data shown in the figure below, which contains a histogram of all 2-of-2 and 2-of-3 P2SH-multisig inputs as of block 637,302. Almost half of all 2-of-2 P2SH-multisig inputs have a size of 259 bytes. The remaining inputs are either one byte smaller or larger than that, a fact which can be explained by signatures being, on average, 71.5 bytes: absent the low-r optimization, there is a 50% probability of generating either a 71-byte or a 72-byte signature; for two signatures, this implies a 25% chance of two 71-byte signatures (corresponding to inputs with a size of 258 bytes), a 50% chance of one 71-byte and one 72-byte signature (resulting in 259-byte inputs), and a 25% chance of two 72-byte signatures (leading to 260-byte inputs).

[Figure: histogram of the sizes of all 2-of-2 and 2-of-3 P2SH-multisig inputs as of block 637,302]

For 2-of-3 P2SH-multisig inputs, n=3, so the estimate for the redeem script's size is 3+34n=105 bytes. Because the redeem script is larger than 75 bytes, the overhead of pushing it onto the stack is two bytes: one byte for the OP_PUSHDATA1 instruction, another byte encoding the size of the redeem script. Moreover, m=2, so the contribution of the data to satisfy the redeem script is 146 bytes. The unlocking script's size is thus 253 bytes. The script's size in this instance is not smaller than 253 bytes, so the encoding of the unlocking script's size with a variable-length integer requires three bytes. Together with the fixed contribution of 40 bytes, the estimate for a 2-of-3 P2SH-multisig input is thus 296 bytes.

This estimate is also validated by empirical data shown in the histogram above: almost half of all 2-of-3 P2SH-multisig inputs have a size of 296 bytes. As before, around 25% of the inputs have a size that is one byte larger than the estimate; again, this circumstance can be explained by the 25% chance of generating two 72-byte signatures compared to the estimate, which uses two times 71.5 bytes. Another 25% of inputs have a size that, at 293 bytes, is three bytes smaller than the estimate. The reason for this is that in case of two 71-byte signatures, the unlocking script's size is only 252 bytes, which implies that its size can be encoded with a one-byte variable-length integer instead of a three-byte one. The combined reduction in size is thus three bytes: a one-byte reduction stemming from the use of two 71-byte signatures instead of the estimate of two times 71.5 bytes; another two bytes because only one byte (instead of three) is required to encode the unlocking script's size.
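
The size distributions described for 2-of-2 and 2-of-3 P2SH-multisig inputs can be reproduced by enumerating all signature-size combinations. The following sketch assumes that every signature is 71 or 72 bytes long with equal probability (that is, no low-r optimization); all function names are illustrative.

```python
from collections import Counter
from itertools import product


def varint_size(value: int) -> int:
    """Bytes needed to encode `value` as a variable-length integer."""
    if value <= 252:
        return 1
    if value <= 0xFFFF:
        return 3
    if value <= 0xFFFFFFFF:
        return 5
    return 9


def push_overhead(data_len: int) -> int:
    """Bytes of script instructions needed to push `data_len` bytes."""
    if data_len <= 75:
        return 1  # the push instruction encodes the length directly
    if data_len <= 255:
        return 2  # OP_PUSHDATA1 plus a one-byte length
    if data_len <= 65535:
        return 3  # OP_PUSHDATA2 plus a two-byte length
    raise ValueError("data too large for this sketch")


def p2sh_multisig_input_size(signature_lens, n: int) -> int:
    """Size of a P2SH-multisig input for the given signature lengths and n keys."""
    redeem_script_len = 3 + 34 * n
    data_len = 1 + sum(1 + sig_len for sig_len in signature_lens)  # OP_0, signatures
    script_len = data_len + push_overhead(redeem_script_len) + redeem_script_len
    return 40 + varint_size(script_len) + script_len


def size_distribution(m: int, n: int) -> dict:
    """Probability of each input size when signatures are 71 or 72 bytes."""
    counts = Counter(
        p2sh_multisig_input_size(sig_lens, n)
        for sig_lens in product((71, 72), repeat=m)
    )
    return {size: count / 2 ** m for size, count in sorted(counts.items())}


print(size_distribution(2, 2))  # 258, 259, and 260 bytes
print(size_distribution(2, 3))  # 293, 296, and 297 bytes
```

For the 2-of-3 case, the gap between 293 and 296 bytes is caused by the script-length field growing from one to three bytes once the unlocking script reaches 253 bytes.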

Pay-to-Script-Hash-Pay-to-Witness-Script-Hash-Multi-Signature

Pay-to-Script-Hash-Pay-to-Witness-Script-Hash-Multi-Signature (P2SH-P2WSH-multisig) inputs have a fixed size of 76 bytes: 40 bytes for referencing a UTXO and the sequence number, a one-byte variable-length integer to encode the unlocking script's size, and a 35-byte unlocking script.

The analytic estimate is supported by empirical data, shown in the figure below. The histogram, which displays the size of all P2SH-P2WSH-multisig inputs as of block 637,302, indicates that all such inputs have a size of 76 bytes.

[Figure: histogram of the sizes of all P2SH-P2WSH-multisig inputs as of block 637,302]

The size of P2SH-P2WSH-multisig witnesses depends on m, the number of signatures required to satisfy the witness script; and n, the size of the set of accepted keys. The size of witnesses is entirely dynamic; contributions comprise: a witness script that is interpreted as a locking script; data to satisfy the locking script; and variable-length integers indicating the size of each witness item as well as the number of overall items.

The witness script contains the same information as a P2SH-multisig redeem script, so it contributes the same 3+34n bytes. Since the witness script is a witness item, its length is encoded using a variable-length integer.

The data to unlock the witness script is similar to that used in P2SH-multisig inputs, the only difference being that signatures are individual witness items whose length is encoded using a variable-length integer instead of being script data that is pushed onto the stack using Bitcoin script instructions. This, however, has no impact on the size: each of the m signatures contributes, on average, 71.5 bytes, plus one byte to encode its length using a variable-length integer. In addition, the data includes a variable-length integer indicating one item of zero length to push an empty item onto the stack (analogous to the OP_0 instruction used when the data is encoded in a Bitcoin script). The overall contribution of the data is thus the same 1+72.5m bytes as was the case for P2SH-multisig inputs.

The total witness size is thus given by the size of the variable-length integer indicating the number of witness items, the data's contributions of 1+72.5m bytes, the witness script's contribution of 3+34n bytes, and the size of the variable-length integer indicating the size of the witness script.

This analytic estimate is verified for 2-of-2 and 2-of-3 P2SH-P2WSH-multisig witnesses, which combined account for more than 90% of P2SH-P2WSH-multisig witnesses. For the former, m=n=2, so the data's and witness script's contributions are 1+72.5m=146 bytes and 3+34n=71 bytes, respectively. Together with the two one-byte variable-length integers to indicate the number of witness items (four: an empty item, two signatures, and the witness script) and the length of the witness script, the overall estimate is 219 bytes.

For 2-of-3 P2SH-P2WSH-multisig inputs, m=2 and n=3, so the data's and witness script's contributions are 1+72.5m=146 bytes and 3+34n=105 bytes, respectively. Again, the two variable-length integers indicating the number of witness items and the length of the witness script contribute two bytes, so the overall estimate is 253 bytes.

These estimates are supported by the empirical data shown in the figure below, which contains a histogram of all 2-of-2 and 2-of-3 P2SH-P2WSH-multisig witnesses as of block 637,302. For both multisig variants, the peak of the distribution matches the estimate. As before, witnesses that are one byte smaller or larger than the estimate can be explained by the 25% chances of creating either two 71-byte signatures or two 72-byte signatures instead of the estimate, which uses the average of two times 71.5 bytes.

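[Figure: histogram of the sizes of all 2-of-2 and 2-of-3 P2SH-P2WSH-multisig witnesses as of block 637,302]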
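
The witness estimates can be retraced by listing the witness items explicitly. The sketch below uses integer signature lengths; one 71-byte and one 72-byte signature together match the average of two times 71.5 bytes. Function names are illustrative.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode `value` as a variable-length integer."""
    if value <= 252:
        return 1
    if value <= 0xFFFF:
        return 3
    if value <= 0xFFFFFFFF:
        return 5
    return 9


def multisig_witness_size(signature_lens, n: int) -> int:
    """Size of an m-of-n multisig witness for the given signature lengths."""
    witness_script_len = 3 + 34 * n
    # Witness items: an empty item, the m signatures, and the witness script
    items = [0, *signature_lens, witness_script_len]
    return varint_size(len(items)) + sum(varint_size(i) + i for i in items)


print(multisig_witness_size([71, 72], n=2))  # 2-of-2
print(multisig_witness_size([71, 72], n=3))  # 2-of-3
print(multisig_witness_size([71, 71], n=3))  # 2-of-3, two 71-byte signatures
```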

Pay-to-Script-Hash-Pay-to-Witness-Public-Key-Hash

Pay-to-Script-Hash-Pay-to-Witness-Public-Key-Hash (P2SH-P2WPKH) inputs have a fixed size of 64 bytes: 40 bytes for referencing a UTXO and the sequence number, a one-byte variable-length integer to encode the unlocking script's size, and a 23-byte unlocking script.

The analytic estimate is supported by empirical data, shown in the figure below. The histogram, which includes the sizes of all P2SH-P2WPKH inputs as of block 637,302, indicates that all such inputs have a size of 64 bytes.

[Figure: histogram of the sizes of all P2SH-P2WPKH inputs as of block 637,302]

The average size of P2SH-P2WPKH witnesses is about 107.5 bytes: a one-byte variable-length integer indicating the number of items in the witness (two: a signature and a public key); a 71.5-byte signature and a one-byte variable-length integer encoding its size; and a 33-byte public key and a one-byte variable-length integer encoding its size.

This estimate is corroborated by empirical data. The histogram shown in the figure below displays the sizes of all P2SH-P2WPKH witnesses as of block 637,302. The data indicates that more than 60% of all P2SH-P2WPKH witnesses have a size of 107 bytes; the remainder have a size of 108 bytes. The bias toward 107 bytes can be explained by the low-r optimization, which results in 71-byte signatures in comparison to the 71.5-byte average used by the estimate.

[Figure: histogram of the sizes of all P2SH-P2WPKH witnesses as of block 637,302]

Pay-to-Witness-Public-Key-Hash

Pay-to-Witness-Public-Key-Hash (P2WPKH) outputs have a fixed size of 31 bytes: eight bytes to encode the value, a one-byte variable-length integer encoding the locking script's size, and a 22-byte locking script.

This analytic estimate is supported by the empirical data shown in the figure below, which contains a histogram of the sizes of all P2WPKH outputs as of block 637,302. As expected, all observed outputs have a size of 31 bytes.

[Figure: histogram of the sizes of all P2WPKH outputs as of block 637,302]

P2WPKH inputs also have a fixed size. The contributions of a 32-byte transaction id, a four-byte position, a one-byte variable-length integer, an empty unlocking script, and a four-byte sequence number result in a total size of 41 bytes.

Again, the estimate is corroborated by empirical data. The figure below contains a histogram of the sizes of all P2WPKH inputs as of block 637,302. The data indicates that all of them have a size of 41 bytes.

[Figure: histogram of the sizes of all P2WPKH inputs as of block 637,302]

P2WPKH witnesses contain the same data as P2SH-P2WPKH witnesses: a one-byte variable-length integer indicating the number of items in the witness (two: a signature and a public key); a 71.5-byte signature and a one-byte variable-length integer encoding its size; and a 33-byte public key and a one-byte variable-length integer encoding its size. P2WPKH witnesses thus have the same 107.5-byte estimate as P2SH-P2WPKH witnesses.

Again, the analytic estimate is supported by empirical data. The histogram shown in the figure below displays the sizes of all P2WPKH witnesses as of block 637,302. The data shows that the bulk of witnesses have a size of either 107 or 108 bytes. As was the case for P2SH-P2WPKH witnesses, the bias toward 107 bytes can be explained by the low-r optimization, which allows producing 71-byte signatures in comparison to the 71.5-byte average used for the estimate.

[Figure: histogram of the sizes of all P2WPKH witnesses as of block 637,302]

Pay-to-Witness-Script-Hash

Pay-to-Witness-Script-Hash (P2WSH) outputs have a fixed size of 43 bytes: eight bytes to encode the value, a one-byte variable-length integer encoding the locking script's size, and the 34-byte locking script.

This analytic estimate is supported by the empirical data shown in the figure below, which contains a histogram of the size of all P2WSH outputs as of block 637,302. As expected, all observed outputs have a size of 43 bytes.

[Figure: histogram of the sizes of all P2WSH outputs as of block 637,302]

P2WSH inputs also have a fixed size. The contributions of a 32-byte transaction id, a 4-byte position, a one-byte variable-length integer, an empty unlocking script, and a 4-byte sequence number result in a total size of 41 bytes.

As before, the estimate is corroborated by empirical data. The figure below contains a histogram of the size of all P2WSH inputs as of block 637,302. The data indicates that all of them have a size of 41 bytes.

[Figure: histogram of the sizes of all P2WSH inputs as of block 637,302]

P2WSH witnesses contain the same data as P2SH-P2WSH witnesses. The size of a P2WSH witness is thus given by the following contributions: a variable-length integer indicating the number of witness items; 1+72.5m bytes of data to satisfy the witness script; the witness script, which contributes 3+34n bytes; and another variable-length integer indicating the size of the witness script.

In the following, the analytic estimate is verified for 1-of-1, 2-of-2, and 2-of-3 P2WSH-multisig witnesses, which together account for more than 98% of all P2WSH-multisig transactions.

For 1-of-1 P2WSH-multisig, m=1 and n=1. The witness script is thus 3+34n=37 bytes, and the variable-length integer encoding the witness script's size requires only one byte. Moreover, the data to satisfy the witness script contributes 1+72.5m=73.5 bytes. The variable-length integer to encode the number of witness items contributes another byte. The total estimate is thus 112.5 bytes.

This estimate is supported by the empirical data shown in the figure below, which contains histograms of the sizes of all witnesses for the P2WSH-multisig variants under consideration. For 1-of-1 P2WSH-multisig, the observed sizes of 112 or 113 bytes match the analytic estimate.

[Figure: histograms of the sizes of all 1-of-1, 2-of-2, and 2-of-3 P2WSH-multisig witnesses]

For 2-of-2 P2WSH-multisig, m=2 and n=2. The witness script is thus 3+34n=71 bytes, and the variable-length integer encoding the witness script's size requires only one byte. Moreover, the data to satisfy the witness script contributes 1+72.5m=146 bytes. The variable-length integer to encode the number of witness items contributes another byte. The total estimate is thus 219 bytes.

For 2-of-3 P2WSH-multisig, m=2 and n=3, so the data's and witness script's contributions are 1+72.5m=146 bytes and 3+34n=105 bytes, respectively. Again, the two variable-length integers indicating the number of witness items and the length of the witness script contribute two bytes, so the overall estimate is 253 bytes.

Both the 2-of-2 and 2-of-3 estimates are also supported by the empirical data shown in the figure above. Note that, as before, in case of 2-of-3 P2WSH-multisig, the bias toward 252 bytes can be explained by the low-r optimization.

Pay-to-Taproot

Pay-to-Taproot (P2TR) outputs have a fixed size of 43 bytes: eight bytes to encode the value, a one-byte variable-length integer encoding the locking script's size, and the 34-byte locking script.

P2TR inputs also have a fixed size. The contributions of a 32-byte transaction id, a 4-byte position, a one-byte variable-length integer, an empty unlocking script, and a 4-byte sequence number result in a total size of 41 bytes.

In the case of a key-path spend, P2TR witnesses have a fixed size, too. The contributions of a one-byte variable-length integer indicating the number of witness items, another one-byte variable-length integer to indicate the length of a Schnorr signature, and a 64-byte Schnorr signature result in an estimate of 66 bytes for the total size of the witness.

In case of script path, P2TR witnesses have a variable size. Given the absence of empirical data, it is impossible to derive the sizes of common use cases. Moreover, absent empirical data, the previous analytic estimates cannot be verified.

Transaction format

So far, the sizes of different input, output, and witness types were discussed. To give estimates for transactions, additional data that is part of transactions needs to be taken into account.

In addition to inputs, outputs, and witnesses, transactions include: a four-byte version; in case any of a transaction's inputs uses witness data, a one-byte Segregated Witness (SegWit) marker and a one-byte SegWit flag; two variable-length integers to indicate the number of inputs and outputs (the number of witnesses is identical to that of inputs, so two variable-length integers are sufficient); and a four-byte lock-time field.

Transaction-size estimates are then given by a fixed eight-byte contribution from the transaction's version and lock time; the size of the variable-length integer indicating the number of inputs and the inputs themselves; and the size of the variable-length integer indicating the number of outputs and the outputs themselves. In case of SegWit transactions, the size of the witnesses and the fixed two-byte contribution of the SegWit marker and flag must be considered as well.
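
Put together, the model amounts to a single sum. The following sketch assumes that the sizes of all inputs, outputs, and witnesses have already been determined as described in the previous sections; transaction_size is an illustrative name and not the libtxsize interface.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode `value` as a variable-length integer."""
    if value <= 252:
        return 1
    if value <= 0xFFFF:
        return 3
    if value <= 0xFFFFFFFF:
        return 5
    return 9


def transaction_size(input_sizes, output_sizes, witness_sizes=None) -> int:
    """Transaction size in bytes.

    `witness_sizes` holds one entry per input, or is None for a
    transaction without witness data.
    """
    size = 4 + 4  # version and lock time
    size += varint_size(len(input_sizes)) + sum(input_sizes)
    size += varint_size(len(output_sizes)) + sum(output_sizes)
    if witness_sizes is not None:
        if len(witness_sizes) != len(input_sizes):
            raise ValueError("one witness per input is required")
        size += 2 + sum(witness_sizes)  # SegWit marker and flag, witnesses
    return size


# One P2PKH input (low-r) and two P2PKH outputs
print(transaction_size([147], [34, 34]))
# One P2WPKH input with a 107-byte witness and two P2WPKH outputs
print(transaction_size([41], [31, 31], [107]))
```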

Automated transaction-size estimates with libtxsize

libtxsize distills all previous findings into a library for automated transaction-size estimates. libtxsize is written in Python and includes simple Python interfaces to get estimates for input, output, and witness sizes, as well as estimates for arbitrary transactions. Moreover, it includes a command-line interface to play around with; estimating a transaction's size is as simple as specifying the desired input and output types:

$ ./libtxsize-cli.py -i P2WPKH,P2SH-1-of-2-multisig -o P2PKH,P2TR
+-------------------------+------------+-------------+-------------+
| Part/Metric             |   size [B] | weight [WU] |  vsize [vB] |
+-------------------------+------------+-------------+-------------+
| INPUTS                  |            |             |             |
| 1. P2WPKH               |         41 |         164 |          41 |
| 2. P2SH-1-of-2-multisig |        186 |         744 |         186 |
+-------------------------+------------+-------------+-------------+
| WITNESSES               |            |             |             |
| 1. P2WPKH               |        107 |         107 |       26.75 |
| 2. P2SH-1-of-2-multisig |        N/A |         N/A |         N/A |
+-------------------------+------------+-------------+-------------+
| OUTPUTS                 |            |             |             |
| 1. P2PKH                |         34 |         136 |          34 |
| 2. P2TR                 |         43 |         172 |          43 |
+-------------------------+------------+-------------+-------------+
| INPUT DATA              |        227 |         908 |         227 |
| WITNESS DATA            |        108 |         108 |        27.0 |
| OUTPUT DATA             |         77 |         308 |          77 |
| TRANSACTION OVERHEAD    |         12 |          42 |        10.5 |
+-------------------------+------------+-------------+-------------+
| TRANSACTION TOTAL       |        424 |        1366 |       341.5 |
+-------------------------+------------+-------------+-------------+

In addition to the overall transaction-size estimate, libtxsize also includes information about the different parts of the transaction, such as individual inputs, outputs, and witnesses, as well as transaction overhead. Moreover, libtxsize includes weight and virtual-size estimates in addition to size estimates.

Note that to make results more relevant, estimates assume 71-byte signature sizes based on the low-r optimization, which was introduced in 2018.
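
The totals in the table above can be cross-checked by hand. The sketch below does not use libtxsize; it assumes the SegWit weight definition, which is not derived in this article: four weight units per non-witness byte and one weight unit per witness byte, with the SegWit marker and flag counted as witness data. It also assumes that the empty witness of the non-SegWit input takes one byte, which accounts for the 108 bytes of witness data.

```python
def weight_and_vsize(non_witness_bytes: int, witness_bytes: int):
    """Return (weight in WU, virtual size in vB) for the given byte counts."""
    weight = 4 * non_witness_bytes + witness_bytes
    return weight, weight / 4


inputs = [41, 186]    # P2WPKH, P2SH-1-of-2-multisig
outputs = [34, 43]    # P2PKH, P2TR
witnesses = [107, 1]  # P2WPKH witness, empty witness of the P2SH input

# Version, input count, inputs, output count, outputs, lock time
non_witness = 4 + 1 + sum(inputs) + 1 + sum(outputs) + 4
# SegWit marker and flag, followed by the witnesses
witness = 2 + sum(witnesses)

size = non_witness + witness
weight, vsize = weight_and_vsize(non_witness, witness)
print(size, weight, vsize)
```

The result matches the TRANSACTION TOTAL row: 424 bytes, 1366 weight units, and 341.5 virtual bytes.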

Results and Conclusion

The sizes of different input, output, and witness types were investigated using first-principles analysis. Based on the results of the analysis, estimates were established for all relevant use cases; moreover, all estimates (with the exception of Pay-to-Taproot) were validated using empirical data.

The following table summarizes these findings for quick reference. It includes input, output, and witness sizes for the most relevant use cases. Note that the estimates are based on an ECDSA-signature size of 71.5 bytes.

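+-----------------------------+------------+--------------+-------------+
| Type                        | input size | witness size | output size |
+-----------------------------+------------+--------------+-------------+
| P2PK                        |    113.5 B |          N/A |        44 B |
| P2PKH                       |    147.5 B |          N/A |        34 B |
| P2WPKH                      |       41 B |      107.5 B |        31 B |
| P2TR (key path)             |       41 B |         66 B |        43 B |
| P2WSH-1-of-1-multisig       |       41 B |      112.5 B |        43 B |
| P2WSH-2-of-2-multisig       |       41 B |        219 B |        43 B |
| P2WSH-2-of-3-multisig       |       41 B |        253 B |        43 B |
| P2SH-2-of-2-multisig        |      259 B |          N/A |        32 B |
| P2SH-2-of-3-multisig        |      296 B |          N/A |        32 B |
| P2SH-P2WSH-2-of-2-multisig  |       76 B |        219 B |        32 B |
| P2SH-P2WSH-2-of-3-multisig  |       76 B |        253 B |        32 B |
| P2SH-P2WPKH                 |       64 B |      107.5 B |        32 B |
| Null Data (20-byte payload) |        N/A |          N/A |        31 B |
| Null Data (80-byte payload) |        N/A |          N/A |        92 B |
| Bare multisig (1-of-2)      |    114.5 B |          N/A |        80 B |
| Bare multisig (1-of-3)      |    114.5 B |          N/A |       114 B |
+-----------------------------+------------+--------------+-------------+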

Moreover, libtxsize, a library that automates transaction-size estimates, was presented.

If you found the information in this article useful, feel free to contribute: 16pGpaoAhzoneLdRdxPSo9xAAPhzWnP2dA. If you have scientific, Bitcoin-related freelance work, let me know.