My Summer of Bitcoin 2022 Experience


What is Summer of Bitcoin?

Summer of Bitcoin is a global, online summer internship program focused on introducing university students to Bitcoin open-source development and design.

When I started writing my proposal, there was only one week left before the deadline. Luckily, my experience with OSS-Fuzz and my efforts paid off. I am proud to be one of the 83 students participating in Summer of Bitcoin 2022, and one of the 5 students contributing to Bitcoin Core under the guidance of Marco Falke.

Onboarding to Bitcoin Core

After a Jitsi meeting with my mentor, I received some resources to get started.

To compile the fuzzer with lcov enabled:

fuzzopt="--enable-fuzz --with-sanitizers=fuzzer"
lcovopt="--enable-lcov --enable-lcov-branch-coverage"
CC=clang CXX=clang++ ./configure --enable-debug --with-gui=no $fuzzopt $lcovopt
make -j$(nproc)

To run a fuzz target and generate a coverage report:

# put seed corpus in qa-assets/fuzz_seed_corpus
FUZZ=$TARGET ./src/test/fuzz/fuzz
# find . -name "*.gcda" -type f -delete
make cov_fuzz  # generate html report under fuzz.coverage

Fuzzing Orphan Transactions

I gave a 5-minute presentation to my classmates to illustrate what an orphan transaction is and why it can cause problems.

I also wrote an article about Bitcoin’s fuzzing infrastructure, which has been posted on Summer of Bitcoin’s official blog.

The code that handles orphan transactions lives in src/txorphanage.cpp, whose code coverage was about 65% when I started.

Specifically, 3 of its 8 functions were not covered yet. They are listed below.

void TxOrphanage::AddChildrenToWorkSet(const CTransaction& tx, std::set<uint256>& orphan_work_set) const
std::pair<CTransactionRef, NodeId> TxOrphanage::GetTx(const uint256& txid) const
void TxOrphanage::EraseForBlock(const CBlock& block)

I found that AddChildrenToWorkSet is called only when a new transaction has been validated. It then looks for orphans that spend one of that transaction's outputs, that is, orphans that use it as a prevout. These orphans are added to the work set and are processed before the next non-orphan transaction is processed, which is where GetTx is called.
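The work-set mechanism can be illustrated with a small toy model. This is only a sketch in Python, not Bitcoin Core's C++ code: the `Tx` and `ToyOrphanage` classes and the `by_prevout` index are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Toy transaction: an id, the outpoints it spends, and its output count."""
    txid: str
    inputs: tuple  # tuple of (parent_txid, output_index) pairs
    n_outputs: int = 1


class ToyOrphanage:
    """Hypothetical stand-in for an orphanage; not the real TxOrphanage."""

    def __init__(self):
        self.orphans = {}     # txid -> Tx
        self.by_prevout = {}  # (txid, index) -> set of orphan txids spending it

    def add_tx(self, tx):
        if tx.txid in self.orphans:
            return False
        self.orphans[tx.txid] = tx
        for prevout in tx.inputs:
            self.by_prevout.setdefault(prevout, set()).add(tx.txid)
        return True

    def add_children_to_work_set(self, tx, work_set):
        """Queue every orphan that spends one of tx's outputs."""
        for index in range(tx.n_outputs):
            work_set.update(self.by_prevout.get((tx.txid, index), ()))

    def get_tx(self, txid):
        return self.orphans.get(txid)


parent = Tx("parent", inputs=(("coinbase", 0),), n_outputs=2)
child = Tx("child", inputs=(("parent", 1),))
unrelated = Tx("unrelated", inputs=(("other", 0),))

orphanage = ToyOrphanage()
orphanage.add_tx(child)
orphanage.add_tx(unrelated)

# Once the parent is validated, only the orphan spending it is queued.
work_set = set()
orphanage.add_children_to_work_set(parent, work_set)
print(sorted(work_set))  # ['child']
```

Indexing orphans by the outpoints they spend is what makes the lookup cheap: the newly validated transaction only needs one dictionary lookup per output.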

The low-level fuzz target

After learning a bit about transaction relay on the Bitcoin P2P network, I first tried to base the orphanage fuzz target on the existing process_message harness. However, I found it difficult to simulate the orphan and un-orphan process by consuming a random data stream, so fine-grained construction of transactions was needed.

My mentor kindly suggested that I take a look at the tx_pool_standard target, which constructs well-defined transactions. After some research, I was still confused about the setup process, such as PeerManager, the mempool, and transaction validation. Again, my mentor gave me some hints:

For orphan handling we don’t care about the layout of the transaction itself.

The goal of fuzzing low level (on a “unit test”/function level) is obviously finer control and mocking capabilities, however it may be less relevant to “real world” program execution. Thus, it may also be interesting to fuzz on a higher level of, let’s say PeerManager.

I think if you want to implement the low level, you wouldn’t need a mempool, just transactions and the txorphanage.

If you want to implement the higher level, you will likely need a similar module setup to the process_messages fuzz target. However, the transaction creation part would likely be the same for both fuzz targets.

Implementing a fuzz target for just the functions in txorphanage should indeed be simpler.

I think you can just remove the chainstate/mempool stuff from the tx_pool_standard and then call into txorphanage wherever it called into mempool. (After all, txorphanage is just like the mempool a data structure to store transactions).

The tx-creation function you can leave as-is or simplify a bit, as most parts aren’t needed.
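To make the idea concrete, here is a rough sketch of what such a low-level harness loop might look like, written in Python rather than the C++ used by Bitcoin Core. Everything in it is an assumption for illustration: `ByteFeeder` is a hypothetical stand-in for a fuzzed data provider, `run_one_input` is a hypothetical entry point, and a plain dictionary plays the role of the orphanage.

```python
class ByteFeeder:
    """Hypothetical byte consumer; yields 0 once the input is exhausted."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def remaining(self):
        return max(0, len(self.data) - self.pos)

    def next_int(self, upper):
        """Consume one byte and map it to an integer in [0, upper]."""
        byte = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return byte % (upper + 1)


def run_one_input(data):
    """Drive a toy orphanage (a dict) with transactions built from fuzz bytes."""
    feeder = ByteFeeder(data)
    outpoints = [("genesis", 0)]  # outpoints that a new transaction may spend
    orphans = {}                  # txid -> list of outpoints it spends
    counter = 0
    while feeder.remaining() > 0:
        # Transaction creation: the layout does not matter, only the links.
        n_in = 1 + feeder.next_int(2)
        inputs = [outpoints[feeder.next_int(len(outpoints) - 1)] for _ in range(n_in)]
        n_out = 1 + feeder.next_int(2)
        txid = f"tx{counter}"
        counter += 1
        outpoints.extend((txid, i) for i in range(n_out))

        action = feeder.next_int(2)
        if action == 0:
            orphans[txid] = inputs                      # roughly AddTx
        elif action == 1 and orphans:
            victim = list(orphans)[feeder.next_int(len(orphans) - 1)]
            del orphans[victim]                         # roughly EraseTx
        elif action == 2:
            spent = set(inputs)                         # roughly EraseForBlock
            for other in [o for o, ins in orphans.items() if spent & set(ins)]:
                del orphans[other]

        # Invariant checked after every step: orphans only spend known outpoints.
        known = set(outpoints)
        assert all(i in known for ins in orphans.values() for i in ins)
    return orphans


print(run_one_input(bytes([0, 0, 0, 0])))  # {'tx0': [('genesis', 0)]}
```

No mempool or chainstate is needed here: the harness only creates linked transactions and calls into the data structure under test, which is the point of the low-level approach.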

After a few clumsy commits, I opened PR #25447 to add a low-level fuzz target for txorphanage, which was merged after a lot of review and discussion.

The bug in the fuzz target itself

Soon after the txorphan fuzz target was merged, my mentor told me that there was a bug in the fuzz target itself.

I quickly found that it was caused by a wrong assertion. AddTx returns false when a transaction's weight exceeds the maximum limit. I had assumed the generated transactions would never be that heavy, so I simply asserted that AddTx returns true whenever the transaction is not already in m_orphans.
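The mistake is easy to reproduce in a toy model. The Python sketch below is not the actual fuzz target: `ToyOrphanage`, `check_add`, and the `MAX_WEIGHT` value are hypothetical and chosen only to illustrate the flawed and the corrected expectation.

```python
MAX_WEIGHT = 400_000  # illustrative limit; an assumption for this sketch


class ToyOrphanage:
    """Hypothetical stand-in that rejects duplicates and oversized transactions."""

    def __init__(self):
        self.orphans = {}  # txid -> weight

    def have_tx(self, txid):
        return txid in self.orphans

    def add_tx(self, txid, weight):
        """Return True only if the transaction was newly stored."""
        if txid in self.orphans or weight > MAX_WEIGHT:
            return False
        self.orphans[txid] = weight
        return True


def check_add(orphanage, txid, weight):
    """One harness step with the corrected expectation."""
    had_before = orphanage.have_tx(txid)
    added = orphanage.add_tx(txid, weight)
    # The flawed check was: assert added == (not had_before)
    # It fails for an oversized transaction that was not stored before.
    if weight <= MAX_WEIGHT:
        assert added == (not had_before)
    else:
        assert not added
    assert orphanage.have_tx(txid) == (had_before or added)
    return added


orphanage = ToyOrphanage()
print(check_add(orphanage, "a", 1_000))             # True
print(check_add(orphanage, "a", 1_000))             # False (duplicate)
print(check_add(orphanage, "big", MAX_WEIGHT + 1))  # False (too heavy)
```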

I submitted a fix, and it got merged.

Another bug did not appear until a week later:

Interestingly, it looks like oss-fuzz found another issue, but didn’t report it until now:

There were 3 orphans, and all of them were erased due to expiration. nErased is used to count them, but the function returned only nEvicted, which was 0 because evictions are counted only in the last loop.

I thought this behavior was confusing and that nErased should be added to nEvicted so that the function returns the total number of removed transactions, but my mentor argued that returning void is enough. In the end, a PR was merged that makes a small refactoring and fixes the error.

LimitOrphans() removes expired transactions from the orphanage and additionally evicts transactions at random should the limit still be exceeded after removing expired transactions. Before this PR, the number of evicted transactions (excluding the number of expired transactions) was returned, which led to hitting an assertion in the fuzz tests that assumed the return value to be the total number of removed transactions (evicted + expired).
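The two-phase behavior can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the real LimitOrphans(): the function signature, the `orphans` dictionary mapping txid to expiry time, and the returned pair are all hypothetical.

```python
import random


def limit_orphans(orphans, now, max_orphans, rng):
    """Remove expired entries, then evict at random until the limit holds.

    `orphans` maps txid -> expiry time. Returns (n_expired, n_evicted).
    """
    expired = [txid for txid, expiry in orphans.items() if expiry <= now]
    for txid in expired:
        del orphans[txid]
    n_evicted = 0
    while len(orphans) > max_orphans:
        victim = rng.choice(sorted(orphans))
        del orphans[victim]
        n_evicted += 1
    return len(expired), n_evicted


# Three orphans, all already expired: nothing is left for the eviction loop.
orphans = {"a": 10, "b": 20, "c": 30}
size_before = len(orphans)
n_expired, n_evicted = limit_orphans(orphans, now=100, max_orphans=1, rng=random.Random(0))
print(n_expired, n_evicted)  # 3 0

# A caller that compares the size change against n_evicted alone would fail;
# the total number of removed entries is the sum of both counters.
assert size_before - len(orphans) == n_expired + n_evicted
```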

Package Relay

I am actually thinking that we could look into fuzzing package relay, as package relay will potentially replace txorphans.

So my basic idea would be to get a nice function to create a “package” of transactions. This primitive will likely be needed by both high-level orphan acceptance handling, as well as mempool package acceptance handling.
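As a rough illustration of such a primitive, the Python sketch below builds a small group of linked toy transactions. It assumes that parents must appear before their children within a package; that ordering rule, along with the `make_package` and `is_topologically_sorted` helpers, is an assumption of this sketch and not taken from the draft PR.

```python
def make_package(parent_choices):
    """Build a toy package: transaction i may spend only earlier transactions.

    parent_choices[i] lists indices of earlier transactions that transaction i
    spends; out-of-range or forward references are silently dropped.
    """
    package = []
    for i, choices in enumerate(parent_choices):
        parents = sorted({p for p in choices if 0 <= p < i})
        package.append({"txid": f"tx{i}", "parents": [f"tx{p}" for p in parents]})
    return package


def is_topologically_sorted(package):
    """True if every in-package parent appears before its child."""
    in_package = {tx["txid"] for tx in package}
    seen = set()
    for tx in package:
        if any(p in in_package and p not in seen for p in tx["parents"]):
            return False
        seen.add(tx["txid"])
    return True


package = make_package([[], [0], [0, 1, 5]])
print([tx["parents"] for tx in package])  # [[], ['tx0'], ['tx0', 'tx1']]
print(is_topologically_sorted(package))   # True
```

In a fuzz target, the index lists would come from the fuzz input, so the same helper could feed both an orphan-handling harness and a package-acceptance harness.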

The definition of a package:

I have opened a draft PR to test package processing. It currently has some faults. My mentor and I are not very familiar with package processing, so we are still trying to figure out the root cause.


Week 1

Reading material:

bitcoin-paper-errata-and-details describes known problems in Satoshi Nakamoto’s whitepaper and terminology changes in Bitcoin’s implementation.

Bitcoin’s Academic Pedigree is a complete survey tracing the origins of the key ideas that Nakamoto applied to Bitcoin. By reading this, we can zero in on Nakamoto’s true leap of insight: the specific, complex way in which the underlying components are put together.

After reviewing The Incomplete History of Bitcoin Development, I admire the way Nakamoto and his collaborators developed Bitcoin sustainably. If I’d Known What We Were Starting emphasizes the trustless nature of Bitcoin, which is forgotten by many short-lived altcoins.

Moreover, I have learned a lot about Bitcoin’s security model by diving deep into the assumptions and guarantees.

Terms I have learned:

  • full node, pruned node, SPV node
  • CoinJoin
  • Sybil attack
  • selfish mining attack
  • checkpoints

Scaling Bitcoin: A trip to the moon requires a rocket with multiple stages describes the limitation of Bitcoin’s capacity and scalability, which makes it inefficient for payment. Bitcoin’s primary distinguishing values are monetary sovereignty, censorship resistance, trust cost minimization, international accessibility/borderless operation, etc. So it doesn’t need to compete with Visa/Mastercard to succeed. The potential of scalable transactions may be achieved by upper layers such as Lightning Network.

Question: Do you believe that bitcoin needs to be competitive with Visa/Mastercard to succeed?

The article has explained the reasons why Bitcoin itself is not going to compete with Visa/Mastercard from a technical perspective.

From a sociological perspective, Bitcoin cannot replace traditional Visa/Mastercard because of its decentralized nature. The real world runs in a centralized way, where banks are supervised by governments. Even Bitcoin itself is mostly traded on centralized cryptocurrency exchanges such as Binance, which need approval from regulators to operate. It would be suicide for Bitcoin to become more centralized in order to beat Visa/Mastercard.

To conclude, I think it is difficult for Bitcoin to compete with Visa/Mastercard. But it does not have to be used for daily transactions to succeed. It can be a kind of investment just like gold.

The answer from my partner:

In my view, bitcoin is a digital currency, a store-of-value (incorruptible store of value), and a final settlement payment system (if you would like to see it in that way), so it does not compete with Visa or Mastercard, as they mainly focus on selling credits for its users.

The bitcoin network needs to focus on its values: sovereignty, privacy, censorship resistance, and others, instead of trying to focus on TPS metrics such as other payment systems which are easily built and replicated in a centralized fashion. If the main values are maintained and persisted we get sound money, and sound infrastructure and not in competing with legacy payment systems.

I agree with the text author in such a way that the bitcoin because of its core values is the base layer for other ideas, and payment focused systems/layers (such as the lightning network) can be built on, for example, the internet has the same layered structure, as application, transport, network layers, for bitcoin it’s the network layer and with a base protocol set that turns possible to build other layers like transport for lightning and even application layers such as the third-party centralized apps.

All that said, bitcoin does not compete with payment or credit systems, it competes with systems that could be a solid base layer, sound money, and store of value, and all the rest can be relied on.

Week 2

Reading material:

How does SegWit affect initial block download (IBD)?

As described in Segregated Witness Costs and Risks, SegWit allows larger blocks, which means the transaction and storage cost will be higher.
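The "larger blocks" claim follows from the block weight rule in BIP 141, where non-witness bytes count four times and witness bytes count once against a limit of 4,000,000 weight units. A small worked example in Python (the byte counts below are made up for illustration):

```python
MAX_BLOCK_WEIGHT = 4_000_000  # weight units, per BIP 141


def block_weight(base_size, witness_size):
    """BIP 141 weight: 3 * base size + total size (base + witness)."""
    total_size = base_size + witness_size
    return 3 * base_size + total_size


# Without witness data, the limit is reached at 1,000,000 serialized bytes.
print(block_weight(1_000_000, 0))        # 4000000
# With much of the data in witnesses, a 2,200,000-byte block still fits.
print(block_weight(600_000, 1_600_000))  # 4000000
print(600_000 + 1_600_000)               # 2200000
```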

However, I found that SegWit can actually help speed up IBD by allowing nodes to skip downloading historic signatures, as listed under IBD performance improvements in the Bitcoin Core 0.14.0 release notes:

Release   Major improvements in IBD
0.5.0     Skip verification of historic (checkpointed) signatures
0.8.0     Switch to LevelDB & parallel signature validation
0.10.0    Headers-first sync and parallel block download
0.11.0    Optional block file pruning to save disk space
0.12.0    New fast signature validation library written from scratch (libsecp256k1)
0.13.1    Segwit to allow skipping download of historic signatures in the future

The details can be found in the Bitcoin Core 0.13.1 release notes:

  • More efficient almost-full-node security: Satoshi Nakamoto’s original Bitcoin paper describes a method for allowing newly-started full nodes to skip downloading and validating some data from historic blocks that are protected by large amounts of proof of work. Unfortunately, Nakamoto’s method can’t guarantee that a newly-started node using this method will produce an accurate copy of Bitcoin’s current ledger (called the UTXO set), making the node vulnerable to falling out of consensus with other nodes. Although the problems with Nakamoto’s method can’t be fixed in a soft fork, Segwit accomplishes something similar to his original proposal: it makes it possible for a node to optionally skip downloading some blockchain data (specifically, the segregated witnesses) while still ensuring that the node can build an accurate copy of the UTXO set for the block chain with the most proof of work. Segwit enables this capability at the consensus layer, but note that Bitcoin Core does not provide an option to use this capability as of this 0.13.1 release.

Week 3

Reading material:

Can you ensure the transaction will be processed even if you send it with low fees? Which mechanisms do you have to ensure a stuck transaction (due to low fees) gets processed?

Week 4

Reading material:

What is the rationale behind the “new”/“tried” table design? Were there any prior inspirations within the field of distributed computing?
