ADR #003: March 2022 Testnet Celestia Node


Authors

@renaynay @Wondertan

Changelog

  • 2021-11-25: initial draft
  • 2022-03-30: update to bridge node definition

Legend

Celestia DA Network

Refers to the data availability "halo" network created around the Core network.

Bridge Node

A bridge node is a node that is connected to a celestia-core node via RPC. It receives a remote address from a running celestia-core node and listens for new blocks from celestia-core. For each new block, the bridge node performs basic validation on the block via ValidateBasic(), extends the block data, generates a Data Availability Header (DAH) from the extended block data, creates an ExtendedHeader from the block header and the DAH, and finally broadcasts the ExtendedHeader to the data availability (DA) network.

A bridge node does not care what kind of celestia-core node it is connected to (validator or regular full node); it only cares that it has a direct RPC connection to a celestia-core node from which it can listen for new blocks.

The name bridge was chosen because the purpose of this node type is to relay celestia-core blocks to the data availability network.
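To make the flow concrete, here is a minimal sketch of the relay loop in Go. The types and helpers (RawBlock, extendData, makeDAH, broadcast) are hypothetical stand-ins introduced for illustration; they are not the actual celestia-core or celestia-node APIs.

```go
package bridge

import (
	"context"
	"fmt"
)

// Hypothetical stand-ins for celestia-core / celestia-node types; the real
// definitions live in the respective repositories.
type RawHeader struct{ Height int64 }
type RawBlock struct{ Header RawHeader }
type ExtendedData struct{}
type DataAvailabilityHeader struct{}
type ExtendedHeader struct {
	Header RawHeader
	DAH    DataAvailabilityHeader
}

func (b RawBlock) ValidateBasic() error { return nil }

// Placeholder steps; the real erasure coding, DAH generation, and gossiping
// are implemented elsewhere in celestia-node.
func extendData(b RawBlock) ExtendedData                     { return ExtendedData{} }
func makeDAH(d ExtendedData) DataAvailabilityHeader          { return DataAvailabilityHeader{} }
func broadcast(ctx context.Context, eh ExtendedHeader) error { return nil }

// relay is the bridge-node loop: for every block received from the
// celestia-core RPC subscription, validate it, extend the data, build the
// DAH and ExtendedHeader, and broadcast the header to the DA network.
func relay(ctx context.Context, blocks <-chan RawBlock) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case b := <-blocks:
			if err := b.ValidateBasic(); err != nil {
				return fmt.Errorf("invalid block at height %d: %w", b.Header.Height, err)
			}
			dah := makeDAH(extendData(b))
			eh := ExtendedHeader{Header: b.Header, DAH: dah}
			if err := broadcast(ctx, eh); err != nil {
				return err
			}
		}
	}
}
```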

Full Node

A full node is the same as a light node, except that instead of performing LightAvailability (the process of DASing to verify that a header is legitimate), it performs FullAvailability: it downloads enough shares from the network to fully reconstruct the block, stores it, and serves shares to the rest of the network.

Light Node

A light node listens for ExtendedHeaders from the DA network and performs DAS on the received headers.
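The difference between the two modes could be expressed as two implementations of a shared availability interface. This is a hedged sketch under assumed names (Availability, LightAvailability, FullAvailability, Root); the actual interfaces in celestia-node may differ.

```go
package availability

import "context"

// Root is a hypothetical stand-in for the DataAvailabilityHeader of a block.
type Root struct{}

// Availability abstracts how a node convinces itself that the block data
// behind a header is available.
type Availability interface {
	SharesAvailable(context.Context, *Root) error
}

// LightAvailability gains probabilistic confidence by sampling a small number
// of random shares (DAS).
type LightAvailability struct{ /* share getter, sampling parameters */ }

func (la *LightAvailability) SharesAvailable(ctx context.Context, root *Root) error {
	// sample N random shares; fail if any sample cannot be retrieved
	return nil
}

// FullAvailability downloads enough shares to fully reconstruct the block,
// stores it, and serves shares back to the rest of the network.
type FullAvailability struct{ /* share getter, block store */ }

func (fa *FullAvailability) SharesAvailable(ctx context.Context, root *Root) error {
	// retrieve enough of the extended square to reconstruct, then persist it
	return nil
}
```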


Context

This ADR describes a design for the March 2022 Celestia Testnet that we decided on at the Berlin 2021 offsite. Now that we have basic scaffolding and structure for a celestia node, the focus of the next engineering sprint is to continue refactoring and improving this structure to include more features (defined later in this document).


Decision

New Features

  • Introduce a standalone full node and rename the current full node implementation to bridge node.
  • Remove dev as a node type and make it a flag on every available node type.

Introduce bad encoding fraud proofs

Bad encoding fraud proofs will be generated by full nodes inside of ShareService, upon reconstructing a block via the sampling process.

If fraud is detected, the full node will generate the proof, broadcast it to the FraudSub gossip network, and subsequently halt all operations. If no fraud is detected, the full node will continue operating without propagating any messages to the network. Since full nodes reconstruct every block, they do not have to listen to FraudSub, as they perform the necessary encoding checks on every block themselves.

Light nodes, however, will listen to FraudSub for bad encoding fraud proofs. A light node will verify a received fraud proof against the relevant header hash to ensure that the proof is valid. If the fraud proof is valid, the node should immediately halt all operations. If it is invalid, the node continues operating as usual.

Eventually, we may choose to use the reputation tracking system provided by gossipsub for nodes who broadcast invalid fraud proofs to the network, but that is not a requirement for this iteration.
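A rough sketch of both sides of this flow, assuming hypothetical BadEncodingProof and FraudSub types (the real proof format and gossip topic will be defined during implementation):

```go
package fraud

import "context"

// BadEncodingProof is a hypothetical stand-in for the real proof format.
type BadEncodingProof struct{ HeaderHash []byte }

// FraudSub is a hypothetical wrapper around the gossipsub topic for proofs.
type FraudSub interface {
	Broadcast(context.Context, BadEncodingProof) error
	Subscribe() <-chan BadEncodingProof
}

// onReconstruct is called by a full node after reconstructing a block via
// sampling. If the reconstruction revealed bad encoding, the node broadcasts
// a proof and halts; otherwise it continues as usual.
func onReconstruct(ctx context.Context, sub FraudSub, proof *BadEncodingProof, halt func()) error {
	if proof == nil {
		return nil // no fraud detected
	}
	if err := sub.Broadcast(ctx, *proof); err != nil {
		return err
	}
	halt()
	return nil
}

// listen is run by light nodes: verify each received proof against the
// corresponding header and halt if the proof is valid.
func listen(ctx context.Context, sub FraudSub, verify func(BadEncodingProof) bool, halt func()) {
	proofs := sub.Subscribe()
	for {
		select {
		case <-ctx.Done():
			return
		case p := <-proofs:
			if verify(p) {
				halt()
				return
			}
			// invalid proof: ignore and keep operating
		}
	}
}
```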

Implement scaffolding for RPC on all node types, such that a user can access the following methods:

HeaderAPI

  • Header(_height_) -> ExtendedHeader{}
  • Header(_hash_) -> ExtendedHeader{}

NodeAPI

  • P2PInfo() -> returns a blob of p2p info (can be broken into several subcommands, such as net_info)
  • Config() -> returns the node's config
  • NodeType() -> returns the node's type (e.g. full | bridge | light )
  • RPCInfo() -> RPC port, version, available APIs, etc.

UserAPI

  • AccountBalance(_acct_) -> returns balance for given account
  • SubmitTx(_txdata_) -> submits a transaction to the network

Note: it is likely more methods will be added, but the above listed are the essential ones for this iteration.
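These methods could be grouped into Go interfaces roughly as follows. The signatures and placeholder types are illustrative assumptions, not the final RPC surface:

```go
package rpc

import "context"

// Placeholder types standing in for the real celestia-node types.
type (
	ExtendedHeader struct{}
	P2PInfo        struct{}
	Config         struct{}
	RPCInfo        struct{}
	Balance        struct{}
	Tx             []byte
	TxResponse     struct{}
)

type HeaderAPI interface {
	HeaderByHeight(ctx context.Context, height uint64) (*ExtendedHeader, error)
	HeaderByHash(ctx context.Context, hash []byte) (*ExtendedHeader, error)
}

type NodeAPI interface {
	P2PInfo(ctx context.Context) (*P2PInfo, error)
	Config(ctx context.Context) (*Config, error)
	NodeType(ctx context.Context) (string, error) // "bridge" | "full" | "light"
	RPCInfo(ctx context.Context) (*RPCInfo, error)
}

type UserAPI interface {
	AccountBalance(ctx context.Context, acct string) (*Balance, error)
	SubmitTx(ctx context.Context, tx Tx) (*TxResponse, error)
}
```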

Introduce StateService

StateService is responsible for fetching state relevant to a user being able to submit a transaction, such as account balance, preparing the transaction, and propagating it via TxSub. Bridge nodes will be responsible for listening to TxSub and relaying the transactions into the Core mempool. Light and full nodes will be able to publish transactions to TxSub, but do not need to listen for them.

Celestia-node's state interaction will be detailed further in a subsequent ADR.
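A sketch of how these responsibilities might split between node types; the TxSub and Mempool abstractions and method names below are assumptions for illustration:

```go
package state

import "context"

// Hypothetical stand-ins for the real types.
type (
	Tx      []byte
	Mempool interface{ Add(Tx) error }
	TxSub   interface {
		Publish(context.Context, Tx) error
		Subscribe() <-chan Tx
	}
)

// StateService prepares and publishes user transactions (light and full nodes).
type StateService struct {
	txsub TxSub
}

// SubmitTx would fetch the relevant account state, build and sign the
// transaction (omitted here), and publish it to TxSub.
func (s *StateService) SubmitTx(ctx context.Context, tx Tx) error {
	return s.txsub.Publish(ctx, tx)
}

// relayTxs is run only by bridge nodes: listen on TxSub and push transactions
// into the Core mempool.
func relayTxs(ctx context.Context, txsub TxSub, mempool Mempool) {
	txs := txsub.Subscribe()
	for {
		select {
		case <-ctx.Done():
			return
		case tx := <-txs:
			_ = mempool.Add(tx) // error handling omitted in this sketch
		}
	}
}
```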

Currently, both light and full nodes are unable to perform data availability sampling (DAS) while syncing. They only begin sampling once the node is synced up to the head of the chain.

HeaderSync and the DASer will be refactored such that the DASer can perform sampling on past headers while the node is syncing. A possible approach would be for the syncing algorithms in both the DASer and HeaderSync to align such that headers received during sync are propagated to the DASer for sampling via an internal pubsub.

The DASer will maintain a checkpoint of the last sampled header so that it can resume sampling from that checkpoint for any new headers.
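A hedged sketch of the checkpointed sampling loop, with the DASer consuming headers from an internal subscription fed by HeaderSync; the checkpointStore and sampler names are illustrative only:

```go
package das

import "context"

type ExtendedHeader struct{ Height uint64 }

// checkpointStore persists the height of the last sampled header.
type checkpointStore interface {
	Load(context.Context) (uint64, error)
	Save(context.Context, uint64) error
}

// sampler performs DAS over a single header.
type sampler interface {
	Sample(context.Context, *ExtendedHeader) error
}

// run samples every header received from the internal subscription (including
// past headers delivered during sync) and persists a checkpoint after each one,
// so sampling can resume from the checkpoint after a restart.
func run(ctx context.Context, headers <-chan *ExtendedHeader, s sampler, cp checkpointStore) error {
	last, err := cp.Load(ctx)
	if err != nil {
		return err
	}
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case h := <-headers:
			if h.Height <= last {
				continue // already sampled
			}
			if err := s.Sample(ctx, h); err != nil {
				return err
			}
			last = h.Height
			if err := cp.Save(ctx, last); err != nil {
				return err
			}
		}
	}
}
```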


Refactoring

HeaderService becomes the main component around which most other services are focused

Initially, we started with BlockService as the more “important” component in the devnet architecture, but we overlooked some problems with regard to sync (we initially decided that a celestia full node would have to be started at the same time as a core node).

This led to an issue where we eventually needed to connect to an already-running core node and sync from it. We were missing a component to do that, so we implemented HeaderExchange over the core client (wrapping another interface we had previously created for BlockService, called BlockFetcher). Because this had to be done at the last minute, it led to stopgap solutions, such as handing both the celestia light and full node a “trusted” hash of a header from the already-running chain so that they could sync from that point and start listening for new headers.

Right now, the BlockService is in charge of fetching new blocks from the core node, erasure coding them, generating the DAH, generating the ExtendedHeader, broadcasting the ExtendedHeader to the HeaderSub network, and storing the block data (after some validation checks).

Instead, a full node will rely on ShareService sampling to fetch enough shares to reconstruct the block inside of BlockService. A bridge node, by contrast, will not do block reconstruction via sampling, but will instead rely on the header.CoreSubscriber implementation of header.Subscriber for blocks. header.CoreSubscriber will handle listening for new block events from the core node via RPC, erasure code the new block, generate the ExtendedHeader, and pipe the erasure-coded block through to BlockService via an internal subscription.
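A sketch of how header.CoreSubscriber could satisfy a header.Subscriber interface while piping extended blocks to BlockService; the field names, channel shapes, and the omitted erasure coding step are assumptions made for illustration:

```go
package header

// Hypothetical stand-ins; the real header.Subscriber interface and
// ExtendedHeader type are defined in celestia-node.
type (
	ExtendedHeader struct{}
	ExtendedBlock  struct{ Header ExtendedHeader }
)

// Subscriber delivers new ExtendedHeaders to the rest of the node.
type Subscriber interface {
	Subscribe() (<-chan *ExtendedHeader, error)
}

// CoreSubscriber implements Subscriber for bridge nodes: it listens for new
// block events from the core node over RPC, erasure codes the block and
// builds the ExtendedHeader (both omitted here), and pipes the extended block
// to BlockService via an internal subscription instead of reconstructing the
// block from samples.
type CoreSubscriber struct {
	coreBlocks <-chan ExtendedBlock // fed by the core RPC subscription
	toBlockSvc chan<- ExtendedBlock // internal subscription consumed by BlockService
	headers    chan *ExtendedHeader
}

func (cs *CoreSubscriber) Subscribe() (<-chan *ExtendedHeader, error) {
	go func() {
		for b := range cs.coreBlocks {
			b := b // copy, so the pointer below stays valid
			cs.toBlockSvc <- b
			cs.headers <- &b.Header
		}
	}()
	return cs.headers, nil
}

var _ Subscriber = (*CoreSubscriber)(nil)
```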

HeaderSync optimizations

  • Implement disconnect toleration

Unbonding period handling

The light and full nodes are currently prone to long-range attacks. To mitigate this, we should introduce an additional trustPeriod variable (equal to the unbonding period) that applies to headers. If a node starts up and the period between its subjective head and the objective head exceeds the unbonding period, the light node must no longer trust the subjective head, specifically its ValidatorSet. Therefore, instead of syncing subsequent headers on top of the untrusted subjective head, the node should request a new objective head from the trustedPeer and set it as the new trusted subjective head. This approach follows the Tendermint model for light client attack detection.
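A minimal sketch of the trust-period check, assuming the timestamp of the subjective head is available; the function name is illustrative:

```go
package sync

import "time"

// outsideTrustPeriod reports whether the locally stored subjective head is too
// old to be trusted, i.e. whether the time since it was produced exceeds the
// trust period (equal to the unbonding period). When it returns true, the node
// must request a fresh objective head from its trusted peer instead of syncing
// on top of the stale subjective head.
func outsideTrustPeriod(subjectiveHeadTime time.Time, trustPeriod time.Duration, now time.Time) bool {
	return now.Sub(subjectiveHeadTime) > trustPeriod
}
```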


Nice to have

ShareService optimizations

  • Implement parallelization for retrieving shares by namespace. This issue is already being worked on.
  • NMT/Shares/Namespace storage optimizations:
    • Right now, we prepend 17 additional bytes to each Share. Luckily, for each reason the prepended bytes were added, there is an alternative solution: the NMT node type can be determined indirectly, without serializing the type itself, by looking at the number of links; and instead of encoding namespaces into the erasured data itself, the namespace for each share can be recovered from the inner (non-leaf) nodes of the NMT tree (see the sketch after this list).
  • Pruning for shares.
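A sketch of the link-count trick mentioned above; the exact link counts (two for inner nodes, zero for leaves) are an assumption of this illustration rather than a confirmed property of the IPLD encoding:

```go
package nmt

// nodeType infers whether an NMT node is an inner node or a leaf from the
// number of IPLD links it carries, instead of serializing an explicit type
// byte into the node. Assumption in this sketch: inner nodes link to two
// children, leaves link to none.
type nodeType int

const (
	leafNode nodeType = iota
	innerNode
)

func classify(numLinks int) nodeType {
	if numLinks == 2 {
		return innerNode
	}
	return leafNode
}
```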

Since the IPLD package is almost entirely separate from the celestia-node implementation, it makes sense to remove it from the celestia-node repository and maintain it separately. The extraction of IPLD should also include a review and refactoring, as there are still some legacy components that are no longer necessary, and the documentation also needs updating.

Implement additional light node verification logic similar to the Tendermint Light Client Model

At the moment, the syncing logic for a light node is simple in that it syncs each header from a single peer. Instead, the light node should double-check headers with a randomly chosen "witness" peer, other than the primary peer from which it received the header, as described in the light client attack detector model from Tendermint.
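A minimal sketch of the witness cross-check, with hypothetical peer and ExtendedHeader types; the real detector logic (bisection, evidence submission) is more involved:

```go
package light

import (
	"bytes"
	"context"
	"fmt"
)

// Hypothetical stand-ins for the real types and peer abstraction.
type ExtendedHeader struct{ Hash []byte }

type peer interface {
	GetHeader(ctx context.Context, height uint64) (*ExtendedHeader, error)
}

// crossCheck fetches the header at the given height from a randomly chosen
// witness peer and compares it against the header received from the primary
// peer, following the Tendermint light-client attack detector. A mismatch
// indicates a possible attack and should trigger further detection or halting
// logic (not shown here).
func crossCheck(ctx context.Context, primary *ExtendedHeader, height uint64, witness peer) error {
	wh, err := witness.GetHeader(ctx, height)
	if err != nil {
		return err
	}
	if !bytes.Equal(primary.Hash, wh.Hash) {
		return fmt.Errorf("header mismatch at height %d: possible light client attack", height)
	}
	return nil
}
```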