ADR #003: March 2022 Testnet Celestia Node


Authors

@renaynay @Wondertan

Changelog

  • 2021-11-25: initial draft
  • 2022-03-30: update to bridge node definition

Legend

Celestia DA Network

Refers to the data availability "halo" network created around the Core network.

Bridge Node

A bridge node is a node that is connected to a celestia-core node via RPC. It receives a remote address from a running celestia-core node and listens for new blocks from celestia-core. For each new block, the bridge node performs basic validation on the block via ValidateBasic(), extends the block data, generates a Data Availability Header (DAH) from the extended block data, creates an ExtendedHeader from the block header and the DAH, and finally broadcasts the ExtendedHeader to the data availability (DA) network.

A bridge node does not care what kind of celestia-core node it is connected to (validator or regular full node); it only cares that it has a direct RPC connection to a celestia-core node from which it can listen for new blocks.

The name bridge was chosen because the purpose of this node type is to relay celestia-core blocks to the data availability network.
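To make the flow concrete, here is a minimal sketch of the relay loop in Go. The types and helpers (RawBlock, extendData, makeDAH, broadcast) are hypothetical stand-ins introduced for illustration; they are not the actual celestia-core or celestia-node APIs.

```go
package bridge

import (
	"context"
	"fmt"
)

// Hypothetical stand-ins for celestia-core / celestia-node types; the real
// definitions live in the respective repositories.
type RawHeader struct{ Height int64 }
type RawBlock struct{ Header RawHeader }
type ExtendedData struct{}
type DataAvailabilityHeader struct{}
type ExtendedHeader struct {
	Header RawHeader
	DAH    DataAvailabilityHeader
}

func (b RawBlock) ValidateBasic() error { return nil }

// Placeholder steps; the real erasure coding, DAH generation, and gossiping
// are implemented elsewhere in celestia-node.
func extendData(b RawBlock) ExtendedData                     { return ExtendedData{} }
func makeDAH(d ExtendedData) DataAvailabilityHeader          { return DataAvailabilityHeader{} }
func broadcast(ctx context.Context, eh ExtendedHeader) error { return nil }

// relay is the bridge-node loop: for every block received from the
// celestia-core RPC subscription, validate it, extend the data, build the
// DAH and ExtendedHeader, and broadcast the header to the DA network.
func relay(ctx context.Context, blocks <-chan RawBlock) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case b := <-blocks:
			if err := b.ValidateBasic(); err != nil {
				return fmt.Errorf("invalid block at height %d: %w", b.Header.Height, err)
			}
			dah := makeDAH(extendData(b))
			eh := ExtendedHeader{Header: b.Header, DAH: dah}
			if err := broadcast(ctx, eh); err != nil {
				return err
			}
		}
	}
}
```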

Full Node

A full node is the same as a light node, except that instead of performing LightAvailability (the process of DASing to verify that a header is legitimate), it performs FullAvailability: it downloads enough shares from the network to fully reconstruct the block, stores it, and serves shares to the rest of the network.

Light Node

A light node listens for ExtendedHeaders from the DA network and performs DAS on the received headers.
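The difference between the two modes could be expressed as two implementations of a shared availability interface. This is a hedged sketch under assumed names (Availability, LightAvailability, FullAvailability, Root); the actual interfaces in celestia-node may differ.

```go
package availability

import "context"

// Root is a hypothetical stand-in for the DataAvailabilityHeader of a block.
type Root struct{}

// Availability abstracts how a node convinces itself that the block data
// behind a header is available.
type Availability interface {
	SharesAvailable(context.Context, *Root) error
}

// LightAvailability gains probabilistic confidence by sampling a small number
// of random shares (DAS).
type LightAvailability struct{ /* share getter, sampling parameters */ }

func (la *LightAvailability) SharesAvailable(ctx context.Context, root *Root) error {
	// sample N random shares; fail if any sample cannot be retrieved
	return nil
}

// FullAvailability downloads enough shares to fully reconstruct the block,
// stores it, and serves shares back to the rest of the network.
type FullAvailability struct{ /* share getter, block store */ }

func (fa *FullAvailability) SharesAvailable(ctx context.Context, root *Root) error {
	// retrieve enough of the extended square to reconstruct, then persist it
	return nil
}
```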


Context

This ADR describes a design for the March 2022 Celestia Testnet that we decided on at the Berlin 2021 offsite. Now that we have basic scaffolding and structure for a celestia node, the focus of the next engineering sprint is to continue refactoring and improving this structure to include more features (defined later in this document).


Decision

New Features

  • Introduce a standalone full node and rename the current full node implementation to bridge node.
  • Remove dev as a node type and make it a flag on every available node type.

Introduce bad encoding fraud proofs

Bad encoding fraud proofs will be generated by full nodes inside of ShareService, upon reconstructing a block via the sampling process.

If fraud is detected, the full node will generate the proof, broadcast it to the FraudSub gossip network, and subsequently halt all operations. If no fraud is detected, the full node will continue operating without propagating any messages to the network. Since full nodes reconstruct every block, they do not have to listen to FraudSub, as they perform the necessary encoding checks on every block themselves.

Light nodes, however, will listen to FraudSub for bad encoding fraud proofs. A light node will verify a received fraud proof against the relevant header hash to ensure that the proof is valid. If the fraud proof is valid, the node should immediately halt all operations. If it is invalid, the node continues operating as usual.

Eventually, we may choose to use the reputation tracking system provided by gossipsub for nodes who broadcast invalid fraud proofs to the network, but that is not a requirement for this iteration.
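A rough sketch of both sides of this flow, assuming hypothetical BadEncodingProof and FraudSub types (the real proof format and gossip topic will be defined during implementation):

```go
package fraud

import "context"

// BadEncodingProof is a hypothetical stand-in for the real proof format.
type BadEncodingProof struct{ HeaderHash []byte }

// FraudSub is a hypothetical wrapper around the gossipsub topic for proofs.
type FraudSub interface {
	Broadcast(context.Context, BadEncodingProof) error
	Subscribe() <-chan BadEncodingProof
}

// onReconstruct is called by a full node after reconstructing a block via
// sampling. If the reconstruction revealed bad encoding, the node broadcasts
// a proof and halts; otherwise it continues as usual.
func onReconstruct(ctx context.Context, sub FraudSub, proof *BadEncodingProof, halt func()) error {
	if proof == nil {
		return nil // no fraud detected
	}
	if err := sub.Broadcast(ctx, *proof); err != nil {
		return err
	}
	halt()
	return nil
}

// listen is run by light nodes: verify each received proof against the
// corresponding header and halt if the proof is valid.
func listen(ctx context.Context, sub FraudSub, verify func(BadEncodingProof) bool, halt func()) {
	proofs := sub.Subscribe()
	for {
		select {
		case <-ctx.Done():
			return
		case p := <-proofs:
			if verify(p) {
				halt()
				return
			}
			// invalid proof: ignore and keep operating
		}
	}
}
```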

Implement scaffolding for RPC on all node types, such that a user can access the following methods:

HeaderAPI

  • Header(_height_) -> ExtendedHeader{}
  • Header(_hash_) -> ExtendedHeader{}

NodeAPI

  • P2PInfo() -> returns a blob of p2p info (can be broken into several subcommands, such as net_info)
  • Config() -> returns the node's config
  • NodeType() -> returns the node's type (e.g. full | bridge | light )
  • RPCInfo() -> RPC port, version, available APIs, etc.

UserAPI

  • AccountBalance(_acct_) -> returns balance for given account
  • SubmitTx(_txdata_) -> submits a transaction to the network

Note: it is likely more methods will be added, but the above listed are the essential ones for this iteration.
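These methods could be grouped into Go interfaces roughly as follows. The signatures and placeholder types are illustrative assumptions, not the final RPC surface:

```go
package rpc

import "context"

// Placeholder types standing in for the real celestia-node types.
type (
	ExtendedHeader struct{}
	P2PInfo        struct{}
	Config         struct{}
	RPCInfo        struct{}
	Balance        struct{}
	Tx             []byte
	TxResponse     struct{}
)

type HeaderAPI interface {
	HeaderByHeight(ctx context.Context, height uint64) (*ExtendedHeader, error)
	HeaderByHash(ctx context.Context, hash []byte) (*ExtendedHeader, error)
}

type NodeAPI interface {
	P2PInfo(ctx context.Context) (*P2PInfo, error)
	Config(ctx context.Context) (*Config, error)
	NodeType(ctx context.Context) (string, error) // "bridge" | "full" | "light"
	RPCInfo(ctx context.Context) (*RPCInfo, error)
}

type UserAPI interface {
	AccountBalance(ctx context.Context, acct string) (*Balance, error)
	SubmitTx(ctx context.Context, tx Tx) (*TxResponse, error)
}
```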

Introduce StateService

StateService is responsible for fetching state relevant to a user being able to submit a transaction, such as account balance, preparing the transaction, and propagating it via TxSub. Bridge nodes will be responsible for listening to TxSub and relaying the transactions into the Core mempool. Light and full nodes will be able to publish transactions to TxSub, but do not need to listen for them.

Celestia-node's state interaction will be detailed further in a subsequent ADR.
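A sketch of how these responsibilities might split between node types; the TxSub and Mempool abstractions and method names below are assumptions for illustration:

```go
package state

import "context"

// Hypothetical stand-ins for the real types.
type (
	Tx      []byte
	Mempool interface{ Add(Tx) error }
	TxSub   interface {
		Publish(context.Context, Tx) error
		Subscribe() <-chan Tx
	}
)

// StateService prepares and publishes user transactions (light and full nodes).
type StateService struct {
	txsub TxSub
}

// SubmitTx would fetch the relevant account state, build and sign the
// transaction (omitted here), and publish it to TxSub.
func (s *StateService) SubmitTx(ctx context.Context, tx Tx) error {
	return s.txsub.Publish(ctx, tx)
}

// relayTxs is run only by bridge nodes: listen on TxSub and push transactions
// into the Core mempool.
func relayTxs(ctx context.Context, txsub TxSub, mempool Mempool) {
	txs := txsub.Subscribe()
	for {
		select {
		case <-ctx.Done():
			return
		case tx := <-txs:
			_ = mempool.Add(tx) // error handling omitted in this sketch
		}
	}
}
```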

Currently, both light and full nodes are unable to perform data availability sampling (DAS) while syncing. They only begin sampling once the node is synced up to the head of the chain.

HeaderSync and the DASer will be refactored such that the DASer can perform sampling on past headers while the node is syncing. A possible approach would be for the syncing algorithms in both the DASer and HeaderSync to align such that headers received during sync are propagated to the DASer for sampling via an internal pubsub.

The DASer will maintain a checkpoint of the last sampled header so that it can resume sampling from that checkpoint for any new headers.
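A hedged sketch of the checkpointed sampling loop, with the DASer consuming headers from an internal subscription fed by HeaderSync; the checkpointStore and sampler names are illustrative only:

```go
package das

import "context"

type ExtendedHeader struct{ Height uint64 }

// checkpointStore persists the height of the last sampled header.
type checkpointStore interface {
	Load(context.Context) (uint64, error)
	Save(context.Context, uint64) error
}

// sampler performs DAS over a single header.
type sampler interface {
	Sample(context.Context, *ExtendedHeader) error
}

// run samples every header received from the internal subscription (including
// past headers delivered during sync) and persists a checkpoint after each one,
// so sampling can resume from the checkpoint after a restart.
func run(ctx context.Context, headers <-chan *ExtendedHeader, s sampler, cp checkpointStore) error {
	last, err := cp.Load(ctx)
	if err != nil {
		return err
	}
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case h := <-headers:
			if h.Height <= last {
				continue // already sampled
			}
			if err := s.Sample(ctx, h); err != nil {
				return err
			}
			last = h.Height
			if err := cp.Save(ctx, last); err != nil {
				return err
			}
		}
	}
}
```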


Refactoring

HeaderService becomes the main component around which most other services are focused

Initially, we started with BlockService as the more “important” component in the devnet architecture, but we overlooked some problems with regard to sync (we initially decided that a celestia full node would have to be started at the same time as a core node).

This led to an issue where we eventually needed to connect to an already-running core node and sync from it. We were missing a component to do that, so we implemented HeaderExchange over the core client (wrapping another interface we had previously created for BlockService, called BlockFetcher). Because this had to be done at the last minute, it led to stopgap solutions, such as handing both the celestia light and full node a “trusted” hash of a header from the already-running chain so that they could sync from that point and start listening for new headers.

Right now, the BlockService is in charge of fetching new blocks from the core node, erasure coding them, generating the DAH, generating the ExtendedHeader, broadcasting the ExtendedHeader to the HeaderSub network, and storing the block data (after some validation checks).

Instead, a full node will rely on ShareService sampling to fetch enough shares to reconstruct the block inside of BlockService. A bridge node, by contrast, will not do block reconstruction via sampling, but will instead rely on the header.CoreSubscriber implementation of header.Subscriber for blocks. header.CoreSubscriber will handle listening for new block events from the core node via RPC, erasure code the new block, generate the ExtendedHeader, and pipe the erasure-coded block through to BlockService via an internal subscription.
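A sketch of how header.CoreSubscriber could satisfy a header.Subscriber interface while piping extended blocks to BlockService; the field names, channel shapes, and the omitted erasure coding step are assumptions made for illustration:

```go
package header

// Hypothetical stand-ins; the real header.Subscriber interface and
// ExtendedHeader type are defined in celestia-node.
type (
	ExtendedHeader struct{}
	ExtendedBlock  struct{ Header ExtendedHeader }
)

// Subscriber delivers new ExtendedHeaders to the rest of the node.
type Subscriber interface {
	Subscribe() (<-chan *ExtendedHeader, error)
}

// CoreSubscriber implements Subscriber for bridge nodes: it listens for new
// block events from the core node over RPC, erasure codes the block and
// builds the ExtendedHeader (both omitted here), and pipes the extended block
// to BlockService via an internal subscription instead of reconstructing the
// block from samples.
type CoreSubscriber struct {
	coreBlocks <-chan ExtendedBlock // fed by the core RPC subscription
	toBlockSvc chan<- ExtendedBlock // internal subscription consumed by BlockService
	headers    chan *ExtendedHeader
}

func (cs *CoreSubscriber) Subscribe() (<-chan *ExtendedHeader, error) {
	go func() {
		for b := range cs.coreBlocks {
			b := b // copy, so the pointer below stays valid
			cs.toBlockSvc <- b
			cs.headers <- &b.Header
		}
	}()
	return cs.headers, nil
}

var _ Subscriber = (*CoreSubscriber)(nil)
```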

HeaderSync optimizations

  • Implement disconnect toleration

Unbonding period handling

The light and full nodes are currently prone to long-range attacks. To mitigate this, we should introduce an additional trustPeriod variable (equal to the unbonding period) that applies to headers. If a node starts up and the period between its subjective head and the objective head exceeds the unbonding period, the light node must no longer trust the subjective head, specifically its ValidatorSet. Therefore, instead of syncing subsequent headers on top of the untrusted subjective head, the node should request a new objective head from the trustedPeer and set it as the new trusted subjective head. This approach follows the Tendermint model for light client attack detection.
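A minimal sketch of the trust-period check, assuming the timestamp of the subjective head is available; the function name is illustrative:

```go
package sync

import "time"

// outsideTrustPeriod reports whether the locally stored subjective head is too
// old to be trusted, i.e. whether the time since it was produced exceeds the
// trust period (equal to the unbonding period). When it returns true, the node
// must request a fresh objective head from its trusted peer instead of syncing
// on top of the stale subjective head.
func outsideTrustPeriod(subjectiveHeadTime time.Time, trustPeriod time.Duration, now time.Time) bool {
	return now.Sub(subjectiveHeadTime) > trustPeriod
}
```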


Nice to have

ShareService optimizations

  • Implement parallelization for retrieving shares by namespace. This issue is already being worked on.
  • NMT/Shares/Namespace storage optimizations:
    • Right now, we prepend 17 additional bytes to each Share. Luckily, for each reason the prepended bytes were added, there is an alternative solution: the NMT node type can be determined indirectly, without serializing the type itself, by looking at the number of links; and instead of encoding namespaces into the erasured data itself, the namespace for each share can be recovered from the inner (non-leaf) nodes of the NMT tree (see the sketch after this list).
  • Pruning for shares.
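A sketch of the link-count trick mentioned above; the exact link counts (two for inner nodes, zero for leaves) are an assumption of this illustration rather than a confirmed property of the IPLD encoding:

```go
package nmt

// nodeType infers whether an NMT node is an inner node or a leaf from the
// number of IPLD links it carries, instead of serializing an explicit type
// byte into the node. Assumption in this sketch: inner nodes link to two
// children, leaves link to none.
type nodeType int

const (
	leafNode nodeType = iota
	innerNode
)

func classify(numLinks int) nodeType {
	if numLinks == 2 {
		return innerNode
	}
	return leafNode
}
```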

Since the IPLD package is almost entirely separate from the celestia-node implementation, it makes sense to remove it from the celestia-node repository and maintain it separately. The extraction of IPLD should also include a review and refactoring, as there are still some legacy components that are no longer necessary, and the documentation also needs updating.

Implement additional light node verification logic similar to the Tendermint Light Client Model

At the moment, the syncing logic for a light node is simple in that it syncs each header from a single peer. Instead, the light node should double-check headers with a randomly chosen "witness" peer, other than the primary peer from which it received the header, as described in the light client attack detector model from Tendermint.
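A minimal sketch of the witness cross-check, with hypothetical peer and ExtendedHeader types; the real detector logic (bisection, evidence submission) is more involved:

```go
package light

import (
	"bytes"
	"context"
	"fmt"
)

// Hypothetical stand-ins for the real types and peer abstraction.
type ExtendedHeader struct{ Hash []byte }

type peer interface {
	GetHeader(ctx context.Context, height uint64) (*ExtendedHeader, error)
}

// crossCheck fetches the header at the given height from a randomly chosen
// witness peer and compares it against the header received from the primary
// peer, following the Tendermint light-client attack detector. A mismatch
// indicates a possible attack and should trigger further detection or halting
// logic (not shown here).
func crossCheck(ctx context.Context, primary *ExtendedHeader, height uint64, witness peer) error {
	wh, err := witness.GetHeader(ctx, height)
	if err != nil {
		return err
	}
	if !bytes.Equal(primary.Hash, wh.Hash) {
		return fmt.Errorf("header mismatch at height %d: possible light client attack", height)
	}
	return nil
}
```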