Data Structures

Type Aliases

name	type
`Amount`	`uint64`
`Graffiti`	`byte[MAX_GRAFFITI_BYTES]`
`HashDigest`	`byte[32]`
`Height`	`int64`
`Nonce`	`uint64`
`Round`	`int32`
`StateSubtreeID`	`byte`
`Timestamp`	`google.protobuf.Timestamp`
`VotingPower`	`uint64`

Blockchain Data Structures

Block

Blocks are the top-level data structure of the Celestia blockchain.

name	type	description
`header`	Header	Block header. Contains primarily identification info and commitments.
`availableDataHeader`	AvailableDataHeader	Header of available data. Contains commitments to erasure-coded data.
`availableData`	AvailableData	Data that is erasure-coded for availability.
`lastCommit`	Commit	Previous block's Tendermint commit.

Header

Block header, which is fully downloaded by both full clients and light clients.

name	type	description
`version`	ConsensusVersion	The consensus version struct.
`chainID`	`string`	The `CHAIN_ID`.
`height`	Height	Block height. The genesis block is at height `1`.
`timestamp`	Timestamp	Timestamp of this block.
`lastHeaderHash`	HashDigest	Previous block's header hash.
`lastCommitHash`	HashDigest	Previous block's Tendermint commit hash.
`consensusHash`	HashDigest	Hash of consensus parameters for this block.
`AppHash`	HashDigest	The state root after the previous block's transactions are applied.
`availableDataOriginalSharesUsed`	`uint64`	The number of shares used in the original data square that are not tail padding.
`availableDataRoot`	HashDigest	Root of commitments to erasure-coded data.
`proposerAddress`	Address	Address of this block's proposer.

The size of the original data square, availableDataOriginalSquareSize, isn't explicitly declared in the block header. Instead, it is implicitly computed as the smallest power of 2 whose square is at least availableDataOriginalSharesUsed (in other words, the smallest power of 4 that is at least availableDataOriginalSharesUsed).
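The implicit computation above can be sketched in Go as follows. This is a minimal illustration rather than a reference implementation: the function name `originalSquareSize` is hypothetical, and the sketch assumes the input is small enough (bounded by the square of AVAILABLE_DATA_ORIGINAL_SQUARE_MAX) that `size*size` cannot overflow.

```go
package main

import "fmt"

// originalSquareSize returns the smallest power of 2 whose square is at
// least sharesUsed. The minimum square size is 1, even when no shares are used.
// Assumption: sharesUsed is far below 2^62, so size*size does not overflow.
func originalSquareSize(sharesUsed uint64) uint64 {
	size := uint64(1)
	for size*size < sharesUsed {
		size *= 2
	}
	return size
}

func main() {
	for _, used := range []uint64{0, 1, 2, 4, 5, 16, 17} {
		fmt.Printf("sharesUsed=%d -> squareSize=%d\n", used, originalSquareSize(used))
	}
}
```

For example, 5 used shares do not fit in a 2 * 2 square, so the square size becomes 4 (16 shares in total).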

The header hash is the hash of the serialized header.

AvailableDataHeader

name	type	description
`rowRoots`	HashDigest`[]`	Commitments to all erasure-coded data.
`colRoots`	HashDigest`[]`	Commitments to all erasure-coded data.

The AvailableDataHeader contains the row and column roots of the data shares in square layout for this block. The availableDataRoot of the header is computed using the compact row and column roots as described here.

The number of row roots and the number of column roots are each availableDataOriginalSquareSize * 2, and must be a power of 2. Note that the minimum availableDataOriginalSquareSize is 1 (not 0), so there are at least 2 row roots and at least 2 column roots.

Implementations can prune rows containing only tail padding as they are implicitly available.

AvailableData

Data that is erasure-coded for data availability checks.

name	type	description
`transactions`	Transaction	Transactions are ordinary Cosmos SDK transactions. For example, they may modify the validator set and token balances.
`payForBlobData`	PayForBlobData	PayForBlob data. Transactions that pay for blobs to be included.
`blobData`	BlobData	Blob data is arbitrary user submitted data that will be published to the Celestia blockchain.

Commit

name	type	description
`height`	Height	Block height.
`round`	Round	Round. Incremented on view change.
`headerHash`	HashDigest	Header hash of the previous block.
`signatures`	CommitSig`[]`	List of signatures.

Timestamp

Timestamp is a type alias.

Celestia uses google.protobuf.Timestamp to represent time.

HashDigest

HashDigest is a type alias.

Output of the hashing function. Exactly 256 bits (32 bytes) long.

TransactionFee

name	type	description
`tipRate`	`uint64`	The tip rate for this transaction.

Abstraction over transaction fees.

Address

Celestia supports secp256k1 keys where addresses are 20 bytes in length.

name	type	description
`AccAddress`	`[20]byte`	AccAddress is a wrapper around bytes meant to represent an account address.

CommitSig

enum CommitFlag : uint8_t {
    CommitFlagAbsent = 1,
    CommitFlagCommit = 2,
    CommitFlagNil = 3,
};

name	type	description
`commitFlag`	`CommitFlag`
`validatorAddress`	Address
`timestamp`	Timestamp
`signature`	Signature

Signature

name	type	description
`r`	`byte[32]`	`r` value of the signature.
`s`	`byte[32]`	`s` value of the signature.

ConsensusVersion

name	type	description
`block`	`uint64`	The `VERSION_BLOCK`.
`app`	`uint64`	The app version.

Serialization

Objects that are committed to or signed over require a canonical serialization. This is done using a deterministic (and thus, bijective) variant of protobuf defined here.

Note: there are two requirements for a serialization scheme, should this need to be changed:

Must be bijective.
Serialization must include the length of dynamic structures (e.g. arrays with variable length).

Hashing

All protocol-level hashing is done using SHA-2-256 as defined in FIPS 180-4. SHA-2-256 outputs a digest that is 256 bits (i.e. 32 bytes) long.

Libraries implementing SHA-2-256 are available in Go (https://pkg.go.dev/crypto/sha256) and Rust (https://docs.rs/sha2/latest/sha2/).

Unless otherwise indicated explicitly, objects are first serialized before being hashed.

Merkle Trees

Merkle trees are used to authenticate various pieces of data across the Celestia stack, including transactions, blobs, the validator set, etc. This section provides an overview of the different tree types used, and specifies how to construct them.

Binary Merkle Tree

Binary Merkle trees are constructed in the same fashion as described in Certificate Transparency (RFC-6962), except for using a different hashing function. Leaves are hashed once to get leaf node values and internal node values are the hash of the concatenation of their children (either leaf nodes or other internal nodes).

Nodes contain a single field:

name	type	description
`v`	HashDigest	Node value.

The base case (an empty tree) is defined as the hash of the empty string:

node.v = 0xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

For leaf node node of leaf data d:

node.v = h(0x00, serialize(d))

For internal node node with children l and r:

node.v = h(0x01, l.v, r.v)

Note that rather than duplicating the last node if there are an odd number of nodes (the Bitcoin design), trees are allowed to be imbalanced. In other words, the height of each leaf may be different. For an example, see Section 2.1.3 of Certificate Transparency (RFC-6962).

Leaves and internal nodes are hashed differently: the one-byte 0x00 is prepended for leaf nodes while 0x01 is prepended for internal nodes. This avoids a second-preimage attack where internal nodes are presented as leaves in trees with leaves at different heights.
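The construction above can be sketched in Go using the standard library's SHA-256. This is an illustrative sketch, not the reference implementation: the names `merkleRoot`, `splitPoint`, and `rootHex` are hypothetical, and leaves are assumed to be already serialized. The split point follows RFC-6962: the left subtree takes the largest power of 2 of leaves that is strictly less than the total.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// merkleRoot computes the root of a binary Merkle tree over already-serialized
// leaves, following the RFC 6962 construction with SHA-256.
func merkleRoot(leaves [][]byte) [32]byte {
	switch len(leaves) {
	case 0:
		return sha256.Sum256(nil) // empty tree: hash of the empty string
	case 1:
		return sha256.Sum256(append([]byte{0x00}, leaves[0]...)) // leaf node
	}
	k := splitPoint(len(leaves))
	l, r := merkleRoot(leaves[:k]), merkleRoot(leaves[k:])
	buf := append([]byte{0x01}, l[:]...) // internal node prefix
	return sha256.Sum256(append(buf, r[:]...))
}

// splitPoint returns the largest power of 2 strictly less than n (n >= 2).
func splitPoint(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// rootHex is a convenience wrapper that returns the root as a hex string.
func rootHex(leaves [][]byte) string {
	root := merkleRoot(leaves)
	return hex.EncodeToString(root[:])
}

func main() {
	fmt.Println(rootHex(nil))
	fmt.Println(rootHex([][]byte{[]byte("a"), []byte("b"), []byte("c")}))
}
```

With three leaves the tree is imbalanced: the third leaf hangs directly off the root instead of being duplicated.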

BinaryMerkleTreeInclusionProof

name	type	description
`siblings`	HashDigest`[]`	Sibling hash values, ordered starting from the leaf's neighbor.

A proof for a leaf in a binary Merkle tree, as per Section 2.1.1 of Certificate Transparency (RFC-6962).

Namespace Merkle Tree

Shares in Celestia are associated with a provided namespace. The Namespace Merkle Tree (NMT) is a variation of the Merkle Interval Tree, which is itself an extension of the Merkle Sum Tree. It allows for compact proofs around the inclusion or exclusion of shares with particular namespace IDs.

Nodes contain three fields:

name	type	description
`n_min`	Namespace	Min namespace in subtree rooted at this node.
`n_max`	Namespace	Max namespace in subtree rooted at this node.
`v`	HashDigest	Node value.

The base case (an empty tree) is defined as:

node.n_min = 0x0000000000000000
node.n_max = 0x0000000000000000
node.v = 0xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

For leaf node node of share data d:

node.n_min = d.namespace
node.n_max = d.namespace
node.v = h(0x00, d.namespace, d.rawData)

Here, d.namespace is the namespace of the leaf, which is a NAMESPACE_SIZE-long byte array.

Leaves in an NMT must be lexicographically sorted by namespace in ascending order.

For internal node node with children l and r:

node.n_min = min(l.n_min, r.n_min)
if l.n_min == PARITY_SHARE_NAMESPACE
  node.n_max = PARITY_SHARE_NAMESPACE
else if r.n_min == PARITY_SHARE_NAMESPACE
  node.n_max = l.n_max
else
  node.n_max = max(l.n_max, r.n_max)
node.v = h(0x01, l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)

Note that the above snippet leverages the property that leaves are sorted by namespace: if l.n_min is PARITY_SHARE_NAMESPACE, so must {l,r}.n_max. By construction, either both the min and max namespace of a node will be PARITY_SHARE_NAMESPACE, or neither will: if r.n_min is PARITY_SHARE_NAMESPACE, so must r.n_max.

For some intuition: the min and max namespace for subtree roots with at least one non-parity leaf (which includes the root of an NMT, as the right half of an NMT as used in Celestia will be parity shares) ignore the namespace ID for the parity leaves. Subtree roots with only parity leaves have their min and max namespace ID set to PARITY_SHARE_NAMESPACE. This allows for shorter proofs into the tree than if the namespace ID of parity shares was not ignored (which would cause the max namespace ID of the root to always be PARITY_SHARE_NAMESPACE).
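The leaf and internal node rules above can be sketched in Go as follows. This is a hedged illustration: the type `nmtNode` and the helpers `ns`, `leafNode`, and `innerNode` are hypothetical names, and the sketch assumes a 29-byte namespace with PARITY_SHARE_NAMESPACE equal to 29 bytes of 0xFF, as listed in the reserved namespaces table.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// namespaceSize is assumed to be 29 bytes (1 version byte + 28 ID bytes).
const namespaceSize = 29

// parityNamespace stands in for PARITY_SHARE_NAMESPACE (all 0xFF bytes).
var parityNamespace = bytes.Repeat([]byte{0xFF}, namespaceSize)

type nmtNode struct {
	nMin, nMax []byte
	v          [32]byte
}

// ns is a hypothetical helper: a version-0 namespace whose last byte is b.
func ns(b byte) []byte {
	out := make([]byte, namespaceSize)
	out[namespaceSize-1] = b
	return out
}

// leafNode computes h(0x00, namespace, rawData) for a share.
func leafNode(namespace, rawData []byte) nmtNode {
	h := sha256.New()
	h.Write([]byte{0x00})
	h.Write(namespace)
	h.Write(rawData)
	n := nmtNode{nMin: namespace, nMax: namespace}
	copy(n.v[:], h.Sum(nil))
	return n
}

// innerNode combines two children, ignoring the parity namespace when
// computing n_max for subtrees that contain at least one non-parity leaf.
func innerNode(l, r nmtNode) nmtNode {
	n := nmtNode{nMin: l.nMin}
	if bytes.Compare(r.nMin, l.nMin) < 0 {
		n.nMin = r.nMin
	}
	switch {
	case bytes.Equal(l.nMin, parityNamespace):
		n.nMax = parityNamespace
	case bytes.Equal(r.nMin, parityNamespace):
		n.nMax = l.nMax
	case bytes.Compare(l.nMax, r.nMax) >= 0:
		n.nMax = l.nMax
	default:
		n.nMax = r.nMax
	}
	h := sha256.New()
	h.Write([]byte{0x01})
	for _, part := range [][]byte{l.nMin, l.nMax, l.v[:], r.nMin, r.nMax, r.v[:]} {
		h.Write(part)
	}
	copy(n.v[:], h.Sum(nil))
	return n
}

func main() {
	data := innerNode(leafNode(ns(1), []byte("a")), leafNode(ns(7), []byte("b")))
	parity := innerNode(leafNode(parityNamespace, []byte("p0")), leafNode(parityNamespace, []byte("p1")))
	root := innerNode(data, parity)
	fmt.Printf("min=%x\nmax=%x\n", root.nMin, root.nMax)
}
```

In this sketch the root covers two data leaves and two parity leaves, and its max namespace is that of the last data leaf rather than the parity namespace.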

A compact commitment can be computed by taking the hash of the serialized root node.

NamespaceMerkleTreeInclusionProof

name	type	description
`siblingValues`	HashDigest`[]`	Sibling hash values, ordered starting from the leaf's neighbor.
`siblingMins`	Namespace`[]`	Sibling min namespace IDs.
`siblingMaxes`	Namespace`[]`	Sibling max namespace IDs.

When verifying an NMT proof, the root hash is checked by reconstructing the root node root_node with the computed root_node.v (computed as with a plain Merkle proof) and the provided rootNamespaceMin and rootNamespaceMax as the root_node.n_min and root_node.n_max, respectively.

Erasure Coding

In order to enable trust-minimized light clients (i.e. light clients that do not rely on an honest-majority assumption for state validity), it is critical that light clients can determine whether the data in each block is available or not, without downloading the whole block itself. The technique used here was formally described in the paper Fraud and Data Availability Proofs: Maximising Light Client Security and Scaling Blockchains with Dishonest Majorities.

The remainder of the subsections below specify the 2D Reed-Solomon erasure coding scheme used, along with the format of shares and how available data is arranged into shares.

Reed-Solomon Erasure Coding

Note that while data is laid out in a two-dimensional square, rows and columns are erasure coded using a standard one-dimensional encoding.

Reed-Solomon erasure coding is used as the underlying coding scheme. The parameters are:

16-bit Galois field
availableDataOriginalSquareSize original pieces (maximum of AVAILABLE_DATA_ORIGINAL_SQUARE_MAX)
availableDataOriginalSquareSize parity pieces (maximum of AVAILABLE_DATA_ORIGINAL_SQUARE_MAX), i.e. availableDataOriginalSquareSize * 2 total pieces, for an erasure efficiency of 50%. In other words, any 50% of the pieces from the availableDataOriginalSquareSize * 2 total pieces are enough to recover the original data.
SHARE_SIZE bytes per piece

Note that availableDataOriginalSquareSize may vary each block, and is decided by the block proposer of that block. Leopard-RS is a C library that implements the above scheme with quasilinear runtime.

2D Reed-Solomon Encoding Scheme

The 2-dimensional data layout is described in this section. The roots of NMTs for each row and column across four quadrants of data in a 2k * 2k matrix of shares, Q0 to Q3 (shown below), must be computed. In other words, 2k row roots and 2k column roots must be computed. The row and column roots are stored in the rowRoots and colRoots of the AvailableDataHeader.

fig: RS2D encoding: data quadrants.

The data of Q0 is the original data, and the remaining quadrants are parity data. Setting k = availableDataOriginalSquareSize, the original data first must be split into shares and arranged into a k * k matrix. Then the parity data can be computed.

Where A -> B indicates that B is computed using erasure coding from A:

Q0 -> Q1 for each row in Q0 and Q1
Q0 -> Q2 for each column in Q0 and Q2
Q2 -> Q3 for each row in Q2 and Q3

Note that the parity data in Q3 will be identical if it is vertically extended from Q1 or horizontally extended from Q2.

fig: RS2D encoding: extending data.

As an example, the parity data in the second column of Q2 (in striped purple) is computed by extending the original data in the second column of Q0 (in solid blue).

fig: RS2D encoding: extending a column.

Now that all four quadrants of the 2k * 2k matrix are filled, the row and column roots can be computed. To do so, each row/column is used as the leaves of an NMT, for which the compact root is computed (i.e. an extra hash operation over the NMT root is used to produce a single HashDigest). In this example, the fourth row root value is computed as the NMT root of the fourth row of Q0 and the fourth row of Q1 as leaves.

fig: RS2D encoding: a row root.

Finally, the availableDataRoot of the block Header is computed as the Merkle root of the binary Merkle tree with the row and column roots as leaves, in that order.

fig: Available data root.

Arranging Available Data Into Shares

The previous sections described how some original data, arranged into a k * k matrix, can be extended into a 2k * 2k matrix and committed to with NMT roots. This section specifies how available data (which includes transactions, PayForBlob transactions, and blobs) is arranged into the matrix in the first place.

Note that each share only has a single namespace, and that the list of concatenated shares is lexicographically ordered by namespace.

Then,

1. For each of transactionData, intermediateStateRootData, and PayForBlob transactions, serialize:
  1. For each request in the list:
    1. Serialize the request (individually).
    2. Compute the length of each serialized request, serialize the length, and prepend the serialized request with its serialized length.
  2. Split up the length/request pairs into SHARE_SIZE-NAMESPACE_SIZE-SHARE_RESERVED_BYTES-byte chunks.
  3. Create a share out of each chunk. This data has a reserved namespace ID, so the first NAMESPACE_SIZE+SHARE_RESERVED_BYTES bytes for these shares must be set specially.
2. Concatenate the lists of shares in the order: transactions, intermediate state roots, PayForBlob transactions.

These shares are arranged in the first quadrant (Q0) of the availableDataOriginalSquareSize*2 * availableDataOriginalSquareSize*2 available data matrix in row-major order. In the example below, each reserved data element takes up exactly one share.

fig: Original data: reserved.

For each blob in the list blobData:

Serialize the blob (individually).
Compute the length of each serialized blob, serialize the length, and prepend the serialized blob with its serialized length.
Split up the length/blob pairs into SHARE_SIZE-NAMESPACE_SIZE-byte chunks.
Create a share out of each chunk. The first NAMESPACE_SIZE bytes for these shares are set to the namespace.

Each blob is placed in the available data matrix, in row-major order, as follows:

Place the first share of the blob at the next unused location in the matrix, then place the remaining shares in the following locations.

Transactions must commit to a Merkle root of a list of hashes that are each guaranteed (assuming the block is valid) to be subtree roots in one or more of the row NMTs. For additional info, see the rationale document for this section.

However, with only the rule above, interaction between the block producer and transaction sender may be required to compute a commitment to the blob the transaction sender can sign over. To remove interaction, blobs can optionally be laid out using a non-interactive default:

Place the first share of the blob at the next unused location in the matrix whose column is aligned with the largest power of 2 that is not larger than the blob length or availableDataOriginalSquareSize, then place the remaining shares in the following locations unless there are insufficient unused locations in the row.
If there are insufficient unused locations in the row, place the first share of the blob at the first column of the next row. Then place the remaining shares in the following locations. By construction, any blob whose length is greater than availableDataOriginalSquareSize will be placed in this way.

In the example below, two blobs (of lengths 2 and 1, respectively) are placed using the aforementioned default non-interactive rules.

fig: original data blob

The blob share commitment rules may introduce empty shares that do not belong to any blob (in the example above, the top-right share is empty). These are zeroes with namespace ID equal to either TAIL_TRANSACTION_PADDING_NAMESPACE_ID if between a request with a reserved namespace ID and a blob, or the namespace ID of the previous blob if succeeded by a blob. See the data square layout for more info.
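The non-interactive default placement rules above can be sketched in Go as follows. This is one possible reading of the rules as worded in this section, not a description of any particular implementation: the function names are hypothetical, "the blob length or availableDataOriginalSquareSize" is interpreted as the smaller of the two, and the sketch does not check whether the blob fits in the remaining square.

```go
package main

import "fmt"

// largestPowerOfTwoAtMost returns the largest power of 2 that is <= n (n >= 1).
func largestPowerOfTwoAtMost(n int) int {
	p := 1
	for p*2 <= n {
		p *= 2
	}
	return p
}

// roundUp rounds x up to the next multiple of m (m >= 1).
func roundUp(x, m int) int {
	return (x + m - 1) / m * m
}

// nextBlobStart returns the row-major index at which a blob of blobLen shares
// would start, given the next unused index (cursor) and the square size k.
// Assumption: alignment uses the smaller of blobLen and k.
func nextBlobStart(cursor, blobLen, k int) int {
	limit := blobLen
	if k < limit {
		limit = k
	}
	start := roundUp(cursor, largestPowerOfTwoAtMost(limit))
	// If the blob does not fit in the rest of the row, move to the first
	// column of the next row.
	if col := start % k; col != 0 && col+blobLen > k {
		start = roundUp(start, k)
	}
	return start
}

func main() {
	// One reserved share at index 0, followed by a blob of 2 shares in a
	// square of size 4: index 1 stays empty and the blob starts at index 2.
	fmt.Println(nextBlobStart(1, 2, 4))
}
```

Any index skipped between the cursor and the returned start position would be filled with the padding shares described above.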

Available Data

Transaction

Celestia transactions are Cosmos SDK transactions.

PayForBlobData

IndexWrapper

IndexWrappers are wrappers around PayForBlob transactions. They include additional metadata by the block proposer that is committed to in the available data matrix.

name	type	description
`tx`	`bytes`	Actual transaction.
`share_indexes`	`[]uint32`	Share indexes (in row-major order) of the first share for each blob this transaction pays for. Needed for light verification of proper blob inclusion.
`type_id`	`string`	Type ID of the IndexWrapper transaction type. This is used for encoding and decoding IndexWrapper transactions. It is always set to `"INDX"`.

BlobData

name	type	description
`blobs`	Blob`[]`	List of blobs.

Blob

name	type	description
`namespaceID`	NamespaceID	Namespace ID of this blob.
`rawData`	`byte[]`	Raw blob bytes.

State

The state of the Celestia chain is intentionally restricted to containing only account balances and the validator set metadata. Similar to other Cosmos SDK based chains, the state of the Celestia chain is maintained in a multistore. The root of the application state is committed to in the block header via the AppHash.

Consensus Parameters

Various consensus parameters are committed to in the block header, such as limits and constants.

name	type	description
`version`	ConsensusVersion	The consensus version struct.
`chainID`	`string`	The `CHAIN_ID`.
`shareSize`	`uint64`	The `SHARE_SIZE`.
`shareReservedBytes`	`uint64`	The `SHARE_RESERVED_BYTES`.
`availableDataOriginalSquareMax`	`uint64`	The `AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`.

In order to compute the consensusHash field in the block header, the above list of parameters is hashed.

Namespace

Abstract

One of Celestia's core data structures is the namespace. When a user submits a transaction encapsulating a MsgPayForBlobs message to Celestia, they MUST associate each blob with exactly one namespace. After their transaction has been included in a block, the namespace enables users to take an interest in a subset of the blobs published to Celestia by allowing the user to query for blobs by namespace.

In order to enable efficient retrieval of blobs by namespace, Celestia makes use of a Namespaced Merkle Tree. See section 5.2 of the LazyLedger whitepaper for more details.

Overview

A namespace is composed of two fields: version and id. A namespace is encoded as a byte slice with the version and id concatenated.

fig: namespace

Version

The namespace version is an 8-bit unsigned integer that indicates the version of the namespace. The version is used to determine the format of the namespace and is encoded as a single byte. A new namespace version MUST be introduced if the namespace format changes in a backwards incompatible way.

Below we explain the supported user-specifiable namespace versions. Note, however, that Celestia MAY utilize other namespace versions for internal use. For more details, see the Reserved Namespaces section.

Version 0

The only supported user-specifiable namespace version is 0. A namespace with version 0 MUST contain an id with a prefix of 18 leading 0 bytes. The remaining 10 bytes of the id are user-specified. Below, we provide examples of valid and invalid encoded user-supplied namespaces with version 0.

// Valid encoded namespaces
0x0000000000000000000000000000000000000001010101010101010101 // valid blob namespace
0x0000000000000000000000000000000000000011111111111111111111 // valid blob namespace

// Invalid encoded namespaces
0x0000000000000000000000000111111111111111111111111111111111 // invalid because it does not have 18 leading 0 bytes
0x1000000000000000000000000000000000000000000000000000000000 // invalid because it does not have version 0
0x1111111111111111111111111111111111111111111111111111111111 // invalid because it does not have version 0

Any change in the number of leading 0 bytes in the id of a namespace with version 0 is considered a backwards incompatible change and MUST be introduced as a new namespace version.
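The version 0 format rules can be checked with a few lines of Go. The sketch below is illustrative only: `isValidV0Namespace` and `newV0Namespace` are hypothetical helpers, and the check covers the encoding format alone (it does not reject reserved namespaces).

```go
package main

import (
	"bytes"
	"fmt"
)

const (
	namespaceVersionSize = 1
	namespaceIDSize      = 28
	namespaceSize        = namespaceVersionSize + namespaceIDSize
	v0LeadingZeroBytes   = 18 // required zero prefix of a version 0 id
)

// isValidV0Namespace reports whether ns is a well-formed encoded namespace
// with version 0: 29 bytes, version byte 0, and 18 leading zero bytes in the id.
func isValidV0Namespace(ns []byte) bool {
	if len(ns) != namespaceSize || ns[0] != 0 {
		return false
	}
	id := ns[namespaceVersionSize:]
	return bytes.Equal(id[:v0LeadingZeroBytes], make([]byte, v0LeadingZeroBytes))
}

// newV0Namespace builds an encoded version 0 namespace from the 10
// user-specified bytes of the id.
func newV0Namespace(userID []byte) ([]byte, error) {
	want := namespaceIDSize - v0LeadingZeroBytes
	if len(userID) != want {
		return nil, fmt.Errorf("user id must be %d bytes, got %d", want, len(userID))
	}
	ns := make([]byte, namespaceSize) // version 0 and zero prefix by default
	copy(ns[namespaceSize-len(userID):], userID)
	return ns, nil
}

func main() {
	ns, err := newV0Namespace(bytes.Repeat([]byte{0x01}, 10))
	if err != nil {
		panic(err)
	}
	fmt.Printf("0x%x valid=%v\n", ns, isValidV0Namespace(ns))
}
```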

ID

The namespace ID is a 28 byte identifier that uniquely identifies a namespace. The ID is encoded as a byte slice of length 28.

Reserved Namespaces

Celestia reserves some namespaces for protocol use. These namespaces are called "reserved namespaces". Reserved namespaces are used to arrange the contents of the data square. Applications MUST NOT use reserved namespaces for their blob data. Reserved namespaces fall into two categories: Primary and Secondary.

Primary: Namespaces with values less than or equal to 0x00000000000000000000000000000000000000000000000000000000FF. Primary namespaces always have a version of 0.
Secondary: Namespaces with values greater than or equal to 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF00. Secondary namespaces always have a version of 255 (0xFF) so that they are placed after all user specifiable namespaces in a sorted data square. The PARITY_SHARE_NAMESPACE uses version 255 (0xFF) to enable more efficient proof generation within the context of nmt, where it is used in conjunction with the IgnoreMaxNamespace feature. The TAIL_PADDING_NAMESPACE uses the version 255 to ensure that padding shares are always placed at the end of the Celestia data square even if a new user-specifiable version is introduced.

Below is a list of the current reserved namespaces. For additional information on the significance and application of the reserved namespaces, please refer to the Data Square Layout specifications.

name	type	category	value	description
`TRANSACTION_NAMESPACE`	`Namespace`	Primary	`0x0000000000000000000000000000000000000000000000000000000001`	Namespace for ordinary Cosmos SDK transactions.
`INTERMEDIATE_STATE_ROOT_NAMESPACE`	`Namespace`	Primary	`0x0000000000000000000000000000000000000000000000000000000002`	Namespace for intermediate state roots (not currently utilized).
`PAY_FOR_BLOB_NAMESPACE`	`Namespace`	Primary	`0x0000000000000000000000000000000000000000000000000000000004`	Namespace for transactions that contain a PayForBlob.
`PRIMARY_RESERVED_PADDING_NAMESPACE`	`Namespace`	Primary	`0x00000000000000000000000000000000000000000000000000000000FF`	Namespace for padding after all primary reserved namespaces.
`MAX_PRIMARY_RESERVED_NAMESPACE`	`Namespace`	Primary	`0x00000000000000000000000000000000000000000000000000000000FF`	Namespace for the highest primary reserved namespace.
`MIN_SECONDARY_RESERVED_NAMESPACE`	`Namespace`	Secondary	`0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF00`	Namespace for the lowest secondary reserved namespace.
`TAIL_PADDING_NAMESPACE`	`Namespace`	Secondary	`0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE`	Namespace for padding after all blobs to fill up the original data square.
`PARITY_SHARE_NAMESPACE`	`Namespace`	Secondary	`0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF`	Namespace for parity shares.
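Because reserved namespaces are defined by value ranges, a membership check reduces to a lexicographic byte comparison against the two boundary values in the table. The Go sketch below illustrates this under the assumption of 29-byte encoded namespaces; the function names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

const namespaceSize = 29

var (
	// maxPrimaryReserved mirrors MAX_PRIMARY_RESERVED_NAMESPACE (0x00...00FF).
	maxPrimaryReserved = append(make([]byte, namespaceSize-1), 0xFF)
	// minSecondaryReserved mirrors MIN_SECONDARY_RESERVED_NAMESPACE (0xFF...FF00).
	minSecondaryReserved = append(bytes.Repeat([]byte{0xFF}, namespaceSize-1), 0x00)
)

// isPrimaryReserved reports whether ns is at or below the primary boundary.
func isPrimaryReserved(ns []byte) bool {
	return len(ns) == namespaceSize && bytes.Compare(ns, maxPrimaryReserved) <= 0
}

// isSecondaryReserved reports whether ns is at or above the secondary boundary.
func isSecondaryReserved(ns []byte) bool {
	return len(ns) == namespaceSize && bytes.Compare(ns, minSecondaryReserved) >= 0
}

// isReserved reports whether ns falls into either reserved range.
func isReserved(ns []byte) bool {
	return isPrimaryReserved(ns) || isSecondaryReserved(ns)
}

func main() {
	tx := make([]byte, namespaceSize)
	tx[namespaceSize-1] = 0x01 // TRANSACTION_NAMESPACE
	parity := bytes.Repeat([]byte{0xFF}, namespaceSize)
	user := make([]byte, namespaceSize)
	user[namespaceSize-2] = 0x01 // 0x00...0100, just above the primary range
	fmt.Println(isReserved(tx), isReserved(parity), isReserved(user))
}
```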

Assumptions and Considerations

Applications MUST refrain from using the reserved namespaces for their blob data.

Celestia does not ensure the prevention of non-reserved namespace collisions. Consequently, two distinct applications might use the same namespace. It is the responsibility of these applications to be cautious and manage the implications and consequences arising from such namespace collisions. Among the potential consequences is the Woods Attack, as elaborated in this forum post: Woods Attack on Celestia.

Implementation

See the namespace implementation in go-square. For the most recent version, which may not reflect the current specifications, refer to the latest namespace code.

Go Definition

type Namespace struct {
	Version uint8
	ID      []byte
}

Shares

Abstract

All available data in a Celestia block is split into fixed-size data chunks known as "shares". Shares are the atomic unit of the Celestia data square. The shares in a Celestia block are eventually erasure-coded and committed to in Namespace Merkle trees (also see NMT spec).

Terms

Blob: User specified data (e.g. a roll-up block) that is associated with exactly one namespace. Blob data are opaque bytes of data that are included in the block but do not impact Celestia's state.
Share: A fixed-size data chunk that is associated with exactly one namespace.
Share sequence: A share sequence is a contiguous set of shares that contain semantically relevant data. A share sequence MUST contain one or more shares. When a blob is split into shares, it is written to one share sequence. As a result, all shares in a share sequence are typically parsed together because the original blob data may have been split across share boundaries. All transactions in the TRANSACTION_NAMESPACE are contained in one share sequence. All transactions in the PAY_FOR_BLOB_NAMESPACE are contained in one share sequence.

Overview

User submitted transactions are split into shares (see share splitting) and arranged in a k * k matrix (see arranging available data into shares) prior to the erasure coding step. Shares in the k * k matrix are ordered by namespace and have a common share format.

Padding shares are added to the k * k matrix to ensure:

Blob sequences start on an index that conforms to blob share commitment rules (see namespace padding share and reserved padding share)
The number of shares in the matrix is a perfect square (see tail padding share)

The share version is a 7-bit big-endian unsigned integer that is used to indicate the version of the share format. A new share version MUST be introduced if the share format changes in a way that is not backwards compatible. There are two share versions: share version 0 and share version 1.

Every share has a fixed size SHARE_SIZE. The share format below is consistent for all shares:

The first NAMESPACE_VERSION_SIZE bytes of a share's raw data is the namespace version of that share (denoted by "namespace version" in the figure below).
The next NAMESPACE_ID_SIZE bytes of a share's raw data is the namespace ID of that share (denoted by "namespace id" in the figure below).
The next SHARE_INFO_BYTES bytes are for share information (denoted by "info byte" in the figure below) with the following structure:
- The first 7 bits represent the share version in big endian form (initially, this will be 0000000 for version 0);
- The last bit is a sequence start indicator. The indicator is 1 if this share is the first share in a sequence or 0 if this share is a continuation share in a sequence.
If this share is the first share in a sequence, it will include the length of the sequence in bytes. The next SEQUENCE_BYTES represent a big-endian uint32 value (denoted by "sequence length" in the figure below). This length is placed immediately after the SHARE_INFO_BYTES field. It's important to note that shares that are not the first share in a sequence do not contain this field.
The remaining SHARE_SIZE-NAMESPACE_SIZE-SHARE_INFO_BYTES-SEQUENCE_BYTES bytes (if first share) or SHARE_SIZE-NAMESPACE_SIZE-SHARE_INFO_BYTES bytes (if continuation share) are raw data (denoted by "blob1" in the figure below). Typically raw data is the blob payload that users submit in a BlobTx. However, raw data can also be transaction data (see transaction shares below).
If there is insufficient raw data to fill the share, the remaining bytes are filled with 0.

First share in a sequence:

figure 1: share start

Continuation share in a sequence:

figure 2: share continuation

Since raw data that exceeds SHARE_SIZE-NAMESPACE_SIZE-SHARE_INFO_BYTES-SEQUENCE_BYTES bytes will span more than one share, developers MAY choose to encode additional metadata in their raw blob data prior to inclusion in a Celestia block. For example, Celestia transaction shares encode additional metadata in the form of "reserved bytes".
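The info byte layout described in the share format (7 bits of share version followed by 1 bit of sequence start indicator) can be packed and unpacked with simple bit operations. The Go sketch below is illustrative; `newInfoByte` and `parseInfoByte` are hypothetical names, and it assumes SHARE_INFO_BYTES is a single byte.

```go
package main

import (
	"errors"
	"fmt"
)

// maxShareVersion is the largest value that fits in 7 bits.
const maxShareVersion = 127

// newInfoByte packs the share version into the upper 7 bits and the
// sequence start indicator into the lowest bit.
func newInfoByte(version uint8, isSequenceStart bool) (byte, error) {
	if version > maxShareVersion {
		return 0, errors.New("share version does not fit in 7 bits")
	}
	b := version << 1
	if isSequenceStart {
		b |= 1
	}
	return b, nil
}

// parseInfoByte is the inverse of newInfoByte.
func parseInfoByte(b byte) (version uint8, isSequenceStart bool) {
	return b >> 1, b&1 == 1
}

func main() {
	b, err := newInfoByte(0, true)
	if err != nil {
		panic(err)
	}
	version, start := parseInfoByte(b)
	fmt.Printf("info byte=%08b version=%d start=%v\n", b, version, start)
}
```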

Share version 1 is similar to share version 0 with the addition of a signer field. The signer is located after the sequence length in the first share. The signer is SIGNER_SIZE bytes.

First share in a sequence with signer:

figure 3: first share with signer

Continuation share in a sequence:

figure 4: share continuation

Transaction Shares

Transaction shares use share version 0. In order for clients to parse shares in the middle of a sequence without downloading antecedent shares, Celestia encodes additional metadata in the shares associated with reserved namespaces. At the time of writing this only applies to the TRANSACTION_NAMESPACE and PAY_FOR_BLOB_NAMESPACE. This share structure is often referred to as "compact shares" to differentiate from the share structure defined above for all shares. It conforms to the common share format with one additional field, the "reserved bytes" field, which is described below:

Every transaction share includes SHARE_RESERVED_BYTES bytes that contain the index of the starting byte of the length of the canonically serialized first transaction that starts in the share, or 0 if there is none, as a binary big endian uint32. Denoted by "reserved bytes" in the figure below. The SHARE_RESERVED_BYTES are placed immediately after the SEQUENCE_BYTES if this is the first share in a sequence or immediately after the SHARE_INFO_BYTES if this is a continuation share in a sequence.
The remaining SHARE_SIZE-NAMESPACE_SIZE-SHARE_INFO_BYTES-SEQUENCE_BYTES-SHARE_RESERVED_BYTES bytes (if first share) or SHARE_SIZE-NAMESPACE_SIZE-SHARE_INFO_BYTES-SHARE_RESERVED_BYTES bytes (if continuation share) are transaction or PayForBlob transaction data (denoted by "tx1" and "tx2" in the figure below). Each transaction or PayForBlob transaction is prefixed with a varint of the length of that unit (denoted by "len(tx1)" and "len(tx2)" in the figure below).
If there is insufficient transaction or PayForBlob transaction data to fill the share, the remaining bytes are filled with 0.

First share in a sequence:

figure 5: transaction share start

where reserved bytes would be 38 as a binary big endian uint32 ([0b00000000, 0b00000000, 0b00000000, 0b00100110]).

Continuation share in a sequence:

figure 6: transaction share continuation

where reserved bytes would be 80 as a binary big endian uint32 ([0b00000000, 0b00000000, 0b00000000, 0b01010000]).
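The value 38 in the first example follows from the field sizes: the first transaction starts right after the namespace, info byte, sequence length, and reserved bytes. The Go sketch below reproduces that arithmetic and the big-endian encoding. It assumes NAMESPACE_SIZE is 29, SHARE_INFO_BYTES is 1, and SEQUENCE_BYTES and SHARE_RESERVED_BYTES are 4 each (both hold a uint32); the function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Assumed field sizes, in bytes.
const (
	namespaceSize      = 29
	shareInfoBytes     = 1
	sequenceBytes      = 4
	shareReservedBytes = 4
)

// firstShareDataOffset is the index of the first data byte in the first
// compact share of a sequence.
func firstShareDataOffset() int {
	return namespaceSize + shareInfoBytes + sequenceBytes + shareReservedBytes
}

// continuationShareDataOffset is the index of the first data byte in a
// continuation compact share (no sequence length field).
func continuationShareDataOffset() int {
	return namespaceSize + shareInfoBytes + shareReservedBytes
}

// encodeReservedBytes encodes the index of the first unit that starts in
// the share as a big-endian uint32.
func encodeReservedBytes(index uint32) []byte {
	buf := make([]byte, shareReservedBytes)
	binary.BigEndian.PutUint32(buf, index)
	return buf
}

func main() {
	fmt.Println(firstShareDataOffset(), continuationShareDataOffset())
	fmt.Printf("%08b\n", encodeReservedBytes(uint32(firstShareDataOffset())))
}
```

In the continuation example, data begins at index 34, so a reserved bytes value of 80 means the first 46 data bytes of that share belong to a transaction that started in an earlier share.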

Padding

Padding shares use share version 0 and conform to the share format described above. There are multiple variants of padding shares that differ based on their namespace.

The first NAMESPACE_VERSION_SIZE bytes of a share's raw data is the namespace version of that share (initially, this will be 0).
The next NAMESPACE_ID_SIZE bytes of a share's raw data is the namespace ID of that share. This varies based on the type of padding share.
The next SHARE_INFO_BYTES bytes are for share information.
- The first 7 bits represent the share version in big endian form (initially, this will be 0000000 for version 0);
- The last bit is a sequence start indicator. The indicator is always 1.
The next SEQUENCE_BYTES contain a big endian uint32 of value 0.
The remaining SHARE_SIZE-NAMESPACE_SIZE-SHARE_INFO_BYTES-SEQUENCE_BYTES bytes are filled with 0.

A namespace padding share uses the namespace of the blob that precedes it in the data square so that the data square can retain the property that all shares are ordered by namespace. A namespace padding share acts as padding between blobs so that the subsequent blob begins at an index that conforms to the blob share commitment rules. Clients MAY ignore the contents of these shares because they don't contain any significant data.

Primary reserved padding shares use the PRIMARY_RESERVED_PADDING_NAMESPACE. Primary reserved padding shares are placed after shares in the primary reserved namespace range so that the first blob can start at an index that conforms to blob share commitment rules. Clients MAY ignore the contents of these shares because they don't contain any significant data.

Tail padding shares use the TAIL_PADDING_NAMESPACE. Tail padding shares are placed after the last blob in the data square so that the number of shares in the data square is a perfect square. Clients MAY ignore the contents of these shares because they don't contain any significant data.

Parity shares are the output of the erasure coding step of the data square construction process. They occupy quadrants Q1, Q2, and Q3 of the extended data square and are used to reconstruct the original data square (Q0). Parity shares do not conform to the share format described above. In the square layout, parity shares do not have a significant namespace. When parity shares are used in NMTs, they are prefixed with the PARITY_SHARE_NAMESPACE to preserve the property that all shares are ordered by namespace.

Share splitting is the process of converting a blob into a share sequence. The process is as follows:

Create a new share and populate the prefix of the share with the blob's namespace and share version. Set the sequence start indicator to 1. Write the blob length as the sequence length. Write the blob's data into the share until the share is full.
If there is more data to write, create a new share (a.k.a. a continuation share) and populate the prefix of the share with the blob's namespace and share version. Set the sequence start indicator to 0. Write the remaining blob data into the share until the share is full.
Repeat the previous step until all blob data has been written.
If the last share is not full, fill the remainder of the share with 0.
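The steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the share version is 0, blob shares carry no signer and no reserved bytes, and the name `split_blob` is hypothetical. See go-square/shares for the actual implementation.

```python
from typing import List

# Constants taken from the Constants table of this specification.
NAMESPACE_SIZE = 29
SEQUENCE_BYTES = 4
SHARE_SIZE = 512


def split_blob(namespace: bytes, blob: bytes, share_version: int = 0) -> List[bytes]:
    """Split a blob into a share sequence (illustrative sketch only)."""
    if len(namespace) != NAMESPACE_SIZE:
        raise ValueError("namespace must be NAMESPACE_SIZE bytes")
    shares = []
    remaining = blob
    first = True
    while first or remaining:
        # Sequence start indicator is 1 for the first share and 0 afterwards.
        info_byte = (share_version << 1) | (1 if first else 0)
        prefix = namespace + bytes([info_byte])
        if first:
            # The blob length is written as the sequence length.
            prefix += len(blob).to_bytes(SEQUENCE_BYTES, "big")
        capacity = SHARE_SIZE - len(prefix)
        chunk, remaining = remaining[:capacity], remaining[capacity:]
        share = prefix + chunk
        # Fill the remainder of the last share with 0.
        shares.append(share + bytes(SHARE_SIZE - len(share)))
        first = False
    return shares
```

Under these assumptions the first share holds up to 478 bytes of blob data and each continuation share holds up to 482 bytes.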

Assumptions and Considerations

Shares are assumed to be byte slices of length 512. Parsing shares of a different length WILL result in an error.

Implementation

See go-square/shares.

References

Consensus Rules

System Parameters

Units

name	SI	value	description
`1u`	`1u`	`10**0`	`1` unit.
`2u`	`k1u`	`10**3`	`1000` units.
`3u`	`M1u`	`10**6`	`1000000` units.
`4u`	`G1u`	`10**9`	`1000000000` units.

Constants

name	type	value	unit	description
`AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`	`uint64`		`share`	Maximum number of rows/columns of the original data shares in square layout.
`AVAILABLE_DATA_ORIGINAL_SQUARE_TARGET`	`uint64`		`share`	Target number of rows/columns of the original data shares in square layout.
`BLOCK_TIME`	`uint64`		`second`	Block time, in seconds.
`CHAIN_ID`	`string`	`"Celestia"`		Chain ID. Each chain assigns itself a (unique) ID.
`GENESIS_COIN_COUNT`	`uint64`	`10**8`	`4u`	`(= 100000000)` Number of coins at genesis.
`MAX_GRAFFITI_BYTES`	`uint64`	`32`	`byte`	Maximum size of transaction graffiti, in bytes.
`MAX_VALIDATORS`	`uint16`	`64`		Maximum number of active validators.
`NAMESPACE_VERSION_SIZE`	`int`	`1`	`byte`	Size of namespace version in bytes.
`NAMESPACE_ID_SIZE`	`int`	`28`	`byte`	Size of namespace ID in bytes.
`NAMESPACE_SIZE`	`int`	`29`	`byte`	Size of namespace in bytes.
`NAMESPACE_ID_MAX_RESERVED`	`uint64`	`255`		Value of maximum reserved namespace (inclusive). 1 byte worth of IDs.
`SEQUENCE_BYTES`	`uint64`	`4`	`byte`	The number of bytes used to store the sequence length in the first share of a sequence.
`SHARE_INFO_BYTES`	`uint64`	`1`	`byte`	The number of bytes used for share information.
`SHARE_RESERVED_BYTES`	`uint64`	`4`	`byte`	The number of bytes used to store the index of the first transaction in a transaction share. Must be able to represent any integer up to and including `SHARE_SIZE - 1`.
`SHARE_SIZE`	`uint64`	`512`	`byte`	Size of transaction and blob shares, in bytes.
`SignerSize`	`int`	`20`	`byte`	The number of bytes used to store the signer in a share.
`STATE_SUBTREE_RESERVED_BYTES`	`uint64`	`1`	`byte`	Number of bytes reserved to identify state subtrees.
`UNBONDING_DURATION`	`uint32`		`block`	Duration, in blocks, for unbonding a validator or delegation.
`v1.Version`	`uint64`	`1`		First version of the application. Breaking changes (hard forks) must update this parameter.
`v2.Version`	`uint64`	`2`		Second version of the application. Breaking changes (hard forks) must update this parameter.
`VERSION_BLOCK`	`uint64`	`1`		Version of the Celestia chain. Breaking changes (hard forks) must update this parameter.

Rewards and Penalties

name	type	value	unit	description
`SECONDS_PER_YEAR`	`uint64`	`31536000`	`second`	Seconds per year. Omit leap seconds.
`TARGET_ANNUAL_ISSUANCE`	`uint64`	`2 * 10**6`	`4u`	`(= 2000000)` Target number of coins to issue per year.
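
As a worked example of the unit notation, the sketch below converts the coin-denominated constants into `1u`. It assumes that a value listed with unit `4u` is a multiple of `10**9` of the `1u` unit, as the Units table suggests; the helper `to_base_units` is hypothetical.

```python
# Multipliers from the Units table of this specification.
UNIT_VALUES = {"1u": 10**0, "2u": 10**3, "3u": 10**6, "4u": 10**9}

# Values from the Constants and Rewards and Penalties tables, both listed in 4u.
GENESIS_COIN_COUNT = 10**8
TARGET_ANNUAL_ISSUANCE = 2 * 10**6


def to_base_units(amount: int, unit: str) -> int:
    """Express an amount given in `unit` as a count of 1u (assumed reading)."""
    return amount * UNIT_VALUES[unit]


genesis_in_1u = to_base_units(GENESIS_COIN_COUNT, "4u")
issuance_in_1u = to_base_units(TARGET_ANNUAL_ISSUANCE, "4u")
```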

Leader Selection

Refer to the CometBFT specifications for proposer selection procedure.

Fork Choice

The Tendermint consensus protocol is fork-free by construction under an honest majority of stake assumption.

If a block has a valid commit, it is part of the canonical chain. If equivocation evidence is detected for more than 1/3 of voting power, the node must halt. See proof of fork accountability.

Block Validity

The validity of a newly-seen block, block, is determined by two components, detailed in subsequent sections:

Block structure: whether the block header is valid, and data in a block is arranged into a valid and matching data root (i.e. syntax).
State transition: whether the application of transactions in the block produces a matching and valid state root (i.e. semantics).

Pseudocode in this section is not in any specific language and should be interpreted as being in a neutral and sane language.

Block Structure

Before executing state transitions, the structure of the block must be verified.

The following block fields are acquired from the network and parsed (i.e. deserialized): block.header, block.availableDataHeader, and block.lastCommit. If any of them cannot be parsed, the block is ignored but is not explicitly considered invalid by consensus rules. Further implications of ignoring a block are found in the networking spec.

If the above fields are parsed successfully, the available data block.availableData is acquired in erasure-coded form as a list of share rows, then parsed. If it cannot be parsed, the block is ignored but not explicitly invalid, as above.

`block.header`

The block header block.header (header for short) is the first thing that is downloaded from the new block, and commits to everything inside the block in some way. For previous block prev (if prev is not known, then the block is ignored), and previous block header prev.header, the following checks must be true:

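availableDataOriginalSquareSize is computed as described for the block header: it is the smallest power of 2 whose square is at least header.availableDataOriginalSharesUsed.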

header.height == prev.header.height + 1.
header.timestamp > prev.header.timestamp.
header.lastHeaderHash == the header hash of prev.
header.lastCommitHash == the hash of lastCommit.
header.consensusHash == the value computed here.
header.stateCommitment == the root of the state, computed with the application of all state transitions in this block.
availableDataOriginalSquareSize <= AVAILABLE_DATA_ORIGINAL_SQUARE_MAX.
header.availableDataRoot == the Merkle root of the tree with the row and column roots of block.availableDataHeader as leaves.
header.proposerAddress == the leader for header.height.
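
A subset of the checks above can be sketched in Python. This is an illustration under stated assumptions, not a consensus implementation: the `Header` dataclass is a simplified stand-in with hypothetical field names, timestamps are plain integers, the hash and leader checks that need external data are omitted, and `square_max` stands in for `AVAILABLE_DATA_ORIGINAL_SQUARE_MAX`, whose value is not given in the Constants table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Header:
    # Simplified stand-in for the block header (hypothetical field names).
    height: int
    timestamp: int
    last_header_hash: bytes
    available_data_original_shares_used: int


def original_square_size(shares_used: int) -> int:
    """Smallest power of 2 whose square is at least shares_used."""
    size = 1
    while size * size < shares_used:
        size *= 2
    return size


def check_header(header: Header, prev: Header, prev_hash: bytes, square_max: int) -> List[str]:
    """Return the names of the failed checks (an empty list means all passed)."""
    failures = []
    if header.height != prev.height + 1:
        failures.append("height")
    if header.timestamp <= prev.timestamp:
        failures.append("timestamp")
    if header.last_header_hash != prev_hash:
        failures.append("lastHeaderHash")
    if original_square_size(header.available_data_original_shares_used) > square_max:
        failures.append("availableDataOriginalSquareSize")
    return failures
```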

`block.availableDataHeader`

The available data header block.availableDataHeader (availableDataHeader for short) is then processed. This commits to the available data, which is only downloaded after the consensus commit is processed. The following checks must be true:

Length of availableDataHeader.rowRoots == availableDataOriginalSquareSize * 2.
Length of availableDataHeader.colRoots == availableDataOriginalSquareSize * 2.
The length of each element in availableDataHeader.rowRoots and availableDataHeader.colRoots must be 32.
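
These three length checks can be expressed compactly. The Python sketch below is illustrative only; the function name `check_available_data_header` and the constant name `ROOT_SIZE` are hypothetical.

```python
from typing import Sequence

ROOT_SIZE = 32  # required length of each root, per the check above


def check_available_data_header(
    row_roots: Sequence[bytes],
    col_roots: Sequence[bytes],
    original_square_size: int,
) -> bool:
    """Check root counts and root lengths against the original square size."""
    # The extended square has twice as many rows and columns as the original.
    expected = original_square_size * 2
    if len(row_roots) != expected or len(col_roots) != expected:
        return False
    return all(len(root) == ROOT_SIZE for root in list(row_roots) + list(col_roots))
```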

`block.lastCommit`

The last commit block.lastCommit (lastCommit for short) is processed next. This is the Tendermint commit (i.e. polka of votes) for the previous block. For previous block prev and previous block header prev.header, the following checks must be true:

lastCommit.height == prev.header.height.
lastCommit.round >= 1.
lastCommit.headerHash == the header hash of prev.
Length of lastCommit.signatures <= MAX_VALIDATORS.
Each of lastCommit.signatures must be a valid CommitSig.
The sum of the votes for prev in lastCommit must be at least 2/3 (rounded up) of the voting power of prev's next validator set.
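
The voting power threshold in the last check can be computed with integer arithmetic only, which avoids floating-point rounding. The sketch below follows the wording above ("at least 2/3, rounded up"); the function names are hypothetical.

```python
def commit_threshold(total_voting_power: int) -> int:
    """Ceiling of 2/3 of the total voting power, using integers only."""
    return (2 * total_voting_power + 2) // 3


def has_sufficient_votes(votes_for_prev: int, total_voting_power: int) -> bool:
    """True if the summed votes reach the rounded-up 2/3 threshold."""
    return votes_for_prev >= commit_threshold(total_voting_power)
```

For example, with a total voting power of 100 the threshold is 67, so 66 voting power is not enough.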

`block.availableData`

The block's available data (analogous to transactions in contemporary blockchain designs) block.availableData (availableData for short) is finally processed. The list of share rows is parsed into the actual data structures using the reverse of the process to encode available data into shares; if parsing fails here, the block is invalid.

Once parsed, the following checks must be true:

The commitments of the erasure-coded extended availableData must match those in block.availableDataHeader. Implicitly, this means that both rows and columns must be ordered lexicographically by namespace since they are committed to in a Namespace Merkle Tree.
Length of availableData.intermediateStateRootData == length of availableData.transactionData + length of availableData.payForBlobData + 2. (Two additional state transitions are the begin and end block implicit transitions.)

State Transitions

Once the basic structure of the block has been validated, state transitions must be applied to compute the new state and state root.

For this section, the variable state represents the state tree, with state.accounts[k], state.inactiveValidatorSet[k], state.activeValidatorSet[k], and state.delegationSet[k] being shorthand for the leaf in the state tree in the accounts, inactive validator set, active validator set, and delegation set subtrees with pre-hashed key k. E.g. state.accounts[a] is shorthand for state[(ACCOUNTS_SUBTREE_ID << 8*(32-STATE_SUBTREE_RESERVED_BYTES)) | ((-1 >> 8*STATE_SUBTREE_RESERVED_BYTES) & hash(a))].
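
The shorthand above places the subtree ID in the most significant byte of a 32-byte key and keeps the remaining 31 bytes of the hash. The Python sketch below illustrates this with 256-bit integers. It makes several assumptions: SHA-256 is used as a stand-in for the hash function, `ACCOUNTS_SUBTREE_ID = 1` is a made-up value, and `-1` in the shorthand is read as the all-ones 256-bit value with a logical right shift.

```python
import hashlib

STATE_SUBTREE_RESERVED_BYTES = 1  # from the Constants table
ACCOUNTS_SUBTREE_ID = 1  # hypothetical value, for illustration only
KEY_BYTES = 32


def state_key(subtree_id: int, raw_key: bytes) -> int:
    """Compute the state tree key for raw_key in the given subtree (sketch)."""
    # Stand-in hash function; the specification's hash may differ.
    digest = int.from_bytes(hashlib.sha256(raw_key).digest(), "big")
    all_ones = (1 << (8 * KEY_BYTES)) - 1  # "-1" as an unsigned 256-bit value
    mask = all_ones >> (8 * STATE_SUBTREE_RESERVED_BYTES)
    prefix = subtree_id << (8 * (KEY_BYTES - STATE_SUBTREE_RESERVED_BYTES))
    return prefix | (mask & digest)
```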

State transitions are applied in the following order:

`block.availableData.transactionData`

Transactions are applied to the state. Note that transactions mutate the state (essentially, the validator set and minimal balances), while blobs do not.

block.availableData.transactionData is simply a list of WrappedTransactions. For each wrapped transaction in this list, wrappedTransaction, with index i (starting from 0), the following checks must be true:

wrappedTransaction.index == i.

For wrappedTransaction's transaction transaction, the following checks must be true:

transaction.signature must be a valid signature over transaction.signedTransactionData.

Finally, each wrappedTransaction is processed depending on its transaction type. These are specified in the next subsections, where tx is short for transaction.signedTransactionData, and sender is the recovered signing address. We will define a few helper functions:

tipCost(y, z) = y * z
totalCost(x, y, z) = x + tipCost(y, z)

where x above is the amount of coins sent by the transaction authorizer, y above is the tip rate set in the transaction, and z above is the measure of the block space used by the transaction (i.e. size in bytes).
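
As a short worked example, the two helpers can be written in Python as follows. The parameter names are descriptive renamings of x, y, and z and are not taken from the specification.

```python
def tip_cost(tip_rate: int, size_bytes: int) -> int:
    """tipCost(y, z) = y * z"""
    return tip_rate * size_bytes


def total_cost(amount: int, tip_rate: int, size_bytes: int) -> int:
    """totalCost(x, y, z) = x + tipCost(y, z)"""
    return amount + tip_cost(tip_rate, size_bytes)
```

A transaction that sends 100 coins with a tip rate of 2 and a size of 250 bytes has a tip cost of 500 and a total cost of 600.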

Four additional helper functions are defined to manage the validator queue:

findFromQueue(power), which returns the address of the last validator in the validator queue with voting power greater than or equal to power, or 0 if the queue is empty or no validator in the queue has voting power of at least power.
parentFromQueue(address), which returns the address of the parent in the validator queue of the validator with address address, or 0 if address is not in the queue or is the head of the queue.

validatorQueueInsert, defined as

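function validatorQueueInsert(validator)
    # Insert the new validator into the linked list.
    # sender is the recovered signing address of the transaction being
    # processed, i.e. the address of validator.
    parent = findFromQueue(validator.votingPower)
    if parent != 0
        # Splice the validator in directly after parent, which is stored in
        # either the active or the inactive validator set.
        if state.accounts[parent].status == AccountStatus.ValidatorBonded
            validator.next = state.activeValidatorSet[parent].next
            state.activeValidatorSet[parent].next = sender
        else
            validator.next = state.inactiveValidatorSet[parent].next
            state.inactiveValidatorSet[parent].next = sender
    else
        # No suitable parent: the validator becomes the new head of the queue.
        validator.next = state.validatorQueueHead
        state.validatorQueueHead = sender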

validatorQueueRemove, defined as

function validatorQueueRemove(validator, sender)
    # Remove existing validator from the linked list
    parent = parentFromQueue(sender)
    if parent != 0
        if state.accounts[parent].status == AccountStatus.ValidatorBonded
            state.activeValidatorSet[parent].next = validator.next
            validator.next = 0
        else
            state.inactiveValidatorSet[parent].next = validator.next
            validator.next = 0
    else
        state.validatorQueueHead = validator.next
        validator.next = 0

Note that light clients cannot perform a linear search through a linked list, and are instead provided logarithmic proofs (e.g. in the case of parentFromQueue, a proof to the parent is provided, which should have address as its next validator).
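
The queue helpers above can be exercised with a small in-memory model. The Python sketch below is a simplification under stated assumptions: it keeps a single linked list instead of separate active and inactive validator subtrees, it assumes the queue is ordered by descending voting power (so the search can stop at the first validator with less power), and it uses a linear search rather than the logarithmic proofs provided to light clients. The class and method names are hypothetical.

```python
class ValidatorQueue:
    """Singly linked validator queue keyed by address; 0 means "none"."""

    def __init__(self):
        self.head = 0
        self.next = {}   # address -> address of the next validator
        self.power = {}  # address -> voting power

    def find_from_queue(self, power):
        # Last validator with voting power >= power, or 0 if there is none.
        found, current = 0, self.head
        while current != 0 and self.power[current] >= power:
            found, current = current, self.next[current]
        return found

    def parent_from_queue(self, address):
        # Validator whose next pointer is address, or 0 if none or head.
        current = self.head
        while current != 0:
            if self.next[current] == address:
                return current
            current = self.next[current]
        return 0

    def insert(self, address, power):
        parent = self.find_from_queue(power)
        self.power[address] = power
        if parent != 0:
            self.next[address] = self.next[parent]
            self.next[parent] = address
        else:
            self.next[address] = self.head
            self.head = address

    def remove(self, address):
        # Assumes address is currently in the queue.
        parent = self.parent_from_queue(address)
        if parent != 0:
            self.next[parent] = self.next[address]
        else:
            self.head = self.next[address]
        self.next[address] = 0

    def addresses(self):
        out, current = [], self.head
        while current != 0:
            out.append(current)
            current = self.next[current]
        return out
```

Inserting validators with equal voting power places the newer one after the older one, because the search returns the last validator whose power is at least the new validator's power.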

In addition, three helper functions are defined to manage the blob paid list:

findFromBlobPaidList(start), which returns the transaction ID of the last transaction in the blob paid list with finish greater than start, or 0 if the list is empty or no transaction in the list has finish greater than start.
parentFromBlobPaidList(txid), which returns the transaction ID of the parent in the blob paid list of the transaction with ID txid, or 0 if txid is not in the list or is the head of the list.
blobPaidListInsert, defined as

function blobPaidListInsert(tx, txid)
    # Insert the new transaction into the linked list
    parent = findFromBlobPaidList(tx.blobStartIndex)
    state.blobsPaid[txid].start = tx.blobStartIndex
    numShares = ceil(tx.blobSize / SHARE_SIZE)
    state.blobsPaid[txid].finish = tx.blobStartIndex + numShares - 1
    if parent != 0
        state.blobsPaid[txid].next = state.blobsPaid[parent].next
        state.blobsPaid[parent].next = txid
    else
        state.blobsPaid[txid].next = state.blobPaidHead
        state.blobPaidHead = txid
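
The start and finish values recorded by blobPaidListInsert can be illustrated with a short Python sketch. It mirrors the pseudocode's formula, including its use of SHARE_SIZE as the divisor, and uses integer arithmetic for the ceiling. The function name `blob_share_range` is hypothetical.

```python
SHARE_SIZE = 512  # from the Constants table


def blob_share_range(blob_start_index: int, blob_size: int):
    """Return (start, finish) share indices as recorded in the blob paid list."""
    # Integer ceiling of blob_size / SHARE_SIZE.
    num_shares = -(-blob_size // SHARE_SIZE)
    return blob_start_index, blob_start_index + num_shares - 1
```

For example, a blob of 513 bytes starting at index 10 spans two shares, so its range is (10, 11).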

We define a helper function to compute F1 entries:

function compute_new_entry(reward, power)
    if power == 0
        return 0
    return reward // power

After applying a transaction, the new state root is computed.