Ton

Форк
0
/
tblkch.tex 
2327 строк · 270.5 Кб
1
\documentclass[12pt,oneside]{article}
2
\usepackage[T1]{fontenc}
3
%\usepackage{euler}
4
\usepackage{amssymb, amsmath, amsfonts, stmaryrd}
5
\usepackage[mathscr]{euscript}
6
\usepackage{mathrsfs}
7
\usepackage{theorem}
8
\usepackage[english]{babel}
9
\usepackage{bm}
10
\usepackage[all]{xy}
11
\usepackage{array}
12
\usepackage{multirow}
13
%\usepackage{chngcntr}
14
%\CompileMatrices
15
\usepackage[bookmarks=false,pdfauthor={Nikolai Durov},pdftitle={Telegram Open Network Blockchain}]{hyperref}
16
\usepackage{fancyhdr}
17
\usepackage{caption}
18
%
19
\setlength{\headheight}{15.2pt}
20
\pagestyle{fancy}
21
\renewcommand{\headrulewidth}{0.5pt}
22
%
23
\def\makepoint#1{\medbreak\noindent{\bf #1.\ }}
24
\def\zeropoint{\setcounter{subsection}{-1}}
25
\def\zerosubpoint{\setcounter{subsubsection}{-1}}
26
\def\nxpoint{\refstepcounter{subsection}%
27
  \smallbreak\makepoint{\thesubsection}}
28
\def\nxsubpoint{\refstepcounter{subsubsection}%
29
  \smallbreak\makepoint{\thesubsubsection}}
30
\def\nxsubsubpoint{\refstepcounter{paragraph}%
31
  \makepoint{\paragraph}}
32
%\setcounter{secnumdepth}{4}
33
%\counterwithin{paragraph}{subsubsection}
34
\def\refpoint#1{{\rm\textbf{\ref{#1}}}}
35
\let\ptref=\refpoint
36
\def\embt(#1.){\textbf{#1.}}
37
\def\embtx(#1){\textbf{#1}}
38
\def\emb#1{\textbf{#1.}}
39
\long\def\nodo#1{}
40
%
41
%\def\markbothsame#1{\markboth{#1}{#1}}
42
\fancyhf{}
43
\fancyfoot[C]{\thepage}
44
\def\markbothsame#1{\fancyhead[C]{#1}}
45
\def\mysection#1{\section{#1}\fancyhead[C]{\textsc{Chapter \textbf{\thesection.} #1}}}
46
\def\mysubsection#1{\subsection{#1}\fancyhead[C]{\small{\textsc{\textrm{\thesubsection.} #1}}}}
47
\def\myappendix#1{\section{#1}\fancyhead[C]{\textsc{Appendix \textbf{\thesection.} #1}}}
48
%
49
\let\tp=\textit
50
\let\vr=\textit
51
\def\workchainid{\vr{workchain\_id\/}}
52
\def\shardpfx{\vr{shard\_prefix}}
53
\def\accountid{\vr{account\_id\/}}
54
\def\currencyid{\vr{currency\_id\/}}
55
\def\uint{\tp{uint}}
56
\def\opsc#1{\operatorname{\textsc{#1}}}
57
\def\CellRepr{\opsc{CellRepr}}
58
\def\blkseqno{\opsc{blk-seqno}}
59
\def\blkprev{\opsc{blk-prev}}
60
\def\blkhash{\opsc{blk-hash}}
61
\def\Hash{\opsc{Hash}}
62
\def\Sha{\opsc{sha256}}
63
\def\SHA#1{\opsc{sha#1}}
64
\def\Int{\opsc{int}}
65
\def\height{\opsc{height}}
66
\def\len{\opsc{len}}
67
\def\leaf{\opsc{Leaf}}
68
\def\node{\opsc{Node}}
69
\def\Seqno{\opsc{SeqNo}}
70
\def\LT{\opsc{Lt}}
71
\def\NextHop{\opsc{NextHop}}
72
\def\root{\opsc{Root}}
73
\def\emptyroot{\opsc{EmptyRoot}}
74
\def\code{\opsc{code}}
75
\def\Ping{\opsc{Ping}}
76
\def\Store{\opsc{Store}}
77
\def\FindNode{\opsc{Find\_Node}}
78
\def\FindValue{\opsc{Find\_Value}}
79
\def\Bytes{\tp{Bytes}}
80
\def\Transaction{\tp{Transaction}}
81
\def\Account{\tp{Account}}
82
\def\State{\tp{State}}
83
\def\Maybe{\opsc{Maybe}}
84
\def\List{\opsc{List}}
85
\def\Block{\tp{Block}}
86
\def\Blockchain{\tp{Blockchain}}
87
\def\isValidBc{\tp{isValidBc}}
88
\def\evtrans{\vr{ev\_trans}}
89
\def\evblock{\vr{ev\_block}}
90
\def\Hashmap{\tp{Hashmap}}
91
\def\HashmapE{\tp{HashmapE}}
92
\def\Type{\tp{Type}}
93
\def\nat{\tp{nat\/}}
94
\def\hget{\vr{hget\/}}
95
\def\bbB{{\mathbb{B}}}
96
\def\bbP{{\mathbb{P}}}
97
\def\bbF{{\mathbb{F}}}
98
\def\bbZ{{\mathbb{Z}}}
99
\def\st#1{{\mathbf{#1}}}
100
\def\sgn{\operatorname{sgn}}
101
\def\charact{\operatorname{char}}
102
\def\caret{\^{}}
103
\def\cF{\mathscr{F}}
104
%
105
\hfuzz=0.8pt
106

107
\title{Telegram Open Network Blockchain}
108
\author{Nikolai Durov}
109
\begin{document}
110

111
%\pagestyle{myheadings}
112
\maketitle
113

114
\begin{abstract}
115
  The aim of this text is to provide a detailed description of the
116
  Telegram Open Network (TON) Blockchain.
117
\end{abstract}
118

119
\section*{Introduction}
120
\markbothsame{Introduction}
121

122
This document provides a detailed description of the TON Blockchain, including its precise block format, validity conditions, TON Virtual Machine (TVM) invocation details, smart-contract creation process, and cryptographic signatures. In this respect it is a continuation of the TON whitepaper (cf.~\cite{TON}), so we freely use the terminology introduced in that document.
123

124
Chapter~\ptref{sect:overview} provides a general overview of the TON Blockchain and its design principles, with particular attention to the introduction of compatibility and validity conditions and the implementation of message delivery guarantees. More detailed information, such as the TL-B schemes that describe the serialization of all required data structures into trees or collections (``bags'') of cells, is provided in subsequent chapters, culminating in a complete description of the TON Blockchain (shardchain and masterchain) block layout in Chapter~\ptref{sect:block.layout}.
125

126
A detailed description of the elliptic curve cryptography used for signing blocks and messages, also accessible through TVM primitives, is provided in Appendix~\ptref{app:ecc}. TVM itself is described in a separate document (cf.~\cite{TVM}).
127

128
Some subjects have intentionally been left out of this document. One is the Byzantine Fault Tolerant (BFT) protocol used by the validators to determine the next block of the masterchain or a shardchain; that subject is left for a forthcoming document dedicated to the TON Network. And although this document describes the precise format of TON Blockchain blocks, and discusses the blockchain's validity conditions and serialized invalidity proofs,\footnote{As of August 2018, this document does not include a detailed description of serialized invalidity proofs, because they are likely to change significantly during the development of the validator software. Only the general design principles for consistency conditions and serialized invalidity proofs are discussed.}  it provides no details about the network protocols used to propagate these blocks, block candidates, collated blocks, and invalidity proofs.
129

130
Similarly, this document does not provide the complete source code of the masterchain smart contracts used to elect the validators, change the configurable parameters or get their current values, or punish the validators for their misbehavior, even though these smart contracts form an important part of the total blockchain state and of the masterchain block zero. Instead, this document describes the location of these smart contracts and their formal interfaces.\footnote{This is not included in the present version of this document, but will be provided in a separate appendix to a future revision.} The source code of these smart contracts will be provided separately as downloadable files with comments.
131

132
Please note that the current version of this document describes a preliminary test version of the TON Blockchain; some minor details are likely to change prior to launch during the development, testing, and deployment phases.
133

134
\clearpage
135
\tableofcontents
136

137
\clearpage
138
\mysection{Overview}\label{sect:overview}
139

140
This chapter provides an overview of the main features and design principles of the TON Blockchain. More detail on each topic is provided in subsequent chapters.
141

142
\mysubsection{Everything is a bag of cells}
143
All data in the blocks and state of the TON Blockchain is represented as a collection of {\em cells\/} (cf.~\cite[2.5]{TON}). Therefore, this chapter begins with a general discussion of cells.
144

145
\nxsubpoint\emb{TVM cells}
146
Recall that the TON Blockchain, as well as the TON Virtual Machine (TVM; cf.~\cite{TVM}), represents all permanently stored data as a {\em collection\/} or {\em bag\/} of so-called {\em cells}. Each cell consists of up to 1023 data bits and up to four references to other cells. Cyclic cell references are not allowed, so the cells are usually organized into {\em trees of cells\/}, or rather {\em directed acyclic graphs (DAGs) of cells}.\footnote{Completely identical cells are often identified in memory and in disk storage; this is the reason why trees of cells are transparently transformed into DAGs of cells. From this perspective, a DAG is just a storage optimization of the underlying tree of cells, irrelevant for most considerations.} Any value of an abstract algebraic (dependent) data type may be represented (serialized) as a tree of cells. The precise way of representing values of an abstract data type as a tree of cells is expressed by means of a {\em TL-B scheme\/}.\footnote{Cf.~\cite[3.3.3--4]{TVM}, where an example is given and explained, pending a more complete reference} A more thorough discussion of different kinds of cells may be found in~\cite[3.1]{TVM}.
147

148
\nxsubpoint\emb{Application to TON Blockchain blocks and state}
149
The above is particularly applicable to the blocks and state of the TON Blockchain, which also are values of certain (quite convoluted) dependent algebraic data types. Therefore, they are serialized according to various TL-B schemes (which are gradually presented throughout this document), and are represented as a collection or bag of cells.
150

151
\nxsubpoint\label{sp:data.cell.layout}\emb{The layout of a single cell}
152
Each single cell consists of up to 1023 data bits and up to four references to other cells. When a cell is kept in memory, its exact representation is implementation-dependent. However, there is a standard representation of cells, useful, for instance, for serializing cells for file storage or network transmission. This ``standard representation'' or ``standard layout'' $\CellRepr(c)$ of a cell $c$ consists of the following:
153
\begin{itemize}
154
\item Two {\em descriptor bytes} come first, sometimes denoted by $d_1$ and $d_2$. The first of these bytes $d_1$ equals (in the simplest case) the number of references $0\leq r\leq 4$ in the cell. The second descriptor byte $d_2$ encodes the bit length $l$ of the data part of the cell as follows: the first seven bits of $d_2$ equal $\lfloor l/8\rfloor$, the number of complete data bytes present in the cell, while the last bit of $d_2$ is the {\em completion tag}, equal to one if $l$ is not divisible by eight. Therefore,
155
\begin{equation}
156
  d_2=2\lfloor l/8\rfloor+[l\bmod 8\neq0]=\lfloor l/8\rfloor+\lceil l/8\rceil
157
\end{equation}
158
where $[A]$ equals one when condition $A$ is true, and zero otherwise.
159
\item Next, $\lceil l/8\rceil$ data bytes follow. This means that the $l$ data bits of the cell are split into groups of eight, and each group is interpreted as a big-endian 8-bit integer and stored into a byte. If $l$ is not divisible by eight, a single binary one and a suitable number of binary zeroes (up to six) are appended to the data bits, and the completion tag (the least significant bit of the descriptor byte $d_2$) is set.
160
\item Finally, $r$ references to other cells follow. Each reference is normally represented by 32 bytes containing the $\Sha$ hash of the referenced cell, computed as explained below in~\ptref{sp:sha.cell.hash}.
161
\end{itemize}
162
In this way, the standard representation $\CellRepr(c)$ of a cell $c$ with $l$ data bits and $r$ references is $2+\lfloor l/8\rfloor+\lceil l/8\rceil+32r$ bytes long.
163

164
\nxsubpoint\label{sp:sha.cell.hash}\emb{The $\Sha$ hash of a cell}
165
The $\Sha$ hash of a cell $c$ is recursively defined as the $\Sha$ of the standard representation $\CellRepr(c)$ of the cell in question:
166
\begin{equation}
167
  \Hash(c):=\Sha(c):=\Sha\bigl(\CellRepr(c)\bigr)
168
\end{equation}
169
Because cyclic cell references are not allowed (the relationships among all cells must constitute a directed acyclic graph, or DAG), the $\Sha$ hash of a cell is always well-defined.
170

171
Furthermore, because $\Sha$ is tacitly assumed to be collision-resistant, we assume that all the cells that we encounter are completely determined by their hashes. In particular, the cell references of a cell $c$ are completely determined by the hashes of the referenced cells, contained in the standard representation $\CellRepr(c)$.
172

173
\nxsubpoint\emb{Exotic cells}
174
Apart from the {\em ordinary\/} cells (also called {\em simple\/} or {\em data\/} cells) considered so far, cells of other types, called {\em exotic cells}, sometimes appear in the actual representations of TON Blockchain blocks and other data structures. Their representation is somewhat different; they are distinguished by having the first descriptor byte $d_1\geq 5$ (cf.~\cite[3.1]{TVM}).
175

176
\nxsubpoint\emb{External reference cells}
177
{\em (External) reference cells}, which contain the 32-byte $\Sha(c)$ of a ``true'' data cell $c$ instead of the data cell itself, are one example of exotic cells. These cells can be used in the serialization of a bag of cells corresponding to a TON Blockchain block in order to refer to data cells absent in the serialization of the block itself, but assumed to be present somewhere else (e.g., in the previous state of the blockchain).
178

179
\nxsubpoint\emb{Transparency of reference cells with respect to most operations}
180
Most cell operations do not observe any reference cells or other ``exotic'' kinds of cells; they see only data cells, with any reference cell transparently replaced by the cell referred to. For example, when the {\em transparent\/} cell hash $\Hash^\flat(c)$ is recursively computed, the hash of a reference cell is set to be equal to the hash of the cell referred to, not the hash of the standard representation of the reference cell.
181

182
\nxsubpoint\emb{Transparent hash and representation hash of a cell}
183
In this way, $\Sha^\flat(c)=\Hash^\flat(c)$ is the {\em transparent hash} of a cell $c$ (or the tree of cells rooted in $c$). 
184

185
However, sometimes we need to reason about the exact representation of a tree of cells present in a block. To this end, a {\em representation hash\/} $\Hash^\sharp(c)$ is defined, which is not transparent with respect to reference cells and other exotic types of cells. We often say that the representation hash of~$c$ is ``the'' hash of~$c$, because it is the most frequently used hash of a cell.
186

187
\nxsubpoint\label{sp:sign.repr.hash}\emb{Use of representation hashes for signatures}
188
Signatures are an excellent example of the application of representation hashes. For instance:
189
\begin{itemize}
190
\item Validators sign the representation hash of a block, not just its transparent hash, because they need to certify that the block does contain the required data, not just some external references to them.
191
\item When external messages are signed and sent by off-chain parties (e.g., human clients using an application to initiate blockchain transactions), if external references may be present in some of these messages, it is the representation hashes of the messages that must be signed.
192
\end{itemize}
193

194
\nxsubpoint\emb{Higher hashes of a cell}
195
In addition to the transparent and representation hashes of a cell~$c$, a sequence of {\em higher hashes\/} $\Hash_i(c)$, $i=1,2,\dots$ may be defined, which eventually stabilizes at $\Hash_\infty(c)$. (More detail may be found in~\cite[3.1]{TVM}.)
196

197
\mysubsection{Principal components of a block and the blockchain state}
198
This section briefly describes the principal components of a block and of the blockchain state, without delving too much into the details.
199

200
\nxsubpoint\label{sp:isp.blk.state}\emb{The Infinite Sharding Paradigm (ISP) applied to blockchain block and state}
201
Recall that according to the Infinite Sharding Paradigm, each account can be considered as lying in its separate ``accountchain'', and the (virtual) blocks of these accountchains are then grouped into shardchain blocks for efficiency purposes. Specifically, the state of a shardchain consists, roughly speaking, of the states of all its ``accountchains'' (i.e., of all accounts assigned to it); similarly, a block of a shardchain essentially consists of a collection of virtual ``blocks'' for some accounts assigned to the shardchain.\footnote{If there are no transactions related to an account, the corresponding virtual block is empty and is omitted in the shardchain block}
202

203
We can summarize this as follows:
204
\begin{align}\label{eq:sstate.approx}
205
  \textit{ShardState}&\approx\Hashmap(n,\textit{AccountState})\\
206
  \textit{ShardBlock}&\approx\Hashmap(n,\textit{AccountBlock})
207
\end{align}
208
where $n$ is the bit length of the $\accountid$, and $\Hashmap(n,X)$ describes a partial map $\st2^n\dashrightarrow X$ from bitstrings of length $n$ into values of type~$X$.
209

210
Recall that each shardchain---or, more precisely, each shardchain block\footnote{Recall that TON Blockchain supports {\em dynamic\/} sharding, so the shard configuration may change from block to block because of shard merge and split events. Therefore, we cannot simply say that each shardchain corresponds to a fixed set of accountchains.}---corresponds to all accountchains that belong to the same ``workchain'' (i.e., have the same $\workchainid=w$) and have an $\accountid$ beginning with the same binary prefix $s$, so that $(w,s)$ completely determines a shard. Therefore, the above hashmaps must contain only keys beginning with prefix~$s$.
211

212
We will see in a moment that the above description is only an approximation: the state and block of the shardchain need to contain some extra data that are not split according to the $\accountid$ as suggested by~\eqref{eq:sstate.approx}.
213

214
\nxsubpoint\label{sp:split.blk.part}\emb{Split and non-split part of the shardchain block and state}
215
A shardchain block and its state may each be classified into two distinct parts. The parts with the ISP-dictated form of \eqref{eq:sstate.approx} will be called the {\em split\/} parts of the block and its state, while the remainder will be called the {\em non-split\/} parts.
216

217
\nxsubpoint\label{sp:blk.inter.1}\emb{Interaction with other blocks and the outside world. Global and local consistency conditions}
218
The non-split parts of the shardchain block and its state are mostly related to the interaction of this block with some other ``neighboring'' blocks. The global consistency conditions of the blockchain as a whole are reduced to internal consistency conditions of separate blocks by themselves as well as external local consistency conditions between certain blocks (cf.~\ptref{p:cons.cond}).
219

220
Most of these local consistency conditions are related to message forwarding between different shardchains, transactions involving more than one shardchain, and message delivery guarantees. However, another group of local consistency conditions relates a block with its immediate antecessors and successors inside a shardchain; for instance, the initial state of a block usually must coincide with the final state of its immediate antecessor.\footnote{This condition applies if there is exactly one immediate antecessor (i.e., if a shardchain merge event did not occur immediately before the block in question); otherwise, this condition becomes more convoluted.}
221

222
\nxsubpoint\emb{Inbound and outbound messages of a block}
223
The most important components of the non-split part of a shardchain block are the following:
224
\begin{itemize}
225
\item {\em InMsgDescr} --- The description of all messages ``imported'' into this block (i.e., either processed by a transaction included in the block, or forwarded to an output queue, in the case of a transit message travelling along the path dictated by Hypercube Routing).
226
\item {\em OutMsgDescr} --- The description of all messages ``exported'' or ``generated'' by the block (i.e., either messages generated by a transaction included in the block, or transit messages with destination not belonging to the current shardchain, forwarded from {\em InMsgDescr}).
227
\end{itemize}
228

229
\nxsubpoint\label{sp:blk.hdr}\emb{Block header}
230
Another non-split component of a shardchain block is the {\em block header}, which contains general information such as $(w,s)$ (i.e., the $\workchainid$ and the common binary prefix of all $\accountid$s assigned to the current shardchain), the block's {\em sequence number\/} (defined to be the smallest non-negative integer larger than the sequence numbers of its predecessors), {\em logical time}, and {\em generation unixtime}. It also contains the hash of the immediate antecessor of the block (or of its two immediate antecessors in the case of a preceding shardchain merge event), the hashes of its initial and final states (i.e., of the states of the shardchain immediately before and immediately after processing the current block), and the hash of the most recent masterchain block known when the shardchain block was generated.
231

232
\nxsubpoint\label{sp:val.sign}\emb{Validator signatures, signed and unsigned blocks}
233
The block described so far is an {\em unsigned block}; it is generated in its entirety and considered as a whole by the validators. When the validators ultimately sign it, the {\em signed block} is created, consisting of the unsigned block along with a list of validator signatures (of a certain representation hash of the unsigned block, cf.~\ptref{sp:sign.repr.hash}). This list of signatures is also a non-split component of the (signed) block; however, since it lies outside the unsigned block, it is somewhat different from the other data kept in a block.
234

235
\nxsubpoint\label{sp:outmsgq}\emb{Outbound message queue of a shardchain}
236
Similarly, the most important non-split part of the shardchain state is {\em OutMsgQueue}, the outbound message queue. It contains {\em undelivered\/} messages included into {\em OutMsgDescr\/}, either by the last shardchain block leading to this state or by one of its antecessors.
237

238
Originally, each outbound message is included into {\em OutMsgQueue}; it is removed from the queue only after it has either been included into the {\em InMsgDescr\/} of a block of a ``neighboring'' shardchain (the next one with respect to Hypercube Routing), or has been delivered to (i.e., has appeared in the {\em InMsgDescr\/} of) its ultimate destination shardchain via Instant Hypercube Routing. In both cases, the {\em reason\/} for the removal of a message from the {\em OutMsgQueue\/} is made explicit in the {\em OutMsgDescr\/} of the block in which such a state transformation has occurred.
239

240
\nxsubpoint\emb{Layout of {\em InMsgDescr}, {\em OutMsgDescr} and {\em OutMsgQueue}}
241
All of the most important non-split shardchain data structures related to messages are organized as {\em hashmaps\/} or {\em dictionaries\/} (implemented by means of Patricia trees serialized into a tree of cells as described in \cite[3.3]{TVM}), with the following keys:
242
\begin{itemize}
243
\item The inbound message description {\em InMsgDescr\/} uses the 256-bit message hash as a key.
244
\item The outbound message description {\em OutMsgDescr\/} uses the 256-bit message hash as a key.
245
\item The outbound message queue {\em OutMsgQueue\/} uses the 352-bit concatenation of the 32-bit destination $\workchainid$, the first 64 bits of destination address $\accountid$, and the 256-bit message hash as a key.
246
\end{itemize}
247

248
\nxsubpoint\emb{The split part of the block: transaction chains}
249
The split part of a shardchain block consists of a hashmap mapping some of the accounts assigned to the shardchain to ``virtual accountchain blocks'' {\em AccountBlock}, cf.~\eqref{eq:sstate.approx}. Such a virtual accountchain block consists of a sequential list of {\em transactions} related to that account.
250

251
\nxsubpoint\emb{Transaction description}
252
Each transaction is described in the block by an instance of the {\em Transaction\/} type, which contains in particular the following information:
253
\begin{itemize}
254
\item A reference to exactly one {\em inbound message\/} (which must be present in {\em InMsgDescr\/} as well) that has been {\em processed\/} by the transaction.
255
\item References to  several (maybe zero) {\em outbound messages\/} (also present in {\em OutMsgDescr\/} and most likely included in {\em OutMsgQueue}) that have been {\em generated\/} by the transaction.
256
\end{itemize}
257

258
The transaction consists of an invocation of TVM (cf. \cite{TVM}) with the code of the smart contract corresponding to the account in question loaded into the virtual machine, and with the data root cell of the smart contract loaded into the virtual machine's register \texttt{c4}. The inbound message itself is passed in the stack as an argument to the smart contract's {\tt main()} function, along with some other important data, such as the amount of TON Grams and other defined currencies attached to the message, the sender account address, the current balance of the smart contract, and so on.
259

260
In addition to the information listed above, a {\em Transaction\/} instance also contains the original and final states of the account (i.e., of the smart contract), as well as some of the TVM running statistics (gas consumed, gas price, instructions performed, cells created/destroyed, virtual machine termination code, etc.).
261

262
\nxsubpoint\emb{The split part of the shardchain state: account states}
263
Recall that, according to \eqref{eq:sstate.approx}, the split part of the shardchain state consists of a hashmap mapping each ``defined'' account identifier (belonging to the shardchain in question) to the {\em state\/} of the corresponding account, given by an instance of the {\em AccountState\/} type.
264

265
\nxsubpoint\emb{Account state}
266
The account state itself approximately consists of the following data:
267
\begin{itemize}
268
\item Its {\em balance} in Grams and (optionally) in some other defined cryptocurrencies/tokens.
269
\item The {\em smart-contract code}, or the hash of the smart-contract code if it will be provided (uploaded) later by a separate message.
270
\item The persistent {\em smart-contract data}, which can be empty for simple smart contracts. It is a tree of cells, the root of which is loaded into register {\tt c4} during smart-contract execution.
271
\item Its {\em storage usage statistics}, including the number of cells and bytes kept in the persistent storage of the smart contract (i.e., inside the blockchain state) and the last time a storage usage payment was exacted from this account.
272
\item An optional {\em formal interface description} (intended for smart contracts) and/or {\em user public information} (intended mostly for human users and organizations).
273
\end{itemize}
274
Notice that there is no distinction between ``smart contract'' and ``account'' in the TON Blockchain. Instead, ``simple'' or ``wallet'' accounts, typically employed by human users and their cryptocurrency wallet applications for simple cryptocurrency transfers, are just simple smart contracts with standard (shared) code and with persistent data consisting of the public key of the wallet (or several public keys in the case of a multi-signature wallet; cf.~\ptref{sp:ex.simple.wallet} for more detail).
275

276
\nxsubpoint\emb{Masterchain blocks}
277
In addition to shardchain blocks and their states, the TON Blockchain contains {\em masterchain blocks\/} and the {\em masterchain state\/} (also called the {\em global state}). The masterchain blocks and state are quite similar to the shardchain blocks and state considered so far, with some notable differences:
278
\begin{itemize}
279
\item The masterchain cannot be split or merged, so a masterchain block usually has exactly one immediate antecessor. The sole exception is the ``masterchain block zero'', distinguished by having a sequence number equal to zero; it has no antecessors at all, and contains the initial configuration of the whole TON Blockchain (e.g., the original set of validators).
280
\item The masterchain blocks contain another important non-split structure: {\em ShardHashes}, a binary tree with a list of all defined shardchains along with the hashes of the latest block inside each of the listed shardchains. It is the inclusion of a shardchain block into this structure that makes a shardchain block ``canonical'', and enables other shardchains' blocks to refer to data (e.g., outbound messages) contained in the shardchain block.
281
\item The state of the masterchain contains global configuration parameters of the whole TON Blockchain, such as the minimum and maximum gas prices, the supported versions of TVM, the minimum stake for the validator candidates, the list of alternative cryptocurrencies supported in addition to Grams, the total amount of Grams issued so far, and the current set of validators responsible for creating and signing new blocks, along with their public keys.
282
\item The state of the masterchain also contains the code of the smart contracts used to elect the subsequent sets of validators and to modify the global configuration parameters. The code of these smart contracts itself is a part of the global configuration parameters and can be modified accordingly. In this respect, this code (along with the current values of these parameters) functions like a ``constitution'' for the TON Blockchain. It is initially established in masterchain block zero.
283
\item There are no transit messages through the masterchain: each inbound message must have a destination inside the masterchain, and each outbound message must have a source inside the masterchain.
284
\end{itemize}
285

286
\mysubsection{Consistency conditions}\label{p:cons.cond}
287
In addition to the data structures contained in the block and in the blockchain state, which are serialized into bags of cells according to certain TL-B schemes explained in detail later (cf. Chapters \ptref{sect:msg}--\ptref{sect:block.layout}), an important component of the blockchain layout is the {\em consistency conditions\/} between data kept inside one or in different blocks (as mentioned in~\ptref{sp:blk.inter.1}). This section describes in detail the function of consistency conditions in the blockchain.
288

289
\nxsubpoint\emb{Expressing consistency conditions}
290
In principle, dependent data types (such as those used in TL-B) could be used not only to describe the serialization of block data, but also to express conditions imposed on the components of such data types. (For instance, one could define data type \textit{OrderedIntPair}, with pairs of integers $(x,y)$, such that $x<y$, as values.) However, TL-B currently is not expressive enough to encode all the consistency conditions we need, so we opt for a semi-formalized approach in this text. In the future, we may present a subsequent complete formalization in a suitable proof assistant such as Coq.
291

292
\nxsubpoint\emb{Importance of consistency conditions}
293
The consistency conditions ultimately are at least as important as the ``unrestricted'' data structures on which they are imposed, especially in the blockchain context. For instance, the consistency conditions ensure that the state of an account does not change between blocks, and that it can change within a block only as a result of a transaction. In this way, the consistency conditions ensure the safe storage of cryptocurrency balances and other information inside the blockchain.
294

295
\nxsubpoint\emb{Kinds of consistency conditions}
296
There are several kinds of consistency conditions imposed on the TON Blockchain:
297
\begin{itemize}
298
\item {\em Global conditions} --- Express the invariants throughout the entire TON Blockchain. For instance, the {\em message delivery guarantees}, which assert that each message generated must be delivered to its destination account and delivered exactly once, are part of the global conditions.
299
\item {\em Internal (local) conditions} --- Express the conditions imposed on the data kept inside one block. For example, each transaction included in the block (i.e., present in the transaction list of some account) processes exactly one inbound message; this inbound message must be listed in the {\em InMsgDescr\/} structure of the block as well.
300
\item {\em External (local) conditions} --- Express the conditions imposed on the data of different blocks, usually belonging to the same or to neighboring shardchains (with respect to Hypercube Routing). Therefore, the external conditions come in several flavors:
301
  \begin{itemize}
302
  \item {\em Antecessor/successor conditions} --- Express the conditions imposed on the data of some block and of its immediate antecessor or (in the case of a preceding shardchain merge event) two immediate antecessors. The most important of these conditions is the one stating that the initial state for a shardchain block must coincide with final shardchain state of the immediate antecessor block, provided no shardchain split/merge event happened in between.
303
  \item {\em Masterchain/shardchain conditions} --- Express the conditions imposed on a shardchain block and on the masterchain block that refers to it in its {\em ShardHashes\/} list or is referred to in the header of the shardchain block.
304
  \item {\em Neighbor (block) conditions} --- Express the relations between the blocks of neighboring shardchains with respect to Hypercube Routing. The most important of these conditions express the relation between the {\em InMsgDescr\/} of a block and the {\em OutMsgQueue\/} of the state of a neighboring block.
305
  \end{itemize}
306
\end{itemize}
307

308
\nxsubpoint\emb{Decomposition of global and local conditions into simpler local conditions}
309
The {\em global\/} consistency conditions, such as the message delivery guarantees, are truly necessary for the blockchain to work properly; however, they are hard to enforce and verify directly. Therefore, we instead introduce a lot of simpler {\em local\/} consistency conditions, which are easier to enforce and verify since they involve only one block, or perhaps two adjacent blocks. These local conditions are chosen in such a fashion that the desired global conditions are logical consequences of (the conjunction of) all the local conditions. In this respect, we say that the global conditions have been ``decomposed'' into simpler local conditions.
310

311
Sometimes a local condition still turns out to be too cumbersome to enforce or verify. In that case it is decomposed further, into even simpler local conditions.
312

313
\nxsubpoint\emb{Decomposition may require additional data structures and additional internal consistency conditions}
314
The decomposition of a condition into simpler local consistency conditions sometimes requires the introduction of additional data structures. For example, the {\em InMsgDescr\/} explicitly lists all inbound messages processed in a block, even if this list might have been obtained by scanning the list of all the transactions present in the block. However, {\em InMsgDescr\/} greatly simplifies the neighbor conditions related to message forwarding and routing, which ultimately add up to the global message delivery guarantees.
315

316
Notice that the introduction of such additional data structures is a sort of ``database denormalization'' (i.e., it leads to some redundancy, or to some data being present more than once), and therefore more internal consistency conditions need to be imposed (e.g., if some data are now present in two copies, we must require that these two copies coincide). For instance, once we introduce {\em InMsgDescr\/} to facilitate message forwarding between shardchains, we need to introduce internal consistency conditions relating {\em InMsgDescr\/} to the transaction list of the same block.
317

318
\nxsubpoint\emb{Correct serialization conditions}
319
Apart from the high-level internal consistency conditions, which treat the contents of a block as a value of an abstract data type, there are some lower-level internal consistency conditions, called ``(correct) serialization conditions'', which ensure that the tree of cells present in the block is indeed a valid serialization of a value of the expected abstract data type. Such serialization conditions can be automatically generated from the TL-B scheme describing the abstract data type and its serialization into a tree of cells.
320

321
Notice that the serialization conditions are a set of mutually recursive predicates on cells or cell slices. For example, if a value of type $A$ consists of a 32-bit magic number $m_A$, a 64-bit integer $l$, and two references to cells containing values of types $B$ and $C$, respectively, then the correct serialization condition for values of type $A$ will require a cell or a cell slice to contain exactly 96 data bits and two cell references $r_1$ and~$r_2$, with the additional requirements that the first 32 data bits contain $m_A$, and the two cells referred to by $r_1$ and $r_2$ satisfy the serialization conditions for values of types $B$ and~$C$, respectively.
322

323
\nxsubpoint\label{sp:c.exist.elim}
324
\emb{Constructive elimination of existence quantifiers}
325
The local conditions one might want to impose sometimes are {\em non-constructible}, meaning that they do not necessarily contain an explanation of why they are true. A typical example of such a condition $C$ is given by
326
\begin{equation}\label{eq:nonc.sample}
327
  C:\equiv\forall_{(x:X)}\exists_{(y:Y)}A(x,y)\quad,
328
\end{equation}
329
``for any $x$ from $X$, there is a $y$ from $Y$ such that condition $A(x,y)$ holds''. Even if we know $C$ to be true, we do not have a way of quickly finding a $y:Y$, such that $A(x,y)$, for a given $x:X$. As a consequence, the verification of $C$ may be quite time-consuming.
330

331
In order to simplify the verification of local conditions, they are made {\em constructible\/} (i.e., verifiable in bounded time) by adding some {\em witness\/} data structures. For instance, condition $C$ of \eqref{eq:nonc.sample} may be transformed by adding a new data structure $f:X\to Y$ (a map $f$ from $X$ to $Y$) and imposing the following condition $C'$ instead:
332
\begin{equation}
333
  C':\equiv\forall_{(x:X)}A\bigl(x,f(x)\bigr)\quad.
334
\end{equation}
335
Of course, the ``witness'' value $f(x):Y$ may be included inside the (modified) data type $X$ instead of being kept in a separate table~$f$.
336

337
\nxsubpoint\label{sp:ex.exist.elim}\emb{Example: consistency condition for {\em InMsgDescr}}
338
For instance, the consistency condition between $X:=\textit{InMsgDescr}$, the list of all inbound messages processed in a block, and $Y:=\textit{Transactions}$, the list of all transactions present in a block, is of the above sort: ``For any input message $x$ present in \textit{InMsgDescr}, a transaction $y$ must be present in the block such that $y$ processes $x$''.\footnote{This example is a bit simplified since it does not take into account the presence of transit messages in \textit{InMsgDescr}, which are not processed by any explicit transaction.} The procedure of $\exists$-elimination described in \ptref{sp:c.exist.elim} leads us to introduce an additional field in the inbound message descriptors of \textit{InMsgDescr}, containing a reference to the transaction in which the message is actually processed.
339

340
\nxsubpoint\label{sp:c.disj.elim}
341
\emb{Constructive elimination of logical disjunctions}
342
Similarly to the transformation described in~\ptref{sp:c.exist.elim}, condition
343
\begin{equation}
344
  D:\equiv\forall_{(x:X)}\bigl(A_1(x)\vee A_2(x)\bigr)\quad,
345
\end{equation}
346
``for all $x$ from $X$, at least one of $A_1(x)$ and $A_2(x)$ holds'', may be transformed into a function $i:X\to\st2=\{1,2\}$ and a new condition
347
\begin{equation}
348
  D':\equiv\forall_{(x:X)}A_{i(x)}(x)
349
\end{equation}
350
This is a special case of the existential quantifier elimination considered before for $Y=\st2=\{1,2\}$. It may be useful when $A_1(x)$ and $A_2(x)$ are complicated conditions that cannot be verified quickly, so that it is useful to know in advance which of them is in fact true.
351

352
For instance, \textit{InMsgDescr\/}, as considered in~\ptref{sp:ex.exist.elim}, can contain both messages processed in the block and transit messages. We might introduce a field in the inbound message description to indicate whether the message is transit or not, and, in the latter case, include a witness field for the transaction processing the message.
353

354
\nxsubpoint\label{sp:cond.cvize}\emb{Constructivization of conditions}
355
This process of eliminating the non-constructible logical binders $\exists$ (existence quantifier) and (sometimes) $\vee$ (logical disjunction) by introducing additional data structures and fields---that is, the process of making a condition constructible---will be called {\em constructivization}. If taken to its theoretical limit, this process leads to logical formulas containing only universal quantifiers and logical conjunctions, at the expense of adding some witness fields into certain data structures.
356

357
\nxsubpoint\emb{Validity conditions for a block}
358
Ultimately, all of the internal conditions for a block, along with the local antecessor and neighbor conditions involving this block and another previously generated block, constitute the {\em validity conditions\/} for a shardchain or masterchain block. A block is {\em valid\/} if it satisfies the validity conditions. It is the responsibility of validators to generate valid blocks, as well as check the validity of blocks generated by other validators.
359

360
\nxsubpoint\emb{Witnesses of the invalidity of a block}
361
If a block does not satisfy all of the validity conditions $C_1$, \dots, $C_n$ (i.e., the conjunction $V:\equiv\bigwedge_i C_i$ of the validity conditions), it is {\em invalid}. This means that it satisfies the ``invalidity condition'' $\neg V=\bigvee_i\neg C_i$. If all of the $C_i$---and hence, also $V$---have been ``constructivized'' in the sense described in~\ptref{sp:cond.cvize}, so that they contain only logical conjunctions and universal quantifiers (and simple atomic propositions), then $\neg V$ contains only logical disjunctions and existential quantifiers. Then a constructivization of $\neg V$ may be defined, which would involve an {\em invalidity witness}, starting with an index $i$ of the specific validity condition $C_i$ which fails.
362

363
Such invalidity witnesses may also be serialized and presented to other validators or committed into the masterchain to prove that a specific block or block candidate is in fact invalid. Therefore, the construction and serialization of invalidity witnesses is an important part of a Proof-of-Stake (PoS) blockchain design.\footnote{It is interesting to note that this part of the work can be done almost automatically.}
364

365
\nxsubpoint\emb{Minimizing the size of witnesses}
366
An important consideration for the design of the local conditions, their decomposition into simpler conditions, and their constructivization is to make the verification of each condition as simple as possible. However, another requirement is that we should minimize the size of witnesses both for a condition (so that block size does not grow too much during the constructivization process) and for its negation (so that the invalidity proofs have bounded size, which simplifies their verification, transmission, and inclusion into the masterchain). These two design principles are sometimes at odds, and a compromise must be then sought.
367

368
\nxsubpoint\emb{Minimizing the size of Merkle proofs}
369
The consistency conditions are originally intended to be processed by a party who already has all the relevant data (e.g., all the blocks mentioned in the condition). On some occasions, however, they must be verified by a party who does not have all the blocks in question, but knows only their hashes. For example, suppose that a block invalidity proof were augmented by the signature of a validator that had signed an invalid block (and therefore would have to be punished). In this case, the signature would contain only the hash of the wrongly signed block; the block itself would have to be recovered from a different place before verifying the block invalidity proof.
370

371
A compromise between providing only the hash of the supposedly invalid block and providing the entire invalid block along with the invalidity witness is to augment the invalidity witness by a Merkle proof starting from the hash of the block (i.e., of the root cell of the block). Such a proof would include all the cells referred to in the invalidity witness, along with all the cells on the paths from these cells to the root cells and the hashes of their siblings. Then an invalidity proof becomes self-contained enough to provide sufficient justification on its own for punishing a validator. For example, the invalidity proof suggested above might be presented to a smart contract residing in the masterchain that punishes the validators for incorrect behavior.
372

373
Since such an invalidity proof must be augmented by a Merkle proof, it makes sense to write the consistency conditions so that the Merkle proofs for their negations would be as small as possible. In particular, each individual condition must be as ``local'' as possible (i.e., involve a minimal number of cells). This also optimizes the verification time of the invalidity proof.
374

375
\nxsubpoint\emb{Collated data for the external conditions}
376
When a validator suggests an unsigned block to the other validators of a shardchain, these other validators must check the validity of this block candidate---i.e., verify that it satisfies all of the internal and external local consistency conditions. While the internal conditions do not require any extra data in addition to the block candidate itself, the external conditions need some other blocks, or at least some information out of those blocks. Such additional information may be extracted from those blocks, along with all cells on the paths from the cells containing the required additional information to the root cell of the corresponding blocks and the hashes of the siblings of the cells on these paths, to present a Merkle proof that can be processed without knowledge of the referred blocks themselves.
377

378
This additional information, called {\em collated data}, is serialized as a bag of cells and presented by the validator along with the unsigned block candidate itself. The block candidate along with the collated data is called a {\em collated block}.
379

380
\nxsubpoint\emb{Conditions for a collated block}
381
The {\em external\/} consistency conditions for a block candidate are thus (automatically) transformed into {\em internal\/} consistency conditions for a collated block, which greatly simplifies and speeds up their verification by the other validators. However, some data---such as the final state of the immediate antecessor of the block being validated---is not collated. Instead, all validators are supposed to keep a local copy of this data.
382

383
\nxsubpoint\emb{Representation conditions and representation hashes}
384
Notice that once Merkle proofs are included into a collated block, the consistency conditions must take into account which data (i.e., which cells) are actually present in the collated block, and not just referred to by their hashes. This leads to a new group of conditions, called {\em representation conditions}, which must be able to distinguish an external cell reference (usually represented by its 256-bit hash) from the cell itself. A validator can be punished for suggesting a collated block that does not contain all of the expected collated data inside, even if the block candidate itself is valid.
385

386
This also leads to the utilization of {\em representation hashes} instead of {\em transparent hashes} for collated blocks.
387

388
\nxsubpoint\emb{Verification in the absence of the collated data}
389
Notice that a block must still be verifiable in the absence of the collated data; otherwise, no party except the validators would be able to check a previously committed block by its own means. In particular, witnesses cannot be included into the collated data: they must reside in the block itself. The collated data must contain only some portions of neighboring blocks referred to in the principal block along with suitable Merkle proofs, which can be reconstructed by anybody who has the referenced blocks themselves.
390

391
\nxsubpoint\emb{Inclusion of Merkle proofs in the block itself}
392
Notice that on some occasions Merkle proofs must be embedded into the block itself, and not just into collated data. For instance:
393
\begin{itemize}
394
\item During Instant Hypercube Routing (IHR), a message may be included directly into the \textit{InMsgDescr\/} of a block of the destination shardchain, without travelling all the way along the edges of the hypercube. In this case, a Merkle proof of the existence of the message in the \textit{OutMsgDescr\/} of a block of the originating shardchain must be included into \textit{InMsgDescr\/} along with the message itself.
395
\item An invalidity proof, or another proof of validator misbehavior, may be committed into the masterchain by including it in the body of a message sent to a special smart contract. In this case, the invalidity proof must include some cells along with a Merkle proof, which must therefore be contained in a message body.
396
\item Similarly, a smart contract defining a payment channel, or another kind of side-chain, may accept finalization messages or misbehavior proof messages that contain suitable Merkle proofs.
397
\item The final state of a shardchain is not included into a shardchain block. Instead, only the cells that have been modified are included; those cells that are inherited from the old state are referred to by their hashes, along with suitable Merkle proofs consisting of the cells on the path from the root of the old state to the cells of the old state referred to.
398
\end{itemize}
399

400
\nxsubpoint\emb{Provisions for handling incomplete data}
401
As we have seen, it is necessary to include incomplete data and Merkle proofs into the body of a block, into the body of some messages contained in a block, and into the state. This necessity is reflected by some extra representation conditions, as well as provisions for the messages (and by extension, the cell trees processed by TVM) to contain incomplete data (external cell references and Merkle proofs). In most cases, such external cell references contain only the 256-bit $\Sha$ hash of a cell along with a flag; if a smart contract attempts to inspect the contents of such a cell by a {\tt CTOS} primitive (e.g., for deserialization), an exception is triggered. However, an external reference to such a cell can be stored into the smart contract's persistent storage, and both the transparent and the representation hashes of such a cell can be computed.
402

403
\mysubsection{Logical time and logical time intervals}
404
This section takes a closer look at so-called {\em logical time}, extensively used in the TON Blockchain for message forwarding and message delivery guarantees, among other purposes.
405

406
\nxsubpoint\label{sp:logic.time}\emb{Logical time}
407
A component of the TON Blockchain that also plays an important role in message delivery is the {\em logical time}, usually denoted by $\LT$. It is a non-negative 64-bit integer, assigned to certain events roughly as follows:
408
\begin{quote}
409
  If an event $e$ logically depends on events $e_1$, \dots, $e_n$, then $\LT(e)$ is the smallest non-negative integer greater than all $\LT(e_i)$.
410
\end{quote}
411
In particular, if $n=0$ (i.e., if $e$ does not depend on any prior events), then $\LT(e)=0$.
412

413
\nxsubpoint\label{sp:logic.time.relaxed}\emb{A relaxed variant of logical time}
414
On some occasions we relax the definition of logical time, requesting only that
415
\begin{equation}\label{eq:lt.fund.ineq}
416
  \LT(e)>\LT(e')\quad\text{whenever $e\succ e'$ (i.e., $e$ logically depends on $e'$),}
417
\end{equation}
418
without insisting that $\LT(e)$ be the smallest non-negative integer with this property. In such cases we can speak about {\em relaxed\/} logical time, as opposed to the {\em strict\/} logical time defined above (cf.~\ptref{sp:logic.time}). Notice, however, that the condition~\eqref{eq:lt.fund.ineq} is a fundamental property of logical time and cannot be relaxed further.
419

420
\nxsubpoint\label{sp:logic.time.interval}\emb{Logical time intervals}
421
It makes sense to assign to some events or collections of events $C$ an {\em interval\/} of logical times $\LT^\bullet(C)=[\LT^-(C),\LT^+(C))$, meaning that the collection of events $C$ took place in the specified ``interval'' of logical times, where $\LT^-(C)<\LT^+(C)$ are some integers (64-bit integers in practice). In this case, we can say that $C$ {\em begins\/} at logical time $\LT^-(C)$, and {\em ends\/} at logical time $\LT^+(C)$.
422

423
  By default, we assume $\LT^+(e)=\LT(e)+1$ and $\LT^-(e)=\LT(e)$ for simple or ``atomic'' events, assuming that they last exactly one unit of logical time. In general, if we have a single value $\LT(C)$ as well as logical time interval $\LT^\bullet(C)=[\LT^-(C),\LT^+(C))$, we always require that
424
\begin{equation}
425
  \LT(C)\in[\LT^-(C),\LT^+(C))
426
\end{equation}
427
or, equivalently,
428
\begin{equation}
429
  \LT^-(C)\leq\LT(C)<\LT^+(C)
430
\end{equation}
431
In most cases, we choose $\LT(C)=\LT^-(C)$.
432

433
\nxsubpoint\label{sp:lt.int.cond}\emb{Requirements for logical time intervals}
434
The three principal requirements for logical time intervals are:
435
\begin{itemize}
436
\item $0\leq\LT^-(C)<\LT^+(C)$ are non-negative integers for any collection of events~$C$.
437
\item If $e'\prec e$ (i.e., if an atomic event $e$ logically depends on another atomic event $e'$), then $\LT^\bullet(e')<\LT^\bullet(e)$ (i.e., $\LT^+(e')\leq\LT^-(e)$).
438
\item If $C\supset D$ (i.e., if a collection of events $C$ contains another collection of events $D$), then $\LT^\bullet(C)\supset\LT^\bullet(D)$, i.e.,
439
\begin{equation}
440
  \LT^-(C)\leq\LT^-(D)<\LT^+(D)\leq\LT^+(C)
441
\end{equation}
442
In particular, if $C$ consists of atomic events $e_1$, \dots, $e_n$, then $\LT^-(C)\leq\inf_i\LT^-(e_i)\leq\inf_i\LT(e_i)$ and $\LT^+(C)\geq\sup_i\LT^+(e_i)\geq 1+\sup_i\LT(e_i)$.
443
\end{itemize}
444

445
\nxsubpoint\emb{Strict, or minimal, logical time intervals}
446
One can assign to any finite collection of atomic events $E=\{e\}$ related by a causality relation (partial order) $\prec$, and all subsets $C\subset E$, {\em minimal\/} logical time intervals. That is, among all assignments of logical time intervals satisfying the conditions listed in \ptref{sp:lt.int.cond}, we choose the one having all $\LT^+(C)-\LT^-(C)$ as small as possible, and if several assignments with this property exist, we choose the one that has the minimum $\LT^-(C)$ as well.
447

448
Such an assignment can be achieved by first assigning logical time $\LT(e)$ to all atomic events $e\in E$ as described in \ptref{sp:logic.time}, then setting $\LT^-(C):=\inf_{e\in C}\LT(e)$ and $\LT^+(C):=1+\sup_{e\in C}\LT(e)$ for any $C\subset E$.
449

450
In most cases when we need to assign logical time intervals, we use the minimal logical time intervals just described.
451

452
\nxsubpoint\label{sp:lt.ton.blkch}\emb{Logical time in the TON Blockchain}
453
The TON Blockchain assigns logical time and logical time intervals to several of its components.
454

455
For instance, each outbound message created in a transaction is assigned its {\em logical creation time}; for this purpose, the creation of an outbound message is considered an atomic event, logically dependent on the previous message created by the same transaction, as well as on the previous transaction of the same account, on the inbound message processed by the same transaction, and on all events contained in the blocks referred to by hashes contained in the block with the same transaction. As a consequence, {\em outbound messages created by the same smart contract have strictly increasing logical creation times.} The transaction itself is considered a collection of atomic events, and is assigned a logical time interval (cf.~\ptref{sp:trans.lt} for a more precise description).
456

457
Each block is a collection of transaction and message creation events, so it is assigned a logical time interval, explicitly mentioned in the header of the block.
458

459
\mysubsection{Total blockchain state}
460
This section discusses the total state of the TON Blockchain, as well as the states of separate shardchains and the masterchain. For example, the precise definition of the state of the neighboring shardchains becomes crucial for correctly formalizing the consistency condition asserting that the validators for a shardchain must import the oldest messages from the union of {\em OutMsgQueue\/}s taken from the states of all neighboring shardchains (cf.~\ptref{sp:monot.import}).
461

462
\nxsubpoint\emb{Total state defined by a masterchain block}
463
Every masterchain block contains a list of all currently active shards and of the latest blocks for each of them. In this respect, {\em every masterchain block defines the corresponding total state of the TON Blockchain, since it fixes the state of every shardchain, and of the masterchain as well.}
464

465
An important requirement imposed on this list of the latest blocks for all shardchain blocks is that, if a masterchain block $B$ lists $S$ as the latest block of some shardchain, and a newer masterchain block $B'$, with $B$ as one of its antecessors, lists $S'$ as the latest block of the same shardchain, then $S$ must be one of the antecessors of $S'$.\footnote{In order to express this condition correctly in the presence of dynamic sharding, one should fix some account $\xi$, and consider the latest blocks $S$ and $S'$ of the shardchains containing $\xi$ in the shard configurations of both $B$ and $B'$, since the shards containing $\xi$ might be different in $B$ and $B'$.}  This condition makes the total state of the TON blockchain defined by a subsequent masterchain block $B'$ compatible with the total state defined by a previous block $B$.
466

467
\nxsubpoint\label{sp:shard.total.state}\emb{Total state defined to by a shardchain block}
468
Every shardchain block contains the hash of the most recent masterchain block in its header. Consequently, all the blocks referred to in that masterchain block, along with their antecessors, are considered ``known'' or ``visible'' to the shardchain block, and no other blocks are visible to it, with the sole exception of its antecessors inside its proper shardchain.
469

470
In particular, when we say that a block {\em must\/} import in its {\em InMsgDescr\/} the messages from the {\em OutMsgQueue\/} of the states of all neighboring shardchains, it means that precisely the blocks of other shardchains visible to that block must be taken into account, and at the same time the block cannot contain messages from ``invisible'' blocks, even if they are otherwise correct.
471

472
\mysubsection{Configurable parameters and smart contracts}\label{p:conf.params}
473
Recall that the TON Blockchain has several so-called ``configurable parameters'' (cf.~\cite{TON}), which are either certain values or certain smart contracts residing in the masterchain. This section discusses the storage of and access to these configurable parameters.
474

475
\nxsubpoint\emb{Examples of configurable parameters}
476
The properties of the blockchain controlled by configurable parameters include:
477
\begin{itemize}
478
\item The minimum stake for validators.
479
\item The maximum size of the group of elected validators.
480
\item The maximum number of blocks for which the same group of validators are responsible.
481
\item The validator election process.
482
\item The validator punishing process.
483
\item The currently active and the next elected set of validators.
484
\item The process of changing configurable parameters, and the address of the smart contract $\gamma$ responsible for holding the values of the configurable parameters and for modifying their values.
485
\end{itemize}
486

487
\nxsubpoint\emb{Location of the values of configurable parameters}
488
The configurable parameters are kept in the persistent data of a special configuration smart contract $\gamma$ residing in the masterchain of the TON Blockchain. More precisely, the first reference of the root cell of the persistent data of that smart contract is a dictionary mapping 64-bit keys (parameter numbers) to the values of the corresponding parameters; each value is serialized into a cell slice according to the type of that value. If a value is a ``smart contract'' (necessarily residing in the masterchain), its 256-bit account address is used instead.
489

490
\nxsubpoint\label{sp:conf.par.qa}\emb{Quick access through the header of masterchain blocks}
491
To simplify access to the current values of configurable parameters, and to shorten the Merkle proofs containing references to them, the header of each masterchain block contains the address of smart contract $\gamma$. It also contains a direct cell reference to the dictionary containing all values of configurable parameters, which lies in the persistent data of~$\gamma$. Additional consistency conditions ensure that this reference coincides with the one obtained by inspecting the final state of smart contract~$\gamma$.
492

493
\nxsubpoint\emb{Getting values of configurable parameters by get methods}
494
The configuration smart contract $\gamma$ provides access to some of configurable parameters by means of ``get methods''. These special methods of the smart contract do not change its state, but instead return required data in the TVM stack.
495

496
\nxsubpoint\emb{Getting values of configurable parameters by get messages}
497
Similarly, the configuration smart contract $\gamma$ may define some ``ordinary'' methods (i.e., special inbound messages) to request the values of certain configuration parameters, which will be sent in the outbound messages generated by the transaction processing such an inbound message. This may be useful for some other fundamental smart contracts that need to know the values of certain configuration parameters.
498

499
\nxsubpoint\emb{Values obtained by get methods may be different from those obtained through the block header}
500
Notice that the state of the configuration smart contract~$\gamma$, including the values of configurable parameters, may change several times inside a masterchain block, if there are several transactions processed by~$\gamma$ in that block. As a consequence, the values obtained by invoking get methods of~$\gamma$, or sending get messages to $\gamma$, may be different from those obtained by inspecting the reference in the block header (cf.~\ptref{sp:conf.par.qa}), which refers to the {\em final\/} state of the configurable parameters in the block.
501

502
\nxsubpoint\label{sp:conf.par.change}\emb{Changing the values of configurable parameters}
503
The procedure for changing the values of configurable parameters is defined in the code of smart contract~$\gamma$. For most configurable parameters, called {\em ordinary}, any validator may suggest a new value by sending a special message with the number of the parameter and its proposed value to~$\gamma$. If the suggested value is valid, further voting messages from the validators are collected by the smart contract, and if more than two-thirds each of the current and next sets of validators support the proposal, the value is changed.
504

505
Some parameters, such as the current set of validators, cannot be changed in this way. Instead, the current configuration contains a parameter with the address of smart contract $\nu$ responsible for electing the next set of validators, and smart contract $\gamma$ accepts messages only from this smart contract $\nu$ to modify the value of the configuration parameter containing the current set of validators.
506

507
\nxsubpoint\emb{Changing the validator election procedure}
508
If the validator election procedure ever needs to be changed, this can be accomplished by first committing a new validator election smart contract into the masterchain, and then changing the ordinary configurable parameter containing the address $\nu$ of the validator election smart contract. This will require two-thirds of the validators to accept the proposal in a vote as described above in~\ptref{sp:conf.par.change}.
509

510
\nxsubpoint\emb{Changing the procedure of changing configurable parameters}
511
Similarly, the address of the configuration smart contract itself is a configurable parameter and may be changed in this fashion. In this way, most fundamental parameters and smart contracts of the TON Blockchain may be modified in any direction agreed upon by the qualified majority of the validators.
512

513
\nxsubpoint\emb{Initial values of the configurable parameters}
514
The initial values of most configurable parameters appear in block zero of the masterchain as part of the masterchain's initial state, which is explicitly present with no omissions in this block. The code of all fundamental smart contracts is also present in the initial state. In this way, the original ``constitution'' and configuration of the TON Blockchain, including the original set of validators, is made explicit in block zero.
515

516
\mysubsection{New smart contracts and their addresses}\label{p:acc.create}
517
This section discusses the creation and initialization of new smart contracts---in particular, the origin of their initial code, persistent data, and balance. It also discusses the assignment of account addresses to new smart contracts.
518

519
\nxsubpoint\emb{Description valid only for masterchain and basic workchain}
520
The mechanisms for creating new smart contracts and assigning their addresses described in this section are valid only for the basic workchain and the masterchain. Other workchains may define their own mechanisms for dealing with these problems.
521

522
\nxsubpoint\label{sp:crypto.to.uninit}\emb{Transferring cryptocurrency to uninitialized accounts}
523
First of all, {\em it is possible to send messages, including value-bearing messages, to previously unmentioned accounts.} If an inbound message arrives at a shardchain with a destination address $\eta$ corresponding to an undefined account, it is processed by a transaction as if the code of the smart contract were empty (i.e., consisting of an implicit \texttt{RET}). If the message is value-bearing, this leads to the creation of an ``uninitialized account'', which may have a non-zero balance (if value-bearing messages have been sent to it),\footnote{Value-bearing messages with the {\tt bounce} flag set will not be accepted by an uninitialized account, but will be ``bounced'' back.} but has no code and no data. Because even an uninitialized account occupies some persistent storage (needed to hold its balance), some small persistent-storage payments will be exacted from time to time from the account's balance, until it becomes negative.
524

525
\nxsubpoint\label{sp:constr.msg}\emb{Initializing smart contracts by constructor messages}
526
An account, or smart contract, is created by sending a special {\em constructor message\/} $M$ to its address $\eta$. The body of such a message contains the tree of cells with the initial code of the smart contract (which may be replaced by its hash in some situations), and the initial data of the smart contract (maybe empty; it can be replaced by its hash). The hash of the code and of the data contained in the constructor message must coincide with the address $\eta$ of the smart contract; otherwise, it is rejected.
527

528
After the code and data of the smart contract are initialized from the body of the constructor message, the remainder of the constructor message is processed by a transaction (the {\em creating transaction} for smart contract $\eta$) by invoking TVM in a manner similar to that used for processing ordinary inbound messages.
529

530
\nxsubpoint\emb{Initial balance of a smart contract}
531
Notice that the constructor message usually must bear some value, which will be transferred to the balance of the newly-created smart contract; otherwise, the new smart contract would have a balance of zero and would not be able to pay for storing its code and data in the blockchain. The minimum balance required from a newly-created smart contract is a linear (more precisely, affine) function of the storage it uses. The coefficients of this function may depend on the workchain; in particular, they are higher in the masterchain than in the basic workchain.
532

533
\nxsubpoint\emb{Creating smart contracts by external constructor messages}
534
In some cases, it is necessary to create a smart contract by a constructor message that cannot bear any value---for instance, by a constructor message ``from nowhere'' (an external inbound message). Then one should first transfer a sufficient amount of funds to the uninitialized smart contract as explained in~\ptref{sp:crypto.to.uninit}, and only then send a constructor message ``from nowhere''.
535

536
\nxsubpoint\label{sp:ex.simple.wallet}\emb{Example: creating a cryptocurrency wallet smart contract}
537
An example of the above situation is provided by cryptocurrency wallet applications for human users, which must create a special wallet smart contract in the blockchain in which to keep the user's funds. This can be achieved as follows:
538
\begin{itemize}
539
\item The cryptocurrency wallet application generates a new cryptographic public/private key pair (typically for Ed25519 elliptic curve cryptography, supported by special TVM primitives) for signing the user's future transactions.
540
\item The cryptocurrency wallet application knows the code of the smart contract to be created (which typically is the same for all users), as well as the data, which typically consists of the public key of the wallet (or of its hash) and is generated at the very beginning. The hash of this information is the address~$\xi$ of the wallet smart contract to be created.
541
\item The wallet application may display the user's address $\xi$, and the user may start to receive funds to her uninitialized account $\xi$---for example, by buying some cryptocurrency at an exchange, or by asking a friend to transfer a small sum.
542
\item The wallet application can inspect the shardchain containing account $\xi$ (in the case of a basic workchain account) or the masterchain (in the case of a masterchain account), either by itself or using a blockchain explorer, and check the balance of~$\xi$.
543
\item If the balance is sufficient, the wallet application may create and sign (with the user's private key) the constructor message (``from nowhere''), and submit it for inclusion to the validators or the collators for the corresponding blockchain.
544
\item Once the constructor message is included into a block of the blockchain and processed by a transaction, the wallet smart contract is finally created.
545
\item When the user wants to transfer some funds to some other user or smart contract $\eta$, or wants to send a value-bearing message to $\eta$, she uses her wallet application to create the message $m$ that she wants her wallet smart contract $\xi$ to send to $\eta$, envelope $m$ into a special ``message from nowhere'' $m'$ with destination $\xi$, and sign $m'$ with her private key. Some provisions against replay attacks must be made, as explained in~\ptref{sp:msg.uniq}.
546
\item The wallet smart contract receives message $m'$ and checks the validity of the signature with the aid of the public key stored in its persistent data. If the signature is correct, it extracts embedded message $m$ from $m'$ and sends it to its intended destination $\eta$, with the indicated amount of funds attached to it.
547
\item If the user does not need to immediately start transferring funds, but only wants to passively receive some funds, she may keep her account uninitialized as long as she wants (provided the persistent storage payments do not lead to the exhaustion of its balance), thus minimizing the storage profile and persistent storage payments of the account.
548
\item Notice that the wallet application may create for the human user the illusion that the funds are kept in the application itself, and provide an interface to transfer funds or send arbitrary messages ``directly'' from the user's account~$\xi$. In reality, all these operations will be performed by the user's wallet smart contract, which effectively acts as a proxy for such requests. We see that a cryptocurrency wallet is a simple example of a {\em mixed\/} application, having an on-chain part (the wallet smart contract, used as a proxy for outbound messages) and an off-chain part (the external wallet application running on a user's device and keeping the private account key).
549
\end{itemize}
550
Of course, this is just one way of dealing with the simplest user wallet smart contracts. One can create multi-signature wallet smart contracts, or create a shared wallet with internal balances kept inside it for each of its individual users, and so on.
551

552
\nxsubpoint\emb{Smart contracts may be created by other smart contracts}
553
Notice that a smart contract may generate and send a constructor message while processing any transaction. In this way, smart contracts may automatically create new smart contracts, if they need to, without any human intervention.
554

555
\nxsubpoint\emb{Smart contracts may be created by wallet smart contracts}
556
On the other hand, a user may compile the code for her new smart contract~$\nu$, generate the corresponding constructor message~$m$, and use the wallet application to force her wallet smart contract~$\xi$ to send message $m$ to~$\nu$ with an adequate amount of funds, thus creating the new smart contract~$\nu$.
557

558
\mysubsection{Modification and removal of smart contracts}
559
This section explains how the code and state of a smart contract may be changed, and how and when a smart contract may be destroyed.
560

561
\nxsubpoint\emb{Modification of the data of a smart contract}
562
The persistent data of a smart contract is usually modified as a result of executing the code of the smart contract in TVM while processing a transaction, triggered by an inbound message to the smart contract. More specifically, the code of the smart contract has access to the old persistent storage of the smart contract via TVM control register \texttt{c4}, and may modify the persistent storage by storing another value into~\texttt{c4} before normal termination.
563

564
Normally, there are no other ways to modify the data of an existing smart contract. If the code of the smart contract does not provide any ways to modify the persistent data (e.g., if it is a simple wallet smart contract as described in~\ptref{sp:ex.simple.wallet}, which initializes the persistent data with the user's public key and does not intend to ever change it), then it will be effectively immutable---unless the code of the smart contract is modified first.
565

566
\nxsubpoint\emb{Modification of the code of a smart contract}
567
Similarly, the code of an existing smart contract may be modified only if some provisions for such an upgrade are present in the current code. The code is modified by invoking TVM primitive \texttt{SETCODE}, which sets the root of the code for the current smart contract from the top value in the TVM stack. The modification is applied only after the normal termination of the current transaction.
568

569
Typically, if the developer of a smart contract wants to be able to upgrade its code in the future, she provides a special ``code upgrade method'' in the original code of the smart contract, which invokes \texttt{SETCODE} in response to certain inbound ``code upgrade'' messages, using the new code sent in the message itself as an argument to \texttt{SETCODE}. Some provisions must be made to protect the smart contract from unauthorized replacement of the code; otherwise, control of the smart contract and the funds on its balance could be lost. For example, code upgrade messages might be accepted only from a trusted source address, or they might be protected by requiring a valid cryptographic signature and a correct sequence number.
570

571
\nxsubpoint\emb{Keeping the code or data of the smart contract outside the blockchain}
572
The code or data of the smart contract may be kept outside the blockchain and be represented only by their hashes. In such cases, only empty inbound messages may be processed, as well as messages carrying a correct copy of the smart-contract code (or its portion relevant for processing the specific message) and its data inside special fields. An example of such a situation is given by the uninitialized smart contracts and constructor messages described in~\ptref{p:acc.create}.
573

574
\nxsubpoint\emb{Using code libraries}
575
Some smart contracts may share the same code, but use different data. One example of this is wallet smart contracts (cf.~\ptref{sp:ex.simple.wallet}), which are likely to use the same code (throughout all wallets created by the same software), but with different data (because each wallet must use its own pair of cryptographic keys). In this case, the code for all the wallet smart contracts is best committed by the developer into a shared {\em library}; this library would reside in the masterchain, and be referred to by its hash using a special ``external library cell reference'' as the root of the code of each wallet smart contract (or as a subtree inside that code).
576

577
Notice that even if the library code becomes unavalable---for example, because its developer stops paying for its storage in the masterchain---it is still possible to use the smart contracts referring to this library, either by committing the library again into the masterchain, or by including its relevant parts inside a message sent to the smart contract. This external cell reference resolution mechanism is discussed in more detail later in~\ptref{sp:lib.env}.
578

579
\nxsubpoint\emb{Destroying smart contracts}
580
Notice that a smart contract cannot really be destroyed until its balance becomes zero or negative. It may become negative as a result of collecting persistent storage payments, or after sending a value-bearing outbound message transferring almost all of its previous balance.
581

582
For example, a user may decide to transfer all remaining funds from her wallet to another wallet or smart contract. This may be useful, for instance, if one wants to upgrade the wallet, but the wallet smart contract does not have any provisions for future upgrades; then one can simply create a new wallet and transfer all funds to it.
583

584
\nxsubpoint\emb{Frozen accounts}
585
When the balance of an account becomes non-positive after a transaction, or smaller than a certain workchain-dependent minimum, the account is {\em frozen\/} by replacing all its code and data by a single 32-byte hash. This hash is kept afterwards for some time (e.g., a couple of months) to prevent recreation of the smart contract by its original creating transaction (which still has the correct hash, equal to the account address), and to allow its owner to recreate the account by transferring some funds and sending a message containing the account's code and data, to be reinstated in the blockchain. In this respect, frozen accounts are similar to uninitialized accounts; however, the hash of the correct code and data for a frozen account is not necessarily equal to the account address, but is kept separately.
586

587
Notice that frozen accounts may have a negative balance, indicating that persistent storage payments are due. An account cannot be unfrozen until its balance becomes positive and larger than a prescribed minimum value.
588

589
\clearpage
590
\mysection{Message forwarding and delivery guarantees}
591
This chapter discusses the forwarding of messages inside the TON Blockchain, including the Hypercube Routing (HR) and Instant Hypercube Routing (IHR) protocols. It also describes the provisions required to implement the message delivery guarantees and the FIFO ordering guarantee.
592

593
\mysubsection{Message addresses and next-hop computation}
594
This section explains the computation of transit and next-hop addresses by the variant of the hypercube routing algorithm employed in TON Blockchain. The hypercube routing protocol itself, which uses the concepts and next-hop address computation algorithm introduced in this section, is presented in the next section.
595

596
\nxsubpoint\emb{Account addresses}
597
The {\em source address\/} and {\em destination address\/} are always present in any message. Normally, they are {\em (full) account addresses}. A full account address consists of a $\workchainid$ (a signed 32-bit big-endian integer defining a workchain), followed by a (usually) 256-bit {\em internal address\/} or {\em account identifier\/} $\accountid$ (which may also be interpreted as an unsigned big-endian integer) defining the account within the chosen workchain.
598

599
Different workchains may use account identifiers that are shorter or longer than the ``standard'' 256 bits used in the masterchain ($\workchainid=-1$) and in the basic workchain ($\workchainid=0$). To this end, the masterchain state contains a list of all workchains defined so far, along with their account identifier lengths. An important restriction is that the $\accountid$ for any workchain must be at least 64 bits long.
600

601
In what follows, we often consider only the case of 256-bit account addresses for simplicity. Only the first 64 bits of the $\accountid$ are relevant for the purposes of message routing and shardchain splitting.
602

603
\nxsubpoint\emb{Source and destination addresses of a message}
604
Any message has both a {\em source address\/} and a {\em destination address}. Its source address is the address of the account (smart contract) that has created the message while processing some transaction; the source address cannot be changed or set arbitrarily, and smart contracts heavily rely on this property. By contrast, when a message is created, any well-formed destination address may be chosen; after that, the destination address cannot be changed.
605

606
\nxsubpoint\emb{External messages with no source or destination address}
607
Some messages can have no source or no destination address (though at least one of them must be present), as indicated by special flags in the message header. Such messages are the {\em external messages} intended for the interaction of the TON Blockchain with the outside world---human users and their cryptowallet applications, off-chain and mixed applications and services, other blockchains, and so on.
608

609
External messages are never routed inside the TON Blockchain. Instead, ``messages from nowhere'' (i.e., with no source address) are directly included into the \textit{InMsgDescr\/} of a destination shardchain block (provided some conditions are met) and processed by a transaction in that very block. Similarly, ``messages to nowhere'' (i.e., with no TON Blockchain destination address), also known as {\em log messages}, are also present only in the block containing the transaction that generated such a message.\footnote{``Messages to nowhere'' may have some special fields in their body indicating their destination outside the TON Blockchain---for instance, an account in some other blockchain, or an IP address and port---which may be interpreted by the third-party software appropriately. Such fields are ignored by the TON Blockchain.}
610

611
Therefore, external messages are almost irrelevant for the discussion of message routing and message delivery guarantees. In fact, the message delivery guarantees for outbound external messages are trivial (at most, the message must be included into the \textit{LogMsg} part of the block), and for inbound external messages there are none, since the validators of a shardchain block are free to include or ignore suggested inbound external messages at their discretion (e.g., according to the processing fee offered by the message).\footnote{The problem of bypassing possible validator censorship---which could happen, for instance, if all validators conspire not to include external messages sent to accounts belonging to some set of blacklisted accounts---is dealt with separately elsewhere. The main idea is that the validators may be forced to promise to include a message with a known hash in a future block, without knowing anything about the identity of the sender or the receiver; they will have to keep this promise afterwards when the message itself with pre-agreed hash is presented.}
612

613
In what follows, we focus on ``usual'' or ``internal'' messages, which have both a source and a destination address.
614

615
\nxsubpoint\emb{Transit and next-hop addresses}
616
When a message needs to be routed through intermediate shardchains before reaching its intended destination, it is assigned a {\em transit address\/} and a {\em next-hop address\/} in addition to the (immutable) source and destination addresses. When a copy of the message resides inside a transit shardchain awaiting its relay to its next hop, the {\em transit address\/} is its intermediate address lying in the transit shardchain, as if belonging to a special message-relay smart contract whose only job is to relay the unchanged message to the next shardchain on the route. The {\em next-hop address\/} is the address in a neighboring shardchain (or, on some rare occasions, in the same shardchain) to which the message needs to be relayed. After the message is relayed, the next-hop address usually becomes the transit address of the copy of the message included in the next shardchain.
617

618
Immediately after an outbound message is created in a shardchain (or in the masterchain), its transit address is set to its source address.\footnote{However, the internal routing process described in~\ptref{sp:hr.int.route} is applied immediately after that, which may further modify the transit address.}
619

620
\nxsubpoint\label{sp:hr.next.hop}\emb{Computation of the next-hop address for hypercube routing}
621
The TON Blockchain employs a variant of hypercube routing. This means that the next-hop address is computed from the transit address (originally equal to the source address) as follows:
622

623
\begin{enumerate}
624
\item The (big-endian signed) 32-bit $\workchainid$ components of both the transit address and destination address are split into groups of $n_1$ bits (currently, $n_1=32$), and they are scanned from the left (i.e., the most significant bits) to the right. If one of the groups in the transit address differs from the corresponding group in the destination address, then the value of this group in the transit address is replaced by its value in the destination address to compute the next-hop address.
625
\item If the $\workchainid$ parts of the transit and destination addresses match, then a similar process is applied to the $\accountid$ parts of the addresses: The $\accountid$ parts, or rather their first (most significant) 64 bits, are split into groups of $n_2$ bits (currently, $n_2=4$ bit groups are used, corresponding to the hexadecimal digits of the address) starting from the most significant bit, and are compared starting from the left. The first group that differs is replaced in the transit address with its value in the destination address to compute the next-hop address.
626
\item If the first 64 bits of the $\accountid$ parts of the transit and destination addresses match as well, then the destination account belongs to the current shardchain, and the message should not be forwarded outside the current shardchain at all. Instead, it must be processed by a transaction inside it.
627
\end{enumerate}
628

629
\nxsubpoint\label{sp:nh.notat}\emb{Notation for the next-hop address}
630
We denote by
631
\begin{equation}
632
  \NextHop(\xi,\eta)
633
\end{equation}
634
the next-hop address computed for current (source or transit) address $\xi$ and destination address $\eta$.
635

636
\nxsubpoint\label{sp:nh.anycast}\emb{Support for anycast addresses}
637
``Large'' smart contracts, which can have separate instances in different shardchains, may be reached using {\em anycast destination addresses}. These addresses are supported as follows.
638

639
An anycast address $(\eta,d)$ consists of a usual address $\eta$ along with its ``splitting depth'' $d\leq 31$. The idea is that the message may be delivered to any address differing from $\eta$ only in the first $d$ bits of the internal address part (i.e., not including the workchain identifier, which must match exactly). This is achieved as follows:
640
\begin{itemize}
641
\item The effective destination address $\tilde\eta$ is computed from $(\eta,d)$ by replacing the first $d$ bits of the internal address part of $\eta$ with the corresponding bits taken from the source address $\xi$.
642
\item All computations of $\NextHop(\nu,\eta)$ are replaced by $\NextHop(\nu,\tilde\eta)$, for $\nu=\xi$ as well as for all other intermediate addresses $\nu$. In this way, Hypercube Routing or Instant Hypercube Routing will ultimately deliver the message to the shardchain containing $\tilde\eta$.
643
\item When the message is processed in its destination shardchain (the one containing address $\tilde\eta$), it may be processed by an account $\eta'$ of the same shardchain differing from $\eta$ and $\tilde\eta$ only in the first $d$ bits of the internal address part. More precisely, if the common shard address prefix is $s$, so that only internal addresses starting with binary string $s$ belong to the destination shard, then $\eta'$ is computed from $\eta$ by replacing the first $\min(d,|s|)$ bits of the internal address part of $\eta$ with the corresponding bits of~$s$.
644
\end{itemize}
645
That said, we tacitly ignore the existence of anycast addresses and the additional processing they require in the following discussions.
646

647
\nxsubpoint\label{sp:nh.hamming.opt}\emb{Hamming optimality of the next-hop address algorithm}
648
Notice that the specific hypercube routing next-hop computation algorithm explained in~\ptref{sp:hr.next.hop} may potentially be replaced by another algorithm, provided it satisfies certain properties. One of these properties is the {\em Hamming optimality}, meaning that the Hamming ($L_1$) distance from $\xi$ to $\eta$ equals the sum of Hamming distances from $\xi$ to $\NextHop(\xi,\eta)$ and from $\NextHop(\xi,\eta)$ to $\eta$:
649
\begin{equation}\label{eq:hamm.opt}
650
  {\|\xi-\eta\|}_1=\bigl\|\xi-\NextHop(\xi,\eta)\bigr\|_1+\bigl\|\NextHop(\xi,\eta)-\eta\bigr\|_1
651
\end{equation}
652
Here ${\|\xi-\eta\|}_1$ is the {\em Hamming distance\/} between $\xi$ and $\eta$, equal to the number of bit positions in which $\xi$ and $\eta$ differ:\footnote{When the addresses involved are of different lengths (e.g., because they belong to different workchains), one should consider only the first 96 bits of the addresses in the above formula.}
653
\begin{equation}
654
  {\|\xi-\eta\|}_1=\sum_i|\xi_i-\eta_i|
655
\end{equation}
656

657
Notice that in general one should expect only an inequality in \eqref{eq:hamm.opt}, following from the triangle inequality for the $L_1$-metric. Hamming optimality essentially means that $\NextHop(\xi,\eta)$ lies on one of the (Hamming) shortest paths from $\xi$ to $\eta$. It can also be expressed by saying that $\nu=\NextHop(\xi,\eta)$ is always obtained from $\xi$ by changing the values of bits at some positions to their values in $\eta$: for any bit position $i$, we have $\nu_i=\xi_i$ or $\nu_i=\eta_i$.\footnote{Instead of Hamming optimality, we might have considered the equivalent property of {\em Kademlia optimality}, written for the Kademlia (or weighted $L_1$) distance as given by $\|\xi-\eta\|_K:=\sum_i2^{-i}|\xi_i-\eta_i|$ instead of the Hamming distance.}
658

659
\nxsubpoint\emb{Non-stopping of $\NextHop$}
660
Another important property of the $\NextHop$ is its {\em non-stopping}, meaning that $\NextHop(\xi,\eta)=\xi$ is possible only when $\xi=\eta$. In other words, if we have not yet arrived at $\eta$, the next hop cannot coincide with our current position.
661

662
This property implies that the path from $\xi$ to $\eta$---i.e., the sequence of intermediate addresses $\xi^{(0)}:=\xi$, $\xi^{(n)}:=\NextHop(\xi^{(n-1)},\eta)$---will gradually stabilize at $\eta$: for some $N\geq0$, we have $\xi^{(n)}=\eta$ for all $n\geq N$. Indeed, one can always take $N:={\|\xi-\eta\|}_1$.
663

664
\nxsubpoint\label{sp:path.conv}\emb{Convexity of the HR path with respect to sharding}
665
A consequence of Hamming optimality property~\eqref{eq:hamm.opt} is what we call the {\em convexity\/} of the path from $\xi$ to $\eta$ with respect to sharding. Namely, if $\xi^{(0)}:=\xi$, $\xi^{(n)}:=\NextHop(\xi^{(n-1)},\eta)$ is the computed path from $\xi$ to $\eta$, and $N$ is the first index such that $\xi^{(N)}=\eta$, and $S$ is a shard of some workchain in any shard configuration, then the indices $i$ with $\xi^{(i)}$ residing in shard~$S$ constitute a subinterval in $[0,N]$. In other words, if integers $0\leq i\leq j\leq k\leq N$ are such that $\xi^{(i)}$, $\xi^{(k)}\in S$, then $\xi^{(j)}\in S$ as well.
666

667
This convexity property is important for some proofs related to message forwarding in the presence of dynamic sharding.
668

669
\nxsubpoint\label{sp:hr.int.route}\emb{Internal routing}
670
Notice that the next-hop address computed according to the rules defined in~\ptref{sp:hr.next.hop} may belong to the same shardchain as the current one (i.e., the one containing the transit address). In that case, the ``internal routing'' occurs immediately, the transit address is replaced by the value of the computed next-hop address, and the next-hop address computation step is repeated until a next-hop address lying outside the current shardchain is obtained. The message is then kept in the transit output queue according to its computed next-hop address, with its last computed transit address as the ``intermediate owner'' of the transit message. If the current shardchain splits into two shardchains before the message is forwarded further, it is the shardchain containing the intermediate owner that inherits this transit message.
671

672
Alternatively, we might go on computing the next-hop addresses only to find out that the destination address already belongs to the current shardchain. In that case, the message will be processed (by a transaction) inside this shardchain instead of being forwarded further.
673

674
\nxsubpoint\emb{Neighboring shardchains}
675
Two shards in a shard configuration---or the two corresponding shardchains---are said to be {\em neighbors}, or {\em neighboring shardchains}, if one of them contains a next-hop address for at least one combination of allowed source and destination addresses, while the other contains the transit address for the same combination. In other words, two shardchains are neighbors if a message can be forwarded directly from one of them into the other via Hypercube Routing.
676

677
The masterchain is also included in this definition, as if it were the only shardchain of the workchain with $\workchainid=-1$. In this respect, it is a neighbor of all the other shardchains.
678

679
\nxsubpoint\emb{Any shard is a neighbor of itself}
680
Notice that a shardchain is always considered a neighbor of itself. This may seem redundant, because we always repeat the next-hop computation described in~\ptref{sp:hr.next.hop} until we obtain a next-hop address outside the current shardchain (cf.~\ptref{sp:hr.int.route}). However, there are at least two reasons for such an arrangement:
681
\begin{itemize}
682
\item Some messages have the source and the destination address inside the same shardchain, at least when the message is created. However, if such a message is not processed immediately in the same block where it has been created, it must be added to the outbound message queue of its shardchain, and be imported as an inbound message (with an entry in the {\em InMsgDescr}) in one of the subsequent blocks of the same shardchain.\footnote{Notice that the next-hop and internal-routing computations are still applied to such messages, since the current shardchain may be split before the message is processed. In this case, the new sub-shardchain containing the destination address will inherit the message.}
683
\item Alternatively, the next-hop address may originally be in some other shardchain that later gets merged with the current shardchain, so that the next hop becomes inside the same shardchain. Then the message will have to be imported from the outbound message queue of the merged shardchain, and forwarded or processed accordingly to its next-hop address, even though they reside now inside the same shardchain.
684
\end{itemize}
685

686
\nxsubpoint\label{sp:isp.hr}\emb{Hypercube Routing and the ISP}
687
Ultimately, the Infinite Sharding Paradigm (ISP) applies here: a shardchain should be considered a provisional union of accountchains, grouped together solely to minimize the block generation and transmission overhead.
688

689
The forwarding of a message runs through several intermediate account\-chains, some of which can happen to lie in the same shard. In this case, once a message reaches an accountchain lying in this shard, it is immediately (``internally'') routed inside that shard until the last accountchain lying in the same shard is reached (cf.~\ptref{sp:hr.int.route}). Then the message is enqueued in the output queue of that last accountchain.\footnote{We may define the (virtual) output queue of an account(chain) as the subset of the {\em OutMsgQueue\/} of the shard currently containing that account that consists of messages with transit addresses equal to the address of the account.}
690

691
\nxsubpoint\label{sp:repr.interm.addr}\emb{Representation of transit and next-hop addresses}
692
Notice that the transit and next-hop addresses differ from the source address only in the $\workchainid$ and in the first (most significant) 64 bits of the account address. Therefore, they may be represented by 96-bit strings. Furthermore, their $\workchainid$ usually coincides with the $\workchainid$ of either the source address or the destination address; a couple of bits may be used to indicate this situation, thus further reducing the space required to represent the transit and next-hop addresses.
693

694
In fact, the required storage may be reduced even further by observing that the specific hypercube routing algorithm described in~\ptref{sp:hr.next.hop} always generates intermediate (i.e., transit and next-hop) addresses that coincide with the destination address in their first $k$ bits, and with the source address in their remaining bits. Therefore, one might use just the values $0\leq k_{\text{tr}},k_{\text{nh}}\leq 96$ to fully specify the transit and next-hop addresses. One might also notice that $k':=k_{\text{nh}}$ turns out to be a fixed function of $k:=k_{\text{tr}}$ (for instance, $k'=k+n_2=k+4$ for $k\geq32$), and therefore include only one 7-bit value of~$k$ in the serialization.
695

696
Such optimizations have the obvious disadvantage that they rely too much on the specific routing algorithm used, which can be changed in the future, so they are used in~\ptref{sp:tl.msg.env} with a provision to specify more general intermediate addresses if necessary.
697

698
\nxsubpoint\label{sp:msg.env}\emb{Message envelopes}
699
The transit and next-hop addresses of a forwarded message are not included in the message itself, but are kept in a special {\em message envelope}, which is a cell (or a cell slice) containing the transit and next-hop addresses with the above optimizations, some other information relevant for forwarding and processing, and a reference to a cell containing the unmodified original message. In this way, a message can easily be ``extracted'' from its original envelope (e.g., the one present in the {\em InMsgDescr}) and be put into another envelope (e.g., before being included into the {\em OutMsgQueue}).
700

701
In the representation of a block as a tree, or rather a DAG, of cells, the two different envelopes will contain references to a shared cell with the original message. If the message is large, this arrangement avoids the need to keep more than one copy of the message in the block.
702

703
\mysubsection{Hypercube Routing protocol}
704
This section exposes the details of the hypercube routing protocol employed by the TON Blockchain to achieve guaranteed delivery of messages between smart contracts residing in arbitrary shardchains. For the purposes of this document, we will refer to the variant of hypercube routing employed by the TON Blockchain as Hypercube Routing (HR).
705

706
\nxsubpoint\label{sp:msg.uniq}\emb{Message uniqueness}
707
Before continuing, let us observe that any (internal) message is {\em unique}. Recall that a message contains its full source address along with its logical creation time, and all outbound messages created by the same smart contract have strictly increasing logical creation times (cf.~\ptref{sp:lt.ton.blkch}); therefore, the combination of the full source address and the logical creation time uniquely defines the message. Since we assume the chosen hash function $\Sha$ to be collision resistant, {\em a message is uniquely determined by its hash}, so we can identify two messages if we know that their hashes coincide.
708

709
This does not extend to external messages ``from nowhere'', which have no source addresses. Special care must be taken to prevent replay attacks related to such messages, especially by designers of user wallet smart contracts. One possible solution is to include a sequence number in the body of such messages, and keep the count of external messages already processed inside the smart-contract persistent data, refusing to process an external message if its sequence number differs from this count.
710

711
\nxsubpoint\label{sp:msg.hash.ident}\emb{Identifying messages with equal hashes}
712
The TON Blockchain assumes that two messages with the same hashes coincide, and treats either of them as a redundant copy of the other. As explained above in~\ptref{sp:msg.uniq}, this does not lead to any unexpected effects for internal messages. However, if one sends two coinciding ``messages from nowhere'' to a smart contract, it may happen that only one of them will be delivered---or both. If their action is not supposed to be idempotent (i.e., if processing the message twice has a different effect from processing it once), some provisions should be made to distinguish the two messages, for instance by including a sequence number in them.
713

714
In particular, the {\em InMsgDescr\/} and {\em OutMsgDescr\/} use the (unenveloped) message hash as a key, tacitly assuming that distinct messages have distinct hashes. In this way, one can trace the path and the fate of a message across different shardchains by looking up the message hash in the {\em InMsgDescr\/} and {\em OutMsgDescr\/} of different blocks.
715

716
\nxsubpoint\label{sp:out.msg.q}\emb{The structure of {\em OutMsgQueue}}
717
Recall that the outbound messages --- both those created inside the shardchain, and transit messages previously imported from a neighboring shardchain to be relayed to the next-hop shardchain --- are accumulated in the {\em OutMsgQueue}, which is part of the {\em state\/} of the shardchain (cf.~\ptref{sp:outmsgq}). In contrast with {\em InMsgDescr\/} and {\em OutMsgDescr}, the key in {\em OutMsgQueue} is not the message hash, but its next-hop address---or at least its first 96 bits---concatenated with the message hash.
718

719
Furthermore, the {\em OutMsgQueue\/} is not just a dictionary (hashmap), mapping its keys into (enveloped) messages. Rather, it is a {\em min-augmented dictionary with respect to the logical creation time}, meaning that each node of the Patricia tree representing {\em OutMsgQueue\/} has an additional value (in this case, an unsigned 64-bit integer), and that this augmentation value in each fork node is set to be equal to the minimum of the augmentation values of its children. The augmentation value of a leaf equals the logical creation time of the message contained in that leaf; it need not be stored explicitly.
720

721
\nxsubpoint\emb{Inspecting the {\em OutMsgQueue\/} of a neighbor}
722
Such a structure for the {\em OutMsgQueue\/} enables the validators of a neighboring shardchain to inspect it to find its part (Patricia subtree) relevant to them (i.e., consisting of messages with the next-hop address belonging to the neighboring shard in question---or having the next-hop address with a given binary prefix), as well as quickly compute the ``oldest'' (i.e., with the minimum logical creation time) message in that part.
723

724
Furthermore, the shard validators do not even need to track the total state of all their neighboring shardchains---they only need to keep and update a copy of their {\em OutMsgQueue}, or even of its subtree related to them.
725

726
\nxsubpoint\label{sp:monot.import}
727
\emb{Logical time monotonicity: importing the oldest message from the neighbors}
728
The first fundamental local condition of message forwarding, called {\em (message import) (logical time) monotonicity condition}, may be summarized as follows:
729
\begin{quote}
730
 While importing messages into the {\em InMsgDescr\/} of a shardchain block from the {\em OutMsgQueue\/}s of its neighboring shardchains, the validators must import the messages in the increasing order of their logical time; in the case of a tie, the message with the smaller hash is imported first.
731
\end{quote}
732
More precisely, each shardchain block contains the hash of a masterchain block (assumed to be ``the latest'' masterchain block at the time of the shardchain block's creation), which in turn contains the hashes of the most recent shardchain blocks. In this way, each shardchain block indirectly ``knows'' the most recent state of all other shardchains, and especially its neighboring shardchains, including their {\em OutMsgQueue\/}s.\footnote{In particular, if the hash of a recent block of a neighboring shardchain is not yet reflected in the latest masterchain block, its modifications to {\em OutMsgQueue\/} must not be taken into account.}
733

734
Now an alternative equivalent formulation of the monotonicity condition is as follows:
735
\begin{quote}
736
If a message is imported into the {\em InMsgDescr\/} of the new block, its logical creation time cannot be greater than that of any message left unimported in the {\em OutMsgQueue\/} of the most recent state of any of the neighboring shardchains.
737
\end{quote}
738
It is this form of the monotonicity condition that appears in the local consistency conditions of the TON Blockchain blocks and is enforced by the validators.
739

740
\nxsubpoint\emb{Witnesses to violations of the message import logical time monotonicity condition}
741
Notice that if this condition is not fulfilled, a small Merkle proof witnessing its failure may be constructed. Such a proof will contain:
742
\begin{itemize}
743
\item A path in the {\em OutMsgQueue\/} of a neighbor from the root to a certain message $m$ with small logical creation time.
744
\item A path in the {\em InMsgDescr\/} of the block under consideration showing that the key equal to $\Hash(m)$ is absent in {\em InMsgDescr} (i.e., that $m$ has not been included in the current block).
745
\item A proof that $m$ has not been included in a preceding block of the same shardchain, using the block header information containing the smallest and the largest logical time of all messages imported into the block (cf. \ptref{sp:msg.deliver.chk}--\ptref{sp:hr.ihr.deliver.chk} for more information).
746
\item A path in {\em InMsgDescr\/} to another included message $m'$, such that either $\LT(m')>\LT(m)$, or $\LT(m')=\LT(m)$ and $\Hash(m')>\Hash(m)$.
747
\end{itemize}
748

749
\nxsubpoint\label{sp:omsgq.del}\emb{Deleting a message from {\em OutMsgQueue}}
750
A message must be deleted from {\em OutMsgQueue\/} sooner or later; otherwise, the storage used by {\em OutMsgQueue\/} would grow to infinity. To this end, several ``garbage collection rules'' are introduced. They allow the deletion of a message from {\em OutMsgQueue\/} during the evaluation of a block only if an explicit special ``delivery record'' is present in the {\em OutMsgDescr\/} of that block. This record contains either a reference to the neighboring shardchain block that has included the message into its {\em InMsgDescr\/} (the hash of the block is sufficient, but collated material for the block may contain the relevant Merkle proof), or a Merkle proof of the fact that the message has been delivered to its final destination via Instant Hypercube Routing.
751

752
\nxsubpoint\emb{Guaranteed message delivery via Hypercube Routing}
753
In this way, a message cannot be deleted from the outbound message queue unless it has been either relayed to its next-hop shardchain or delivered to its final destination (cf.~\ptref{sp:omsgq.del}). Meanwhile, the message import monotonicity condition (cf.~\ptref{sp:monot.import}) ensures that any message will sooner or later be relayed into the next shardchain, taking into account other conditions which require the validators to use at least half of the block's space or gas limits for importing inbound internal messages (otherwise the validators might choose to create empty blocks or import only external messages even in the presence of non-empty outbound message queues at their neighbors).
754

755
\nxsubpoint\label{sp:msg.proc.order}\emb{Message processing order}
756
When several imported messages are processed by transactions inside a block, the {\em message processing order conditions\/} ensure that older messages are processed first. More precisely, if a block contains two transactions $t$ and $t'$ of the same account, which process inbound messages $m$ and $m'$, respectively, and $\LT(m)<\LT(m')$, then we must have $\LT(t)<\LT(t')$.
757

758
\nxsubpoint\emb{FIFO guarantees of Hypercube Routing}
759
The message processing order conditions (cf.~\ptref{sp:msg.proc.order}), along with the message import monotonicity conditions (cf.~\ptref{sp:monot.import}), imply the {\em FIFO guarantees for Hypercube Routing}. Namely, if a smart contract $\xi$ creates two messages $m$ and $m'$ with the same destination $\eta$, and $m'$ is generated later than $m$ (meaning that $m\prec m'$, hence $\LT(m)<\LT(m')$), then $m$ will be processed by $\eta$ before $m'$. This is so because both messages will follow the same routing steps on the path from $\xi$ to $\eta$ (the Hypercube Routing algorithm described in~\ptref{sp:hr.next.hop} is deterministic), and in all outbound queues and inbound message descriptions $m'$ will appear ``after'' $m$.\footnote{This statement is not as trivial as it seems at first, because some of the shardchains involved may split or merge during the routing. A correct proof may be obtained by adopting the ISP perspective to HR as explained in~\ptref{sp:isp.hr} and observing that $m'$ will always be behind $m$, either in terms of the intermediate accountchain reached or, if they happen to be in the same accountchain, in terms of logical creation time.
760

761
A crucial observation is that ``at any given moment of time'' (logically; a more precise description would be ``in the total state obtained after processing any causally closed subset $\cF$ of blocks''), the intermediate accountchains belonging to the same shard are contiguous on the path from $\xi$ to~$\eta$ (i.e., cannot have accountchains belonging to some other shard in between). This is a ``convexity property'' (cf.~\ptref{sp:path.conv}) of the Hypercube Routing algorithm described in~\ptref{sp:hr.next.hop}.}
762

763
If message $m'$ can be delivered to $B$ via Instant Hypercube Routing, this is not necessarily true anymore. Therefore, a simple way of ensuring FIFO message delivery discipline between a pair of smart contracts consists in setting a special bit in the message header preventing its delivery via IHR.
764

765
\nxsubpoint\emb{Delivery uniqueness guarantees of Hypercube Routing}
766
Notice that the message import monotonicity conditions also imply the {\em uniqueness\/} of the delivery of any message via Hypercube Routing---i.e., that it cannot be imported and processed by the destination smart contract more than once. We will see later in~\ptref{p:ihr.combine.deliv} that enforcing delivery uniqueness when both Hypercube Routing and Instant Hypercube Routing are active is more complicated.
767

768
\nxsubpoint\label{sp:hr.overview}\emb{An overview of Hypercube Routing}
769
Let us summarize all routing steps performed to deliver an internal message $m$ created by source account $\xi_0$ to destination account~$\eta$. We denote by $\xi_{k+1}:=\NextHop(\xi_k,\eta)$, $k=0,1,2,\ldots$ the intermediate addresses dictated by HR for forwarding the message $m$ to its final destination $\eta$. Let $S_k$ be the shard containing $\xi_k$.
770
\begin{itemize}
771
\item{[Birth]} --- Message $m$ with destination $\eta$ is created by a transaction $t$ belonging to an account $\xi_0$ residing in some shardchain $S_0$. The logical creation time $\LT(m)$ is fixed at this point and included into the message $m$.
772
\item{[ImmediateProcessing?]} --- If the destination $\eta$ resides in the same shardchain $S_0$, the message may be processed in the same block it was generated in. In this case, $m$ is included into {\em OutMsgDescr\/} with a flag indicating it has been processed in this very block and need not be forwarded further. Another copy of $m$ is included into {\em InMsgDescr}, along with the usual data describing the processing of inbound messages. (Notice that $m$ is not included into the {\em OutMsgQueue\/} of $S_0$.)
773
\item{[InitialInternalRouting]} --- If $m$ either has a destination outside $S_0$, or is not processed in the same block where it was generated, the internal routing procedure described in~\ptref{sp:hr.int.route} is applied, until an index $k$ is found such that $\xi_k$ lies in $S_0$, but $\xi_{k+1}=\NextHop(\xi_k,\eta)$ does not (i.e., $S_k=S_0$, but $S_{k+1}\neq S_0$). Alternatively, this process stops if $\xi_k=\eta$ or $\xi_k$ coincides with $\eta$ in its first 96 bits.
774
\item{[OutboundQueuing]} --- The message $m$ is included into {\em OutMsgDescr\/} (with the key equal to its hash), with an envelope containing its transit address $\xi_k$ and next-hop address $\xi_{k+1}$ as explained in \ptref{sp:msg.env} and~\ptref{sp:repr.interm.addr}. The same enveloped message is also included in the {\em OutMsgQueue\/} of the state of $S_k$, with the key equal to the concatenation of the first 96 bits of its next-hop address $\xi_{k+1}$ (which may be equal to $\eta$ if $\eta$ belongs to $S_k$) and the message hash $\Hash(m)$.
775
\item{[QueueWait]} --- Message $m$ waits in the {\em OutMsgQueue\/} of shardchain $S_k$ to be forwarded further. In the meantime, shardchain $S_k$ may split or merge with other shardchains; in that case, the new shard $S'_k$ containing the transit address $\xi_k$ inherits $m$ in its {\em OutMsgQueue}.
776
\item{[ImportInbound]} --- At some point in the future, the validators for the shardchain $S_{k+1}$ containing the next-hop address $\xi_{k+1}$ scan the {\em OutMsgQueue\/} in the state of shardchain $S_k$ and decide to import message $m$ in keeping with the monotonicity condition (cf.~\ptref{sp:monot.import}) and other conditions. A new block for shardchain $S_{k+1}$ is generated, with an enveloped copy of $m$ included in its {\em InMsgDescr}. The entry in {\em InMsgDescr\/} contains also the {\em reason\/} for importing $m$ into this block, with a hash of the most recent block of shardchain $S'_k$, and the previous next-hop and transit addresses $\xi_k$ and $\xi_{k+1}$, so that the corresponding entry in the {\em OutMsgQueue\/} of $S'_k$ can be easily located.
777
\item{[Confirmation]} --- This entry in the {\em InMsgDescr\/} of $S_{k+1}$ also serves as a confirmation for $S'_k$. In a later block of $S'_k$, message $m$ must be removed from the {\em OutMsgQueue\/} of $S'_k$; this modification is reflected in a special entry in the {\em OutMsgDescr\/} of the block of $S'_k$ that performs this state modification.
778
\item{[Forwarding?]} --- If the final destination $\eta$ of~$m$ does not reside in $S_{k+1}$, the message is {\em forwarded}. Hypercube Routing is applied until some $\xi_l$, $l>k$, and $\xi_{l+1}=\NextHop(\xi_l,\eta)$ are obtained, such that $\xi_l$ lies in $S_{k+1}$, but $\xi_{l+1}$ does not (cf.~\ptref{sp:hr.int.route}). After that, a newly-enveloped copy of $m$ with transit address set to $\xi_l$ and next-hop address $\xi_{l+1}$ is included into both the {\em OutMsgDescr\/} of the current block of $S_{k+1}$ and the {\em OutMsgQueue\/} of the new state of $S_{k+1}$. The entry of $m$ in {\em InMsgDescr\/} contains a flag indicating that the message has been forwarded; the entry in {\em OutMsgDescr\/} contains the newly-enveloped message and a flag indicating that this is a forwarded message. Then all the steps starting from [OutboundQueueing] are repeated, for $l$ instead of~$k$.
779
\item{[Processing?]} --- If the final destination $\eta$ of $m$ resides in $S_{k+1}$, then the block of $S_{k+1}$ that imported the message must process it by a transaction $t$ included in the same block. In this case, {\em InMsgDescr\/} contains a reference to $t$ by its logical time $\LT(t)$, and a flag indicating that the message has been processed.
780
\end{itemize}
781

782
The above message routing algorithm does not take into account some further modifications required to implement Instant Hypercube Routing (IHR). For instance, a message may be {\em discarded\/} after being imported (listed in {\em InMsgDescr}) into its final or intermediate shardchain block, because a proof of delivery via IHR to the final destination is presented. In this case, such a proof must be included into {\em InMsgDescr\/} to explain why the message was not forwarded further or processed.
783

784
\mysubsection{Instant Hypercube Routing and combined delivery guarantees}\label{p:ihr.combine.deliv}
785
This section describes the Instant Hypercube Routing protocol, normally applied by TON Blockchain in parallel to the previously discussed Hypercube Routing protocol to achieve faster message delivery. However, when both Hypercube Routing and Instant Hypercube Routing are applied to the same message in parallel, achieving delivery and unique delivery guarantees is more complicated. This topic is also discussed in this section.
786

787
\nxsubpoint\label{sp:ihr.overview}\emb{An overview of Instant Hypercube Routing}
788
Let us explain the major steps applied when the Instant Hypercube Routing (IHR) mechanism is applied to a message. (Notice that normally both the usual HR and IHR work in parallel for the same message; some provisions must be taken to guarantee the uniqueness of delivery of any message.)
789

790
Consider the routing and delivery of the same message $m$ with source $\xi$ and destination $\eta$ as discussed in~\ptref{sp:hr.overview}:
791
\begin{itemize}
792
\item{[NetworkSend]} --- After the validators of $S_0$ have agreed on and signed the block containing the creating transaction $t$ for $m$, and observed that the destination $\eta$ of $m$ does not reside inside $S_0$, they may send a datagram (encrypted network message), containing the message $m$ along with a Merkle proof of its inclusion into the {\em OutMsgDescr\/} of the block just generated, to the validator group of the shardchain $T$ currently owning the destination~$\eta$.
793
\item{[NetworkReceive]} --- If the validators of shardchain $T$ receive such a message, they check its validity starting from the most recent masterchain block and the shardchain block hashes listed in it, including the most recent ``canonical'' block of shardchain $S_0$ as well. If the message is invalid, they silently discard it. If that block of shardchain $S_0$ has a larger sequence number than the one listed in the most recent masterchain block, they may either discard it or postpone the verification until the next masterchain block appears.
794
\item{[InclusionConditions]} --- The validators check inclusion conditions for message $m$. In particular, they must check that this message has not been delivered before, and that the {\em OutMsgQueue\/}s of the neighbors do not have unprocessed outbound messages with destinations in $T$ with smaller logical creation times than $\LT(m)$.
795
\item{[Deliver]} --- The validators deliver and process the message, by including it into the {\em InMsgDescr\/} of the current shardchain block along with a bit indicating that it is an IHR message, the Merkle proof of its inclusion into the {\em OutMsgDescr\/} of the original block, and the logical time of the transaction $t'$ processing this inbound message into the currently generated block.
796
\item{[Confirm]} --- Finally, the validators send encrypted datagrams to all the validator groups of the intermediate shardchains on the path from $\xi$ to $\eta$, containing a Merkle proof of the inclusion of message $m$ into the {\em InMsgDescr\/} of its final destination. The validators of an intermediate shardchain may use this proof to {\em discard\/} the copy of message $m$ travelling by the rules of HR, by importing the message into their {\em InMsgDescr\/} along with the Merkle proof of final delivery and setting a flag indicating that the message has been discarded.
797
\end{itemize}
798

799
The overall procedure is even simpler than that for Hypercube Routing. Notice, however, that IHR comes with no delivery or FIFO guarantees: the network datagram may be lost in transit, or the validators of the destination shardchain may decide not to act on it, or they may discard it due to buffer overflow. This is the reason why IHR is used as a complement to HR, and not as a replacement.
800

801
\nxsubpoint\emb{Overall eventual delivery guarantees}
802
Notice that the combination of HR and IHR guarantees the ultimate delivery of any internal message to its final destination. Indeed, the HR by itself is guaranteed to deliver any message eventually, and the HR for message $m$ can be cancelled at an intermediate stage only by a Merkle proof of delivery of $m$ to its final destination (via IHR).
803

804
\nxsubpoint\emb{Overall unique delivery guarantees}
805
However, the {\em uniqueness\/} of message delivery for the combination of HR and IHR is more difficult to achieve. In particular, one must check the following conditions, and, if necessary, be able to provide short Merkle proofs that they do or don't hold:
806
\begin{itemize}
807
\item When a message $m$ is imported into its next intermediate shardchain block via HR, we must check that $m$ has not already been imported via HR.
808
\item When $m$ is imported and processed in its final destination shardchain, we must check that $m$ has not already been processed. If it has, there are three subcases:
809
  \begin{itemize}
810
  \item If $m$ is being considered for import via HR, and it has already been imported via HR, it must not be imported at all.
811
  \item If $m$ is being considered for import via HR, and it has already been imported via IHR (but not HR), then it must be imported and immediately discarded (without being processed by a transaction). This is necessary to remove $m$ from the {\em OutMsgQueue\/} of its previous intermediate shardchain.
812
  \item If $m$ is being considered for import via IHR, and it has already been imported via either IHR or HR, it must not be imported at all.
813
  \end{itemize}
814
\end{itemize}
815

816
\nxsubpoint\label{sp:msg.deliver.chk}\emb{Checking whether a message has already been delivered to its final destination}
817
Consider the following general algorithm for checking whether a message~$m$ has already been delivered to its final destination~$\eta$: One can simply scan the last several blocks belonging to the shardchain containing the destination address, starting from the latest block and working backwards through the previous block references. (If there are two previous blocks---i.e., if a shardchain merge event occurred at some point---one would follow the chain containing the destination address.) The {\em InMsgDescr\/} of each of these blocks can be checked for an entry with key $\Hash(m)$. If such an entry is found, the message $m$ has already been delivered, and we can easily construct a Merkle proof of this fact. If we do not find such an entry before arriving at a block $B$ with $\LT^+(B)<\LT(m)$, implying that $m$ could not be delivered in~$B$ or any of its predecessors, then the message $m$ definitely has not been delivered yet.
818

819
The obvious disadvantage of this algorithm is that, if message $m$ is very old (and most likely delivered a long time ago), meaning that it has a small value of $\LT(m)$, then a large number of blocks will need to be scanned before yielding an answer. Furthermore, if the answer is negative, the size of the Merkle proof of this fact will increase linearly with the number of blocks scanned.
820

821
\nxsubpoint\label{sp:ihr.deliver.chk}\emb{Checking whether an IHR message has already been delivered to its final destination}
822
To check whether an IHR message $m$ has already been delivered to its destination shardchain, we can apply the general algorithm described above (cf.~\ptref{sp:msg.deliver.chk}), modified to inspect only the last $c$ blocks for some small constant $c$ (say, $c=8$). If no conclusion can be reached after inspecting these blocks, then the validators for the destination shardchain may simply discard the IHR message instead of spending more resources on this check.
823

824
\nxsubpoint\emb{Checking whether an HR message has already been delivered via HR to its final destination or an intermediate shardchain}
825
To check whether an HR-received message $m$ (or rather, a message $m$ being considered for import via HR) has already been imported via HR, we can use the following algorithm: Let $\xi_k$ be the transit address of $m$ (belonging to a neighboring shardchain $S_k$) and $\xi_{k+1}$ be its next-hop address (belonging to the shardchain under consideration). Since we are considering the inclusion of $m$, $m$ must be present in the {\em OutMsgQueue\/} of the most recent state of shardchain $S_k$, with $\xi_k$ and $\xi_{k+1}$ indicated in its envelope. In particular, (a) the message has been included into {\em OutMsgQueue}, and we may even know when, because the entry in {\em OutMsgQueue\/} sometimes contains the logical time of the block where it has been added, and (b) it has not yet been removed from {\em OutMsgQueue}.
826

827
Now, the validators of the neighboring shardchain are required to remove a message from {\em OutMsgQueue\/} as soon as they observe that message (with transit and next-hop addresses $\xi_k$ and $\xi_{k+1}$ in its envelope) has been imported into the {\em InMsgDescr\/} of the message's next-hop shardchain. Therefore, (b) implies that the message could have been imported into the {\em InMsgDescr\/} of a preceding block only if this preceding block is very new (i.e., not yet known to the most recent neighboring shardchain block). Therefore, only a very limited number of preceding blocks (typically one or two, at most)  need to be scanned by the algorithm described in~\ptref{sp:msg.deliver.chk} to conclude that the message has not yet been imported.\footnote{One must not only look up the key $\Hash(m)$ in the {\em InMsgDescr\/} of these blocks, but also check the intermediate addresses in the envelope of the corresponding entry, if found.} In fact, if this check is performed by the validators or collators for the current shardchain themselves, it can be optimized by keeping in memory the {\em InMsgDescr\/}s of the several latest blocks.
828

829
\nxsubpoint\label{sp:hr.ihr.deliver.chk}\emb{Checking whether an HR message has already been delivered via IHR to its final destination}
830
Finally, to check whether an HR message has already been delivered to its final destination via IHR, one can use the general algorithm described in~\ptref{sp:msg.deliver.chk}. In contrast with \ptref{sp:ihr.deliver.chk}, we cannot abort the verification process after scanning a fixed number of the latest blocks in the destination shardchain, because HR messages cannot be dropped without a reason.
831

832
Instead, we indirectly bound the number of blocks to be inspected by forbidding the inclusion of IHR message $m$ into a block $B$ of its destination shardchain if there are already more than, say, $c=8$ blocks $B'$ in the destination shardchain with $\LT^+(B')\geq\LT(m)$.
833

834
Such a condition effectively restricts the time interval after the creation of message~$m$ in which it could have been delivered via IHR, so that only a small number of blocks of the destination shardchain (at most $c$) will need to be inspected.
835

836
Notice that this condition nicely aligns with the modified algorithm described in~\ptref{sp:ihr.deliver.chk}, effectively forbidding the validators from importing the newly-received IHR message if more than $c=8$ steps are needed to check that it had not been imported already.
837

838
\clearpage
839
\mysection{Messages, message descriptors, and queues}\label{sect:msg}
840
This chapter presents the internal layout of individual messages, message descriptors (such as {\em InMsgDescr\/} or {\em OutMsgDescr}), and message queues (such as {\em OutMsgQueue}). Enveloped messages (cf.~\ptref{sp:msg.env}) are also discussed here.
841

842
Notice that most general conventions related to messages must be obeyed by all shardchains, even if they do not belong to the basic shardchain; otherwise, messaging and interaction between different workchains would not be possible. It is the {\em interpretation\/} of the message contents and the {\em processing\/} of messages, usually by some transactions, that differs between workchains.
843

844
\mysubsection{Address, currency, and message layout}
845
This chapter begins with some general definitions, followed by the precise layout of addresses used for serializing source and destination addresses in a message.
846

847
\nxsubpoint\label{sp:tl.std.def}\emb{Some standard definitions}
848
For the reader's convenience, we reproduce here several general TL-B definitions.\footnote{A description of an older version of TL may be found at \url{https://core.telegram.org/mtproto/TL}. Alternatively, an informal introduction to TL-B schemes may be found in \cite[3.3.4]{TVM}.} These definitions are used below in the discussion of address and message layout, but otherwise are not related to the TON Blockchain.
849
\begin{verbatim}
850
unit$_ = Unit;
851
true$_ = True;
852
// EMPTY False;
853
bool_false$0 = Bool;
854
bool_true$1 = Bool;
855
nothing$0 {X:Type} = Maybe X;
856
just$1 {X:Type} value:X = Maybe X;
857
left$0 {X:Type} {Y:Type} value:X = Either X Y;
858
right$1 {X:Type} {Y:Type} value:Y = Either X Y;
859
pair$_ {X:Type} {Y:Type} first:X second:Y = Both X Y;
860

861
bit$_ _:(## 1) = Bit;
862
\end{verbatim}
863

864
\nxsubpoint\label{sp:addr.tl}\emb{TL-B scheme for addresses}
865
The serialization of source and destination addresses is defined by the following TL-B scheme:
866
\begin{verbatim}
867
addr_none$00 = MsgAddressExt;
868
addr_extern$01 len:(## 9) external_address:(len * Bit) 
869
             = MsgAddressExt;
870
anycast_info$_ depth:(## 5) rewrite_pfx:(depth * Bit) = Anycast;
871
addr_std$10 anycast:(Maybe Anycast) 
872
   workchain_id:int8 address:uint256  = MsgAddressInt;
873
addr_var$11 anycast:(Maybe Anycast) addr_len:(## 9) 
874
   workchain_id:int32 address:(addr_len * Bit) = MsgAddressInt;
875
_ MsgAddressInt = MsgAddress;
876
_ MsgAddressExt = MsgAddress;
877
\end{verbatim}
878
The two last lines define type \texttt{MsgAddress} to be the internal union of types \texttt{MsgAddressInt} and \texttt{MsgAddressExt} (not to be confused with their external union \texttt{Either MsgAddressInt MsgAddressExt} as defined in~\ptref{sp:tl.std.def}), as if the preceding four lines had been repeated with the right-hand side replaced by \texttt{MsgAddress}. In this way, type \texttt{MsgAddress} has four constructors, and types \texttt{MsgAddressInt} and \texttt{MsgAddressExt} are both subtypes of \texttt{MsgAddress}.
879

880
\nxsubpoint\emb{External addresses}
881
The first two constructors, \texttt{addr\_none} and \texttt{addr\_extern}, are used for source addresses of ``messages from nowhere'' (inbound external messages), and for destination addresses of ``messages to nowhere'' (outbound external messages). The \texttt{addr\_extern} constructor defines an ``external address'', which is ignored by the TON Blockchain software altogether (which treats \texttt{addr\_extern} as a longer variant of \texttt{addr\_none}), but may be used by external software for its own purposes. For example, a special external service may inspect the destination address of all outbound external messages found in all blocks of the TON Blockchain, and, if a special magic number is present in the \texttt{external\_address} field, parse the remainder as an IP address and UDP port or a (TON Network) ADNL address, and send a datagram with a copy of the message to the network address thus obtained.
882

883
\nxsubpoint\emb{Internal addresses}
884
The two remaining constructors, \texttt{addr\_std} and \texttt{addr\_var}, represent internal addresses. The first of them, \texttt{addr\_std}, represents a signed 8-bit $\workchainid$ (sufficient for the masterchain and for the basic workchain) and a 256-bit internal address in the selected workchain. The second of them, \texttt{addr\_var}, represents addresses in workchains with a ``large'' $\workchainid$, or internal addresses of length not equal to 256. Both of these constructors have an optional \texttt{anycast} value, absent by default, which enables ``address rewriting'' when present.\footnote{{\em Address rewriting\/} is a feature used to implement ``anycast addresses'' employed by the so-called {\em large\/} or {\em global\/} smart contracts (cf.~\cite[2.3.18]{TON}), which can have instances in several shardchains. When address rewriting is enabled, a message may be routed to and processed by a smart contract with an address coinciding with the destination address up to the first $d$ bits, where $d\leq 32$ is the ``splitting depth'' of the smart contract indicated in the {\tt anycast.depth} field (cf.~\ptref{sp:nh.anycast}). Otherwise, the addresses must match exactly.}
885

886
The validators must use \texttt{addr\_std} instead of \texttt{addr\_var} whenever possible, but must be ready to accept \texttt{addr\_var} in inbound messages. The \texttt{addr\_var} constructor is intended for future extensions.
887

888
Notice that $\workchainid$ must be a valid workchain identifier enabled in the current masterchain configuration, and the length of the internal address must be in the range allowed for the indicated workchain. For example, one cannot use $\workchainid=0$ (basic workchain) or $\workchainid=-1$ (masterchain) with addresses that are not exactly 256 bits long.
889

890
\nxsubpoint\emb{Representing Gram currency amounts}
891
Amounts of Grams are expressed with the aid of two types representing variable-length unsigned or signed integers, plus a type \texttt{Grams} explicitly dedicated to representing non-negative amounts of nanograms, as follows:
892
\begin{verbatim}
893
var_uint$_ {n:#} len:(#< n) value:(uint (len * 8))
894
         = VarUInteger n;
895
var_int$_ {n:#} len:(#< n) value:(int (len * 8)) 
896
        = VarInteger n;
897
nanograms$_ amount:(VarUInteger 16) = Grams;  
898
\end{verbatim}
899
If one wants to represent $x$ nanograms, one selects an integer $\ell<16$ such that $x<2^{8\ell}$, and serializes first $\ell$ as an unsigned 4-bit integer, then $x$ itself as an unsigned $8\ell$-bit integer. Notice that four zero bits represent a zero amount of Grams.
900

901
Recall (cf.~\cite[A]{TON}) that the original total supply of Grams is fixed at five billion (i.e., $5\cdot10^{18}<2^{63}$ nanograms), and is expected to grow very slowly. Therefore, all the amounts of Grams encountered in practice will fit in unsigned or even signed 64-bit integers. The validators may use the 64-bit integer representation of Grams in their internal computations; however, the serialization of these values the blockchain is another matter.
902

903
\nxsubpoint\emb{Representing collections of arbitrary currencies}\label{sp:extra.curr}
904
Recall that the TON Blockchain allows its users to define arbitrary cryptocurrencies or tokens apart from the Gram, provided some conditions are met. Such additional cryptocurrencies are identified by 32-bit $\currencyid$s. The list of defined additional cryptocurrencies is a part of the blockchain configuration, stored in the masterchain.
905

906
When some amounts of one or several such cryptocurrencies need to be represented, a dictionary (cf.~\cite[3.3]{TVM}) with 32-bit $\currencyid$s as keys and \texttt{VarUInteger 32} values is used:
907
\begin{verbatim}
908
extra_currencies$_ dict:(HashmapE 32 (VarUInteger 32)) 
909
                 = ExtraCurrencyCollection;
910
currencies$_ grams:Grams other:ExtraCurrencyCollection 
911
           = CurrencyCollection;
912
\end{verbatim}
913
The value attached to an internal message is represented by a value of the \texttt{CurrencyCollection} type, which may describe a certain (non-negative) amount of (nano)grams as well as some additional currencies, if needed. Notice that if no additional currencies are required, \texttt{other} reduces to just one zero bit.
914

915
\nxsubpoint\label{sp:msg.layout}\emb{Message layout}
916
A message consists of its {\em header\/} followed by its {\em body}, or {\em payload}. The body is essentially arbitrary, to be interpreted by the destination smart contract. The message header is standard and is organized as follows:
917
\begin{verbatim}
918
int_msg_info$0 ihr_disabled:Bool bounce:Bool
919
  src:MsgAddressInt dest:MsgAddressInt 
920
  value:CurrencyCollection ihr_fee:Grams fwd_fee:Grams
921
  created_lt:uint64 created_at:uint32 = CommonMsgInfo;
922
ext_in_msg_info$10 src:MsgAddressExt dest:MsgAddressInt 
923
  import_fee:Grams = CommonMsgInfo;
924
ext_out_msg_info$11 src:MsgAddressInt dest:MsgAddressExt
925
  created_lt:uint64 created_at:uint32 = CommonMsgInfo;
926

927
tick_tock$_ tick:Bool tock:Bool = TickTock;
928

929
_ split_depth:(Maybe (## 5)) special:(Maybe TickTock)
930
  code:(Maybe ^Cell) data:(Maybe ^Cell)
931
  library:(Maybe ^Cell) = StateInit;
932

933
message$_ {X:Type} info:CommonMsgInfo
934
  init:(Maybe (Either StateInit ^StateInit))
935
  body:(Either X ^X) = Message X;
936
\end{verbatim}
937
The meaning of this scheme is as follows.
938

939
Type \texttt{Message $X$} describes a message with the body (or payload) of type $X$. Its serialization starts with \texttt{info} of type \texttt{CommonMsgInfo}, which comes in three flavors: for internal messages, inbound external messages, and outbound external messages, respectively. All of them have a source address \texttt{src} and destination address \texttt{dest}, which are external or internal according to the chosen constructor. Apart from that, an internal message may bear some \texttt{value} in Grams and other defined currencies (cf.~\ptref{sp:extra.curr}), and all messages generated inside the TON Blockchain have a logical creation time \texttt{created\_lt} (cf.~\ptref{sp:lt.ton.blkch}) and creation unixtime \texttt{created\_at}, both automatically set by the generating transaction. The creation unixtime equals the creation unixtime of the block containing the generating transaction.
940

941
\nxsubpoint\emb{Forwarding and IHR fees. Total value of an internal message}
942
Internal messages define an \texttt{ihr\_fee} in Grams, which is subtracted from the value attached to the message and awarded to the validators of the destination shardchain if they include the message by the IHR mechanism. The \texttt{fwd\_fee} is the original total forwarding fee paid for using the HR mechanism; it is automatically computed from some configuration parameters and the size of the message at the time the message is generated.
943

944
Notice that the total value carried by a newly-created internal outbound message equals the sum of \texttt{value}, \texttt{ihr\_fee}, and \texttt{fwd\_fee}. This sum is deducted from the balance of the source account. Of these components, only \texttt{value} is always credited to the destination account on message delivery. The \texttt{fwd\_fee} is collected by the validators on the HR path from the source to the destination, and the \texttt{ihr\_fee} is either collected by the validators of the destination shardchain (if the message is delivered via IHR), or credited to the destination account.
945

946
\nxsubpoint\emb{Code and data portions contained in a message}
947
Apart from the common message information stored in \texttt{info}, a message can contain portions of the destination smart contract's code and data. This feature is used, for instance, in the so-called {\em constructor messages\/} (cf.~\ptref{sp:constr.msg}), which are simply internal or inbound external messages with \texttt{code} and possibly \texttt{data} fields defined in their \texttt{init} portions. If the hash of these fields is correct, and the destination smart contract has no code or data, the values from the message are used instead.\footnote{More precisely, the information from the \texttt{init} field of an inbound message is used either when the receiving account is uninitialized or frozen with the hash of {\em StateInit} equal to the one expected by the account, or when the receiving account is active, and its code or data is an external hash reference matching the hash of the code or data received in the {\em StateInit} of the message.}
948

949
\nxsubpoint\emb{Using \texttt{code} and \texttt{data} for other purposes}
950
Workchains other than the masterchain and the basic workchain are free to use the trees of cells referred to in the \texttt{code}, \texttt{data}, and \texttt{library} fields for their own purposes. The messaging system itself makes no assumptions about their contents; they become relevant only when a message is processed by a transaction.
951

952
\nxsubpoint\emb{Absence of an explicit gas price and gas limit}
953
Notice that messages do not have an explicit gas price and gas limit. Instead, the gas price is set globally by the validators for each workchain (it is a special configurable parameter), and the gas limit for each transaction has also a default value, which is a configurable parameter; the smart contract itself may lower the gas limit during its execution if so desired.
954

955
For internal messages, the initial gas limit cannot exceed the Gram value of the message divided by the current gas price. For inbound external messages, the initial gas limit is very small, and the true gas limit is set by the receiving smart contract itself, when it {\em accepts\/} the inbound message by the corresponding TVM primitive.
956

957
\nxsubpoint\emb{Deserialization of a message payload}
958
The payload, or body, of a message is deserialized by the receiving smart contract when executed by TVM. The messaging system itself makes no assumptions about the internal format of the payload. However, it makes sense to describe the serialization of supported inbound messages by TL or TL-B schemes with 32-bit constructor tags, so that the developers of other smart contracts will know the interface supported by a specific smart contract.
959

960
A message is always serialized inside the blockchain as the last field in a cell. Therefore, the blockchain software may assume that whatever bits and references left unparsed after parsing the fields of a \texttt{Message} preceding \texttt{body} belong to the payload $\texttt{body}:X$, without knowing anything about the serialization of the type~$X$.
961

962
\nxsubpoint\emb{Messages with empty payloads}
963
The payload of a message may happen to be an empty cell slice, having no data bits and no references. By convention, such messages are used for simple value transfers. The receiving smart contract is normally expected to process such messages quietly and to terminate successfully (with a zero exit code), although some smart contracts may perform non-trivial actions even when receiving a message with empty payload. For example, a smart contract may check the resulting balance, and, if it becomes sufficient for a previously postponed action, trigger this action. Alternatively, the smart contract might want to remember in its persistent storage the amount received and the corresponding sender, in order, for instance, to distribute some tokens later to each sender proportionally to the funds transferred.
964

965
Notice that even if a smart contract makes no special provisions for messages with empty payloads and throws an exception while processing such messages, the received value (minus the gas payment) will still be added to the balance of the smart contract.
966

967
\nxsubpoint\emb{Message source address and logical creation time determine its generating block}
968
Notice that {\em the source address and the logical creation time of an internal or an outbound external message uniquely determine the block in which the message has been generated}. Indeed, the source address determines the source shardchain, and the blocks of this shardchain are assigned non-intersecting logical time intervals, so only one of them may contain the indicated logical creation time. This is the reason why no explicit mention of the generating block is needed in messages.
969

970
\nxsubpoint\label{sp:tl.msg.env}\emb{Enveloped messages}
971
{\em Message envelopes\/} are used for attaching routing information, such as the current (transit) address and the next-hop address, to inbound, transit, and outbound messages (cf.~\ptref{sp:msg.env}). The message itself is kept in a separate cell and referred to from the message envelope by a cell reference.
972
\begin{verbatim}
973
interm_addr_regular$0 use_src_bits:(#<= 96) 
974
  = IntermediateAddress;
975
interm_addr_simple$10 workchain_id:int8 addr_pfx:(64 * Bit) 
976
  = IntermediateAddress;
977
interm_addr_ext$11 workchain_id:int32 addr_pfx:(64 * Bit)
978
  = IntermediateAddress;
979
msg_envelope cur_addr:IntermediateAddress 
980
  next_addr:IntermediateAddress fwd_fee_remaining:Grams 
981
  msg:^(Message Any) = MsgEnvelope;
982
\end{verbatim}
983
The \texttt{IntermediateAddress} type is used to describe the intermediate addresses of a message---that is, its current (or transit) address \texttt{cur\_addr}, and its next-hop address~\texttt{next\_addr}. The first constructor \texttt{interm\_addr\_regular} represents the intermediate address using the optimization described in~\ptref{sp:repr.interm.addr}, by storing the number of the first bits of the intermediate address that are the same as in the source address; the two other explicitly store the workchain identifier and the first 64 bits of the address inside that workchain (the remaining bits can be taken from the source address). The \texttt{fwd\_fee\_remaining} field is used to explicitly represent the maximum amount of message forwarding fees that can be deducted from the message value during the remaining HR steps; it cannot exceed the value of \texttt{fwd\_fee} indicated in the message itself.
984

985
\mysubsection{Inbound message descriptors}
986
This section discusses {\em InMsgDescr}, the structure containing a description of all inbound messages imported into a block.\footnote{Strictly speaking, {\em InMsgDescr\/} is the {\em type\/} of this structure; we deliberately use the same notation to describe the only instance of this type in a block.}
987

988
\nxsubpoint\label{sp:inb.msg.classes}\emb{Types and sources of inbound messages}
989
Each inbound message mentioned in {\em InMsgDescr\/} is described by a value of type {\em InMsg\/} (an ``inbound message descriptor''), which specifies the source of the message, the reason for its being imported into this block, and some information about its ``fate''---its processing by a transaction or forwarding inside the block.
990

991
Inbound messages may be classified as follows:
992
\begin{itemize}
993
\item {\em Inbound external messages} --- Need no additional reason for being imported into the block, but must be immediately processed by a transaction in the same block.
994
\item {\em Internal IHR messages with destination addresses in this block} --- The reason for their being imported into the block includes a Merkle proof of their generation (i.e., their inclusion in {\em OutMsgDescr\/} of their original block). Such a message must be immediately delivered to its final destination and processed by a transaction.
995
\item {\em Internal messages with destinations in this block} --- The reason for their inclusion is their presence in {\em OutMsgQueue\/} of the most recent state of a neighboring shardchain,\footnote{Recall that a shardchain is considered a neighbor of itself.} or their presence in {\em OutMsgDescr} of this very block. This neighboring shardchain is completely determined by the transit address indicated in the forwarded message envelope, which is replicated in {\em InMsg\/} as well. The ``fate'' of this message is again described by a reference to the processing transaction inside the current block.
996
\item {\em Immediately routed internal messages} --- Essentially a subclass of the previous class of messages. In this case, the imported message is one of the outbound messages generated in this very block.
997
\item {\em Transit internal messages} --- Have the same reason for inclusion as the previous class of messages. However, they are not processed inside the block, but internally forwarded into {\em OutMsgDescr\/} and {\em OutMsgQueue}. This fact, along with a reference to the new envelope of the transit message, must be registered in {\em InMsg}.
998
\item {\em Discarded internal messages with destinations in this block} --- An internal message with a destination in this block may be imported and immediately discarded instead of being processed by a transaction if it has already been received and processed via IHR in a preceding block of this shardchain. In this case, a reference to the previous processing transaction must be provided.
999
\item {\em Discarded transit internal messages} --- Similarly, a transit message may be discarded immediately after import if it has already been delivered via IHR to its final destination. In this case, a Merkle proof of its processing in the final block (as an IHR message) is required.
1000
\end{itemize}
1001

1002
\nxsubpoint\label{sp:in.msg.d}\emb{Descriptor of an inbound message}
1003
Each inbound message is described by an instance of the \texttt{InMsg} type, which has six constructors corresponding to the cases listed above in~\ptref{sp:inb.msg.classes}:
1004
\begin{verbatim}
1005
msg_import_ext$000 msg:^(Message Any) transaction:^Transaction 
1006
              = InMsg;
1007
msg_import_ihr$010 msg:^(Message Any) transaction:^Transaction 
1008
    ihr_fee:Grams proof_created:^Cell = InMsg;
1009
msg_import_imm$011 in_msg:^MsgEnvelope
1010
    transaction:^Transaction fwd_fee:Grams = InMsg;
1011
msg_import_fin$100 in_msg:^MsgEnvelope 
1012
    transaction:^Transaction fwd_fee:Grams = InMsg;
1013
msg_import_tr$101  in_msg:^MsgEnvelope out_msg:^MsgEnvelope 
1014
    transit_fee:Grams = InMsg;
1015
msg_discard_fin$110 in_msg:^MsgEnvelope transaction_id:uint64 
1016
    fwd_fee:Grams = InMsg;
1017
msg_discard_tr$111 in_msg:^MsgEnvelope transaction_id:uint64 
1018
    fwd_fee:Grams proof_delivered:^Cell = InMsg;
1019
\end{verbatim}
1020
Notice that the processing transaction is referred to in the first four constructors directly by a cell reference to \texttt{Transaction}, even though the logical time of the transaction \texttt{transaction\_lt:uint64} would suffice for this purpose. Internal consistency conditions ensure that the transaction referred to does belong to the destination smart contract indicated in the message, and that the inbound message processed by that transaction is indeed the one being described in this {\em InMsg\/} instance.
1021

1022
Furthermore, notice that \texttt{msg\_import\_imm} could be distinguished from \texttt{msg\_import\_fin} by observing that it is the only case when the logical creation time of the message being processed is greater than or equal to the (minimal) logical time of the block importing the message.
1023

1024
\nxsubpoint\label{sp:in.msg.fees}\emb{Collecting forwarding and transit fees from imported messages}
1025
The {\em InMsg\/} structure is also used to indicate the forwarding and transit fees collected from inbound messages. The fee itself is indicated in \texttt{ihr\_fee}, \texttt{fwd\_fee}, or \texttt{transit\_fee} fields; it is absent only in inbound external messages, which use other mechanisms to reward the validators for importing them. The fees must satisfy the following internal consistency conditions:
1026
\begin{itemize}
1027
\item For external messages (\texttt{msg\_import\_ext}), there is no forwarding fee.
1028
\item For IHR-imported internal messages (\texttt{msg\_import\_ihr}), the fee equals \texttt{ihr\_fee}, which must coincide with the \texttt{ihr\_fee} value indicated in the message itself. Notice that \texttt{fwd\_fee} or \texttt{fwd\_fee\_remaining} are never collected from IHR-imported messages.
1029
\item For internal messages delivered to their destination (\texttt{msg\_import\_fin} and \texttt{msg\_import\_imm}), the fee equals the \texttt{fwd\_fee\_remaining} of the enveloped inbound message \texttt{in\_msg}. Note that it cannot exceed the \texttt{fwd\_fee} value indicated in the message itself.
1030
\item For transit messages (\texttt{msg\_import\_tr}), the fee equals the difference between the \texttt{fwd\_fee\_remaining} values indicated in the \texttt{in\_msg} and \texttt{out\_msg} envelopes.
1031
\item For discarded messages, the fee also equals the \texttt{fwd\_fee\_remaining} indicated in \texttt{in\_msg}.
1032
\end{itemize}
1033

1034
\nxsubpoint\label{sp:in.msg.val}\emb{Imported value of an inbound message}
1035
Each imported message imports some value---a certain amount of one or more cryptocurrencies---into the block. This imported value is computed as follows:
1036
\begin{itemize}
1037
\item An external message imports no value.
1038
\item An IHR-imported message imports its \texttt{value} plus its \texttt{ihr\_fee}.
1039
\item A delivered or transit internal message imports its \texttt{value} plus its \texttt{ihr\_fee} plus the value of \texttt{fwd\_fee\_remaining} of its \texttt{in\_msg} envelope.
1040
\item A discarded message imports the \texttt{fwd\_fee\_remaining} of its \texttt{in\_msg} envelope.
1041
\end{itemize}
1042
Notice that the forwarding and transit fees collected from an imported message do not exceed its imported value.
1043

1044
\nxsubpoint\label{sp:aug.hm}\emb{Augmented hashmaps, or dictionaries}
1045
Before continuing, let us discuss the serialization of {\em augmented\/} hashmaps, or dictionaries.
1046

1047
Augmented hashmaps are key-value storage structures with $n$-bit keys and values of some type~$X$, similar to the ordinary hashmaps described in \cite[3.3]{TVM}. However, each intermediate node of the Patricia tree representing an augmented hashmap is {\em augmented\/} by a value of type $Y$.
1048

1049
These augmentation values must satisfy certain {\em aggregation conditions}. Typically, $Y$ is an integer type, and the aggregation condition is that the augmentation value of a fork must equal the sum of the augmentation values of its two children. In general, a {\em fork evaluation function} $S:Y\times Y\to Y$ or $S:Y\to Y\to Y$ is used instead of the sum. The augmentation value of a leaf is usually computed from the value stored in that leaf by means of a {\em leaf evaluation function\/} $L:X\to Y$. The augmentation value of a leaf may be stored explicitly in the leaf along with the value; however, in most cases there is no need for this, because the leaf evaluation function $L$ is very simple.
1050

1051
\nxsubpoint\label{sp:aug.hm.ser}\emb{Serialization of augmented hashmaps}
1052
The serialization of augmented hashmaps with $n$-bit keys, values of type $X$, and augmentation values of type $Y$ is given by the following TL-B scheme, which is an extension of the one provided in~\cite[3.3.3]{TVM}:
1053
\begin{verbatim}
1054
ahm_edge#_ {n:#} {X:Type} {Y:Type} {l:#} {m:#} 
1055
  label:(HmLabel ~l n) {n = (~m) + l} 
1056
  node:(HashmapAugNode m X Y) = HashmapAug n X Y;
1057
ahmn_leaf#_ {X:Type} {Y:Type} extra:Y value:X 
1058
  = HashmapAugNode 0 X Y;
1059
ahmn_fork#_ {n:#} {X:Type} {Y:Type}
1060
    left:^(HashmapAug n X Y) right:^(HashmapAug n X Y) extra:Y 
1061
  = HashmapAugNode (n + 1) X Y;
1062

1063
ahme_empty$0 {n:#} {X:Type} {Y:Type} extra:Y 
1064
          = HashmapAugE n X Y;
1065
ahme_root$1 {n:#} {X:Type} {Y:Type} root:^(HashmapAug n X Y)
1066
  extra:Y = HashmapAugE n X Y;
1067
\end{verbatim}
1068

1069
\nxsubpoint\label{sp:in.msg.augm}\emb{Augmentation of {\em InMsgDescr}}
1070
The collection of inbound message descriptors is augmented by a vector of two currency values, representing the imported value and the forwarding and transit fees collected from a message or a collection of messages:
1071
\begin{verbatim}
1072
import_fees$_ fees_collected:Grams 
1073
  value_imported:CurrencyCollection = ImportFees;
1074
\end{verbatim}
1075

1076
\nxsubpoint\label{sp:in.msg.descr}\emb{Structure of {\em InMsgDescr}}
1077
Now the {\em InMsgDescr\/} itself is defined as an augmented hashmap, with 256-bit keys (equal to the representation hashes of imported messages), values of type \texttt{InMsg} (cf.~\ptref{sp:in.msg.d}), and augmentation values of type \texttt{ImportFees} (cf.~\ptref{sp:in.msg.augm}):
1078
\begin{verbatim}
1079
_ (HashmapAugE 256 InMsg ImportFees) = InMsgDescr;
1080
\end{verbatim}
1081
This TL-B notation uses an anonymous constructor \texttt{\_} to define \texttt{InMsgDescr} as a synonym for another type.
1082

1083
\nxsubpoint\emb{Aggregation rules for {\em InMsgDescr}}
1084
The fork evaluation and leaf evaluation functions (cf.~\ptref{sp:aug.hm}) are not included explicitly in the above notation, because the dependent types of TL-B are not expressive enough for this purpose. In words, the fork evaluation function is just the componentwise addition of two \texttt{ImportFees} instances, and the leaf evaluation function is defined by the rules listed in \ptref{sp:in.msg.fees} and \ptref{sp:in.msg.val}. In this way, the root of the Patricia tree representing an instance of {\em InMsgDescr\/} contains an {\em ImportFees\/} instance with the total value imported by all inbound messages, and with the total forwarding fees collected from them.
1085

1086
\mysubsection{Outbound message queue and descriptors}
1087
This section discusses {\em OutMsgDescr}, the structure representing all outbound messages of a block, along with their envelopes and brief descriptions of the reasons for including them into {\em OutMsgDescr}. This structure also describes all modifications of {\em OutMsgQueue}, which is a part of the shardchain state.
1088

1089
\nxsubpoint\emb{Types of outbound messages}
1090
Outbound messages may be classified as follows:
1091
\begin{itemize}
1092
\item {\em External outbound messages}, or ``messages to nowhere'' --- Generated by a transaction inside this block. The reason for including such a message into {\em OutMsgDescr\/} is simply a reference to its generating transaction.
1093
\item {\em Immediately processed internal outbound messages} --- Generated and processed in this very block, and not included into {\em OutMsgQueue}. The reason for including such a message is a reference to its generating transaction, and its ``fate'' is described by a reference to the corresponding entry in {\em InMsgDescr}.
1094
\item {\em Ordinary (internal) outbound messages} --- Generated in this block and included into {\em OutMsgQueue}.
1095
\item {\em Transit (internal) outbound messages} --- Imported into the {\em InMsgDescr} of the same block and routed via HR until a next-hop address outside the current shard is obtained.
1096
\end{itemize}
1097

1098
\nxsubpoint\label{sp:msg.deque.recs}\emb{Message dequeueing records}
1099
Apart from the above types of outbound messages, {\em OutMsgDescr\/} can contain special ``message dequeueing records'', which indicate that a message has been removed from the {\em OutMsgQueue\/} in this block. The reason for this removal is indicated in the message deletion record; it consists of a reference to the enveloped message being deleted, and of the logical time of the neighboring shardchain block that has this enveloped message in its {\em InMsgDescr}.
1100

1101
Notice that on some occasions a message may be imported from the {\em OutMsgQueue\/} of the current shardchain, internally routed, and then included into {\em OutMsgDescr\/} and {\em OutMsgQueue\/} again with a different envelope.\footnote{This situation is rare and occurs only after shardchain merge events. Normally the messages imported from the {\em OutMsgQueue} of the same shardchain have destinations inside this shardchain, and are processed accordingly instead of being re-queued.} In this case, a variant of the transit outbound message description is used, which doubles as a message dequeueing record.
1102

1103
\nxsubpoint\emb{Descriptor of an outbound message}
1104
Each outbound message is described by an instance of {\em OutMsg}:
1105
\begin{verbatim}
1106
msg_export_ext$000 msg:^(Message Any)
1107
    transaction:^Transaction = OutMsg;
1108
msg_export_imm$010 out_msg:^MsgEnvelope 
1109
    transaction:^Transaction reimport:^InMsg = OutMsg;
1110
msg_export_new$001 out_msg:^MsgEnvelope 
1111
    transaction:^Transaction = OutMsg;
1112
msg_export_tr$011  out_msg:^MsgEnvelope 
1113
    imported:^InMsg = OutMsg;
1114
msg_export_deq$110 out_msg:^MsgEnvelope 
1115
    import_block_lt:uint64 = OutMsg;
1116
msg_export_tr_req$111 out_msg:^MsgEnvelope 
1117
    imported:^InMsg = OutMsg;
1118
\end{verbatim}
1119
The last two descriptions have the effect of removing (dequeueing) the message from {\em OutMsgQueue\/} instead of inserting it. The last one re-inserts the message into {\em OutMsgQueue\/} with a new envelope after performing the internal routing (cf.~\ptref{sp:hr.int.route}).
1120

1121
\nxsubpoint\label{sp:exp.msg.val}\emb{Exported value of an outbound message}
1122
Each outbound message described by an {\em OutMsg\/} exports some value---a certain amount of one or more cryptocurrencies---from the block. This exported value is computed as follows:
1123
\begin{itemize}
1124
\item An external outbound message exports no value.
1125
\item An internal message, generated in this block, exports its {\tt value} plus its {\tt ihr\_fee} plus its {\tt fwd\_fee}. Notice that {\tt fwd\_fee} must be equal to the {\tt fwd\_fee\_remaining} indicated in the {\tt out\_msg} envelope.
1126
\item A transit message exports its {\tt value} plus its {\tt ihr\_fee} plus the value of {\tt fwd\_fee\_remaining} of its {\tt out\_msg} envelope.
1127
\item The same holds for {\tt msg\_export\_tr\_req}, the constructor of {\em OutMsg\/} used for re-inserted dequeued messages.
1128
\item A message dequeueing record ({\tt msg\_export\_deq}; cf.~\ptref{sp:msg.deque.recs}) exports no value.
1129
\end{itemize}
1130

1131
\nxsubpoint\label{sp:out.msg.descr}\emb{Structure of {\em OutMsgDescr}}
1132
The {\em OutMsgDescr\/} itself is simply an augmented hashmap (cf.~\ptref{sp:aug.hm}), with 256-bit keys (equal to the representation hash of the message), values of type {\em OutMsg}, and augmentation values of type {\em CurrencyCollection}:
1133
\begin{verbatim}
1134
_ (HashmapAugE 256 OutMsg CurrencyCollection) = OutMsgDescr;
1135
\end{verbatim}
1136
The augmentation is the {\em exported value\/} of the corresponding message, aggregated by means of the sum, and computed at the leaves as explained in~\ptref{sp:exp.msg.val}. In this way, the total exported value appears near the root of the Patricia tree representing {\em OutMsgDescr}.
1137

1138
The most important consistency condition for {\em OutMsgDescr\/} is that its entry with key $k$ must be an {\em OutMsg\/} describing a message~$m$ with representation hash $\Hash^\flat(m)=k$.
1139

1140
\nxsubpoint\label{sp:out.msg.queue}\emb{Structure of {\em OutMsgQueue}}
1141
Recall (cf.~\ptref{sp:outmsgq}) that {\em OutMsgQueue\/} is a part of the blockchain state, not of a block. Therefore, a block contains only hash references to its initial and final state, and its newly-created cells.
1142

1143
The structure of {\em OutMsgQueue\/} is simple: it is just an augmented hashmap with 352-bit keys and values of type {\em OutMsg}:
1144
\begin{verbatim}
1145
_ (HashmapAugE 352 OutMsg uint64) = OutMsgQueue;
1146
\end{verbatim}
1147
The key used for an outbound message $m$ is the concatenation of its 32-bit next-hop $\workchainid$, the first 64 bits of the next-hop address inside that workchain, and the representation hash $\Hash^\flat(m)$ of the message~$m$ itself. The augmentation is by the logical creation time $\LT(m)$ of message~$m$ at the leaves, and by the minimum of the augmentation values of the children at the forks.
1148

1149
The most important consistency condition for {\em OutMsgQueue\/} is that the value at key $k$ must indeed contain an enveloped message with the expected next-hop address and representation hash.
1150

1151
\nxsubpoint\emb{Consistency conditions for {\em OutMsg}}
1152
Several internal consistency conditions are imposed on {\em OutMsg\/} instances present in {\em OutMsgDescr}. They include the following:
1153
\begin{itemize}
1154
\item Each of the first three constructors of outbound message descriptions includes a reference to the generating transaction. This transaction must belong to the source account of the message, it must contain a reference to the specified message as one of its outbound messages, and it must be recoverable by looking it up by its \texttt{account\_id} and \texttt{transaction\_id}.
1155
\item {\tt msg\_export\_tr} and {\tt msg\_export\_tr\_req} must refer to an {\em InMsg\/} instance describing the same message (in a different original envelope).
1156
\item If one of the first four constructors is used, the message must be absent in the initial {\em OutMsgQueue\/} of the block; otherwise, it must be present.
1157
\item If \texttt{msg\_export\_deq} is used, the message must be absent in the final {\em OutMsgQueue\/} of the block; otherwise, it must be present.
1158
\item If a message is not mentioned in {\em OutMsgDescr}, it must be the same in the initial and final {\em OutMsgQueue\/}s of the block.
1159
\end{itemize}
1160

1161
\clearpage
1162
\mysection{Accounts and transactions}
1163
This chapter discusses the layout of {\em accounts\/} (or {\em smart contracts}) and their {\em state\/} in the TON Blockchain. It also considers {\em transactions}, which are the only way to modify the state of an account, and to process inbound messages and generate new outbound messages.
1164

1165
\mysubsection{Accounts and their states}
1166
Recall that a {\em smart contract\/} and an {\em account\/} are the same thing in the context of the TON Blockchain, and that these terms can be used interchangeably, at least as long as only small (or ``usual'') smart contracts are considered. A {\em large\/} smart contract may employ several accounts lying in different shardchains of the same workchain for load balancing purposes.
1167

1168
An account is {\em identified\/} by its full address, and is {\em completely described\/} by its state. In other words, there is nothing else in an account apart from its address and state.
1169

1170
\nxsubpoint\emb{Account addresses}
1171
In general, an account is completely identified by its {\em full address}, consisting of a 32-bit $\workchainid$, and the (usually 256-bit) {\em internal address\/} or {\em account identifier\/} $\accountid$ inside the chosen workchain. In the basic workchain ($\workchainid=0$) and in the masterchain ($\workchainid=-1$) the internal address is always 256-bit. In these workchains,\footnote{For simplicity, we sometimes treat the masterchain as just another workchain with $\workchainid=-1$.} $\accountid$ cannot be chosen arbitrarily, but must be equal to the hash of the initial code and data of the smart contract; otherwise, it will be impossible to initialize the account with the intended code and data (cf.~\ptref{sp:constr.msg}), and to do anything with the accumulated funds in the account balance.
1172

1173
\nxsubpoint\emb{Zero account}
1174
By convention, the {\em zero account\/} or {\em account with zero address\/} accumulates the processing, forwarding, and transit fees, as well as any other payments collected by the validators of the masterchain or a workchain. Furthermore, the zero account is a ``large smart contract'', meaning that each shardchain has its instance of the zero account, with the most significant bits of the address adjusted to lie in the shard. Any funds transferred to the zero account, intentionally or by accident, are effectively a gift for the validators. For example, a smart contract might destroy itself by sending all its funds to the zero account.
1175

1176
\nxsubpoint\emb{Small and large smart contracts}
1177
By default, smart contracts are ``small'', meaning that they have one account address belonging to exactly one shardchain at any given moment of time. However, one can create a ``large smart contract of splitting depth $d\,$'', meaning that up to $2^d$ instances of the smart contract may be created, with the first $d$ bits of the original address of the smart contract replaced by arbitrary bit sequences.\footnote{In fact, up to the first $d$ bits are replaced in such a way that each shard contains at most one instance of the large smart contract, and that shards $(w,s)$ with prefix $s$ of length $|s|\leq d$ contain exactly one instance.} One can send messages to such smart contracts using internal anycast addresses with {\tt anycast} set to $d$ (cf.~\ptref{sp:addr.tl}). Furthermore, the instances of the large smart contract are allowed to use this anycast address as the source address of their generated messages.
1178

1179
An instance of a large smart contract is an account with non-zero {\em maximal splitting depth $d$}.
1180

1181
\nxsubpoint\emb{The three kinds of accounts}
1182
There are three kinds of accounts:
1183
\begin{itemize}
1184
\item {\em Uninitialized} --- The account only has a balance; its code and data have not yet been initialized.
1185
\item {\em Active} --- The account's code and data have been initialized as well.
1186
\item {\em Frozen} --- The account's code and data have been replaced by a hash, but the balance is still stored explicitly. The balance of a frozen account may effectively become negative, reflecting due storage payments.
1187
\end{itemize}
1188

1189
\nxsubpoint\emb{Storage profile of an account}
1190
The {\em storage profile\/} of an account is a data structure describing the amount of persistent blockchain state storage used by that account. It describes the total amount of cells, data bits, and internal and external cell references used.
1191
\begin{verbatim}
1192
storage_used$_ cells:(VarUInteger 7) bits:(VarUInteger 7) 
1193
  ext_refs:(VarUInteger 7) int_refs:(VarUInteger 7) 
1194
  public_cells:(VarUInteger 7) = StorageUsed;
1195
\end{verbatim}
1196
The same type {\tt StorageUsed} may represent the storage profile of a message, as required, for instance, to compute {\tt fwd\_fee}, the total forwarding fee for Hypercube Routing. The storage profile of an account has some additional fields indicating the last time when the storage fees were exacted:
1197
\begin{verbatim}
1198
storage_info$_ used:StorageUsed last_paid:uint32
1199
              due_payment:(Maybe Grams) = StorageInfo;
1200
\end{verbatim}
1201
The {\tt last\_paid} field contains either the unixtime of the most recent storage payment collected (usually this is the unixtime of the most recent transaction), or the unixtime when the account was created (again, by a transaction). The {\tt due\_payment} field, if present, accumulates the storage payments that could not be exacted from the balance of the account, represented by a strictly positive amount of nanograms; it can be present only for uninitialized or frozen accounts that have a balance of zero Grams (but may have non-zero balances in other cryptocurrencies). When {\tt due\_payment} becomes larger than the value of a configurable parameter of the blockchain, the account is destroyed altogether, and its balance, if any, is transferred to the zero account.
1202

1203
\nxsubpoint\label{sp:acc.descr}\emb{Account description}
1204
The state of an account is represented by an instance of type {\em Account}, described by the following TL-B scheme:\footnote{This scheme uses anonymous constructors and anonymous fields, both represented by an underscore~{\tt \_}.}
1205
\begin{verbatim}
1206
account_none$0 = Account;
1207
account$1 addr:MsgAddressInt storage_stat:StorageInfo
1208
          storage:AccountStorage = Account;
1209

1210
account_storage$_ last_trans_lt:uint64
1211
    balance:CurrencyCollection state:AccountState 
1212
  = AccountStorage;
1213

1214
account_uninit$00 = AccountState;
1215
account_active$1 _:StateInit = AccountState;
1216
account_frozen$01 state_hash:uint256 = AccountState;
1217

1218
acc_state_uninit$00 = AccountStatus;
1219
acc_state_frozen$01 = AccountStatus;
1220
acc_state_active$10 = AccountStatus;
1221
acc_state_nonexist$11 = AccountStatus;
1222

1223
tick_tock$_ tick:Bool tock:Bool = TickTock;
1224

1225
_ split_depth:(Maybe (## 5)) special:(Maybe TickTock)
1226
  code:(Maybe ^Cell) data:(Maybe ^Cell)
1227
  library:(Maybe ^Cell) = StateInit;
1228
\end{verbatim}
1229
Notice that {\tt account\_frozen} contains the representation hash of an instance of {\em StateInit}, instead of that instance itself, which would otherwise be contained in an {\tt account\_active}; {\tt account\_uninit} is similar to {\tt account\_frozen}, but it does not contain an explicit {\tt state\_hash}, because it is assumed to be equal to the internal address of the account ($\accountid$), already present in the \texttt{addr} field. The {\tt split\_depth} field is present and non-zero only in instances of large smart contracts. The {\tt special} field may be present only in the masterchain---and within the masterchain, only in some {\em fundamental\/} smart contracts required for the whole system to function.
1230

1231
The storage statistics kept in {\tt storage\_stat} reflect the total storage usage of cell slice {\tt storage}. In particular, the bits and cells used to store the {\tt balance} are also reflected in {\tt storage\_stat}.
1232

1233
When a non-existent account needs to be represented, the {\tt account\_none} constructor is used.
1234

1235
\nxsubpoint\label{sp:acc.as.msg}\emb{Account state as a message from an account to its future self}
1236
Notice that the account state is very similar to a message sent from an account to its future self participating in the next transaction, for the following reasons:
1237
\begin{itemize}
1238
\item The account state does not change between two consecutive transactions of the same account, so it is completely similar in this respect to a message sent from the earlier transaction to the later one.
1239
\item When a transaction is processed, its inputs are an inbound message and the previous account state; its outputs are outbound messages generated and the next account state. If we treat the state as a special kind of message, we see that every transaction has exactly two inputs (the account state and an inbound message) and at least one output.
1240
\item Both a message and the account state can carry code and data in an instance of {\em StateInit}, and some value in their {\tt balance}.
1241
\item An account is initialized by a {\em constructor message}, which essentially carries the future state and balance of the account.
1242
\item On some occasions messages are converted into account states, and vice versa. For instance, when a shardchain merge event occurs, and two accounts that are instances of the same large contract need to be merged, one of them is converted into a message sent to the other one (cf.~\ptref{sp:merge.trs}). Similarly, when a shardchain split event occurs, and an instance of a large smart contract needs to be split into two, this is achieved by a special transaction that creates the new instance by means of a constructor message sent from the previously existing instance to the new one (cf.~\ptref{sp:split.trs}).
1243
\item One may say that a message is involved in transferring some information {\em across space\/} (between different shardchains, or at least accountchains), while an account state transfers information {\em across time\/} (from the past to the future of the same account).
1244
\end{itemize}
1245

1246
\nxsubpoint\emb{Differences between messages and account states}
1247
Of course, there are important differences, too. For example:
1248
\begin{itemize}
1249
\item The account state is transferred only ``in time'' (for a shardchain block to its successor), but never ``in space'' (from one shardchain to another). As a consequence, this transfer is done implicitly, without creating complete copies of the account state anywhere in the blockchain.
1250
\item Storage payments collected by the validators for keeping the account state usually are considerably smaller than message forwarding fees for the same amount of data.
1251
\item When an inbound message is delivered to an account, it is the code from the account that is invoked, not the code from the message.
1252
\end{itemize}
1253

1254
\nxsubpoint\label{sp:shard.accs}\emb{The combined state of all accounts in a shard}
1255
The split part of the shardchain state (cf.~\ptref{sp:isp.blk.state} and~\ptref{sp:split.blk.part}) is given by
1256
\begin{verbatim}
1257
_ (HashmapAugE 256 Account CurrencyCollection) = ShardAccounts;
1258
\end{verbatim}
1259
This is simply a dictionary with 256-bit $\accountid$s as keys and corresponding account states as values, sum-augmented by the balances of the accounts. In this way the sum of balances of all accounts in a shardchain is computed, so that one can easily check the total amount of cryptocurrency ``stored'' in a shard.
1260

1261
Internal consistency conditions ensure that the address of an account referred to by key $k$ in {\em SmartAccounts\/} is indeed equal to~$k$. An additional internal consistency condition requires that all keys $k$ begin with the shard prefix~$s$.
1262

1263
\nxsubpoint\emb{Account owner and interface descriptions}
1264
One may want to include some optional information in a controlled account. For example, an individual user or a company may want to add a text description field to their wallet account, with the user's or company's name or address (or their hash, if the information should not be made publicly available). Alternatively, a smart contract may offer a machine-readable or human-readable description of its supported methods and their intended application, which might be used by advanced wallet applications to construct drop-down menus and forms helping a human user to create valid messages to be sent to that smart contract.
1265

1266
One way of including such information is to reserve, say, the second reference in the {\tt data} cell of the state of an account for a dictionary with 64-bit keys (corresponding to some identifiers of the standard types of extra data one might want to store) and corresponding values. Then a blockchain explorer would be able to extract the required value, along with a Merkle proof if necessary.
1267

1268
A better way of doing this is by defining some {\em get methods\/} in the smart contract.
1269

1270
\nxsubpoint\emb{Get methods of a smart contract}
1271
{\em Get methods\/} are executed by a stand-alone instance of TVM with the account's code and data loaded into it. The required parameters are passed on the stack (say, a magic number indicating the field to be fetched or the specific get method to be invoked), and the results are returned on the TVM stack as well (say, a cell slice containing the serialization of a string with the account owner's name).
1272

1273
As a bonus, get methods may be used to get answers to more sophisticated queries than just fetching a constant object. For instance, TON DNS registry smart contracts provide get methods to look up a domain string in the registry and return the corresponding record, if found.
1274

1275
By convention, get methods use large {\em negative\/} 32-bit or 64-bit indices or magic numbers, and internal functions of a smart contract use consecutive {\em positive\/} indices, to be used in TVM's \texttt{CALLDICT} instruction. The {\tt main()} function of a smart contract, used to process inbound messages in ordinary transactions, always has index zero.
1276

1277
\mysubsection{Transactions}
1278
According to the Infinite Sharding Paradigm and the actor model, the three principal components of the TON Blockchain are {\em accounts\/} (along with their states), {\em messages,} and {\em transactions}. Previous sections have already discussed the first two; this section considers transactions.
1279

1280
In contrast with messages, which have essentially the same headers throughout all workchains of the TON Blockchain, and accounts, which have at least some common parts (the address and the balance), our discussion of transactions is necessarily limited to the masterchain and the basic workchain. Other workchains may define completely different kinds of transactions.
1281

1282
\nxsubpoint\label{sp:trans.lt}\emb{Logical time of a transaction}
1283
Each transaction $t$ has a logical time interval $\LT^\bullet(t)=[\LT^-(t),\LT^+(t))$ assigned to it (cf.~\ptref{sp:lt.ton.blkch} and~\ptref{sp:logic.time.interval}). By convention, a transaction $t$ generating $n$ outbound messages $m_1$, \dots, $m_n$ is assigned a logical time interval of length $n+1$, so that
1284
\begin{equation}
1285
  \LT^+(t)=\LT^-(t)+n+1\quad.
1286
\end{equation}
1287
We also set $\LT(t):=\LT^-(t)$, and assign the logical creation time of message $m_i$, where $1\leq i\leq n$, by
1288
\begin{equation}
1289
  \LT(m_i)=\LT^-(m_i):=\LT^-(t)+i,\quad\LT^+(m_i):=\LT^-(m_i)+1\quad.
1290
\end{equation}
1291
In this way, each generated outbound message is assigned its own unit interval inside the logical time interval $\LT^\bullet(t)$ of transaction~$t$.
1292

1293
\nxsubpoint\emb{Logical time uniquely identifies transactions and outbound messages of an account}
1294
Recall that the conditions imposed on logical time imply that $\LT^-(t)\geq\LT^+(t')$ for any preceding transaction $t'$ of the same account~$\xi$, and that $\LT^-(t)>\LT(m)$ if $m$ is the inbound message processed by transaction~$t$. In this way, the logical time intervals of transactions of the same account do not intersect each other, and as a consequence, the logical time intervals of all outbound messages generated by an account do not intersect each other either. In other words, all $\LT(m)$ are different, when $m$ runs through all outbound messages of the same account~$\xi$.
1295

1296
In this way, $\LT(t)$ and $\LT(m)$, when combined with an account identifier $\xi$, uniquely determine a transaction $t$ or an outbound message~$m$ of that account. Furthermore, if one has an ordered list of all transactions of an account along with their logical times, it is easy to find the transaction that generated a given outbound message~$m$, simply by looking up the transaction $t$ with logical time $\LT(t)$ nearest to~$\LT(m)$ from below.
1297

1298
\nxsubpoint\label{sp:gen.comp.tr}%
1299
\emb{Generic components of a transaction}
1300
Each transaction $t$ contains or indirectly refers to the following data:
1301
\begin{itemize}
1302
\item The account $\xi$ to which the transaction belongs.
1303
\item The logical time $\LT(t)$ of the transaction.
1304
\item One or zero inbound messages $m$ processed by the transaction.
1305
\item The number of generated outbound messages $n\geq0$.
1306
\item The outbound messages $m_1$, \dots, $m_n$.
1307
\item The initial state of account $\xi$ (including its balance).
1308
\item The final state of account $\xi$ (including its balance).
1309
\item The total fees collected by the validators.
1310
\item A detailed description of the transaction containing all or some data needed to validate it, including the kind of the transaction (cf.~\ptref{sp:trans.kinds}) and some of the intermediate steps performed.
1311
\end{itemize}
1312
Of these components, all but the very last one are quite general and might appear in other workchains as well.
1313

1314
\nxsubpoint\label{sp:trans.kinds}\emb{Kinds of transactions}
1315
There are different kinds of transactions allowed in the masterchain and the shardchains. {\em Ordinary\/} transactions consist in the delivery of one inbound message to an account, and its processing by that account's code; this is the most common kind of transaction. Additionally, there are several kinds of {\em exotic\/} transactions.
1316

1317
Altogether, there are six kinds of transactions:
1318
\begin{itemize}
1319
\item {\em Ordinary transactions\/} --- Belong to an account $\xi$. They process exactly one inbound message $m$ (described in {\em InMsgDescr\/} of the encompassing block) with destination~$\xi$, compute the new state of the account, and generate several outbound messages (registered in {\em OutMsgDescr}) with source~$\xi$.
1320
\item {\em Storage transactions\/} --- Can be inserted by validators at their discretion. They do not process any inbound message and do not invoke any code. Their only effect is to collect storage payments from an account, affecting its storage statistics and its balance. If the resulting Gram balance of the account becomes less than a certain amount, the account may be frozen and its code and data replaced by their combined hash.
1321
\item {\em Tick transactions\/} --- Automatically invoked for certain special accounts (smart contracts) in the masterchain that have the {\tt tick} flag set in their state, as the very first transactions in every masterchain block. They have no inbound message, but may generate outbound messages and change the account state. For instance, {\em validator elections\/} are performed by tick transactions of special smart contracts in the masterchain.
1322
\item {\em Tock transactions\/} --- Similarly automatically invoked as the very last transactions in every masterchain block for certain special accounts.
1323
\item {\em Split transactions\/} --- Invoked as the last transactions of shardchain blocks immediately preceding a shardchain split event. They are triggered automatically for instances of large smart contracts that need to produce a new instance after splitting.
1324
\item {\em Merge transactions\/} --- Similarly invoked as the first transactions of shardchain blocks immediately after a shardchain merge event, if an instance of a large smart contract needs to be merged with another instance of the same smart contract.
1325
\end{itemize}
1326
Notice that out of these six kinds of transactions, only four can occur in the masterchain, and another subset of four can occur in the basic workchain.
1327

1328
\nxsubpoint\emb{Phases of an ordinary transaction}
1329
An ordinary transaction is performed in several {\em phases}, which may be thought of as several ``sub-transactions'' tightly bound into one:
1330
\begin{itemize}
1331
\item {\em Storage phase} --- Collects due storage payments for the account state (including smart-contract code and data, if present) up to the present time. The smart contract may be {\em frozen\/} as a result. If the smart contract did not exist before, the storage phase is absent.
1332
\item {\em Credit phase} --- The account is credited with the value of the inbound message received.
1333
\item {\em Computing phase} --- The code of the smart contract is invoked inside an instance of TVM with adequate parameters, including a copy of the inbound message and of the persistent data, and terminates with an exit code, the new persistent data, and an {\em action list} (which includes, for instance, outbound messages to be sent). The processing phase may lead to the creation of a new account (uninitialized or active), or to the activation of a previously uninitialized or frozen account. The {\em gas payment}, equal to the product of the gas price and the gas consumed, is exacted from the account balance.
1334
\item {\em Action phase} --- If the smart contract has terminated successfully (with exit code 0 or 1), the actions from the list are performed. If it is impossible to perform all of them---for example, because of insufficient funds to transfer with an outbound message---then the transaction is aborted and the account state is rolled back. The transaction is also aborted if the smart contract did not terminate successfully, or if it was not possible to invoke the smart contract at all because it is uninitialized or frozen.
1335
\item {\em Bounce phase} --- If the transaction has been aborted, and the inbound message has its {\tt bounce} flag set, then it is ``bounced'' by automatically generating an outbound message (with the {\tt bounce} flag clear) to its original sender. Almost all value of the original inbound message (minus gas payments and forwarding fees) is transferred to the generated message, which otherwise has an empty body.
1336
\end{itemize}
1337

1338
\nxsubpoint\emb{Bouncing inbound messages to non-existent accounts}
1339
Notice that if an inbound message with its {\tt bounce} flag set is sent to a previously non-existent account, and the transaction is aborted (for instance, because there is no code and data with the correct hash in the inbound message, so the virtual machine could not be invoked at all), then the account is not created even as an uninitialized account, since it would have zero balance and no code and data anyways.\footnote{In particular, if a user mistakenly sends some funds to a non-existent address in a bounceable message, the funds will not be wasted, but rather will be returned (bounced) back. Therefore, a user wallet application should set the {\tt bounce} flag in all generated messages by default unless explicitly instructed otherwise. However, non-bounceable messages are indispensable in some situations (cf.~\ptref{sp:ex.simple.wallet}).}
1340

1341
\nxsubpoint\label{sp:proc.in.msg}\emb{Processing of an inbound message is split between computing and action phases}
1342
Notice that the processing of an inbound message is in fact split into two phases: the {\em computing\/} phase and the {\em action\/} phase. During the computing phase, the virtual machine is invoked and the necessary computations are performed, but no actions outside the virtual machine are taken. In other words, {\em the execution of a smart contract in TVM has no side effects}; there is no way for a smart contract to interact with the blockchain directly during its execution. Instead, TVM primitives such as {\tt SENDMSG} simply store the required action (e.g., the outbound message to be sent) into the action list being gradually accumulated in TVM control register \texttt{c5}. The actions themselves are postponed until the action phase, during which the user smart contract is not invoked at all.
1343

1344
\nxsubpoint\label{sp:proc.in.msg.split}\emb{Reasons for splitting the processing into computation and action phases}
1345
Some reasons for such an arrangement are:
1346
\begin{itemize}
1347
\item It is simpler to abort the transaction if the smart contract eventually terminates with an exit code other than 0 or 1.
1348
\item The rules for processing output actions may be changed without modifying the virtual machine. (For instance, new output actions may be introduced.)
1349
\item The virtual machine itself may be modified or even replaced by another one (for instance, in a new workchain) without changing the rules for processing output actions.
1350
\item The execution of the smart contract inside the virtual machine is completely isolated from the blockchain and is a {\em pure computation}. As a consequence, this execution may be {\em virtualized\/} inside the virtual machine itself by means of TVM's {\tt RUNVM} primitive, a useful feature for validator smart contracts and for smart contracts controlling payment channels and other sidechains. Additionally, the virtual machine may be {\em emulated\/} inside itself or a stripped-down version of itself, a useful feature for validating the execution of smart contracts inside TVM.\footnote{A reference implementation of a TVM emulator running in a stripped-down version of TVM may be committed into the masterchain to be used when a disagreement between the validators on a specific run of TVM arises. In this way, flawed implementations of TVM may be detected. The reference implementation then serves as an authoritative source on the operational semantics of TVM. (Cf.~\cite[B.2]{TVM})}
1351
\end{itemize}
1352

1353
\nxsubpoint\emb{Storage, tick, and tock transactions}
1354
Storage transactions are very similar to a stand-alone storage phase of an ordinary transaction. Tick and tock transactions are similar to ordinary transactions without credit and bounce phases, because there is no inbound message.
1355

1356
\nxsubpoint\label{sp:split.trs}\emb{Split transactions}
1357
Split transactions in fact consist of two transactions. If an account $\xi$ needs to be split into two accounts $\xi$ and $\xi'$:
1358
\begin{itemize}
1359
\item First a {\em split prepare transaction}, similar to a tock transaction (but in a shardchain instead of the masterchain), is issued for account $\xi$. It must be the last transaction for $\xi$ in a shardchain block. The output of the processing stage of a split prepare transaction consists not only of the new state of account $\xi$, but also of the new state of account $\xi'$, represented by a constructor message to~$\xi'$ (cf.~\ptref{sp:acc.as.msg}).
1360
\item Then a {\em split install transaction\/} is added for account $\xi'$, with a reference to the corresponding split prepare transaction. The split install transaction must be the only transaction for a previously non-existent account $\xi'$ in the block. It effectively sets the state of $\xi'$ as defined by the split prepare transaction.
1361
\end{itemize}
1362

1363
\nxsubpoint\label{sp:merge.trs}\emb{Merge transactions}
1364
Merge transactions also consist of two transactions each. If an account $\xi'$ needs to be merged into account $\xi$:
1365
\begin{itemize}
1366
\item First a {\em merge prepare transaction\/} is issued for $\xi'$, which converts all of its persistent state and balance into a special constructor message with destination $\xi$ (cf.~\ptref{sp:acc.as.msg}).
1367
\item Then a {\em merge install transaction\/} for~$\xi$, referring to the corresponding merge prepare transaction, processes that constructor message. The merge install transaction is similar to a tick transaction in that it must be the first transaction for~$\xi$ in a block, but it is located in a shardchain block, not in the masterchain, and it has a special inbound message.
1368
\end{itemize}
1369

1370
\nxsubpoint\label{sp:trans.gen.tlb}\emb{Serialization of a general transaction}
1371
Any transaction contains the fields listed in~\ptref{sp:gen.comp.tr}. As a consequence, there are some common components in all transactions:
1372
\begin{verbatim}
1373
transaction$_ account_addr:uint256 lt:uint64 outmsg_cnt:uint15
1374
  orig_status:AccountStatus end_status:AccountStatus
1375
  in_msg:(Maybe ^(Message Any)) 
1376
  out_msgs:(HashmapE 15 ^(Message Any))
1377
  total_fees:Grams state_update:^(MERKLE_UPDATE Account)
1378
  description:^TransactionDescr = Transaction;
1379

1380
!merkle_update#02 {X:Type} old_hash:uint256 new_hash:uint256
1381
  old:^X new:^X = MERKLE_UPDATE X;
1382
\end{verbatim}
1383
The exclamation mark in the TL-B declaration of a {\tt merkle\_update} indicates special processing required for such values. In particular, they must be kept in a separate cell, which must be marked as {\em exotic} by a bit in its header (cf.~\cite[3.1]{TVM}).
1384

1385
A full explanation of the serialization of {\em TransactionDescr}, which describes one transaction according to its kind listed in~\ptref{sp:trans.kinds}, can be found in~\ptref{p:trans.descr}.
1386

1387
\nxsubpoint\emb{Representation of outbound messages generated by a transaction}
1388
The outbound messages generated by a transaction $t$ are kept in a dictionary {\tt out\_msgs} with 15-bit keys equal to 0, 1, \dots, $n-1$, where $n=\texttt{outmsg\_cnt}$ is the number of generated outbound messages. Message $m_{i+1}$ with index $0\leq i<n$ must have $\LT(m_{i+1})=\LT(t)+i+1$, and $\LT(t)=\LT^-(t)$ is explicitly stored in the {\tt lt} field.
1389

1390
\nxsubpoint\label{sp:trans.gen.cond}\emb{Consistency conditions for transactions}
1391
The common serialization of the fields present in a {\em Transaction}, independent of its type and description, enables us to impose several ``external'' consistency conditions on any transaction. The most important of them involves the {\em value flow\/} inside the transaction: the sum of all inputs (the import value of the inbound message plus the original balance of the account) must equal the sum of all outputs (the resulting balance of the account, plus the sum of the export values of all outbound messages, plus all storage, processing, and forwarding fees collected by the validators). In this way, a surface inspection of a transaction, which processes an inbound message with an import value of 1 Gram received by an account with an initial balance of 10 Grams, generating an outbound message with an export value of 100 Grams in the process, will reveal its invalidity even before checking all the details of the TVM execution.
1392

1393
Other consistency conditions may slightly depend on the description of the transaction. For instance, the inbound message processed by an ordinary transaction must be registered in the {\em InMsgDescr\/} of the encompassing block, and the corresponding record must contain a reference to this transaction. Similarly, all outbound messages generated by all transactions (with the exception of one special message generated by a split prepare or merge prepare transaction) must be registered in {\em OutMsgDescr}.
1394

1395
\nxsubpoint\emb{Collection of all transactions of an account}
1396
All transactions in a block belonging to the same account $\xi$ are collected into an ``accountchain block'' {\em AccountBlock,} which essentially is a dictionary {\tt transactions} with 64-bit keys, each equal to the logical time of the corresponding transaction:
1397
\begin{verbatim}
1398
acc_trans$_ account_addr:uint256
1399
            transactions:(HashmapAug 64 ^Transaction Grams)
1400
            state_update:^(MERKLE_UPDATE Account)
1401
          = AccountBlock;
1402
\end{verbatim}
1403
The {\tt transactions} dictionary is sum-augmented by a {\em Grams} value, which aggregates the total fees collected from these transactions.
1404

1405
In addition to this dictionary, an {\em AccountBlock\/} contains a Merkle update (cf.~\cite[3.1]{TVM}) of the total state of the account. If an account did not exist before the block, its state is represented by an {\tt account\_none}.
1406

1407
\nxsubpoint\emb{Consistency conditions for {\em AccountBlock\/}s}
1408
There are several general consistency conditions imposed on an {\em AccountBlock}. In particular:
1409
\begin{itemize}
1410
\item The transaction appearing as a value in the augmented {\tt transactions} dictionary must have its {\tt lt} value equal to its key.
1411
\item All transactions must belong to an account whose address {\tt account\_addr} is indicated in the {\em AccountBlock}.
1412
\item If $t$ and $t'$ are two transactions with $\LT(t)<\LT(t')$, and their keys are consecutive in {\tt transactions}, meaning that there is no transaction $t''$ with $\LT(t)<\LT(t'')<\LT(t')$, then the final state of $t$ must correspond to the initial state of $t'$ (their hashes as explicitly indicated in the Merkle updates must be equal).
1413
\item If $t$ is the transaction with minimal $\LT(t)$, its initial state must coincide with the initial state as indicated in {\tt state\_update} of the {\em AccountBlock}.
1414
\item If $t$ is the transaction with maximal $\LT(t)$, its final state must coincide with the final state as indicated in {\tt state\_update} of the {\em AccountBlock}.
1415
\item The list of transactions must be non-empty.
1416
\end{itemize}
1417
These conditions simply express the fact that the state of an account may change only as the result of performing a transaction.
1418

1419
\nxsubpoint\label{sp:all.transact}\emb{Collection of all transactions in a block}
1420
All transactions in a block are represented by (cf.~\ptref{sp:isp.blk.state}):
1421
\begin{verbatim}
1422
_ (HashmapAugE 256 AccountBlock Grams) = ShardAccountBlocks;
1423
\end{verbatim}
1424

1425
\nxsubpoint\emb{Consistency conditions for the collection of all transactions}
1426
Again, consistency conditions are imposed on this structure, requiring that the value at key $\xi$ be an {\em AccountBlock\/} with address equal to~$\xi$. Further consistency conditions relate this structure with the initial and final states of the shardchain indicated in the block, requiring that:
1427
\begin{itemize}
1428
\item If {\em ShardAccountBlock\/} has no key $\xi$, then the state of account $\xi$ in the initial and in the final state of the block must coincide (or it must be absent from both).
1429
\item If $\xi$ is present in {\em ShardAccountBlock}, its initial and final states as indicated in {\em AccountBlock\/} must match those indicated in the initial and final states of the shardchain block, expressed by instances of {\em ShardAccounts\/} (cf.~\ptref{sp:shard.accs}).
1430
\end{itemize}
1431
These conditions express that the shardchain state is indeed composed out of the states of separate accountchains.
1432

1433
\mysubsection{Transaction descriptions}\label{p:trans.descr}
1434
This section presents the specific TL-B schemes for transaction descriptions according to the classification provided in~\ptref{sp:trans.kinds}.
1435

1436
\nxsubpoint\emb{Reasons for omitting data from a transaction description}
1437
A transaction description for a blockchain featuring a Turing-complete virtual machine for smart-contract execution is necessarily incomplete. Indeed, a truly complete description would contain all the intermediate states of the virtual machine after each instruction is executed, something that cannot fit into a blockchain block of a reasonable size. Therefore, the description of such a transaction is likely to contain only the total number of steps and the hashes of the initial and final states of the virtual machine. The validation of such a transaction will necessarily involve the execution of the smart contract to reproduce all the intermediate steps and the final result.
1438

1439
If we compress the sequence of all intermediate steps of the virtual machine into just the hashes of the initial and final states, then no transaction details at all need to be included: a validator able to check the execution of the virtual machine by itself would also be able to check all the other actions of the transaction starting from its initial data without these details.
1440

1441
\nxsubpoint\emb{Reasons for including data into a transaction description}
1442
The above considerations notwithstanding, there are still several reasons to introduce some details in the transaction description:
1443
\begin{itemize}
1444
\item We want to impose external consistency conditions on the transaction, so that at least the validity of the value flow inside the transaction and the validity of inbound and outbound messages can be quickly checked without invoking the virtual machine (cf.~\ptref{sp:trans.gen.cond}). This at least guarantees the invariance of the total amount of each cryptocurrency in the blockchain, even if it does not guarantee the correctness of its distribution.
1445
\item We want to be able to trace principal state changes of an account (such as its being created, activated, or frozen) by inspecting the data stored in the transaction description, without figuring out the missing details of the transaction. This simplifies the verification of the consistency conditions between the accountchain and shardchain states in a block.
1446
\item Finally, certain information---such as the total steps of the virtual machine, the hashes of its initial and final states, the total gas consumed, and the exit code---might considerably simplify the debugging and implementation of the TON Blockchain software. (This information would help a human programmer understand what has happened in a particular blockchain block.)
1447
\end{itemize}
1448
On the other hand, we want to minimize the size of each transaction, because we want to maximize the number of transactions that can fit into each (bounded-size) block. Therefore, all information not required for one of the above reasons is omitted.
1449

1450
\nxsubpoint\emb{Description of a storage phase}
1451
The storage phase is present in several kinds of transactions, so a common representation for this phase is used:
1452
\begin{verbatim}
1453
tr_phase_storage$_ storage_fees_collected:Grams 
1454
  storage_fees_due:(Maybe Grams)
1455
  status_change:AccStatusChange
1456
  = TrStoragePhase;
1457

1458
acst_unchanged$0 = AccStatusChange;  // x -> x
1459
acst_frozen$10 = AccStatusChange;    // init -> frozen
1460
acst_deleted$11 = AccStatusChange;   // frozen -> deleted
1461
\end{verbatim}
1462

1463
\nxsubpoint\emb{Description of a credit phase}
1464
The credit phase can result in the collection of some due payments:
1465
\begin{verbatim}
1466
tr_phase_credit$_ due_fees_collected:(Maybe Grams)
1467
  credit:CurrencyCollection = TrCreditPhase;
1468
\end{verbatim}
1469
The sum of {\tt due\_fees\_collected} and {\tt credit} must equal the value of the message received, plus its {\tt ihr\_fee} if the message has not been received via IHR (otherwise the {\tt ihr\_fee} is awarded to the validators).
1470

1471
\nxsubpoint\emb{Description of a computing phase}
1472
The computing phase consists in invoking TVM with correct inputs. On some occasions, TVM cannot be invoked at all (e.g., if the account is absent, not initialized, or frozen, and the inbound message being processed has no code or data fields or these fields have an incorrect hash); this is reflected by corresponding constructors.
1473
\begin{verbatim}
1474
tr_phase_compute_skipped$0 reason:ComputeSkipReason
1475
  = TrComputePhase;
1476
tr_phase_compute_vm$1 success:Bool msg_state_used:Bool 
1477
  account_activated:Bool gas_fees:Grams
1478
  _:^[ gas_used:(VarUInteger 7)
1479
  gas_limit:(VarUInteger 7) gas_credit:(Maybe (VarUInteger 3))
1480
  mode:int8 exit_code:int32 exit_arg:(Maybe int32)
1481
  vm_steps:uint32
1482
  vm_init_state_hash:uint256 vm_final_state_hash:uint256 ]
1483
  = TrComputePhase;
1484
cskip_no_state$00 = ComputeSkipReason;
1485
cskip_bad_state$01 = ComputeSkipReason;
1486
cskip_no_gas$10 = ComputeSkipReason;
1487
\end{verbatim}
1488
The TL-B construct {\tt \_:\caret[...]} describes a reference to a cell containing the fields listed inside the square brackets. In this way, several fields can be moved from a cell containing a large record into a separate subcell.
1489

1490
\nxsubpoint\emb{Skipped computing phase}
1491
If the computing phase has been skipped, possible reasons include:
1492
\begin{itemize}
1493
\item The absence of funds to buy gas.
1494
\item The absence of a state (i.e., smart-contract code and data) in both the account (non-existing, uninitialized, or frozen) and the message.
1495
\item An invalid state passed in the message (i.e., the state's hash differs from the expected value) to a frozen or uninitialized account.
1496
\end{itemize}
1497

1498
\nxsubpoint\emb{Valid computing phase}
1499
If there is no reason to skip the computing phase, TVM is invoked and the results of the computation are logged. Possible parameters are as follows:
1500
\begin{itemize}
1501
\item The {\tt success} flag is set if and only if {\tt exit\_code} is either 0 or 1.
1502
\item The {\tt msg\_state\_used} parameter reflects whether the state passed in the message has been used. If it is set, the {\tt account\_activated} flag reflects whether this has resulted in the activation of a previously frozen, uninitialized or non-existent account.
1503
\item The {\tt gas\_fees} parameter reflects the total gas fees collected by the validators for executing this transaction. It must be equal to the product of {\tt gas\_used} and {\tt gas\_price} from the current block header.
1504
\item The {\tt gas\_limit} parameter reflects the gas limit for this instance of TVM. It equals the lesser of either the Grams credited in the credit phase from the value of the inbound message divided by the current gas price, or the global per-transaction gas limit.
1505
\item The {\tt gas\_credit} parameter may be non-zero only for external inbound messages. It is the lesser of either the amount of gas that can be paid from the account balance or the maximum gas credit.
1506
\item The {\tt exit\_code} and {\tt exit\_args} parameters represent the status values returned by TVM. 
1507
\item The {\tt vm\_init\_state\_hash} and {\tt vm\_final\_state\_hash} parameters are the representation hashes of the original and resulting states of TVM, and {\tt vm\_steps} is the total number of steps performed by TVM (usually equal to two plus the number of instructions executed, including implicit {\tt RET}s).\footnote{Notice that this record does not represent a change in the state of the account, because the transaction may still be aborted during the action phase. In that case, the new persistent data indirectly referenced by {\tt vm\_final\_state\_hash} will be discarded.}
1508
\end{itemize}
1509

1510
\nxsubpoint\emb{Description of the action phase}
1511
The action phase occurs after a valid computation phase. It attempts to perform the actions stored by TVM during the computing phase into the {\em action list}. It may fail, because the action list may turn out to be too long, contain invalid actions, or contain actions that cannot be completed (for instance, because of insufficient funds to create an outbound message with the required value).
1512
\begin{verbatim}
1513
tr_phase_action$_ success:Bool valid:Bool no_funds:Bool
1514
  status_change:AccStatusChange
1515
  total_fwd_fees:(Maybe Grams) total_action_fees:(Maybe Grams)
1516
  result_code:int32 result_arg:(Maybe int32) tot_actions:int16
1517
  spec_actions:int16 msgs_created:int16 
1518
  action_list_hash:uint256 tot_msg_size:StorageUsed 
1519
  = TrActionPhase;
1520
\end{verbatim}
1521

1522
\nxsubpoint\emb{Description of the bounce phase}
1523
\begin{verbatim}
1524
tr_phase_bounce_negfunds$00 = TrBouncePhase;
1525
tr_phase_bounce_nofunds$01 msg_size:StorageUsed
1526
  req_fwd_fees:Grams = TrBouncePhase;
1527
tr_phase_bounce_ok$1 msg_size:StorageUsed 
1528
  msg_fees:Grams fwd_fees:Grams = TrBouncePhase;
1529
\end{verbatim}
1530

1531
\nxsubpoint\emb{Description of an ordinary transaction}
1532
\begin{verbatim}
1533
trans_ord$0000 storage_ph:(Maybe TrStoragePhase)
1534
  credit_ph:(Maybe TrCreditPhase)
1535
  compute_ph:TrComputePhase action:(Maybe ^TrActionPhase)
1536
  aborted:Bool bounce:(Maybe TrBouncePhase)
1537
  destroyed:Bool
1538
  = TransactionDescr;
1539
\end{verbatim}
1540
Several consistency conditions are imposed on this structure:
1541
\begin{itemize}
1542
\item {\tt action} is absent if and only if the computing phase was unsuccessful.
1543
\item The {\tt aborted} flag is set either if there is no action phase or if the action phase was unsuccessful.
1544
\item The bounce phase occurs only if the {\tt aborted} flag is set and the inbound message was bounceable.
1545
\end{itemize}
1546

1547
\nxsubpoint\emb{Description of a storage transaction}
1548
A storage transaction consists just of one stand-alone storage phase:
1549
\begin{verbatim}
1550
trans_storage$0001 storage_ph:TrStoragePhase
1551
  = TransactionDescr;
1552
\end{verbatim}
1553

1554
\nxsubpoint\emb{Description of tick and tock transactions}
1555
Tick and tock transactions are similar to ordinary transactions without an inbound message, so there are no credit or bounce phases:
1556
\begin{verbatim}
1557
trans_tick_tock$001 is_tock:Bool storage:TrStoragePhase
1558
  compute_ph:TrComputePhase action:(Maybe ^TrActionPhase)
1559
  aborted:Bool destroyed:Bool = TransactionDescr;
1560
\end{verbatim}
1561

1562
\nxsubpoint\label{sp:split.trans}\emb{Split prepare and install transactions}
1563
A split prepare transaction is similar to a tock transaction in a masterchain, but it must generate exactly one special constructor message; otherwise, the action phase is aborted.
1564
\begin{verbatim}
1565
split_merge_info$_ cur_shard_pfx_len:(## 6)
1566
  acc_split_depth:(## 6) this_addr:uint256 sibling_addr:uint256
1567
  = SplitMergeInfo;
1568
trans_split_prepare$0100 split_info:SplitMergeInfo
1569
  compute_ph:TrComputePhase action:(Maybe ^TrActionPhase)
1570
  aborted:Bool destroyed:Bool
1571
  = TransactionDescr;
1572
trans_split_install$0101 split_info:SplitMergeInfo
1573
  prepare_transaction:^Transaction
1574
  installed:Bool = TransactionDescr;
1575
\end{verbatim}
1576
Notice that the split install transaction for the new sibling account $\xi'$ refers to its corresponding split prepare transaction of the previously existing account $\xi$.
1577

1578
\nxsubpoint\label{sp:merge.trans}\emb{Merge prepare and install transactions}
1579
A merge prepare transaction converts the state and balance of an account into a message, and a subsequent merge install transaction processes this state:
1580
\begin{verbatim}
1581
trans_merge_prepare$0110 split_info:SplitMergeInfo
1582
  storage_ph:TrStoragePhase aborted:Bool
1583
  = TransactionDescr;
1584
trans_merge_install$0111 split_info:SplitMergeInfo
1585
  prepare_transaction:^Transaction
1586
  credit_ph:(Maybe TrCreditPhase)
1587
  compute_ph:TrComputePhase action:(Maybe ^TrActionPhase)
1588
  aborted:Bool destroyed:Bool
1589
  = TransactionDescr;
1590
\end{verbatim}
1591

1592
\mysubsection{Invoking smart contracts in TVM}
1593
This section describes the exact parameters with which TVM is invoked during the computing phase of ordinary and other transactions.
1594

1595
\nxsubpoint\label{sp:smc.code}\emb{Smart-contract code}
1596
The {\em code\/} of a smart contract is normally a part of the account's persistent state, at least if the account is {\em active\/} (cf.~\ptref{sp:acc.descr}). However, a frozen or uninitialized (or non-existent) account has no persistent state, with the possible exception of the account's balance and the hash of its intended state (equal to the account address for uninitialized accounts). In this case, the code must be supplied in the {\tt init} field of the inbound message being processed by the transaction (cf.~\ptref{sp:msg.layout}).
1597

1598
\nxsubpoint\emb{Smart-contract persistent data}
1599
The {\em persistent data\/} of a smart contract is kept alongside its code, and remarks similar to those made above in~\ptref{sp:smc.code} apply. In this respect, the code and persistent data of a smart contract are just two parts of its persistent state, which differ only in the way they are treated by TVM during smart-contract execution.
1600

1601
\nxsubpoint\label{sp:lib.env}\emb{Smart-contract library environment}
1602
The {\em library environment\/} of a smart contract is a hashmap mapping 256-bit cell (representation) hashes into the corresponding cells themselves. When an external cell reference is accessed during the execution of a smart contract, the cell referred to is looked up in the library environment and the external cell reference is transparently replaced by the cell found.
1603

1604
The library environment for an invocation of a smart contract is computed as follows:
1605
\begin{enumerate}
1606
\item The global library environment for the workchain in question is taken from the current state of the masterchain.\footnote{The most common way of creating shared libraries for TVM is to publish a reference to the root cell of the library in the masterchain.}
1607
\item Next, it is augmented by the local library environment of the smart contract, stored in the {\tt library} field of the smart contract's state. Only 256-bit keys equal to the hashes of the corresponding value cells are taken into account. If a key is present in both the global and local library environments, the local environment takes precedence while merging the two library environments.
1608
\item Finally, the message library stored in the {\tt library} field of the {\tt init} field of the inbound message is similarly taken into account. Notice, however, that if the account is frozen or uninitialized, the {\tt library} field of the message is part of the suggested state of the account, and is used instead of the local library environment in the previous step. The message library has lower precedence than both the local and the global library environments.
1609
\end{enumerate}
1610

1611
\nxsubpoint\label{sp:tvm.smc.init}\emb{The initial state of TVM}
1612
A new instance of TVM is initialized prior to the execution of a smart contract as follows:
1613
\begin{itemize}
1614
\item The original {\tt cc} (current continuation) is initialized using the cell slice created from the cell {\tt code}, containing the code of the smart contract computed as described in~\ptref{sp:smc.code}.
1615
\item The {\tt cp} (TVM codepage) is set to zero. If the smart contract wants to use another TVM codepage $x$, it must switch to it by using {\tt SETCODEPAGE $x$} as the first instruction of its code.
1616
\item Control register {\tt c0} (return continuation) is initialized by extraordinary continuation {\tt ec\_quit} with parameter 0. When executed, this continuation leads to a termination of TVM with exit code 0.
1617
\item Control register {\tt c1} (alternative return continuation) is initialized by extraordinary continuation {\tt ec\_quit} with parameter 1. When invoked, it leads to a termination of TVM with exit code 1. (Notice that terminating with exit code 0 or 1 is considered a successful termination.)
1618
\item Control register {\tt c2} (exception handler) is initialized by extraordinary continuation {\tt ec\_quit\_exc}. When invoked, it takes the top integer from the stack (equal to the exception number) and terminates TVM with exit code equal to that integer. In this way, by default all exceptions terminate the smart-contract execution with exit code equal to the exception number.
1619
\item Control register {\tt c3} (code dictionary) is initialized by the cell with the smart-contract code, similarly to the initial current continuation ({\tt cc}).
1620
\item Control register {\tt c4} (root of persistent data) is initialized by the persistent data of the smart contract.\footnote{The persistent data of the smart contract need not be loaded in its entirety for this to occur. Instead the root is loaded, and TVM may load other cells by their references from the root only when they are accessed, thus providing a form of virtual memory.}
1621
\item Control register {\tt c5} (root of actions) is initialized by an empty cell. The ``output action'' primitives of TVM, such as {\tt SENDMSG}, use {\tt c5} to accumulate the list of actions (e.g., outbound messages) to be performed upon successful termination of the smart contract (cf.~\ptref{sp:proc.in.msg} and~\ptref{sp:proc.in.msg.split}).
1622
\item Control register {\tt c7} (root of temporary data) is initialized by a singleton {\em Tuple}, the only component of which is a {\em Tuple\/} containing an instance of {\em SmartContractInfo\/} with smart contract balance and other useful information (cf.~\ptref{sp:smc.info}). The smart contract may replace the temporary data, especially all components of the {\em Tuple\/} at {\tt c7} but the first one, with whatever other temporary data it may require. However, the original content of the {\em SmartContractInfo\/} at the first component of the {\em Tuple\/} held in {\tt c7} is inspected and sometimes modified by {\tt SENDMSG} TVM primitives and other ``output action'' primitives of TVM.
1623
\item The {\em gas limits\/} $\texttt{gas}=(g_m,g_l,g_c,g_r)$ are initialized as follows:
1624
  \begin{itemize}
1625
  \item The {\em maximal gas limit\/} $g_m$ is set to the lesser of either the total Gram balance of the smart contract (after the the credit phase---i.e., combined with the value of the inbound message) divided by the current gas price, or the per-execution global gas limit.\footnote{Both the global gas limit and the gas price are configurable parameters determined by the current state of the masterchain.}
1626
  \item The {\em current gas limit\/} $g_l$ is set to the lesser of either the Gram value of the inbound message divided by the gas price, or the global per-execution gas limit. In this way, always $g_l\leq g_m$. For inbound external messages $g_l=0$, since they cannot carry any value.
1627
  \item The {\em gas credit\/} $g_c$ is set to zero for inbound internal messages, and to the lesser of either $g_m$ or a fixed small value (the default external message gas credit, a configurable parameter) for inbound external messages.
1628
  \item Finally, the {\em remaining gas limit\/} $g_r$ is automatically initialized by $g_l+g_c$.
1629
  \end{itemize}
1630
\end{itemize}
1631

1632
\nxsubpoint\label{sp:smc.stack.init}\emb{The initial stack of TVM for processing an internal message}
1633
After TVM is initialized as described in~\ptref{sp:tvm.smc.init}, its stack is initialized by pushing the arguments to the {\tt main()} function of the smart contract as follows:
1634
\begin{itemize}
1635
\item The Gram balance $b$ of the smart contract (after crediting the value of the inbound message) is passed as an {\em Integer\/} amount of nanograms.
1636
\item The Gram balance $b_m$ of inbound message $m$ is passed as an {\em Integer\/} amount of nanograms.
1637
\item The inbound message $m$ is passed as a cell, which contains a serialized value of type {\em Message $X$}, where $X$ is the type of the message body.
1638
\item The body $m_b:X$ of the inbound message, equal to the value of field {\tt body} of $m$, is passed as a cell slice.
1639
\item Finally, the {\em function selector\/} $s$, an {\em Integer\/} normally equal to zero, is pushed into the stack.
1640
\end{itemize}
1641
After that, the code of the smart contract, equal to its initial value of {\tt c3}, is executed. It selects the correct function according to $s$, which is expected to process the remaining arguments to the function and terminate afterwards.
1642

1643
\nxsubpoint\emb{Processing an inbound external message}
1644
An inbound external message is processed similarly to \ptref{sp:tvm.smc.init} and~\ptref{sp:smc.stack.init}, with the following modifications:
1645
\begin{itemize}
1646
\item The function selector $s$ is set to $-1$, not to 0.
1647
\item The Gram balance $b_m$ of inbound message is always 0.
1648
\item The initial current gas limit $g_l$ is always 0. However, the initial gas credit $g_c>0$.
1649
\end{itemize}
1650
The smart contract must terminate with $g_c=0$ or $g_r\geq g_c$; otherwise, the transaction and the block containing it are invalid. Validators or collators suggesting a block candidate must never include transactions processing inbound external messages that are invalid.
1651

1652
\nxsubpoint\emb{Processing tick and tock transactions}
1653
The TVM stack for processing tick and tock transactions (cf.~\ptref{sp:trans.kinds}) is initialized by pushing the following values:
1654
\begin{itemize}
1655
\item The Gram balance $b$ of the current account in nanograms (an {\em Integer}).
1656
\item The 256-bit address $\xi$ of the current account inside the masterchain, represented by an unsigned {\em Integer}.
1657
\item An integer equal to $0$ for tick transactions and to $-1$ for tock transactions.
1658
\item The function selector $s$, equal to $-2$.
1659
\end{itemize}
1660

1661
\nxsubpoint\emb{Processing split prepare transactions}
1662
For processing split prepare transactions (cf.~\ptref{sp:split.trans}), the TVM stack is initialized by pushing the following values:
1663
\begin{itemize}
1664
\item The Gram balance $b$ of the current account.
1665
\item A {\em Slice\/} containing {\em SplitMergeInfo\/} (cf.~\ptref{sp:split.trans}).
1666
\item The 256-bit address $\xi$ of the current account.
1667
\item The 256-bit address $\tilde\xi$ of the sibling account.
1668
\item An integer $0\leq d\leq 63$, equal to the position of the only bit in which $\xi$ and $\tilde\xi$ differ.
1669
\item The function selector $s$, equal to $-3$.
1670
\end{itemize}
1671

1672
\nxsubpoint\emb{Processing merge install transactions}
1673
For processing merge install transactions (cf.~\ptref{sp:merge.trans}), the TVM stack is initialized by pushing the following values:
1674
\begin{itemize}
1675
\item The Gram balance $b$ of the current account (already combined with the Gram balance of the sibling account).
1676
\item The Gram balance $b'$ of the sibling account, taken from the inbound message $m$.
1677
\item The message $m$ from the sibling account, automatically generated by a merge prepare transaction. Its {\tt init} field contains the final state $\tilde S$ of the sibling account.
1678
\item The state $\tilde S$ of the sibling account, represented by a {\em StateInit} (cf.~\ptref{sp:msg.layout}).
1679
\item A {\em Slice\/} containing {\em SplitMergeInfo\/} (cf.~\ptref{sp:split.trans}).
1680
\item The 256-bit address $\xi$ of the current account.
1681
\item The 256-bit address $\tilde\xi$ of the sibling account.
1682
\item An integer $0\leq d\leq 63$, equal to the position of the only bit in which $\xi$ and $\tilde\xi$ differ.
1683
\item The function selector $s$, equal to $-4$.
1684
\end{itemize}
1685

1686
\nxsubpoint\label{sp:smc.info}\emb{Smart-contract information}
1687
The smart-contract information structure {\em SmartContractInfo}, passed in the first component of the {\em Tuple\/} contained in control register {\tt c7}, is also a {\em Tuple} containing the following data:
1688
\begin{verbatim}
1689
[ magic:0x076ef1ea actions:Integer msgs_sent:Integer
1690
  unixtime:Integer block_lt:Integer trans_lt:Integer 
1691
  rand_seed:Integer balance_remaining:[Integer (Maybe Cell)]
1692
  myself:MsgAddressInt global_config:(Maybe Cell) 
1693
] = SmartContractInfo;
1694
\end{verbatim}
1695
In other words, the first component of this tuple is an {\em Integer\/} {\tt magic\/} always equal to {\tt 0x076ef1ea}, the second component is an {\em Integer\/} {\tt actions}, originally initialized by zero, but incremented by one whenever an output action is installed by a non-{\tt RAW} output action primitive of the TVM, and so on. The remaining balance is represented by a pair, i.e., a two-component {\em Tuple}: the first component is the nanogram balance, and the second component is a dictionary with 32-bit keys representing all other currencies, if any (cf.~\ptref{sp:extra.curr}).
1696

1697
The {\tt rand\_seed} field (an unsigned 256-bit integer) here is initialized deterministically starting from the {\tt rand\_seed} of the block, the account address, the hash of the inbound message being processed (if any), and the transaction logical time {\tt trans\_lt}.
1698

1699
\nxsubpoint\label{sp:out.act.ser}\emb{Serialization of output actions}
1700
The {\em output actions\/} of a smart contract are accumulated in a linked list stored in control register {\tt c5}. The list of output actions is serialized as a value of type {\em OutList $n$}, where $n$ is the length of the list:
1701
\begin{verbatim}
1702
out_list_empty$_ = OutList 0;
1703
out_list$_ {n:#} prev:^(OutList n) action:OutAction
1704
  = OutList (n + 1);
1705
action_send_msg#0ec3c86d out_msg:^(Message Any) = OutAction;
1706
action_set_code#ad4de08e new_code:^Cell = OutAction;
1707
\end{verbatim}
1708

1709
\clearpage
1710
\mysection{Block layout}\label{sect:block.layout}
1711
This chapter presents the block layout used by the TON Blockchain, combining the data structures described separately in previous chapters to produce a complete description of a shardchain block. In addition to the TL-B schemes that define the representation of a shardchain block by a tree of cells, this chapter describes exact serialization formats for the resulting bags (collections) of cells, which are necessary to represent a shardchain block as a file.
1712

1713
Masterchain blocks are similar to shardchain blocks, but have some additional fields. The necessary modifications are discussed separately in~\ptref{p:master.layout}.
1714

1715
\mysubsection{Shardchain block layout}\label{p:shard.layout}
1716
This section lists the data structures that must be contained in a shardchain block and in the shardchain state, and concludes by presenting a formal TL-B scheme for a shardchain block.
1717

1718
\nxsubpoint\label{sp:shard.state.comp}\emb{Components of the shardchain state}
1719
The shardchain state consists of:
1720
\begin{itemize}
1721
\item {\em ShardAccounts}, the split part of the shardchain state (cf.~\ptref{sp:split.blk.part}) containing the state of all accounts assigned to this shard (cf.~\ptref{sp:shard.accs}).
1722
\item {\em OutMsgQueue}, the output message queue of the shardchain (cf.~\ptref{sp:out.msg.queue}).
1723
\item {\em SharedLibraries}, the description of all shared libraries of the shardchain (for now, non-empty only in the masterchain).
1724
\item The logical time and the unixtime of the last modification of the state.
1725
\item The total balance of the shard.
1726
\item A hash reference to the most recent masterchain block, indirectly describing the state of the masterchain and, through it, the state of all other shardchains of the TON Blockchain (cf.~\ptref{sp:shard.total.state}).
1727
\end{itemize}
1728

1729
\nxsubpoint\label{sp:shard.blk.comp}\emb{Components of a shardchain block}
1730
A shardchain block must contain:
1731
\begin{itemize}
1732
\item A list of {\em validator signatures\/} (cf.~\ptref{sp:val.sign}), which is external with respect to all other contents of the block.
1733
\item {\em BlockHeader}, containing general information about the block (cf.~\ptref{sp:blk.hdr})
1734
\item Hash references to the immediately preceding block or blocks of the same shardchain, and to the most recent masterchain block.
1735
\item {\em InMsgDescr\/} and {\em OutMsgDescr}, the inbound and outbound message descriptors (cf.~\ptref{sp:in.msg.descr} and~\ptref{sp:out.msg.descr}).
1736
\item {\em ShardAccountBlocks}, the collection of all transactions processed in the block (cf.~\ptref{sp:all.transact}) along with all updates of the states of the accounts assigned to the shard. This is the {\em split\/} part of the shardchain block (cf.~\ptref{sp:split.blk.part}).
1737
\item The {\em value flow}, describing the total value imported from the preceding blocks of the same shardchain and from inbound messages, the total value exported by outbound message, the total fees collected by validators, and the total value remaining in the shard.
1738
\item A {\em Merkle update\/} (cf.~\cite[3.1]{TVM}) of the shardchain state. Such a Merkle update contains the hashes of the initial and final shardchain states with respect to the block, along with all new cells of the final state that have been created while processing the block.\footnote{In principle, an experimental version of TON Blockchain might choose to keep only the hashes of the initial and final states of the shardchain. The Merkle update increases the block size, but it is handy for full nodes that want to keep and update their copy of the shardchain state. Otherwise, the full nodes would have to repeat all the computations contained in a block to compute the updated state of the shardchain by themselves.}
1739
\end{itemize}
1740

1741
\nxsubpoint\emb{Common parts of the block layout for all workchains}
1742
Recall that different workchains may define their own rules for processing messages, other types of transactions, other components of the state, and other ways to serialize all this data. However, some components of the block and its state must be common for all workchains in order to maintain the interoperability between different workchains. Such common components include:
1743
\begin{itemize}
1744
\item {\em OutMsgQueue}, the outbound message queue of a shardchain, which is scanned by neighboring shardchains for messages addressed to them.
1745
\item The outer structure of {\em InMsgDescr\/} as a hashmap with 256-bit keys equal to the hashes of the imported messages. (The inbound message descriptors themselves need not have the same structure.)
1746
\item Some fields in the block header identifying the shardchain and the block, along with the paths from the block header to the other information indicated in this list.
1747
\item The value flow information.
1748
\end{itemize}
1749

1750
\nxsubpoint\label{sp:shard.state.tlb}\emb{TL-B scheme for the shardchain state}
1751
The shardchain state (cf.~\ptref{sp:isp.blk.state} and~\ptref{sp:shard.state.comp}) is serialized according to the following TL-B scheme:
1752
\begin{verbatim}
1753
ext_blk_ref$_ start_lt:uint64 end_lt:uint64
1754
  seq_no:uint32 hash:uint256 = ExtBlkRef;
1755

1756
master_info$_ master:ExtBlkRef = BlkMasterInfo;
1757

1758
shard_ident$00 shard_pfx_bits:(## 6) 
1759
  workchain_id:int32 shard_prefix:uint64 = ShardIdent;
1760

1761
shard_state shard_id:ShardIdent 
1762
  out_msg_queue:OutMsgQueue
1763
  total_balance:CurrencyCollection
1764
  total_validator_fees:CurrencyCollection
1765
  accounts:ShardAccounts
1766
  libraries:(HashmapE 256 LibDescr)
1767
  master_ref:(Maybe BlkMasterInfo)
1768
  custom:(Maybe ^McStateExtra)
1769
  = ShardState;
1770
\end{verbatim}
1771
The field {\tt custom} is usually present only in the masterchain and contains all the masterchain-specific data. However, other workchains may use the same cell reference to refer to their specific state data.
1772

1773
\nxsubpoint\emb{Shared libraries description}
1774
Shared libraries currently can be present only in masterchain blocks. They are described by an instance of $\textit{HashmapE}(256,\textit{LibDescr})$, where the 256-bit key is the representation hash of the library, and \textit{LibDescr\/} describes one library:
1775
\begin{verbatim}
1776
shared_lib_descr$00 lib:^Cell publishers:(Hashmap 256 True)
1777
  = LibDescr;
1778
\end{verbatim}
1779
Here {\tt publishers} is a hashmap with keys equal to the addresses of all accounts that have published the corresponding shared library. The shared library is preserved as long as at least one account keeps it in its published libraries collection.
1780

1781
\nxsubpoint\emb{TL-B scheme for an unsigned shardchain block}
1782
The precise format of an {\em unsigned\/} (cf.~\ptref{sp:val.sign}) shardchain block is given by the following TL-B scheme:
1783
\begin{verbatim}
1784
block_info version:uint32 
1785
  not_master:(## 1) 
1786
  after_merge:(## 1) before_split:(## 1) flags:(## 13)
1787
  seq_no:# vert_seq_no:# 
1788
  shard:ShardIdent gen_utime:uint32
1789
  start_lt:uint64 end_lt:uint64
1790
  master_ref:not_master?^BlkMasterInfo 
1791
  prev_ref:seq_no?^(BlkPrevInfo after_merge)
1792
  prev_vert_ref:vert_seq_no?^(BlkPrevInfo 0)
1793
  = BlockInfo;
1794

1795
prev_blk_info#_ {merged:#} prev:ExtBlkRef
1796
  prev_alt:merged?ExtBlkRef = BlkPrevInfo merged;
1797

1798
unsigned_block info:^BlockInfo value_flow:^ValueFlow
1799
  state_update:^(MERKLE_UPDATE ShardState) 
1800
  extra:^BlockExtra = Block;
1801

1802
block_extra in_msg_descr:^InMsgDescr
1803
  out_msg_descr:^OutMsgDescr
1804
  account_blocks:ShardAccountBlocks
1805
  rand_seed:uint256
1806
  custom:(Maybe ^McBlockExtra) = BlockExtra;
1807
\end{verbatim}
1808
The field {\tt custom} is usually present only in the masterchain and contains all the masterchain-specific data. However, other workchains may use the same cell reference to refer to their specific block data.
1809

1810
\nxsubpoint\emb{Description of total value flow through a block}
1811
The total value flow through a block is serialized according to the following TL-B scheme:
1812
\begin{verbatim}
1813
value_flow _:^[ from_prev_blk:CurrencyCollection 
1814
  to_next_blk:CurrencyCollection
1815
  imported:CurrencyCollection
1816
  exported:CurrencyCollection ]
1817
  fees_collected:CurrencyCollection
1818
  _:^[
1819
  fees_imported:CurrencyCollection
1820
  created:CurrencyCollection
1821
  minted:CurrencyCollection
1822
  ] = ValueFlow;
1823
\end{verbatim}
1824
Recall that \texttt{\_:\caret[}\dots\texttt{]} is a TL-B construction indicating that a group of fields has been moved into a separate cell. The last three fields may be non-zero only in masterchain blocks.
1825

1826
\nxsubpoint\label{sp:signed.blk}\emb{Signed shardchain block}
1827
A signed shardchain block is just an unsigned block augmented by a collection of validator signatures: 
1828
\begin{verbatim}
1829
ed25519_signature#5 R:uint256 s:uint256 = CryptoSignature; 
1830

1831
signed_block block:^Block blk_serialize_hash:uint256
1832
  signatures:(HashmapE 64 CryptoSignature)
1833
  = SignedBlock;
1834
\end{verbatim}
1835
The {\em serialization hash\/} {\tt blk\_serialize\_hash} of the unsigned block {\tt block} is essentially a hash of a specific serialization of the block into an octet string (cf.~\ptref{sp:blk.ser.hash} for a more detailed explanation). The signatures collected in {\tt signatures} are Ed25519-signatures (cf.~\ptref{p:ed25519}) made with a validator's private keys of the $\Sha$ of the concatenation of the 256-bit representation hash of the block {\tt block} and of its 256-bit serialization hash {\tt blk\_serialize\_hash}. The 64-bit keys in dictionary {\tt signatures} represent the first 64 bits of the public keys of the corresponding validators.
1836

1837
\nxsubpoint\emb{Serialization of a signed block}
1838
The overall procedure of serializing and signing a block may be described as follows:
1839
\begin{enumerate}
1840
\item An unsigned block $B$ is generated, transformed into a complete bag of cells (cf.~\ptref{sp:boc.complete}), and serialized into an octet string $S_B$.
1841
\item Validators sign the 256-bit combined hash
1842
\begin{equation}
1843
  H_B:=\Sha\bigl(\Hash_\infty(B).\Hash_M(S_B)\bigr)
1844
\end{equation}
1845
of the representation hash of $B$ and of the Merkle hash of its serialization~$S_B$.
1846
\item A signed shardchain block $\tilde B$ is generated from $B$ and these validator signatures as described above (cf.~\ptref{sp:signed.blk}).
1847
\item This signed block $\tilde B$ is transformed into an incomplete bag of cells, which contains only the validator signatures, but the unsigned block itself is absent from this bag of cells, being its only absent cell.
1848
\item This incomplete bag of cells is serialized, and its serialization is prepended to the previously constructed serialization of the unsigned block.
1849
\end{enumerate}
1850
The result is the serialization of the signed block into an octet string. It may be propagated by network or stored into a disk file.
1851

1852
\mysubsection{Masterchain block layout}\label{p:master.layout}
1853
Masterchain blocks are very similar to shardchain blocks of the basic work\-chain. This section lists some of the modifications needed to obtain the description of a masterchain block from the description of a shardchain block given in~\ptref{p:shard.layout}.
1854

1855
\nxsubpoint\emb{Additional components present in the masterchain state}
1856
In addition to the components listed in~\ptref{sp:shard.state.comp}, the masterchain state must contain:
1857
\begin{itemize}
1858
\item {\em ShardHashes\/} --- Describes the current shard configuration, and contains the hashes of the latest blocks of the corresponding shardchains.
1859
\item {\em ShardFees\/} --- Describes the total fees collected by the validators of each shardchain.
1860
\item {\em ShardSplitMerge\/} --- Describes future shard split/merge events. It is serialized as a part of {\em ShardHashes}.
1861
\item {\em ConfigParams\/} --- Describes the values of all configurable parameters of the TON Blockchain.
1862
\end{itemize}
1863

1864
\nxsubpoint\emb{Additional components present in masterchain blocks}
1865
In addition to the components listed in~\ptref{sp:shard.blk.comp}, each masterchain block must contain:
1866
\begin{itemize}
1867
\item {\em ShardHashes\/} --- Describes the current shard configuration, and contains the hashes of the latest blocks of the corresponding shardchains. (Notice that this component is also present in the masterchain state.)
1868
\end{itemize}
1869

1870
\nxsubpoint\label{sp:shard.hashes}\emb{Description of {\em ShardHashes}}
1871
{\em ShardHashes\/} is represented by a dictionary with 32-bit $\workchainid$s as keys, and ``shard binary trees'', represented by TL-B type {\em BinTree ShardDescr}, as values. Each leaf of this shard binary tree contains a value of type {\em ShardDescr}, which describes a single shard by indicating the sequence number {\tt seq\_no}, the logical time {\tt lt}, and the hash {\tt hash} of the latest (signed) block of the corresponding shardchain.
1872
\begin{verbatim}
1873
bt_leaf$0 {X:Type} leaf:X = BinTree X;
1874
bt_fork$1 {X:Type} left:^(BinTree X) right:^(BinTree X) 
1875
          = BinTree X;
1876

1877
fsm_none$0 = FutureSplitMerge;
1878
fsm_split$10 mc_seqno:uint32 = FutureSplitMerge;
1879
fsm_merge$11 mc_seqno:uint32 = FutureSplitMerge;
1880

1881
shard_descr$_ seq_no:uint32 lt:uint64 hash:uint256 
1882
  split_merge_at:FutureSplitMerge = ShardDescr;
1883

1884
_ (HashmapE 32 ^(BinTree ShardDescr)) = ShardHashes;
1885
\end{verbatim}
1886
Fields {\tt mc\_seqno} of {\tt fsm\_split} and {\tt fsm\_merge} are used to signal future shard merge or split events. Shardchain blocks referring to masterchain blocks with sequence numbers up to, but not including, the one indicated in {\tt mc\_seqno} are generated in the usual way. Once the indicated sequence number is reached, a shard merge or split event must occur.
1887

1888
Notice that the masterchain itself is omitted from {\em ShardHashes\/} (i.e., 32-bit index $-1$ is absent from this dictionary).
1889

1890
\nxsubpoint\emb{Description of {\em ShardFees}}
1891
{\em ShardFees\/} is a masterchain structure used to reflect the total fees collected so far by the validators of a shardchain. The total fees reflected in this structure are accumulated in the masterchain by crediting them to a special account, whose address is a configurable parameter. Typically this account is the smart contract that computes and distributes the rewards to all validators.
1892
\begin{verbatim}
1893
bta_leaf$0 {X:Type} {Y:Type} leaf:X extra:Y = BinTreeAug X Y;
1894
bta_fork$1 {X:Type} {Y:Type} left:^(BinTreeAug X Y) 
1895
           right:^(BinTreeAug X Y) extra:Y = BinTreeAug X Y;
1896

1897
_ (HashmapAugE 32 ^(BinTreeAug True CurrencyCollection)
1898
  CurrencyCollection) = ShardFees;
1899
\end{verbatim}
1900
The structure of {\em ShardFees\/} is similar to that of {\em ShardHashes\/} (cf.~\ptref{sp:shard.hashes}), but the dictionary and binary trees involved are augmented by currency values, equal to the {\tt total\_validator\_fees} values of the final states of the corresponding shardchain blocks. The value aggregated at the root of {\em ShardFees\/} is added together with the {\tt total\_validator\_fees} of the masterchain state, yielding the total TON Blockchain validator fees. The increase of the value aggregated at the root of {\em ShardFees\/} from the initial to the final state of a masterchain block is reflected in the {\tt fees\_imported} in the value flow of that masterchain block.
1901

1902
\nxsubpoint\emb{Description of {\em ConfigParams}}
1903
Recall that the {\em configurable parameters\/} or the {\em configuration dictionary\/} is a dictionary {\tt config} with 32-bit keys kept inside the first cell reference of the persistent data of the configuration smart contract $\gamma$ (cf.~\ptref{p:conf.params}). The address $\gamma$ of the configuration smart contract and a copy of the configuration dictionary are duplicated in fields {\tt config\_addr} and {\tt config} of a {\em ConfigParams\/} structure, explicitly included into masterchain state to facilitate access to the current values of the configurable parameters (cf.~\ptref{sp:conf.par.qa}):
1904
\begin{verbatim}
1905
_ config_addr:uint256 config:^(Hashmap 32 ^Cell) 
1906
  = ConfigParams;
1907
\end{verbatim}
1908

1909
\nxsubpoint\emb{Masterchain state data}
1910
The data specific to the masterchain state is collected into {\em McStateExtra}, already mentioned in~\ptref{sp:shard.state.tlb}:
1911
\begin{verbatim}
1912
masterchain_state_extra#cc1f
1913
  shard_hashes:ShardHashes
1914
  shard_fees:ShardFees
1915
  config:ConfigParams
1916
= McStateExtra;
1917
\end{verbatim}
1918

1919
\nxsubpoint\emb{Masterchain block data}
1920
Similarly, the data specific to the masterchain blocks is collected into {\em McBlockExtra\/}:
1921
\begin{verbatim}
1922
masterchain_block_extra#cc9f
1923
  shard_hashes:ShardHashes
1924
= McBlockExtra;
1925
\end{verbatim}
1926

1927
\mysubsection{Serialization of a bag of cells}
1928
The description provided in the previous section defines the way a shardchain block is represented as a tree of cells. However, this tree of cells needs to be serialized into a file, suitable for disk storage or network transfer. This section discusses the standard ways of serializing a tree, a DAG, or a bag of cells into an octet string.
1929

1930
\nxsubpoint\emb{Transforming a tree of cells into a bag of cells}
1931
Recall that values of arbitrary (dependent) algebraic data types are represented in the TON Blockchain by {\em trees of cells}. Such a tree of cells is transformed into a directed acyclic graph, or {\em DAG}, of cells, by identifying identical cells in the tree. After that, we might replace each of the references of each cell by the 32-byte representation hash of the cell referred to and obtain a {\em bag of cells}. By convention, the root of the original tree of cells is a marked element of the resulting bag of cells, so that anybody receiving this bag of cells and knowing the marked element can reconstruct the original DAG of cells, hence also the original tree of cells.
1932

1933
\nxsubpoint\label{sp:boc.complete}\emb{Complete bags of cells}
1934
Let us say that a bag of cells is {\em complete\/} if it contains all cells referred to by any of its cells. In other words, a complete bag of cells does not have any ``unresolved'' hash references to cells outside that bag of cells. In most cases, we need to serialize only complete bags of cells.
1935

1936
\nxsubpoint\emb{Internal references inside a bag of cells}
1937
Let us say that a reference of a cell $c$ belonging to a bag of cells $B$ is {\em internal (with respect to $B$)\/} if the cell $c_i$ referred to by this reference belongs to~$B$ as well. Otherwise, the reference is called {\em external}. A bag of cells is complete if and only if all references of its constituent cells are internal.
1938

1939
\nxsubpoint\emb{Assigning indices to the cells from a bag of cells}
1940
Let $c_0$, \dots, $c_{n-1}$ be the $n$ distinct cells belonging to a bag of cells~$B$. We can list these cells in some order, and then assign indices from $0$ to $n-1$, so that cell $c_i$ gets index $i$. Some options for ordering cells are:
1941
\begin{itemize}
1942
\item Order cells by their representation hash. Then $\Hash(c_i)<\Hash(c_j)$ whenever $i<j$.
1943
\item Topological order: if cell $c_i$ refers to cell $c_j$, then $i<j$. In general, there is more than one topological order for the same bag of cells. There are two standard ways for constructing topological orders:
1944
  \begin{itemize}
1945
  \item Depth-first order: apply a depth-first search to the directed acyclic graph of cells starting from its root (i.e., marked cell), and list cells in the order they are visited.
1946
  \item Breadth-first order: same as above, but applying a breadth-first search.
1947
  \end{itemize}
1948
\end{itemize}
1949
Notice that the topological order always assigns index $0$ to the root cell of a bag of cells constructed from a tree of cells. In most cases, we opt to use a topological order, or the depth-first order if we want to be more specific.
1950

1951
If cells are listed in a topological order, then the verification that there are no cyclic references in a bag of cells is immediate. On the other hand, ordering cells by their representation hash simplifies the verification that there are no duplicates in a serialized bag of cells.
1952

1953
\nxsubpoint\label{sp:outline.ser.boc}\emb{Outline of serialization process}
1954
The serialization process of a bag of cells $B$ consisting of $n$ cells can be outlined as follows:
1955
\begin{enumerate}
1956
\item List the cells from $B$ in a topological order: $c_0$, $c_1$, \dots, $c_{n-1}$. Then $c_0$ is the root cell of $B$.
1957
\item Choose an integer $s$, such that $n\leq 2^s$. Represent each cell $c_i$ by an integral number of octets in the standard way (cf.~\ptref{sp:data.cell.layout} or \cite[3.1.4]{TVM}), but using unsigned big-endian $s$-bit integer $j$ instead of hash $\Hash(c_j)$ to represent internal references to cell $c_j$ (cf.~\ptref{sp:boc.cell.repr} below).
1958
\item Concatenate the representations of cells $c_i$ thus obtained in the increasing order of~$i$.
1959
\item Optionally, an index can be constructed that consists of $n+1$ $t$-bit integer entries $L_0$, \dots, $L_n$, where $L_i$ is the total length (in octets) of the representations of cells $c_j$ with $j\leq i$, and integer $t\geq0$ is chosen so that $L_n\leq 2^t$.
1960
\item The serialization of the bag of cells now consists of a magic number indicating the precise format of the serialization, followed by integers $s\geq 0$, $t\geq0$, $n\leq 2^s$, an optional index consisting of $\lceil(n+1)t/8\rceil$ octets, and $L_n$ octets with the cell representations.
1961
\item An optional CRC32 may be appended to the serialization for integrity verification purposes.
1962
\end{enumerate}
1963
If an index is included, any cell $c_i$ in the serialized bag of cells may be easily accessed by its index $i$ without deserializing all other cells, or even without loading the entire serialized bag of cells in memory.
1964

1965
\nxsubpoint\label{sp:boc.cell.repr}\emb{Serialization of one cell from a bag of cells}
1966
More precisely, each individual cell $c=c_i$ is serialized as follows, provided $s$ is a multiple of eight (usually $s=8$, $16$, $24$, or $32$):
1967
\begin{enumerate}
1968
\item Two descriptor bytes $d_1$ and $d_2$ are computed similarly to \cite[3.1.4]{TVM} by setting $d_1=r+8s+16h+32l$ and $d_2=\lfloor b/8\rfloor+\lceil b/8\rfloor$, where:
1969
  \begin{itemize}
1970
  \item $0\leq r\leq 4$ is the number of cell references present in cell~$c$; if $c$ is absent from the bag of cells being serialized and is represented by its hashes only, then $r=7$.\footnote{Notice that these ``absent cells'' are different from the library reference and external reference cells, which are kinds of exotic cells (cf.~\cite[3.1.7]{TVM}). Absent cells, by contrast, are introduced only for the purpose of serializing incomplete bags of cells, and can never be processed by TVM.}
1971
  \item $0\leq b\leq 1023$ is the number of data bits in cell~$c$.
1972
  \item $0\leq l\leq 3$ is the level of cell $c$ (cf.~\cite[3.1.3]{TVM}).
1973
  \item $s=1$ for exotic cells and $s=0$ for ordinary cells.
1974
  \item $h=1$ if the cell's hashes are explicitly included into the serialization; otherwise, $h=0$. (When $r=7$, we must always have $h=1$.)
1975
  \end{itemize}
1976
For absent cells (i.e., external references), only $d_1$ is present, always equal to $23+32l$.
1977
\item Two bytes $d_1$ and $d_2$ (if $r<7$) or one byte $d_1$ (if $r=7$) begin the serialization of cell~$c$.
1978
\item If $h=1$, the serialization is continued by $l+1$ 32-byte higher hashes of $c$ (cf.~\cite[3.1.6]{TVM}): $\Hash_1(c)$, \dots, $\Hash_{l+1}(c)=\Hash_\infty(c)$.
1979
\item After that, $\lceil b/8\rceil$ data bytes are serialized, by splitting $b$ data bits into 8-bit groups and interpreting each group as a big-endian integer in the range $0\ldots255$. If $b$ is not divisible by $8$, then the data bits are first augmented by one binary \texttt{1} and up to six binary \texttt{0}, so as to make the number of data bits divisible by eight.\footnote{Notice that exotic cells (with $s=1$) always have $b\geq8$, with the cell type encoded in the first eight data bits (cf.~\cite[3.1.7]{TVM}).}
1980
\item Finally, $r$ cell references to cells $c_{j_1}$, \dots, $c_{j_r}$ are encoded by means of $r$ $s$-bit big-endian integers $j_1$, \dots, $j_r$.\footnote{If the bag of cells is not complete, some of these cell references may refer to cells $c'$ absent from the bag of cells. In that case, special ``absent cells'' with $r=7$ are included into the bag of cells and are assigned some indices~$j$. These indices are then used to represent references to absent cells.}
1981
\end{enumerate}
1982

1983
\nxsubpoint\label{sp:boc.ser.class}\emb{A classification of serialization schemes for bags of cells}
1984
A serialization scheme for a bag of cells must specify the following parameters:
1985
\begin{itemize}
1986
\item The 4-byte magic number prepended to the serialization.
1987
\item The number of bits $s$ used to represent cell indices. Usually $s$ is a multiple of eight (e.g., $8$, $16$, $24$, or $32$).
1988
\item The number of bits $t$ used to represent offsets of cell serializations (cf.~\ptref{sp:outline.ser.boc}). Usually $t$ is also a multiple of eight.
1989
\item A flag indicating whether an index with offsets $L_0$, \dots, $L_n$ of cell serializations is present. This flag may be combined with $t$ by setting $t=0$ when the index is absent.
1990
\item A flag indicating whether the CRC32-C of the whole serialization is appended to it for integrity verification purposes.
1991
\end{itemize}
1992

1993
\nxsubpoint\emb{Fields present in the serialization of a bag of cells}
1994
In addition to the values listed in~\ptref{sp:boc.ser.class}, fixed by the choice of a serialization scheme for bags of cells, the serialization of a specific bag of cells must specify the following parameters:
1995
\begin{itemize}
1996
\item The total number of cells $n$ present in the serialization.
1997
\item The number of ``root cells'' $k\leq n$ present in the serialization. The root cells themselves are $c_0$, \dots, $c_{k-1}$. All other cells present in the bag of cells are expected to be reachable by chains of references starting from the root cells.
1998
\item The number of ``absent cells'' $l\leq n-k$, which represent cells that are actually absent from this bag of cells, but are referred to from it. The absent cells themselves are represented by $c_{n-l}$, \dots, $c_{n-1}$, and only these cells may (and also must) have $r=7$. Complete bags of cells have $l=0$.
1999
\item The total length in bytes $L_n$ of the serialization of all cells. If the index is present, $L_n$ might not be stored explicitly since it can be recovered as the last entry of the index.
2000
\end{itemize}
2001

2002
\nxsubpoint\label{sp:boc.ser.sch}\emb{TL-B scheme for serializing bags of cells}
2003
Several TL-B constructors can be used to serialize bags of cells into octet (i.e., 8-bit byte) sequences. The only one that is currently used to serialize new bags of cell is
2004
\begin{verbatim}
2005
serialized_boc#b5ee9c72 has_idx:(## 1) has_crc32c:(## 1) 
2006
  has_cache_bits:(## 1) flags:(## 2) { flags = 0 }
2007
  size:(## 3) { size <= 4 }
2008
  off_bytes:(## 8) { off_bytes <= 8 } 
2009
  cells:(##(size * 8)) 
2010
  roots:(##(size * 8)) { roots >= 1 }
2011
  absent:(##(size * 8)) { roots + absent <= cells }
2012
  tot_cells_size:(##(off_bytes * 8))
2013
  root_list:(roots * ##(size * 8))
2014
  index:has_idx?(cells * ##(off_bytes * 8))
2015
  cell_data:(tot_cells_size * [ uint8 ])
2016
  crc32c:has_crc32c?uint32
2017
  = BagOfCells;
2018
\end{verbatim}
2019
Field {\tt cells} is $n$, {\tt roots} is $k$, {\tt absent} is $l$, and {\tt tot\_cells\_size} is $L_n$ (the total size of the serialization of all cells in bytes). If an index is present, parameters $s/8$ and $t/8$ are serialized separately as {\tt size} and {\tt off\_bytes}, respectively, and the flag {\tt has\_idx} is set. The index itself is contained in {\tt index}, present only if {\tt has\_idx} is set. The field {\tt root\_list} contains the (zero-based) indices of the root nodes of the bag of cells.
2020

2021
Two older constructors are still supported in the bag-of-cells deserialization functions:
2022
\begin{verbatim}
2023
serialized_boc_idx#68ff65f3 size:(## 8) { size <= 4 }
2024
  off_bytes:(## 8) { off_bytes <= 8 } 
2025
  cells:(##(size * 8)) 
2026
  roots:(##(size * 8)) { roots = 1 }
2027
  absent:(##(size * 8)) { roots + absent <= cells }
2028
  tot_cells_size:(##(off_bytes * 8))
2029
  index:(cells * ##(off_bytes * 8))
2030
  cell_data:(tot_cells_size * [ uint8 ])
2031
  = BagOfCells;
2032

2033
serialized_boc_idx_crc32c#acc3a728 size:(## 8) { size <= 4 }
2034
  off_bytes:(## 8) { off_bytes <= 8 } 
2035
  cells:(##(size * 8)) 
2036
  roots:(##(size * 8)) { roots = 1 }
2037
  absent:(##(size * 8)) { roots + absent <= cells }
2038
  tot_cells_size:(##(off_bytes * 8))
2039
  index:(cells * ##(off_bytes * 8))
2040
  cell_data:(tot_cells_size * [ uint8 ])
2041
  crc32c:uint32 = BagOfCells;
2042
\end{verbatim}
2043

2044
\nxsubpoint\emb{Storing compiled TVM code in files}
2045
Notice that the above procedure for serializing bags of cells may be used to serialize compiled smart contracts and other TVM code. One must define a TL-B constructor similar to the following:
2046
\begin{verbatim}
2047
compiled_smart_contract
2048
  compiled_at:uint32 code:^Cell data:^Cell
2049
  description:(Maybe ^TinyString)
2050
  _:^[ source_file:(Maybe ^TinyString)
2051
       compiler_version:(Maybe ^TinyString) ]
2052
  = CompiledSmartContract;
2053

2054
tiny_string#_ len:(#<= 126) str:(len * [ uint8 ]) = TinyString; 
2055
\end{verbatim}
2056
Then a compiled smart contract may be represented by a value of type {\em CompiledSmartContract}, transformed into a tree of cells and then into a bag of cells, and then serialized using one of the constructors listed in~\ptref{sp:boc.ser.sch}. The resulting octet string may be then written into a file with suffix {\tt .tvc} (``TVM smart contract''), and this file may be used to distribute the compiled smart contract, download it into a wallet application for deploying into the TON Blockchain, and so on.
2057

2058
\nxsubpoint\label{sp:merkle.octet.str}\emb{Merkle hashes for an octet string}
2059
On some occasions, we must define a Merkle hash $\Hash_M(s)$ of an arbitrary octet string $s$ of length $|s|$. We do this as follows:
2060
\begin{itemize}
2061
\item If $|s|\leq 256$ octets, then the Merkle hash of $s$ is just its $\Sha$:
2062
\begin{equation}
2063
  \Hash_M(s):=\Sha(s)\quad\text{if $|s|\leq 256$.}
2064
\end{equation}
2065
\item If $|s|>256$, let $n=2^k$ be the largest power of two less than $|s|$ (i.e., $k:=\lfloor\log_2(|s|-1)\rfloor$, $n:=2^k$). If $s'$ is the prefix of $s$ of length $n$, and $s''$ is the suffix of $s$ of length $|s|-n$, so that $s$ is the concatenation $s'.s''$ of $s'$ and $s''$, we define
2066
\begin{equation}
2067
  \Hash_M(s):=\Sha\bigl(\Int_{64}(|s|).\Hash_M(s').\Hash_M(s'')\bigr)
2068
\end{equation}
2069
In other words, we concatenate the 64-bit big-endian representation of $|s|$ and the recursively computed Merkle hashes of $s'$ and $s''$, and compute $\Sha$ of the resulting string.
2070
\end{itemize}
2071
One can check that $\Hash_M(s)=\Hash_M(t)$ for octet strings $s$ and $t$ of length less than $2^{64}-2^{56}$ implies $s=t$ unless a hash collision for $\Sha$ has been found.
2072

2073
\nxsubpoint\label{sp:blk.ser.hash}\emb{The serialization hash of a block}
2074
The construction of~\ptref{sp:merkle.octet.str} is applied in particular to the serialization of the bag of cells representing an unsigned shardchain or masterchain block. The validators sign not only the representation hash of the unsigned block, but also the ``serialization hash'' of the unsigned block, defined as $\Hash_M$ of the serialization of the unsigned block. In this way, the validators certify that this octet string is indeed a serialization of the corresponding block.
2075

2076
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2077
%
2078
%
2079
%                  bibliography
2080
%
2081
%
2082
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2083

2084
\clearpage
2085
\markbothsame{\textsc{References}}
2086

2087
\begin{thebibliography}{2}
2088

2089
\bibitem{Curve25519}
2090
  {\sc Daniel J.~Bernstein}, {\sl Curve25519: New Diffie--Hellman Speed Records} (2006), in: M.~Yung, Ye.~Dodis, A.~Kiayas et al, {\it Public Key Cryptography}, Lecture Notes in Computer Science {\bf 3958}, pp.~207--228. Available at \url{https://cr.yp.to/ecdh/curve25519-20060209.pdf}.
2091

2092
\bibitem{Ed25519}
2093
  {\sc Daniel J.~Bernstein, Niels Duif, Tanja Lange et al.}, {\sl High-speed high-security signatures} (2012), {\it Journal of Cryptographic Engineering} {\bf 2} (2), pp.~77--89. Available at \url{https://ed25519.cr.yp.to/ed25519-20110926.pdf}.
2094

2095
\bibitem{TON}
2096
  {\sc N.~Durov}, {\sl Telegram Open Network}, 2017.
2097

2098
\bibitem{TVM}
2099
  {\sc N.~Durov}, {\sl Telegram Open Network Virtual Machine}, 2018.
2100

2101
\end{thebibliography}
2102

2103
\clearpage
2104
\appendix\mysection{Elliptic curve cryptography}\label{app:ecc}
2105
This appendix contains a formal description of the elliptic curve cryptography currently used in TON, particularly in the TON Blockchain and the TON Network.
2106

2107
TON uses two forms of elliptic curve cryptography: Ed25519 is used for cryptographic Schnorr signatures, while Curve25519 is used for asymmetric cryptography. These curves are used in the standard way (as defined in the original articles \cite{Curve25519} and \cite{Ed25519} by D.~Bernstein and RFCs 7748 and 8032); however, some serialization details specific to TON must be explained. One unique adaptation of these curves for TON is that TON supports automatic conversion of Ed25519 keys into Curve25519 keys, so that the same keys can be used for signatures and for asymmetric cryptography.
2108

2109
\mysubsection{Elliptic curves}
2110
Some general facts on elliptic curves over finite fields, relevant for elliptic curve cryptography, are collected in this section.
2111

2112
\nxsubpoint\emb{Finite fields}
2113
We consider elliptic curves over finite fields. For the purposes of the Curve25519 and Ed25519 algorithms, we will be mostly concerned with elliptic curves over the finite prime field $k:=\bbF_p$ of residues modulo~$p$, where $p=2^{255}-19$ is a prime number, and over finite extensions $\bbF_q$ of $\bbF_p$, especially the quadratic extension $\bbF_{p^2}$.\footnote{Arithmetic modulo $p$ for a modulus $p$ near a power of two can be implemented very efficiently. On the other hand, residues modulo $2^{255}-19$ can be represented by 255-bit integers. This is the reason this particular value of $p$ has been chosen by D.~Bernstein.}
2114

2115
\nxsubpoint\emb{Elliptic curves}
2116
An {\em elliptic curve\/} $E=(E,O)$ over a field $k$ is a geometrically integral smooth projective curve $E/k$ of genus $g=1$, along with a marked $k$-rational point $O\in E(k)$. It is well-known that an elliptic curve $E$ over a field $k$ can be represented in (generalized) Weierstrass form:
2117
\begin{equation}\label{eq:gen.weier.form}
2118
  y^2+a_1xy+a_3y=x^3+a_2x^2+a_4x+a_6\quad\text{for some $a_1$, \dots, $a_6\in k$.}
2119
\end{equation}
2120
More precisely, this is only the affine part of the elliptic curve, written in coordinates $(x,y)$. For any field extension $K$ of $k$, $E(K)$ consists of all solutions $(x,y)\in K^2$ of equation~\eqref{eq:gen.weier.form}, called {\em finite points of $E(K)$}, along with a point at infinity, which is the marked point $O$.
2121

2122
\nxsubpoint\emb{Weierstrass form in homogeneous coordinates}
2123
In homogeneous coordinates $[X:Y:Z]$, \eqref{eq:gen.weier.form} corresponds to
2124
\begin{equation}\label{eq:h.gen.weier.form}
2125
  Y^2Z+a_1XYZ+a_3YZ^2=X^3+a_2X^2Z+a_4XZ^2+a_6Z^3
2126
\end{equation}
2127
When $Z\neq0$, we can set $x:=X/Z$, $y:=Y/Z$, and obtain a solution $(x,y)$ of \eqref{eq:gen.weier.form} (i.e., a finite point of $E$). On the other hand, the only solution (up to proportionality) of \eqref{eq:h.gen.weier.form} with $Z=0$ is $[0:1:0]$; this is the point at infinity~$O$.
2128

2129
\nxsubpoint\emb{Standard Weierstrass form}
2130
When the characteristic $\charact k$ of field $k$ is $\neq2$, $3$, the Weierstrass form of \eqref{eq:gen.weier.form} or \eqref{eq:h.gen.weier.form} can be simplified with the aid of linear transformations $y':=y+a_1x/2+a_3/2$, $x':=x+a_2/3$, thus making $a_1=a_3=a_2=0$ and obtaining
2131
\begin{equation}\label{eq:weier.form}
2132
  y^2=x^3+a_4x+a_6
2133
\end{equation}
2134
and
2135
\begin{equation}\label{eq:h.weier.form}
2136
  Y^2Z=X^3+a_4XZ^2+a_6Z^3
2137
\end{equation}
2138
Such an equation defines an elliptic curve if and only if the cubic polynomial $P(x):=x^3+a_4x+a_6$ has no multiple roots, i.e., if the discriminant $D:=-4a_4^3-27a_6^2$ is non-zero.
2139

2140
\nxsubpoint\emb{Addition of points on elliptic curve~$E$}
2141
Let $K$ be a field extension of field~$k$, and let $E=(E,O)$ be any elliptic curve in Weierstrass form defined over~$k$. Then any line $l\subset\bbP^2_K$ intersects the elliptic curve $E_{(K)}$ (which is the base change of curve $E$ to field $K$, i.e., the curve defined by the same equations over a larger field $K$) at exactly three points $P$, $Q$, $R$, considered with multiplicities. We define the {\em addition of points\/} on elliptic curve $E$ (or rather the addition of its $K$-valued points~$E(K)$) by requiring that
2142
\begin{equation}
2143
  P+Q+R=O\quad\text{whenever $\{P,Q,R\}=l\cap E$ for some line $l\subset\bbP^2_K$.}
2144
\end{equation}
2145
It is well-known that this requirement defines a unique commutative law $[+]:E\times_k E\to E$ on the points of the elliptic curve $E$, having $O$ as its neutral element. When elliptic curve~$E$ is represented by its Weierstrass form~\eqref{eq:gen.weier.form}, one can write explicit formulas expressing the coordinates $x_{P+Q}$, $y_{P+Q}$ of the sum $P+Q$ of two $K$-valued points $P$, $Q\in E(K)$ of elliptic curve $E$ as rational functions of the coordinates $x_P$, $y_P$, $x_Q$, $y_Q\in K$ of points $P$ and $Q$ and of the coefficients $a_i\in k$ of~\eqref{eq:gen.weier.form}.
2146

2147
\nxsubpoint\emb{Power maps}
2148
Since $E(K)$ is an abelian group, one can define {\em multiples\/} or {\em powers\/} $[n]X$ for any point $X\in E(K)$ and any integer $n\in\bbZ$. If $n=0$, then $[0]X=O$; if $n>0$, then $[n]X=[n-1]X+X$; if $n<0$, then $[n]X=-[-n]X$. The map $[n]=[n]_E:E\to E$ for $n\neq0$ is an {\em isogeny}, meaning that it is a non-constant homomorphism for the group law of $E$:
2149
\begin{equation}
2150
  [n](P+Q)=[n]P+[n]Q\quad\text{for any $P$, $Q\in E(K)$ and $n\in\bbZ$.}
2151
\end{equation}
2152
In particular, $[-1]_E:E\to E$, $P\mapsto -P$, is an involutive automorphism of elliptic curve~$E$. If $E$ is in Weierstrass form, $[-1]_E$ maps $(x,y)\mapsto(x,-y)$, and two points $P$, $Q\in E(\bbF_q)$ have equal $x$-coordinates if and only if $Q=\pm P$.
2153

2154
\nxsubpoint\emb{The order of the group of rational points of~$E$}
2155
Let $E$ be an elliptic curve defined over a finite base field $k$, and let $K=\bbF_q$ be a finite extension of~$k$. Then $E(\bbF_q)$ is a finite abelian group. By a well-known result of Hasse, the order $n$ of this group is not too distant from $q$:
2156
\begin{equation}\label{eq:weil.ec}
2157
  n=|E(\bbF_q)|=q-t+1\quad\text{where $t^2\leq 4q$, i.e., $|t|\leq2\sqrt{q}$.}
2158
\end{equation}
2159
We will be mostly interested in the case $K=k=\bbF_p$, with $q=p$ a prime number.
2160

2161
\nxsubpoint\emb{Cyclic subgroups of large prime order}
2162
Elliptic curve cryptography is usually performed using elliptic curves that admit a (necessarily cyclic) subgroup $C\subset E(\bbF_q)$ of prime order $\ell$. Equivalently, a rational point $G\in E(\bbF_q)$ of prime order $\ell$ may be given; then $C$ can be recovered as the cyclic subgroup $\langle G\rangle$ generated by~$G$. In order to verify that a point $G\in E(\bbF_q)$ generates a cyclic group of prime order~$\ell$, one can check that $G\neq O$, but $[\ell]G=O$.
2163

2164
By the Legendre theorem, $\ell$ is necessarily a divisor of the order $n=|E(\bbF_q)|$ of the finite abelian group~$E(\bbF_q)$:
2165
\begin{equation}
2166
  n=|E(\bbF_q)|=c\ell\quad\text{for some integer $c\geq1$}
2167
\end{equation}
2168
The integer $c$ is called the {\em cofactor}; one usually wants the cofactor to be as small as possible, so as to make $\ell=n/c$ as large as possible. Recall that $n$ always has the same order of magnitude as $q$ by~\eqref{eq:weil.ec}, so it cannot be changed much by varying $E$ once $q$ is fixed.
2169

2170
\nxsubpoint\label{sp:ecc.data}\emb{Data for elliptic curve cryptography}
2171
In order to define specific elliptic curve cryptography, one must fix
2172
a finite base field $\bbF_q$ (if $q=p$ is a prime, it is sufficient to fix prime~$p$), an elliptic curve $E/\bbF_q$ (usually represented by the coefficients of its Weierstrass form \eqref{eq:weier.form} or~\eqref{eq:gen.weier.form}), the base point $O$ (which usually is the point at infinity of an elliptic curve written in Weierstrass form), and the generator $G\in E(\bbF_q)$ (usually determined by its coordinates $(x,y)$ with respect to the equation of the elliptic curve) of a cyclic subgroup of large prime order~$\ell$. Prime number $\ell$ and the cofactor $c$ are usually also part of the elliptic cryptography data.
2173

2174
\nxsubpoint\emb{Main operations of elliptic curve cryptography}
2175
Elliptic curve cryptography usually deals with a fixed cyclic subgroup $C$ of a large prime order $\ell$ inside the group of points of an elliptic curve $E$ over a finite field~$\bbF_q$. A generator $G$ of $C$ is usually fixed. It is usually assumed that, given a point $X$ of $C$, one cannot find its ``discrete logarithm base $G$'' (i.e., a residue $n$ modulo $\ell$ such that $X=[n]G$) faster than in $O(\sqrt\ell)$ operations. The most important operations used in elliptic curve cryptography are the addition of points from $C\subset E(\bbF_q)$ and the computation of their powers, or multiples. 
2176

2177
\nxsubpoint\emb{Private and public keys for elliptic curve cryptography}
2178
Usually a private key for elliptic curve cryptography described by the data listed in~\ptref{sp:ecc.data} is a ``random'' integer $0<a<\ell$, called the {\em secret exponent}, and the corresponding public key is the point $A:=[a]G$ (or just its $x$-coordinate $x_A$), suitably serialized. 
2179

2180
\nxsubpoint\emb{Montgomery curves}
2181
Elliptic curves with the specific Weierstrass equation
2182
\begin{equation}
2183
  y^2=x^3+Ax^2+x\quad\text{where $A=4a-2$ for some $a\in k$, $a\neq0$, $a\neq1$}
2184
\end{equation}
2185
are called {\em Montgomery curves}. They have the convenient property that $x_{P+Q}x_{P-Q}$ can be expressed as a simple rational function of $x_P$ and $x_Q$:
2186
\begin{equation}
2187
  x_{P+Q}x_{P-Q}=\left(\frac{x_Px_Q-A}{x_P-x_Q}\right)^2
2188
\end{equation}
2189
This means that $x_{P+Q}$ can be computed provided $x_{P-Q}$, $x_P$, and $x_Q$ are known. In particular, if $x_P$, $x_{[n]P}$, and $x_{[n+1]P}$ are known, then $x_{[2n]P}$, $x_{[2n+1]P}$, and $x_{[2n+2]P}$ can be computed. Using the binary representation of $0<n<2^s$, one can compute $x_{[\lfloor n/2^{s-i}\rfloor]P}$, $x_{[\lfloor n/2^{s-i}\rfloor+1]P}$ for $i=0$, $1$, \dots, $s$, thus obtaining $x_{[n]P}$ (this algorithm for quickly computing $x_{[n]P}$ starting from $x_P$ on Montgomery curves is called a {\em Montgomery ladder}). Hence we see the importance of Montgomery curves for elliptic curve cryptography.
2190

2191
\mysubsection{Curve25519 cryptography}
2192
This section describes the well-known Curve25519 cryptography proposed by Daniel Bernstein \cite{Curve25519} and its usage in TON.
2193

2194
\nxsubpoint\emb{Curve25519}
2195
{\em Curve25519\/} is defined as the Montgomery curve
2196
\begin{equation}\label{eq:montg.curve}
2197
  y^2=x^3+Ax^2+x\quad\text{over $\bbF_p$, where $p=2^{255}-19$ and $A=486662$.}
2198
\end{equation}
2199
The order of this curve is $8\ell$, where $\ell$ is a prime number, and $c=8$ is the cofactor. The cyclic subgroup of order $\ell$ is generated by a point $G$ with $x_G=9$ (this determines $G$ up to the sign of $y_G$, which is unimportant). The order of the quadratic twist $2y^2=x^3+Ax^2+x$ of Curve25519 is $4\ell'$ for another prime number $\ell'$.\footnote{Actually, D.~Bernstein chose $A=486662$ because it is the smallest positive integer $A\equiv 2\pmod 4$ such that both the corresponding Montgomery curve \eqref{eq:montg.curve} over $\bbF_p$ for $p=2^{255}-19$ and the quadratic twist of this curve have small cofactors. Such an arrangement avoids the necessity to check whether an $x$-coordinate $x_P\in\bbF_p$ of a point $P$ defines a point $(x_P,y_P)\in\bbF_p^2$ lying on the Montgomery curve itself or on its quadratic twist.}
2200

2201
\nxsubpoint\label{sp:curve25519.param}\emb{Parameters of Curve25519}
2202
The parameters of Curve25519 are as follows:
2203
\begin{itemize}
2204
\item Base field: Prime finite field $\bbF_p$ for $p=2^{255}-19$.
2205
\item Equation: $y^2=x^3+Ax^2+x$ for $A=486662$.
2206
\item Base point $G$: Characterized by $x_G=9$ (nine is the smallest positive integer $x$-coordinate of a generator of the subgroup of large prime order of $E(\bbF_p)$).
2207
\item Order of $E(\bbF_p)$:
2208
\begin{align}\label{eq:curve25519.ell}
2209
  |E(\bbF_p)|=&\,p-t+1=8\ell,\quad\text{where}\\
2210
  \ell=&\,2^{252}+27742317777372353535851937790883648493\quad\text{is prime.}
2211
\end{align}
2212
\item Order of $\tilde E(\bbF_p)$, where $\tilde E$ is the quadratic twist of~$E$:
2213
\begin{align}
2214
  |\tilde E(\bbF_p)|=&\,p+t+1=2p+2-8\ell=4\ell',\quad\text{where}\\
2215
  \ell'=&\,2^{253}-55484635554744707071703875581767296995\quad\text{is prime.}
2216
\end{align}
2217
\end{itemize}
2218

2219
\nxsubpoint\label{sp:std.curve25519}\emb{Private and public keys for standard Curve25519 cryptography}
2220
A private key for Curve25519 cryptography is usually defined as a {\em secret exponent\/} $a$, while the corresponding public key is $x_A$, the $x$-coordinate of point $A:=[a]G$. This is usually sufficient for performing ECDH (elliptic curve Diffie--Hellman key exchange) and asymmetric elliptic curve cryptography, as follows:
2221

2222
If a party wants to send a message $M$ to another party, which has public key $x_A$ (and private key $a$), the following computations are performed. A one-time random secret exponent $b$ is generated, and $x_B:=x_{[b]G}$ and $x_{[b]A}$ are computed using a Montgomery ladder. After that, the message $M$ is encrypted by a symmetric cypher such as AES using the 256-bit ``shared secret'' $S:=x_{[b]A}$ as a key, and 256-bit integer (``one-time public key'') $x_B$ is prepended to the encrypted message. Once the party with public key $x_A$ receives this message, it can compute $x_{[a]B}$ starting from $x_B$ (transmitted with the encrypted message) and the private key $a$. Since $x_{[a]B}=x_{[ab]G}=x_{[b]A}=S$, the receiving party recovers the shared secret $S$ and is able to decrypt the remainder of the message.
2223

2224
\nxsubpoint\label{sp:ton.curve25519}\emb{Public and private keys for TON Curve25519 cryptography}
2225
TON uses another form for public and private keys of Curve25519 cryptography, borrowed from Ed25519 cryptography.
2226

2227
A private key for TON Curve25519 cryptography is just a random 256-bit string $k$. It is used by computing $\SHA{512}(k)$, taking the first 256 bits of the result, interpreting them as a little-endian 256-bit integer $a$, clearing bits $0$, $1$, $2$, and $255$ of $a$, and setting bit $254$ so as to obtain a value $2^{254}\leq a<2^{255}$, divisible by eight. The value $a$ thus obtained is the {\em secret exponent\/} corresponding to $k$; meanwhile, the remaining 256 bits of $\SHA{512}(k)$ constitute the {\em secret salt\/} $k''$.
2228

2229
The public key corresponding to $k$---or to the secret exponent $a$---is just the $x$-coordinate $x_A$ of the point $A:=[a]G$. Once $a$ and $x_A$ are computed, they are used in exactly the same way as in~\ptref{sp:std.curve25519}. In particular, if $x_A$ needs to be serialized, it is serialized into 32 octets as an unsigned little-endian 256-bit integer.
2230

2231
\nxsubpoint\emb{Curve25519 is used in the TON Network}
2232
Notice that the asymmetric Curve25519 cryptography described in~\ptref{sp:ton.curve25519} is extensively used by the TON Network, especially the ADNL (Abstract Datagram Network Layer) protocol. However, TON Blockchain needs elliptic curve cryptography mostly for signatures. For this purpose, Ed25519 signatures described in the next section are used.
2233

2234
\mysubsection{Ed25519 cryptography}\label{p:ed25519}
2235
Ed25519 cryptography is extensively used for fast cryptographic signatures by both the TON Blockchain and the TON Network. This section describes the variant of Ed25519 cryptography used by TON. An important difference from the standard approaches (as defined by D.~Bernstein et al.\ in~\cite{Ed25519}) is that TON provides automatic conversion of private and public Ed25519 keys into Curve25519 keys, so that the same keys could be used both for encrypting/decrypting and for signing messages.
2236

2237
\nxsubpoint\emb{Twisted Edwards curves}
2238
A {\em twisted Edwards curve\/} $E_{a,d}$ with parameters $a\neq 0$ and $d\neq0,a$ over a field~$k$ is given by equation
2239
\begin{equation}
2240
  E_{a,d}:\, ax^2+y^2=1+dx^2y^2\quad\text{over $k$}
2241
\end{equation}
2242
If $a=1$, this equation defines an (untwisted) Edwards curve. Point $O(0,1)$ is usually chosen as the marked point of $E_{a,d}$.
2243

2244
\nxsubpoint\emb{Twisted Edwards curves are birationally equivalent to Montgomery curves}
2245
A twisted Edwards curve $E_{a,d}$ is birationally equivalent to a Montgomery elliptic curve
2246
\begin{equation}
2247
  M_A:\, v^2=u^3+Au^2+u
2248
\end{equation}
2249
where $A=2(a+d)/(a-d)$ and $d/a=(A-2)/(A+2)$. The birational equivalence $\phi:E_{a,d}\dashrightarrow M_A$ and its inverse $\phi^{-1}$ are given by
2250
\begin{equation}\label{eq:phi.dir}
2251
  \phi:(x,y)\mapsto\left(\frac{1+y}{1-y},\frac{c(1+y)}{x(1-y)}\right)
2252
\end{equation}
2253
and
2254
\begin{equation}\label{eq:phi.inv}
2255
  \phi^{-1}:(u,v)\mapsto\left(\frac{cu}{v},\frac{u-1}{u+1}\right)
2256
\end{equation}
2257
where
2258
\begin{equation}
2259
  c=\sqrt{\frac{A+2}{a}}
2260
\end{equation}
2261
Notice that $\phi$ transforms the marked point $O(0,1)$ of $E_{a,d}$ into the marked point of $M_A$ (i.e., its point at infinity).
2262

2263
\nxsubpoint\emb{Addition of points on a twisted Edwards curve}
2264
Since $E_{a,d}$ is birationally equivalent to an elliptic curve $M_A$, the addition of points on $M_A$ can be transferred to $E_{a,d}$ by setting
2265
\begin{equation}
2266
  P+Q:=\phi^{-1}\bigl(\phi(P)+\phi(Q)\bigr)\quad\text{for any $P$, $Q\in E_{a,d}(k)$.}
2267
\end{equation}
2268
Notice that the marked point $O(0,1)$ is the neutral element with respect to this addition, and that $-(x_P,y_P)=(-x_P,y_P)$.
2269

2270
\nxsubpoint\emb{Formulas for adding points on a twisted Edwards curve}
2271
The coordinates $x_{P+Q}$ and $y_{P+Q}$ admit simple expressions as rational functions of $x_P$, $y_P$, $x_Q$, $y_Q$:
2272
\begin{align}\label{eq:tw.edw.point.add}
2273
  x_{P+Q}=&\frac{x_Py_Q+x_Qy_P}{1+dx_Px_Qy_Py_Q}\\
2274
  \label{eq:tw.edw.point.add.y}
2275
  y_{P+Q}=&\frac{y_Py_Q-ax_Px_Q}{1-dx_Px_Qy_Py_Q}
2276
\end{align}
2277
These expressions can be efficiently computed, especially if $a=-1$. This is the reason twisted Edwards curves are important for fast elliptic curve cryptography.
2278

2279
\nxsubpoint\emb{Ed25519 twisted Edwards curve}
2280
Ed25519 is the twisted Edwards curve $E_{-1,d}$ over $\bbF_p$, where $p=2^{255}-19$ is the same prime number as that used for Curve25519, and $d=-(A-2)/(A+2)=-121665/121666$, where $A=486662$ is the same as in the equation~\eqref{eq:montg.curve}:
2281
\begin{equation}
2282
  -x^2+y^2=1-\frac{121665}{121666}x^2y^2\quad\text{for $x$, $y\in\bbF_p$, $p=2^{255}-19$.}
2283
\end{equation}
2284
In this way, Ed25519-curve $E_{-1,d}$ is birationally equivalent to Curve25519 \eqref{eq:montg.curve}, and one can use $E_{-1,d}$ and formulas~\eqref{eq:tw.edw.point.add}--\eqref{eq:tw.edw.point.add.y} for point addition on either Ed25519 or Curve25519, using \eqref{eq:phi.dir} and~\eqref{eq:phi.inv} to convert points on Ed25519 into corresponding points on Curve25519, and vice versa.
2285

2286
\nxsubpoint\emb{Generator of Ed25519}
2287
The generator of Ed25519 is the point $G'$ with $y(G')=4/5$ and $0\leq x(G')<p$ even. According to \eqref{eq:phi.dir}, it corresponds to the point $(u,v)$ of Curve25519 with $u=(1+4/5)/(1-4/5)=9$ (i.e., to the generator $G$ of Curve25519 chosen in~\ptref{sp:curve25519.param}). In particular, $G=\phi(G')$, $G'$ generates a cyclic subgroup of the same large prime order $\ell$ given in~\eqref{eq:curve25519.ell}, and for any integer $a$,
2288
\begin{equation}\label{eq:phi.power.iso}
2289
  \phi([a]G')=[a]G\quad.
2290
\end{equation}
2291
In this way, we can perform computations with Curve25519 and its generator $G$, or with Ed25519 and generator $G'$, and obtain essentially the same results.
2292

2293
\nxsubpoint\label{sp:ed25519.std.pt}\emb{Standard representation of points on Ed25519}
2294
A point $P(x,y)$ on Ed25519 may be represented by its two coordinates $x_P$ and $y_P$, residues modulo $p=2^{255}-19$. In their turn, both these coordinates may be represented by unsigned 255- or 256-bit integers $0\leq x_P,y_P<p<2^{255}$.
2295

2296
However, a more compact representation of $P$ by one little-endian unsigned 256-bit integer $\tilde P$ is commonly used (and is used by TON as well). Namely, the 255 lower-order bits of $\tilde P$ contain $y_P$, $0\leq y_P<p<2^{255}$, and bit 255 is used to store $x_P\bmod 2$, the lower-order bit of $x_P$. Since $y_P$ always determines $x_P$ up to sign (i.e., up to replacing $x_P$ with $p-x_P$), $x_P$ and $p-x_P$ can always be distinguished by their lower-order bit, $p$ being odd.
2297

2298
If it is sufficient to know $\pm P$ up to sign, one can ignore $x_P\bmod 2$ and consider only the little-endian 255-bit integer $y_P$, setting the bit 255 arbitrarily, ignoring its previously defined value, or clearing it.
2299

2300
\nxsubpoint\label{sp:ed25519.priv.key}\emb{Private key for Ed25519}
2301
A {\em private key\/} for Ed25519 is just an arbitrary 256-bit string $k$. A {\em secret exponent\/} $a$ and {\em secret salt\/} $k''$ are derived from $k$ by first computing $\SHA{512}(k)$, and then taking the first 256 bits of this $\SHA{512}$ as the little-endian representation of $a$ (but with bits 255, 2, 1, and 0 cleared, and bit 254 set); the last 256 bits of $\SHA{512}(k)$ then constitute $k''$.
2302

2303
This is essentially the same procedure as described in~\ptref{sp:ton.curve25519}, but with Curve25519 replaced by the birationally equivalent curve Ed25519. (In fact, it is the other way around: this procedure is standard for Ed25519-based elliptic curve cryptography, and TON extends the procedure to Curve25519.)
2304

2305
\nxsubpoint\emb{Public key for Ed25519}
2306
A {\em public key\/} corresponding to a private key $k$ for Ed25519 is the standard representation (cf.~\ptref{sp:ed25519.std.pt}) of the point $A=[a]G'$, where $a$ is the secret exponent (cf.~\ptref{sp:ed25519.priv.key}) defined by the private key~$k$.
2307

2308
Notice that $\phi(A)$ is the public key for Curve25519 defined by the same private key $k$ according to~\ptref{sp:ton.curve25519} and~\eqref{eq:phi.power.iso}. In this way, we can convert public keys for Ed25519 into corresponding public keys for Curve25519, and vice versa. Private keys do not need to be transformed at all.
2309

2310
\nxsubpoint\emb{Cryptographic Ed25519-signatures}
2311
If a message (octet string) $M$ needs to be signed by a private key $k$ defining secret exponent $a$ and secret salt $k''$, the following computations are performed:
2312
\begin{itemize}
2313
\item $r:=\SHA{512}(k''|M)$, interpreted as a little-endian 512-bit integer. Here $s|t$ denotes the concatenation of octet strings $s$ and~$t$.
2314
\item $R:=[r]G'$ is a point on Ed25519.
2315
\item $\tilde R$ is the standard representation (cf.~\ptref{sp:ed25519.std.pt}) of point $R$ as a 32-octet string.
2316
\item $s:=r+a\cdot\SHA{512}(\tilde R|\tilde A|M)\bmod\ell$, encoded as a little-endian 256-bit integer. Here $\tilde A$ is the standard representation of point $A=[a]G'$, the public key corresponding to $k$.
2317
\end{itemize}
2318
The (Schnorr) signature is a 64-octet string $(\tilde R,s)$, consisting of the standard representation of the point~$R$ and of the 256-bit integer $s$.
2319

2320
\nxsubpoint\emb{Checking Ed25519-signatures}
2321
In order to verify signature $(\tilde R,s)$ of a message~$M$, supposedly made by the owner of the private key $k$ corresponding to a known public key $A$, the following steps are performed:
2322
\begin{itemize}
2323
\item Points $[s]G'$ and $R+[\SHA{512}(\tilde R|\tilde A|M)]A$ of Ed25519 are computed.
2324
\item If these two points coincide, the signature is correct.
2325
\end{itemize}
2326

2327
\end{document}
2328

Использование cookies

Мы используем файлы cookie в соответствии с Политикой конфиденциальности и Политикой использования cookies.

Нажимая кнопку «Принимаю», Вы даете АО «СберТех» согласие на обработку Ваших персональных данных в целях совершенствования нашего веб-сайта и Сервиса GitVerse, а также повышения удобства их использования.

Запретить использование cookies Вы можете самостоятельно в настройках Вашего браузера.