A Dive into Fluidex's Architecture
July 15, 2021
The cryptography underlying zero knowledge proofs has undergone a Moore’s Law-like trajectory over the last few years, and it shows no sign of slowing down.
ZK-Rollup, with its strong security and decentralization properties, is widely regarded as the most important Layer 2 scaling solution in the long term. However, these properties come at the cost of technical difficulty, in both cryptography and engineering. It is no wonder that only a few relevant devtools or user-facing products exist. As one of the few teams developing a ZK-Rollup system from scratch instead of forking, Fluidex has decided to share some of our experience and outcomes with the industry, to help grow the ZK-Rollup ecosystem.
Before moving on, we recommend that readers check out the article “ZK-Rollup development experience sharing, Part I”, in which we discuss how to develop and optimize a ZK-Rollup. As the second part of this “development experience sharing” series, this article focuses on our recently open-sourced back-end architecture, aiming to guide more developers into the ZK-Rollup ecosystem.
The diagram below shows the overall architecture of Fluidex’s back-end. In a nutshell, users send order requests to the matching engine, and the matching engine sends all finished orders to the message queue. The rollup module then updates the state (users’ orders, users’ balances…) in the Merkle tree and packs the messages (after some format conversions) into L2 blocks. After the L2 blocks are proved by our prover cluster, they are published on-chain.
We first introduce the functionality and responsibilities of each submodule, and then summarize the design principles of our system.
The gateway accepts order requests from the front-end or from quant trading bots and routes them to different microservices. It will also push up-to-date internal market K-line and orderbook information to ticker subscribers [1] in the desired format. We chose Envoy for our gateway because of its excellent performance and configuration flexibility. In addition, Fluidex uses gRPC extensively, including both unary RPC and bidirectional streaming RPC, and Envoy’s strong gRPC support fulfills our requirements.
dingir exchange is a high-performance exchange matching engine. It stores and matches user orders in RAM in real time. The orderbook requires both key-value queries (for order details) and in-order traversal (for order matching), so it needs an ordered associative array such as an AVL tree or a skip list. We use BTreeMap for our orderbook because, in addition to meeting these requirements, a B-tree stores multiple keys per node contiguously and therefore benefits from modern CPUs’ cache architecture.
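To make the two access patterns concrete, here is a minimal sketch of one side of an orderbook built on Rust's standard `BTreeMap`. It is not taken from dingir exchange: the `Order` and `AskBook` types, the `(price, order id)` key, and the integer price ticks are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical order type; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Order {
    pub id: u64,
    pub price: u64,  // price in integer ticks (assumption)
    pub amount: u64, // remaining unfilled amount
}

/// Ask side of a toy orderbook, keyed by (price, order id) so that
/// iteration visits the cheapest order first, then the lowest id.
#[derive(Default)]
pub struct AskBook {
    orders: BTreeMap<(u64, u64), Order>,
}

impl AskBook {
    pub fn insert(&mut self, order: Order) {
        self.orders.insert((order.price, order.id), order);
    }

    /// Key-value query for the details of a single order.
    pub fn get(&self, price: u64, id: u64) -> Option<&Order> {
        self.orders.get(&(price, id))
    }

    /// Matches an incoming bid against asks in ascending price order.
    /// Returns a list of (ask id, filled amount) pairs.
    pub fn match_bid(&mut self, limit_price: u64, mut amount: u64) -> Vec<(u64, u64)> {
        let mut fills = Vec::new();
        let mut filled_keys = Vec::new();
        // In-order traversal over all asks priced at or below the limit.
        for (key, ask) in self.orders.range_mut(..=(limit_price, u64::MAX)) {
            if amount == 0 {
                break;
            }
            let traded = amount.min(ask.amount);
            ask.amount -= traded;
            amount -= traded;
            fills.push((ask.id, traded));
            if ask.amount == 0 {
                filled_keys.push(*key);
            }
        }
        // Remove fully filled asks once the traversal has finished.
        for key in filled_keys {
            self.orders.remove(&key);
        }
        fills
    }
}
```

In this sketch, `get` exercises the key-value lookup, while `match_bid` relies on the map's sorted iteration, which a plain hash map could not provide.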
The global state is persisted through periodic dumps and operation logs. For the dumps, the process periodically forks, and the new child process persists the global state; this has lower latency than either a “stop-the-world” dump or a deep copy. In addition, all user requests are persisted to the database in batches as operation logs (writing them one by one would put heavy pressure on the database). The combination of the two mechanisms ensures that if the system suddenly goes down, its state can be quickly recovered.
The high, low, open, close and volume values are queried from TimescaleDB, a time-series database, to generate the K-line.
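For readers unfamiliar with K-lines, the aggregation itself can be sketched in a few lines of Rust. In Fluidex this work is done by a TimescaleDB query, so the code below only illustrates what is being computed; the `Trade` and `Candle` types and the `build_klines` function are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical trade record.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Trade {
    pub timestamp: u64, // seconds since some epoch
    pub price: u64,
    pub amount: u64,
}

/// One K-line bar: open, high, low, close and volume.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Candle {
    pub open: u64,
    pub high: u64,
    pub low: u64,
    pub close: u64,
    pub volume: u64,
}

/// Groups trades into fixed-width time buckets, keyed by bucket start time.
/// Assumes `trades` is sorted by timestamp and `interval` is non-zero.
pub fn build_klines(trades: &[Trade], interval: u64) -> BTreeMap<u64, Candle> {
    let mut candles: BTreeMap<u64, Candle> = BTreeMap::new();
    for t in trades {
        let bucket = t.timestamp - t.timestamp % interval;
        candles
            .entry(bucket)
            .and_modify(|c| {
                c.high = c.high.max(t.price);
                c.low = c.low.min(t.price);
                c.close = t.price; // the latest trade in the bucket so far
                c.volume += t.amount;
            })
            .or_insert(Candle {
                open: t.price,
                high: t.price,
                low: t.price,
                close: t.price,
                volume: t.amount,
            });
    }
    candles
}
```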
Rollup State Manager
In a ZK-Rollup system, the smart contract only needs to store the Merkle root of the global state instead of the entire Merkle tree. The Merkle tree itself is maintained by the off-chain rollup state manager, which receives finished orders and other operations (e.g., withdrawals, transfers…) from the message queue and updates the Merkle tree. The operations are then packed into L2 blocks.
The rollup state manager periodically dumps checkpoints (together with message queue offsets). When the system restarts, it loads the Merkle tree state from the last checkpoint, seeks to the corresponding offset in the message queue, and reprocesses the subsequent messages to recover the latest state.
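The idea that a single root commits to the whole state can be illustrated with a small fixed-depth Merkle tree. This is only a sketch under strong assumptions: leaves are plain `u64` values, and `hash_pair` uses the standard library's `DefaultHasher` as a stand-in, which is not a cryptographic hash and is not what a real rollup would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hash for illustration. NOT cryptographic: a real rollup needs
/// a cryptographic hash suitable for use inside its proof system.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Fixed-depth binary Merkle tree. `levels[0]` holds the leaves and the
/// last level holds the single root value.
pub struct MerkleTree {
    levels: Vec<Vec<u64>>,
}

impl MerkleTree {
    /// Builds a tree with 2^depth leaves, all initialized to zero.
    pub fn new(depth: usize) -> Self {
        let mut levels = vec![vec![0u64; 1 << depth]];
        for _ in 0..depth {
            let prev = levels.last().unwrap();
            let next: Vec<u64> = prev.chunks(2).map(|p| hash_pair(p[0], p[1])).collect();
            levels.push(next);
        }
        MerkleTree { levels }
    }

    /// The only value the on-chain contract would need to store.
    pub fn root(&self) -> u64 {
        self.levels.last().unwrap()[0]
    }

    /// Sets one leaf and recomputes only the hashes on its path to the root.
    pub fn update(&mut self, mut index: usize, leaf: u64) {
        self.levels[0][index] = leaf;
        for level in 0..self.levels.len() - 1 {
            let left = index & !1; // index of the left node of the sibling pair
            let parent = hash_pair(self.levels[level][left], self.levels[level][left + 1]);
            index /= 2;
            self.levels[level + 1][index] = parent;
        }
    }
}
```

Each update touches only one node per level, so the cost grows with the depth of the tree rather than with the number of leaves.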
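The recovery procedure can be sketched as "load snapshot, then replay from the saved offset". The example below is a simplification under stated assumptions: the state is a plain balance map rather than a Merkle tree, the message queue is an in-memory slice rather than Kafka, and `Message`, `State`, `Checkpoint` and `recover` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical message type standing in for rollup operations.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    Deposit { account: u32, amount: u64 },
    Withdraw { account: u32, amount: u64 },
}

/// Simplified state: account balances instead of a full Merkle tree.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct State {
    pub balances: BTreeMap<u32, u64>,
}

impl State {
    /// Applies one message; processing must be deterministic for replay to work.
    pub fn apply(&mut self, msg: &Message) {
        match msg {
            Message::Deposit { account, amount } => {
                *self.balances.entry(*account).or_insert(0) += *amount;
            }
            Message::Withdraw { account, amount } => {
                let balance = self.balances.entry(*account).or_insert(0);
                *balance = balance.saturating_sub(*amount);
            }
        }
    }
}

/// Snapshot of the state plus the offset of the next unprocessed message.
#[derive(Debug, Clone, PartialEq)]
pub struct Checkpoint {
    pub state: State,
    pub next_offset: usize,
}

/// Restores the latest state: start from the checkpoint, skip the messages
/// it already covers, and replay the rest of the log.
pub fn recover(checkpoint: &Checkpoint, log: &[Message]) -> State {
    let mut state = checkpoint.state.clone();
    for msg in log.iter().skip(checkpoint.next_offset) {
        state.apply(msg);
    }
    state
}
```

Storing the offset together with the snapshot is what makes the replay exact: no message is applied twice and none is skipped.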
After an L2 block is generated by the rollup state manager, a cryptographic proof is needed so that the block can be verified as correct on-chain. This requires a prover cluster to provide the necessary computing power. Moreover, since the transaction volume of a DEX may vary considerably over time, the prover cluster needs to be highly scalable and elastic.
To meet these requirements, we adopt a master-worker architecture, which consists of a stateful master node that manages a list of proving tasks, and several stateless worker nodes that retrieve tasks from the master and submit proofs back to it after proving. As in PoW mining, verifying a ZK-Rollup proof is much cheaper than producing it, so we are considering switching to a more “trustless” architecture in the future, in which miners can join and quit proving permissionlessly at any time. The underlying cryptography ensures that if a proof passes validation (which can be checked quickly), the miner has not misbehaved.
At present, the prover cluster can be deployed in two ways, via Docker Compose and via K8s, to support both local development/debugging and production deployment.
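The division of labor between the master and the workers might look roughly like the following single-process sketch. It is not Fluidex's implementation: `Master`, `prove` and `verify` are hypothetical, there is no networking, and the "proof" is a trivial arithmetic placeholder, so verification here is not actually cheaper than proving as it would be with a real proof system.

```rust
use std::collections::{HashMap, VecDeque};

/// Placeholder for real proving. A real prover runs an expensive ZK computation.
pub fn prove(block_id: u64) -> u64 {
    block_id.wrapping_mul(31).wrapping_add(7)
}

/// Placeholder for real verification, which would be a cheap cryptographic check.
pub fn verify(block_id: u64, proof: u64) -> bool {
    prove(block_id) == proof
}

/// Stateful master: tracks which blocks still need proofs.
#[derive(Default)]
pub struct Master {
    pending: VecDeque<u64>,      // blocks waiting for a worker
    assigned: HashMap<u64, u32>, // block id -> worker id currently proving it
    proofs: HashMap<u64, u64>,   // block id -> accepted proof
}

impl Master {
    pub fn add_task(&mut self, block_id: u64) {
        self.pending.push_back(block_id);
    }

    /// Called by a stateless worker asking for work.
    pub fn poll_task(&mut self, worker_id: u32) -> Option<u64> {
        let block_id = self.pending.pop_front()?;
        self.assigned.insert(block_id, worker_id);
        Some(block_id)
    }

    /// Accepts a proof only if it verifies; otherwise the task is re-queued.
    pub fn submit_proof(&mut self, block_id: u64, proof: u64) -> bool {
        if self.assigned.remove(&block_id).is_none() {
            return false; // unknown task, or one that is not currently assigned
        }
        if verify(block_id, proof) {
            self.proofs.insert(block_id, proof);
            true
        } else {
            self.pending.push_back(block_id);
            false
        }
    }

    pub fn proof_of(&self, block_id: u64) -> Option<u64> {
        self.proofs.get(&block_id).copied()
    }
}
```

Because the master checks every submitted proof before accepting it, it does not need to trust the worker that produced it, which is the property the permissionless design described above would rely on.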
The Design Principles of Fluidex Back-end
CQRS and Global Message Bus
State updates in a rollup system require extremely strict consistency and accuracy: not even the slightest error is allowed. All state update operations should be traceable and recorded. To provide such reliable updates, we adopt the CQRS design pattern. All writes to the global state are synchronized through a message queue; specifically, we use Kafka as the global message bus. The rollup system treats the message queue as the ground truth, receives state update notifications from it, and updates the global Merkle tree accordingly.
Memory-centric Data Maintenance
Conventional Internet services use databases as the ground truth for their data. They usually achieve scalability and resilience through data sharding and stateless services.
However, our ZK-Rollup system consists of many services that have to maintain a large number of complex data structures in memory (such as the rollup state manager and the matching engine, which maintain the Merkle tree and the orderbook respectively). This requires a memory-centric architecture design. As a result, our design principles may differ from the Twelve-Factor methodology recommended for many Internet businesses, and are closer to those of game server development.
Unified Technology Stack
Thanks to Rust’s type safety and ownership checks, as well as performance comparable to C++, Rust has become the first choice for many cryptographic libraries, and its ecosystem has been growing rapidly. Therefore, it is not surprising that we chose Rust for our rollup state manager and prover cluster. Moreover, since a unified technology stack greatly reduces our team’s collaboration overhead, the other modules are also implemented in Rust.
Fluidex-backend has been open-sourced on GitHub; please refer to https://github.com/Fluidex/fluidex-backend. (Currently, it only includes instructions on how to run it as a local cluster.)
[1] Planned to use “grpc->websocket”, but not implemented yet.