This specification defines a test suite for validating Aranya daemon convergence behavior with a large number of nodes on a single device. The primary goal is to run a larger number of daemons on a team and verify that all nodes in the network eventually reach a consistent state after commands are issued and synchronized across a defined network topology. A label command is used to make it easy to track which nodes are up to date.
This specification is designed for use with duvet for requirements traceability.
Existing Aranya integration tests typically use 5 nodes (DevicesCtx with owner, admin, operator, membera, and memberb). While sufficient for testing role-based access control and basic synchronization, these tests do not exercise large node counts, configurable network topologies, or convergence timing at scale.
This test suite addresses these gaps by providing a framework for large-scale convergence testing with configurable topologies.
The test uses a scalable node context that extends the patterns from DevicesCtx in the existing test infrastructure.
/// Type-safe index into the node list (0 to N-1).
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct NodeIndex(usize);
struct NodeCtx {
/// Unique node identifier (0 to N-1)
index: NodeIndex,
/// Aranya client connection
client: Client,
/// Device's public key bundle
pk: KeyBundle,
/// Device ID
id: DeviceId,
/// Daemon handle
daemon: DaemonHandle,
/// Sync peers (indices of connected nodes)
peers: Vec<NodeIndex>,
}
The test context manages all nodes and the configured topology.
struct TestCtx {
/// All nodes in the test
nodes: Vec<NodeCtx>,
/// The topologies used to connect nodes. When multiple topologies
/// are provided they are applied sequentially, each one adding
/// its peers on top of any previously configured peers. `None`
/// when using `add_sync_peer` exclusively to wire peers manually.
topology: Option<Vec<Topology>>,
/// The sync mode used for this test run
sync_mode: SyncMode,
/// Team ID for the test
team_id: TeamId,
/// Convergence tracker
tracker: ConvergenceTracker,
}
impl TestCtx {
/// Manually add a sync peer relationship between two nodes.
/// Used with the Custom topology to build arbitrary network graphs.
fn add_sync_peer(&mut self, from: NodeIndex, to: NodeIndex) { ... }
/// Remove a sync peer relationship between two nodes.
fn remove_sync_peer(&mut self, from: NodeIndex, to: NodeIndex) { ... }
}
/// A function that takes the total node count and returns
/// the peer list for each node. `peers[i]` contains the
/// `NodeIndex` values of node `i`'s sync peers.
type TopologyConnectFn = fn(usize) -> Vec<Vec<NodeIndex>>;
enum Topology {
Ring,
Custom {
connect: TopologyConnectFn,
},
}
// Example: a star topology where node 0 is the hub
fn star_topology(node_count: usize) -> Vec<Vec<NodeIndex>> {
(0..node_count)
.map(|i| {
if i == 0 {
// Hub connects to all other nodes
(1..node_count).map(NodeIndex).collect()
} else {
// Spokes connect only to the hub
vec![NodeIndex(0)]
}
})
.collect()
}
// Usage:
let topology = Topology::Custom { connect: star_topology };
enum SyncMode {
/// All nodes use interval-based polling to discover new commands
Poll {
/// How frequently each node polls its sync peers
interval: Duration,
},
/// All nodes use hello notifications to trigger sync on graph changes
Hello {
/// Minimum time between hello notifications to the same peer
debounce: Duration,
/// How long a hello subscription remains valid before expiring
subscription_duration: Duration,
},
}
The Topology enum is expected to grow as additional topologies (star, mesh, etc.) are added in future extensions. The Custom variant accepts a function pointer that generates the full peer adjacency list from the node count, allowing declarative topology definitions. When multiple topologies are provided, they are applied sequentially, each one adding its peers on top of any previously configured peers; for example, combining a Ring with a Custom star topology produces a ring whose hub node additionally peers with every other node, as sketched below. For cases requiring dynamic or incremental wiring, callers can also use TestCtx::add_sync_peer to add individual peer relationships after setup.
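As an illustration (not normative), a combined configuration using the star_topology function defined above might look like:

// Apply the ring first, then layer the star's hub/spoke edges on
// top: node 0 ends up peered with every other node, while the
// remaining nodes keep their two ring neighbors plus the hub.
let topologies = vec![
    Topology::Ring,
    Topology::Custom { connect: star_topology },
];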
The SyncMode enum is expected to grow (e.g., Mixed mode) as additional sync strategies are validated.
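For illustration, the two modes could be constructed as follows; the 1-second poll interval matches the default required below, while the hello durations shown are placeholders rather than spec defaults:

use std::time::Duration;

// Poll mode with the default 1-second sync interval.
let poll = SyncMode::Poll {
    interval: Duration::from_secs(1),
};

// Hello mode; these durations are illustrative placeholders.
let hello = SyncMode::Hello {
    debounce: Duration::from_millis(100),
    subscription_duration: Duration::from_secs(60),
};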
The ConvergenceTracker records convergence state across all nodes.
struct ConvergenceTracker {
/// The label used to track convergence
convergence_label: Label,
/// Per-node convergence status
node_status: Vec<ConvergenceStatus>,
/// Timestamps for convergence measurements
timestamps: ConvergenceTimestamps,
}
struct ConvergenceStatus {
/// Whether this node has received the convergence label
has_label: bool,
/// Time when the label was received
convergence_time: Option<Instant>,
}
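A minimal sketch, assuming only the fields above, of how the tracker can answer the two questions the test needs (is convergence complete, and which nodes are lagging):

impl ConvergenceTracker {
    /// True once every node has received the convergence label.
    fn is_converged(&self) -> bool {
        self.node_status.iter().all(|s| s.has_label)
    }

    /// Indices of nodes still missing the label, for timeout reports.
    fn unconverged_nodes(&self) -> Vec<NodeIndex> {
        self.node_status
            .iter()
            .enumerate()
            .filter(|(_, s)| !s.has_label)
            .map(|(i, _)| NodeIndex(i))
            .collect()
    }
}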
The test MUST support configuring the number of nodes.
The test MUST scale to at least 70 nodes without failure.
The test MUST reject configurations with fewer than 3 nodes (the minimum for a valid ring).
In poll sync mode, the test MUST support configuring the sync interval between peers.
In poll sync mode, the default sync interval MUST be 1 second.
The test MUST support configuring a maximum test duration timeout.
The default maximum test duration MUST be 600 seconds (10 minutes).
The test MUST support configuring the sync mode (poll or hello).
The default sync mode MUST be hello.
In hello sync mode, the test MUST support configuring the hello notification debounce duration (minimum time between notifications to the same peer).
In hello sync mode, the test MUST support configuring the hello subscription duration (how long a subscription remains valid).
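One way to capture the configuration requirements above is a config struct carrying the stated defaults; the TestConfig name and field layout here are illustrative assumptions, not part of the spec:

use std::time::Duration;

struct TestConfig {
    /// Number of nodes; must be at least 3 (the ring minimum).
    node_count: usize,
    /// Sync mode; defaults to hello.
    sync_mode: SyncMode,
    /// Maximum test duration; defaults to 600 seconds.
    max_duration: Duration,
}

impl TestConfig {
    /// Rejects configurations with fewer than 3 nodes.
    fn validate(&self) -> Result<(), String> {
        if self.node_count < 3 {
            return Err(format!(
                "node_count must be at least 3 for a valid ring, got {}",
                self.node_count
            ));
        }
        Ok(())
    }
}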
The test MUST support the Ring topology.
The test MUST support the Custom topology.
The initial implementation MUST include at least the Ring and Custom topologies.
When multiple topologies are configured, the test MUST apply them sequentially, each topology adding its peers on top of any previously configured peers.
In the ring topology, each node MUST connect to exactly two other nodes: its clockwise neighbor and its counter-clockwise neighbor.
The ring topology MUST form a single connected ring (each node’s two peers link to form one cycle covering all nodes).
The Custom topology MUST accept a topology connect function (TopologyConnectFn) that takes the total node count and returns the peer list for each node.
The topology connect function MUST return a peer list of length equal to the node count, where each entry contains the NodeIndex values of that node's sync peers.
The Custom topology MUST allow defining arbitrary peer relationships between nodes, including topologies such as star, mesh, and hierarchical.
TestCtx MUST provide an add_sync_peer method that adds a sync peer relationship between two nodes identified by NodeIndex.
TestCtx MUST provide a remove_sync_peer method that removes a sync peer relationship between two nodes identified by NodeIndex.
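A sketch of the ring as a TopologyConnectFn, pairing each node with its clockwise and counter-clockwise neighbors; a mesh or hierarchical topology would be written the same way with a different adjacency rule:

fn ring_topology(node_count: usize) -> Vec<Vec<NodeIndex>> {
    (0..node_count)
        .map(|i| {
            // Clockwise and counter-clockwise neighbors, wrapping
            // around the ends to close the cycle.
            let clockwise = (i + 1) % node_count;
            let counter_clockwise = (i + node_count - 1) % node_count;
            vec![NodeIndex(clockwise), NodeIndex(counter_clockwise)]
        })
        .collect()
}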
Each node MUST be initialized with a unique daemon instance.
Each node MUST have its own cryptographic keys.
All nodes MUST have unique device IDs.
Node initialization MUST occur in parallel batches to avoid resource exhaustion.
Node initialization MUST complete within a configurable timeout (default: 60 seconds per node batch).
The test MUST verify that all nodes started successfully.
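A sketch of batched parallel startup, assuming tokio and the futures crate; spawn_node is a hypothetical helper that launches one daemon and returns its NodeCtx, and the batch size is illustrative:

use std::time::Duration;
use futures::future::try_join_all;
use tokio::time::timeout;

const BATCH_SIZE: usize = 10; // illustrative, sized to avoid resource exhaustion
const BATCH_TIMEOUT: Duration = Duration::from_secs(60); // default per-batch timeout

async fn init_nodes(node_count: usize) -> anyhow::Result<Vec<NodeCtx>> {
    let mut nodes = Vec::with_capacity(node_count);
    for batch in (0..node_count).collect::<Vec<_>>().chunks(BATCH_SIZE) {
        // Start each batch in parallel, bounding the whole batch by
        // the per-batch timeout so a hung daemon fails fast.
        let futures = batch.iter().map(|&i| spawn_node(NodeIndex(i)));
        let started = timeout(BATCH_TIMEOUT, try_join_all(futures)).await??;
        nodes.extend(started);
    }
    Ok(nodes)
}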
A single team MUST be created by node 0 (the designated owner).
All nodes MUST be added to the team before convergence testing begins.
A shared QUIC sync seed MUST be distributed to all nodes during team setup.
Each non-owner node MUST be added as a team member by the owner.
Team configuration MUST be synchronized to all nodes before the convergence test phase.
The test MUST verify that all nodes have received the team configuration.
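A sketch of the team setup phase; the Client method names here are illustrative stand-ins, not the actual aranya-client API:

/// Node 0 (the designated owner) creates the team and adds every
/// other node as a member. Method names are illustrative.
async fn setup_team(nodes: &mut [NodeCtx]) -> anyhow::Result<TeamId> {
    let team_id = nodes[0].client.create_team().await?;
    for i in 1..nodes.len() {
        let (pk, id) = (nodes[i].pk.clone(), nodes[i].id);
        // The shared QUIC sync seed is distributed as part of
        // adding each member to the team.
        nodes[0].client.add_team_member(team_id, pk, id).await?;
    }
    Ok(team_id)
}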
Each node MUST add sync peers according to the configured topology.
Sync peer configuration MUST specify the sync interval.
The sync peer address MUST be obtained from the neighbor node’s local address.
Sync peer configuration MUST complete before the convergence test phase.
In poll sync mode, each node MUST poll its sync peers at the configured sync interval.
In hello sync mode, each node MUST subscribe to hello notifications from its sync peers.
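A sketch of wiring peers from a computed adjacency list using the TestCtx::add_sync_peer method defined earlier; add_sync_peer is assumed to resolve the peer's local address and apply the configured sync mode:

/// Wire sync peers from the adjacency list produced by a
/// topology connect function. `peer_lists[i]` holds the
/// NodeIndex values of node i's sync peers.
fn wire_topology(ctx: &mut TestCtx, connect: TopologyConnectFn) {
    let peer_lists = connect(ctx.nodes.len());
    for (i, peers) in peer_lists.into_iter().enumerate() {
        for peer in peers {
            ctx.add_sync_peer(NodeIndex(i), peer);
        }
    }
}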
The test MUST assign a label to the source node’s graph to mark the start of convergence testing.
The default source node for label assignment MUST be node 0.
The test MUST track when each node receives the convergence label.
Convergence MUST be defined as all nodes having received the convergence label.
The test MUST measure the total convergence time from label assignment to full convergence.
The test MUST fail if convergence is not achieved within the maximum test duration.
The test MUST report which nodes failed to converge if the timeout is reached.
Each node’s graph state MUST be queryable to determine whether it has received the convergence label.
The test MUST poll nodes periodically to check convergence status.
The polling interval MUST be configurable (default: 250 milliseconds).
A node MUST be considered converged when it has received the convergence label.
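A sketch of the convergence wait loop under the defaults above, reusing the tracker helpers sketched earlier; node_has_label is a hypothetical helper that queries one node's graph state for the convergence label:

use std::time::{Duration, Instant};

async fn wait_for_convergence(
    ctx: &mut TestCtx,
    max_duration: Duration,  // default: 600 seconds
    poll_interval: Duration, // default: 250 milliseconds
) -> anyhow::Result<()> {
    let deadline = Instant::now() + max_duration;
    loop {
        // Refresh each node's status by querying its graph state.
        for (i, node) in ctx.nodes.iter().enumerate() {
            let status = &mut ctx.tracker.node_status[i];
            if !status.has_label
                && node_has_label(node, &ctx.tracker.convergence_label).await?
            {
                status.has_label = true;
                status.convergence_time = Some(Instant::now());
            }
        }
        if ctx.tracker.is_converged() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            anyhow::bail!(
                "convergence timeout; unconverged nodes: {:?}",
                ctx.tracker.unconverged_nodes()
            );
        }
        tokio::time::sleep(poll_interval).await;
    }
}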
The test MUST record the timestamp when the convergence label is assigned.
The test MUST record the timestamp when each node achieves convergence.
The test MUST calculate and report convergence metrics, including the total convergence time (from label assignment to full convergence) and each node's individual convergence duration.
The test SHOULD report memory usage per node if available.
When a CSV export feature flag is enabled, the test MUST output raw convergence data as a CSV file after each test run.
The CSV output MUST include one row per node with the following columns: node index, label assignment time (T0), node convergence time, and convergence duration (time from T0 to node convergence).
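A sketch of the CSV export with the required columns; because Instant has no absolute representation, times are written as milliseconds relative to T0, and unconverged nodes leave the time columns empty:

use std::fs::File;
use std::io::Write;
use std::time::Instant;

/// One row per node: index, T0, convergence time, duration.
fn export_csv(
    tracker: &ConvergenceTracker,
    t0: Instant,
    path: &str,
) -> std::io::Result<()> {
    let mut file = File::create(path)?;
    writeln!(file, "node_index,t0_ms,convergence_time_ms,duration_ms")?;
    for (i, status) in tracker.node_status.iter().enumerate() {
        match status.convergence_time {
            Some(t) => {
                // With T0 as the origin, convergence time and
                // duration are the same offset in milliseconds.
                let d = t.duration_since(t0).as_millis();
                writeln!(file, "{i},0,{d},{d}")?;
            }
            None => writeln!(file, "{i},0,,")?,
        }
    }
    Ok(())
}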
The test MUST fail if any node fails to initialize.
If a node fails to initialize, the test MUST report which node failed and the cause of the failure.
The test MUST handle sync failures between nodes.
All daemon processes MUST be terminated when the test completes.
All temporary directories MUST be removed when the test completes.
Cleanup MUST occur even if the test fails or times out.
The test MUST use RAII patterns to ensure cleanup on panic.
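A sketch of RAII cleanup via Drop; the terminate call is an illustrative stand-in for however DaemonHandle shuts down its process, and temp directories held as tempfile::TempDir guards are removed by their own Drop:

impl Drop for TestCtx {
    fn drop(&mut self) {
        // Runs on normal completion, failure, and panic alike,
        // so daemons never outlive the test.
        for node in &mut self.nodes {
            // Illustrative stand-in for DaemonHandle shutdown.
            node.daemon.terminate();
        }
        // Temp dirs owned as tempfile::TempDir guards clean
        // themselves up when their own Drop runs.
    }
}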
In a bidirectional ring of N nodes:
Poll sync mode: Each node discovers new commands from its neighbors on the next poll cycle. Propagation speed is bounded by the sync interval.
Hello sync mode: When a node receives new commands, it sends hello notifications to its peers, which trigger immediate syncs. Propagation speed is bounded by network latency and processing time rather than the sync interval.
For a ring of N nodes with sync interval S, the label propagates in both directions and must cover at most ⌈N/2⌉ hops, so the expected convergence time is on the order of ⌈N/2⌉ × S. Actual convergence time will be higher due to network latency, per-node sync processing time, and the offset between when a command arrives at a node and when that node's next poll cycle fires.
For a ring of N nodes with hello sync, propagation still requires up to ⌈N/2⌉ hops, but each hop completes as soon as the hello notification triggers a sync rather than waiting for a poll interval. Actual convergence time depends on network latency, per-node processing time, and the configured debounce duration.
The test passes when all nodes have received the convergence label within the maximum test duration.
This specification is designed for use with duvet. Requirements are marked with unique identifiers (e.g., CONF-001, RING-001) that can be referenced in implementation code using duvet annotations:
//= https://github.com/aranya-project/aranya-docs/docs/multi-daemon-convergence-test.md#CONF-002
//# The test MUST scale to at least 70 nodes without failure.
const MIN_SUPPORTED_NODE_COUNT: usize = 70;
To generate a requirements coverage report:
duvet report --spec docs/multi-daemon-convergence-test.md --source crates/aranya-client/tests/