Running a node

An AttestMesh node is a dstack CVM running two containers: your application and the AttestMesh sidecar (cluster-mesh-agent). The sidecar owns everything mesh-related; the application consumes a simple gRPC API over a unix socket.

The sidecar’s health endpoint (:9090/healthz) reports a phase you can watch through the dstack gateway:

Phase           Meaning
booting         Deriving keys from the TEE, discovering the member contract and cluster.
registering     Building the attestation proof; submitting the sponsored dstack_register UserOperation. Retries while the operator’s allowlist entries land.
subscribing     Registration confirmed; starting mesh bring-up and indexer discovery.
waiting-peers   Registered, but no other members on chain yet.
wg-configuring  Peers found; configuring wireguard over the gateway transport.
pulling-csk     Mesh up; acquiring the Cluster Shared Key.
heartbeating    Signed heartbeats flowing; waiting for convergence.
healthy         Converged + CSK held. The application gate opens.

The endpoint returns HTTP 200 only when healthy:

{"csk_acquired":true,"first_converged":true,"live_peers":1,"phase":"healthy"}

first_converged latches once and never resets — brief peer flaps after first convergence degrade live_peers, not the gate.
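
The latch behaviour can be modelled in a few lines. This is an illustrative sketch, not the sidecar's implementation: the class name HealthGate, its observe method, and the converged_now input are hypothetical, and only the field names mirror the /healthz body shown above.

```python
from dataclasses import dataclass


@dataclass
class HealthGate:
    """Toy model of the health gate; all names here are hypothetical."""

    csk_acquired: bool = False
    first_converged: bool = False  # latches to True and never resets
    live_peers: int = 0

    def observe(self, converged_now: bool, live_peers: int) -> None:
        # live_peers follows the current view, so it can drop during a flap.
        self.live_peers = live_peers
        # The latch only ever moves from False to True.
        self.first_converged = self.first_converged or converged_now

    @property
    def gate_open(self) -> bool:
        return self.first_converged and self.csk_acquired


gate = HealthGate()
gate.csk_acquired = True
gate.observe(converged_now=True, live_peers=1)   # first convergence
gate.observe(converged_now=False, live_peers=0)  # brief peer flap afterwards
```

After the flap, gate.live_peers is 0 while gate.gate_open stays True, which is the distinction the paragraph above draws.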

  • Wireguard mesh — one attestmesh0 interface; one peer entry per cluster member, keys and mesh IPs pinned from chain state. Transport bootstraps over the dstack gateway (see Mesh networking).
  • Heartbeats — Ed25519-signed liveness packets every 2 seconds inside the mesh. A peer is live only when its heartbeats verify against the key learned from its encrypted endpoint envelope.
  • Event intake — a verified subscription to the shared Indexer for low-latency pushes, with direct chain-log polling always running underneath. Either path alone is sufficient.
  • CSK custody — held in memory, zeroized on drop, re-acquirable on every restart without operator involvement.
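
Because push and polling both deliver the same chain events, a consumer of the two paths has to tolerate duplicates. The following is a minimal sketch of one way to do that, not a description of how the sidecar does it: the LogEvent and EventIntake names are hypothetical, and keying events by (transaction hash, log index) is an assumption.

```python
from typing import Callable, NamedTuple


class LogEvent(NamedTuple):
    tx_hash: str     # assumed identity: transaction hash plus log index
    log_index: int
    payload: str


class EventIntake:
    """Delivers each event once, whichever path reports it first."""

    def __init__(self, handler: Callable[[LogEvent], None]) -> None:
        self._handler = handler
        # Unbounded for simplicity; a real intake would prune old keys.
        self._seen: set[tuple[str, int]] = set()

    def offer(self, event: LogEvent) -> bool:
        """Return True if the event was new and handed to the handler."""
        key = (event.tx_hash, event.log_index)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._handler(event)
        return True


delivered: list[str] = []
intake = EventIntake(lambda e: delivered.append(e.payload))
join = LogEvent("0xabc", 0, "member-joined")
intake.offer(join)  # arrives first through the Indexer push
intake.offer(join)  # seen again by chain-log polling and dropped
```

With this shape, losing either source only delays events; it never loses or repeats them.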

The sidecar serves gRPC on a unix socket (default /var/run/attestmesh/agent.sock). Your application container never touches keys, ciphertexts, the chain, or the Indexer — only this surface:

RPC                  Purpose
GetMeshStatus        Phase, gates, and live peer count (the same data as /healthz).
ListPeers            Member ids, mesh IPs, liveness.
SendMessage          Encrypts to the recipient’s on-chain key and submits a sponsored MessageFacet.send.
SubscribeMessages    Stream of messages addressed to this node, already decrypted and sender-authenticated.
SubscribePeerEvents  Joins and liveness transitions.
GetClusterSharedKey  The CSK, only after the health gates pass.

Gate your application’s startup on the sidecar’s health endpoint (compose depends_on + healthcheck), then talk to peers over their mesh IPs.
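
If the application also wants to check readiness itself, a small probe along these lines may help. It is a sketch under assumptions: the host in HEALTH_URL is a guess (only the port and path come from this page), the function names are hypothetical, and it assumes non-healthy responses carry the same JSON body with a phase field.

```python
import json
import urllib.error
import urllib.request

# Assumed host; only ":9090/healthz" is documented.
HEALTH_URL = "http://127.0.0.1:9090/healthz"


def parse_health(status_code: int, body: bytes) -> tuple[bool, str]:
    """Return (gate_open, phase) for one /healthz response."""
    try:
        phase = json.loads(body).get("phase", "unknown")
    except (ValueError, AttributeError):
        # Empty, non-JSON, or non-object body.
        phase = "unknown"
    # The endpoint returns 200 only when healthy, so the status code decides.
    return status_code == 200, phase


def probe(url: str = HEALTH_URL, timeout: float = 2.0) -> tuple[bool, str]:
    """Fetch /healthz once; errors and non-200 responses mean not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # Non-2xx responses arrive as exceptions but still have a body.
        return parse_health(err.code, err.read())
    except (urllib.error.URLError, OSError):
        return False, "unreachable"
```

Calling probe() in a retry loop and exiting non-zero until it returns True gives the same signal a compose healthcheck would act on; logging the returned phase shows where bring-up is stuck.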

Everything about a node is derived from its TEE identity:

  • Keys are deterministic per app identity — a restart re-derives the same x25519/Ed25519/wireguard keys and the same EIP-4337 owner key. There is no key material to back up.
  • Registration is restart-safe — the sidecar detects its existing member record and skips straight to mesh bring-up.
  • The CSK is recovered, not stored: the originator re-derives it and checks it against the on-chain commitment; everyone else re-pulls it from a live peer over the mesh.
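
The originator's "re-derive, then check" step can be illustrated as follows. Everything concrete here is an assumption made for the example: this page does not specify the derivation function or the commitment scheme, so HMAC-SHA256 and a plain SHA-256 commitment stand in for them, and all function names are hypothetical.

```python
import hashlib
import hmac


def derive_csk(tee_seed: bytes, cluster_id: bytes) -> bytes:
    # Hypothetical derivation: HMAC-SHA256 keyed by the TEE seed.
    return hmac.new(tee_seed, b"csk:" + cluster_id, hashlib.sha256).digest()


def commitment_of(csk: bytes) -> bytes:
    # Hypothetical commitment: SHA-256 of the key.
    return hashlib.sha256(csk).digest()


def recover_csk(tee_seed: bytes, cluster_id: bytes, onchain_commitment: bytes) -> bytes:
    """Re-derive the key and refuse to use it unless it matches the chain."""
    csk = derive_csk(tee_seed, cluster_id)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(commitment_of(csk), onchain_commitment):
        raise ValueError("re-derived CSK does not match the on-chain commitment")
    return csk


seed = b"\x01" * 32
published = commitment_of(derive_csk(seed, b"cluster-1"))
recovered = recover_csk(seed, b"cluster-1", published)
```

The point of the pattern is that nothing secret needs to be persisted: the same seed yields the same key after a restart, and the commitment catches a wrong seed or cluster before the key is used.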

A node can be restarted, re-imaged (with an allowlisted compose), or moved without any state hand-off — the chain plus the TEE seed reconstruct everything. A two-node cluster typically returns to healthy in under two minutes after a simultaneous restart of both nodes.