Day-2 operations

Everything below is codified in the repo’s deploy/ routines: idempotent, re-entrant bash with every step logged to deploy/logs/. One rule governs all of it: allowlist first, then update, because the dstack KMS will not release keys to a CVM whose compose hash isn’t approved on chain.

Updating a node’s compose or sealed env

CLUSTER=<cluster> MEMBER_IMPL=<impl> COMPOSE=deploy/compose/node-1.yaml \
  deploy/node-pathA.sh attestmesh-node-1 update

The update routine:

1. Runs a prepare-only update to learn the new compose hash without touching the CVM.
2. Checks allowedComposeHashes on the cluster and, if needed, allowlists the new hash (a cluster-owner transaction).
3. Applies the update: the CVM reboots into the new compose and passes the KMS boot gate because the hash is already approved.
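
The ordering above can be sketched as a small decision function. This is an illustration only, not the code in deploy/node-pathA.sh: the function names hash_is_allowed and plan_update are hypothetical, and the sketch takes the compose hash and the current allowlist as plain arguments instead of reading them from the prepare step or from the chain.

```shell
# Hypothetical sketch of the allowlist-first ordering.
# Nothing here talks to the chain or the CVM; it only decides which steps run.

# Return 0 if the hash in $1 appears among the remaining arguments.
hash_is_allowed() {
  local wanted=$1
  shift
  local h
  for h in "$@"; do
    if [ "$h" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

# Print the steps an update needs, in order.
# $1 = new compose hash, remaining arguments = hashes already allowlisted.
plan_update() {
  local new_hash=$1
  shift
  echo "prepare"                  # learn the new hash without touching the CVM
  if ! hash_is_allowed "$new_hash" "$@"; then
    echo "allowlist"              # cluster-owner transaction, only when needed
  fi
  echo "apply"                    # reboot into the already approved compose
}
```

Calling plan_update with a hash that is already in the list prints only prepare and apply, which is why re-running the routine is harmless: the allowlist step is skipped once the hash is approved.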

The node does not re-register: its member record, keys, and identity carry over. It re-meshes and re-acquires the CSK on its way back to healthy.

Rolling a new sidecar image

Images are tagged :latest and published by CI on every push to sidecar/**. Because the compose doesn’t change, no allowlist step is needed — restart to re-pull:

deploy/node-pathA.sh attestmesh-node-1 restart
deploy/node-pathA.sh attestmesh-node-1 mesh-verify

Roll one node at a time and mesh-verify between them if you want the mesh to stay continuously available; a simultaneous roll of every node also recovers, just with a brief gap.
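
A one-at-a-time roll could be wrapped in a loop like the following. It is a minimal sketch that assumes the restart and mesh-verify subcommands exit non-zero on failure (not confirmed here); the rolling_restart function and the NODE_SCRIPT override are hypothetical names introduced for illustration.

```shell
# Hypothetical wrapper: restart each node in turn and verify the mesh
# before moving on. Assumes the routine exits non-zero when a step fails.
NODE_SCRIPT="${NODE_SCRIPT:-deploy/node-pathA.sh}"

rolling_restart() {
  local node
  for node in "$@"; do
    echo "rolling $node"
    # Restart re-pulls the image; stop the roll if the restart itself fails.
    "$NODE_SCRIPT" "$node" restart || {
      echo "restart failed on $node" >&2
      return 1
    }
    # Do not touch the next node until this one is back in the mesh.
    "$NODE_SCRIPT" "$node" mesh-verify || {
      echo "mesh-verify failed on $node" >&2
      return 1
    }
  done
}

# Usage (node names as in the examples above):
#   rolling_restart attestmesh-node-1 attestmesh-node-2 attestmesh-node-3
```

Stopping at the first failure keeps the remaining nodes on the old image, so the mesh stays available while the failed node is investigated.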

Operating the shared indexer

The indexer is one instance per chain, serving every cluster — never deploy it per cluster. It is a stock dstack app (not a cluster member), so its updates have no cluster allowlist step:

deploy/indexer.sh attestmesh-indexer-1 update    # new compose/env + restart
deploy/indexer.sh attestmesh-indexer-1 verify    # health via the gateway + registry read-back

Re-registration (deploy/indexer.sh <name> register) is only needed when the signing key or endpoint changes — for example, deploying a replacement CVM. The registry record is owner-controlled; subscribing sidecars re-discover it automatically and re-verify every push against the new key.

If the indexer is ever down, nothing breaks: sidecars log the failed subscription, fall back to their own chain-log polling, and resubscribe when it returns.

Adding a node to a live cluster

Identical to first bring-up — the routine is the same for node 3 as for node 1:

CLUSTER=<cluster> MEMBER_IMPL=<impl> COMPOSE=deploy/compose/node-1.yaml \
  deploy/node-pathA.sh attestmesh-node-3 all

Existing members discover the newcomer from chain state on their next reconcile pass (within seconds when the indexer push wakes them), add the wireguard peer, and serve it the CSK once its heartbeats verify.

Health checklist

When something looks wrong, in order:

1. /healthz via the gateway: which phase is it stuck in?
2. CVM logs (npx phala cvms logs <cvm>): the sidecar logs every phase transition, registration attempt, peer configuration, envelope, and CSK step.
3. Chain state: memberCount, memberIdOf(app_id), cskCommitment, and IndexerRegistry.current() answer most “is it registered/configured?” questions directly.
4. Field notes: the failure modes we have actually hit, with their signatures and fixes.