FROST/ROAST readiness branch#3866
Draft
mswilkison wants to merge 283 commits into
…g scope Review follow-ups on the verifiable-blame revision: - Fix inverted group-shape wording (51-of-100, not 100-of-51). - State explicitly that near the assumption boundary the accuser quorum needs all but 2t-n-1 honest observers (50 of 51 at the production shape), so in the high-f regime the gate is a fabrication firewall rather than a working exclusion mechanism; proof-carrying blame restores exclusion there. - Document the n-of-n edge of ExclusionAccuserQuorum (quorum 1, consistent with f = 0 under that shape's own assumption). - Make precise that silence parking keys on bundle absence: a member that submits its evidence snapshot while withholding its signing contribution is not parked by this layer; that cost is bounded by the Annex B retry budget until t-of-included finalize. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nge event A regen run that produces different bytes means the legacy math/rand shuffle semantics changed and deployed engines would disagree on coordinator rotation. Both language suites passing after a dual regen is not evidence of compatibility with deployed engines. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The i.i.d. subset-sampling table models the deployed legacy loop. The RFC-21 Layer B transitional path differs in both directions: one-shot silence is absorbed by parking, but staggered silence (or submitting evidence snapshots while withholding signing contributions, which bundle-absence parking does not detect) can fail every attempt deterministically at small f. The table is therefore not a worst case for Layer B -- which strengthens, not weakens, the t-of-included finalize requirement this annex codifies. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… event A regen run that produces different bytes means the normative Annex A derivation changed; it requires an annex update in the same change and a mixed-fleet rollout note. Both language suites passing after a dual regen is not evidence of compatibility with deployed engines. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review follow-up (F2): the differential corpus did not exercise Go's rand.NewSource seed normalization, which reduces the source seed mod (2^31 - 1) and maps 0 to 89482311 -- so +/-(2^31 - 1) seed the generator identically to 0. Add both as boundary seeds (verified: same coordinator as seed 0 at attempt 0 across all boundary sets). A port that special-cases literal 0 but skips the modulo now fails its own replay. Corpus grows 600 -> 648 cases; regeneration remains deterministic. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
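The seed normalization described above can be checked directly against Go's legacy `math/rand` source. This is a minimal sketch, not the corpus generator itself; `sameStream` is a hypothetical helper introduced only for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sameStream reports whether two seeds drive the legacy math/rand source
// to the same first eight Int63 draws. (Hypothetical helper, not from the PR.)
func sameStream(a, b int64) bool {
	ra := rand.New(rand.NewSource(a))
	rb := rand.New(rand.NewSource(b))
	for i := 0; i < 8; i++ {
		if ra.Int63() != rb.Int63() {
			return false
		}
	}
	return true
}

func main() {
	// The legacy source reduces the seed mod (2^31 - 1) and maps 0 to 89482311.
	const m = int64(1<<31 - 1)
	for _, s := range []int64{m, -m, 89482311, 1} {
		fmt.Printf("seed %d behaves like seed 0: %v\n", s, sameStream(0, s))
	}
}
```

A port that special-cases a literal 0 but skips the modulo would disagree on the `m` and `-m` rows, which is the failure the added boundary seeds are meant to expose.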
mswilkison added a commit that referenced this pull request on Jun 11, 2026
…+ cross-language conformance vectors (#4031) Stacked on #4005 (base: `extraction/frost-signer-mirror-2026-05-26`). Implements item 3 of the review feedback (duplicated, divergent protocol constants) — Rust half; pairs with the Go-side PR #4030 stacked on #3866. ## Problem Flagged in #4026: the engine validated attempt contexts using `int64_be(MessageDigest[0..8])` with the 1-based wire attempt number (the legacy `signingAttemptSeed` convention), while the Go RFC-21 layer derives `fold(SHA256(KeyGroup ‖ SessionID ‖ MessageDigest))` with 0-based attempt numbers. At Phase-7 wiring, every Go-derived attempt context would fail the engine's strict-mode `validate_attempt_context` — a deterministic, network-wide liveness failure invisible to either side's property tests. ## What changed - **`roast_attempt_shuffle_seed(key_group, session_id, message_digest_hex)`** implements the normative RFC-21 Annex A derivation (see #4030). The key-group handle — this engine's hex-encoded serialized group verifying key — feeds the hash as an opaque UTF-8 string, exactly matching keep-core's `attempt.DeriveAttemptSeed` + `foldAttemptSeed` composition, including the strict 32-byte digest requirement. - **`validate_attempt_context` now takes the session's key group** (threaded from `dkg.key_group` at StartSignRound and the session's `DkgResult` at FinalizeSignRound) and composes the shuffle source with the **0-based** RFC-21 attempt number. The FFI wire encoding stays 1-based (`attempt_number >= 1` still enforced; `wire = AttemptNumber + 1`); the engine subtracts one before composition, per the annex. - **`testdata/coordinator_seed_vectors.json`** — byte-identical copy of the canonical file generated from the Go implementation. 
`coordinator_seed_derivation_matches_cross_language_vectors` pins, for all ten vectors: the folded seed (including negative values, so an unsigned port cannot pass), the selected coordinator (including the n=100 production-shape set), the 0-/1-based wire mapping, and end-to-end strict-mode `validate_attempt_context` acceptance of a context built from the wire encoding. Either language drifting now fails its own unit suite. - **`docs/roast-coordinator-seed-derivation.md`** mirrors the normative annex for signer-side readers, with the regen/copy procedure. - The coordinator-mismatch test derives the provably-wrong coordinator instead of hardcoding member 1 (which, under the new seed, happened to become the correct selection — exactly the class of silent assumption these vectors exist to catch). ## Notes - Mixed-version note: engines on the old derivation reject contexts produced under the new one (and vice versa) — strict-mode attempt contexts are not yet produced by the Go layer in any deployment, so this is pre-wiring cleanup with no live-fleet impact. - The attempt-context vector suite (`roast-attempt-context-v1.json`) is unaffected: it pins fingerprint/attempt-id domains with the coordinator as an *input*. - Port back to the tBTC monorepo signer alongside the next extraction sync. ## Tests Full suite: 245 passed, 0 failed; clippy and rustfmt clean. New conformance test exercises all ten cross-language vectors. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
…on; overflow can only park (#4029)

Stacked on #3866 (base: `feat/frost-schnorr-migration-scaffold`). Implements item 2 of the review feedback on the FROST/ROAST stack (verifiable blame, not counted blame).

## Problem

`NextAttempt` permanently excluded members on unverifiable observer counters (reject/conflict threshold 1, overflow 4) **summed across observers and across reasons**, so a single byzantine observer fabricating counters could permanently exclude any honest member, and by repetition grind the included set below threshold (`ErrAttemptInfeasible`). That inverts ROAST's robustness guarantee: the design's whole point is liveness with t honest of n, and the blame layer handed a liveness-and-membership veto to any single member. Bundle evidence (`OverflowEntry`/`RejectEntry`/`ConflictEntry`) is observer-signed *claims*: nothing in a counter lets a third party re-check that the accused misbehaved.

## What changed

**Accuser-quorum gate (`ExclusionAccuserQuorum`).** An accusation only produces action when made by at least `f+1 = groupSize − threshold + 1` distinct credible observers (f = the byzantine tolerance). At f+1, at least one accuser is honest under the protocol's own t-of-n assumption, so the group may act as if the fault were verified. Real faults reach quorum naturally (contributions are broadcast, so every honest member observes the same bytes), while f colluding members can never reach f+1 by fabrication. Production shape (n=100, t=51): quorum 50 vs. 49 worst-case byzantine.

**Counting hygiene.** Observers count once per accused per category regardless of claimed `Count` magnitude; reject reasons no longer multiply accusers; categories tally independently (reject + conflict claims no longer sum); only previous-`IncludedSet` members are credible accusers; accusations against non-original-set members are ignored.

**Overflow can never be permanent.** Transport pressure is observable only at the transport layer and can never be made self-incriminating. An *established* (quorum-corroborated) overflow accusation now parks the member for one attempt, with the same transient mechanics as silence parking, instead of excluding forever. **Sub-quorum claims are ignored entirely**, not parked: acting on a single unverifiable claim would let one byzantine observer impose an attempt of liveness cost on any honest member at will. Established reject/conflict accusations still exclude permanently, and the policy remains a pure deterministic function of `(prev, bundle, threshold)`.

## Why quorum rather than self-incriminating proofs now

The review's endgame is proof-carrying blame: the accused's own two operator-signed conflicting payloads (conflicts), or their signed contribution plus a re-checkable deterministic validation failure (rejects). That requires wire-format and verification-routine changes (the current bundle carries only counters). The quorum gate delivers the safety property (fabricated blame can never become permanent, and the grinding-to-infeasibility vector is closed) with no wire change, and is the correct *floor* even after proofs land (proof-verified entries can then bypass the quorum per category). RFC-21 Layer B now documents the policy, the rationale, and that roadmap; the residual cost (sub-quorum-observed faults burn retry attempts instead of excluding) is explicitly folded into the serial-attempt latency budget.

## Tests

New regression coverage: quorum boundary at f vs f+1 for both permanent categories; fabricated-blame grinding across six attempts (single byzantine accuser, max counters, honest members never move); count-magnitude fabrication; cross-category non-summing; reason non-multiplication; non-credible accusers; non-original accused; established-overflow park-and-reinstate cycle; production-shape quorum pin `(100, 51) = 50`. Existing overflow/categories/soak tests updated to the new semantics. `go test ./pkg/frost/... ./pkg/tbtc/...` passes; `go vet` clean.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
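The accuser-quorum gate and counting hygiene from #4029 can be sketched as follows. This is an illustrative reduction, not the PR's code: only the quorum formula `groupSize − threshold + 1` and the name `ExclusionAccuserQuorum` come from the description; the `Accusation` type, the `accusationKey` type, and `establishedAccusations` are hypothetical, and the original-set check on the accused is omitted for brevity.

```go
package main

import "fmt"

type MemberIndex uint8

type Category int

const (
	Reject Category = iota
	Conflict
	Overflow
)

// Accusation is one observer's claim (hypothetical shape). Count is carried
// on the wire but deliberately ignored by the tally below.
type Accusation struct {
	Observer MemberIndex
	Accused  MemberIndex
	Category Category
	Count    int
}

type accusationKey struct {
	Accused  MemberIndex
	Category Category
}

// exclusionAccuserQuorum returns f+1, where f = groupSize - threshold.
func exclusionAccuserQuorum(groupSize, threshold int) int {
	return groupSize - threshold + 1
}

// establishedAccusations returns the (accused, category) pairs backed by at
// least f+1 distinct credible observers. Each observer counts once per pair,
// and categories are tallied independently.
func establishedAccusations(accs []Accusation, credible map[MemberIndex]bool, groupSize, threshold int) map[accusationKey]bool {
	accusers := map[accusationKey]map[MemberIndex]bool{}
	for _, a := range accs {
		if !credible[a.Observer] {
			continue // only previous-included-set members are credible
		}
		k := accusationKey{a.Accused, a.Category}
		if accusers[k] == nil {
			accusers[k] = map[MemberIndex]bool{}
		}
		accusers[k][a.Observer] = true
	}
	quorum := exclusionAccuserQuorum(groupSize, threshold)
	out := map[accusationKey]bool{}
	for k, set := range accusers {
		if len(set) >= quorum {
			out[k] = true
		}
	}
	return out
}

func main() {
	fmt.Println("production quorum:", exclusionAccuserQuorum(100, 51))
	fmt.Println("n-of-n quorum:", exclusionAccuserQuorum(5, 5))
}
```

Under this sketch, a single observer repeating or inflating its claims never moves the tally past one accuser, which is the property the fabricated-blame regression tests are described as pinning.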
…x A) + cross-language conformance vectors (#4030)

Stacked on #3866 (base: `feat/frost-schnorr-migration-scaffold`). Implements item 3 of the review feedback (duplicated, divergent protocol constants), Go half; the Rust half is the paired PR stacked on #4005.

## Problem

The coordinator-shuffle seed derivation exists twice, in two languages, on two branches, with no single source of truth, and the two copies disagree (flagged in #4026):

| | seed | attempt numbering |
|---|---|---|
| Go RFC-21 layer | `fold(SHA256(KeyGroup ‖ SessionID ‖ MessageDigest))` | 0-based |
| Rust engine validation | `int64_be(MessageDigest[0..8])` (legacy `signingAttemptSeed` convention) | 1-based wire |

At Phase-7 wiring, every Go-derived attempt context would fail the Rust engine's strict-mode validation: a network-fracturing liveness failure that property tests on either side cannot catch.

## What this PR does (Go half)

1. **RFC-21 Annex A (normative)**: single normative definition of the derivation: inputs (including the exact `KeyGroupBytes` definition for `FrostTBTCSignerV1` material, namely the UTF-8 bytes of the hex key-group handle, treated opaquely), the 0-based composition with the two's-complement-wrapping addition, the `wire = AttemptNumber + 1` FFI mapping, and the accepted non-goals (unframed concatenation, first-8-byte fold, grindability bounds) with rationale. The Go derivation is adopted as normative: it binds key group + session + digest rather than the digest alone, and the live `pkg/tbtc` signing loop's legacy convention is explicitly documented as the thing Phase 7 migrates *from*.
2. **Generated conformance vectors**: `pkg/frost/roast/testdata/coordinator_seed_vectors.json` holds ten end-to-end vectors (folded seed int64 + selected coordinator) covering attempts 0/1/3/5/7, sparse and production-size (n=100) member sets, opaque key-group handles, and negative folded seeds. Regenerated from the deterministic input matrix via `ROAST_SEED_VECTORS_REGEN=1 go test -run TestRegenerateCoordinatorSeedVectors`: generation-from-spec rather than hand-pinning, per the review.
3. **Conformance test**: `TestCoordinatorSeedDerivation_ConformanceVectors` pins `DeriveAttemptSeed → foldAttemptSeed → SelectCoordinator` end to end against the file, asserts the wire-mapping invariant on every vector, and requires at least one negative-seed pin so an unsigned-integer port cannot pass.

The paired Rust PR switches the engine to this derivation (subtracting 1 from the wire attempt number before composition) and consumes a byte-identical copy of the vector file, so either side drifting fails its own CI rather than fracturing coordinator agreement in a mixed deployment.

No behavior change on the Go side: it was already normative-conformant; this PR makes that the *specified* behavior and pins it.

## Tests

`go test ./pkg/frost/...` passes; vectors verified present with 7 negative-seed pins out of 10.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
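The Annex A composition described in #4030 can be sketched roughly as below. This is a hedged illustration, not the normative code: the unframed concatenation, the first-8-byte fold, the wrapping addition, and `wire = AttemptNumber + 1` are taken from the description, while the big-endian byte order of the fold, the string type of the session ID, and the function names `deriveFoldedSeed`, `shuffleSource`, and `wireAttempt` are assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math"
)

// deriveFoldedSeed hashes the unframed concatenation
// KeyGroup || SessionID || MessageDigest and folds the first 8 bytes into an
// int64. Big-endian order is an ASSUMPTION of this sketch.
func deriveFoldedSeed(keyGroup, sessionID string, messageDigest [32]byte) int64 {
	h := sha256.New()
	h.Write([]byte(keyGroup)) // opaque UTF-8 bytes of the hex key-group handle
	h.Write([]byte(sessionID))
	h.Write(messageDigest[:]) // strict 32-byte digest
	sum := h.Sum(nil)
	return int64(binary.BigEndian.Uint64(sum[:8]))
}

// shuffleSource composes the folded seed with the 0-based attempt number.
// Go's signed addition wraps in two's complement at run time.
func shuffleSource(folded int64, attempt uint32) int64 {
	return folded + int64(attempt)
}

// wireAttempt maps the 0-based attempt number to the 1-based FFI encoding.
func wireAttempt(attempt uint32) uint64 {
	return uint64(attempt) + 1
}

func main() {
	var digest [32]byte // all-zero digest, for illustration only
	folded := deriveFoldedSeed("example-key-group-handle", "example-session", digest)
	fmt.Println("folded seed:", folded, "negative:", folded < 0)
	fmt.Println("wrapped:", shuffleSource(math.MaxInt64, 1) == math.MinInt64)
	fmt.Println("wire attempt for attempt 0:", wireAttempt(0))
}
```

The signed fold matters because negative seeds are possible; a port that treats the folded value as unsigned would disagree on those vectors, which is why the conformance test requires at least one negative-seed pin.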
…e coordinator shuffle (#4034) Stacked on #3866 (base: `feat/frost-schnorr-migration-scaffold`). Implements the review item "widen the Go↔Rust math/rand parity from a handful of pinned vectors to corpus-based differential fuzzing" — Go half; the Rust consumer is the paired PR stacked on #4005. ## What A generated 600-case differential corpus over `SelectCoordinator` (`testdata/coordinator_shuffle_corpus.json`, 176 KB), replayed by `TestCoordinatorShuffle_DifferentialCorpus` here and by the identical byte-for-byte copy in the Rust signer's `go_math_rand` tests: - **216 boundary cases**: seeds {0, ±1, `i64::MIN/MAX`, `MIN+3`/`MAX−3`, the #4026 pin seed and its negation} × attempts {0, 1, 7, `u32::MAX`} — exercising the two's-complement wrapping `seed + attempt` composition — × six member sets including unsorted and reversed inputs (pinning the internal sort both implementations perform). - **384 generated cases**: fixed-seed generator sweeping set sizes 1..255 (the full `group.MemberIndex` range), full-range `int64` seeds, and small/large/extreme attempt numbers. Regeneration is deterministic and gated (`ROAST_SHUFFLE_CORPUS_REGEN=1`), so the corpus provably comes from the documented case matrix rather than hand-pinning. This complements #4030's Annex-A seed-derivation vectors: those pin the *derivation* end-to-end on 10 vectors; this corpus stress-pins the *shuffle port itself* — the actual cross-language landmine — at volume, including the integer-boundary regions where a port diverges first. Not full continuous fuzzing (no coverage-guided harness); it's the pragmatic corpus-differential version that rides the existing unit-test CI on both sides at negligible cost. A coverage-guided Go-oracle harness can layer on later if desired. ## Tests `go test ./pkg/frost/roast/...` passes (corpus replay + regeneration roundtrip verified). 🤖 Generated with [Claude Code](https://claude.com/claude-code)
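The properties the differential corpus pins (an internal sort before the shuffle, and the wrapping `seed + attempt` composition) can be illustrated with a toy coordinator selector. This is only a sketch: the real `SelectCoordinator` may use different `math/rand` calls, and that exact call sequence is precisely what the corpus exists to pin. `selectCoordinator` here is a hypothetical stand-in.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// selectCoordinator sorts a copy of the member set, shuffles it with the
// legacy math/rand source seeded by foldedSeed + attempt (wrapping), and
// returns the first element. Hypothetical stand-in for SelectCoordinator.
func selectCoordinator(members []uint8, foldedSeed int64, attempt uint32) (uint8, bool) {
	if len(members) == 0 {
		return 0, false
	}
	sorted := append([]uint8(nil), members...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rng := rand.New(rand.NewSource(foldedSeed + int64(attempt)))
	rng.Shuffle(len(sorted), func(i, j int) {
		sorted[i], sorted[j] = sorted[j], sorted[i]
	})
	return sorted[0], true
}

func main() {
	inOrder, _ := selectCoordinator([]uint8{1, 2, 3, 4, 5}, 42, 0)
	reversed, _ := selectCoordinator([]uint8{5, 4, 3, 2, 1}, 42, 0)
	fmt.Println("input order is irrelevant:", inOrder == reversed)
}
```

Because the sort happens on a copy before any randomness is consumed, unsorted and reversed inputs select the same member, which mirrors the corpus's unsorted and reversed member-set cases.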
…chain timeouts (#4032)

Stacked on #4030 (base: `unify/roast-coordinator-seed-go-2026-06-11`), which is itself stacked on #3866, so the annex numbering (A: seed derivation, B: latency budget) lands in order. Implements item 4 of the review feedback (serial attempts vs real ROAST concurrency: "it should be a computed bound in the docs").

## What

Adds **RFC-21 Annex B (informative): serial-attempt latency budget vs on-chain timeouts**, the computed bound with current code and chain values, and an honest statement of which constraint actually binds:

- **Parameters**: 41 blocks (≈8.2 min) per attempt (`signingAttemptMaximumBlocks`), `signingAttemptsLimit = 5`, engine ROAST coordinator timeout 30 s (sub-dominant), n=100/t=51/f=49, `redemptionTimeout` 5 days, moving-funds timeouts 7 days.
- **Serial-delay arithmetic**: as deployed, ≤ ~41 min (5 attempts × ≈8.2 min) before the loop gives up (~175× inside the redemption timeout); the review's f·τ worst case at a hypothetical f+1 = 50-attempt limit is ~6.8 h (~17× inside). Serial latency comfortably fits the deadlines whenever attempts can succeed at all.
- **The honest caveat**: the binding liveness constraint is **all-honest-subset sampling**, not serial latency. Because the transitional finalize requires every included member to contribute, per-attempt success with f faulty members is `∏_{i=0}^{f−1} (49−i)/(100−i)`, which gives 0.49 / 0.24 / 0.11 / 0.025 for f = 1/2/3/5, so for f ≥ 3 the 5-attempt loop fails with better-than-even odds long before any timeout is approached. The existing `signingAttemptsLimit` rationale in `pkg/tbtc/node.go` explicitly assumes f ≤ 2; beyond that the backstops are operator-inactivity claims, redemption-timeout slashing, and wallet retirement, all outside the signing loop. (This profile is inherited from the tECDSA-era loop, not introduced by FROST.)
- **Codified recommendation**: before the ECDSA-retirement phases, adopt t-of-included finalize (first t responsive members; groundwork already in the signer's `true-late-t-of-n-finalize-considerations.md`) at minimum for redemption signings; treat bounded `n−t+1` concurrency as the follow-on; until then alert when observed attempt-failure rates imply f ≥ 3 behaviour.

Docs-only; no code change.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
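The Annex B sampling arithmetic can be reproduced with a few lines of Go. This sketch reads the product as running over i = 0..f−1 and treats attempts as independent draws, as the i.i.d. subset-sampling model in the annex does; the function names `perAttemptSuccess` and `loopFailure` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// perAttemptSuccess is the probability that a uniformly sampled t-member
// subset of n members contains none of the f faulty ones:
// prod_{i=0}^{f-1} (n-t-i)/(n-i).
func perAttemptSuccess(n, t, f int) float64 {
	if f > n-t {
		return 0 // more faulty members than excluded seats
	}
	p := 1.0
	for i := 0; i < f; i++ {
		p *= float64(n-t-i) / float64(n-i)
	}
	return p
}

// loopFailure is the probability that every one of `attempts` independent
// attempts fails.
func loopFailure(p float64, attempts int) float64 {
	return math.Pow(1-p, float64(attempts))
}

func main() {
	for _, f := range []int{1, 2, 3, 5} {
		p := perAttemptSuccess(100, 51, f)
		fmt.Printf("f=%d per-attempt success=%.3f, 5-attempt loop failure=%.3f\n",
			f, p, loopFailure(p, 5))
	}
}
```

With n=100 and t=51 the per-attempt figures round to the 0.49 / 0.24 / 0.11 / 0.025 values quoted in the annex, and the five-attempt failure probability crosses one half between f = 2 and f = 3.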
…ence Post-merge follow-up #4 from the June 2026 review stack: replace json.Marshal as the canonical signed-bytes encoding for evidence snapshots and transition bundles before Phase 7 wiring ossifies the format. Design - sign what you transmit, verify what you received: - new pkg/frost/roast/gen/pb/evidence.proto: a snapshot is SignedLocalEvidenceSnapshot{body, operator_signature} where body is the serialized LocalEvidenceSnapshotBody; a transition message is SignedTransitionMessage{body, coordinator_signature} whose body embeds every member's signed snapshot envelope VERBATIM (repeated bytes signed_snapshots) - producers marshal a body exactly once at signing time and cache it; parsed messages retain the received body/envelope bytes verbatim; verification always runs over exact received bytes and nothing in the evidence chain is ever re-encoded - signature validity never depends on any serializer's canonical form, across protobuf library versions or across languages (the Phase 7 Rust signer verifies and parses these same bytes) - CanonicalSnapshotBytes/CanonicalBundleBytes are replaced by SignableBytes() accessors; the coordinator's first-write-wins conflict comparison now compares exact signed bytes - Marshal of a received message returns the received envelope verbatim, so evidence bytes survive re-broadcast; wire-legal but non-canonical encodings also survive (pinned by test) - RFC-21 "Evidence message format" decision rewritten accordingly, with the rationale for retiring canonical JSON Tests: existing suite migrated off JSON fixtures (encode helpers bypass production signing to exercise structural rejection); new wire_test.go pins byte-preservation through re-broadcast, verbatim snapshot-envelope embedding in bundles, non-canonical-encoding survival, and tampered-body verification failure. go build ./..., go vet, gofmt clean; frost + tbtc package tests green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The image strips committed **/gen/**/*.go (.dockerignore) and regenerates protobufs in-image via make generate, which only sees the gen directories explicitly COPY'd before it runs. Without this line the new evidence.proto is absent at generation time and the later full COPY restores only the .proto (not the stripped .pb.go), so go mod tidy fails on the pkg/frost/roast/gen/pb import. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…o coupling Review follow-ups on #4040 (own findings; Codex and Gemini passes were clean): - SignableBytes/Marshal docs now state the returned slice is the internal cache and must not be mutated - in-tree callers are all read-only, this pins the contract for future ones - Makefile gen_proto comment points new proto packages at the Dockerfile gen-directory COPY allowlist, so the next proto package learns about the in-image regeneration coupling from the Makefile instead of from a CI failure Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ence (#4040) Post-merge follow-up **#4** from the June 2026 review stack (#4028–#4035): replace `json.Marshal` as the canonical signed-bytes encoding for evidence snapshots/bundles — explicitly scheduled to land **before Phase 7 wiring ossifies the format**. (Items 2 and 3 landed as #4036/#4037 on the mirror branch; this is the Go-side sibling on the scaffold branch.) ## Why now The RFC-21 Layer B evidence signatures were computed over canonical JSON. That byte stability is a Go-implementation accident — field-order-stable `encoding/json` output — not a portable contract. The moment Phase 7 wires evidence verification into the Rust signer (or any second implementation appears), every verifier would need to replicate Go's exact JSON emission. No persisted or cross-component evidence exists yet, so the format can still change for free. ## Design: sign what you transmit, verify what you received New `pkg/frost/roast/gen/pb/evidence.proto`: - A snapshot travels as `SignedLocalEvidenceSnapshot{body, operator_signature}` where `body` is the serialized `LocalEvidenceSnapshotBody` — the operator signs those exact bytes. - A transition message travels as `SignedTransitionMessage{body, coordinator_signature}` whose `TransitionMessageBody` embeds every member's signed snapshot envelope **verbatim** (`repeated bytes signed_snapshots`) — the coordinator attests to the exact signed snapshots it assembled, in order. - Producers marshal a body **exactly once**, at signing time, and cache it; parsed messages retain received body/envelope bytes verbatim; verification always runs over exact received bytes. **Nothing in the evidence chain is ever re-encoded**, so signature validity never depends on any serializer's canonical form — across protobuf library versions or across languages. This deliberately sidesteps protobuf's own caveat that deterministic serialization is not canonical across implementations. 
- `Marshal` of a received message returns the received envelope verbatim — evidence bytes survive re-broadcast, including wire-legal but non-canonical encodings (pinned by a handcrafted reversed-field-order test). - `CanonicalSnapshotBytes`/`CanonicalBundleBytes` → `SignableBytes()` accessors; the coordinator's first-write-wins conflict check now compares exact signed bytes. ## Tests - Existing suite migrated off JSON fixtures: test-only encode helpers bypass production signing so every structural-rejection path (zero sender, bad hash length, unsorted/duplicate entries, oversize caps, bundle ordering/hash-binding) is still exercised at the wire level. - New `wire_test.go` pins the format's core properties: byte-preservation through unmarshal→re-marshal, verbatim snapshot-envelope embedding inside bundle bodies, producer-signed bytes == receiver-verified bytes, non-canonical-encoding survival, tampered-body verification failure. - `go build ./...`, `go vet`, `gofmt` clean; frost + tbtc package tests green. Generated with protoc 33.4 / protoc-gen-go v1.36.3 (matches the go.mod protobuf runtime v1.36.3). ## Docs RFC-21 "Evidence message format" decision rewritten: signed-body protobuf envelopes, with the retirement rationale for canonical JSON recorded. ## Notes for reviewers - The in-memory model types (`LocalEvidenceSnapshot`, `TransitionMessage`) are unchanged apart from two unexported byte caches; all call sites kept their shapes. - Immutability contract: evidence fields must not be mutated after `SignableBytes()` is first computed (documented on the cache fields); the aggregation flow already treats snapshots as immutable post-receipt. - Phase 7 cross-language note: the Rust signer will verify operator/coordinator signatures over `body` bytes and parse them with any protobuf implementation — no canonicalization requirements transfer. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
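The "sign what you transmit, verify what you received" rule from #4040 can be illustrated without protobuf. The sketch below is an assumption-laden stand-in: it uses Ed25519 from the Go standard library in place of the operator signature scheme, and a plain struct in place of `SignedLocalEvidenceSnapshot`; `Envelope`, `signBody`, and `newKeyPair` are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// Envelope carries the exact body bytes that were signed, plus the signature.
// (Stand-in for a signed-body envelope; not the PR's protobuf type.)
type Envelope struct {
	Body      []byte
	Signature []byte
}

func newKeyPair() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// signBody signs the body bytes exactly once; the body is never re-encoded.
func signBody(priv ed25519.PrivateKey, body []byte) Envelope {
	b := append([]byte(nil), body...)
	return Envelope{Body: b, Signature: ed25519.Sign(priv, b)}
}

// Verify checks the signature over the received body bytes verbatim.
func (e Envelope) Verify(pub ed25519.PublicKey) bool {
	return ed25519.Verify(pub, e.Body, e.Signature)
}

func main() {
	pub, priv := newKeyPair()
	env := signBody(priv, []byte(`{"a":1,"b":2}`))
	fmt.Println("verbatim body verifies:", env.Verify(pub))

	// A semantically equal but differently ordered encoding is a different
	// byte string, so it does not verify under the original signature.
	reencoded := Envelope{Body: []byte(`{"b":2,"a":1}`), Signature: env.Signature}
	fmt.Println("re-encoded body verifies:", reencoded.Verify(pub))
}
```

The second check is the failure mode that verbatim byte retention avoids: a verifier that parsed and re-serialized the body before checking the signature would depend on every implementation producing identical bytes.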
Go-host adoption of frost_tbtc_init_signer_config (the Rust side landed in keep-core#4037): hosts can now install the signer's operational configuration at startup instead of exporting ~40 TBTC_SIGNER_* process environment variables. - InstallNativeTBTCSignerConfig passes operator JSON through verbatim; the Rust signer owns the schema and validation (unknown fields rejected, policy combinations validated at install, environment ignored wholesale once installed). The C wrapper follows the existing per-call dlsym pattern, so a loaded library that predates the symbol degrades to the established ErrNativeCryptographyUnavailable classification instead of failing the link. - TBTC_SIGNER_INIT_CONFIG_PATH wires it into native engine registration: when set, the config file is installed BEFORE the engine registers and any failure (unreadable file, validation rejection, or a library without the symbol) fails registration closed - setting the path is an explicit demand for config-mode operation. Unset, registration proceeds on the environment-fallback path exactly as today. - one pointer variable replaces forty value variables; secrets stay on the dedicated env/command key-provider channel. Verified under all three build shapes: default tags (stub + unit test), frost_native only (wiring against the stub), and frost_native+frost_tbtc_signer+cgo (real bridge; compile+vet - running the bridge tests requires the built signer library, which lives on the mirror branch). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…antics Review follow-ups on #4041 (own findings; Codex and Gemini passes were clean): - installConfiguredTBTCSignerInitConfig now logs at error level with config-mode context on every failure path, in addition to the registration layer's generic warning: an operator who explicitly set TBTC_SIGNER_INIT_CONFIG_PATH gets an unmissable signal, not one generic warn line - the doc comment states the precise degradation semantics: the process keeps running on the legacy bridge with FROST operations unavailable (the registration layer deliberately never crashes the binary), so a misconfigured signer can never execute FROST operations but the binary does not exit - TBTCSignerInitConfigPathEnv doc advises restricting the config file's permissions (it may carry the state_key_command execution spec) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Go-host adoption of `frost_tbtc_init_signer_config` — the companion to keep-core#4037 (which added the init-time config FFI to the Rust signer on the mirror branch). Hosts can now install the signer's operational configuration once at startup instead of exporting ~40 `TBTC_SIGNER_*` process environment variables. ## What this adds - **`InstallNativeTBTCSignerConfig(configJSON)`** — passes operator JSON through verbatim; the Rust signer owns the schema and all validation (unknown fields rejected, enforcement-gated policy combinations validated at install with rollback, environment ignored wholesale once installed, idempotent re-install by fingerprint). Go stays a transport: no schema duplication, no drift surface. - **`TBTC_SIGNER_INIT_CONFIG_PATH`** — when set, the JSON file is installed during native FROST engine registration, *before* any other signer call. Setting the path is an explicit demand for config-mode operation, so every failure — unreadable file, validation rejection, or a loaded signer library that predates the symbol — **fails registration closed**. Unset, registration proceeds on the environment-fallback path exactly as today (zero behavior change for existing deployments). Net ops effect: one pointer variable replaces forty value variables; secrets stay on the dedicated env/command key-provider channel. ## Cross-branch compatibility (why this is safe to land now) The bridge resolves every symbol per call via `dlsym(RTLD_DEFAULT, …)` with graceful degradation — the established pattern in this file. 
A signer library **without** the new symbol returns the existing `ErrNativeCryptographyUnavailable` classification rather than failing the link, so: - path unset + old library → status quo; - path unset + new library → status quo (env fallback, with the signer's own one-time production warning suggesting the init FFI); - path set + new library → config installed, logged with fingerprint/key-count/idempotency; - path set + old library → registration fails closed with a precise error (operator demanded config mode; the deployment is wrong). ## Build-shape verification Three configurations verified: default tags (stub + unit test asserting the unavailable classification), `frost_native` alone (the registration wiring compiles against the stub), and `frost_native && frost_tbtc_signer && cgo` (real cgo bridge, compile + vet — running bridge tests requires the built signer library, which lives on the mirror branch until promotion). `gofmt`, `go vet`, and the full `pkg/frost` test suite are clean under default tags. ## Out of scope keep-client TOML config plumbing (a first-class config section feeding this) — deliberately left for the team's config-schema conventions; the file-pointer wiring gives ops a complete adoption path today without touching the client config surface. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
…oints Implements the binding retention condition attached to the deferral of proof-carrying blame (follow-up item 7, decision 2026-06-12): telemetry and logging must keep enough signed bytes to diagnose whether targeted equivocation is occurring, so the production revisit has data. - new EquivocationEvidence events carry the exact signed snapshot envelopes (SignedLocalEvidenceSnapshot wire bytes verbatim) behind each detection: snapshot_conflict (first-write-wins re-submission mismatch at the coordinator - two operator-signed bodies from the same sender for the same attempt are self-incriminating), own_snapshot_mutated_in_bundle, and own_snapshot_missing_from_bundle - every event is logged in full (rare events; the bytes ARE the diagnosis) and forwarded to a process-wide observer hook following the existing single-observer telemetry pattern, so the host can retain evidence in its telemetry system - emission is additive on the existing error paths and never perturbs them; envelope encoding failures degrade to nil fields with a log - cross-member equivocation comparison (receiver checking a bundle's snapshot for sender X against X's direct broadcast) deliberately remains item-7 scope; these are the detection points that exist today Tests pin byte-exact envelope retention for all three kinds and that idempotent identical re-submission emits nothing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Self-review finding: the snapshot_conflict emit ran inside RecordEvidence while the coordinator state mutex was held, so a registered observer (host telemetry, possibly a blocking write) would stall every concurrent RecordEvidence/AggregateBundle on that coordinator. The other two emit sites (verifyOwnObservationsPresent) are already lock-free. The evidence value is now materialized under the lock (bytes copied as before) into a local, and a deferred closure - registered before the unlock defer so it runs after it - emits once c.mu is released. Emission reads only the copied bytes, so nothing touches coordinator state after unlock. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
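The defer ordering described in this fix relies on Go running deferred calls last-in, first-out. A minimal sketch of the pattern follows, assuming a simplified coordinator; the `Coordinator`, `Evidence`, and `NewCoordinator` definitions are hypothetical and much smaller than the real types.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Evidence holds copies of both conflicting envelopes (simplified shape).
type Evidence struct {
	Kind   string
	First  []byte
	Second []byte
}

type Coordinator struct {
	mu       sync.Mutex
	first    map[uint8][]byte
	observer func(Evidence) // set once at construction, never mutated
}

func NewCoordinator(observer func(Evidence)) *Coordinator {
	return &Coordinator{first: map[uint8][]byte{}, observer: observer}
}

// RecordEvidence applies first-write-wins and reports a conflict. The emit
// closure is deferred BEFORE the unlock, so it runs AFTER the unlock (LIFO).
func (c *Coordinator) RecordEvidence(sender uint8, envelope []byte) bool {
	var pending *Evidence
	c.mu.Lock()
	defer func() {
		if pending != nil && c.observer != nil {
			c.observer(*pending) // runs with c.mu already released
		}
	}()
	defer c.mu.Unlock()

	prev, seen := c.first[sender]
	if !seen {
		c.first[sender] = append([]byte(nil), envelope...)
		return false
	}
	if bytes.Equal(prev, envelope) {
		return false // idempotent identical re-submission emits nothing
	}
	// Materialize the evidence under the lock from copied bytes only.
	pending = &Evidence{
		Kind:   "snapshot_conflict",
		First:  append([]byte(nil), prev...),
		Second: append([]byte(nil), envelope...),
	}
	return true
}

func main() {
	c := NewCoordinator(func(ev Evidence) { fmt.Println("observed:", ev.Kind) })
	c.RecordEvidence(1, []byte("a"))
	fmt.Println("conflict:", c.RecordEvidence(1, []byte("b")))
}
```

A blocking observer in this arrangement delays only the caller that detected the conflict, not other goroutines waiting on the coordinator mutex.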
…al cache Two P2 findings from Codex's re-review of the equivocation evidence path: - observer panics could escape RecordEvidence/verifyOwnObservationsPresent and abort the protocol path - contradicting emitEquivocationEvidence's own "never fails" contract. The observer call is now wrapped in recover-and-log. - snapshotEnvelopeForEvidence handed the observer the slice returned by Marshal, which is the snapshot's internal wire-envelope cache (the same must-not-mutate contract this stack pinned in the #4040 envelope work). An observer that retained and mutated it would corrupt the cached signed bytes used by later bundle aggregation. It now returns a defensive copy. Regression tests: a panicking observer still yields ErrSnapshotConflict from the protocol path; mutating the evidence envelope bytes leaves the snapshot's cached Marshal output intact. Race detector clean. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
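Both fixes in this commit (recover-and-log around the observer, and handing out a defensive copy instead of the internal cache) fit in one small helper. The sketch below is illustrative only; `emitSafely` is a hypothetical name and the real emit path carries structured evidence rather than a bare byte slice.

```go
package main

import (
	"fmt"
	"log"
)

// emitSafely hands the observer a private copy of the cached signed bytes
// and contains any observer panic so the protocol path is never aborted.
// (Hypothetical helper; simplified from the behaviour described above.)
func emitSafely(observer func([]byte), cached []byte) {
	if observer == nil {
		return
	}
	defer func() {
		if r := recover(); r != nil {
			log.Printf("equivocation observer panicked: %v", r)
		}
	}()
	observer(append([]byte(nil), cached...)) // defensive copy, not the cache
}

func main() {
	cached := []byte("signed-envelope-bytes")

	// An observer that mutates what it receives cannot corrupt the cache.
	emitSafely(func(b []byte) { b[0] = 'X' }, cached)
	fmt.Println("cache intact:", string(cached) == "signed-envelope-bytes")

	// An observer that panics does not take the caller down with it.
	emitSafely(func([]byte) { panic("telemetry backend unavailable") }, cached)
	fmt.Println("caller still running")
}
```

The copy costs one allocation per event, which is acceptable here because these events are described as rare.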
…oints (#4044)

Implements the binding retention condition from today's decision log (PR #4043): proof-carrying blame (follow-up item 7) is deferred until production, **provided** telemetry/logging retain enough signed bytes to diagnose whether targeted equivocation is occurring; otherwise the revisit condition lacks data.

## What this adds

`EquivocationEvidence` events carrying the **exact signed snapshot envelopes** (wire bytes verbatim; the #4040 format makes these available at every detection point) for the three detections that exist today:

- `snapshot_conflict`: a sender re-submits a *different* signed snapshot for the same attempt to the coordinator. Both envelopes are retained; two operator-signed bodies from the same sender for the same attempt are self-incriminating, which is exactly the substrate item 7's wire format will formalize.
- `own_snapshot_mutated_in_bundle`: a bundle carries this member's snapshot with a signature that differs from what it submitted (both envelopes retained).
- `own_snapshot_missing_from_bundle`: censorship detection (self envelope retained).

Each event is logged in full (these are rare, and the bytes are the diagnosis) and forwarded to a process-wide observer hook following the repo's existing single-observer telemetry pattern, so hosts can persist evidence into their telemetry stack. Emission is purely additive on the existing error paths: encode failures degrade to nil fields with a log line, never perturbing the protocol path.

## Deliberately out of scope

Cross-member comparison (a receiver checking a bundle's snapshot for sender X against X's direct broadcast): that's item 7 proper. These are the detection points that exist today, instrumented so the production deferral is honest.

Tests pin byte-exact envelope retention for all three kinds, and that an idempotent identical re-submission emits nothing. `go build ./...`, vet, gofmt clean; full frost suite green.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
…erting

Annex B requires alerting when observed signing-attempt failure rates imply f >= 3 behaviour - the regime where the serial retry loop's 5-attempt budget fails with better-than-even odds and the structural fix (Phase-7 t-of-included finalize) does not exist yet. This is the interim measure scheduled in the gates-doc decision log.

The signing retry loop reports every terminal attempt outcome (minority readiness after completed announcement, failed protocol run, failed done-check exchange, or success) through an optional reporter; mechanical iterations that never sample the group (block-timing skips, local announcement errors, context cancellation) are deliberately not reported. Exactly one local signer per wallet reports, so a node records one observation per network-wide attempt.

Outcomes feed a process-wide rolling window (50 attempts, shared across wallets via the PerformanceMetrics instance) exporting three gauges: signing_attempt_rolling_success_rate, signing_attempt_rolling_sample_count, and signing_attempt_implied_f_alert (1 when the full window's rate is below 0.14, between the f=2 expectation 0.238 and the f=3 expectation 0.114 of the Annex B sampling model; ~4% false-alarm at f=2, ~64% detection at f=3 per independent window).

Operator guidance and threshold-tuning caveats are documented in the rollout adoc; Annex B now points at the implementation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Decision (2026-06-12): setting TBTC_SIGNER_INIT_CONFIG_PATH demands config-mode FROST operation, so any state in which the FROST-native engine does not come up under a set path is process-fatal in every profile and build flavor, replacing the continue-on-legacy-bridge degradation. The enforcement runs at the end of RegisterNativeExecutionAdapterForBuild and covers the whole failure family: config-install failure, engine-registration failure after a successful install, and a binary built without frost_native (which can never honor the demand). Env-fallback mode (path unset) keeps the safe-by-default degrade posture unchanged. The checks are positive (native adapter + FFI executor actually registered), not merely error-presence, because later registration legs reset LastNativeRegistrationError and could mask an earlier failure. Fatality is deliberately not profile-conditional: an unreadable config file cannot reveal its profile and a missing profile means production (production-by-omission), so path-set is the only non-circular trigger. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Phase 6's call-site migration is complete and the RFC now says so with the honest accounting: the one receive loop with a production consumer (collectBuildTaggedTBTCSignerRoundContributionMessages) is fully migrated - registry-sourced EvidenceRecorder, attemptContextHash binding, end-of-collect snapshot submission into Coordinator.RecordEvidence - while the other two loops named by the original plan (collectNativeFROSTRoundOneMessages/RoundTwoMessages) were deleted with the unreachable generic two-round exchange in commit 1c692bf, so the single-coordinated-change constraint is satisfied by subtraction. The surrounding orchestration (Phase 6.3 executor-adapter entry, error taxonomy, Phase 7.1 bundle production, Phase 7.2 bundle-consuming selector) shipped with it. The two open exit items (legacy evaluator deletion, build-tag removal) are explicitly re-stated as gated on Phase 7's manifest flip, with the legacy evaluator's surviving caller identified as the deliberate rollback path.

The FROST readiness manifest gets its home in keep-core (docs/development/frost-readiness-manifest.adoc): it was planned for the tBTC monorepo's docs/operations/ directory, but the monorepo signer is retired and keep-core is canonical, so Phase 7's flip target now exists in this repository. Both gates (ROAST retry, transition evidence) are recorded missing-no-go with explicit flip conditions: Phase 6 shipped (met), real-testnet integration run (pending), verified FrostUniFFIV1 migration (pending). The rollout doc and the RFC's stale receive-loop drop-site references are repointed accordingly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Releasing the mutex before the SetGauge calls let two concurrent recordings (wallet executors share one tracker) publish out of order, leaving a stale rate or alert value standing until the next attempt. The tracker.mutex -> gauge-lock nesting is the only acquisition order between the two locks (the recorder never calls back into the tracker), so holding the mutex across publication is deadlock-free. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ObserveApplicationSource prepends the "performance_" application prefix to every gauge it registers, so the series operators query are performance_signing_attempt_*; the rollout doc and the Annex B pointer named the bare source keys, which would have made documented alert queries silently match nothing (Codex/Gemini review finding). The const block now documents the source-key-vs-exported-name distinction so the mistake is not repeated. Also from review: the attempt-outcome exclusion list names the members-selection error path (loop-terminal, not attempt-terminal), and the tracker constructor clamps minimumSamples to the window size so the alert cannot become silently unreachable if the constants ever diverge. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
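The two guards this commit describes are small enough to sketch directly. The helper names below are hypothetical and do not reproduce the registry's or the tracker constructor's real signatures; only the `performance_` prefix and the clamp-to-window rule come from the message above.

```go
package main

import "fmt"

// applicationPrefix mirrors the "performance_" prefix named in the commit.
const applicationPrefix = "performance_"

// exportedName is a hypothetical helper mapping a bare source key to the
// series name operators actually query.
func exportedName(sourceKey string) string {
	return applicationPrefix + sourceKey
}

// clampMinimumSamples keeps the alert reachable: a window can never hold
// more than windowSize samples, so a larger minimum would never be met.
func clampMinimumSamples(minimumSamples, windowSize int) int {
	if minimumSamples > windowSize {
		return windowSize
	}
	return minimumSamples
}

func main() {
	fmt.Println(exportedName("signing_attempt_implied_f_alert"))
	// prints: performance_signing_attempt_implied_f_alert
	fmt.Println(clampMinimumSamples(80, 50)) // prints: 50
}
```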
When an earlier registration leg fails and a later leg succeeds, the later leg's success overwrites the recorded error, so the registration-incomplete fatal message cannot name the cause; direct the operator to the warnings emitted at failure time instead (review finding on the leg-ordering diagnostics gap). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
## Decision being implemented

Team decision (2026-06-12, recorded in the Phase 5 gates-doc Decision Log on the mirror branch): setting `TBTC_SIGNER_INIT_CONFIG_PATH` demands config-mode FROST operation, so **any state in which the FROST-native engine did not come up under a set path is process-fatal**, in every profile and build flavor. This replaces #4041's continue-on-the-legacy-bridge degradation and resolves the open team question posted there.

Why fatal, in one paragraph: this code ships to production only when FROST is a production duty, so "running but FROST-dead" is the dangerous silent state: a half-alive node erodes FROST wallet fault budgets invisibly, while threshold redundancy is designed to absorb loud, full, bounded outages. Fatality is not profile-conditional because an unreadable config file cannot reveal its profile and a missing profile means production (production-by-omission); path-set is the only non-circular trigger. Uniform semantics also mean testnet rehearses exactly what production will do.

## What this covers (the whole failure family)

`enforceNativeInitConfigDemand` runs at the end of `RegisterNativeExecutionAdapterForBuild` and terminates on:

- config-install failure (unreadable file, parse/validation rejection, init-time policy or attestation-gate failure, library predating the init symbol),
- engine-registration failure after a successful install,
- a binary built without `frost_native` (can never honor the demand).

The checks are **positive** (native adapter + FFI executor actually registered under the mutex), not merely error-presence: later registration legs reset `LastNativeRegistrationError`, so a nil error does not prove bring-up. Env-fallback mode (path unset) keeps the safe-by-default degrade posture, byte-for-byte.
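The positive-check idea can be illustrated with a minimal sketch. Everything below is hypothetical (the `registrationState` type, its fields, `fatalExit`, and the message texts); it assumes only what the description states: an unset or whitespace path is never fatal, and under a set path the decision keys on what actually registered, not on whether an error is currently recorded.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// registrationState is a hypothetical snapshot of what actually registered.
type registrationState struct {
	builtWithFrostNative bool
	adapterRegistered    bool
	executorRegistered   bool
	lastError            error // may have been reset by a later registration leg
}

// fatalExit is a hypothetical seam so tests can observe the abort.
var fatalExit = func(msg string) {
	fmt.Fprintln(os.Stderr, msg)
	os.Exit(1)
}

// enforceInitConfigDemand aborts when a config path is set but the native
// engine did not fully come up. A nil lastError is not treated as proof of
// bring-up; only the positive registration flags are.
func enforceInitConfigDemand(configPath string, st registrationState) {
	if strings.TrimSpace(configPath) == "" {
		return // env-fallback mode keeps the degrade posture
	}
	switch {
	case !st.builtWithFrostNative:
		fatalExit("config path set, but this binary was built without frost_native")
	case !st.adapterRegistered || !st.executorRegistered:
		msg := "config path set, but FROST-native registration is incomplete"
		if st.lastError != nil {
			msg += ": " + st.lastError.Error()
		} else {
			msg += "; see the warnings emitted at failure time"
		}
		fatalExit(msg)
	}
}

func main() {
	enforceInitConfigDemand(os.Getenv("TBTC_SIGNER_INIT_CONFIG_PATH"), registrationState{
		builtWithFrostNative: true,
		adapterRegistered:    true,
		executorRegistered:   true,
	})
	fmt.Println("native engine up, or config mode not demanded")
}
```

Routing the abort through a replaceable function value is one way to let tests observe the fatal path without terminating the test binary.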
## Verification

- 7 new tests (untagged, run in default CI) pin: no fatal with path unset/whitespace even under registration errors; fatal with cause context on recorded errors; flavor-aware wrong-binary vs incomplete-bringup messages; partial registration (executor without adapter) still fatal; fully-registered no-fatal; the registration entry point fires enforcement in every build flavor.
- `gofmt` clean; `go vet` under all three tag shapes (default, `frost_native`, `frost_native,frost_tbtc_signer`); full `./pkg/frost/...` suite green.
- The fatal seam (`fatalNativeRegistrationExit`) exists only so tests can observe the abort; production never overrides it.

## Notes for reviewers

- A non-node binary (tooling, other packages' test binaries) that imports `pkg/frost/signing` with the env var exported and unsatisfiable will now die at init. That is the decided semantics: the variable demands FROST; a binary that cannot honor it says so loudly. CI does not set the variable.
- Mirror-side docs PR records the decision in the gates-doc Decision Log and adds runbook prerequisite 7 (canary config pushes node-by-node; attestation-rotation cadence becomes enforced-by-restart-failure).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
)

## What

Implements the interim liveness measure RFC-21 Annex B requires: "alerting should fire when observed attempt failure rates imply f ≥ 3 behaviour", the regime where the serial retry loop's 5-attempt budget fails with better-than-even odds and the structural fix (Phase-7 t-of-included finalize, gates-doc decision 5) does not exist yet. Scheduled as the pre-canary alerting item in the decision memo / gates doc.

## How

- **Reporter in the retry loop** (`pkg/tbtc/signing_loop.go`): every terminal attempt outcome is reported: minority readiness after a completed announcement, failed protocol run, failed done-check exchange, or success. Mechanical iterations that never sample the group (block-timing skips, local announcement errors, context cancellation, loop-terminal members-selection errors) are deliberately excluded so the rate feeding the Annex B sampling model is not diluted by local noise.
- **One observation per network attempt**: only the first local signer per wallet reports (all local signers of a wallet observe the same attempts).
- **Process-wide rolling window** (`pkg/clientinfo/signing_liveness.go`): 50 outcomes, shared across wallets via the `PerformanceMetrics` instance; three gauges registered at boot. The metrics registry prepends the `performance_` application prefix, so the **exported series operators must query** are:
  - `performance_signing_attempt_rolling_success_rate`
  - `performance_signing_attempt_rolling_sample_count`
  - `performance_signing_attempt_implied_f_alert`: 1 when the full window's rate < 0.14, between the f=2 expectation (0.238) and f=3 expectation (0.114) of the Annex B table (n=100, t=51). Per independent window: ~4% false-alarm at true f=2, ~64% detection at f=3, ~97% at f≥4; the staggered-silence profile (deterministic attempt failure) is detected with certainty.
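The quoted expectations can be cross-checked with a short calculation. The sketch below assumes, as an interpretation rather than a quotation of Annex B, that each attempt samples t of n members uniformly and succeeds only when none of the f faulty members is included, giving a success probability of C(n-f, t) / C(n, t). It also treats the 50 outcomes in a window as independent. The function names are hypothetical, and the binomial tail is only a rough cross-check of the quoted percentages, not a reproduction of how they were derived.

```go
package main

import (
	"fmt"
	"math"
)

// attemptSuccess returns C(n-f, t) / C(n, t), the probability that a
// uniformly sampled t-subset of n members contains none of f faulty ones,
// computed as the product of (n-t-i)/(n-i) for i in [0, f).
func attemptSuccess(n, t, f int) float64 {
	p := 1.0
	for i := 0; i < f; i++ {
		p *= float64(n-t-i) / float64(n-i)
	}
	return p
}

// binomial returns C(n, k) as a float64.
func binomial(n, k int) float64 {
	c := 1.0
	for i := 1; i <= k; i++ {
		c = c * float64(n-k+i) / float64(i)
	}
	return c
}

// alertProbability returns P(successes <= maxSuccesses) over a window of
// independent attempts with per-attempt success probability p.
func alertProbability(window, maxSuccesses int, p float64) float64 {
	total := 0.0
	for k := 0; k <= maxSuccesses; k++ {
		total += binomial(window, k) *
			math.Pow(p, float64(k)) * math.Pow(1-p, float64(window-k))
	}
	return total
}

func main() {
	const n, t, window = 100, 51, 50
	// A rate below 0.14 over 50 attempts means at most 6 successes.
	const maxSuccesses = 6
	for f := 2; f <= 4; f++ {
		p := attemptSuccess(n, t, f)
		fmt.Printf("f=%d: per-attempt success %.3f, window alert probability %.3f\n",
			f, p, alertProbability(window, maxSuccesses, p))
	}
}
```

Under these assumptions the per-attempt success rates come out at 0.238 for f=2 and 0.114 for f=3, matching the expectations quoted above.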
## Operator guidance

New rollout-doc section documents the exported gauge names, the alert-on-sustained-value caveat (the window slides per attempt, consecutive scrapes are correlated), and the threshold-tuning instruction once the Phase 5 baseline calibration worksheet records the benign failure rate. Annex B now points at the implementation.

## Review follow-ups applied

- `d119fbd7a`: gauges published under the tracker mutex; concurrent recordings could publish out of order and leave a stale rate/alert standing until the next attempt.
- `93b37d3e8`: docs use the exported `performance_`-prefixed series names (Codex/Gemini finding: the bare source keys would have made documented alert queries silently match nothing); const block documents the source-key-vs-exported-name distinction; constructor clamps `minimumSamples` to the window size so the alert cannot become silently unreachable.

## Verification

- Tracker unit tests: below-minimum-samples no-alert, alert below threshold, exact-threshold boundary (7/50 = 0.14 does not fire; IEEE 754 correctly-rounded division makes this deterministic), window rollover/eviction.
- Retry-loop test pins the reported outcome sequence `[false, true]` for a failed-then-successful attempt using the existing mock harness.
- `gofmt`/`go vet` clean (default + `frost_roast_retry` shapes); full `pkg/tbtc` (146s) + `pkg/clientinfo` suites green.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…in keep-core (#4047)

## What

Two things Phase 6 needed that were not code:

1. **The RFC's Phase 6 section now records reality.** The plan named three receive loops to migrate "in a single coordinated change." The tree evolved under the plan: `collectBuildTaggedTBTCSignerRoundContributionMessages`, the only loop with a production consumer, is fully migrated (registry-sourced `EvidenceRecorder`, `attemptContextHash` binding, end-of-collect snapshot submission into `Coordinator.RecordEvidence`, all verified in code), while `collectNativeFROSTRoundOneMessages`/`RoundTwoMessages` were **deleted** with the unreachable generic two-round exchange (commit `1c692bf03`, 2026-06-07); uniffi-v1 payloads run on the legacy bridge and tbtc-signer-v1 uses the coarse flow, so nothing reached them. The shared-attempt-context constraint is satisfied by subtraction. The surrounding orchestration (Phase 6.3 executor-adapter entry, static-vs-runtime error taxonomy, Phase 7.1 bundle production, Phase 7.2 bundle-consuming selector) shipped with it. The two open exit items (legacy `EvaluateRetryParticipantsForSigning` deletion; `frost_roast_retry` tag removal) are restated as gated on Phase 7's manifest flip, with the legacy evaluator's surviving caller identified as the deliberate rollback path.
2. **The readiness manifest gets a home in the canonical repo** (`docs/development/frost-readiness-manifest.adoc`). It was planned for the tBTC monorepo's `docs/operations/` directory; the monorepo signer is retired and keep-core is canonical, so Phase 7's flip target now actually exists here. Both gates (ROAST retry, transition evidence) recorded `missing-no-go` with explicit flip conditions: Phase 6 shipped (met), real-testnet integration run (pending), verified FrostUniFFIV1 migration (pending). Update discipline restated: flips only with attached evidence.

Also fixes the RFC's stale receive-loop drop-site references and repoints the rollout doc at the new manifest.
## Why now

Phase 7's only remaining inputs are operational (testnet run, V1-migration verification), not code. Leaving Phase 6 documented as future work is exactly the claims-drift failure mode prior reviews of this stack flagged; this PR closes it with commit-level evidence.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Current State (as of 2026-05-17)

This draft PR is the umbrella readiness branch for feat/frost-schnorr-migration-scaffold. It is being kept current with main so it can become a direct merge target if the FROST/ROAST stack is approved for activation. It remains in draft until the remaining phase-gate, governance, and cross-repository readiness items are closed.

Canonical Status Sources

- docs/frost-migration/external-repository-tracking.md (intlabs-xyz/tbtc)
- docs/reviews/frost-roast-production-readiness-2026-05-16.md (intlabs-xyz/tbtc)

Latest Refresh
- main into this branch.
- frost_native.

Remaining Cross-Repo Closure Items
Notes