
FROST/ROAST readiness branch #3866

Draft
mswilkison wants to merge 283 commits into main from feat/frost-schnorr-migration-scaffold

@mswilkison commented Feb 19, 2026

Contributor

Current State (as of 2026-05-17)

This draft PR is the umbrella readiness branch for feat/frost-schnorr-migration-scaffold.
It is being kept current with main so it can become a direct merge target if the FROST/ROAST stack is approved for activation.

It remains in draft until the remaining phase-gate, governance, and cross-repository readiness items are closed.

Canonical Status Sources

  • Cross-repo migration tracker: docs/frost-migration/external-repository-tracking.md (in tlabs-xyz/tbtc)
  • Companion tBTC umbrella draft: https://github.com/tlabs-xyz/tbtc/pull/10
  • Latest readiness audit: docs/reviews/frost-roast-production-readiness-2026-05-16.md (in tlabs-xyz/tbtc)

Latest Refresh

  • Merged current main into this branch.
  • Local verification passed for the FROST signing package and tBTC signer backend paths, with and without frost_native.
  • Local verification also passed the native TBTC signer-path tests covering the FFI signing primitive and signing executor.

Remaining Cross-Repo Closure Items

  • Wait for CI from the latest refresh to complete.
  • Capture the first post-fix funded nightly live run artifact for Phase 4.
  • Record final approver signoff in the Phase 4 decision/packet docs.
  • Execute external org archive/redirect mapping and record results.

Notes

  • Keep this PR in draft until the activation decision is explicit.
  • Treat it as the readiness branch for the integrated keep-core side of the stack, not only a historical index.

@mswilkison changed the title from "Draft: Add Schnorr/FROST migration scaffold package and RFC" to "Draft: Add Schnorr/FROST scaffold and tBTC runtime signing adapter slice" on Feb 20, 2026
maclane and others added 29 commits February 20, 2026 20:07
mswilkison and others added 5 commits June 11, 2026 17:49
…g scope

Review follow-ups on the verifiable-blame revision:

- Fix inverted group-shape wording (51-of-100, not 100-of-51).
- State explicitly that near the assumption boundary the accuser
  quorum needs all but 2t-n-1 honest observers (50 of 51 at the
  production shape), so in the high-f regime the gate is a
  fabrication firewall rather than a working exclusion mechanism;
  proof-carrying blame restores exclusion there.
- Document the n-of-n edge of ExclusionAccuserQuorum (quorum 1,
  consistent with f = 0 under that shape's own assumption).
- Make precise that silence parking keys on bundle absence: a member
  that submits its evidence snapshot while withholding its signing
  contribution is not parked by this layer; that cost is bounded by
  the Annex B retry budget until t-of-included finalize.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nge event

A regen run that produces different bytes means the legacy math/rand
shuffle semantics changed and deployed engines would disagree on
coordinator rotation. Both language suites passing after a dual regen
is not evidence of compatibility with deployed engines.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The i.i.d. subset-sampling table models the deployed legacy loop. The
RFC-21 Layer B transitional path differs in both directions: one-shot
silence is absorbed by parking, but staggered silence (or submitting
evidence snapshots while withholding signing contributions, which
bundle-absence parking does not detect) can fail every attempt
deterministically at small f. The table is therefore not a worst case
for Layer B -- which strengthens, not weakens, the t-of-included
finalize requirement this annex codifies.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… event

A regen run that produces different bytes means the normative Annex A
derivation changed; it requires an annex update in the same change and
a mixed-fleet rollout note. Both language suites passing after a dual
regen is not evidence of compatibility with deployed engines.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review follow-up (F2): the differential corpus did not exercise Go's
rand.NewSource seed normalization, which reduces the source seed mod
(2^31 - 1) and maps 0 to 89482311 -- so +/-(2^31 - 1) seed the generator
identically to 0. Add both as boundary seeds (verified: same coordinator
as seed 0 at attempt 0 across all boundary sets). A port that
special-cases literal 0 but skips the modulo now fails its own replay.
Corpus grows 600 -> 648 cases; regeneration remains deterministic.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
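
The seed normalization described in the commit above can be observed directly with Go's standard library. The following is a minimal sketch, not the corpus generator itself; `firstDraw` is a hypothetical helper introduced only for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// firstDraw returns the first Int63 value produced by a legacy math/rand
// source seeded with the given value. Hypothetical helper for illustration.
func firstDraw(seed int64) int64 {
	return rand.New(rand.NewSource(seed)).Int63()
}

func main() {
	const int32max = 1<<31 - 1

	// rand.NewSource reduces the seed mod (2^31 - 1) and maps a result of 0
	// to 89482311, so all four seeds below initialize the same generator.
	seeds := []int64{0, int32max, -int32max, 89482311}
	for _, s := range seeds {
		fmt.Printf("seed %d -> first draw %d\n", s, firstDraw(s))
	}

	// A seed that does not normalize to 0 yields a different stream.
	fmt.Printf("seed 1 -> first draw %d\n", firstDraw(1))
}
```

A port that only special-cases the literal seed 0 would agree on the first line and diverge on the second and third, which is the failure the added boundary seeds are meant to expose.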
mswilkison added a commit that referenced this pull request Jun 11, 2026
…+ cross-language conformance vectors (#4031)

Stacked on #4005 (base: `extraction/frost-signer-mirror-2026-05-26`).
Implements item 3 of the review feedback (duplicated, divergent protocol
constants) — Rust half; pairs with the Go-side PR #4030 stacked on
#3866.

## Problem

Flagged in #4026: the engine validated attempt contexts using
`int64_be(MessageDigest[0..8])` with the 1-based wire attempt number
(the legacy `signingAttemptSeed` convention), while the Go RFC-21 layer
derives `fold(SHA256(KeyGroup ‖ SessionID ‖ MessageDigest))` with
0-based attempt numbers. At Phase-7 wiring, every Go-derived attempt
context would fail the engine's strict-mode `validate_attempt_context` —
a deterministic, network-wide liveness failure invisible to either
side's property tests.

## What changed

- **`roast_attempt_shuffle_seed(key_group, session_id,
message_digest_hex)`** implements the normative RFC-21 Annex A
derivation (see #4030). The key-group handle — this engine's hex-encoded
serialized group verifying key — feeds the hash as an opaque UTF-8
string, exactly matching keep-core's `attempt.DeriveAttemptSeed` +
`foldAttemptSeed` composition, including the strict 32-byte digest
requirement.
- **`validate_attempt_context` now takes the session's key group**
(threaded from `dkg.key_group` at StartSignRound and the session's
`DkgResult` at FinalizeSignRound) and composes the shuffle source with
the **0-based** RFC-21 attempt number. The FFI wire encoding stays
1-based (`attempt_number >= 1` still enforced; `wire = AttemptNumber +
1`); the engine subtracts one before composition, per the annex.
- **`testdata/coordinator_seed_vectors.json`** — byte-identical copy of
the canonical file generated from the Go implementation.
`coordinator_seed_derivation_matches_cross_language_vectors` pins, for
all ten vectors: the folded seed (including negative values, so an
unsigned port cannot pass), the selected coordinator (including the
n=100 production-shape set), the 0-/1-based wire mapping, and end-to-end
strict-mode `validate_attempt_context` acceptance of a context built
from the wire encoding. Either language drifting now fails its own unit
suite.
- **`docs/roast-coordinator-seed-derivation.md`** mirrors the normative
annex for signer-side readers, with the regen/copy procedure.
- The coordinator-mismatch test derives the provably-wrong coordinator
instead of hardcoding member 1 (which, under the new seed, happened to
become the correct selection — exactly the class of silent assumption
these vectors exist to catch).

## Notes

- Mixed-version note: engines on the old derivation reject contexts
produced under the new one (and vice versa) — strict-mode attempt
contexts are not yet produced by the Go layer in any deployment, so this
is pre-wiring cleanup with no live-fleet impact.
- The attempt-context vector suite (`roast-attempt-context-v1.json`) is
unaffected: it pins fingerprint/attempt-id domains with the coordinator
as an *input*.
- Port back to the tBTC monorepo signer alongside the next extraction
sync.

## Tests

Full suite: 245 passed, 0 failed; clippy and rustfmt clean. New
conformance test exercises all ten cross-language vectors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
mswilkison and others added 24 commits June 11, 2026 19:11
…on; overflow can only park (#4029)

Stacked on #3866 (base: `feat/frost-schnorr-migration-scaffold`).
Implements item 2 of the review feedback on the FROST/ROAST stack
(verifiable blame, not counted blame).

## Problem

`NextAttempt` permanently excluded members on unverifiable observer
counters — reject/conflict threshold 1, overflow 4 — **summed across
observers and across reasons**, so a single byzantine observer
fabricating counters could permanently exclude any honest member, and by
repetition grind the included set below threshold
(`ErrAttemptInfeasible`). That inverts ROAST's robustness guarantee: the
design's whole point is liveness with t honest of n, and the blame layer
handed a liveness-and-membership veto to any single member. Bundle
evidence (`OverflowEntry`/`RejectEntry`/`ConflictEntry`) is
observer-signed *claims* — nothing in a counter lets a third party
re-check that the accused misbehaved.

## What changed

**Accuser-quorum gate (`ExclusionAccuserQuorum`).** An accusation only
produces action when made by at least `f+1 = groupSize − threshold + 1`
distinct credible observers (f = the byzantine tolerance). At f+1, at
least one accuser is honest under the protocol's own t-of-n assumption,
so the group may act as if the fault were verified. Real faults reach
quorum naturally — contributions are broadcast, every honest member
observes the same bytes — while f colluding members can never reach f+1
by fabrication. Production shape (n=100, t=51): quorum 50 vs. 49
worst-case byzantine.
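
As a worked check of the quorum arithmetic above, here is a minimal sketch. `accuserQuorumSketch` is a hypothetical stand-in, not the actual `ExclusionAccuserQuorum` implementation, and assumes only the `f+1 = groupSize − threshold + 1` rule stated in the text.

```go
package main

import "fmt"

// accuserQuorumSketch returns f+1, where f = groupSize - threshold is the
// byzantine tolerance under the t-of-n assumption. Hypothetical name.
func accuserQuorumSketch(groupSize, threshold int) int {
	f := groupSize - threshold
	return f + 1
}

func main() {
	// Production shape: 49 byzantine members can never reach a quorum of 50.
	fmt.Println(accuserQuorumSketch(100, 51)) // 50

	// n-of-n edge: f = 0, so a single accuser already meets the quorum.
	fmt.Println(accuserQuorumSketch(5, 5)) // 1
}
```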

**Counting hygiene.** Observers count once per accused per category
regardless of claimed `Count` magnitude; reject reasons no longer
multiply accusers; categories tally independently (reject + conflict
claims no longer sum); only previous-`IncludedSet` members are credible
accusers; accusations against non-original-set members are ignored.

**Overflow can never be permanent.** Transport pressure is observable
only at the transport layer and can never be made self-incriminating. An
*established* (quorum-corroborated) overflow accusation now parks the
member for one attempt — same transient mechanics as silence parking —
instead of excluding forever.

**Sub-quorum claims are ignored entirely**, not parked: acting on a
single unverifiable claim would let one byzantine observer impose an
attempt of liveness cost on any honest member at will.

Established reject/conflict accusations still exclude permanently, and
the policy remains a pure deterministic function of `(prev, bundle,
threshold)`.
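
The counting rules above can be sketched as a small pure function. This is an illustration under stated assumptions, not the `NextAttempt` code: the `claim`, `verdict`, and `decideSketch` names and the map-based inputs are all hypothetical.

```go
package main

import "fmt"

type category int

const (
	reject category = iota
	conflict
	overflow
)

// claim is one observer's accusation. Count is carried only to show that
// its magnitude is ignored by the tally.
type claim struct {
	observer, accused int
	cat               category
	count             int
}

type verdict struct {
	excluded map[int]bool // permanent (established reject or conflict)
	parked   map[int]bool // one attempt only (established overflow)
}

// decideSketch tallies distinct credible accusers per (accused, category)
// and acts only when a tally reaches the quorum.
func decideSketch(claims []claim, credible, original map[int]bool, quorum int) verdict {
	type key struct {
		accused int
		cat     category
	}
	accusers := map[key]map[int]bool{}
	for _, c := range claims {
		// Non-credible accusers and non-original accused are ignored.
		if !credible[c.observer] || !original[c.accused] {
			continue
		}
		k := key{c.accused, c.cat}
		if accusers[k] == nil {
			accusers[k] = map[int]bool{}
		}
		accusers[k][c.observer] = true // once per observer, whatever count says
	}
	v := verdict{excluded: map[int]bool{}, parked: map[int]bool{}}
	for k, obs := range accusers {
		if len(obs) < quorum {
			continue // sub-quorum claims are ignored entirely, not parked
		}
		if k.cat == overflow {
			v.parked[k.accused] = true
		} else {
			v.excluded[k.accused] = true
		}
	}
	return v
}

func main() {
	credible := map[int]bool{1: true, 2: true, 3: true}
	original := map[int]bool{1: true, 2: true, 3: true, 4: true, 5: true}
	quorum := 3 // f+1 for a hypothetical 5-member, threshold-3 group

	claims := []claim{
		{observer: 1, accused: 4, cat: reject, count: 1000}, // inflated count
		{observer: 1, accused: 4, cat: reject, count: 1000}, // repeated claim
		{observer: 2, accused: 4, cat: conflict, count: 1},  // other category
		{observer: 1, accused: 5, cat: overflow, count: 1},
		{observer: 2, accused: 5, cat: overflow, count: 1},
		{observer: 3, accused: 5, cat: overflow, count: 1},
	}
	v := decideSketch(claims, credible, original, quorum)
	fmt.Println("excluded:", v.excluded, "parked:", v.parked)
}
```

In this example member 4 is left untouched, because one reject accuser and one conflict accuser do not sum to a quorum, while member 5 is parked rather than excluded.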

## Why quorum rather than self-incriminating proofs now

The review's endgame is proof-carrying blame: the accused's own two
operator-signed conflicting payloads (conflicts), or their signed
contribution plus a re-checkable deterministic validation failure
(rejects). That requires wire-format and verification-routine changes
(the current bundle carries only counters). The quorum gate delivers the
safety property — fabricated blame can never become permanent, and the
grinding-to-infeasibility vector is closed — with no wire change, and is
the correct *floor* even after proofs land (proof-verified entries can
then bypass the quorum per category). RFC-21 Layer B now documents the
policy, the rationale, and that roadmap; the residual cost
(sub-quorum-observed faults burn retry attempts instead of excluding) is
explicitly folded into the serial-attempt latency budget.

## Tests

New regression coverage: quorum boundary at f vs f+1 for both permanent
categories; fabricated-blame grinding across six attempts (single
byzantine accuser, max counters, honest members never move);
count-magnitude fabrication; cross-category non-summing; reason
non-multiplication; non-credible accusers; non-original accused;
established-overflow park-and-reinstate cycle; production-shape quorum
pin `(100, 51) = 50`. Existing overflow/categories/soak tests updated to
the new semantics. `go test ./pkg/frost/... ./pkg/tbtc/...` passes; `go
vet` clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…x A) + cross-language conformance vectors (#4030)

Stacked on #3866 (base: `feat/frost-schnorr-migration-scaffold`).
Implements item 3 of the review feedback (duplicated, divergent protocol
constants) — Go half; the Rust half is the paired PR stacked on #4005.

## Problem

The coordinator-shuffle seed derivation exists twice, in two languages,
on two branches, with no single source of truth — and the two copies
disagree (flagged in #4026):

| | seed | attempt numbering |
|---|---|---|
| Go RFC-21 layer | `fold(SHA256(KeyGroup ‖ SessionID ‖ MessageDigest))` | 0-based |
| Rust engine validation | `int64_be(MessageDigest[0..8])` (legacy `signingAttemptSeed` convention) | 1-based wire |

At Phase-7 wiring, every Go-derived attempt context would fail the Rust
engine's strict-mode validation — a network-fracturing liveness failure
that property tests on either side cannot catch.

## What this PR does (Go half)

1. **RFC-21 Annex A (normative)** — single normative definition of the
derivation: inputs (including the exact `KeyGroupBytes` definition for
`FrostTBTCSignerV1` material — the UTF-8 bytes of the hex key-group
handle, treated opaquely), the 0-based composition with the
two's-complement-wrapping addition, the `wire = AttemptNumber + 1` FFI
mapping, and the accepted non-goals (unframed concatenation,
first-8-byte fold, grindability bounds) with rationale. The Go
derivation is adopted as normative: it binds key group + session +
digest rather than the digest alone, and the live `pkg/tbtc` signing
loop's legacy convention is explicitly documented as the thing Phase 7
migrates *from*.

2. **Generated conformance vectors** —
`pkg/frost/roast/testdata/coordinator_seed_vectors.json`: ten end-to-end
vectors (folded seed int64 + selected coordinator) covering attempts
0/1/3/5/7, sparse and production-size (n=100) member sets, opaque
key-group handles, and negative folded seeds. Regenerated from the
deterministic input matrix via `ROAST_SEED_VECTORS_REGEN=1 go test -run
TestRegenerateCoordinatorSeedVectors` — generation-from-spec rather than
hand-pinning, per the review.

3. **Conformance test** —
`TestCoordinatorSeedDerivation_ConformanceVectors` pins
`DeriveAttemptSeed → foldAttemptSeed → SelectCoordinator` end to end
against the file, asserts the wire-mapping invariant on every vector,
and requires at least one negative-seed pin so an unsigned-integer port
cannot pass.
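
The derivation pipeline described in items 1 to 3 can be sketched as follows. This is a hedged illustration, not the normative Annex A text or the keep-core code: the function names are hypothetical, the session ID is assumed to be a byte string, and the big-endian reading of the first 8 bytes is an assumption about how the "first-8-byte fold" is performed.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// deriveSeedSketch hashes the unframed concatenation
// KeyGroup ‖ SessionID ‖ MessageDigest. The key-group handle is treated as
// opaque UTF-8 bytes; the digest must be exactly 32 bytes.
func deriveSeedSketch(keyGroupHandle string, sessionID, messageDigest []byte) ([32]byte, error) {
	if len(messageDigest) != sha256.Size {
		return [32]byte{}, errors.New("message digest must be exactly 32 bytes")
	}
	h := sha256.New()
	h.Write([]byte(keyGroupHandle))
	h.Write(sessionID)
	h.Write(messageDigest)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out, nil
}

// foldSketch interprets the first 8 bytes as a signed 64-bit integer.
// Big-endian byte order is an assumption of this sketch.
func foldSketch(seed [32]byte) int64 {
	return int64(binary.BigEndian.Uint64(seed[:8]))
}

// composeSketch adds the 0-based attempt number; Go's signed addition wraps
// in two's complement on overflow.
func composeSketch(folded int64, attempt uint32) int64 {
	return folded + int64(attempt)
}

// wireAttempt maps the 0-based attempt number to the 1-based FFI encoding.
func wireAttempt(attempt uint32) uint64 {
	return uint64(attempt) + 1
}

func main() {
	digest := sha256.Sum256([]byte("example message"))
	seed, err := deriveSeedSketch("deadbeef", []byte("session-1"), digest[:])
	if err != nil {
		panic(err)
	}
	folded := foldSketch(seed)
	for _, attempt := range []uint32{0, 1, 3} {
		fmt.Printf("attempt %d (wire %d): shuffle source %d\n",
			attempt, wireAttempt(attempt), composeSketch(folded, attempt))
	}
}
```

Because the fold is signed, roughly half of all inputs produce a negative seed, which is why a port that uses an unsigned 64-bit integer would diverge on the negative-seed vectors.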

The paired Rust PR switches the engine to this derivation (subtracting 1
from the wire attempt number before composition) and consumes a
byte-identical copy of the vector file, so either side drifting fails
its own CI rather than fracturing coordinator agreement in a mixed
deployment.

No behavior change on the Go side — it was already normative-conformant;
this PR makes that the *specified* behavior and pins it.

## Tests

`go test ./pkg/frost/...` passes; vectors verified present with 7
negative-seed pins out of 10.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…e coordinator shuffle (#4034)

Stacked on #3866 (base: `feat/frost-schnorr-migration-scaffold`).
Implements the review item "widen the Go↔Rust math/rand parity from a
handful of pinned vectors to corpus-based differential fuzzing" — Go
half; the Rust consumer is the paired PR stacked on #4005.

## What

A generated 600-case differential corpus over `SelectCoordinator`
(`testdata/coordinator_shuffle_corpus.json`, 176 KB), replayed by
`TestCoordinatorShuffle_DifferentialCorpus` here and by the identical
byte-for-byte copy in the Rust signer's `go_math_rand` tests:

- **216 boundary cases**: seeds {0, ±1, `i64::MIN/MAX`, `MIN+3`/`MAX−3`,
the #4026 pin seed and its negation} × attempts {0, 1, 7, `u32::MAX`} —
exercising the two's-complement wrapping `seed + attempt` composition —
× six member sets including unsorted and reversed inputs (pinning the
internal sort both implementations perform).
- **384 generated cases**: fixed-seed generator sweeping set sizes
1..255 (the full `group.MemberIndex` range), full-range `int64` seeds,
and small/large/extreme attempt numbers.

Regeneration is deterministic and gated
(`ROAST_SHUFFLE_CORPUS_REGEN=1`), so the corpus provably comes from the
documented case matrix rather than hand-pinning.
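
To make the boundary matrix concrete, the sketch below enumerates it and shows why unsorted and reversed member sets are useful inputs. Everything here is illustrative: the pin seed is a placeholder because its value is not given above, and `selectCoordinatorSketch` is a hypothetical sort-then-shuffle selection that may differ from how the real `SelectCoordinator` consumes the generator.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// boundarySeeds lists the nine boundary seeds from the case matrix. The pin
// argument is a placeholder for the #4026 pin seed.
func boundarySeeds(pin int64) []int64 {
	return []int64{0, 1, -1, math.MinInt64, math.MaxInt64,
		math.MinInt64 + 3, math.MaxInt64 - 3, pin, -pin}
}

var boundaryAttempts = []uint32{0, 1, 7, math.MaxUint32}

// boundaryCaseCount multiplies out seeds x attempts x member sets.
func boundaryCaseCount(memberSets int) int {
	return len(boundarySeeds(1)) * len(boundaryAttempts) * memberSets
}

// selectCoordinatorSketch sorts a copy of the members, shuffles it with a
// legacy math/rand source seeded by seed+attempt (wrapping), and returns the
// first element. Assumed shape only; members must be non-empty.
func selectCoordinatorSketch(members []uint8, seed int64, attempt uint32) uint8 {
	sorted := append([]uint8(nil), members...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rng := rand.New(rand.NewSource(seed + int64(attempt)))
	rng.Shuffle(len(sorted), func(i, j int) {
		sorted[i], sorted[j] = sorted[j], sorted[i]
	})
	return sorted[0]
}

func main() {
	fmt.Println("boundary cases:", boundaryCaseCount(6))

	// The internal sort makes the result independent of input order.
	for _, set := range [][]uint8{{1, 2, 3, 9}, {9, 3, 2, 1}, {3, 9, 1, 2}} {
		fmt.Println(set, "->", selectCoordinatorSketch(set, math.MaxInt64, 7))
	}
}
```

A port that skips the internal sort would still pass on sorted inputs and fail only on the unsorted and reversed sets, which is what those cases are there to catch.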

This complements #4030's Annex-A seed-derivation vectors: those pin the
*derivation* end-to-end on 10 vectors; this corpus stress-pins the
*shuffle port itself* — the actual cross-language landmine — at volume,
including the integer-boundary regions where a port diverges first.

Not full continuous fuzzing (no coverage-guided harness); it's the
pragmatic corpus-differential version that rides the existing unit-test
CI on both sides at negligible cost. A coverage-guided Go-oracle harness
can layer on later if desired.

## Tests

`go test ./pkg/frost/roast/...` passes (corpus replay + regeneration
roundtrip verified).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…chain timeouts (#4032)

Stacked on #4030 (base: `unify/roast-coordinator-seed-go-2026-06-11`),
which is itself stacked on #3866 — so the annex numbering (A: seed
derivation, B: latency budget) lands in order. Implements item 4 of the
review feedback (serial attempts vs real ROAST concurrency: "it should
be a computed bound in the docs").

## What

Adds **RFC-21 Annex B (informative): serial-attempt latency budget vs
on-chain timeouts** — the computed bound with current code and chain
values, and an honest statement of which constraint actually binds:

- **Parameters**: 41 blocks (≈8.2 min) per attempt
(`signingAttemptMaximumBlocks`), `signingAttemptsLimit = 5`, engine
ROAST coordinator timeout 30 s (sub-dominant), n=100/t=51/f=49,
`redemptionTimeout` 5 days, moving-funds timeouts 7 days.
- **Serial-delay arithmetic**: as deployed, ≤ ~41 min before the loop
gives up (~175× inside the redemption timeout); the review's f·τ worst
case at a hypothetical f+1 = 50-attempt limit is ~6.8 h (~17× inside).
Serial latency comfortably fits the deadlines whenever attempts can
succeed at all.
- **The honest caveat**: the binding liveness constraint is
**all-honest-subset sampling**, not serial latency. Because the
transitional finalize requires every included member to contribute,
per-attempt success is `∏_{i=0}^{f−1} (49−i)/(100−i)`, i.e. 0.49 / 0.24 / 0.11 / 0.025
for f = 1/2/3/5, so for f ≥ 3 the 5-attempt loop fails with
better-than-even odds long before any timeout is approached. The
existing `signingAttemptsLimit` rationale in `pkg/tbtc/node.go`
explicitly assumes f ≤ 2; beyond that the backstops are
operator-inactivity claims, redemption-timeout slashing, and wallet
retirement — all outside the signing loop. (This profile is inherited
from the tECDSA-era loop, not introduced by FROST.)
- **Codified recommendation**: before the ECDSA-retirement phases, adopt
t-of-included finalize (first t responsive members — groundwork already
in the signer's `true-late-t-of-n-finalize-considerations.md`) at
minimum for redemption signings; treat bounded `n−t+1` concurrency as
the follow-on; until then alert when observed attempt-failure rates
imply f ≥ 3 behaviour.
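
The arithmetic in the bullets above can be reproduced with a short sketch. It assumes the included set of t members is sampled uniformly per attempt and that attempts are independent, and it infers a 12-second block time from the "41 blocks ≈ 8.2 min" figure; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// perAttemptSuccess is the probability that a uniformly sampled t-member
// included set contains none of the f faulty members:
// product over i in [0, f) of (n-t-i)/(n-i).
func perAttemptSuccess(n, t, f int) float64 {
	p := 1.0
	for i := 0; i < f; i++ {
		p *= float64(n-t-i) / float64(n-i)
	}
	return p
}

// loopFailure is the probability that every one of the attempts fails,
// treating attempts as independent.
func loopFailure(p float64, attempts int) float64 {
	return math.Pow(1-p, float64(attempts))
}

// serialDelayMinutes is the worst-case serial delay of the attempt loop.
func serialDelayMinutes(blocksPerAttempt, attempts int, blockSeconds float64) float64 {
	return float64(blocksPerAttempt*attempts) * blockSeconds / 60
}

func main() {
	for _, f := range []int{1, 2, 3, 5} {
		p := perAttemptSuccess(100, 51, f)
		fmt.Printf("f=%d per-attempt success %.3f, 5-attempt failure %.3f\n",
			f, p, loopFailure(p, 5))
	}

	redemptionMinutes := 5.0 * 24 * 60
	deployed := serialDelayMinutes(41, 5, 12)
	hypothetical := serialDelayMinutes(41, 50, 12)
	fmt.Printf("deployed loop: %.0f min (%.1fx inside the redemption timeout)\n",
		deployed, redemptionMinutes/deployed)
	fmt.Printf("50-attempt limit: %.1f h (%.1fx inside the redemption timeout)\n",
		hypothetical/60, redemptionMinutes/hypothetical)
}
```

Under these assumptions the five-attempt failure probability crosses one half between f = 2 and f = 3, which matches the f ≤ 2 assumption cited for `signingAttemptsLimit`.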

Docs-only; no code change.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…ence

Post-merge follow-up #4 from the June 2026 review stack: replace
json.Marshal as the canonical signed-bytes encoding for evidence
snapshots and transition bundles before Phase 7 wiring ossifies the
format.

Design - sign what you transmit, verify what you received:
- new pkg/frost/roast/gen/pb/evidence.proto: a snapshot is
  SignedLocalEvidenceSnapshot{body, operator_signature} where body is
  the serialized LocalEvidenceSnapshotBody; a transition message is
  SignedTransitionMessage{body, coordinator_signature} whose body
  embeds every member's signed snapshot envelope VERBATIM
  (repeated bytes signed_snapshots)
- producers marshal a body exactly once at signing time and cache it;
  parsed messages retain the received body/envelope bytes verbatim;
  verification always runs over exact received bytes and nothing in
  the evidence chain is ever re-encoded - signature validity never
  depends on any serializer's canonical form, across protobuf library
  versions or across languages (the Phase 7 Rust signer verifies and
  parses these same bytes)
- CanonicalSnapshotBytes/CanonicalBundleBytes are replaced by
  SignableBytes() accessors; the coordinator's first-write-wins
  conflict comparison now compares exact signed bytes
- Marshal of a received message returns the received envelope
  verbatim, so evidence bytes survive re-broadcast; wire-legal but
  non-canonical encodings also survive (pinned by test)
- RFC-21 "Evidence message format" decision rewritten accordingly,
  with the rationale for retiring canonical JSON

Tests: existing suite migrated off JSON fixtures (encode helpers
bypass production signing to exercise structural rejection); new
wire_test.go pins byte-preservation through re-broadcast, verbatim
snapshot-envelope embedding in bundles, non-canonical-encoding
survival, and tampered-body verification failure. go build ./...,
go vet, gofmt clean; frost + tbtc package tests green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The image strips committed **/gen/**/*.go (.dockerignore) and
regenerates protobufs in-image via make generate, which only sees the
gen directories explicitly COPY'd before it runs. Without this line the
new evidence.proto is absent at generation time and the later full COPY
restores only the .proto (not the stripped .pb.go), so go mod tidy
fails on the pkg/frost/roast/gen/pb import.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…o coupling

Review follow-ups on #4040 (own findings; Codex and Gemini passes were
clean):

- SignableBytes/Marshal docs now state the returned slice is the
  internal cache and must not be mutated - in-tree callers are all
  read-only, this pins the contract for future ones
- Makefile gen_proto comment points new proto packages at the
  Dockerfile gen-directory COPY allowlist, so the next proto package
  learns about the in-image regeneration coupling from the Makefile
  instead of from a CI failure

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ence (#4040)

Post-merge follow-up **#4** from the June 2026 review stack
(#4028–#4035): replace `json.Marshal` as the canonical signed-bytes
encoding for evidence snapshots/bundles — explicitly scheduled to land
**before Phase 7 wiring ossifies the format**. (Items 2 and 3 landed as
#4036/#4037 on the mirror branch; this is the Go-side sibling on the
scaffold branch.)

## Why now

The RFC-21 Layer B evidence signatures were computed over canonical
JSON. That byte stability is a Go-implementation accident —
field-order-stable `encoding/json` output — not a portable contract. The
moment Phase 7 wires evidence verification into the Rust signer (or any
second implementation appears), every verifier would need to replicate
Go's exact JSON emission. No persisted or cross-component evidence
exists yet, so the format can still change for free.

## Design: sign what you transmit, verify what you received

New `pkg/frost/roast/gen/pb/evidence.proto`:

- A snapshot travels as `SignedLocalEvidenceSnapshot{body,
operator_signature}` where `body` is the serialized
`LocalEvidenceSnapshotBody` — the operator signs those exact bytes.
- A transition message travels as `SignedTransitionMessage{body,
coordinator_signature}` whose `TransitionMessageBody` embeds every
member's signed snapshot envelope **verbatim** (`repeated bytes
signed_snapshots`) — the coordinator attests to the exact signed
snapshots it assembled, in order.
- Producers marshal a body **exactly once**, at signing time, and cache
it; parsed messages retain received body/envelope bytes verbatim;
verification always runs over exact received bytes. **Nothing in the
evidence chain is ever re-encoded**, so signature validity never depends
on any serializer's canonical form — across protobuf library versions or
across languages. This deliberately sidesteps protobuf's own caveat that
deterministic serialization is not canonical across implementations.
- `Marshal` of a received message returns the received envelope verbatim
— evidence bytes survive re-broadcast, including wire-legal but
non-canonical encodings (pinned by a handcrafted reversed-field-order
test).
- `CanonicalSnapshotBytes`/`CanonicalBundleBytes` → `SignableBytes()`
accessors; the coordinator's first-write-wins conflict check now
compares exact signed bytes.
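
The "sign what you transmit, verify what you received" rule can be illustrated without protobuf. The sketch below is an assumption-laden stand-in: it uses Ed25519 from the standard library in place of the operator signature scheme (which is not specified above), and `signedEnvelope`, `seal`, and `newOperatorKey` are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// signedEnvelope keeps the serialized body verbatim next to its signature,
// loosely mirroring the {body, operator_signature} shape.
type signedEnvelope struct {
	Body      []byte
	Signature []byte
}

// newOperatorKey generates a throwaway key pair for the sketch.
func newOperatorKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// seal signs the exact body bytes once and caches a private copy of them.
func seal(priv ed25519.PrivateKey, body []byte) signedEnvelope {
	kept := append([]byte(nil), body...)
	return signedEnvelope{Body: kept, Signature: ed25519.Sign(priv, kept)}
}

// verify checks the signature over the received bytes, never a re-encoding.
func (e signedEnvelope) verify(pub ed25519.PublicKey) bool {
	return ed25519.Verify(pub, e.Body, e.Signature)
}

func main() {
	pub, priv := newOperatorKey()

	// Two wire-legal protobuf encodings of the same two varint fields:
	// field 2 first (non-canonical order) and field 1 first.
	reversed := []byte{0x10, 0x07, 0x08, 0x05}
	canonical := []byte{0x08, 0x05, 0x10, 0x07}

	env := seal(priv, reversed)
	fmt.Println("verbatim bytes verify:", env.verify(pub))

	// A receiver that parsed and re-encoded the body would present
	// different bytes, and the signature would no longer match.
	reencoded := signedEnvelope{Body: canonical, Signature: env.Signature}
	fmt.Println("re-encoded bytes verify:", reencoded.verify(pub))
}
```

The second check failing is the point of the design: validity is tied to the transmitted bytes, so no verifier ever needs to reproduce a particular serializer's output.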

## Tests

- Existing suite migrated off JSON fixtures: test-only encode helpers
bypass production signing so every structural-rejection path (zero
sender, bad hash length, unsorted/duplicate entries, oversize caps,
bundle ordering/hash-binding) is still exercised at the wire level.
- New `wire_test.go` pins the format's core properties:
byte-preservation through unmarshal→re-marshal, verbatim
snapshot-envelope embedding inside bundle bodies, producer-signed bytes
== receiver-verified bytes, non-canonical-encoding survival,
tampered-body verification failure.
- `go build ./...`, `go vet`, `gofmt` clean; frost + tbtc package tests
green. Generated with protoc 33.4 / protoc-gen-go v1.36.3 (matches the
go.mod protobuf runtime v1.36.3).

## Docs

RFC-21 "Evidence message format" decision rewritten: signed-body
protobuf envelopes, with the retirement rationale for canonical JSON
recorded.

## Notes for reviewers

- The in-memory model types (`LocalEvidenceSnapshot`,
`TransitionMessage`) are unchanged apart from two unexported byte
caches; all call sites kept their shapes.
- Immutability contract: evidence fields must not be mutated after
`SignableBytes()` is first computed (documented on the cache fields);
the aggregation flow already treats snapshots as immutable post-receipt.
- Phase 7 cross-language note: the Rust signer will verify
operator/coordinator signatures over `body` bytes and parse them with
any protobuf implementation — no canonicalization requirements transfer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Go-host adoption of frost_tbtc_init_signer_config (the Rust side landed
in keep-core#4037): hosts can now install the signer's operational
configuration at startup instead of exporting ~40 TBTC_SIGNER_* process
environment variables.

- InstallNativeTBTCSignerConfig passes operator JSON through verbatim;
  the Rust signer owns the schema and validation (unknown fields
  rejected, policy combinations validated at install, environment
  ignored wholesale once installed). The C wrapper follows the existing
  per-call dlsym pattern, so a loaded library that predates the symbol
  degrades to the established ErrNativeCryptographyUnavailable
  classification instead of failing the link.
- TBTC_SIGNER_INIT_CONFIG_PATH wires it into native engine
  registration: when set, the config file is installed BEFORE the
  engine registers and any failure (unreadable file, validation
  rejection, or a library without the symbol) fails registration
  closed - setting the path is an explicit demand for config-mode
  operation. Unset, registration proceeds on the environment-fallback
  path exactly as today.
- one pointer variable replaces forty value variables; secrets stay on
  the dedicated env/command key-provider channel.

Verified under all three build shapes: default tags (stub + unit test),
frost_native only (wiring against the stub), and
frost_native+frost_tbtc_signer+cgo (real bridge; compile+vet - running
the bridge tests requires the built signer library, which lives on the
mirror branch).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…antics

Review follow-ups on #4041 (own findings; Codex and Gemini passes were
clean):

- installConfiguredTBTCSignerInitConfig now logs at error level with
  config-mode context on every failure path, in addition to the
  registration layer's generic warning: an operator who explicitly set
  TBTC_SIGNER_INIT_CONFIG_PATH gets an unmissable signal, not one
  generic warn line
- the doc comment states the precise degradation semantics: the
  process keeps running on the legacy bridge with FROST operations
  unavailable (the registration layer deliberately never crashes the
  binary), so a misconfigured signer can never execute FROST
  operations but the binary does not exit
- TBTCSignerInitConfigPathEnv doc advises restricting the config
  file's permissions (it may carry the state_key_command execution
  spec)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Go-host adoption of `frost_tbtc_init_signer_config` — the companion to
keep-core#4037 (which added the init-time config FFI to the Rust signer
on the mirror branch). Hosts can now install the signer's operational
configuration once at startup instead of exporting ~40 `TBTC_SIGNER_*`
process environment variables.

## What this adds

- **`InstallNativeTBTCSignerConfig(configJSON)`** — passes operator JSON
through verbatim; the Rust signer owns the schema and all validation
(unknown fields rejected, enforcement-gated policy combinations
validated at install with rollback, environment ignored wholesale once
installed, idempotent re-install by fingerprint). Go stays a transport:
no schema duplication, no drift surface.
- **`TBTC_SIGNER_INIT_CONFIG_PATH`** — when set, the JSON file is
installed during native FROST engine registration, *before* any other
signer call. Setting the path is an explicit demand for config-mode
operation, so every failure — unreadable file, validation rejection, or
a loaded signer library that predates the symbol — **fails registration
closed**. Unset, registration proceeds on the environment-fallback path
exactly as today (zero behavior change for existing deployments). Net
ops effect: one pointer variable replaces forty value variables; secrets
stay on the dedicated env/command key-provider channel.

## Cross-branch compatibility (why this is safe to land now)

The bridge resolves every symbol per call via `dlsym(RTLD_DEFAULT, …)`
with graceful degradation — the established pattern in this file. A
signer library **without** the new symbol returns the existing
`ErrNativeCryptographyUnavailable` classification rather than failing
the link, so:

- path unset + old library → status quo;
- path unset + new library → status quo (env fallback, with the signer's
own one-time production warning suggesting the init FFI);
- path set + new library → config installed, logged with
fingerprint/key-count/idempotency;
- path set + old library → registration fails closed with a precise
error (operator demanded config mode; the deployment is wrong).
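
The four cases above reduce to a small fail-closed rule, sketched here under stated assumptions. `installInitConfigSketch`, the `installer` type, and `errSymbolUnavailable` are hypothetical stand-ins for the registration wiring, `InstallNativeTBTCSignerConfig`, and `ErrNativeCryptographyUnavailable`; the real bridge resolves the symbol through cgo, which this sketch does not model.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errSymbolUnavailable stands in for the classification returned when the
// loaded signer library predates the init-config symbol.
var errSymbolUnavailable = errors.New("native cryptography unavailable")

// installer stands in for the call that hands operator JSON to the signer.
type installer func(configJSON []byte) error

// installInitConfigSketch returns nil when no path is set (environment
// fallback) and otherwise fails closed on any read or install error.
func installInitConfigSketch(path string, install installer) error {
	if path == "" {
		return nil // status quo: environment-fallback path
	}
	raw, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("config mode demanded but file unreadable: %w", err)
	}
	// The JSON is passed through verbatim; validation belongs to the signer.
	if err := install(raw); err != nil {
		return fmt.Errorf("config mode demanded but install failed: %w", err)
	}
	return nil
}

// writeTempConfig writes contents to a temporary file and returns its path.
func writeTempConfig(contents string) (string, error) {
	f, err := os.CreateTemp("", "signer-config-*.json")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.WriteString(contents); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	oldLibrary := func([]byte) error { return errSymbolUnavailable }
	newLibrary := func([]byte) error { return nil }

	path, err := writeTempConfig(`{"example": true}`)
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	fmt.Println("path unset + old library:", installInitConfigSketch("", oldLibrary))
	fmt.Println("path unset + new library:", installInitConfigSketch("", newLibrary))
	fmt.Println("path set + new library:", installInitConfigSketch(path, newLibrary))
	fmt.Println("path set + old library:", installInitConfigSketch(path, oldLibrary))
}
```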

## Build-shape verification

Three configurations verified: default tags (stub + unit test asserting
the unavailable classification), `frost_native` alone (the registration
wiring compiles against the stub), and `frost_native &&
frost_tbtc_signer && cgo` (real cgo bridge, compile + vet — running
bridge tests requires the built signer library, which lives on the
mirror branch until promotion). `gofmt`, `go vet`, and the full
`pkg/frost` test suite are clean under default tags.

## Out of scope

keep-client TOML config plumbing (a first-class config section feeding
this) — deliberately left for the team's config-schema conventions; the
file-pointer wiring gives ops a complete adoption path today without
touching the client config surface.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…oints

Implements the binding retention condition attached to the deferral of
proof-carrying blame (follow-up item 7, decision 2026-06-12): telemetry
and logging must keep enough signed bytes to diagnose whether targeted
equivocation is occurring, so the production revisit has data.

- new EquivocationEvidence events carry the exact signed snapshot
  envelopes (SignedLocalEvidenceSnapshot wire bytes verbatim) behind
  each detection: snapshot_conflict (first-write-wins re-submission
  mismatch at the coordinator - two operator-signed bodies from the
  same sender for the same attempt are self-incriminating),
  own_snapshot_mutated_in_bundle, and own_snapshot_missing_from_bundle
- every event is logged in full (rare events; the bytes ARE the
  diagnosis) and forwarded to a process-wide observer hook following
  the existing single-observer telemetry pattern, so the host can
  retain evidence in its telemetry system
- emission is additive on the existing error paths and never perturbs
  them; envelope encoding failures degrade to nil fields with a log
- cross-member equivocation comparison (receiver checking a bundle's
  snapshot for sender X against X's direct broadcast) deliberately
  remains item-7 scope; these are the detection points that exist
  today

Tests pin byte-exact envelope retention for all three kinds and that
idempotent identical re-submission emits nothing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Self-review finding: the snapshot_conflict emit ran inside RecordEvidence
while the coordinator state mutex was held, so a registered observer
(host telemetry, possibly a blocking write) would stall every concurrent
RecordEvidence/AggregateBundle on that coordinator. The other two emit
sites (verifyOwnObservationsPresent) are already lock-free.

The evidence value is now materialized under the lock (bytes copied as
before) into a local, and a deferred closure - registered before the
unlock defer so it runs after it - emits once c.mu is released. Emission
reads only the copied bytes, so nothing touches coordinator state after
unlock.
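
A minimal sketch of the defer ordering described above, assuming a simplified coordinator. The `coordinator` type, `recordConflict`, and the `observer` field are hypothetical stand-ins, not the real types. Go runs deferred calls in last-in, first-out order, so the emit closure registered first runs after the unlock registered second.

```go
package main

import (
	"fmt"
	"sync"
)

// coordinator is a hypothetical stand-in for the real coordinator type.
type coordinator struct {
	mu       sync.Mutex
	observer func(evidence []byte)
}

// recordConflict copies the evidence under the lock and emits it only after
// the lock has been released.
func (c *coordinator) recordConflict(envelope []byte) {
	var pending []byte // set under the lock, read by the emit closure

	// Registered first, so it runs last: after the unlock defer below.
	defer func() {
		if pending != nil && c.observer != nil {
			c.observer(pending)
		}
	}()

	c.mu.Lock()
	defer c.mu.Unlock()

	// Copy while coordinator state is still protected; the emit closure
	// touches nothing but this copy.
	pending = append([]byte(nil), envelope...)
}

func main() {
	c := &coordinator{}
	c.observer = func(evidence []byte) {
		// TryLock succeeds only if recordConflict already released c.mu.
		free := c.mu.TryLock()
		if free {
			c.mu.Unlock()
		}
		fmt.Printf("observer got %q, lock free at emit time: %v\n", evidence, free)
	}
	c.recordConflict([]byte("signed-envelope"))
}
```

Because the observer can acquire the mutex itself, a slow or blocking observer in this shape cannot stall other callers waiting on the same lock.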

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…al cache

Two P2 findings from Codex's re-review of the equivocation evidence path:

- observer panics could escape RecordEvidence/verifyOwnObservationsPresent
  and abort the protocol path - contradicting emitEquivocationEvidence's
  own "never fails" contract. The observer call is now wrapped in
  recover-and-log.
- snapshotEnvelopeForEvidence handed the observer the slice returned by
  Marshal, which is the snapshot's internal wire-envelope cache (the same
  must-not-mutate contract this stack pinned in the #4040 envelope work).
  An observer that retained and mutated it would corrupt the cached
  signed bytes used by later bundle aggregation. It now returns a
  defensive copy.
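
Both fixes can be sketched in a few lines. This is a simplified illustration under assumed types: `snapshot`, `envelopeForEvidence`, and `emitSafely` are hypothetical names that mirror, but are not, the functions named above.

```go
package main

import (
	"fmt"
	"log"
)

// snapshot is a hypothetical stand-in whose Marshal returns its internal
// cached wire bytes, which callers must not mutate.
type snapshot struct{ wireCache []byte }

func (s *snapshot) Marshal() []byte { return s.wireCache }

// envelopeForEvidence hands out a defensive copy so an observer that
// retains or mutates the bytes cannot corrupt the cache.
func envelopeForEvidence(s *snapshot) []byte {
	wire := s.Marshal()
	out := make([]byte, len(wire))
	copy(out, wire)
	return out
}

// emitSafely calls the observer and converts a panic into a log line, so
// the calling protocol path continues with its own error handling.
func emitSafely(observer func(envelope []byte), envelope []byte) {
	if observer == nil {
		return
	}
	defer func() {
		if r := recover(); r != nil {
			log.Printf("equivocation evidence observer panicked: %v", r)
		}
	}()
	observer(envelope)
}

func main() {
	s := &snapshot{wireCache: []byte("signed")}

	// A panicking observer is contained.
	emitSafely(func([]byte) { panic("observer bug") }, envelopeForEvidence(s))

	// A mutating observer only changes its own copy.
	emitSafely(func(e []byte) { e[0] = 'X' }, envelopeForEvidence(s))
	fmt.Printf("cache after mutation attempt: %q\n", s.Marshal())
}
```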

Regression tests: a panicking observer still yields ErrSnapshotConflict
from the protocol path; mutating the evidence envelope bytes leaves the
snapshot's cached Marshal output intact. Race detector clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…oints (#4044)

Implements the binding retention condition from today's decision log (PR
#4043): proof-carrying blame (follow-up item 7) is deferred until
production, **provided** telemetry/logging retain enough signed bytes to
diagnose whether targeted equivocation is occurring — otherwise the
revisit condition lacks data.

## What this adds

`EquivocationEvidence` events carrying the **exact signed snapshot
envelopes** (wire bytes verbatim — the #4040 format makes these
available at every detection point) for the three detections that exist
today:

- `snapshot_conflict` — a sender re-submits a *different* signed
snapshot for the same attempt to the coordinator. Both envelopes are
retained; two operator-signed bodies from the same sender for the same
attempt are self-incriminating, which is exactly the substrate item 7's
wire format will formalize.
- `own_snapshot_mutated_in_bundle` — a bundle carries this member's
snapshot with a signature that differs from what it submitted (both
envelopes retained).
- `own_snapshot_missing_from_bundle` — censorship detection (self
envelope retained).

Each event is logged in full (these are rare, and the bytes are the
diagnosis) and forwarded to a process-wide observer hook following the
repo's existing single-observer telemetry pattern, so hosts can persist
evidence into their telemetry stack. Emission is purely additive on the
existing error paths — encode failures degrade to nil fields with a log
line, never perturbing the protocol path.

## Deliberately out of scope

Cross-member comparison (a receiver checking a bundle's snapshot for
sender X against X's direct broadcast) — that's item 7 proper. These are
the detection points that exist today, instrumented so the production
deferral is honest.

Tests pin byte-exact envelope retention for all three kinds, and that an
idempotent identical re-submission emits nothing. `go build ./...`, vet,
gofmt clean; full frost suite green.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…erting

Annex B requires alerting when observed signing-attempt failure rates
imply f >= 3 behaviour - the regime where the serial retry loop's
5-attempt budget fails with better-than-even odds and the structural
fix (Phase-7 t-of-included finalize) does not exist yet. This is the
interim measure scheduled in the gates-doc decision log.

The signing retry loop reports every terminal attempt outcome
(minority readiness after completed announcement, failed protocol run,
failed done-check exchange, or success) through an optional reporter;
mechanical iterations that never sample the group (block-timing skips,
local announcement errors, context cancellation) are deliberately not
reported. Exactly one local signer per wallet reports, so a node
records one observation per network-wide attempt.

Outcomes feed a process-wide rolling window (50 attempts, shared
across wallets via the PerformanceMetrics instance) exporting three
gauges: signing_attempt_rolling_success_rate,
signing_attempt_rolling_sample_count, and
signing_attempt_implied_f_alert (1 when the full window's rate is
below 0.14, between the f=2 expectation 0.238 and the f=3 expectation
0.114 of the Annex B sampling model; ~4% false-alarm at f=2, ~64%
detection at f=3 per independent window). Operator guidance and
threshold-tuning caveats are documented in the rollout adoc; Annex B
now points at the implementation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Decision (2026-06-12): setting TBTC_SIGNER_INIT_CONFIG_PATH demands
config-mode FROST operation, so any state in which the FROST-native
engine does not come up under a set path is process-fatal in every
profile and build flavor, replacing the continue-on-legacy-bridge
degradation. The enforcement runs at the end of
RegisterNativeExecutionAdapterForBuild and covers the whole failure
family: config-install failure, engine-registration failure after a
successful install, and a binary built without frost_native (which can
never honor the demand). Env-fallback mode (path unset) keeps the
safe-by-default degrade posture unchanged.

The checks are positive (native adapter + FFI executor actually
registered), not merely error-presence, because later registration legs
reset LastNativeRegistrationError and could mask an earlier failure.
Fatality is deliberately not profile-conditional: an unreadable config
file cannot reveal its profile and a missing profile means production
(production-by-omission), so path-set is the only non-circular trigger.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Phase 6's call-site migration is complete and the RFC now says so
with the honest accounting: the one receive loop with a production
consumer (collectBuildTaggedTBTCSignerRoundContributionMessages) is
fully migrated - registry-sourced EvidenceRecorder, attemptContextHash
binding, end-of-collect snapshot submission into
Coordinator.RecordEvidence - while the other two loops named by the
original plan (collectNativeFROSTRoundOneMessages/RoundTwoMessages)
were deleted with the unreachable generic two-round exchange in
commit 1c692bf, so the single-coordinated-change constraint is
satisfied by subtraction. The surrounding orchestration (Phase 6.3
executor-adapter entry, error taxonomy, Phase 7.1 bundle production,
Phase 7.2 bundle-consuming selector) shipped with it. The two open
exit items (legacy evaluator deletion, build-tag removal) are
explicitly re-stated as gated on Phase 7's manifest flip, with the
legacy evaluator's surviving caller identified as the deliberate
rollback path.

The FROST readiness manifest gets its home in keep-core
(docs/development/frost-readiness-manifest.adoc): it was planned for
the tBTC monorepo's docs/operations/ directory, but the monorepo
signer is retired and keep-core is canonical, so Phase 7's flip
target now exists in this repository. Both gates (ROAST retry,
transition evidence) are recorded missing-no-go with explicit flip
conditions: Phase 6 shipped (met), real-testnet integration run
(pending), verified FrostUniFFIV1 migration (pending). The rollout
doc and the RFC's stale receive-loop drop-site references are
repointed accordingly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Releasing the mutex before the SetGauge calls let two concurrent
recordings (wallet executors share one tracker) publish out of order,
leaving a stale rate or alert value standing until the next attempt.
The tracker.mutex -> gauge-lock nesting is the only acquisition order
between the two locks (the recorder never calls back into the
tracker), so holding the mutex across publication is deadlock-free.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ObserveApplicationSource prepends the "performance_" application
prefix to every gauge it registers, so the series operators query are
performance_signing_attempt_*; the rollout doc and the Annex B pointer
named the bare source keys, which would have made documented alert
queries silently match nothing (Codex/Gemini review finding). The
const block now documents the source-key-vs-exported-name distinction
so the mistake is not repeated.

Also from review: the attempt-outcome exclusion list names the
members-selection error path (loop-terminal, not attempt-terminal),
and the tracker constructor clamps minimumSamples to the window size
so the alert cannot become silently unreachable if the constants ever
diverge.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
When an earlier registration leg fails and a later leg succeeds, the
later leg's success overwrites the recorded error, so the
registration-incomplete fatal message cannot name the cause; direct
the operator to the warnings emitted at failure time instead (review
finding on the leg-ordering diagnostics gap).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
## Decision being implemented

Team decision (2026-06-12, recorded in the Phase 5 gates-doc Decision
Log on the mirror branch): setting `TBTC_SIGNER_INIT_CONFIG_PATH`
demands config-mode FROST operation, so **any state in which the
FROST-native engine did not come up under a set path is process-fatal**,
in every profile and build flavor. This replaces #4041's
continue-on-the-legacy-bridge degradation and resolves the open team
question posted there.

Why fatal, in one paragraph: this code ships to production only when
FROST is a production duty, so "running but FROST-dead" is the dangerous
silent state — a half-alive node erodes FROST wallet fault budgets
invisibly, while threshold redundancy is designed to absorb loud, full,
bounded outages. Fatality is not profile-conditional because an
unreadable config file cannot reveal its profile and a missing profile
means production (production-by-omission); path-set is the only
non-circular trigger. Uniform semantics also mean testnet rehearses
exactly what production will do.

## What this covers (the whole failure family)

`enforceNativeInitConfigDemand` runs at the end of
`RegisterNativeExecutionAdapterForBuild` and terminates on:
- config-install failure (unreadable file, parse/validation rejection,
init-time policy or attestation-gate failure, library predating the init
symbol),
- engine-registration failure after a successful install,
- a binary built without `frost_native` (can never honor the demand).

The checks are **positive** (native adapter + FFI executor actually
registered under the mutex), not merely error-presence: later
registration legs reset `LastNativeRegistrationError`, so a nil error
does not prove bring-up. Env-fallback mode (path unset) keeps the
safe-by-default degrade posture, byte-for-byte.

## Verification

- 7 new tests (untagged, run in default CI) pin: no fatal with path
unset/whitespace even under registration errors; fatal with cause
context on recorded errors; flavor-aware wrong-binary vs
incomplete-bringup messages; partial registration (executor without
adapter) still fatal; fully-registered no-fatal; the registration entry
point fires enforcement in every build flavor.
- `gofmt` clean; `go vet` under all three tag shapes (default,
`frost_native`, `frost_native,frost_tbtc_signer`); full
`./pkg/frost/...` suite green.
- The fatal seam (`fatalNativeRegistrationExit`) exists only so tests
can observe the abort; production never overrides it.

## Notes for reviewers

- A non-node binary (tooling, other packages' test binaries) that
imports `pkg/frost/signing` with the env var exported and unsatisfiable
will now die at init. That is the decided semantics: the variable
demands FROST; a binary that cannot honor it says so loudly. CI does not
set the variable.
- Mirror-side docs PR records the decision in the gates-doc Decision Log
and adds runbook prerequisite 7 (canary config pushes node-by-node;
attestation-rotation cadence becomes enforced-by-restart-failure).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
)

## What

Implements the interim liveness measure RFC-21 Annex B requires:
"alerting should fire when observed attempt failure rates imply f ≥ 3
behaviour" — the regime where the serial retry loop's 5-attempt budget
fails with better-than-even odds and the structural fix (Phase-7
t-of-included finalize, gates-doc decision 5) does not exist yet.
Scheduled as the pre-canary alerting item in the decision memo / gates
doc.

## How

- **Reporter in the retry loop** (`pkg/tbtc/signing_loop.go`): every
terminal attempt outcome is reported — minority readiness after a
completed announcement, failed protocol run, failed done-check exchange,
or success. Mechanical iterations that never sample the group
(block-timing skips, local announcement errors, context cancellation,
loop-terminal members-selection errors) are deliberately excluded so the
rate feeding the Annex B sampling model is not diluted by local noise.
- **One observation per network attempt**: only the first local signer
per wallet reports (all local signers of a wallet observe the same
attempts).
- **Process-wide rolling window**
(`pkg/clientinfo/signing_liveness.go`): 50 outcomes, shared across
wallets via the `PerformanceMetrics` instance; three gauges registered
at boot. The metrics registry prepends the `performance_` application
prefix, so the **exported series operators must query** are:
  - `performance_signing_attempt_rolling_success_rate`
  - `performance_signing_attempt_rolling_sample_count`
  - `performance_signing_attempt_implied_f_alert`: 1 when the full
    window's rate < 0.14, between the f=2 expectation (0.238) and f=3
    expectation (0.114) of the Annex B table (n=100, t=51). Per independent
    window: ~4% false-alarm at true f=2, ~64% detection at f=3, ~97% at f≥4;
    the staggered-silence profile (deterministic attempt failure) is
    detected with certainty.
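
A minimal sketch of such a rolling window, assuming a ring buffer and a publish callback in place of the real gauge registry. `tracker`, `newTracker`, and `publish` are hypothetical names; the window size, threshold, minimum-samples clamp, and publication under the mutex follow the description in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	windowSize     = 50
	alertThreshold = 0.14
)

// tracker is a hypothetical rolling window over attempt outcomes.
type tracker struct {
	mu             sync.Mutex
	outcomes       [windowSize]bool // ring buffer of the latest outcomes
	next           int
	count          int
	successes      int
	minimumSamples int
	publish        func(rate float64, samples int, alert float64)
}

func newTracker(minimumSamples int, publish func(float64, int, float64)) *tracker {
	if minimumSamples > windowSize {
		// Clamp so the alert cannot become unreachable.
		minimumSamples = windowSize
	}
	return &tracker{minimumSamples: minimumSamples, publish: publish}
}

// record adds one outcome and publishes while still holding the mutex, so
// concurrent recordings cannot publish out of order.
func (t *tracker) record(success bool) {
	t.mu.Lock()
	defer t.mu.Unlock()

	if t.count == windowSize {
		if t.outcomes[t.next] {
			t.successes-- // evict the oldest outcome
		}
	} else {
		t.count++
	}
	t.outcomes[t.next] = success
	if success {
		t.successes++
	}
	t.next = (t.next + 1) % windowSize

	rate := float64(t.successes) / float64(t.count)
	alert := 0.0
	if t.count >= t.minimumSamples && rate < alertThreshold {
		alert = 1
	}
	t.publish(rate, t.count, alert)
}

func main() {
	tr := newTracker(windowSize, func(rate float64, samples int, alert float64) {
		if samples == windowSize {
			fmt.Printf("rate=%.2f samples=%d alert=%v\n", rate, samples, alert)
		}
	})
	for i := 0; i < 7; i++ {
		tr.record(true)
	}
	for i := 0; i < 44; i++ {
		tr.record(false)
	}
}
```

The strict `<` comparison is what makes the 7/50 boundary case not fire, while one further failure that evicts a success drops the rate to 6/50 and raises the alert.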

## Operator guidance

New rollout-doc section documents the exported gauge names, the
alert-on-sustained-value caveat (the window slides per attempt,
consecutive scrapes are correlated), and the threshold-tuning
instruction once the Phase 5 baseline calibration worksheet records the
benign failure rate. Annex B now points at the implementation.

## Review follow-ups applied

- `d119fbd7a`: gauges published under the tracker mutex — concurrent
recordings could publish out of order and leave a stale rate/alert
standing until the next attempt.
- `93b37d3e8`: docs use the exported `performance_`-prefixed series
names (Codex/Gemini finding — the bare source keys would have made
documented alert queries silently match nothing); const block documents
the source-key-vs-exported-name distinction; constructor clamps
`minimumSamples` to the window size so the alert cannot become silently
unreachable.

## Verification

- Tracker unit tests: below-minimum-samples no-alert, alert below
threshold, exact-threshold boundary (7/50 = 0.14 does not fire; IEEE 754
correctly-rounded division makes this deterministic), window
rollover/eviction.
- Retry-loop test pins the reported outcome sequence `[false, true]` for
a failed-then-successful attempt using the existing mock harness.
- `gofmt`/`go vet` clean (default + `frost_roast_retry` shapes); full
`pkg/tbtc` (146s) + `pkg/clientinfo` suites green.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…in keep-core (#4047)

## What

Two things Phase 6 needed that were not code:

1. **The RFC's Phase 6 section now records reality.** The plan named
three receive loops to migrate "in a single coordinated change." The
tree evolved under the plan:
`collectBuildTaggedTBTCSignerRoundContributionMessages` — the only loop
with a production consumer — is fully migrated (registry-sourced
`EvidenceRecorder`, `attemptContextHash` binding, end-of-collect
snapshot submission into `Coordinator.RecordEvidence`, all verified in
code), while `collectNativeFROSTRoundOneMessages`/`RoundTwoMessages`
were **deleted** with the unreachable generic two-round exchange (commit
`1c692bf03`, 2026-06-07) — uniffi-v1 payloads run on the legacy bridge
and tbtc-signer-v1 uses the coarse flow, so nothing reached them. The
shared-attempt-context constraint is satisfied by subtraction. The
surrounding orchestration (Phase 6.3 executor-adapter entry,
static-vs-runtime error taxonomy, Phase 7.1 bundle production, Phase 7.2
bundle-consuming selector) shipped with it. The two open exit items
(legacy `EvaluateRetryParticipantsForSigning` deletion;
`frost_roast_retry` tag removal) are restated as gated on Phase 7's
manifest flip, with the legacy evaluator's surviving caller identified
as the deliberate rollback path.

2. **The readiness manifest gets a home in the canonical repo**
(`docs/development/frost-readiness-manifest.adoc`). It was planned for
the tBTC monorepo's `docs/operations/` directory; the monorepo signer is
retired and keep-core is canonical, so Phase 7's flip target now
actually exists here. Both gates (ROAST retry, transition evidence) are
recorded `missing-no-go` with explicit flip conditions: Phase 6 shipped
(met), real-testnet integration run (pending), verified FrostUniFFIV1
migration (pending). Update discipline restated: flips only with
attached evidence.

Also fixes the RFC's stale receive-loop drop-site references and
repoints the rollout doc at the new manifest.

## Why now

Phase 7's only remaining inputs are operational (testnet run,
V1-migration verification), not code. Leaving Phase 6 documented as
future work is exactly the claims-drift failure mode prior reviews of
this stack flagged; this PR closes it with commit-level evidence.

🤖 Generated with [Claude Code](https://claude.com/claude-code)