
feat(desktop): realtime-as-hub voice path (Phase 1 BYOK + Phase 2 ephemeral) #7970

Draft
vendz wants to merge 63 commits into BasedHardware:main from vendz:feature/realtime-hub

vendz (Contributor) commented Jun 15, 2026

Realtime-as-hub voice path for the desktop floating bar — one realtime model does in-session STT + reasoning + routing (via tool choice) + spoken reply, replacing the STT → Haiku-router → Claude → TTS cascade on the push-to-talk path.
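To make "routing via tool choice" concrete, here is a minimal sketch, not the app's actual code. The tool names (`get_tasks`, `ask_higher_model`, `spawn_agent`) come from this PR; the `dispatch_tool_call` helper, the stub handlers, and the result shape are hypothetical. The idea is that the realtime model either answers directly or emits a tool call, and the client only has to dispatch whatever tool was chosen.

```python
from typing import Any, Callable, Dict

# A handler takes the tool arguments and returns text for the model to speak or act on.
Handler = Callable[[Dict[str, Any]], str]


def dispatch_tool_call(
    name: str, args: Dict[str, Any], handlers: Dict[str, Handler]
) -> Dict[str, Any]:
    """Run the tool the realtime model chose and wrap its result."""
    handler = handlers.get(name)
    if handler is None:
        # Unknown tool: report it back instead of raising, so the session survives.
        return {"name": name, "error": f"unknown tool: {name}"}
    return {"name": name, "result": handler(args)}


# Hypothetical stubs; real handlers would read local tasks, call a larger
# model, or start a background agent.
HANDLERS: Dict[str, Handler] = {
    "get_tasks": lambda args: "2 open tasks",
    "ask_higher_model": lambda args: f"escalated: {args['question']}",
    "spawn_agent": lambda args: f"agent started: {args['goal']}",
}

print(dispatch_tool_call("spawn_agent", {"goal": "file the report"}, HANDLERS))
```

With this shape there is no separate router model on the push-to-talk path: the routing decision is whichever tool name (or none) the realtime model emits.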

⚠️ Stacked on #7957. This branch sits on top of the prompt-cache / QueryTracer work in #7957. The realtime code instruments QueryTracer and calls the prompt-cached /v2/chat/completions, so those commits are a prerequisite and can't be dropped without breaking the build. GitHub can only base this PR on main, so until #7957 merges, the bottom ~12 commits of this diff belong to #7957. Review only the realtime commits on top; the rest drop out of the diff automatically once #7957 lands. Merge #7957 first.

Reliability + voice-UX pass (latest commits)

Hardening after live testing surfaced drops, cutoffs, and routing issues on the Gemini path:

  • Barge-in is now in-session — interrupt the in-flight reply with a fresh activityStart instead of tearing down + reconnecting the socket (the reconnect lost conversation context and dropped the next turn). A per-turn reply gate stops an interrupted turn's trailing audio / bookkeeping turnComplete from leaking into the next.
  • WS keepalive ping so the Gemini socket doesn't idle-close (~2.5 min) and silently fall back to STT; managed-user auto-re-warm fixed (was gated on an isActive that's false once the session is nil, so it never recovered).
  • Energy-first speech gate — loud/clear speech always passes (Silero was intermittently dropping real speech); dropped turns now log peak/RMS/device for real diagnosis instead of guesswork.
  • Screenshot is sent as a realtimeInput.video frame — the Live model rejects mid-session clientContent with close 1007, which was killing the socket.
  • Context-window compression enabled so long sessions don't degrade/stall.
  • get_tasks local read tool (speaks your real tasks, no background agent); routing rewritten — answer creative/general/long-form yourself, escalate to ask_higher_model only on user pushback or precise-fact needs, and spawn_agent must actually emit the call for actions (it was narrating instead of acting).
  • Voice follow-up to agents — a mic button on a finished pill captures via the hub STT and continues that agent's session (continueAgent); finished agents keep their session alive for follow-ups (capped at 8, oldest trimmed).
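The energy-first speech gate from the list above could look roughly like the following sketch. It is an illustration under stated assumptions, not the code in this PR: the thresholds `PEAK_PASS` and `RMS_PASS` are made-up values on samples normalized to [-1.0, 1.0], and `vad_says_speech` is a hypothetical stand-in for the Silero check.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical thresholds; the PR does not state the real values.
PEAK_PASS = 0.20
RMS_PASS = 0.05


def peak_and_rms(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (peak, RMS) of normalized PCM samples."""
    if not samples:
        return 0.0, 0.0
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak, rms


def should_send_turn(
    samples: Sequence[float],
    vad_says_speech: Callable[[Sequence[float]], bool],
    device: str = "unknown",
    log: Callable[[str], None] = print,
) -> bool:
    """Energy first: loud or clear audio passes without asking the VAD."""
    peak, rms = peak_and_rms(samples)
    if peak >= PEAK_PASS or rms >= RMS_PASS:
        return True
    # Quiet audio: fall back to the VAD verdict (assumed callback).
    if vad_says_speech(samples):
        return True
    # Dropped turn: log the numbers needed to diagnose it later.
    log(f"dropped turn: peak={peak:.3f} rms={rms:.3f} device={device}")
    return False
```

Checking energy before the VAD means an intermittent false negative from the VAD can no longer drop speech that is clearly audible, and every dropped turn leaves a log line with peak, RMS, and device.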

Verified (live)

  • BYOK + managed/ephemeral connect and run full turns on OpenAI and Gemini.
  • get_tasks speaks the user's real tasks; voice follow-up routes the transcript into the agent's session; the mint route returns 401 when unauthenticated and 200 plus a token when authenticated.
  • Clean release build (arm64).

⚠️ Known limitation

The Gemini path fails intermittently with no identified cause: sometimes it works flawlessly, sometimes it doesn't. The root cause is still being investigated.

🤖 Generated with Claude Code

vendz and others added 30 commits June 14, 2026 11:53
…letions

Emit ephemeral cache_control breakpoints in the OpenAI->Anthropic
translation: one on the system block (caches the static tools+system
prefix, ~11k tok) and one on the latest user message (caches the
conversation prefix, so tool-loop rounds read it at 0.1x). Surface
prompt_tokens_details.cached_tokens so cache hits propagate to traces.

Forward-ports cycle-5 caching from PR BasedHardware#7583 onto the current
desktop/macos/ layout and adds the latest-user-message breakpoint.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
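The two cache breakpoints described in the commit message above can be sketched as follows. This is a simplified illustration, not the translation code from the PR: it assumes every message has plain string content, and the function names `to_anthropic` and `to_openai_usage` are hypothetical. The `cache_control` marker and the usage field names follow the public Anthropic and OpenAI response formats; how the real translator sums prompt tokens is an assumption.

```python
from typing import Any, Dict, List

EPHEMERAL = {"type": "ephemeral"}


def to_anthropic(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Translate OpenAI-style messages and mark two cache breakpoints."""
    system_text = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
    turns = [
        {"role": m["role"], "content": [{"type": "text", "text": m["content"]}]}
        for m in messages
        if m["role"] in ("user", "assistant")
    ]
    # Breakpoint on the latest user message: caches the conversation prefix.
    for turn in reversed(turns):
        if turn["role"] == "user":
            turn["content"][-1]["cache_control"] = dict(EPHEMERAL)
            break
    body: Dict[str, Any] = {"messages": turns}
    if system_text:
        # Breakpoint on the system block: caches the static tools + system prefix.
        body["system"] = [
            {"type": "text", "text": system_text, "cache_control": dict(EPHEMERAL)}
        ]
    return body


def to_openai_usage(usage: Dict[str, int]) -> Dict[str, Any]:
    """Surface Anthropic cache reads as prompt_tokens_details.cached_tokens."""
    cached = usage.get("cache_read_input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    return {
        # Assumption: report the full prompt size, cached or not.
        "prompt_tokens": usage.get("input_tokens", 0) + cached + written,
        "completion_tokens": usage.get("output_tokens", 0),
        "prompt_tokens_details": {"cached_tokens": cached},
    }
```

Placing the second breakpoint on the newest user message means each tool-loop round within a turn can reuse the same cached prefix instead of re-reading the whole conversation at full price.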
…ChatProvider; scope floating-bar tools to user data

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s; trace floating-bar spans

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…path

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e app bundle

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…zation

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… benchmarking

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…playback

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ai) + BYOK gate

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…wn_agent/screenshot/point_click) + system prompt

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ealtime WS with function calling

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-dispatching voice hub

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…elease; bypass Haiku router on voice path

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…; rely on hard 402)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ating-bar settings

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…b turn on either provider

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…E2E verification

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…'t reach Gemini Live, this one does

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ails; TEXT models deprecated)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…EXT models deprecated)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ech is no-audio fallback

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…isting model; verified AUDIO+tools)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… heard) + OpenAI active-response guard + provider/model log tags

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uble-warming; log which API/model handles each step

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…) to spawn_agent — model has no direct data access

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
vendz and others added 18 commits June 15, 2026 14:24
…m socket after idle-close

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…es the model answer

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… interrupts via reconnect

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y), abandon silent turns, guard post-reconnect errors

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ilt-in) + gentler silence gate so real speech isn't dropped

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…h, not amplitude) — Clicky's commit-vs-clear, better tuned; RMS fallback

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…altime tokens (OpenAI client_secrets / Gemini auth_tokens), PaywalledAuthUser-gated

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… token (nil on non-200 → cascade fallback)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l uses Constrained+access_token, OpenAI same Bearer

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ephemeral then connects; readiness-based isActive

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…token) to exercise the Phase 2 managed path headlessly

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… detach() for clean session handoff

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…il magic-window with deterministic session detach()

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…/mint_gemini

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…top the A2DP↔HFP reply cutoff

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
vendz and others added 10 commits June 16, 2026 17:44
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er-turn reply gate; screenshot via realtimeInput.video; context-window compression

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ecutor, single-ack spawn_agent, drop live input transcript from bar

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, escalate-on-pushback, must-call spawn_agent)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ms/device); voice follow-up to agent pills via omni STT

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t + voice follow-up capture

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gent)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ThomsenDrake (Collaborator) commented

ClawSweeper local pilot review

Recommendation: keep open, but this broad draft is not merge-ready.

What it found:

Suggested next step: land or close #7957 first, then rebase this branch; gate minted realtime tokens to managed paid users only, preserve BYOK direct mode, and add proof/docs before merge.

Posted from a local report-only ClawSweeper pilot by request; no labels, closes, repairs, or merges were performed.
