feat(desktop): realtime-as-hub voice path (Phase 1 BYOK + Phase 2 ephemeral) #7970
Draft
vendz wants to merge 63 commits into
…letions Emit ephemeral cache_control breakpoints in the OpenAI->Anthropic translation: one on the system block (caches the static tools+system prefix, ~11k tok) and one on the latest user message (caches the conversation prefix, so tool-loop rounds read it at 0.1x). Surface prompt_tokens_details.cached_tokens so cache hits propagate to traces. Forward-ports cycle-5 caching from PR BasedHardware#7583 onto the current desktop/macos/ layout and adds the latest-user-message breakpoint. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ChatProvider; scope floating-bar tools to user data Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s; trace floating-bar spans Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…path Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e app bundle Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…zation Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… benchmarking Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…playback Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ai) + BYOK gate Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…wn_agent/screenshot/point_click) + system prompt Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ealtime WS with function calling Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-dispatching voice hub Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…elease; bypass Haiku router on voice path Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…; rely on hard 402) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ating-bar settings Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…b turn on either provider Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…E2E verification Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…'t reach Gemini Live, this one does Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ails; TEXT models deprecated) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…EXT models deprecated) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ech is no-audio fallback Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…isting model; verified AUDIO+tools) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… heard) + OpenAI active-response guard + provider/model log tags Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uble-warming; log which API/model handles each step Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…) to spawn_agent — model has no direct data access Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…m socket after idle-close Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…es the model answer Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… interrupts via reconnect Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y), abandon silent turns, guard post-reconnect errors Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ilt-in) + gentler silence gate so real speech isn't dropped Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…h, not amplitude) — Clicky's commit-vs-clear, better tuned; RMS fallback Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…altime tokens (OpenAI client_secrets / Gemini auth_tokens), PaywalledAuthUser-gated Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… token (nil on non-200 → cascade fallback) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l uses Constrained+access_token, OpenAI same Bearer Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ephemeral then connects; readiness-based isActive Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…token) to exercise the Phase 2 managed path headlessly Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… detach() for clean session handoff Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…il magic-window with deterministic session detach() Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…/mint_gemini Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…top the A2DP↔HFP reply cutoff Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er-turn reply gate; screenshot via realtimeInput.video; context-window compression Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ecutor, single-ack spawn_agent, drop live input transcript from bar Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, escalate-on-pushback, must-call spawn_agent) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ms/device); voice follow-up to agent pills via omni STT Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t + voice follow-up capture Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gent) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
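The caching commit at the top of this list describes two ephemeral `cache_control` breakpoints and a `cached_tokens` passthrough. A minimal sketch of that idea follows, not the PR's actual code: the function names `to_anthropic` and `to_openai_usage` are hypothetical, and the sketch assumes messages carry only `system`/`user`/`assistant` roles with plain string content (tool messages are left out for brevity).

```python
from typing import Any, Dict, List

EPHEMERAL = {"type": "ephemeral"}


def to_anthropic(openai_messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Translate OpenAI-style messages into an Anthropic Messages payload
    with one breakpoint on the system block and one on the latest user turn.
    (Hypothetical helper; assumes string content only.)"""
    system_text = "\n\n".join(
        m["content"] for m in openai_messages if m["role"] == "system"
    )
    messages = [
        {"role": m["role"], "content": [{"type": "text", "text": m["content"]}]}
        for m in openai_messages
        if m["role"] != "system"
    ]
    payload: Dict[str, Any] = {"messages": messages}
    if system_text:
        # Breakpoint 1: caches the static prefix (tools + system).
        payload["system"] = [
            {"type": "text", "text": system_text, "cache_control": dict(EPHEMERAL)}
        ]
    # Breakpoint 2: caches the conversation prefix up to the latest user message.
    for msg in reversed(messages):
        if msg["role"] == "user":
            msg["content"][-1]["cache_control"] = dict(EPHEMERAL)
            break
    return payload


def to_openai_usage(anthropic_usage: Dict[str, int]) -> Dict[str, Any]:
    """Map Anthropic usage counters onto the OpenAI usage shape so cache
    hits show up as prompt_tokens_details.cached_tokens."""
    cached = anthropic_usage.get("cache_read_input_tokens", 0)
    created = anthropic_usage.get("cache_creation_input_tokens", 0)
    prompt = anthropic_usage.get("input_tokens", 0) + cached + created
    completion = anthropic_usage.get("output_tokens", 0)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
        "prompt_tokens_details": {"cached_tokens": cached},
    }
```

Summing the cache counters into `prompt_tokens` is a design choice in this sketch: Anthropic reports `input_tokens` separately from the cache read/creation counts, while the OpenAI shape treats `cached_tokens` as a subset of `prompt_tokens`.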
ClawSweeper local pilot review
Recommendation: keep open, but this broad draft is not merge-ready.
What it found:
Suggested next step: land or close #7957 first, then rebase this branch; gate minted realtime tokens to managed paid users only, preserve BYOK direct mode, and add proof/docs before merge.
Posted from a local report-only ClawSweeper pilot by request; no labels, closes, repairs, or merges were performed.
Realtime-as-hub voice path for the desktop floating bar — one realtime model does in-session STT + reasoning + routing (via tool choice) + spoken reply, replacing the STT → Haiku-router → Claude → TTS cascade on the push-to-talk path.
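To illustrate "routing via tool choice": the realtime model picks a function by name, and the client runs it and sends the result back into the session. The sketch below is an assumption-laden illustration in Python, not the desktop app's code; `VoiceHub`, `register`, and `dispatch` are hypothetical names, and the tool name in the usage note is only a stand-in.

```python
from typing import Callable, Dict

ToolHandler = Callable[[dict], str]


class VoiceHub:
    """Maps tool names chosen by the realtime model to local handlers.
    (Hypothetical class for illustration only.)"""

    def __init__(self) -> None:
        self._handlers: Dict[str, ToolHandler] = {}

    def register(self, name: str, handler: ToolHandler) -> None:
        self._handlers[name] = handler

    def dispatch(self, name: str, args: dict) -> dict:
        """Run the handler for a function call and build the tool result
        that would be sent back to the realtime session."""
        handler = self._handlers.get(name)
        if handler is None:
            # Unknown tool: report it to the model instead of dropping the turn.
            return {"name": name, "ok": False, "output": f"unknown tool: {name}"}
        try:
            return {"name": name, "ok": True, "output": handler(args)}
        except Exception as exc:  # keep the voice turn alive on handler errors
            return {"name": name, "ok": False, "output": str(exc)}
```

Usage under the same assumptions: `hub = VoiceHub(); hub.register("get_tasks", lambda args: "2 open tasks"); hub.dispatch("get_tasks", {})` returns a result dict with `ok` set to `True`. Returning an error result rather than raising lets the model speak a fallback instead of going silent.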
Reliability + voice-UX pass (latest commits)
Hardening after live testing surfaced drops, cutoffs, and routing issues on the Gemini path:
- … `activityStart` instead of tearing down + reconnecting the socket (the reconnect lost conversation context and dropped the next turn). A per-turn reply gate stops an interrupted turn's trailing audio / bookkeeping `turnComplete` from leaking into the next.
- … `isActive` that's false once the session is nil, so it never recovered).
- … `realtimeInput.video` frame: the Live model rejects mid-session `clientContent` with close 1007, which was killing the socket.
- … `get_tasks` local read tool (speaks your real tasks, no background agent); routing rewritten: answer creative/general/long-form yourself, escalate to `ask_higher_model` only on user pushback or precise-fact needs, and `spawn_agent` must actually emit the call for actions (it was narrating instead of acting).
- … `continueAgent`); finished agents keep their session alive for follow-ups (capped at 8, oldest trimmed).

Verified (live)
- `get_tasks` speaks the user's real tasks; voice follow-up routes the transcript into the agent's session; mint route `401` unauthed / `200` + token authed.

Gemini seems to bug out for no apparent reason: sometimes it works flawlessly, and sometimes it just doesn't. Currently trying to pinpoint the issue.
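For reviewers, here is one way the per-turn reply gate from the hardening notes could work. This is a sketch under assumptions, not the implementation in this branch: `ReplyGate` and its methods are hypothetical, and it assumes the server still delivers one `turnComplete` for each interrupted turn before the next turn's audio starts.

```python
from typing import Optional


class ReplyGate:
    """Drops output that belongs to an interrupted turn.
    (Hypothetical sketch; assumes one turnComplete per interrupted turn.)"""

    def __init__(self) -> None:
        # Interrupted turns whose turnComplete has not arrived yet.
        self._draining = 0

    def interrupt(self) -> None:
        """Call when the user barges in while a reply is in flight."""
        self._draining += 1

    def on_audio(self, chunk: bytes) -> Optional[bytes]:
        """Return the chunk for playback, or None if it is stale."""
        return None if self._draining else chunk

    def on_turn_complete(self) -> bool:
        """Return True only when the completion belongs to the live turn."""
        if self._draining:
            self._draining -= 1  # bookkeeping completion of an old turn
            return False
        return True
```

With this shape, late audio after an interrupt is discarded, the old turn's `turnComplete` is swallowed, and the following turn plays normally. If the assumed event ordering does not hold on the Gemini path, the counter would need a timeout or a per-turn identifier instead.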
🤖 Generated with Claude Code