feat(desktop): realtime-as-hub voice path (Phase 1 BYOK + Phase 2 ephemeral) by vendz · Pull Request #7970 · BasedHardware/omi

vendz · 2026-06-15T23:30:12Z

Realtime-as-hub voice path for the desktop floating bar — one realtime model does in-session STT + reasoning + routing (via tool choice) + spoken reply, replacing the STT → Haiku-router → Claude → TTS cascade on the push-to-talk path.

⚠️ Stacked on #7957. This branch sits on top of the prompt-cache / QueryTracer work in #7957 — the realtime code instruments QueryTracer and calls the prompt-cached /v2/chat/completions, so those commits are a prerequisite and can't be dropped without breaking the build. GitHub can only base this on main, so until #7957 merges the bottom ~12 commits of this diff belong to #7957 — review only the realtime commits on top; they drop out of the diff automatically once #7957 lands. Merge #7957 first.

Reliability + voice-UX pass (latest commits)

Hardening after live testing surfaced drops, cutoffs, and routing issues on the Gemini path:

Barge-in is now in-session — interrupt the in-flight reply with a fresh activityStart instead of tearing down + reconnecting the socket (the reconnect lost conversation context and dropped the next turn). A per-turn reply gate stops an interrupted turn's trailing audio / bookkeeping turnComplete from leaking into the next.
WS keepalive ping so the Gemini socket doesn't idle-close (~2.5 min) and silently fall back to STT; managed-user auto-re-warm fixed (was gated on an isActive that's false once the session is nil, so it never recovered).
Energy-first speech gate — loud/clear speech always passes (Silero was intermittently dropping real speech); dropped turns now log peak/RMS/device for real diagnosis instead of guesswork.
Screenshot is sent as a realtimeInput.video frame — the Live model rejects mid-session clientContent with close 1007, which was killing the socket.
Context-window compression enabled so long sessions don't degrade/stall.
get_tasks local read tool (speaks your real tasks, no background agent); routing rewritten — answer creative/general/long-form yourself, escalate to ask_higher_model only on user pushback or precise-fact needs, and spawn_agent must actually emit the call for actions (it was narrating instead of acting).
Voice follow-up to agents — a mic button on a finished pill captures via the hub STT and continues that agent's session (continueAgent); finished agents keep their session alive for follow-ups (capped at 8, oldest trimmed).

Verified (live)

BYOK + managed/ephemeral connect and run full turns on OpenAI and Gemini.
get_tasks speaks the user's real tasks; voice follow-up routes the transcript into the agent's session; mint route 401 unauthed / 200 + token authed.
Clean release build (arm64).

⚠️ Known limitation

Gemini seems to bug out for no apparent reason. sometimes it works flawlessly but sometimes it just doesn't. Currently trying to pinpoint the issue

🤖 Generated with Claude Code

…letions Emit ephemeral cache_control breakpoints in the OpenAI->Anthropic translation: one on the system block (caches the static tools+system prefix, ~11k tok) and one on the latest user message (caches the conversation prefix, so tool-loop rounds read it at 0.1x). Surface prompt_tokens_details.cached_tokens so cache hits propagate to traces. Forward-ports cycle-5 caching from PR BasedHardware#7583 onto the current desktop/macos/ layout and adds the latest-user-message breakpoint. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ChatProvider; scope floating-bar tools to user data Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s; trace floating-bar spans Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…path Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e app bundle Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…zation Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… benchmarking Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…playback Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ai) + BYOK gate Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…wn_agent/screenshot/point_click) + system prompt Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ealtime WS with function calling Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-dispatching voice hub Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…elease; bypass Haiku router on voice path Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…; rely on hard 402) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ating-bar settings Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…b turn on either provider Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…E2E verification Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…'t reach Gemini Live, this one does Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ails; TEXT models deprecated) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…EXT models deprecated) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ech is no-audio fallback Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…isting model; verified AUDIO+tools) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… heard) + OpenAI active-response guard + provider/model log tags Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…uble-warming; log which API/model handles each step Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…) to spawn_agent — model has no direct data access Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…m socket after idle-close Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…es the model answer Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… interrupts via reconnect Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…y), abandon silent turns, guard post-reconnect errors Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ilt-in) + gentler silence gate so real speech isn't dropped Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…h, not amplitude) — Clicky's commit-vs-clear, better tuned; RMS fallback Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…altime tokens (OpenAI client_secrets / Gemini auth_tokens), PaywalledAuthUser-gated Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… token (nil on non-200 → cascade fallback) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…l uses Constrained+access_token, OpenAI same Bearer Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ephemeral then connects; readiness-based isActive Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…token) to exercise the Phase 2 managed path headlessly Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… detach() for clean session handoff Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…il magic-window with deterministic session detach() Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…/mint_gemini Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…top the A2DP↔HFP reply cutoff Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…er-turn reply gate; screenshot via realtimeInput.video; context-window compression Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ecutor, single-ack spawn_agent, drop live input transcript from bar Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…, escalate-on-pushback, must-call spawn_agent) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ms/device); voice follow-up to agent pills via omni STT Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…t + voice follow-up capture Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…gent) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ThomsenDrake · 2026-06-17T00:22:13Z

ClawSweeper local pilot review

Recommendation: keep open, but this broad draft is not merge-ready.

What it found:

This PR is explicitly stacked on perf(desktop): floating-bar latency — Anthropic prompt caching, router/screenshot fast-paths, and a query tracer #7957 and should wait for that prerequisite to land or be closed first.
The realtime voice-hub direction has an auth/cost-boundary blocker around minted realtime tokens.
The branch needs a narrower reviewable diff, redacted end-to-end proof, and docs for the new desktop voice path.

Suggested next step: land or close #7957 first, then rebase this branch; gate minted realtime tokens to managed paid users only, preserve BYOK direct mode, and add proof/docs before merge.

Posted from a local report-only ClawSweeper pilot by request; no labels, closes, repairs, or merges were performed.

vendz and others added 30 commits June 14, 2026 11:53

feat(desktop): add QueryTracer for floating-bar query latency tracing

276bb18

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

perf(desktop): QueryTracer instrumentation + tracer-gated capture in …

fa72bb4

…ChatProvider; scope floating-bar tools to user data Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

perf(desktop): skip Haiku router & screenshot for obvious chat querie…

269e246

…s; trace floating-bar spans Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

perf(desktop): drop PTT ASR-cleanup round-trip; trace recording→send …

b09e016

…path Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore(desktop): trace quota_check span in AgentBridge

abb397b

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): define AGENT_DIR so the agent runtime is staged into th…

4d77b9e

…e app bundle Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(desktop): cover QueryTracer spans, gaps, TTFT, and JSONL seriali…

91ef95a

…zation Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(desktop): guardrails for router-skip and screenshot heuristics

be32cad

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): add trace_stats.py to aggregate QueryTracer traces for…

898d559

… benchmarking Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): trace tts_start span in floating-bar playback service

087c7d5

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): port StreamingPCMPlayer for realtime-hub spoken audio …

264169b

…playback Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): add RealtimeHubSettings — provider toggle (gemini|open…

356b692

…ai) + BYOK gate Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): define realtime-hub tool surface (ask_higher_model/spa…

8d3447b

…wn_agent/screenshot/point_click) + system prompt Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): add RealtimeHubSession — client-direct OpenAI/Gemini r…

7e976ac

…ealtime WS with function calling Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): add RealtimeHubController — realtime model as the tool…

cbe2937

…-dispatching voice hub Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): route PTT to the realtime hub + instant local ack on r…

a1bebdb

…elease; bypass Haiku router on voice path Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

perf(desktop): make chat-quota pre-flight optimistic (cache fast-fail…

9b75947

…; rely on hard 402) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): add Realtime Voice Hub enable + provider picker to flo…

ea293c0

…ating-bar settings Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore(desktop): changelog — experimental Realtime Voice Hub

4ee8344

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(desktop): add headless RealtimeHubTestHarness to drive a real hu…

4132a75

…b turn on either provider Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(desktop): register hub_test_turn automation action for headless …

6434048

…E2E verification Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): hand-rolled RFC 6455 WS client — Apple's WS stacks can…

0c8da44

…'t reach Gemini Live, this one does Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): Gemini hub via raw WS + native-audio model (Apple WS f…

bdf5ad0

…ails; TEXT models deprecated) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): Gemini hub uses native-audio Live model (half-cascade T…

8d82a27

…EXT models deprecated) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): play Gemini native audio via StreamingPCMPlayer; AVSpe…

92640b9

…ech is no-audio fallback Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): hub Gemini uses gemini-3.1-flash-live-preview (OMI's ex…

e9777e4

…isting model; verified AUDIO+tools) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): per-turn Gemini activityStart (warm session turn 2+ now…

aa47b00

… heard) + OpenAI active-response guard + provider/model log tags Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): open per-turn speech window; stop duplicate-observer do…

92ee67c

…uble-warming; log which API/model handles each step Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): route user-data/action queries (tasks, calendar, notes……

8a03b4b

…) to spawn_agent — model has no direct data access Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vendz and others added 18 commits June 15, 2026 14:24

fix(desktop): cancel in-flight reply on new turn + auto-reconnect war…

4487390

…m socket after idle-close Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): silence-gate hub turns — accidental ⌥ tap no longer mak…

79307eb

…es the model answer Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): cancelActiveResponse clears OpenAI input buffer; Gemini…

c1a2495

… interrupts via reconnect Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): clean barge-in (reconnect Gemini to stop in-flight repl…

0878699

…y), abandon silent turns, guard post-reconnect errors Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): hub listens via the AirPods/default mic (not the far bu…

c4f0b60

…ilt-in) + gentler silence gate so real speech isn't dropped Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): hub silence gate uses on-device Silero VAD (real speec…

c623968

…h, not amplitude) — Clicky's commit-vs-clear, better tuned; RMS fallback Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(backend): Phase 2 — POST /v2/realtime/session mints ephemeral re…

cbe421c

…altime tokens (OpenAI client_secrets / Gemini auth_tokens), PaywalledAuthUser-gated Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(backend): register realtime_routes

7e1557a

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(backend): merge realtime_routes into the app router

13873dd

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): APIClient.mintRealtimeToken — fetch ephemeral realtime…

a2a2f7e

… token (nil on non-200 → cascade fallback) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): HubAuth (BYOK key vs ephemeral token); Gemini ephemera…

1b53d53

…l uses Constrained+access_token, OpenAI same Bearer Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): Phase 2 warm-up — BYOK connects direct, managed mints …

e7ef428

…ephemeral then connects; readiness-based isActive Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore(desktop): harness uses HubAuth.byokKey

6037f5a

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(desktop): hub_test_turn supports auth=ephemeral (mints a server …

c7f4275

…token) to exercise the Phase 2 managed path headlessly Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

refactor(desktop): unify the audio-send path (appendAudioFrame) + add…

afdf0f6

… detach() for clean session handoff Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

refactor(desktop): providerTag computed prop; replace ignoreErrorsUnt…

e05a523

…il magic-window with deterministic session detach() Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

refactor(backend): extract send_and_parse helper to dedup mint_openai…

4f54dc8

…/mint_gemini Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): hub captures from built-in mic on Bluetooth output to s…

e0bbb2d

…top the A2DP↔HFP reply cutoff Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vendz mentioned this pull request Jun 15, 2026

feat(desktop): realtime-as-hub voice path (Phase 1 BYOK + Phase 2 ephemeral) vendz/omi#1

Closed

vendz and others added 10 commits June 16, 2026 17:44

fix(desktop): keepalive ping on Gemini WS to prevent ~2.5min idle-close

d93a682

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): Gemini barge-in interrupts in-session (no teardown) + p…

d4f1e71

…er-turn reply gate; screenshot via realtimeInput.video; context-window compression Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): in-session barge-in, managed-user re-warm, get_tasks ex…

8dca47a

…ecutor, single-ack spawn_agent, drop live input transcript from bar Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): get_tasks read tool + routing rewrite (answer-yourself…

481d3c0

…, escalate-on-pushback, must-call spawn_agent) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(desktop): handle get_tasks in hub test harness

e1b82b1

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore(desktop): expose currentDeviceDescription for capture diagnostics

9640e9f

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): energy-first hub speech gate + drop diagnostics (peak/r…

8d6e812

…ms/device); voice follow-up to agent pills via omni STT Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): keep agent session alive for follow-ups + continueAgen…

10c08a3

…t + voice follow-up capture Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(desktop): voice follow-up mic button in agent pill popover

688087b

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(desktop): route agent follow-ups into the same session (continueA…

c18adae

…gent) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(desktop): realtime-as-hub voice path (Phase 1 BYOK + Phase 2 ephemeral)#7970

feat(desktop): realtime-as-hub voice path (Phase 1 BYOK + Phase 2 ephemeral)#7970
vendz wants to merge 63 commits into
BasedHardware:mainfrom
vendz:feature/realtime-hub

vendz commented Jun 15, 2026 •

edited

Loading

Uh oh!

ThomsenDrake commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

vendz commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reliability + voice-UX pass (latest commits)

Verified (live)

⚠️ Known limitation

Uh oh!

ThomsenDrake commented Jun 17, 2026

ClawSweeper local pilot review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vendz commented Jun 15, 2026 •

edited

Loading