Add SmallestAI Hydra S2S framework support by weiz9 · Pull Request #172 · ServiceNow/eva

weiz9 · 2026-07-01T20:46:21Z

Summary

Adds SmallestAI Hydra speech-to-speech as an EVA framework (framework: smallest_hydra, s2s: hydra), alongside the existing S2S integrations (Gemini Live, ElevenLabs). Bridges the Twilio user-simulator WebSocket to Hydra over a raw WebSocket, modeled on the Gemini Live server: three concurrent tasks (forward user audio / process Hydra events / pace output) with sync_buffer_to_position recording. Audio is 16 kHz in / 48 kHz out, recorded at 48 kHz native.
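The three-task layout described above can be sketched with plain asyncio queues. This is a minimal illustration under stated assumptions, not the code in smallest_hydra_server.py: the function names, the queue wiring, and the echo behaviour standing in for Hydra are all hypothetical, and the real server talks to WebSockets rather than in-memory queues.

```python
import asyncio


async def forward_user_audio(inbound: asyncio.Queue, to_model: asyncio.Queue) -> None:
    """Task 1: pass user audio frames upstream until the None sentinel arrives."""
    while (frame := await inbound.get()) is not None:
        await to_model.put(frame)
    await to_model.put(None)


async def process_model_events(to_model: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Task 2: stand-in for the model side; echoes each frame back as output audio."""
    while (frame := await to_model.get()) is not None:
        await outbound.put(frame)
    await outbound.put(None)


async def pace_output(outbound: asyncio.Queue, played: list, frame_seconds: float) -> None:
    """Task 3: release one frame per frame_seconds so playback stays real-time paced."""
    while (frame := await outbound.get()) is not None:
        played.append(frame)
        await asyncio.sleep(frame_seconds)


async def run_bridge(frames: list, frame_seconds: float = 0.0) -> list:
    """Run the three tasks concurrently and return the frames in playback order."""
    inbound, to_model, outbound = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    for frame in frames:
        inbound.put_nowait(frame)
    inbound.put_nowait(None)  # sentinel: the caller hung up
    played: list = []
    await asyncio.gather(
        forward_user_audio(inbound, to_model),
        process_model_events(to_model, outbound),
        pace_output(outbound, played, frame_seconds),
    )
    return played
```

Calling `asyncio.run(run_bridge([b"a", b"b"]))` pushes two frames through all three tasks. Keeping pacing in its own task means slow playback never blocks event processing.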

Standalone against main — 1 commit, 9 files.

What's included

smallest_hydra_server.py — the server. OpenAI-Realtime-shaped protocol: session.configure handshake, input_audio_buffer.append in, response.output_audio.delta out, client-side tools (response.function_call_arguments.done → conversation.item.create → response.create), native barge-in, generate_initial_response greeting, keepalive during idle.
Transcripts — the assistant transcript comes from Hydra's native response.output_audio_transcript.delta stream (accumulated per response, finalized on response.done) — accurate, no extra STT call. Hydra emits no user transcript, so each user utterance is batch-transcribed by a configurable STT (s2s_transcription.py: smallest default / openai / deepgram), fail-soft, with a short-utterance guard (≥300 ms) to avoid hallucinations on near-silence. The audit log is timestamp-sorted so async user transcriptions stay ordered.
Metrics — model_response latency anchored on Hydra's speech_stopped (the simulator's user_speech_stop races), with a 50 ms floor; token usage from response.done.
Payload cap — instructions truncated to keep session.configure within Hydra's ~32 KB ceiling (raised from ~18 KB; it silently drops audio above the limit). The greeting steer is preserved. Tool-heavy domains can exceed the ceiling on tool schemas alone (see limitations).
Audio — new 48 kHz helpers (mulaw_8k_to_pcm16_48k, pcm16_48k_to_mulaw_8k) + in-memory WAV builder in audio_bridge.py.
Framework registered in worker._get_server_class() and the RunConfig.framework Literal; simulation_version bumped to 2.0.2.
.env.example (config example + framework enum) and contract doc §13 updated.
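The client-side tool round trip listed above (response.function_call_arguments.done → conversation.item.create → response.create) might look roughly like the sketch below. The event fields (`call_id`, `name`, `arguments`) and the `function_call_output` item shape are assumptions borrowed from the OpenAI Realtime protocol that the PR says Hydra resembles; `handle_function_call` is a hypothetical helper, not a function from this PR.

```python
import json
from typing import Any, Callable, Dict, List


def handle_function_call(
    event: Dict[str, Any], tools: Dict[str, Callable[..., Any]]
) -> List[Dict[str, Any]]:
    """Turn one function-call event into the two client messages that answer it."""
    # Assumed: arguments arrive as a JSON-encoded string, possibly empty.
    args = json.loads(event.get("arguments") or "{}")
    try:
        result = tools[event["name"]](**args)
    except Exception as exc:  # report the failure to the model instead of crashing
        result = {"error": str(exc)}
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": json.dumps(result),
            },
        },
        # Ask the model to continue now that the tool result is in the conversation.
        {"type": "response.create"},
    ]
```

Returning the messages rather than sending them keeps the helper easy to unit-test without a live socket.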

Testing

pytest tests/unit/assistant/test_smallest_hydra_server.py — 16 pass (tool conversion, 48 kHz audio round-trip, transcriber selection + per-provider response parsing, fail-soft). ruff check/format clean.
Validated across 4 live airline runs (--record-ids 1.1.2 --num-trials 1): conversation succeeds end-to-end (greeting, multi-turn, get_reservation + search_rebooking_options + rebook_flight + assign_seat, DB mutation), EVA-X pass 1.0; native assistant transcript accurate (e.g. "Austin AUS to Los Angeles LAX"); model_response latency records clean values; token usage per turn.

Config example

EVA_FRAMEWORK=smallest_hydra
EVA_MODEL__S2S=hydra
EVA_MODEL__S2S_PARAMS='{"model":"hydra","api_key":"<SMALLEST_API_KEY>","voice":"wren","generate_initial_response":true,"transcription":{"provider":"smallest","model":"pulse-pro","language":"en"}}'
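Since EVA_MODEL__S2S_PARAMS carries a JSON object, a quick sanity check before a run could look like the following. This is only a sketch: `parse_s2s_params` is a hypothetical helper, and how EVA itself loads and validates the variable is not shown in this PR. The provider set and the `smallest` default are taken from the description above.

```python
import json

# Transcription providers named in this PR.
ALLOWED_PROVIDERS = {"smallest", "openai", "deepgram"}


def parse_s2s_params(raw: str) -> dict:
    """Parse the JSON value and fill in the default transcription provider."""
    params = json.loads(raw)
    if not isinstance(params, dict):
        raise ValueError("EVA_MODEL__S2S_PARAMS must be a JSON object")
    transcription = params.setdefault("transcription", {})
    provider = transcription.setdefault("provider", "smallest")
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unknown transcription provider: {provider}")
    return params
```

Note that the single quotes in the .env line are shell or dotenv quoting; the string passed to `parse_s2s_params` is the JSON between them.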

Known limitations (Smallest-side, not this code)

~32 KB payload ceiling (Smallest-side). Airline (~22 KB) fits with full instructions. Tool-heavy domains still don't: ITSM's 59 tool schemas alone are ~45 KB, exceeding the ceiling before any instructions are added, so those domains stay degraded (instructions floored) until the ceiling is raised enough to fit their tool schemas.
English-only (en) per Hydra's current support.
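To make the payload limitation concrete, here is a rough sketch of why truncating instructions helps airline-sized prompts but cannot rescue a tool-heavy domain. The payload shape, the exact 32 * 1024 byte value, and the helper names are assumptions for illustration. The PR's actual truncation also preserves the greeting steer and keeps a floor on the instructions, which this simplified version omits.

```python
import json
from typing import List, Tuple

MAX_PAYLOAD_BYTES = 32 * 1024  # assumed stand-in for the ~32 KB ceiling


def payload_size(instructions: str, tools: List[dict]) -> int:
    """Size in bytes of a hypothetical session.configure message."""
    payload = {
        "type": "session.configure",
        "session": {"instructions": instructions, "tools": tools},
    }
    return len(json.dumps(payload).encode("utf-8"))


def fit_instructions(
    instructions: str, tools: List[dict], limit: int = MAX_PAYLOAD_BYTES
) -> Tuple[str, bool]:
    """Return (possibly truncated instructions, whether the payload now fits)."""
    if payload_size(instructions, tools) <= limit:
        return instructions, True
    # Binary search for the longest prefix that fits; size grows with prefix length.
    lo, hi = 0, len(instructions)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if payload_size(instructions[:mid], tools) <= limit:
            lo = mid
        else:
            hi = mid - 1
    trimmed = instructions[:lo]
    # If the tools alone exceed the limit, even empty instructions do not fit.
    return trimmed, payload_size(trimmed, tools) <= limit
```

When the second element of the result is `False`, the tool schemas by themselves are over the limit, which is the ITSM case described above.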

Bridge the Twilio user-simulator WebSocket to Smallest's Hydra speech-to-speech model (framework: smallest_hydra, s2s: hydra), modeled on the Gemini Live server: three concurrent tasks (forward user audio / process Hydra events / pace output) with sync_buffer_to_position recording. Audio is 16 kHz in / 48 kHz out; recorded at 48 kHz native.

- Assistant transcript comes from Hydra's native response.output_audio_transcript.delta stream (accumulated per response, finalized on response.done). Hydra emits no user transcript, so each user utterance is batch-transcribed by a configurable STT (s2s_transcription.py: smallest default / openai / deepgram), fail-soft, with a short-utterance guard to avoid hallucinations on near-silence.
- model_response latency anchored on Hydra's speech_stopped (the simulator's user_speech_stop races), with a 50 ms floor; token usage from response.done.
- session.configure handshake, client-side tools, native barge-in, generate_initial_response greeting; ~18 KB payload cap on instructions (Hydra drops audio above it).
- New 48 kHz audio helpers + in-memory WAV builder in audio_bridge.py.
- Register framework in worker + config Literal; bump simulation_version to 2.0.6.
- Unit tests for tool conversion, audio round-trip, transcriber selection/parsing.
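The short-utterance guard, in-memory WAV builder, and fail-soft transcription mentioned in this commit could be combined along these lines. This is a sketch under assumptions: the helper names and the `transcribe` callback are hypothetical, audio is taken to be mono 16-bit PCM, and the 300 ms threshold is the figure quoted in the PR description.

```python
import io
import wave
from typing import Callable, Optional


def build_wav(pcm16: bytes, sample_rate: int) -> bytes:
    """Wrap mono 16-bit PCM in a WAV container without touching disk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm16)
    return buf.getvalue()


def duration_ms(pcm16: bytes, sample_rate: int) -> float:
    """Duration of mono 16-bit PCM in milliseconds."""
    return len(pcm16) / 2 / sample_rate * 1000


def transcribe_utterance(
    pcm16: bytes,
    sample_rate: int,
    transcribe: Callable[[bytes], str],
    min_ms: float = 300.0,
) -> Optional[str]:
    """Skip near-silent blips, and never let an STT failure break the call."""
    if duration_ms(pcm16, sample_rate) < min_ms:
        return None  # too short: STT models tend to hallucinate text here
    try:
        return transcribe(build_wav(pcm16, sample_rate))
    except Exception:
        return None  # fail-soft: a missing transcript is better than a dead run
```

Passing the STT call in as a callback keeps provider selection (smallest, openai, deepgram) separate from the guard logic.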

Smallest raised the payload limit from ~18 KB to ~32 KB. Bump _MAX_PAYLOAD_BYTES so airline-sized prompts (~22 KB) are no longer truncated. Document that tool-heavy domains (e.g. ITSM: 59 tools ≈ 45 KB) still exceed the ceiling on tool schemas alone, which instruction truncation cannot resolve.

weiz9 changed the base branch from pr/wz/deepgram-voice-agent-framework to main July 1, 2026 20:47

weiz9 force-pushed the pr/wz/smallestai-hydra-s2s-framework branch from de2077a to 7492a7b July 1, 2026 20:55

weiz9 and others added 2 commits July 1, 2026 20:55

Apply pre-commit

43967c9

weiz9 wants to merge 3 commits into main from pr/wz/smallestai-hydra-s2s-framework
