Use NumPy cosine distance in Parakeet speaker matching #7701
tianmind-studio wants to merge 1 commit into
Greptile Summary
This PR replaces
Confidence Score: 4/5
Safe to merge: the NumPy helper is a drop-in replacement for the cdist call and handles the actual embedding shapes correctly.
The change removes an external dependency cleanly and the core cosine-distance logic is correct. The only concern is that the helper function is copy-pasted identically into two files rather than shared from one place, which creates a small maintenance surface. No data-correctness or runtime issues were found.
Both stream_handler.py and transcribe.py carry the same duplicated helper; consolidating it into one location would reduce drift risk going forward.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Speaker embedding\n(shape: 1xN)"] --> B["_cosine_distance(emb, centroid)"]
C["Stored centroid\n(shape: 1xN)"] --> B
B --> D["reshape(-1) → flat N-dim vectors"]
D --> E["denom = norm(a) * norm(b)"]
E --> F{denom <= 0?}
F -- yes --> G["return 1.0\n(zero-vector guard)"]
F -- no --> H["distance = 1 - dot(a,b)/denom"]
H --> I["clamp to [0.0, 2.0]"]
I --> J{dist < threshold?}
J -- yes --> K["Update centroid\n(running average)"]
J -- no --> L["Register new speaker"]
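The flowchart above can be sketched in Python roughly as follows. This is a minimal illustration, not the PR's actual code: the names `cosine_distance` and `match_speaker`, the default `threshold` of 0.4, the nearest-centroid search, and the count-weighted form of the running average are all assumptions made for the example.

```python
import numpy as np


def cosine_distance(a, b) -> float:
    """Cosine distance between two embeddings, following the flowchart steps."""
    # Flatten so (1, N) and (N,) inputs are handled identically.
    a = np.asarray(a, dtype=np.float64).reshape(-1)
    b = np.asarray(b, dtype=np.float64).reshape(-1)
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    if denom <= 0.0:
        # Zero-vector guard: avoid dividing by zero.
        return 1.0
    distance = 1.0 - float(np.dot(a, b)) / denom
    # Clamp floating-point drift back into the valid range [0.0, 2.0].
    return float(min(2.0, max(0.0, distance)))


def match_speaker(embedding, centroids, counts, threshold=0.4) -> int:
    """Return the index of the matched (or newly registered) speaker.

    Hypothetical helper: `centroids` is a list of (1, N) arrays and `counts`
    tracks how many embeddings each centroid has absorbed. Both are mutated.
    """
    emb = np.asarray(embedding, dtype=np.float64).reshape(1, -1)

    # Find the closest stored centroid (assumed strategy for several speakers).
    best_idx, best_dist = -1, float("inf")
    for idx, centroid in enumerate(centroids):
        dist = cosine_distance(emb, centroid)
        if dist < best_dist:
            best_idx, best_dist = idx, dist

    if best_idx >= 0 and best_dist < threshold:
        # Running average: fold the new embedding into the stored centroid.
        n = counts[best_idx]
        centroids[best_idx] = (centroids[best_idx] * n + emb) / (n + 1)
        counts[best_idx] = n + 1
        return best_idx

    # No centroid is close enough: register a new speaker.
    centroids.append(emb)
    counts.append(1)
    return len(centroids) - 1
```

Because both inputs are flattened first, the helper returns a plain float for 1xN embeddings, whereas a cdist call on the same inputs would return a 1x1 matrix that the caller has to index.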
Reviews (1): Last reviewed commit: "Use NumPy cosine distance in Parakeet sp..."
a8f28f6 to cf9ecba
kodjima33 left a comment
backend bug fix (NumPy cosine distance replaces SciPy in Parakeet speaker matching, drops SciPy dep) — approve only per policy
cf9ecba to 0bcf939
kodjima33 left a comment
Re-approved on new commits — backend (approve-only area).
0bcf939 to d9e4df1
kodjima33 left a comment
Backend refactor (NumPy cosine replaces SciPy) — approve only per policy.
42f7f67 to 6e4463d
6e4463d to 0e1497e
Summary
Replace SciPy cdist(..., metric="cosine") calls in Parakeet speaker matching with a shared NumPy cosine-distance helper
Current status
main at 1a5824403b68ce47c3b0909577cadc1242ba0d3f
0e1497e9580d27912d6d3c7efe0c4481d8f1d88b
backend/parakeet/speaker_math.py helper and test_stream_handler_uses_shared_cosine_distance
Verification
D:\codex-omi-work\.venvs\omi-backend-vad-refresh\Scripts\python.exe -m pytest backend\tests\unit\test_parakeet_stream_session.py -q --tb=short
D:\codex-omi-work\.venvs\omi-backend-vad-refresh\Scripts\python.exe -m py_compile backend\parakeet\speaker_math.py backend\parakeet\stream_handler.py backend\parakeet\transcribe.py backend\tests\unit\test_parakeet_stream_session.py
D:\codex-omi-work\.venvs\omi-backend-vad-refresh\Scripts\python.exe -m black --line-length 120 --skip-string-normalization --check backend\parakeet\speaker_math.py backend\parakeet\stream_handler.py backend\parakeet\transcribe.py backend\tests\unit\test_parakeet_stream_session.py
PYTHONUTF8=1 D:\codex-omi-work\.venvs\omi-backend-vad-refresh\Scripts\python.exe backend\scripts\scan_async_blockers.py
git diff --check origin/main...HEAD
scripts/pre-commit via Git for Windows sh.exe with the backend Windows venv and local Dart SDK on PATH