ADR-005: Rust Audio Engine (v2m_engine)¶
Status¶
Accepted
Date¶
2025-01-10
Context¶
Audio capture in pure Python presented critical limitations for a real-time dictation experience:
Identified problems:¶
- GIL blocking: Audio capture competed with transcription for the GIL
- Variable latency: 10-50ms jitter in audio buffering
- sounddevice overhead: Python callbacks added latency
- Inefficient VAD: Silero VAD in Python processed samples with overhead
Requirements:¶
- Capture latency < 10ms
- Pre-processed VAD before Python
- Lock-free circular buffer
- Zero-copy when possible
Decision¶
Develop native Rust extension (v2m_engine) for critical audio tasks.
Components:¶
v2m_engine/
├── src/
│ ├── audio_capture.rs # CPAL-based capture
│ ├── ring_buffer.rs # Lock-free circular buffer
│ ├── vad.rs # Silero ONNX inference
│ └── lib.rs # PyO3 bindings
Python interface:¶
from v2m_engine import AudioCapture, VADProcessor
capture = AudioCapture(sample_rate=16000, buffer_size=4096)
vad = VADProcessor(threshold=0.5)
async with capture.stream() as audio:
if vad.contains_speech(audio):
await transcriber.process(audio)
Consequences¶
Positive¶
- ✅ Latency < 5ms: CPAL + lock-free buffers
- ✅ GIL-free: Audio thread independent from Python
- ✅ Efficient VAD: Native ONNX runtime, 10x faster
- ✅ Zero-copy: NumPy arrays share memory with Rust
Negative¶
- ⚠️ Build complexity: Requires Rust toolchain + maturin
- ⚠️ Cross-language debugging: Mixed Python/Rust stack traces
- ⚠️ Portability: Platform-specific binaries
Alternatives Considered¶
Pure Python (sounddevice + numpy)¶
- Rejected: GIL blocking and unacceptable latency.
Cython¶
- Rejected: Still tied to GIL, limited benefits.
C++ Extension¶
- Rejected: Rust offers memory safety without GC, better tooling (cargo).