# Transcription

Audio-to-text transcription services using faster-whisper.
## PersistentWhisperWorker

Persistent worker that keeps the Whisper model loaded in VRAM between sessions.

Location: `v2m/features/transcription/persistent_model.py`
### Features

- **Lazy Loading**: the model is loaded the first time it is needed
- **Keep-Warm Policy**: keeps the model in VRAM based on configuration
- **GPU Optimized**: uses `float16` or `int8_float16` for maximum performance (see the sketch after this list)
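
A minimal sketch of what lazy loading with a GPU compute type can look like with faster-whisper. The model size (`large-v3`), the module-level cache, and the `get_model` helper are illustrative assumptions, not v2m's actual configuration.

```python
# Illustrative lazy-loading sketch; model size and caching strategy are assumptions.
from faster_whisper import WhisperModel

_model: WhisperModel | None = None  # stays in VRAM once loaded (keep-warm)


def get_model() -> WhisperModel:
    """Load the Whisper model on first use, then reuse the cached instance."""
    global _model
    if _model is None:
        _model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    return _model
```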
### Methods

```python
import numpy as np


class PersistentWhisperWorker:
    def initialize_sync(self) -> None:
        """Loads the model into VRAM (synchronous, for warmup)."""

    async def transcribe(self, audio: np.ndarray) -> str:
        """Transcribes audio to text."""

    async def unload(self) -> None:
        """Releases the model from VRAM."""
```
## StreamingTranscriber

Real-time transcriber that provides provisional feedback while the user speaks.

Location: `v2m/features/audio/streaming_transcriber.py`
### Data Flow

```mermaid
graph LR
    A[AudioRecorder] --> B[VAD]
    B --> C[StreamingTranscriber]
    C --> D[WhisperWorker]
    D --> E[WebSocket]
    style C fill:#e3f2fd
```
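
The sketch below wires the stages from the diagram together to show where the StreamingTranscriber sits in the flow. Every method name used here (`chunks`, `is_speech`, `feed`, `finalize`, `send_json`) is hypothetical, not v2m's actual API.

```python
async def stream_pipeline(recorder, vad, transcriber, websocket) -> None:
    """Feed voiced audio chunks to the transcriber and push provisional text out."""
    async for chunk in recorder.chunks():            # hypothetical async chunk iterator
        if vad.is_speech(chunk):                     # hypothetical VAD gate
            partial = await transcriber.feed(chunk)  # hypothetical provisional result
            if partial:
                await websocket.send_json({"event": "transcription_update", "text": partial})
    final = await transcriber.finalize()             # hypothetical final pass
    await websocket.send_json({"event": "transcription_final", "text": final})
```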
### Integration

The StreamingTranscriber emits events via WebSocket (a consumer sketch follows the list):

- `transcription_update`: provisional text during recording
- `transcription_final`: final text on stop
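
A sketch of a client consuming the two events; the endpoint URL and the `{"event": ..., "text": ...}` payload shape are assumptions, not the documented protocol.

```python
import asyncio
import json

import websockets  # third-party client library, used here for illustration


async def listen(url: str = "ws://localhost:8000/ws") -> None:
    async with websockets.connect(url) as ws:
        async for message in ws:
            msg = json.loads(message)
            if msg.get("event") == "transcription_update":
                print("provisional:", msg["text"])   # shown while the user speaks
            elif msg.get("event") == "transcription_final":
                print("final:", msg["text"])         # emitted on stop
                break


asyncio.run(listen())
```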