Transcription

Audio-to-text transcription services using faster-whisper.


PersistentWhisperWorker

Persistent worker that keeps the Whisper model loaded in VRAM between sessions.

Location: v2m/features/transcription/persistent_model.py

Features

  • Lazy Loading: the model is loaded the first time it is needed
  • Keep-Warm Policy: keeps the model in VRAM based on configuration (see the sketch after this list)
  • GPU Optimized: uses float16 or int8_float16 compute for maximum performance
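
A minimal sketch of the lazy-loading and keep-warm ideas, using faster-whisper's WhisperModel directly; the model size, device, and keep_warm flag are illustrative assumptions, not the actual configuration keys:

import numpy as np
from faster_whisper import WhisperModel

class LazyWhisperSketch:
    """Illustrative only: loads the model on first use and keeps it resident in VRAM."""

    def __init__(self, model_size: str = "large-v3", keep_warm: bool = True):
        self._model = None              # not loaded yet (lazy)
        self._model_size = model_size   # assumed default
        self._keep_warm = keep_warm     # mirrors the keep-warm policy

    def _ensure_loaded(self) -> WhisperModel:
        # Lazy loading: the model is created the first time it is needed.
        if self._model is None:
            self._model = WhisperModel(
                self._model_size, device="cuda", compute_type="float16"
            )
        return self._model

    def transcribe(self, audio: np.ndarray) -> str:
        segments, _info = self._ensure_loaded().transcribe(audio)
        text = " ".join(segment.text.strip() for segment in segments)
        if not self._keep_warm:
            self._model = None          # drop the reference so VRAM can be reclaimed
        return text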

Methods

import numpy as np

class PersistentWhisperWorker:
    def initialize_sync(self) -> None:
        """Loads the model into VRAM (synchronous, for warmup)."""

    async def transcribe(self, audio: np.ndarray) -> str:
        """Transcribes audio to text."""

    async def unload(self) -> None:
        """Releases the model from VRAM."""

StreamingTranscriber

Real-time transcriber providing provisional feedback while the user speaks.

Location: v2m/features/audio/streaming_transcriber.py

Data Flow

graph LR
    A[AudioRecorder] --> B[VAD]
    B --> C[StreamingTranscriber]
    C --> D[WhisperWorker]
    D --> E[WebSocket]

    style C fill:#e3f2fd
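
A minimal sketch of this flow, assuming VAD-approved 16 kHz float32 chunks arrive on an async iterator and that the worker exposes the async transcribe method shown above; the chunk source and callback names are hypothetical:

import numpy as np

async def run_streaming(chunks, worker, on_partial, on_final) -> None:
    # Rolling buffer of all VAD-approved audio received so far.
    buffer = np.zeros(0, dtype=np.float32)
    async for chunk in chunks:
        buffer = np.concatenate([buffer, chunk])
        # Re-transcribe the accumulated audio to produce provisional text.
        on_partial(await worker.transcribe(buffer))
    # When the chunk stream ends, transcribe once more for the final text.
    on_final(await worker.transcribe(buffer))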

Integration

The StreamingTranscriber emits events via WebSocket:

  • transcription_update: provisional text emitted while recording is in progress
  • transcription_final: final text emitted when recording stops (see the payload sketch after this list)
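
The exact payload schema is not documented here; a sketch assuming a simple JSON envelope with type and text fields:

import json

def encode_event(event_type: str, text: str) -> str:
    # event_type is either "transcription_update" or "transcription_final".
    return json.dumps({"type": event_type, "text": text})

print(encode_event("transcription_update", "hello wor"))
# {"type": "transcription_update", "text": "hello wor"}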