System Architecture
Technical Philosophy
Voice2Machine follows a strict architecture built around Workflows and Features, prioritizing decoupling, testability, and technological independence. The system aims to follow state-of-the-art practices as of 2026, such as structural static typing in Python (typing.Protocol) and frontend/backend separation via a REST API.
High-Level Diagram
graph TD
    subgraph Clients ["Clients (CLI / Scripts / GUI / Tauri)"]
        ClientApp["Any HTTP Client"]
    end
    subgraph Backend ["Backend Daemon (Python + FastAPI)"]
        API["FastAPI Package<br>(api/)"]
        subgraph Workflows ["Workflows (Orchestration)"]
            RecWF["RecordingWorkflow"]
            LLMWF["LLMWorkflow"]
        end
        subgraph Features ["Features (Domain + Logic)"]
            AudioFeat["Audio Service"]
            TranscFeat["Transcription Service"]
            LLMFeat["LLM Service"]
        end
        subgraph Shared ["Shared (Foundation)"]
            Config["Config"]
            Errors["Errors"]
            Interfaces["Interfaces"]
        end
    end
    ClientApp <-->|REST + WebSocket| API
    API --> Workflows
    Workflows --> Features
    Features --> Shared
    style Clients fill:#e3f2fd,stroke:#1565c0
    style Backend fill:#e8f5e9,stroke:#2e7d32
    style Workflows fill:#fff3e0,stroke:#ef6c00
    style Features fill:#f3e5f5,stroke:#7b1fa2
    style Shared fill:#eceff1,stroke:#455a64
Backend Components
1. API Layer (FastAPI)
Located in apps/daemon/backend/src/v2m/api/.
- Modules: app.py, routes/, schemas.py
- REST Endpoints: /toggle, /start, /stop, /status, /health
- WebSocket: /ws/events for real-time transcription streaming
- Auto-documentation: Swagger UI at /docs
Modern Structure
Starting from v0.3.0, the API is organized as a complete package, separating routes and schemas for better maintainability.
2. Workflows (Orchestration)
Located in apps/daemon/backend/src/v2m/orchestration/.
Instead of a monolithic Orchestrator, the system uses specialized Workflows for each business flow:
- RecordingWorkflow: Manages the complete capture and transcription lifecycle.
- LLMWorkflow: Coordinates text processing and translation.
This approach allows each flow to evolve independently without affecting the rest of the system.
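As a rough illustration of this separation, the sketch below shows a workflow that only orchestrates, while the features it depends on are injected. The method names (toggle, start_recording, stop_recording, transcribe) follow the data flow diagram in this document; the Protocol classes, the constructor signature, the returned dictionary shape, and the fake implementations are assumptions made for the example, not the project's actual code.

```python
from typing import Protocol


class AudioRecorder(Protocol):
    """Hypothetical interface for the audio feature."""

    def start_recording(self) -> None: ...

    def stop_recording(self) -> bytes: ...


class Transcriber(Protocol):
    """Hypothetical interface for the transcription feature."""

    def transcribe(self, audio: bytes) -> str: ...


class RecordingWorkflow:
    """Orchestrates capture and transcription; holds no feature logic itself."""

    def __init__(self, audio: AudioRecorder, transcriber: Transcriber) -> None:
        self._audio = audio
        self._transcriber = transcriber
        self._recording = False

    def toggle(self) -> dict[str, str]:
        # First call starts a recording, the next one stops and transcribes.
        if not self._recording:
            self._audio.start_recording()
            self._recording = True
            return {"status": "recording"}
        audio = self._audio.stop_recording()
        self._recording = False
        return {"status": "idle", "text": self._transcriber.transcribe(audio)}


class FakeAudio:
    """Stand-in recorder that returns two bytes of 'audio'."""

    def start_recording(self) -> None:
        pass

    def stop_recording(self) -> bytes:
        return b"\x00\x01"


class FakeTranscriber:
    """Stand-in transcriber that reports the buffer size."""

    def transcribe(self, audio: bytes) -> str:
        return f"{len(audio)} bytes transcribed"
```

Because the workflow depends only on the two interfaces, it can be exercised with the fakes and no audio hardware or model: `RecordingWorkflow(FakeAudio(), FakeTranscriber())`.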
3. Features (Domains)
Located in apps/daemon/backend/src/v2m/features/.
Each folder in features/ is a self-contained domain with its own services and logic:
| Feature | Responsibility |
|---|---|
| transcription | Whisper implementations (faster-whisper). |
| audio | Audio capture and management of the Rust engine (v2m_engine). |
| llm | Integrations with Gemini, Ollama, and other providers. |
4. Shared (Common Foundation)
Located in apps/daemon/backend/src/v2m/shared/.
- Interfaces: Global definitions via typing.Protocol.
- Config: config.toml management via Pydantic Settings.
- Errors: Shared exception hierarchies.
Client-Backend Communication
Voice2Machine uses FastAPI REST + WebSocket for communication:
REST (Synchronous)
# Toggle recording
curl -X POST http://localhost:8765/toggle | jq
# Check status
curl http://localhost:8765/status | jq
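The same calls can be made from a script. The following sketch uses only Python's standard library and assumes the daemon listens on localhost:8765, as in the curl commands above, and that it answers with a JSON body (suggested by the use of jq). The helper names build_request and call are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8765"  # address taken from the curl examples


def build_request(path: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for a daemon endpoint without sending it."""
    return urllib.request.Request(f"{BASE_URL}{path}", method=method)


def call(path: str, method: str = "GET", timeout: float = 5.0):
    """Send the request and decode the JSON body (the daemon must be running)."""
    request = build_request(path, method)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage, with the daemon running:
#   call("/toggle", method="POST")
#   call("/status")
```

The response fields are not specified in this section, so the sketch returns the decoded JSON as-is instead of assuming a schema.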
WebSocket (Streaming)
const ws = new WebSocket("ws://localhost:8765/ws/events");
ws.onmessage = (e) => {
  const { event, data } = JSON.parse(e.data);
  if (event === "transcription_update") {
    console.log(data.text, data.final);
  }
};
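A Python client would handle the same messages in a similar way. The sketch below covers only the parsing step and leaves the WebSocket transport out. It relies on the message shape visible in the JavaScript snippet (an event name plus a data object with text and final), and it assumes that final is a boolean marking a completed transcription. The function name is hypothetical.

```python
import json
from typing import Optional


def extract_final_text(raw_message: str) -> Optional[str]:
    """Return the text of a final transcription_update event, else None."""
    try:
        message = json.loads(raw_message)
    except json.JSONDecodeError:
        return None  # ignore frames that are not JSON
    if not isinstance(message, dict):
        return None
    if message.get("event") != "transcription_update":
        return None  # some other event type
    data = message.get("data")
    if not isinstance(data, dict) or not data.get("final"):
        return None  # partial update (assumed meaning of final == false)
    text = data.get("text")
    return text if isinstance(text, str) else None
```

Returning None for anything unexpected keeps the receive loop simple: the caller can skip partial updates and unrelated events with a single check.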
Native Extensions (Rust)
For critical tasks where Python's GIL is a bottleneck, we use native extensions compiled in Rust (v2m_engine):
| Component | Function |
|---|---|
| Audio I/O | Direct WAV writing to disk (zero-copy) |
| VAD | Ultra-low latency voice detection (Silero ONNX) |
| Ring Buffer | Lock-free circular buffer for real-time audio |
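For readers unfamiliar with circular buffers, the sketch below illustrates the indexing idea in plain Python. It is not lock-free and is not the v2m_engine implementation, which is written in Rust. The class name, the float sample type, and the choice to overwrite the oldest samples when full are assumptions made for the illustration.

```python
class RingBuffer:
    """Fixed-capacity circular buffer that overwrites the oldest samples."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [0.0] * capacity
        self._capacity = capacity
        self._write = 0  # index where the next sample is stored
        self._size = 0   # number of valid samples currently held

    def push(self, sample: float) -> None:
        self._data[self._write] = sample
        # Wrap around to the start once the end of the storage is reached.
        self._write = (self._write + 1) % self._capacity
        self._size = min(self._size + 1, self._capacity)

    def snapshot(self) -> list[float]:
        """Return the stored samples ordered from oldest to newest."""
        start = (self._write - self._size) % self._capacity
        return [self._data[(start + i) % self._capacity] for i in range(self._size)]
```

Storage is allocated once and reused, so pushing a sample never allocates memory, which is the property that makes this structure attractive for real-time audio.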
Data Flow
sequenceDiagram
participant User
participant Client as HTTP Client
participant API as FastAPI
participant WF as Workflows
participant Audio as AudioService
participant Whisper as TranscriptionService
User->>Client: Press shortcut
Client->>API: POST /toggle
API->>WF: toggle() (RecordingWorkflow)
alt Not recording
WF->>Audio: start_recording()
Audio-->>WF: OK
WF-->>API: status=recording
else Recording
WF->>Audio: stop_recording()
Audio-->>WF: audio_buffer
WF->>Whisper: transcribe(buffer)
Whisper-->>WF: text
WF-->>API: status=idle, text=...
end
API-->>Client: ToggleResponse
Client->>User: Copy to clipboard
2026 Design Principles
| Principle | Implementation |
|---|---|
| Local-First | No data leaves the machine unless a cloud provider is explicitly configured |
| Privacy-By-Design | Audio processed in memory, temp files deleted after transcription |
| Resilience | Automatic error recovery; subsystems are restarted when they fail |
| Observability | Structured logging (OpenTelemetry), real-time metrics |
| Performance is Design | Async FastAPI, Rust for hot paths, warm model in VRAM |
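The Privacy-By-Design row mentions deleting temporary files after transcription. One way to guarantee that, sketched here under assumptions, is to tie the deletion to a finally block so it also runs when transcription fails. The function name and the callback signature are hypothetical and do not describe the project's actual code.

```python
import os
import tempfile
from typing import Callable


def transcribe_via_temp_file(audio: bytes, transcribe: Callable[[str], str]) -> str:
    """Write audio to a temp file, transcribe it, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio)
        return transcribe(path)
    finally:
        os.remove(path)  # runs on success and on failure
```

The transcribe callback receives the file path and returns the text; by the time the caller sees the result or the exception, the file no longer exists.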