---
title: Changelog
description: Change log for the Voice2Machine project.
ai_context: "Versions, Change History, SemVer"
depends_on: []
status: stable
---

🇪🇸 Español: The Spanish version of this changelog is available at `.github/locales/es/CHANGELOG.md`.
# Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## [Unreleased]
### Added
- Zero-Copy Audio Engine: New `ZeroCopyAudioRecorder` in Rust that uses shared memory (`/dev/shm`) for true zero-copy audio transfers.
- Hallucination Detection: Heuristic filters and quality thresholds (`no_speech`, `compression_ratio`) in `StreamingTranscriber` to reduce erroneous Whisper output.
- Performance Metrics: Inference latency is now recorded in the logs for detailed diagnostics.
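The following is a minimal sketch of this kind of heuristic filter. It assumes that segments expose `no_speech_prob`, `avg_logprob`, and `compression_ratio` fields, as faster-whisper's segment objects do. The `Segment` dataclass and `is_probable_hallucination` are hypothetical names. The thresholds (0.6, -1.0, 2.4) are OpenAI Whisper's reference decoding defaults and are used here for illustration only; the values in `StreamingTranscriber` may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a Whisper segment (field names mirror faster-whisper)."""
    text: str
    no_speech_prob: float
    avg_logprob: float
    compression_ratio: float


# Assumed thresholds: OpenAI Whisper's reference defaults, to be tuned per model.
NO_SPEECH_THRESHOLD = 0.6
LOGPROB_THRESHOLD = -1.0
COMPRESSION_RATIO_THRESHOLD = 2.4


def is_probable_hallucination(seg: Segment) -> bool:
    # Likely silence: the model thinks there is no speech AND is unsure of its tokens.
    if seg.no_speech_prob > NO_SPEECH_THRESHOLD and seg.avg_logprob < LOGPROB_THRESHOLD:
        return True
    # Highly compressible text usually indicates a repetition loop.
    if seg.compression_ratio > COMPRESSION_RATIO_THRESHOLD:
        return True
    # Empty output carries no information.
    return not seg.text.strip()


def filter_segments(segments):
    """Keep only the segments that pass every heuristic."""
    return [s for s in segments if not is_probable_hallucination(s)]


segments = [
    Segment("Hello world.", 0.05, -0.3, 1.2),                      # kept
    Segment("Thank you.", 0.85, -1.4, 0.9),                        # dropped: silence
    Segment("yes yes yes yes yes yes yes yes", 0.1, -0.2, 3.1),    # dropped: repetition
]
print([s.text for s in filter_segments(segments)])  # ['Hello world.']
```

The silence check requires both a high `no_speech_prob` and a low `avg_logprob`. This avoids discarding quiet but confidently transcribed speech.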
### Changed
- Advanced Whisper Config: Increased `beam_size` and `best_of` to 5 for higher transcription quality with the `large-v3-turbo` model.
- VAD Optimization: Adjusted the default threshold to 0.35 to reduce false positives from ambient noise and breathing.
- Memory Management: The CUDA cache is now force-cleared (`torch.cuda.empty_cache()`) when models are unloaded, so that VRAM is actually freed.
- Code Hygiene: Refactored imports and fixed linting errors (`ruff`) throughout the backend codebase.
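The following is a rough sketch of a model-unloading step built around `torch.cuda.empty_cache()`. `ModelHolder` is a hypothetical container, not the project's class. The ordering is the important part: `empty_cache()` only releases cached blocks that no live tensor still uses, so the last reference to the model has to be dropped first.

```python
import gc


class ModelHolder:
    """Hypothetical container for a lazily loaded model (not the project's class)."""

    def __init__(self):
        self.model = None

    def unload(self) -> bool:
        """Drop the model and release cached CUDA memory; return True if a model was unloaded."""
        if self.model is None:
            return False
        # 1. Drop the last Python reference so the model's tensors become garbage.
        self.model = None
        # 2. Collect reference cycles that could still keep those tensors alive.
        gc.collect()
        # 3. Hand PyTorch's cached-but-unused blocks back to the driver, so the
        #    freed VRAM is visible to nvidia-smi and other processes.
        try:
            import torch
        except ImportError:
            return True  # CPU-only environment without PyTorch: nothing more to release
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        return True
```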
### Planned
- Support for multiple simultaneous transcription languages
- Web dashboard for real-time monitoring
- Integration with more LLM providers
## [0.3.0] - 2026-01-23
### Added
- Feature-Based Architecture: Complete restructuring into self-contained modules under `features/` (`audio`, `llm`, `transcription`).
- Orchestration via Workflows: Introduced `RecordingWorkflow` and `LLMWorkflow` to decouple business logic from the monolithic legacy Orchestrator.
- Strict Protocols: `typing.Protocol` interfaces for all internal services, making providers easy to swap.
- Modular API: `api/` package with separate routes and schemas.
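The following is a minimal sketch of how `typing.Protocol` enables provider swapping. Every name here (`TranscriptionService`, `WhisperTranscriber`, `FakeTranscriber`, `ExampleWorkflow`) is hypothetical and only illustrates the pattern; the project's real protocols live in its feature modules. Any class with a matching `transcribe` method satisfies the protocol structurally, without inheriting from it.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TranscriptionService(Protocol):
    """Hypothetical service interface."""

    def transcribe(self, audio: bytes) -> str: ...


class WhisperTranscriber:
    # Satisfies the protocol structurally: no inheritance required.
    def transcribe(self, audio: bytes) -> str:
        return f"<whisper: {len(audio)} bytes>"


class FakeTranscriber:
    # A test double is interchangeable for the same reason.
    def __init__(self, text: str):
        self.text = text

    def transcribe(self, audio: bytes) -> str:
        return self.text


class ExampleWorkflow:
    """Depends only on the protocol, so any provider can be injected."""

    def __init__(self, transcriber: TranscriptionService):
        self.transcriber = transcriber

    def run(self, audio: bytes) -> str:
        return self.transcriber.transcribe(audio).strip()


print(ExampleWorkflow(FakeTranscriber("  hello  ")).run(b"\x00"))  # hello
```

Because the workflow only sees the protocol, tests can inject a fake provider and production code can inject the real one, with no change to the workflow itself.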
### Changed
- Orchestrator Removal: `services/orchestrator.py` has been decomposed and deleted.
- Infrastructure Refactoring: The `infrastructure/` folder has been merged into each corresponding feature.
- Core and Domain: Simplified and moved into `shared/` and local interfaces.
### Removed
- Legacy Audio Tests: Removed obsolete tests for the Rust extension.
- System Monitor: Removed system telemetry to simplify the core.
## [0.2.0] - 2025-01-20
### Added
- FastAPI REST API: New HTTP API replacing the IPC system based on Unix domain sockets
- WebSocket streaming: `/ws/events` endpoint for real-time provisional transcription
- Swagger documentation: Interactive UI at `/docs` for testing endpoints
- Orchestrator pattern: New coordination pattern that simplifies the workflow
- Rust audio engine: Native `v2m_engine` extension for low-latency audio capture
- MkDocs documentation system: Structured documentation with the Material theme
### Changed
- Simplified architecture: From CQRS/CommandBus to a more direct Orchestrator pattern
- Communication: From binary Unix domain sockets to standard HTTP REST
- State model: Centralized management in `DaemonState` with lazy initialization
- Updated README.md with the new architecture
### Removed
- `daemon.py`: Replaced by `api.py` (FastAPI)
- `client.py`: No longer needed; use `curl` or any HTTP client
- Binary IPC protocol: Replaced by standard JSON
### Fixed
- Startup latency: The server now starts in ~100 ms while the model loads in the background
- Memory leaks in WebSocket connections
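The following is a minimal sketch of how lazy initialization and background model loading fit together. `LazyState` is a hypothetical stand-in for a `DaemonState`-like object, not the project's class, and the loader callable stands in for loading Whisper weights. The server can start immediately; a request blocks only if it needs the model before loading finishes.

```python
import threading
import time
from typing import Callable, Optional


class LazyState:
    """Hypothetical DaemonState-like holder that loads a model on a background thread."""

    def __init__(self, loader: Callable[[], object]):
        self._loader = loader
        self._model: Optional[object] = None
        self._error: Optional[Exception] = None
        self._ready = threading.Event()
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None

    def start_background_load(self) -> None:
        # Start at most one loader thread, even if called concurrently.
        with self._lock:
            if self._thread is None:
                self._thread = threading.Thread(target=self._load, daemon=True)
                self._thread.start()

    def _load(self) -> None:
        try:
            self._model = self._loader()
        except Exception as exc:  # surface loader failures to waiting callers
            self._error = exc
        finally:
            self._ready.set()

    @property
    def is_ready(self) -> bool:
        return self._ready.is_set()

    def model(self, timeout: Optional[float] = None) -> object:
        # Lazy path: trigger the load on first use if nobody started it yet.
        self.start_background_load()
        if not self._ready.wait(timeout):
            raise TimeoutError("model is still loading")
        if self._error is not None:
            raise RuntimeError("model failed to load") from self._error
        return self._model


def slow_loader() -> str:
    time.sleep(0.2)  # simulates loading model weights
    return "whisper-model"


state = LazyState(slow_loader)
state.start_background_load()   # returns immediately; the server can accept requests
print(state.is_ready)           # typically still False at this point
print(state.model(timeout=5))   # blocks until loaded, then prints whisper-model
```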
## [0.1.0] - 2024-03-20
### Added
- Initial release of Voice2Machine
- Local transcription support with Whisper (faster-whisper)
- Basic LLM integration (Ollama/Gemini)
- Unix Domain Sockets-based IPC system
- Hexagonal architecture with ports and adapters
- TOML-based configuration