# Project: speech-to-text tools

Speech-to-text command-line utilities leveraging local models (faster-whisper, Ollama).

## Environment

- Debian Bookworm, kernel 6.1, X11
- Conda env: `whisper-ollama` (Python 3.10, CUDA 12.2)
- mamba must be initialized before use; run: `eval "$(micromamba shell hook -s bash)"`
- GPU: NVIDIA (float16 capable)
- xdotool installed for keyboard simulation (X11 only)

## Tools

- `assistant.py` / `talk.sh`: transcribe speech, copy to clipboard, optionally send to Ollama
- `voice_to_terminal.py` / `terminal.sh`: voice-controlled terminal via Ollama tool calling
- `voice_to_xdotool.py` / `xdotool.sh`: hands-free voice typing into any focused window (VAD + xdotool)

## Shared Library

- `sttlib/`: shared package used by all scripts and importable by other projects
  - `whisper_loader.py`: model loading with GPU→CPU fallback
  - `audio.py`: press-enter recording, PCM conversion
  - `transcription.py`: Whisper transcribe wrapper, hallucination filter
  - `vad.py`: VADProcessor, audio callback, constants
- Other projects import via: `sys.path.insert(0, "/path/to/tool-speechtotext")`

## Testing

- Run tests: `mamba run -n whisper-ollama python -m pytest tests/`
- Use `--model-size base` for faster iteration during development
- Tests mock hardware (Whisper model, VAD, mic), so no GPU or mic is needed to run them
- An audio device is available, so live mic testing is possible
- Test xdotool output by focusing a text editor window

## Dependencies

- Conda: faster-whisper, sounddevice, numpy, pyperclip, requests, ollama
- Pip (in conda env): webrtcvad
- System: libportaudio2, xdotool
- Dev: pytest

## Conventions

- Shell wrappers go in `.sh` files using `mamba run -n whisper-ollama`
- Shared code lives in `sttlib/`; scripts are thin entry points that import from it
- Whisper model loading always has a GPU (cuda/float16) -> CPU (cpu/int8) fallback
- `CT2_CUDA_ALLOW_FP16=1` is set by `sttlib.whisper_loader` at import time
- Don't print output for non-actionable events
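
A minimal sketch of the GPU (cuda/float16) -> CPU (cpu/int8) loading convention, assuming faster-whisper's `WhisperModel(model_size, device=..., compute_type=...)` constructor. This is not the actual `sttlib.whisper_loader` code: the function name `load_whisper_model` and the `factory` parameter are hypothetical, and the factory hook exists only so the fallback logic can be exercised without a GPU, in the same spirit as the mocked tests. It only catches errors raised while the model is being constructed; CUDA library problems that surface later, at transcription time, are not handled here.

```python
import os
import sys

# Mirrors the documented convention: the CTranslate2 flag is set when this module is imported.
os.environ["CT2_CUDA_ALLOW_FP16"] = "1"


def load_whisper_model(model_size="base", factory=None):
    """Load a Whisper model on the GPU (cuda/float16), falling back to CPU (cpu/int8).

    `factory` is a hypothetical test hook. By default it is faster_whisper.WhisperModel,
    imported lazily so this module can be imported without faster-whisper installed.
    """
    if factory is None:
        from faster_whisper import WhisperModel
        factory = WhisperModel

    try:
        # First choice: GPU with half-precision compute.
        return factory(model_size, device="cuda", compute_type="float16")
    except Exception as exc:  # e.g. no CUDA device, or a driver/library mismatch
        # The fallback is actionable (transcription will be slower), so report it once on stderr.
        print(f"GPU load failed ({exc}); falling back to CPU int8", file=sys.stderr)
        return factory(model_size, device="cpu", compute_type="int8")
```

A script would call `load_whisper_model("base")`, matching the `--model-size base` setting used for faster iteration. Catching a broad `Exception` keeps the sketch simple; a stricter version could narrow it once the actual CUDA failure types on this machine are known.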

## Preferences

- Prefer packages available via apt over building from source
- Check availability before recommending a dependency
- Prefer snappy/responsive defaults over cautious ones
- Avoid over-engineering; keep scripts simple and focused