Add voice-to-xdotool: hands-free speech typing via VAD + Whisper + xdotool

New tool that uses webrtcvad for voice activity detection, faster-whisper for transcription, and xdotool to type into any focused window. Supports session-based listening, configurable silence threshold, and a "full stop" magic word to auto-submit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:37:14 +00:00
parent 370e97d08d
commit 848681087e
4 changed files with 357 additions and 2 deletions
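
Note on the "full stop" magic word: the commit message describes the behavior but this view does not show how `voice_to_xdotool.py` implements it. The following is a minimal sketch of one way it could work, not the committed code. The helper names `split_magic_word` and `xdotool_commands` are hypothetical, and it assumes "auto-submit" means pressing Return after typing. The `xdotool type` and `xdotool key` subcommands are real; the commands are only built here, not executed.

```python
import re

# Trailing "full stop", tolerating punctuation the transcriber may add around it.
_MAGIC_RE = re.compile(r"[\s,.!?]*\bfull stop[\s,.!?]*$", re.IGNORECASE)


def split_magic_word(transcript: str) -> tuple[str, bool]:
    """Return (text_to_type, submit) for one transcribed utterance."""
    text = transcript.strip()
    match = _MAGIC_RE.search(text)
    if match is None:
        return text, False
    # Drop the magic word and anything after it; keep what came before.
    return text[: match.start()].rstrip(), True


def xdotool_commands(transcript: str) -> list[list[str]]:
    """Build the xdotool argv lists for one utterance (assumed behavior)."""
    text, submit = split_magic_word(transcript)
    commands = []
    if text:
        # "--" stops option parsing so text starting with "-" is typed literally.
        commands.append(["xdotool", "type", "--clearmodifiers", "--", text])
    if submit:
        # Assumption: auto-submit is a Return keypress in the focused window.
        commands.append(["xdotool", "key", "Return"])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`. The phrase only triggers at the end of an utterance, so a sentence such as "the full stop is punctuation" is typed unchanged.
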
--- a/python/tool-speechtotext/CLAUDE.md
+++ b/python/tool-speechtotext/CLAUDE.md
@@ -0,0 +1,39 @@
+# Project: speech-to-text tools
+
+Speech-to-text command line utilities leveraging local models (faster-whisper, Ollama).
+
+## Environment
+- Debian Bookworm, kernel 6.1, X11
+- Conda env: `whisper-ollama` (Python 3.10, CUDA 12.2)
+- mamba must be initialized before use — run: `eval "$(micromamba shell hook -s bash)"`
+- GPU: NVIDIA (float16 capable)
+- xdotool installed for keyboard simulation (X11 only)
+
+## Tools
+- `assistant.py` / `talk.sh` — transcribe speech, copy to clipboard, optionally send to Ollama
+- `voice_to_terminal.py` / `terminal.sh` — voice-controlled terminal via Ollama tool calling
+- `voice_to_xdotool.py` / `dotool.sh` — hands-free voice typing into any focused window (VAD + xdotool)
+
+## Testing
+- To test scripts: `mamba run -n whisper-ollama python <script.py> --model-size base`
+- Use `--model-size base` for faster iteration during development
+- Audio device is available — live mic testing is possible
+- Test xdotool output by focusing a text editor window
+
+## Dependencies
+- Conda: faster-whisper, sounddevice, numpy, pyperclip, requests, ollama
+- Pip (in conda env): webrtcvad
+- System: libportaudio2, xdotool
+
+## Conventions
+- Shell wrappers go in .sh files using `mamba run -n whisper-ollama`
+- All scripts set `CT2_CUDA_ALLOW_FP16=1`
+- Whisper model loading always has GPU (cuda/float16) -> CPU (cpu/int8) fallback
+- Keep scripts self-contained (no shared module)
+- Don't print output for non-actionable events
+
+## Preferences
+- Prefer packages available via apt over building from source
+- Check availability before recommending a dependency
+- Prefer snappy/responsive defaults over cautious ones
+- Avoid over-engineering — keep scripts simple and focused
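
Note on the Conventions section: the GPU (cuda/float16) -> CPU (cpu/int8) fallback could look roughly like the sketch below. This is an illustration under assumptions, not the code in the scripts. The function name `load_whisper_with_fallback` is hypothetical, and the model class is passed in as a parameter so the sketch runs without faster-whisper installed. It stays silent on fallback, in line with the "don't print output for non-actionable events" convention.

```python
import os

# Convention from CLAUDE.md: set before the model library is imported.
os.environ.setdefault("CT2_CUDA_ALLOW_FP16", "1")


def load_whisper_with_fallback(model_cls, model_size="base"):
    """Try cuda/float16 first, then cpu/int8; return (model, device)."""
    attempts = (("cuda", "float16"), ("cpu", "int8"))
    last_error = None
    for device, compute_type in attempts:
        try:
            model = model_cls(model_size, device=device, compute_type=compute_type)
            return model, device
        except Exception as exc:  # broad on purpose: CUDA failures vary by setup
            last_error = exc
    raise RuntimeError("could not load Whisper model on GPU or CPU") from last_error


# Usage, assuming faster-whisper is installed in the active environment:
#   from faster_whisper import WhisperModel
#   model, device = load_whisper_with_fallback(WhisperModel, "base")
```

Because every script is meant to be self-contained, a helper like this would be copied into each script, not imported from a shared module.
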