Add voice-to-xdotool: hands-free speech typing via VAD + Whisper + xdotool
New tool that uses webrtcvad for voice activity detection, faster-whisper for transcription, and xdotool to type into any focused window. Supports session-based listening, a configurable silence threshold, and a "full stop" magic word to auto-submit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
39
python/tool-speechtotext/CLAUDE.md
Normal file
@@ -0,0 +1,39 @@
# Project: speech-to-text tools
Speech-to-text command line utilities leveraging local models (faster-whisper, Ollama).
## Environment
- Debian Bookworm, kernel 6.1, X11
- Conda env: `whisper-ollama` (Python 3.10, CUDA 12.2)
- mamba must be initialized before use — run: `eval "$(micromamba shell hook -s bash)"`
- GPU: NVIDIA (float16 capable)
- xdotool installed for keyboard simulation (X11 only)
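
As a rough illustration of the keyboard-simulation step, the sketch below turns one transcribed utterance into `xdotool` invocations and treats a trailing "full stop" as the auto-submit magic word. The function names, the punctuation stripping, and the choice of `Return` as the submit key are assumptions for illustration, not necessarily what `voice_to_xdotool.py` does.

```python
import shutil
import subprocess

MAGIC_WORD = "full stop"  # assumed spelling of the magic word in Whisper output


def split_magic_word(text: str, magic: str = MAGIC_WORD) -> tuple[str, bool]:
    """Return (text_without_magic_word, submit) for one transcribed utterance."""
    stripped = text.strip()
    # Whisper often appends punctuation, so ignore it when matching.
    core = stripped.rstrip(" .!?,")
    head = core[: len(core) - len(magic)]
    # Require a word boundary so that e.g. "handfull stop" does not trigger.
    if core.lower().endswith(magic) and (not head or head[-1].isspace()):
        return head.rstrip(" ,"), True
    return stripped, False


def xdotool_commands(text: str, submit: bool) -> list[list[str]]:
    """Build the argv lists to run; kept separate so it can be tested without X11."""
    cmds: list[list[str]] = []
    if text:
        # "--" ends option parsing so text starting with "-" is typed literally.
        cmds.append(["xdotool", "type", "--clearmodifiers", "--delay", "0", "--", text])
    if submit:
        cmds.append(["xdotool", "key", "Return"])
    return cmds


def type_utterance(text: str) -> None:
    """Type one utterance into the focused window (requires X11 and xdotool)."""
    if shutil.which("xdotool") is None:
        raise RuntimeError("xdotool not found on PATH")
    body, submit = split_magic_word(text)
    for cmd in xdotool_commands(body, submit):
        subprocess.run(cmd, check=True)
```

Building the argv lists separately from running them keeps the text handling testable on machines without an X11 session.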
## Tools
- `assistant.py` / `talk.sh` — transcribe speech, copy to clipboard, optionally send to Ollama
- `voice_to_terminal.py` / `terminal.sh` — voice-controlled terminal via Ollama tool calling
- `voice_to_xdotool.py` / `dotool.sh` — hands-free voice typing into any focused window (VAD + xdotool)
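
The VAD side of the hands-free tool can be sketched as a small segmenter that groups audio frames into utterances and closes each one after a configurable stretch of silence. This is a minimal sketch under assumptions: `segment_utterances`, the 600 ms default, and the injectable `is_speech` callback are hypothetical names and values, chosen so the logic runs without a microphone or `webrtcvad` installed.

```python
from typing import Callable, Iterable, Iterator

FRAME_MS = 30  # webrtcvad accepts 10, 20, or 30 ms frames


def segment_utterances(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    silence_ms: int = 600,  # hypothetical default silence threshold
) -> Iterator[bytes]:
    """Yield one bytes object per utterance, closing it after `silence_ms` of silence."""
    max_silent = max(1, silence_ms // FRAME_MS)
    buffer: list[bytes] = []
    silent = 0  # consecutive non-speech frames at the end of `buffer`
    for frame in frames:
        if is_speech(frame):
            buffer.append(frame)
            silent = 0
        elif buffer:
            # Keep short pauses inside the utterance and count them.
            buffer.append(frame)
            silent += 1
            if silent >= max_silent:
                yield b"".join(buffer[:-silent])  # drop the trailing silence
                buffer, silent = [], 0
        # Silence before any speech is ignored entirely.
    if buffer:
        # Flush whatever speech remains when the stream ends.
        yield b"".join(buffer[: len(buffer) - silent])
```

With the real library, `is_speech` could be `lambda f: vad.is_speech(f, 16000)` for a `vad = webrtcvad.Vad(2)` instance, fed 16-bit mono PCM frames. Each yielded utterance would then go to faster-whisper for transcription.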
## Testing
- To test scripts: `mamba run -n whisper-ollama python <script.py> --model-size base`
- Use `--model-size base` for faster iteration during development
- Audio device is available — live mic testing is possible
- Test xdotool output by focusing a text editor window
## Dependencies
- Conda: faster-whisper, sounddevice, numpy, pyperclip, requests, ollama
- Pip (in conda env): webrtcvad
- System: libportaudio2, xdotool
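
A script could check for these dependencies at startup and fail with a clear message. The sketch below is one possible preflight check, not part of the existing tools: `missing_dependencies` is a hypothetical helper, the module names are the usual import names for the packages listed above, and shared libraries such as libportaudio2 are not covered.

```python
import importlib.util
import shutil


def missing_dependencies(
    modules: tuple[str, ...] = ("faster_whisper", "sounddevice", "numpy", "webrtcvad"),
    binaries: tuple[str, ...] = ("xdotool",),
) -> list[str]:
    """Return labels for Python modules and executables that cannot be found."""
    # find_spec looks a top-level module up without importing it.
    missing = [f"python:{m}" for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which mirrors the shell's PATH lookup for executables.
    missing += [f"bin:{b}" for b in binaries if shutil.which(b) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        raise SystemExit("Missing dependencies: " + ", ".join(problems))
```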
## Conventions
- Shell wrappers go in `.sh` files and invoke the scripts via `mamba run -n whisper-ollama`
- All scripts set `CT2_CUDA_ALLOW_FP16=1`
- Whisper model loading always tries GPU (cuda/float16) first and falls back to CPU (cpu/int8) if that fails
- Keep scripts self-contained (no shared module)
- Don't print output for non-actionable events
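
The GPU-to-CPU fallback convention above could look roughly like the following. This is a sketch, not the code in the scripts: `load_model` and `FALLBACK_CHAIN` are hypothetical names, and the model constructor is passed in as `factory` so the fallback logic can be exercised without faster-whisper or a GPU.

```python
import os
from typing import Any, Callable

# Set before any model is created, per the conventions above.
os.environ.setdefault("CT2_CUDA_ALLOW_FP16", "1")

# Tried in order: GPU with float16 first, then CPU with int8.
FALLBACK_CHAIN = (("cuda", "float16"), ("cpu", "int8"))


def load_model(model_size: str, factory: Callable[..., Any]) -> tuple[Any, str]:
    """Return (model, device), using the first configuration that loads."""
    last_error: Exception | None = None
    for device, compute_type in FALLBACK_CHAIN:
        try:
            model = factory(model_size, device=device, compute_type=compute_type)
            return model, device
        except Exception as exc:  # e.g. CUDA missing or out of memory
            last_error = exc
    raise RuntimeError(f"could not load Whisper model {model_size!r}") from last_error
```

In a script this would be called as `load_model("base", WhisperModel)` after `from faster_whisper import WhisperModel`. In keeping with the last convention, the sketch stays silent on a successful fallback and only raises when every configuration fails.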
## Preferences
- Prefer packages available via apt over building from source
- Check availability before recommending a dependency
- Prefer snappy/responsive defaults over cautious ones
- Avoid over-engineering — keep scripts simple and focused