Code/python/tool-speechtotext/CLAUDE.md
local 848681087e Add voice-to-xdotool: hands-free speech typing via VAD + Whisper + xdotool
New tool that uses webrtcvad for voice activity detection, faster-whisper
for transcription, and xdotool to type into any focused window. Supports
session-based listening, configurable silence threshold, and a "full stop"
magic word to auto-submit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:37:14 +00:00


Project: speech-to-text tools

Speech-to-text command-line utilities that run on local models (faster-whisper, Ollama).

Environment

  • Debian Bookworm, kernel 6.1, X11
  • Conda env: whisper-ollama (Python 3.10, CUDA 12.2)
  • mamba must be initialized before use — run: eval "$(micromamba shell hook -s bash)"
  • GPU: NVIDIA (float16 capable)
  • xdotool installed for keyboard simulation (X11 only)

Tools

  • assistant.py / talk.sh — transcribe speech, copy to clipboard, optionally send to Ollama
  • voice_to_terminal.py / terminal.sh — voice-controlled terminal via Ollama tool calling
  • voice_to_xdotool.py / dotool.sh — hands-free voice typing into any focused window (VAD + xdotool)
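
voice_to_xdotool.py is described as combining VAD with a configurable silence threshold. The sketch below shows one way such segmentation could work; it is not the script's actual code. The name split_utterances, the 600 ms default, and the injected is_speech callback are assumptions made for illustration. With webrtcvad, the callback would wrap Vad.is_speech(frame, sample_rate), which expects 10, 20 or 30 ms frames of 16-bit mono PCM.

```python
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 16000  # Hz; one of the sample rates webrtcvad accepts
FRAME_MS = 30        # webrtcvad accepts 10, 20 or 30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono PCM


def split_utterances(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    silence_ms: int = 600,  # assumed default, not taken from the script
    frame_ms: int = FRAME_MS,
) -> Iterator[bytes]:
    """Yield one bytes object per utterance.

    An utterance starts at the first speech frame and ends once
    silence_ms of consecutive non-speech frames have been seen.
    """
    max_silent = max(1, silence_ms // frame_ms)
    buffered: List[bytes] = []
    silent = 0
    for frame in frames:
        if is_speech(frame):
            buffered.append(frame)
            silent = 0
        elif buffered:
            # Keep short pauses inside the utterance.
            buffered.append(frame)
            silent += 1
            if silent >= max_silent:
                # Drop the trailing silence before handing audio to Whisper.
                yield b"".join(buffered[: len(buffered) - silent])
                buffered, silent = [], 0
    if buffered:
        # Stream ended mid-utterance: flush what was captured.
        yield b"".join(buffered[: len(buffered) - silent])
```

A lower silence_ms ends utterances sooner and feels snappier, at the cost of splitting speech on short pauses.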

Testing

  • To test scripts: mamba run -n whisper-ollama python <script.py> --model-size base
  • Use --model-size base for faster iteration during development
  • Audio device is available — live mic testing is possible
  • Test xdotool output by focusing a text editor window
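
Before focusing a real window, the xdotool side can be checked as a dry run. The sketch below is an assumption about how the "full stop" magic word from the commit message might be handled, not the script's actual logic: the function names, the matching rules (case-insensitive, trailing punctuation ignored) and the use of Return to submit are all hypothetical.

```python
import re
import subprocess
from typing import List, Tuple

MAGIC_WORD = "full stop"  # named in the commit message; matching rules are assumed


def split_magic_word(text: str, magic: str = MAGIC_WORD) -> Tuple[str, bool]:
    """Return (text_to_type, submit).

    submit is True when the transcript ends with the magic word,
    ignoring case and surrounding punctuation.
    """
    pattern = re.compile(
        r"[\s,.!?]*\b" + re.escape(magic) + r"[\s.!?]*$", re.IGNORECASE
    )
    match = pattern.search(text)
    if match is None:
        return text.strip(), False
    return text[: match.start()].strip(), True


def xdotool_commands(text: str, submit: bool) -> List[List[str]]:
    """Build the xdotool argv lists without running them."""
    commands: List[List[str]] = []
    if text:
        # "--" keeps text that starts with "-" from being parsed as an option.
        commands.append(["xdotool", "type", "--clearmodifiers", "--", text])
    if submit:
        commands.append(["xdotool", "key", "Return"])
    return commands


def type_transcript(transcript: str, dry_run: bool = False) -> List[List[str]]:
    """Type a transcript into the focused window, or only return the commands."""
    text, submit = split_magic_word(transcript)
    commands = xdotool_commands(text, submit)
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands
```

Calling type_transcript(..., dry_run=True) returns the commands for inspection; with dry_run=False the same commands are sent to whichever window has focus.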

Dependencies

  • Conda: faster-whisper, sounddevice, numpy, pyperclip, requests, ollama
  • Pip (in conda env): webrtcvad
  • System: libportaudio2, xdotool
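
A quick preflight check for the dependencies listed above might look like the following sketch. The function name and report format are hypothetical. The module names are the import names of the listed packages (faster-whisper imports as faster_whisper). The PortAudio lookup relies on ctypes.util.find_library, which may behave differently across systems, so treat a miss there as a hint, not proof.

```python
import ctypes.util
import importlib.util
import shutil
from typing import List, Sequence

PYTHON_MODULES = [
    "faster_whisper", "sounddevice", "numpy",
    "pyperclip", "requests", "ollama", "webrtcvad",
]
SYSTEM_BINARIES = ["xdotool"]
SHARED_LIBRARIES = ["portaudio"]  # provided by libportaudio2 on Debian


def missing_dependencies(
    modules: Sequence[str] = PYTHON_MODULES,
    binaries: Sequence[str] = SYSTEM_BINARIES,
    libraries: Sequence[str] = SHARED_LIBRARIES,
) -> List[str]:
    """Return a description of each dependency that could not be found."""
    missing = [
        f"python module: {name}"
        for name in modules
        if importlib.util.find_spec(name) is None  # does not import the module
    ]
    missing += [f"binary: {name}" for name in binaries if shutil.which(name) is None]
    missing += [
        f"shared library: {name}"
        for name in libraries
        if ctypes.util.find_library(name) is None
    ]
    return missing


if __name__ == "__main__":
    # Stay quiet when everything is present; print only what needs action.
    for item in missing_dependencies():
        print(item)
```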

Conventions

  • Shell wrappers go in .sh files using mamba run -n whisper-ollama
  • All scripts set CT2_CUDA_ALLOW_FP16=1
  • Whisper model loading always tries GPU (cuda/float16) first, then falls back to CPU (cpu/int8)
  • Keep scripts self-contained (no shared module)
  • Don't print output for non-actionable events
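
The model-loading conventions above can be sketched as follows. This is a minimal illustration, not the code in the scripts: load_model, ATTEMPTS and the factory parameter are hypothetical names, and the factory indirection exists only so the fallback can be exercised without a GPU. The WhisperModel(model_size, device=..., compute_type=...) call is the faster-whisper constructor.

```python
import os
from typing import Any, Callable, Optional, Tuple

# Set before CTranslate2 is loaded, per the convention above.
os.environ["CT2_CUDA_ALLOW_FP16"] = "1"

# Tried in order: GPU first, then CPU.
ATTEMPTS = [("cuda", "float16"), ("cpu", "int8")]


def _default_factory(model_size: str, device: str, compute_type: str) -> Any:
    # Imported lazily so this module can be loaded without faster-whisper.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, device=device, compute_type=compute_type)


def load_model(
    model_size: str = "base",
    factory: Callable[[str, str, str], Any] = _default_factory,
) -> Tuple[Any, str]:
    """Return (model, device), falling back from GPU to CPU on any error."""
    last_error: Optional[Exception] = None
    for device, compute_type in ATTEMPTS:
        try:
            return factory(model_size, device, compute_type), device
        except Exception as exc:
            # No message here: a working fallback is not actionable.
            last_error = exc
    raise RuntimeError("could not load Whisper model on GPU or CPU") from last_error
```

Returning the device alongside the model lets a caller decide whether the fallback is worth reporting. Some CUDA problems only surface on the first transcription, so a production version may also need to guard that call.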

Preferences

  • Prefer packages available via apt over building from source
  • Check availability before recommending a dependency
  • Prefer snappy/responsive defaults over cautious ones
  • Avoid over-engineering — keep scripts simple and focused