
Purpose

Speech-to-text command-line utilities built on local models (faster-whisper, Ollama).

Tools

Script                 Wrapper      Description
assistant.py           talk.sh      Transcribe speech, copy to clipboard, optionally send to Ollama LLM
voice_to_terminal.py   terminal.sh  Voice-controlled terminal: AI suggests and executes bash commands
voice_to_dotool.py     dotool.sh    Hands-free voice typing into any focused window via xdotool (VAD-based)

Setup

# Create the environment with Python 3.10 and CUDA toolkit
mamba create -n whisper-ollama python=3.10 nvidia/label/cuda-12.2.0::cuda-toolkit cudnn -c nvidia -c conda-forge -y

# Activate the environment
mamba activate whisper-ollama

# Install audio and Python dependencies
# Note: portaudio is required for sounddevice to work on Linux
sudo apt-get update && sudo apt-get install libportaudio2 -y

pip install faster-whisper sounddevice numpy pyperclip requests webrtcvad

xdotool setup (required for voice_to_dotool.py)

xdotool simulates keyboard input via X11. Already installed on most Linux desktops.

# Install if not already present
sudo apt-get install xdotool

Note: xdotool is X11-only. For Wayland, swap to ydotool (sudo apt install ydotool), which additionally requires the ydotoold daemon to be running.
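For context, a script like voice_to_dotool.py would typically shell out to xdotool to do the typing. The sketch below shows one plausible way to do that; the helper names and the --delay value are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch: handing transcribed text to xdotool via subprocess.
# Helper names and the 12 ms keystroke delay are illustrative assumptions.
import shutil
import subprocess


def build_type_commands(text: str, submit: bool = False) -> list[list[str]]:
    """Build the xdotool argv lists for typing `text` into the focused window."""
    # --clearmodifiers prevents currently held modifier keys from mangling input
    cmds = [["xdotool", "type", "--clearmodifiers", "--delay", "12", text]]
    if submit:
        # optionally press Enter after typing (the --submit behaviour)
        cmds.append(["xdotool", "key", "Return"])
    return cmds


def type_text(text: str, submit: bool = False) -> None:
    if shutil.which("xdotool") is None:
        raise RuntimeError("xdotool not found; is this an X11 session?")
    for cmd in build_type_commands(text, submit):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the argv logic testable without an X11 display.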

Usage: voice_to_dotool.py

Hands-free speech input — uses VAD to auto-detect when you start/stop speaking, transcribes with Whisper, and types the text into the focused window via xdotool.

# Basic: type transcribed text (you press Enter to submit)
./dotool.sh

# Auto-submit: also presses Enter after typing
./dotool.sh --submit

# Adjust silence threshold (seconds of silence to end an utterance)
./dotool.sh --silence-threshold 2.0

# Use a smaller/faster Whisper model
./dotool.sh --model-size base

# All options
./dotool.sh --submit --silence-threshold 1.5 --model-size medium --vad-aggressiveness 3
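The flags above could be parsed with a small argparse setup along these lines; the defaults shown are illustrative guesses, not the script's actual defaults.

```python
# Minimal sketch of a parser matching the documented flags.
# Default values here are assumptions for illustration only.
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="VAD-based voice typing via xdotool")
    p.add_argument("--submit", action="store_true",
                   help="press Enter after typing each utterance")
    p.add_argument("--silence-threshold", type=float, default=1.5,
                   help="seconds of silence that end an utterance")
    p.add_argument("--model-size", default="medium",
                   help="faster-whisper model size, e.g. base, small, medium")
    p.add_argument("--vad-aggressiveness", type=int, default=2,
                   choices=range(4),
                   help="webrtcvad aggressiveness, 0 (least) to 3 (most)")
    return p
```

webrtcvad only accepts aggressiveness modes 0-3, so constraining the choice at the parser gives an early, readable error.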

Workflow

  1. Press Enter to start a listening session
  2. Speak — VAD detects speech automatically
  3. Pause — after the silence threshold, text is transcribed and typed
  4. Keep speaking for more utterances, or press Enter to end the session
  5. Ctrl+C to quit
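The silence-threshold step of the workflow can be sketched as a pure function. Real code would feed 10-30 ms audio frames through webrtcvad; here the per-frame speech decisions arrive as booleans so the logic runs without any hardware, and all names are illustrative.

```python
# Sketch of silence-threshold utterance segmentation (names illustrative).
# is_speech: one boolean per fixed-length audio frame, e.g. from webrtcvad.


def segment_utterances(is_speech, frame_ms=30, silence_threshold_s=1.5):
    """Yield (start, end) frame index pairs, half-open, ending each
    utterance after silence_threshold_s of consecutive silence."""
    silence_frames_needed = int(silence_threshold_s * 1000 / frame_ms)
    start = None       # index of the first speech frame of the open utterance
    silent_run = 0     # consecutive silent frames since the last speech frame
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i              # first speech frame opens an utterance
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= silence_frames_needed:
                yield (start, i - silent_run + 1)   # end before the silence
                start, silent_run = None, 0
    if start is not None:              # flush an utterance cut off at stream end
        yield (start, len(is_speech) - silent_run)
```

In the real tool, each yielded span would be sliced from the recorded audio, transcribed with Whisper, and typed via xdotool.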