Metadata-Version: 2.4
Name: micoracle
Version: 1.3.6
Summary: Hands-free voice input for Claude Code, Codex CLI, and any terminal — cross-platform
Author-email: Pradip Tivhale <pradiptivhale@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/thepradip/micoracle
Project-URL: Repository, https://github.com/thepradip/micoracle
Project-URL: Bug Tracker, https://github.com/thepradip/micoracle/issues
Keywords: voice,speech-to-text,cli,hands-free,whisper,claude,codex
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: sounddevice>=0.4.6
Requires-Dist: webrtcvad-wheels>=2.0.14
Requires-Dist: setuptools>=61
Requires-Dist: soundfile>=0.12.0
Requires-Dist: hf-transfer>=0.1.6; platform_system != "Windows"
Requires-Dist: huggingface_hub>=0.20.0
Requires-Dist: mlx-whisper; sys_platform == "darwin" and platform_machine == "arm64"
Requires-Dist: pyperclip>=1.8.0; platform_system == "Windows"
Requires-Dist: pyautogui>=0.9.54; platform_system == "Windows"
Requires-Dist: pywin32>=306; platform_system == "Windows"
Requires-Dist: psutil>=5.9.0; platform_system == "Windows"
Provides-Extra: faster
Requires-Dist: faster-whisper>=1.0.0; extra == "faster"
Provides-Extra: mlx
Requires-Dist: mlx-whisper; extra == "mlx"
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: azure
Requires-Dist: openai>=1.0.0; extra == "azure"
Provides-Extra: gemini
Requires-Dist: google-genai>=0.8.0; extra == "gemini"
Provides-Extra: elevenlabs
Requires-Dist: requests>=2.28.0; extra == "elevenlabs"
Provides-Extra: denoise
Requires-Dist: noisereduce>=3.0.0; extra == "denoise"
Provides-Extra: speaker
Requires-Dist: resemblyzer>=0.1.1; extra == "speaker"
Provides-Extra: pyttsx3
Requires-Dist: pyttsx3; extra == "pyttsx3"
Provides-Extra: windows
Requires-Dist: pyperclip>=1.8.0; extra == "windows"
Requires-Dist: pyautogui>=0.9.54; extra == "windows"
Requires-Dist: pywin32>=306; extra == "windows"
Requires-Dist: psutil>=5.9.0; extra == "windows"
Provides-Extra: local
Requires-Dist: micoracle[faster,pyttsx3]; extra == "local"
Provides-Extra: cloud
Requires-Dist: micoracle[openai]; extra == "cloud"
Provides-Extra: all
Requires-Dist: micoracle[elevenlabs,faster,gemini,openai,pyttsx3]; extra == "all"
Dynamic: license-file

# micoracle

Hands-free voice input for **Claude Code**, **Codex CLI**, **OpenCode**, and any
terminal — on **macOS, Linux, and Windows**. OS is auto-detected at launch; STT
and TTS backends are pluggable (local or cloud).

Say *"Codex, refactor this function"* → your speech is captured, transcribed,
and pasted into the focused terminal with Enter pressed. No push-to-talk, no
cloud required.

## Features

- **Cross-platform** — `platform_adapter.py` auto-selects macOS (AppleScript) /
  Linux (xdotool on X11, wtype on Wayland) / Windows (pywin32 + pyautogui).
- **4 STT backends** — MLX Whisper (Apple Silicon), faster-whisper
  (cross-platform local), OpenAI Whisper API, Azure OpenAI Whisper.
- **4 TTS backends** — macOS `say`, `pyttsx3` (cross-platform offline), OpenAI
  TTS, Azure Speech TTS. Or `none` to stay silent.
- **Continuous listening** with WebRTC VAD + 300 ms preroll buffer (wake words
  never get chopped by onset detection).
- **Wake-word gate** — `"Claude, …"` / `"Codex, …"` with fuzzy mishear tolerance.
- **Two-step follow-up** — say the wake word alone, then the prompt within 8 s.
- **Silence-hallucination filter** — Whisper's *"Thank you."* / *"Amen."*
  artifacts are dropped silently.
- **Locked dispatch target** — frontmost app captured at startup (or pin via
  `--target-app`), stays fixed regardless of where you click.
- **Clipboard-safe** — text is pasted via clipboard; your original contents are
  restored afterwards.

## OS / backend matrix

| Platform | STT default | TTS default | Focus & paste | Notes |
|---|---|---|---|---|
| macOS (Apple Silicon) | `mlx` | `say` | AppleScript | Best local latency |
| macOS (Intel) | `faster` | `say` | AppleScript | |
| Linux X11 | `faster` | `pyttsx3` | `xdotool` + `xclip` | Full support |
| Linux Wayland | `faster` | `pyttsx3` | `wtype` + `wl-copy` | `--target-app` required; auto-focus limited |
| Windows 10/11 | `faster` | `pyttsx3` | pywin32 + pyautogui | Installed automatically by the PyPI package on Windows |

All defaults can be overridden via `--stt-backend` / `--tts-backend` or
environment variables.

## Install

Use a virtual environment first:

```bash
python -m venv .venv
```

Activate it:

```bash
# macOS / Linux
source .venv/bin/activate

# Windows PowerShell
.\.venv\Scripts\activate
```

Install the package:

```bash
pip install --upgrade pip
pip install micoracle
```

### Pick STT

**macOS Apple Silicon:** `mlx-whisper` is installed automatically by `micoracle`.

**Windows, Linux, macOS Intel local Whisper:**

```bash
pip install "micoracle[faster]"
```

If Windows has an `onnxruntime` resolver conflict, use a clean Python 3.12/3.13
venv or use OpenAI STT:

```bash
pip install "micoracle[openai]"
micoracle --stt-backend openai --no-speak --target-app active --dictate
```

Set `OPENAI_API_KEY` in your shell first, for example:

```powershell
$env:OPENAI_API_KEY="your_key_here"
```

### OS Setup

**macOS:** grant Microphone, Accessibility, and Automation permissions to your terminal.

**Linux X11:**

```bash
sudo apt install xdotool
```

**Linux Wayland:**

```bash
sudo apt install wtype wl-clipboard
```

**Linux offline TTS only:**

```bash
sudo apt install espeak
pip install "micoracle[pyttsx3]"
```

**Windows:** no extra Python dispatch packages are needed; `pyperclip`, `pyautogui`,
`pywin32`, and `psutil` are installed automatically.

### Configure

```bash
cp .env.example .env
# Edit .env to set API keys (only if using cloud backends), pick backends, etc.
```

## Quickstart

```bash
# Focus Cursor, Claude CLI, Codex CLI, or any terminal input box, then:
micoracle --no-speak --target-app active --dictate

# Wake-word mode instead of direct dictation:
micoracle --no-speak --target-app active

# Override STT/TTS at launch:
micoracle --stt-backend openai --tts-backend none --target-app active --dictate
```

Speak:

- **One-shot:** *"Codex, write a Python hello world."* → transcribed + pasted
  into the focused app.
- **Two-step:** *"Codex."* → you hear *"listening"* → within 8 s say the
  prompt → pasted.

## CLI reference

| Flag | Default | Description |
|---|---|---|
| `--device <id\|name>` | system default mic | Audio input device. |
| `--list-devices` | — | Print available input devices and exit. |
| `--target-app <name>` | `active` | Paste into focused window, or target a specific app/process. |
| `--dictate` | off | Paste every transcribed utterance without requiring "Codex" / "Claude". |
| `--stt-backend` | `auto` | `auto` / `mlx` / `faster` / `openai` / `azure` / `elevenlabs` / `gemini`. |
| `--tts-backend` | `auto` | `auto` / `say` / `pyttsx3` / `openai` / `azure` / `none`. |
| `--no-speak` | — | Alias for `--tts-backend none`. |

## Environment variables

See [`.env.example`](./.env.example) for the full commented list. Highlights:

| Variable | Purpose |
|---|---|
| `VOICE_AGENT_STT_BACKEND` | Default STT backend |
| `VOICE_AGENT_TTS_BACKEND` | Default TTS backend |
| `VOICE_AGENT_TARGET_APP` | Default dispatch target |
| `VOICE_AGENT_INPUT_DEVICE` | Default mic |
| `VOICE_AGENT_MLX_REPO` | MLX Whisper HF repo |
| `VOICE_AGENT_FASTER_MODEL` | faster-whisper model name |
| `OPENAI_API_KEY` | For `openai` STT/TTS backends |
| `AZURE_OPENAI_ENDPOINT` / `AZURE_OPENAI_KEY` / `AZURE_WHISPER_DEPLOYMENT` | For `azure` STT |
| `AZURE_SPEECH_KEY` / `AZURE_SPEECH_REGION` | For `azure` TTS |

## Architecture

```
[ mic ] ──▶ sounddevice callback ──▶ audio_q ──▶ main loop
                                                   │
                                                   ▼  (VAD + preroll buffer)
                                               utterance_q
                                                   │
                                                   ▼
                  worker: STTBackend → wake/cleanup → PlatformAdapter → TTSBackend
```

- `stt.py` — `STTBackend` interface + 4 implementations + OS-aware factory.
- `tts.py` — `TTSBackend` interface + 4 implementations + factory.
- `platform_adapter.py` — `PlatformAdapter` interface + `MacAdapter`,
  `LinuxAdapter`, `WindowsAdapter` + `get_platform_adapter()` factory.
- `hands_free_voice.py` — mic capture, VAD state machine, wake-state, wiring.

## Troubleshooting

**No input devices shown.** OS microphone permission for your terminal is
missing. Grant it (macOS: Privacy & Security → Microphone; Linux: check
PulseAudio / PipeWire; Windows: Settings → Privacy → Microphone) and relaunch
the terminal.

**Wake word never fires.** Confirm the right mic via `--list-devices`. Say
*"Codex"* slowly and clearly — fuzzy matching covers most mishears but a low
input gain can strip initial consonants.

**`[dispatch error]` on Wayland.** Wayland blocks programmatic window focusing.
Pass `--target-app` and keep that window focused yourself.

**Windows: keystrokes sent to the wrong window.** Windows' focus-stealing
prevention can block `SetForegroundWindow`. Give the target window focus
manually, or use [AutoHotkey] to nudge focus-stealing permissions.

**macOS: keystrokes ignored.** Accessibility + Automation permissions missing
for the terminal. System Settings → Privacy & Security → Accessibility /
Automation.

## Privacy & security

- **Local backends keep audio on-device.** MLX Whisper and faster-whisper do
  not make any network calls at inference time.
- **Cloud backends upload audio** (OpenAI / Azure). Use only if you're comfortable.
- **Clipboard** is temporarily overwritten with each dispatch; original
  contents are restored immediately after.
- **No telemetry.** No analytics. No phone-home.
- **Accessibility / Automation permissions are powerful** — the agent types
  into the focused app and presses Enter. Review the source before granting.

## License

[MIT](./LICENSE) © 2026 Pradip Tivhale

## Acknowledgements

- [MLX Whisper](https://github.com/ml-explore/mlx-examples), [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
- [py-webrtcvad](https://github.com/wiseman/py-webrtcvad)
- [sounddevice](https://python-sounddevice.readthedocs.io/)
- [xdotool](https://github.com/jordansissel/xdotool), [wtype](https://github.com/atx/wtype), [pyautogui](https://pyautogui.readthedocs.io/)
- [pyttsx3](https://pyttsx3.readthedocs.io/)
