Alexander Brittain · 415 downloads
Dictate notes with Whisper or Cohere Transcribe; clean up with a local Ollama model. Private, on-device speech-to-text for Obsidian.
On-device dictation plugin for Obsidian. Talk directly into your notes with fast, accurate transcription from top models running on your CPU or GPU.
Transcription runs entirely on your device. No accounts or cloud required. A fast Rust sidecar handles the inference, and all models can be downloaded directly from the plugin settings.
Install Local Dictation from Obsidian's Community Plugins. A setup wizard downloads the engine and a starter model on first launch.
Then click the microphone in the ribbon, or bind a hotkey to Local Dictation: Toggle dictation, and start talking. Text lands at your cursor.
CPU works everywhere with no extra setup. Hardware acceleration is available for faster transcription — use Metal (macOS, automatic) or CUDA on a recent NVIDIA GPU (RTX 20-series / GTX 16-series or newer, with a current driver). See the CUDA setup guide to enable it.
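The backend choice described above can be pictured as a simple fallback chain. This is a minimal sketch, not the plugin's actual code: the `Backend` type, the `HostInfo` fields, and `pickBackend` are hypothetical names introduced for illustration, and it assumes CUDA is only used once it has been enabled per the setup guide.

```typescript
// Hypothetical types and names; the plugin's real detection logic may differ.
type Backend = "metal" | "cuda" | "cpu";

interface HostInfo {
  platform: "darwin" | "win32" | "linux";
  // Assumed flag: a supported NVIDIA GPU with a current driver was detected.
  hasCudaGpu: boolean;
  // Assumed flag: the user enabled CUDA following the setup guide.
  cudaEnabled: boolean;
}

function pickBackend(host: HostInfo): Backend {
  // Metal is selected automatically on macOS.
  if (host.platform === "darwin") return "metal";
  // CUDA needs both capable hardware and an explicit opt-in.
  if (host.hasCudaGpu && host.cudaEnabled) return "cuda";
  // CPU is the fallback that works everywhere with no extra setup.
  return "cpu";
}
```

For example, a Linux machine with an RTX card but CUDA not yet enabled would resolve to `"cpu"` under these assumptions.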
macOS and Windows are the primary tested targets. On Linux, the plugin is used daily on Fedora 44 (native and Flatpak); other distributions should work but aren't routinely verified. If something breaks on yours, open an issue.
Moonshine live dictation is English-only. Streaming models do not apply speaker labels, and long speech is split at a 30-second utterance cap. Tiny is intended for lower-end CPUs, Small is the recommended balance, and Medium trades additional compute and memory for accuracy. Batch models keep their existing behavior.
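To make the 30-second utterance cap concrete, here is a minimal sketch of splitting a long stretch of speech into capped segments. The `Segment` type and `splitAtCap` function are hypothetical, and the sketch assumes fixed-length cuts; the plugin may well choose split points differently (for example, at pauses found by voice activity detection).

```typescript
// Hypothetical helper; illustrates only the cap, not the plugin's real splitting.
interface Segment {
  startSec: number;
  endSec: number;
}

function splitAtCap(totalSec: number, capSec = 30): Segment[] {
  // Nothing to split for empty input or a nonsensical cap.
  if (totalSec <= 0 || capSec <= 0) return [];
  const segments: Segment[] = [];
  for (let start = 0; start < totalSec; start += capSec) {
    // The last segment is shortened so it never runs past the end of the speech.
    segments.push({ startSec: start, endSec: Math.min(start + capSec, totalSec) });
  }
  return segments;
}
```

Under these assumptions, 75 seconds of continuous speech yields three segments: 0 to 30, 30 to 60, and 60 to 75 seconds.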
Your audio never leaves your device — transcription is always local. The sidecar and models download once from GitHub Releases, and model files live outside your vault.
Local LLMs are limited in capability, so you can optionally route your transcribed text to OpenRouter for frontier models and much larger context windows. Restrict it to zero-data-retention (ZDR) endpoints and approved providers in OpenRouter to match your own privacy standards. Remote LLM features turn off with a single toggle, and a second toggle disables all LLM features in the plugin.
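The two toggles can be read as a small decision function over the settings. This is a sketch under stated assumptions: `LlmSettings`, its field names, and `resolveLlmRoute` are hypothetical and do not reflect the plugin's real settings schema. Only transcribed text is ever routed; audio stays on the device in every case.

```typescript
// Hypothetical settings shape; field names are illustrative only.
type LlmRoute = "none" | "local" | "remote";

interface LlmSettings {
  // Master toggle: when false, all LLM features are disabled.
  llmEnabled: boolean;
  // Remote toggle: when false, nothing is sent to OpenRouter.
  remoteLlmEnabled: boolean;
  // Assumed preference: use the remote model when it is allowed.
  preferRemote: boolean;
}

function resolveLlmRoute(s: LlmSettings): LlmRoute {
  // The master toggle overrides everything else.
  if (!s.llmEnabled) return "none";
  // Remote is used only when it is both permitted and preferred.
  if (s.remoteLlmEnabled && s.preferRemote) return "remote";
  // Otherwise cleanup stays with the local model.
  return "local";
}
```

Checking the master toggle first means that turning it off disables remote routing as well, whatever the other settings say.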
Local Dictation is a TypeScript plugin paired with a Rust sidecar for inference. See CONTRIBUTING.md for the architecture, setup, and workflow.
Local Dictation is MIT-licensed — see LICENSE.
The models bundled in the sidecar are openly licensed too: Silero VAD (MIT) for voice activity detection, plus the diarization models WeSpeaker (CC-BY-4.0) and pyannote segmentation (MIT). See THIRD_PARTY_NOTICES.md for full attributions.