Search...Search plugins and themes...
⌘K
Sign in
  • Get started
  • Download
  • Pricing
  • Enterprise
  • Account
  • Obsidian
  • Overview
  • Sync
  • Publish
  • Canvas
  • Mobile
  • Web Clipper
  • CLI
  • Learn
  • Help
  • Developers
  • Changelog
  • About
  • Roadmap
  • Blog
  • Resources
  • System status
  • License overview
  • Terms of service
  • Privacy policy
  • Security
  • Community
  • Plugins
  • Themes
  • Discord
  • Forum / 中文论坛
  • Merch store
  • Brand guidelines
Follow us
DiscordTwitterBlueskyThreadsMastodonYouTubeGitHub
© 2026 Obsidian

Voxtral Transcribe

maxonamissionmaxonamission533 downloads

Real-time streaming transcription using Mistral Voxtral, supports voice commands (headings, lists, to-dos and more) and automatic text correction.

Add to Obsidian
  • Overview
  • Scorecard
  • Updates27

Thoughts move fast. Your transcription should keep up.

Voxtral Transcribe streams text into your notes as you speak. Add structure by voice (headings, bullets, to-dos), or grab the keyboard mid-dictation — the mic waits for you and resumes when you stop typing. Edits happen as you go, not after.

Dictate directly into Markdown using Mistral's Voxtral speech-to-text. Insert headings, lists, to-dos and other elements by voice, correct text inline or on the fly, use real-time streaming or batch tap-to-send. Supports transcription in 13+ languages.

Get going in under a minute

  1. Install and paste your Mistral API key
  2. Press Ctrl+Space (desktop) or tap the mic icon (mobile)
  3. Start talking — say "heading 2", "new bullet", "for the correction: ..." as you go

Why Voxtral?

Voxtral is purpose-built for transcription, not retrofitted from a general audio model. Three things that matter for dictation:

  • Low word error rate on hard audio — handles background noise, accents, and technical jargon well, including on continuous speech
  • Streaming-first — designed for low-latency partial results, which is what makes "text appears as you speak" feel real-time instead of stuttery
  • Multilingual by design — 13+ languages with consistent quality, not English-first with the rest bolted on

If you're choosing between speech-to-text models for dictation specifically (rather than, say, post-hoc transcription of meeting recordings), this is a strong fit.

Features

  • Real-time streaming (desktop) — text appears as you speak
  • Batch mode with tap-to-send (desktop + mobile) — send audio chunks while you keep talking
  • Voice commands — insert headings, bullet points, to-do items, numbered lists, and more by voice
  • 13 languages — voice commands automatically adapt to the selected language; English always works as fallback (Dutch, English, French, German, Spanish, Portuguese, Italian, Russian, Chinese, Hindi, Arabic, Japanese, Korean)
  • Voice command help panel — shows available commands and trigger phrases for the active language
  • Auto-correction — spelling, capitalization, and punctuation are automatically corrected after recording
  • Inline correction instructions — say "for the correction: ..." and the corrector will follow your instructions
  • Self-correction recognition — "no not X but Y" is handled automatically
  • Mishearing correction — common speech recognition errors are fixed automatically per language
  • Microphone selection — choose which microphone to use
  • Auto-pause on focus loss — configurable behavior when switching apps on mobile
  • Configurable Enter-to-send — optionally use Enter as tap-to-send when the mic is live (batch mode)
  • Typing cooldown — adjustable delay before the mic resumes after typing

Need coffee to process all this? Me too. ☕ Buy Me a Coffee

Requirements

  • Obsidian v1.0.0 or newer
  • Mistral API key — free to create at console.mistral.ai

Installation

From Community Plugins (recommended)

  1. Open Settings > Community plugins > Browse
  2. Search for "Voxtral Transcribe"
  3. Click Install, then Enable
  4. Go to Settings > Voxtral Transcribe and enter your Mistral API key

Manual installation

  1. Download main.js, manifest.json, and styles.css from the latest release
  2. Create a folder .obsidian/plugins/voxtral-transcribe/ in your vault
  3. Copy the three files into that folder
  4. Restart Obsidian and enable the plugin in Settings > Community plugins

Usage

Desktop (real-time mode)

  1. Open a note
  2. Click the microphone icon in the ribbon, or press Ctrl+Space
  3. Start speaking — text appears live in your note
  4. Click the microphone again or say "stop recording" to stop
  5. Auto-correction runs automatically if enabled

Mobile (batch mode)

On mobile, only batch mode is available (real-time streaming requires Node.js).

  1. Open a note
  2. Tap the microphone icon to start recording
  3. Tap the send icon in the view header to transcribe the current audio chunk — the recording keeps going
  4. On desktop, press Enter while the mic is live (not typing) to send a chunk (if Enter = tap-to-send is enabled)
  5. Keep talking and tap/press send again for the next chunk
  6. Tap the microphone to stop — the last chunk is processed automatically

Voice commands

Voice commands are recognized at the end of a sentence. Commands automatically adapt to the language selected in settings — the table below shows examples in English, but equivalent phrases are available in all 13 supported languages. Open the Voice Commands help panel (ribbon icon or command palette) to see the exact phrases for your active language.

Command Example (English) Result
New paragraph "new paragraph" Double line break
New line "new line" Single line break
Heading 1–3 "heading 1" / "heading 2" / "heading 3" # / ## / ###
Bullet point "bullet point" -
To-do item "new todo" - [ ]
Numbered item "numbered item" 1. (auto-increments)
Delete last paragraph "delete last paragraph" Removes last paragraph
Delete last line "delete last line" Removes last sentence
Undo "undo" Undo last action
Stop recording "stop recording" Stops the recording

Text correction

  • Correct selection: Select text > Command palette > "Correct selected text"
  • Correct entire note: Command palette > "Correct entire note"

Focus loss behavior

When switching apps on mobile, you can configure what happens to an active recording:

  • Pause immediately (default) — pauses and resumes when you return
  • Pause after delay — keeps recording for a configurable time (10s–5min), then pauses
  • Keep recording — continues recording in the background

Settings

Setting Description
Mistral API key Your API key from console.mistral.ai
Microphone Which microphone to use
Mode Realtime (desktop only) or Batch
Enter = tap-to-send Use Enter to send audio chunks when mic is live (batch mode, default: on)
Typing cooldown Delay before mic resumes after typing (default: 800 ms)
On focus loss Pause immediately / after delay / keep recording
Language Language for transcription and voice commands (13 languages, default: Nederlands)
Auto-correct Enable/disable automatic correction
Streaming delay Latency vs accuracy tradeoff for realtime mode

Development

npm install
npm run dev    # watch mode
npm run build  # production build

License

GPL-3.0 — Copyright (c) 2026 Max Kloosterman

95%
HealthExcellent
ReviewPassed
About
Dictate directly into Markdown using Mistral's Voxtral speech-to-text. Insert headings, lists, to-dos and other elements by voice, correct text inline or on the fly, use real-time streaming or batch tap-to-send. Supports transcription in 13+ languages.
AIWritingEditing
Details
Current version
1.1.1
Last updated
4 days ago
Created
3 months ago
Updates
27 releases
Downloads
533
Compatible with
Obsidian 1.0.0+
Platforms
Desktop, Mobile
License
GPL-3.0
Report bugRequest featureReport plugin
Sponsor
Buy Me a Coffee
Author
maxonamissionmaxonamission
github.com/maxonamission
GitHubmaxonamission
  1. Community
  2. Plugins
  3. AI
  4. Voxtral Transcribe

Related plugins

Text Generator

Generate text content using GPT-3 (OpenAI).

Smart Composer

AI chat with note context, smart writing assistance, and one-click edits for your vault.

LanguageTool Integration

advanced spell/grammar checks with the help of language-tool.

LanguageTool

Unofficial integration of the LanguageTool spell and grammar checker.

Local GPT

Local Ollama and OpenAI-like GPT's assistance for maximum privacy and offline access.

Gemini Scribe

Allows you to interact with Gemini and use your notes as context.

ChatGPT MD

A seamless integration of ChatGPT, OpenRouter.ai and local LLMs via Ollama into your notes.

Neural Composer

Local Graph RAG powered by LightRAG. Chat with your notes using deep knowledge graph connections.

BMO Chatbot

Generate and brainstorm ideas while creating your notes using Large Language Models (LLMs) such as OpenAI's "gpt-3.5-turbo" and "gpt-4".

Advanced Tables

Improved table navigation, formatting, and manipulation.