denizokcu · 288 downloads
Record audio and transcribe it to Markdown using the OpenAI Audio Transcription API.
Capture thoughts before they disappear. Voice MD is a mobile-friendly voice capture plugin for Obsidian that records quick ideas, meeting recaps, and conversations, then turns them into Markdown notes you can actually use.
Use it when you are walking, commuting, leaving a meeting, or sitting at your desk and want speech to land directly in your vault.
Open any note, start recording, and stop when you are done. Voice MD inserts the transcript as a new paragraph at your current cursor position, so you can capture ideas without breaking your writing flow.
Voice MD registers an Obsidian URL action for iOS Shortcuts:
obsidian://voice-md?record=true&daily=true&autostart=true
Add this URL to an iOS Shortcut using Open URLs, then assign that shortcut to the iPhone Action Button. Voice MD opens or creates today's configured daily note, waits for the editor, starts recording, and appends the result at the end under a time heading:
## 14:32
Remember to follow up with Sam about the launch notes...
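The append step can be sketched as a small pure function. This is a minimal illustration, not Voice MD's actual code: the names `timeHeading` and `appendRecording` are hypothetical, the blank-line spacing is an assumption, and only the 24-hour form shown above is covered.

```typescript
// Hypothetical helpers for illustration; not taken from the Voice MD source.

// Build a level-2 heading such as "## 14:32" from the local time of `date`.
function timeHeading(date: Date): string {
  const hh = String(date.getHours()).padStart(2, "0");
  const mm = String(date.getMinutes()).padStart(2, "0");
  return `## ${hh}:${mm}`;
}

// Append a transcript to the end of a note under a time heading.
// Assumption: one blank line separates existing content from the new entry.
function appendRecording(note: string, transcript: string, date: Date): string {
  const body = note.trimEnd();
  const entry = `${timeHeading(date)}\n${transcript.trim()}\n`;
  return body.length > 0 ? `${body}\n\n${entry}` : entry;
}
```

Keeping the function pure (note text in, note text out) makes the heading format easy to check without a running vault.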
A bare obsidian://voice-md URL does nothing, by design. Recording requires record=true, and starting the microphone automatically requires autostart=true.
For multi-vault setups, include the vault name if needed:
obsidian://voice-md?vault=Your%20Vault&record=true&daily=true&autostart=true
You can also pass an explicit vault-relative note path:
obsidian://voice-md?record=true&file=Daily%2F2026-05-22.md&autostart=true
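If you generate these URLs from a script, the query values need percent-encoding, as in the examples above. The following is a minimal sketch: `buildVoiceMdUrl` and `VoiceMdUrlOptions` are hypothetical names, and only the query keys shown in this README (vault, record, daily, file, autostart) are assumed to exist.

```typescript
// Hypothetical helper for building Voice MD URLs; not part of the plugin.
// The option names mirror the query keys documented above.
interface VoiceMdUrlOptions {
  vault?: string;      // vault name, for multi-vault setups
  record?: boolean;    // required for any recording to happen
  daily?: boolean;     // target today's daily note
  file?: string;       // explicit vault-relative note path
  autostart?: boolean; // start the microphone automatically
}

function buildVoiceMdUrl(options: VoiceMdUrlOptions): string {
  const pairs: string[] = [];
  // encodeURIComponent turns spaces into %20 and slashes into %2F,
  // matching the encoded examples in this README.
  if (options.vault !== undefined) pairs.push(`vault=${encodeURIComponent(options.vault)}`);
  if (options.record) pairs.push("record=true");
  if (options.daily) pairs.push("daily=true");
  if (options.file !== undefined) pairs.push(`file=${encodeURIComponent(options.file)}`);
  if (options.autostart) pairs.push("autostart=true");
  const query = pairs.join("&");
  return query.length > 0 ? `obsidian://voice-md?${query}` : "obsidian://voice-md";
}
```

For example, `buildVoiceMdUrl({ vault: "Your Vault", record: true, daily: true, autostart: true })` yields the multi-vault URL shown above. `URLSearchParams` is avoided here because it encodes spaces as `+` rather than `%20`.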
Enable Meeting mode in the recording modal to use speaker-aware transcription. This works best with 2–6 speakers and recordings longer than 30 seconds.
Example output:
**Speaker A:** Let's review the Q3 numbers.
**Speaker B:** Revenue was up 12%, mostly driven by enterprise.
**Speaker A:** What about churn?
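To show how speaker-labelled segments can become the Markdown above, here is a rough sketch. The `SpeakerSegment` shape and the `formatSpeakerTranscript` name are assumptions made for illustration; they do not describe the transcription API's response format or Voice MD's internals.

```typescript
// Hypothetical segment shape; the real transcription response may differ.
interface SpeakerSegment {
  speaker: string; // short label such as "A" or "B"
  text: string;
}

// Merge consecutive segments from the same speaker into one turn,
// then render each turn as a bold-labelled Markdown line.
function formatSpeakerTranscript(segments: SpeakerSegment[]): string {
  const turns: SpeakerSegment[] = [];
  for (const segment of segments) {
    const text = segment.text.trim();
    if (text.length === 0) continue; // skip empty segments
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speaker === segment.speaker) {
      last.text = `${last.text} ${text}`;
    } else {
      turns.push({ speaker: segment.speaker, text });
    }
  }
  return turns.map((turn) => `**Speaker ${turn.speaker}:** ${turn.text}`).join("\n");
}
```

Merging adjacent segments keeps one line per turn, which reads better in a note than one line per audio chunk.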
Enable Post-processing to ask a chat model to format the transcript into clean Markdown with headings, lists, and paragraphs. Voice MD saves both:
- Voice Transcriptions/transcription-YYYY-MM-DD-HHMMSS-raw.md: raw transcript
- Voice Transcriptions/transcription-YYYY-MM-DD-HHMMSS.md: structured note linked back to the raw transcript

Raw transcripts are saved before structuring, so a formatting failure does not discard the transcription.
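The save order can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: `transcriptionPaths`, `saveTranscription`, `WriteFile`, and `Structure` are hypothetical names, the timestamp is assumed to use local time, and the wiki-link back to the raw file is a guess at what "linked back" looks like.

```typescript
// Hypothetical callbacks standing in for vault writes and the chat-model call.
type WriteFile = (path: string, content: string) => Promise<void>;
type Structure = (raw: string) => Promise<string>;

// Derive both file paths from one timestamp (assumption: local time).
function transcriptionPaths(date: Date): { raw: string; structured: string } {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const day = `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
  const time = `${pad(date.getHours())}${pad(date.getMinutes())}${pad(date.getSeconds())}`;
  const base = `Voice Transcriptions/transcription-${day}-${time}`;
  return { raw: `${base}-raw.md`, structured: `${base}.md` };
}

// Write the raw transcript first, then attempt structuring.
// Returns the paths that were actually written.
async function saveTranscription(
  raw: string,
  date: Date,
  write: WriteFile,
  structure: Structure,
): Promise<string[]> {
  const paths = transcriptionPaths(date);
  await write(paths.raw, raw);
  const saved = [paths.raw];
  try {
    const formatted = await structure(raw);
    // Assumption: the back-link is a wiki-link to the raw file.
    await write(paths.structured, `${formatted}\n\nRaw transcript: [[${paths.raw}]]\n`);
    saved.push(paths.structured);
  } catch {
    // Structuring failed; the raw transcript is already on disk.
  }
  return saved;
}
```

Because the raw write completes before the formatting call starts, an error from the chat model leaves the raw file untouched.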
Voice MD is designed for mobile use, not just desktop dictation.
Note: if iOS or Android terminates Obsidian while recording is still active, audio that has not reached the stopped/saved state may still be lost.
Settings → Voice MD
| Setting | Description | Default |
|---|---|---|
| OpenAI API key | Required for transcription. Stored with Obsidian SecretStorage when available | — |
| Max recording duration | Maximum seconds per recording | 300 |
| Auto-start recording | Start recording immediately when the modal opens | Off |
| Retain failed audio | Days to keep local audio for pending/failed retry jobs | 7 |
| Daily note folder | Folder used by obsidian://voice-md?...daily=true shortcuts | Vault root |
| Daily note date format | Date format used by daily-note shortcuts | YYYY-MM-DD |
| Use 24-hour time | Use 24-hour timestamps for URL-appended recordings | On |
| Language | Force a language code, or leave blank for auto-detect | Auto |
| Enable post-processing | Default for the post-processing checkbox | Off |
| Chat model | Model used for post-processing. You can enter a current OpenAI model name | gpt-4o-mini |
| Custom formatting prompt | Override the default formatting instructions | — |
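For readers who think in types, the defaults above could be modelled roughly like this. Every field name here is hypothetical and chosen for illustration; only the default values come from the table. The API key is left out because it is kept in SecretStorage rather than in plain settings, and empty strings are an assumed way to represent "vault root", "auto-detect", and "no custom prompt".

```typescript
// Hypothetical settings shape; field names are not taken from the plugin.
interface VoiceMdSettings {
  maxRecordingSeconds: number;
  autoStartRecording: boolean;
  retainFailedAudioDays: number;
  dailyNoteFolder: string;       // "" is assumed to mean the vault root
  dailyNoteDateFormat: string;
  use24HourTime: boolean;
  language: string;              // "" is assumed to mean auto-detect
  enablePostProcessing: boolean;
  chatModel: string;
  customFormattingPrompt: string; // "" is assumed to mean the default prompt
}

// Default values as listed in the settings table.
const DEFAULT_SETTINGS: VoiceMdSettings = {
  maxRecordingSeconds: 300,
  autoStartRecording: false,
  retainFailedAudioDays: 7,
  dailyNoteFolder: "",
  dailyNoteDateFormat: "YYYY-MM-DD",
  use24HourTime: true,
  language: "",
  enablePostProcessing: false,
  chatModel: "gpt-4o-mini",
  customFormattingPrompt: "",
};

// Overlay saved values on the defaults so missing keys fall back cleanly.
function withDefaults(saved: Partial<VoiceMdSettings>): VoiceMdSettings {
  return { ...DEFAULT_SETTINGS, ...saved };
}
```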
Copy main.js, manifest.json, and styles.css to <vault>/.obsidian/plugins/voice-md/.
DenizOkcu/voice-md.

| Problem | Fix |
|---|---|
| No API key error | Add your OpenAI API key in Settings → Voice MD |
| Recording will not start | Grant microphone permission to Obsidian in your OS settings |
| Transcription fails | Check your API key, credits, and network. If audio was saved, run Retry pending voice transcriptions |
| Daily note shortcut opens the wrong place | Set Daily note folder and Daily note date format to match your Daily Notes setup; include vault= for multi-vault iOS setups |
| No speaker labels | Meeting mode works best with 2–6 speakers and recordings over 30 seconds |
| Long meeting formatting is incomplete | Try a model with a larger output/context limit or keep raw transcripts enabled as the source of truth |