tzamtzis · 36 downloads
Transcribe audio files (m4a, mp3) and extract actionable insights using AI. Supports all languages with local or cloud processing.
Transform your audio recordings into structured, actionable notes automatically. This plugin transcribes audio files (m4a, mp3) and extracts key points, action items, and follow-ups using AI - all within your Obsidian vault.
After installing from Obsidian Community Plugins:
You'll see a new microphone icon in your left ribbon bar.
When you first use the plugin, you'll see a welcome screen:
============================================
Welcome to Audio Transcription Plugin!
============================================
Before you can transcribe audio files,
you need to download a transcription
model.
Recommended: Medium Model (1.5 GB)
- Best balance of speed and accuracy
- Good Greek language support
- Can process 1-hour audio in ~20 mins
Other options:
- Small (466 MB) - Faster, less accurate
- Large (2.9 GB) - Best quality, slower
[Download Medium Model] [Choose Other]
Or use cloud processing (no download):
[Configure Cloud API]
============================================
If you click "Download Medium Model", you'll see:
============================================
Downloading Whisper Medium Model...
============================================
Progress: [##########....] 742 MB / 1.5 GB
Estimated time remaining: 3 minutes
This is a one-time download. The model
will be saved to your plugin folder.
[Cancel Download]
============================================
Open Settings > Audio Transcription to see:
============================================
TRANSCRIPTION SETTINGS
============================================
Processing Mode
(*) Local (Whisper.cpp) - Private, no internet needed
( ) Cloud (OpenAI Whisper) - Faster, requires API key
( ) Cloud (Groq) - Fastest, very low cost, requires API key
( ) Cloud (OpenRouter) - Use custom models
Model Size (for local processing)
[ tiny | base | small | *medium | large ]
Default Language
(*) Auto-detect - Let the AI figure it out
( ) English only
( ) Greek only
( ) Multilingual (both)
[X] Enable Speaker Diarization (identify speakers)
--------------------------------------------
ANALYSIS SETTINGS
--------------------------------------------
Analysis Provider
(*) Local (Ollama) - Requires Ollama installed
( ) Cloud (OpenRouter) - Requires API key
Custom Analysis Instructions (optional)
+---------------------------------------+
| Focus on technical decisions and |
| deadlines. Tag people using @name |
| format. Identify risks and blockers. |
+---------------------------------------+
--------------------------------------------
API KEYS (for cloud processing)
--------------------------------------------
OpenAI API Key (for Whisper API)
[sk-************************************************]
OpenRouter API Key (for analysis)
[sk-or-********************************************]
OpenRouter Model Name
[meta-llama/llama-3.2-3b-instruct ]
--------------------------------------------
MODEL MANAGEMENT
--------------------------------------------
Local Models Path: ./models/
Installed Models:
( ) tiny.bin - Not downloaded
( ) base.bin - Not downloaded
( ) small.bin - Not downloaded
(*) medium.bin - ✓ Installed (1.5 GB)
( ) large.bin - Not downloaded
[Download Selected Model] [Delete Model]
--------------------------------------------
OUTPUT SETTINGS
--------------------------------------------
Output Folder
[/Transcriptions ] [Browse]
[X] Include timestamps in transcription
[X] Auto-create tags from analysis
[X] Skip files that are already analyzed
[Save Settings]
============================================
There are two ways to start transcription:
Method 1: Right-click on an audio file
File: meeting-2025-01-15.m4a
-----------------------------
Rename
Delete
Copy path
Move file
> Transcribe audio file <-- Click here!
Properties
Method 2: Use the ribbon icon
You'll see a notification in the bottom-right corner:
======================================
Transcribing: meeting-2025-01-15
======================================
Step 1/3: Transcribing audio...
Progress: [########..] 73%
Estimated time: 4 minutes remaining
[Cancel Transcription]
======================================
After transcription completes:
======================================
Transcribing: meeting-2025-01-15
======================================
Step 2/3: Analyzing content...
Extracting key points and actions
[#########.] 85%
======================================
Then:
======================================
Transcribing: meeting-2025-01-15
======================================
Step 3/3: Creating markdown file...
[###########] 100%
======================================
Finally:
======================================
✓ Transcription Complete
======================================
File created: meeting-2025-01-15.md
Duration: 1:32:45
Processing time: 18 minutes
[Open File] [Dismiss]
======================================
Clicking "Open File" opens your new markdown note:
---
audio_file: "meeting-2025-01-15.m4a"
duration: "1:32:45"
transcribed_date: 2025-01-15T14:32:00
language: "en"
speakers: 3
tags: [meeting, transcription, project-alpha, budget-review]
---
# Meeting Transcription: Q1 Budget Review
**Audio File:** meeting-2025-01-15.m4a
**Date:** January 15, 2025
**Duration:** 1 hour 32 minutes
**Participants:** 3 speakers identified
---
## Summary
This meeting covered the Q1 budget review for Project Alpha. The team discussed resource allocation, timeline adjustments due to staffing changes, and identified three critical blockers that need immediate attention. A follow-up meeting was scheduled for next week to finalize the revised timeline.
---
## Key Points
- **Budget approved** for additional contractor support ($45K)
- **Timeline extended** by 2 weeks due to Sarah's onboarding delay
- **Marketing campaign** launch postponed to March 1st
- **New feature request** from client - needs feasibility assessment
- **Risk identified**: Current API rate limits may impact performance testing
- **Decision made**: Switch to microservices architecture for Phase 2
---
## Action Items
- [ ] @john Review and approve contractor agreements by Friday (Jan 17)
- [ ] @sarah Set up development environment and complete onboarding checklist
- [ ] @mike Research API rate limit solutions and present options (due: Jan 22)
- [ ] @team Update project timeline in Jira with new milestones
- [ ] @john Schedule client call to discuss new feature request
- [ ] @sarah Create technical specification for microservices migration
---
## Follow-up Questions
- What is the exact scope of the new client feature request?
- Do we have budget flexibility if API solution requires paid tier upgrade?
- Has legal reviewed the contractor agreements?
- When will the new designer start?
---
## Full Transcription
**Speaker 1 (John)** [00:00:15]
Good morning everyone. Thanks for joining today's Q1 budget review. I know we're all busy, so let's try to keep this focused. Sarah, welcome to the team - this is your first planning meeting with us.
**Speaker 2 (Sarah)** [00:00:28]
Thanks John! Happy to be here. Looking forward to diving in.
**Speaker 1 (John)** [00:00:32]
Great. So let me start with the budget overview. We've been tracking expenses closely, and I'm happy to report we're actually 8% under budget for Q4, which gives us some flexibility going forward.
**Speaker 3 (Mike)** [00:00:48]
That's great news. Does that mean we can move forward with the contractor support we discussed?
**Speaker 1 (John)** [00:00:53]
Yes, exactly. I'm proposing we allocate $45,000 for two contractors to help with the frontend work. This should accelerate our timeline significantly.
**Speaker 2 (Sarah)** [00:01:08]
Just to clarify - would these contractors be working on the React components or the new design system?
**Speaker 1 (John)** [00:01:15]
Both, actually. We need someone who can implement the designs and also help establish the component library patterns.
[Transcription continues for full 1:32:45...]
---
**Speaker 3 (Mike)** [01:31:52]
Alright, I think we've covered everything. I'll send out the meeting notes later today.
**Speaker 1 (John)** [01:32:02]
Perfect. Thanks everyone. Let's sync up again next Tuesday.
**Speaker 2 (Sarah)** [01:32:08]
Sounds good. Thanks all!
[End of transcription]
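The transcript layout in the note above (a bold speaker label, a bracketed timestamp, then the spoken text) can be sketched as a small rendering step. This is only an illustration: the `Segment` shape and both function names are hypothetical, and the plugin's real internal types are not shown in this README.

```typescript
// Hypothetical segment shape, for illustration only.
interface Segment {
  speaker: string;      // e.g. "Speaker 1 (John)"
  startSeconds: number; // offset from the start of the recording
  text: string;
}

// Format a second offset as zero-padded HH:MM:SS, as in "[00:00:15]".
function formatTimestamp(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  return `${pad(hours)}:${pad(minutes)}:${pad(s % 60)}`;
}

// Render segments as "**Speaker** [timestamp]" followed by the text on the next line.
function renderTranscript(segments: Segment[]): string {
  return segments
    .map((seg) => `**${seg.speaker}** [${formatTimestamp(seg.startSeconds)}]\n${seg.text}`)
    .join("\n");
}
```

For example, a segment starting 28 seconds in, spoken by "Speaker 2 (Sarah)", would render with the label `**Speaker 2 (Sarah)** [00:00:28]`.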
If you right-click on the same audio file and select "Transcribe audio file" again:
======================================
Analysis Already Exists
======================================
This audio file has already been
transcribed and analyzed.
File: meeting-2025-01-15.md
Created: 2025-01-15 at 14:32
[Open Existing File] [OK]
======================================
This prevents accidental duplicate processing and wasted time.
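A duplicate check of this kind could look roughly like the sketch below. It assumes the note name is the audio file's base name with an `.md` extension inside the configured output folder, which matches the examples in this guide but is not confirmed as the plugin's actual rule. Both function names are hypothetical.

```typescript
// Map an audio path to the note path it would produce (assumed naming rule), e.g.
// "Recordings/meeting-2025-01-15.m4a" -> "Transcriptions/meeting-2025-01-15.md".
function expectedNotePath(audioPath: string, outputFolder: string): string {
  const fileName = audioPath.split("/").pop() ?? audioPath;
  const baseName = fileName.replace(/\.[^.]+$/, "");      // drop the extension
  const folder = outputFolder.replace(/^\/+|\/+$/g, "");  // trim leading/trailing slashes
  return folder ? `${folder}/${baseName}.md` : `${baseName}.md`;
}

// True when a note for this audio file is already present in the given set of vault paths.
function alreadyTranscribed(
  audioPath: string,
  outputFolder: string,
  existingPaths: Set<string>
): boolean {
  return existingPaths.has(expectedNotePath(audioPath, outputFolder));
}
```

Inside an Obsidian plugin the set lookup would more likely be a call such as `app.vault.getAbstractFileByPath(path)`; the plain `Set` here just keeps the sketch self-contained.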
If something goes wrong during transcription:
======================================
Transcription Failed
======================================
⚠ The transcription process failed
Error: Could not process audio file
Possible causes:
- Audio file may be corrupted
- Unsupported audio codec
- Insufficient disk space
The plugin automatically retried
once but encountered the same error.
[View Detailed Logs] [Close]
======================================
<vault>/.obsidian/plugins/obsidian-transcription-plugin/
Required for every mode: after transcription the plugin runs an AI analysis pass, and setup validation currently requires an OpenRouter API key and analysis model for all processing modes (Local, OpenAI Whisper, and Groq alike). Configure it under Configuring Analysis (AI Insights) before your first transcription; otherwise transcription stops with an "OpenRouter API key not configured" error.
Advantages:
Setup Steps:
Note: First transcription may take a few minutes as the system initializes. Subsequent transcriptions will be faster.
Advantages:
Cost: $0.006 per minute ($0.72 for 2-hour recording)
Setup Steps:
Advantages:
whisper-large-v3-turbo)
Cost: from ~$0.04 per hour; free tier available
Setup Steps:
Note: Groq's free tier caps audio at ~25 MB; the paid tier allows up to ~100 MB.
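The pricing and size figures quoted for the cloud options can be turned into a quick estimate. The sketch below uses only the numbers stated above ($0.006 per minute for OpenAI Whisper, roughly $0.04 per hour for Groq, and the ~25 MB / ~100 MB Groq caps). Treat them as approximate, since providers change prices. The function names are hypothetical, and 1 MB is assumed to mean 1024 × 1024 bytes.

```typescript
// Figures taken from this guide; approximate and subject to change.
const OPENAI_USD_PER_MINUTE = 0.006;
const GROQ_USD_PER_HOUR = 0.04; // "from ~$0.04 per hour"
const GROQ_FREE_LIMIT_MB = 25;
const GROQ_PAID_LIMIT_MB = 100;

// Estimated OpenAI Whisper cost in USD for a recording of the given length.
function estimateOpenAiCost(minutes: number): number {
  return minutes * OPENAI_USD_PER_MINUTE;
}

// Estimated Groq cost in USD at the quoted starting rate.
function estimateGroqCost(minutes: number): number {
  return (minutes / 60) * GROQ_USD_PER_HOUR;
}

// Whether a file fits under the Groq upload cap for the chosen tier
// (assumes 1 MB = 1024 * 1024 bytes).
function fitsGroqLimit(fileSizeBytes: number, paidTier: boolean): boolean {
  const limitMb = paidTier ? GROQ_PAID_LIMIT_MB : GROQ_FREE_LIMIT_MB;
  return fileSizeBytes <= limitMb * 1024 * 1024;
}

console.log(estimateOpenAiCost(120).toFixed(2)); // cost of a 2-hour recording
```

A 30 MB file, for instance, would exceed the free-tier cap but fit within the paid one.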
OpenRouter does not transcribe audio — it powers the AI analysis step only. A "Cloud (OpenRouter)" entry currently appears in the processing-mode dropdown, but selecting it for transcription fails with "OpenRouter transcription not supported". For transcription, use Local, OpenAI Whisper, or Groq (Options 1–3 above). To set up your OpenRouter API key and analysis model, see Configuring Analysis (AI Insights) below.
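The two rules described here (OpenRouter cannot transcribe, and an OpenRouter key plus analysis model is required in every mode) can be sketched as a single validation step. The `Settings` shape, the function name, and the order of the checks are assumptions for illustration. The first two error strings are the ones quoted in this guide; the third is invented wording.

```typescript
type ProcessingMode = "local" | "openai" | "groq" | "openrouter";

// Hypothetical settings shape, for illustration only.
interface Settings {
  processingMode: ProcessingMode;
  openRouterApiKey: string;
  openRouterModel: string;
}

// Returns an error message, or null when the settings pass these checks.
function validateSetup(s: Settings): string | null {
  if (s.processingMode === "openrouter") {
    return "OpenRouter transcription not supported";
  }
  if (s.openRouterApiKey.trim() === "") {
    return "OpenRouter API key not configured";
  }
  if (s.openRouterModel.trim() === "") {
    return "OpenRouter analysis model not configured"; // assumed wording
  }
  return null;
}
```

Note that even `processingMode: "local"` fails this check when the key is empty, which mirrors the "required for every mode" behavior described earlier.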
The plugin can analyze your transcriptions to extract key information.
Option 1: Local Analysis with Ollama (Free)
ollama pull llama3.2:3b in your terminal
Option 2: Cloud Analysis with OpenRouter
meta-llama/llama-3.2-3b-instruct)
Want the AI to focus on specific things? Add custom instructions:
Examples:
For project meetings:
- Tag all participants with @ symbol
- Identify technical decisions and mark with [DECISION]
- Flag any mentioned deadlines with [DEADLINE]
- Highlight budget discussions
For lecture notes:
- Extract key concepts and definitions
- Create a glossary of technical terms
- Identify examples and case studies
- Note any assigned homework or readings
For interviews:
- Identify main themes discussed
- Extract interesting quotes verbatim
- Note emotional reactions or emphasis
- Highlight follow-up topics
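One plausible way for custom instructions like these to reach the model is to append them to a base system prompt in an OpenAI-style chat request, which is the request format OpenRouter's chat completions endpoint accepts. The sketch below only builds the request body. The base prompt text and the function name are assumptions, not the plugin's actual prompt.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Assumed base prompt; the plugin's real prompt is not shown in this guide.
const BASE_INSTRUCTIONS =
  "Summarize the transcript, then list key points, action items, and follow-up questions.";

// Build a chat request body, appending any custom instructions to the system prompt.
function buildAnalysisRequest(
  model: string,
  transcript: string,
  customInstructions: string
): ChatRequest {
  const extra = customInstructions.trim();
  const system = extra
    ? `${BASE_INSTRUCTIONS}\n\nAdditional instructions:\n${extra}`
    : BASE_INSTRUCTIONS;
  return {
    model,
    messages: [
      { role: "system", content: system },
      { role: "user", content: transcript },
    ],
  };
}
```

Keeping the custom text in the system message rather than mixing it into the transcript keeps user-supplied instructions clearly separated from the content being analyzed.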
Scenario: You recorded a 45-minute team standup meeting with 4 participants.
Steps:
team-standup-2025-01-15.m4a in your vault
team-standup-2025-01-15.md file
Result: You get a complete transcription with:
Scenario: 1-hour client discussion with sensitive information. Privacy is critical.
Steps:
client-call-acme-corp.m4a
Result: Complete transcription that never left your computer, with client commitments clearly marked.
Scenario: 90-minute university lecture in Greek
Steps:
Result: Full Greek transcription with technical terms identified and defined.
Scenario: Meeting where participants switch between English and Greek
Steps:
Result: Accurate transcription with both languages correctly identified and transcribed.
Local Processing:
Your Audio File → Your Computer → Whisper Model → Transcription
↓
Ollama (Local) → Analysis
↓
Your Vault (.md file)
Nothing leaves your computer. Complete privacy.
Cloud Processing:
Your Audio File → OpenAI/Groq API → Transcription
↓
OpenRouter API → Analysis
↓
Your Vault (.md file)
Audio and transcript sent to external servers. Review your API provider's privacy policy.
This plugin is desktop-only (marked as isDesktopOnly in its manifest) and Obsidian's review surfaces a few capabilities it uses. Here is exactly what each one is for and how it is scoped, so you can make an informed decision:
Filesystem (fs)
In every mode the plugin reads the audio file you select from disk (audio frequently lives outside the vault) so it can transcribe or upload it. Local processing additionally:
whisper.cpp binary into the plugin's own folder (<vault>/.obsidian/plugins/<plugin>/models and /bin).
Generated transcripts are always written into your vault via the Obsidian vault API, never through raw filesystem calls.
Child processes (child_process)
The plugin runs external binaries, but no command is ever passed through a shell. Each is invoked with an argument array (execFile/spawn), and the audio path is always a separate argument, never interpolated into a command string:
- ffprobe: a one-shot duration probe run before transcription in every mode (local and cloud), used only to show a time estimate.
- ffmpeg: local mode only, converts the audio to 16 kHz mono WAV.
- whisper.exe: local mode only, performs the transcription.
Clipboard
Write-only, and only from the "Copy URL / Copy path / Copy filename" buttons in the manual-download-instructions dialog, which appears if an automatic model download fails so you can fetch the file by hand. The plugin never reads your clipboard.
Even in a Cloud mode the plugin still reads the audio file you select (filesystem) and runs a single ffprobe duration probe (child_process, argv, no shell). It does not download models, run ffmpeg/whisper.exe, or write anything outside your vault unless you use Local mode or the manual-download fallback.
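The argument-array pattern described in this section can be sketched as follows. The ffprobe and ffmpeg flags are standard options for those tools, but whether the plugin uses exactly these flags is an assumption, and all three function names are hypothetical. The important property is that the audio path is always its own array element and never part of a command string.

```typescript
import { execFile } from "child_process";

// Argument array for a duration probe (assumed flags; standard ffprobe options).
function ffprobeDurationArgs(audioPath: string): string[] {
  return [
    "-v", "error",
    "-show_entries", "format=duration",
    "-of", "default=noprint_wrappers=1:nokey=1",
    audioPath, // passed as a single argument, never concatenated into a command
  ];
}

// Argument array for converting to 16 kHz mono 16-bit PCM WAV (local mode).
function ffmpegWavArgs(audioPath: string, wavPath: string): string[] {
  return ["-i", audioPath, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wavPath];
}

// Run ffprobe without a shell and resolve with the duration in seconds.
function probeDurationSeconds(audioPath: string): Promise<number> {
  return new Promise((resolve, reject) => {
    execFile("ffprobe", ffprobeDurationArgs(audioPath), (error, stdout) => {
      if (error) {
        reject(error);
        return;
      }
      const seconds = parseFloat(stdout.trim());
      if (Number.isNaN(seconds)) {
        reject(new Error("ffprobe returned no duration"));
      } else {
        resolve(seconds);
      }
    });
  });
}
```

Because `execFile` does not spawn a shell by default, a file name containing quotes, semicolons, or `$(...)` reaches ffprobe as literal text rather than being interpreted.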
Q: How accurate is the transcription?
A: Using the medium or large model, transcription accuracy is typically 90-95% for clear audio in English or Greek. Accuracy depends on:
Q: Can it handle multiple speakers?
A: Yes! When speaker diarization is enabled, the plugin identifies different speakers and labels them as "Speaker 1", "Speaker 2", etc. It cannot currently identify speakers by name automatically.
Q: What audio formats are supported?
A: Currently .m4a and .mp3 files. Support for .wav, .ogg, and .flac may be added in future versions.
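A format check matching this answer could be as small as the sketch below. The function name is hypothetical, and treating the extension case-insensitively is an assumption rather than documented plugin behavior.

```typescript
// Extensions listed as supported in this guide.
const SUPPORTED_EXTENSIONS = ["m4a", "mp3"];

// True when the path ends in a supported audio extension (case-insensitive by assumption).
function isSupportedAudio(path: string): boolean {
  const dot = path.lastIndexOf(".");
  if (dot < 0) return false;
  const extension = path.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(extension) !== -1;
}
```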
Q: How long does transcription take?
A: Processing time varies:
Q: Will it work offline?
A: Yes, if you use local processing. Once models are downloaded, you can transcribe without internet.
Q: Where are models stored?
A: Models are stored in <vault>/.obsidian/plugins/obsidian-transcription-plugin/models/
Q: Can I use my own Whisper model?
A: Currently, the plugin uses official Whisper.cpp models from HuggingFace. Custom model support may be added later.
Q: What if I don't have Ollama installed?
A: Use cloud analysis via OpenRouter. Setup validation currently requires an OpenRouter API key and analysis model in every processing mode, so configure those before your first transcription.
Q: How much disk space do I need?
A: Model sizes:
Plus temporary space for audio processing (usually 2-3x the audio file size).
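As a rough worked example of this answer, the sketch below adds a model's size to the worst-case temporary space (3x the audio file). It uses only the sizes shown on the welcome screen (Small 466 MB, Medium 1.5 GB, Large 2.9 GB); tiny and base are left out because their sizes are not stated here. The function name is hypothetical, and 1 GB is assumed to be 1024 MB.

```typescript
type ModelName = "small" | "medium" | "large";

// Model sizes in MB, from the welcome screen (assumes 1 GB = 1024 MB).
const MODEL_SIZE_MB: Record<ModelName, number> = {
  small: 466,
  medium: 1.5 * 1024,
  large: 2.9 * 1024,
};

// Model size plus temporary working space (default: worst case of 3x the audio size).
function estimateDiskSpaceMb(model: ModelName, audioMb: number, tempFactor = 3): number {
  return MODEL_SIZE_MB[model] + audioMb * tempFactor;
}
```

With the medium model and a 50 MB recording, the estimate at the lower 2x factor is 1536 + 100 = 1636 MB.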
Q: Does it work on mobile (iOS/Android)?
A: Not yet. Currently Windows desktop only. Mobile support may come in future updates.
Q: Transcription failed with "model not found" error
A: Go to Settings → Audio Transcription → Model Management and download your selected model.
Q: The transcription is very inaccurate
A: Try these solutions:
Q: Plugin says "Analysis Already Exists" but I don't see a file
A: The markdown file might be in your configured output folder. Check Settings → Audio Transcription → Output Settings to see the folder path.
Q: Processing is very slow
A:
Q: Speaker diarization isn't working
A: Speaker diarization requires cloud processing with AssemblyAI (coming in Phase 3) or a local pyannote installation (advanced). Functionality is currently limited.
Q: Can I edit the transcription after it's created?
A: Absolutely! It's a markdown file in your vault. Edit it like any other note.
Q: Can I re-transcribe if I'm not happy with the results?
A: Yes. Delete the generated markdown file first, then transcribe again. The plugin skips files that already have analysis.
Q: Can I transcribe video files?
A: Not directly. Extract the audio first using a tool like VLC or FFmpeg, then transcribe the audio file.
Q: How do I share transcriptions with others?
A: They're standard markdown files. Export to PDF, copy the text, or share the .md file directly.
Q: Is my audio data private?
A: With local processing: Yes, completely private. Audio never leaves your device. With cloud processing: audio is sent to the transcription provider (OpenAI or Groq) and the transcript is sent to OpenRouter for analysis. Check their privacy policies.
Q: How much do cloud APIs cost?
A: Approximate costs:
Q: Do I need a paid Obsidian account?
A: No. This plugin works with free Obsidian.
This is an open-source project! Contributions welcome:
MIT License - Free to use, modify, and distribute.
Built with:
Special Thanks:
ffprobe duration probe via execFile (argv, no shell) - prevents command injection through crafted audio file names
Made with ♥ for the Obsidian community
Transform your audio into knowledge. Start transcribing today!