Audio Transcription

tzamtzis · 253 downloads

Transcribe audio files (m4a, mp3) and extract actionable insights using AI. Supports all languages with local or cloud processing.

Transform your audio recordings into structured, actionable notes automatically. This plugin transcribes audio files (m4a, mp3) and extracts key points, action items, and follow-ups using AI - all within your Obsidian vault.

Features

Automatic Transcription: Convert meeting recordings, lectures, and interviews into text
Multilingual Support: Transcribe in any language Whisper supports, with automatic language detection
AI-Powered Analysis: Extract summaries, key points, action items, and follow-up questions
Local or Cloud Processing: Choose between privacy-focused local processing or faster cloud APIs
Speaker Identification: Distinguish between different speakers in conversations
Long Audio Support: Handle recordings longer than two hours
Seamless Integration: Results saved as markdown files in your vault
Customizable: Add your own analysis instructions for personalized results

What It Looks Like: End-User Journey

Step 1: Installing the Plugin

After installing from Obsidian Community Plugins:

Open Obsidian Settings (gear icon)
Navigate to "Community Plugins"
Search for "Audio Transcription"
Click "Install" then "Enable"

You'll see a new microphone icon in your left ribbon bar.

Step 2: First-Time Setup

When you first use the plugin, you'll see a welcome screen:

============================================
  Welcome to Audio Transcription Plugin!
============================================

  Before you can transcribe audio files,
  you need to download a transcription
  model.

  Recommended: Medium Model (1.5 GB)
  - Best balance of speed and accuracy
  - Good Greek language support
  - Can process 1-hour audio in ~20 mins

  Other options:
  - Small (466 MB) - Faster, less accurate
  - Large (2.9 GB) - Best quality, slower

  [Download Medium Model]  [Choose Other]

  Or use cloud processing (no download):
  [Configure Cloud API]
============================================

If you click "Download Medium Model", you'll see:

============================================
  Downloading Whisper Medium Model...
============================================

  Progress: [##########....] 742 MB / 1.5 GB

  Estimated time remaining: 3 minutes

  This is a one-time download. The model
  will be saved to your plugin folder.

            [Cancel Download]
============================================

Step 3: Configuring Settings (Optional)

Open Settings > Audio Transcription to see:

============================================
        TRANSCRIPTION SETTINGS
============================================

Processing Mode
(*) Local (Whisper.cpp) - Private, no internet needed
( ) Cloud (OpenAI Whisper) - Faster, requires API key
( ) Cloud (Groq) - Fastest, very low cost, requires API key
( ) Cloud (OpenRouter) - Use custom models

Model Size (for local processing)
[  tiny  |  base  | small | *medium | large  ]

Default Language
(*) Auto-detect - Let the AI figure it out
( ) English only
( ) Greek only
( ) Multilingual (both)

[X] Enable Speaker Diarization (identify speakers)

--------------------------------------------
           ANALYSIS SETTINGS
--------------------------------------------

Analysis Provider
(*) Local (Ollama) - Requires Ollama installed
( ) Cloud (OpenRouter) - Requires API key

Custom Analysis Instructions (optional)
+---------------------------------------+
| Focus on technical decisions and      |
| deadlines. Tag people using @name     |
| format. Identify risks and blockers.  |
+---------------------------------------+

--------------------------------------------
      API KEYS (for cloud processing)
--------------------------------------------

OpenAI API Key (for Whisper API)
[sk-************************************************]

OpenRouter API Key (for analysis)
[sk-or-********************************************]

OpenRouter Model Name
[meta-llama/llama-3.2-3b-instruct              ]

--------------------------------------------
           MODEL MANAGEMENT
--------------------------------------------

Local Models Path: ./models/

Installed Models:
( ) tiny.bin    - Not downloaded
( ) base.bin    - Not downloaded
( ) small.bin   - Not downloaded
(*) medium.bin  - ✓ Installed (1.5 GB)
( ) large.bin   - Not downloaded

[Download Selected Model]  [Delete Model]

--------------------------------------------
           OUTPUT SETTINGS
--------------------------------------------

Output Folder
[/Transcriptions                           ] [Browse]

[X] Include timestamps in transcription
[X] Auto-create tags from analysis
[X] Skip files that are already analyzed

            [Save Settings]
============================================
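
For readers who think in code, the settings screen above maps naturally onto a typed settings object. The sketch below is one possible shape, assuming nothing about the plugin's source: every type and field name (`TranscriptionSettings`, `skipIfAnalyzed`, `loadSettings`, and so on) is hypothetical and chosen only to mirror the labels on the screen.

```typescript
// Hypothetical settings shape mirroring the settings screen above.
// All names are illustrative assumptions, not the plugin's actual API.
type ProcessingMode = "local" | "openai" | "groq";
type ModelSize = "tiny" | "base" | "small" | "medium" | "large";
type AnalysisProvider = "ollama" | "openrouter";

interface TranscriptionSettings {
  processingMode: ProcessingMode;
  modelSize: ModelSize; // only relevant for local processing
  language: string; // "auto" or a language code such as "en" or "el"
  enableDiarization: boolean;
  analysisProvider: AnalysisProvider;
  customInstructions: string;
  openRouterModel: string;
  outputFolder: string;
  includeTimestamps: boolean;
  autoCreateTags: boolean;
  skipIfAnalyzed: boolean;
}

// Defaults matching the selections shown in the mock-up.
const DEFAULT_SETTINGS: TranscriptionSettings = {
  processingMode: "local",
  modelSize: "medium",
  language: "auto",
  enableDiarization: true,
  analysisProvider: "ollama",
  customInstructions: "",
  openRouterModel: "meta-llama/llama-3.2-3b-instruct",
  outputFolder: "/Transcriptions",
  includeTimestamps: true,
  autoCreateTags: true,
  skipIfAnalyzed: true,
};

// Merge saved (possibly partial) settings over the defaults.
function loadSettings(saved: Partial<TranscriptionSettings>): TranscriptionSettings {
  return { ...DEFAULT_SETTINGS, ...saved };
}
```

Merging saved values over defaults means a settings file written by an older version still loads when new fields are added later.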

Step 4: Transcribing Your First Audio File

There are two ways to start transcription:

Method 1: Right-click on an audio file

In your file explorer, find an audio file (meeting-2025-01-15.m4a)
Right-click on it
Select "Transcribe audio file" from the context menu

File: meeting-2025-01-15.m4a
-----------------------------
Rename
Delete
Copy path
Move file
> Transcribe audio file  <-- Click here!
Properties

Method 2: Use the ribbon icon

Click the microphone icon in the left sidebar
A file picker appears
Select your audio file

Step 5: Watching the Progress

You'll see a notification in the bottom-right corner:

======================================
  Transcribing: meeting-2025-01-15
======================================

  Step 1/3: Transcribing audio...
  Progress: [########..] 73%

  Estimated time: 4 minutes remaining

       [Cancel Transcription]
======================================

After transcription completes:

======================================
  Transcribing: meeting-2025-01-15
======================================

  Step 2/3: Analyzing content...
  Extracting key points and actions

  [#########.] 85%
======================================

Then:

======================================
  Transcribing: meeting-2025-01-15
======================================

  Step 3/3: Creating markdown file...

  [###########] 100%
======================================

Finally:

======================================
      ✓ Transcription Complete
======================================

  File created: meeting-2025-01-15.md

  Duration: 1:32:45
  Processing time: 18 minutes

       [Open File]  [Dismiss]
======================================

Step 6: Viewing the Results

Clicking "Open File" opens your new markdown note:

---
audio_file: "meeting-2025-01-15.m4a"
duration: "1:32:45"
transcribed_date: 2025-01-15T14:32:00
language: "en"
speakers: 3
tags: [meeting, transcription, project-alpha, budget-review]
---

# Meeting Transcription: Q1 Budget Review

**Audio File:** meeting-2025-01-15.m4a
**Date:** January 15, 2025
**Duration:** 1 hour 32 minutes
**Participants:** 3 speakers identified

---

## Summary

This meeting covered the Q1 budget review for Project Alpha. The team discussed resource allocation, timeline adjustments due to staffing changes, and identified three critical blockers that need immediate attention. A follow-up meeting was scheduled for next week to finalize the revised timeline.

---

## Key Points

- **Budget approved** for additional contractor support ($45K)
- **Timeline extended** by 2 weeks due to Sarah's onboarding delay
- **Marketing campaign** launch postponed to March 1st
- **New feature request** from client - needs feasibility assessment
- **Risk identified**: Current API rate limits may impact performance testing
- **Decision made**: Switch to microservices architecture for Phase 2

---

## Action Items

- [ ] @john Review and approve contractor agreements by Friday (Jan 19)
- [ ] @sarah Set up development environment and complete onboarding checklist
- [ ] @mike Research API rate limit solutions and present options (due: Jan 22)
- [ ] @team Update project timeline in Jira with new milestones
- [ ] @john Schedule client call to discuss new feature request
- [ ] @sarah Create technical specification for microservices migration

---

## Follow-up Questions

- What is the exact scope of the new client feature request?
- Do we have budget flexibility if API solution requires paid tier upgrade?
- Has legal reviewed the contractor agreements?
- When will the new designer start?

---

## Full Transcription

**Speaker 1 (John)** [00:00:15]
Good morning everyone. Thanks for joining today's Q1 budget review. I know we're all busy, so let's try to keep this focused. Sarah, welcome to the team - this is your first planning meeting with us.

**Speaker 2 (Sarah)** [00:00:28]
Thanks John! Happy to be here. Looking forward to diving in.

**Speaker 1 (John)** [00:00:32]
Great. So let me start with the budget overview. We've been tracking expenses closely, and I'm happy to report we're actually 8% under budget for Q4, which gives us some flexibility going forward.

**Speaker 3 (Mike)** [00:00:48]
That's great news. Does that mean we can move forward with the contractor support we discussed?

**Speaker 1 (John)** [00:00:53]
Yes, exactly. I'm proposing we allocate $45,000 for two contractors to help with the frontend work. This should accelerate our timeline significantly.

**Speaker 2 (Sarah)** [00:01:08]
Just to clarify - would these contractors be working on the React components or the new design system?

**Speaker 1 (John)** [00:01:15]
Both, actually. We need someone who can implement the designs and also help establish the component library patterns.

[Transcription continues for full 1:32:45...]

---

**Speaker 3 (Mike)** [01:31:52]
Alright, I think we've covered everything. I'll send out the meeting notes later today.

**Speaker 1 (John)** [01:32:02]
Perfect. Thanks everyone. Let's sync up again next Tuesday.

**Speaker 2 (Sarah)** [01:32:08]
Sounds good. Thanks all!

[End of transcription]
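
The speaker lines in the sample note follow a fixed pattern: a bold speaker label, a `[HH:MM:SS]` timestamp, then the spoken text. A minimal sketch of that formatting is shown below; the `Segment` shape and both function names are assumptions made for illustration, not the plugin's actual data model.

```typescript
// Hypothetical segment shape; the plugin's real data model may differ.
interface Segment {
  speaker: number;
  name?: string; // optional display name, e.g. added by hand after transcription
  startSeconds: number;
  text: string;
}

// Left-pad a non-negative integer to two digits.
function pad2(value: number): string {
  return (value < 10 ? "0" : "") + String(value);
}

// Format a position in seconds as HH:MM:SS.
function formatTimestamp(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  return `${pad2(hours)}:${pad2(minutes)}:${pad2(seconds)}`;
}

// Render one segment in the style of the sample note above.
function renderSegment(segment: Segment): string {
  const label = segment.name
    ? `Speaker ${segment.speaker} (${segment.name})`
    : `Speaker ${segment.speaker}`;
  return `**${label}** [${formatTimestamp(segment.startSeconds)}]\n${segment.text}`;
}
```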

Step 7: What Happens If You Try Again?

If you right-click on the same audio file and select "Transcribe audio file" again:

======================================
     Analysis Already Exists
======================================

  This audio file has already been
  transcribed and analyzed.

  File: meeting-2025-01-15.md
  Created: 2025-01-15 at 14:32

  [Open Existing File]  [OK]
======================================

This prevents accidental duplicate processing and wasted time.
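
One plausible way to implement this check is to derive the expected note path from the audio file name and look for it among the notes already in the vault. The sketch below assumes the note keeps the audio file's base name with an `.md` extension inside the configured output folder, as in the example above; the function names and the naming rule are assumptions, not the plugin's confirmed logic.

```typescript
// Derive the expected note path: same base name, ".md" extension,
// inside the configured output folder. The naming rule is an assumption.
function notePathFor(audioPath: string, outputFolder: string): string {
  const fileName = audioPath.split("/").pop() || audioPath;
  const dot = fileName.lastIndexOf(".");
  const baseName = dot > 0 ? fileName.slice(0, dot) : fileName;
  const folder = outputFolder.replace(/^\/+|\/+$/g, ""); // trim leading/trailing slashes
  return folder ? `${folder}/${baseName}.md` : `${baseName}.md`;
}

// True when a note for this audio file is already among the known vault paths.
function alreadyTranscribed(
  audioPath: string,
  outputFolder: string,
  existingNotePaths: string[]
): boolean {
  return existingNotePaths.indexOf(notePathFor(audioPath, outputFolder)) !== -1;
}
```

A path-based check like this is cheap, but it treats a renamed or moved note as missing, which is consistent with the advice elsewhere in this document to delete the note before re-transcribing.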

Step 8: Error Handling Example

If something goes wrong during transcription:

======================================
       Transcription Failed
======================================

  ⚠ The transcription process failed

  Error: Could not process audio file

  Possible causes:
  - Audio file may be corrupted
  - Unsupported audio codec
  - Insufficient disk space

  The plugin automatically retried
  once but encountered the same error.

  [View Detailed Logs]  [Close]
======================================

Installation

Requirements

Obsidian v1.6.6 or higher
For Local Processing:
- 4GB+ RAM (8GB recommended for large models)
- 2-3GB free disk space for models
- Windows 10/11 (64-bit)
For Cloud Processing:
- Internet connection
- API key from OpenAI or Groq for transcription, plus an OpenRouter API key for analysis

Install from Community Plugins (Recommended)

Open Obsidian Settings
Go to "Community Plugins" and turn off Restricted Mode (called Safe Mode in older Obsidian versions)
Click "Browse" to open the community plugins browser
Search for "Audio Transcription"
Click "Install"
Once installed, enable the plugin
Follow the first-time setup wizard to download models

Manual Installation (Advanced)

Download the latest release from GitHub
Extract the files to <vault>/.obsidian/plugins/obsidian-transcription-plugin/
Reload Obsidian
Enable the plugin in Settings > Community Plugins

Setup Guide

Required for every mode: after transcription the plugin runs an AI analysis pass, and setup validation currently requires an OpenRouter API key and analysis model for all processing modes — Local, OpenAI Whisper, and Groq alike. Configure it under Configuring Analysis (AI Insights) before your first transcription, otherwise it stops with an "OpenRouter API key not configured" error.

Option 1: Local Processing (Recommended for Privacy)

Advantages:

Complete privacy - audio never leaves your device
No ongoing costs
Works offline
Full control over processing

Setup Steps:

Open plugin settings
Select "Local (Whisper.cpp)" as processing mode
Choose your model size:
- Small (466 MB): Fast, good for English-only, basic quality
- Medium (1.5 GB): Recommended - balanced speed/quality, good Greek support
- Large (2.9 GB): Best quality, excellent multilingual, slower
Click "Download Selected Model"
Wait for download to complete (one-time only)

Note: First transcription may take a few minutes as the system initializes. Subsequent transcriptions will be faster.

Option 2: Cloud Processing with OpenAI Whisper

Advantages:

Faster processing
No large downloads required
Works on any device
Excellent accuracy

Cost: ~$0.006 per minute (~$0.72 for a 2-hour recording)

Setup Steps:

Get an OpenAI API key from https://platform.openai.com/api-keys
Open plugin settings
Select "Cloud (OpenAI Whisper)" as processing mode
Paste your API key in the "OpenAI API Key" field
Also configure OpenRouter for analysis (required) — see Configuring Analysis (AI Insights)
Save settings

Option 3: Cloud Processing with Groq

Advantages:

Fastest cloud transcription (whisper-large-v3-turbo)
Very low cost (from ~$0.04 per hour of audio)
Generous free tier (no credit card required)

Cost: from ~$0.04 per hour; free tier available

Setup Steps:

Get a free Groq API key from https://console.groq.com/keys
Open plugin settings
Select "Cloud (Groq)" as processing mode
Paste your API key in the "Groq API key" field
Choose a Groq model (turbo = fastest/cheapest, large-v3 = most accurate)
Also configure OpenRouter for analysis (required) — see Configuring Analysis (AI Insights)
Save settings

Note: Groq's free tier caps audio at ~25 MB; the paid tier allows up to ~100 MB.
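
Because of these caps, it can be worth checking the file size before uploading. The sketch below is a minimal pre-flight check using the approximate limits quoted in the note above; the constant and function names are hypothetical, the megabyte is taken as 1024 × 1024 bytes as an assumption, and Groq's current limits should be confirmed before relying on these numbers.

```typescript
const MB = 1024 * 1024; // assumption: binary megabytes

// Approximate caps from the note above; verify against Groq's current limits.
const GROQ_LIMIT_BYTES = { free: 25 * MB, paid: 100 * MB };

// True when the audio file is small enough to upload on the given tier.
function fitsGroqLimit(fileSizeBytes: number, tier: "free" | "paid"): boolean {
  return fileSizeBytes <= GROQ_LIMIT_BYTES[tier];
}
```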

OpenRouter is for analysis, not transcription

OpenRouter does not transcribe audio — it powers the AI analysis step only. A "Cloud (OpenRouter)" entry currently appears in the processing-mode dropdown, but selecting it for transcription fails with "OpenRouter transcription not supported". For transcription, use Local, OpenAI Whisper, or Groq (Options 1–3 above). To set up your OpenRouter API key and analysis model, see Configuring Analysis (AI Insights) below.

Configuring Analysis (AI Insights)

The plugin can analyze your transcriptions to extract key information.

Option 1: Local Analysis with Ollama (Free)

Install Ollama from https://ollama.ai
Run ollama pull llama3.2:3b in your terminal
In plugin settings, select "Local (Ollama)" as analysis provider
The plugin will automatically connect to Ollama

Option 2: Cloud Analysis with OpenRouter

Get an OpenRouter API key from your OpenRouter account
In plugin settings, select "Cloud (OpenRouter)" as analysis provider
Paste your API key
Choose a model (recommended: meta-llama/llama-3.2-3b-instruct)

Adding Custom Analysis Instructions

Want the AI to focus on specific things? Add custom instructions:

Examples:

For project meetings:

- Tag all participants with @ symbol
- Identify technical decisions and mark with [DECISION]
- Flag any mentioned deadlines with [DEADLINE]
- Highlight budget discussions

For lecture notes:

- Extract key concepts and definitions
- Create a glossary of technical terms
- Identify examples and case studies
- Note any assigned homework or readings

For interviews:

- Identify main themes discussed
- Extract interesting quotes verbatim
- Note emotional reactions or emphasis
- Highlight follow-up topics
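
To illustrate how custom instructions like these might reach the model, the sketch below builds an OpenAI-style chat request body of the kind OpenRouter's chat completions endpoint accepts. The base prompt wording, the type names, and `buildAnalysisRequest` are all assumptions for illustration; the plugin's real prompt is not documented here.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Illustrative base prompt; the plugin's actual wording is unknown.
const BASE_PROMPT =
  "Summarize the transcript and list key points, action items, and follow-up questions.";

// Combine the base prompt with optional user-defined instructions.
function buildAnalysisRequest(
  model: string,
  transcript: string,
  customInstructions: string
): ChatRequest {
  const extra = customInstructions.trim();
  const system = extra
    ? `${BASE_PROMPT}\n\nAdditional instructions:\n${extra}`
    : BASE_PROMPT;
  return {
    model,
    messages: [
      { role: "system", content: system },
      { role: "user", content: transcript },
    ],
  };
}
```

Keeping the custom instructions in the system message and the transcript in the user message keeps the two clearly separated, so a long transcript cannot be mistaken for instructions.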

Usage Examples

Example 1: Team Meeting Notes

Scenario: You recorded a 45-minute team standup meeting with 4 participants.

Steps:

Save recording as team-standup-2025-01-15.m4a in your vault
Right-click → "Transcribe audio file"
Wait ~14-23 minutes (medium model, local processing, at the ~0.3-0.5x realtime rate given in the FAQ)
Open the generated team-standup-2025-01-15.md file

Result: You get a complete transcription with:

Each person's comments identified
Action items automatically extracted as checkboxes
Key decisions highlighted
Tagged with relevant project names

Example 2: Client Call (Confidential)

Scenario: 1-hour client discussion with sensitive information. Privacy is critical.

Steps:

Ensure you're using local processing (no cloud APIs)
Record and save as client-call-acme-corp.m4a
Add custom instruction: "Identify all commitments made to the client"
Transcribe

Result: Complete transcription that never left your computer, with client commitments clearly marked.

Example 3: Greek Language Lecture

Scenario: 90-minute university lecture in Greek

Steps:

Use large model for best Greek support
Set language to "Greek only" or "Auto-detect"
Add custom instruction: "Extract key concepts and create a glossary of technical terms"
Transcribe

Result: Full Greek transcription with technical terms identified and defined.

Example 4: Bilingual Meeting (English + Greek)

Scenario: Meeting where participants switch between English and Greek

Steps:

Use medium or large model
Set language to "Multilingual (both)" or "Auto-detect"
Transcribe

Result: Accurate transcription with both languages correctly identified and transcribed.

How It Works (Behind the Scenes)

The Transcription Process

Pre-check: Plugin checks if this audio file was already transcribed
Model Check: Verifies the selected model is downloaded
Audio Loading: Reads the audio file from your vault
Chunking: For long files, splits audio into manageable segments (30-min chunks)
Transcription: Processes each chunk with Whisper
Speaker Diarization: If enabled, identifies different speakers
Language Detection: Automatically detects language(s) in the audio
Assembly: Combines all chunks into complete transcript
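
The chunking step can be pictured as a simple plan of start and end offsets. The sketch below assumes fixed 30-minute chunks with no overlap, which is the simplest reading of the description above; the plugin may split differently (for example at silences), and the names `Chunk` and `planChunks` are hypothetical.

```typescript
interface Chunk {
  index: number;
  startSeconds: number;
  endSeconds: number;
}

// Split a recording into consecutive fixed-size chunks (default 30 minutes).
function planChunks(durationSeconds: number, chunkSeconds: number = 30 * 60): Chunk[] {
  if (chunkSeconds <= 0) {
    throw new Error("chunkSeconds must be positive");
  }
  const chunks: Chunk[] = [];
  let index = 0;
  for (let start = 0; start < durationSeconds; start += chunkSeconds) {
    chunks.push({
      index,
      startSeconds: start,
      endSeconds: Math.min(start + chunkSeconds, durationSeconds),
    });
    index++;
  }
  return chunks;
}
```

For the 1:32:45 recording used earlier (5,565 seconds), this yields three full 30-minute chunks and a final chunk of 2 minutes 45 seconds.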

The Analysis Process

Transcript Review: AI reads the complete transcription
Context Understanding: Identifies the type of content (meeting, lecture, etc.)
Custom Instructions: Applies any user-defined analysis rules
Extraction: Pulls out:
- Summary (2-3 sentences)
- Key points (bullet list)
- Action items (as checkboxes with assignees)
- Follow-up questions
- Relevant tags
Formatting: Creates structured markdown output
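
As a rough picture of the final formatting step, the sketch below turns a structured analysis result into the markdown sections seen in the sample note. The `AnalysisResult` shape and `renderAnalysis` are assumptions for illustration; the plugin's internal types are not documented here.

```typescript
interface ActionItem {
  assignee: string;
  task: string;
}

// Hypothetical result shape covering the extracted items listed above.
interface AnalysisResult {
  summary: string;
  keyPoints: string[];
  actionItems: ActionItem[];
  followUpQuestions: string[];
  tags: string[]; // would go into the note's frontmatter; not rendered here
}

// Render the analysis as markdown sections with checkbox action items.
function renderAnalysis(result: AnalysisResult): string {
  const lines: string[] = ["## Summary", "", result.summary, ""];
  lines.push("## Key Points", "");
  result.keyPoints.forEach((point) => {
    lines.push(`- ${point}`);
  });
  lines.push("", "## Action Items", "");
  result.actionItems.forEach((item) => {
    lines.push(`- [ ] @${item.assignee} ${item.task}`);
  });
  lines.push("", "## Follow-up Questions", "");
  result.followUpQuestions.forEach((question) => {
    lines.push(`- ${question}`);
  });
  return lines.join("\n");
}
```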

Privacy & Data Flow

Local Processing:

Your Audio File → Your Computer → Whisper Model → Transcription
                                        ↓
                               Ollama (Local) → Analysis
                                        ↓
                            Your Vault (.md file)

Nothing leaves your computer. Complete privacy.

Cloud Processing:

Your Audio File → OpenAI or Groq API → Transcription
                                            ↓
                        OpenRouter API → Analysis
                                            ↓
                            Your Vault (.md file)

Audio and transcript sent to external servers. Review your API provider's privacy policy.

Permissions & capabilities

This plugin is desktop-only (marked as isDesktopOnly in its manifest) and Obsidian's review surfaces a few capabilities it uses. Here is exactly what each one is for and how it is scoped, so you can make an informed decision:

Filesystem access (Node `fs`)

In every mode the plugin reads the audio file you select from disk (audio frequently lives outside the vault) so it can transcribe or upload it. Local processing additionally:

Downloads the Whisper model files and the whisper.cpp binary into the plugin's own folder (<vault>/.obsidian/plugins/<plugin>/models and /bin).
Writes a temporary 16 kHz WAV next to the source audio for conversion, then deletes it.

Generated transcripts are always written into your vault via the Obsidian vault API, never through raw filesystem calls.

Shell execution (Node `child_process`)

The plugin runs external binaries, but no command is ever passed through a shell — each is invoked with an argument array (execFile/spawn), and the audio path is always a separate argument, never interpolated into a command string:

ffprobe — a one-shot duration probe run before transcription in every mode (local and cloud), used only to show a time estimate.
ffmpeg — local mode only, converts the audio to 16 kHz mono WAV.
the downloaded whisper.exe — local mode only, performs the transcription.
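
To make the argument-array point concrete, the sketch below builds argument lists for the two FFmpeg tools. The flags are standard `ffprobe` and `ffmpeg` options, but they are an assumption about what the plugin passes, and the function names are hypothetical. The key property is that the audio path is always its own array element.

```typescript
// Arguments for a one-shot duration probe with ffprobe.
// The flags are standard ffprobe options; the plugin's exact flags may differ.
function ffprobeDurationArgs(inputPath: string): string[] {
  return [
    "-v", "error",
    "-show_entries", "format=duration",
    "-of", "default=noprint_wrappers=1:nokey=1",
    inputPath, // a separate argument, never concatenated into a command string
  ];
}

// Arguments for converting any input to a 16 kHz mono WAV file with ffmpeg.
function ffmpegConvertArgs(inputPath: string, outputPath: string): string[] {
  return ["-y", "-i", inputPath, "-ar", "16000", "-ac", "1", outputPath];
}
```

An array like this would be handed to Node's `execFile`, for example `execFile("ffprobe", ffprobeDurationArgs(path), callback)`. Since no shell parses the arguments, a file named `a"; rm -rf ~; ".m4a` reaches ffprobe as one literal file name instead of being executed.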

Clipboard access

Write-only, and only from the "Copy URL / Copy path / Copy filename" buttons in the manual-download-instructions dialog, which appears if an automatic model download fails so you can fetch the file by hand. The plugin never reads your clipboard.

Even in a Cloud mode the plugin still reads the audio file you select (filesystem) and runs a single ffprobe duration probe (child_process, argv — no shell). It does not download models, run ffmpeg/whisper.exe, or write anything outside your vault unless you use Local mode or the manual-download fallback.

Frequently Asked Questions (FAQ)

General Questions

Q: How accurate is the transcription?

A: Using the medium or large model, transcription accuracy is typically 90-95% for clear audio in English or Greek. Accuracy depends on:

Audio quality (clear recordings work best)
Background noise (quiet environments ideal)
Speaker clarity (distinct voices help)
Language complexity (technical jargon may need review)

Q: Can it handle multiple speakers?

A: Yes! When speaker diarization is enabled, the plugin identifies different speakers and labels them as "Speaker 1", "Speaker 2", etc. It cannot currently identify speakers by name automatically.

Q: What audio formats are supported?

A: Currently .m4a and .mp3 files. Support for .wav, .ogg, and .flac may be added in future versions.

Q: How long does transcription take?

A: Processing time varies:

Small model: ~0.1-0.2x realtime (10-min audio = 1-2 min processing)
Medium model: ~0.3-0.5x realtime (1-hour audio = 18-30 min processing)
Large model: ~0.5-1x realtime (1-hour audio = 30-60 min processing)
Cloud APIs: Much faster, usually 0.05-0.1x realtime
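
The realtime factors above translate directly into a time estimate: multiply the audio length by the lower and upper factor for the chosen model. The sketch below does exactly that for the three local models listed; the names are hypothetical, and real timings depend heavily on your hardware.

```typescript
type LocalModel = "small" | "medium" | "large";

// Lower and upper realtime factors from the list above.
const REALTIME_FACTOR: Record<LocalModel, [number, number]> = {
  small: [0.1, 0.2],
  medium: [0.3, 0.5],
  large: [0.5, 1.0],
};

// Estimated processing time range in whole minutes for a given audio length.
function estimateProcessingMinutes(
  audioMinutes: number,
  model: LocalModel
): [number, number] {
  const factors = REALTIME_FACTOR[model];
  return [
    Math.round(audioMinutes * factors[0]),
    Math.round(audioMinutes * factors[1]),
  ];
}
```

For a 1-hour recording with the medium model this gives a range of 18 to 30 minutes, matching the figure in the list.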

Q: Will it work offline?

A: Yes, if you use local processing. Once models are downloaded, you can transcribe without internet.

Technical Questions

Q: Where are models stored?

A: Models are stored in <vault>/.obsidian/plugins/obsidian-transcription-plugin/models/

Q: Can I use my own Whisper model?

A: Currently, the plugin uses the official whisper.cpp models hosted on Hugging Face. Custom model support may be added later.

Q: What if I don't have Ollama installed?

A: You can still use cloud analysis via OpenRouter, or skip the analysis step and just get the transcription.

Q: How much disk space do I need?

A: Model sizes:

Tiny: 75 MB
Base: 142 MB
Small: 466 MB
Medium: 1.5 GB
Large: 2.9 GB

Plus temporary space for audio processing (usually 2-3x the audio file size).

Q: Does it work on mobile (iOS/Android)?

A: Not yet. Currently Windows desktop only. Mobile support may come in future updates.

Troubleshooting Questions

Q: Transcription failed with "model not found" error

A: Go to Settings → Audio Transcription → Model Management and download your selected model.

Q: The transcription is very inaccurate

A: Try these solutions:

Upgrade to a larger model (medium or large)
Check audio quality - clear recordings work best
Set the correct language instead of auto-detect
Ensure audio file isn't corrupted

Q: Plugin says "analysis already available" but I don't see a file

A: The markdown file might be in your configured output folder. Check Settings → Audio Transcription → Output Settings to see the folder path.

Q: Processing is very slow

A: Try these solutions:

Close other applications, since local processing is CPU-intensive
Try a smaller model (small instead of medium)
Consider using cloud processing for faster results
Check if your antivirus is scanning the process

Q: Speaker diarization isn't working

A: Speaker diarization requires either cloud processing with AssemblyAI (coming in Phase 3) or a local pyannote installation (advanced). Until then, speaker identification has limited functionality.

Usage Questions

Q: Can I edit the transcription after it's created?

A: Absolutely! It's a markdown file in your vault. Edit it like any other note.

Q: Can I re-transcribe if I'm not happy with the results?

A: Yes. Delete the generated markdown file first, then transcribe again. The plugin skips files that already have analysis.

Q: Can I transcribe video files?

A: Not directly. Extract the audio first using a tool like VLC or FFmpeg, then transcribe the audio file.

Q: How do I share transcriptions with others?

A: They're standard markdown files. Export to PDF, copy the text, or share the .md file directly.

Privacy & Cost Questions

Q: Is my audio data private?

A: With local transcription and local (Ollama) analysis: yes, completely private. Audio never leaves your device. With cloud processing: audio is sent to your transcription provider (OpenAI or Groq) and the transcript is sent to OpenRouter for analysis. Check their privacy policies.

Q: How much do cloud APIs cost?

A: Approximate costs:

OpenAI Whisper: $0.006/minute ($7.20 per 20 hours)
OpenRouter: Varies by model, often cheaper
Local processing: Free (after model download)
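
These rates make costs easy to estimate from the recording length. The sketch below uses the OpenAI per-minute rate from the list above and the Groq starting rate quoted in the setup guide (~$0.04 per hour of audio); the function names are hypothetical, and provider pricing can change, so treat the constants as a snapshot.

```typescript
// Rates quoted in this document; provider pricing may change.
const OPENAI_USD_PER_MINUTE = 0.006;
const GROQ_USD_PER_HOUR = 0.04;

// Round to whole cents to avoid floating-point noise in the result.
function roundToCents(usd: number): number {
  return Math.round(usd * 100) / 100;
}

function openAiCostUsd(audioMinutes: number): number {
  return roundToCents(audioMinutes * OPENAI_USD_PER_MINUTE);
}

function groqCostUsd(audioMinutes: number): number {
  return roundToCents((audioMinutes / 60) * GROQ_USD_PER_HOUR);
}
```

A 2-hour recording works out to $0.72 with OpenAI Whisper and about $0.08 with Groq at its starting rate; 20 hours of audio with OpenAI comes to $7.20.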

Q: Do I need a paid Obsidian account?

A: No. This plugin works with free Obsidian.

Roadmap

Current Features (v1.1)

✓ Local Whisper.cpp transcription
✓ Cloud transcription (OpenAI Whisper, Groq)
✓ Multilingual transcription (any Whisper-supported language)
✓ AI-powered analysis and extraction
✓ Customizable analysis instructions
✓ Automatic model management
✓ Duplicate detection
✓ Error retry logic
✓ Custom prompt templates

Planned Features (Future Versions)

v1.1 - Enhanced Analysis

Multiple analysis profiles (meeting, lecture, interview)
Improved speaker identification
Export to other formats (PDF, DOCX)

v1.2 - Speaker Diarization

Full speaker identification
Speaker labeling and naming
Improved multi-speaker accuracy

v2.0 - Advanced Features

Real-time transcription during recording
Video file support (auto-extract audio)
Batch processing (multiple files at once)
Mobile app support (iOS/Android)
Integration with other plugins (Calendar, Tasks)

Community Requests

Your feedback shapes the roadmap! Submit feature requests on GitHub.

Support & Community

Getting Help

Documentation: You're reading it!
GitHub Issues: Report bugs or request features at github.com/tzamtzis/obsidian-transcription-plugin
Obsidian Forum: Discuss the plugin with other users

Contributing

This is an open-source project! Contributions welcome:

Report bugs
Suggest features
Submit pull requests
Improve documentation
Share your use cases

License

MIT License - Free to use, modify, and distribute.

Credits

Built with:

Whisper.cpp - Fast C++ implementation of OpenAI Whisper
Obsidian API - Plugin framework
OpenAI Whisper - Original transcription model

Special Thanks:

OpenAI for creating Whisper
Georgi Gerganov for whisper.cpp
Obsidian team for the amazing plugin API
Beta testers and early adopters

Changelog

v1.1.1 (2026-05-18)

Security: run the ffprobe duration probe via execFile (argv, no shell) — prevents command injection through crafted audio file names
Docs: accurate permissions/capabilities disclosure

v1.1.0 (2026-05-18)

Added Groq as a cloud transcription mode (selectable model: turbo / large-v3 / distil)
Hardened multipart upload (filename sanitization) and async, non-blocking audio reads
Resolved Obsidian community-plugin review findings; release workflow with build-provenance attestation

v1.0.0 (2026-01-27)

Initial release
Local and cloud transcription
Greek and English support
AI-powered analysis
Automatic model management
Windows desktop support

Made with ♥ for the Obsidian community

Transform your audio into knowledge. Start transcribing today!

About

Transcribe m4a and mp3 recordings into Markdown notes with speaker identification and automatic language detection for Greek and English. Extract summaries, key points, action items, and follow-ups using AI, choose local or cloud processing, and handle recordings over two hours.

Details

Current version: 1.1.1
Last updated: 2 months ago
Created: 7 months ago
Updates: 3 releases
Downloads: 253
Compatible with: Obsidian 1.6.6+
Platforms: Desktop only
License: MIT
