Transcription Audio

By cha-yh. 1k downloads.

Transcribe audio files into Markdown notes.

Turn your audio into structured Markdown notes inside Obsidian. This plugin detects an audio file linked in your current note, sends it to Gemini for transcription and summarization, and inserts the result back into your note. A right-hand progress panel shows what’s happening step by step.
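
The plugin's own request code is not shown here, so the following is only a rough sketch of what a single transcription call to the public Gemini REST endpoint could look like. It assumes the audio is sent inline as base64; the plugin may upload the file another way. The helper name `buildTranscriptionRequest` and the `GeminiRequestSketch` type are hypothetical.

```typescript
// Hypothetical helper: builds (but does not send) a Gemini generateContent request.
// Assumption: audio travels inline as base64; the plugin may handle upload differently.
interface GeminiRequestSketch {
  url: string;
  headers: { [key: string]: string };
  body: string;
}

function buildTranscriptionRequest(
  apiKey: string,
  model: string,
  prompt: string,
  audioBase64: string,
  mimeType: string
): GeminiRequestSketch {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    encodeURIComponent(model) +
    ":generateContent";
  // One user turn with two parts: the instruction text and the audio data.
  const payload = {
    contents: [
      {
        parts: [
          { text: prompt },
          { inline_data: { mime_type: mimeType, data: audioBase64 } },
        ],
      },
    ],
  };
  return {
    url: url,
    headers: {
      "Content-Type": "application/json",
      "x-goog-api-key": apiKey, // the key goes in a header, not in the note
    },
    body: JSON.stringify(payload),
  };
}
```

The returned object can be handed to any HTTP client. Keeping request construction separate from sending makes it easy to inspect exactly what would leave the vault.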

Features

Smart audio detection from links or embeds in the active note
Google Gemini transcription and summarization
Progress panel (sidebar) with live status:
- Detected audio filename and size
- Audio preparation status
- API request start/completion times
- Gemini usage logs (prompt/output/total tokens)
- Cancel button to stop upload/API request in progress
- Success/error result
Writes the final output to the file and cursor position where you started the command
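
As an illustration of the usage logs listed above, here is a minimal sketch of turning the `usageMetadata` object of a Gemini response into one log line. The field names follow the Gemini API response; the formatter `formatUsageLog` and its wording are assumptions, not the plugin's actual log format.

```typescript
// Field names follow the usageMetadata object of a Gemini generateContent response.
interface UsageMetadataSketch {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  totalTokenCount?: number;
}

// Hypothetical formatter; the plugin's real log wording is not documented.
function formatUsageLog(usage: UsageMetadataSketch | undefined): string {
  if (!usage) {
    return "Usage: not reported";
  }
  // Missing counters are shown as "n/a" rather than as 0.
  const show = (n: number | undefined): string =>
    typeof n === "number" ? String(n) : "n/a";
  return (
    "Usage: prompt=" + show(usage.promptTokenCount) +
    " output=" + show(usage.candidatesTokenCount) +
    " total=" + show(usage.totalTokenCount)
  );
}
```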

Requirements

A Google AI API key for Gemini. You can obtain one at https://aistudio.google.com/api-keys

Getting started

Open Obsidian Settings
Navigate to "Community plugins" and click "Browse"
Search for "Transcription Audio" and click Install
Enable the plugin in Community plugins
Set up your API key in plugin settings (SecretStorage recommended)

Configuration

Open Settings → Transcription Audio:

API Key (SecretStorage, recommended): Select the secret name from Obsidian SecretStorage
API Key (deprecated, not recommended): Legacy plain-text API key field, kept as a fallback for backward compatibility
SecretStorage requires Obsidian 1.11.4 or later. On older versions the SecretStorage option is disabled and you will see an update-required message
Transcription mode:
- Basic mode (default): prompt only
- Template mode: dedicated prompt + output template (both prefilled with defaults)
Model: Select a Gemini-compatible model (gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3.1-pro-preview)
gemini-3-pro-preview is deprecated by Google and shuts down on March 9, 2026. Existing settings are automatically migrated to gemini-3.1-pro-preview.
Prompt: Customize the instruction for the selected mode
Output template: Available in template mode to enforce a consistent final markdown structure
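
The key-selection rules above (SecretStorage first, legacy key as fallback, SecretStorage only on Obsidian 1.11.4+) can be sketched as follows. This is an illustration under assumptions: the settings field names are hypothetical, and `readSecret` stands in for whatever lookup SecretStorage actually provides.

```typescript
// Hypothetical settings shape; field names are illustrative, not the plugin's own.
interface KeySettingsSketch {
  secretName: string;   // name of the entry selected from SecretStorage
  legacyApiKey: string; // deprecated plain-text key
}

// Returns true when `current` is at least `required` (dotted numeric versions).
function isVersionAtLeast(current: string, required: string): boolean {
  const a = current.split(".").map(Number);
  const b = required.split(".").map(Number);
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = a[i] || 0; // missing or non-numeric segments count as 0
    const y = b[i] || 0;
    if (x !== y) {
      return x > y;
    }
  }
  return true;
}

// Prefers the stored secret; falls back to the legacy key; null if neither is set.
function resolveApiKey(
  settings: KeySettingsSketch,
  appVersion: string,
  readSecret: (name: string) => string | null
): string | null {
  if (isVersionAtLeast(appVersion, "1.11.4") && settings.secretName) {
    const secret = readSecret(settings.secretName);
    if (secret) {
      return secret;
    }
  }
  return settings.legacyApiKey ? settings.legacyApiKey : null;
}
```

Inside a real plugin, Obsidian's `requireApiVersion` helper would likely be a more direct way to perform the version check; the standalone comparison is used here only to keep the sketch self-contained.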

Usage

In a note, link or embed an audio file before your cursor, for example:
- Wiki link: ![[example_audio.wav]]
Place the cursor after the link.
Run the command: "Transcribe audio".
A progress panel will automatically open in the right sidebar, showing real-time status updates including file upload progress, API request status, and transcription progress.
When complete, the transcription and notes are inserted at your starting cursor position.
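
The steps above can be sketched as two small text operations: find the closest audio link before the cursor, then splice the result in at the offset where the command started. The extension list, the link pattern, and both function names are assumptions for illustration; the plugin's real detection logic may differ.

```typescript
// Assumed set of audio extensions; the plugin's supported formats are not listed.
const AUDIO_EXTENSIONS = ["wav", "mp3", "m4a", "ogg", "flac", "webm"];

// Returns the audio target of the last link that ends before `cursorOffset`, or null.
function findAudioBeforeCursor(noteText: string, cursorOffset: number): string | null {
  const head = noteText.slice(0, cursorOffset);
  // [[file]] / ![[file]] wiki links, or [text](path) / ![text](path) Markdown links.
  const pattern = /!?\[\[([^\]|#]+)[^\]]*\]\]|!?\[[^\]]*\]\(([^)\s]+)\)/g;
  let found: string | null = null;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(head)) !== null) {
    const target = (match[1] || match[2] || "").trim();
    const dot = target.lastIndexOf(".");
    const ext = dot >= 0 ? target.slice(dot + 1).toLowerCase() : "";
    if (AUDIO_EXTENSIONS.indexOf(ext) !== -1) {
      found = target; // later matches overwrite earlier ones, so the closest wins
    }
  }
  return found;
}

// Inserts `output` at the offset captured when the command started.
function insertAtOffset(noteText: string, offset: number, output: string): string {
  return noteText.slice(0, offset) + output + noteText.slice(offset);
}
```

Capturing the offset up front, rather than reading the cursor again when the response arrives, matches the documented behavior of writing to the starting position even if the cursor has moved in the meantime.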

Privacy & Data

Audio content is sent to Google’s Gemini API for processing. The plugin does not store your audio or transcripts outside your vault. Keep your API key secure and review your organization’s data policies before use.

Changelog

Version 0.5.0

Transcription mode enhancements
- Added Template mode so prompt and output template can be configured separately
Gemini 3 Pro Preview migration
- Added automatic migration from gemini-3-pro-preview to gemini-3.1-pro-preview
- Updated related settings and documentation for current model options
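
To illustrate the difference between the two modes, here is a minimal sketch of how the instruction text might be assembled. The settings fields, the function `composeInstruction`, and the sentence that joins prompt and template are all assumptions; the plugin's actual wording is not documented here.

```typescript
type TranscriptionModeSketch = "basic" | "template";

// Hypothetical settings fields; names are for illustration only.
interface PromptSettingsSketch {
  mode: TranscriptionModeSketch;
  basicPrompt: string;
  templatePrompt: string;
  outputTemplate: string;
}

// Basic mode sends the prompt alone; template mode appends the output template.
// How the plugin actually joins the two is an assumption here.
function composeInstruction(s: PromptSettingsSketch): string {
  if (s.mode === "basic") {
    return s.basicPrompt.trim();
  }
  return (
    s.templatePrompt.trim() +
    "\n\nFormat the result using exactly this Markdown template:\n\n" +
    s.outputTemplate.trim()
  );
}
```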

Version 0.4.1

Gemini 3 Pro Preview migration
- Replaced gemini-3-pro-preview with gemini-3.1-pro-preview in model selection
- Automatically migrates previously saved gemini-3-pro-preview setting to gemini-3.1-pro-preview
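
A settings migration like the one described can be as small as a lookup table applied when settings are loaded. The sketch below is an assumption about the shape of such a step, not the plugin's code; `MODEL_MIGRATIONS` and `migrateModel` are hypothetical names.

```typescript
// Old-to-new model names; the single entry is the migration described in this changelog.
const MODEL_MIGRATIONS: { [oldName: string]: string } = {
  "gemini-3-pro-preview": "gemini-3.1-pro-preview",
};

// Returns the replacement if the saved model was retired, otherwise the saved model.
function migrateModel(savedModel: string): string {
  const next = MODEL_MIGRATIONS[savedModel];
  // The typeof check also guards against inherited keys such as "constructor".
  return typeof next === "string" ? next : savedModel;
}
```

Running this once on load and saving the result keeps the stored setting in line with the model list shown in the settings tab.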

Version 0.4.0

SecretStorage API key support
- Added Obsidian SecretStorage-based API key selection (recommended)
- Kept legacy plain-text API key as fallback for backward compatibility
Cancelable transcription flow
- Added cancel control in the progress panel
- Improved cancellation handling for upload/request steps
Progress panel navigation improvements
- File and Target entries are clickable links
- Target navigation moves to the exact line/character position
Progress log improvements
- Added localized timestamp to the initial Log start line
Gemini usage visibility
- Added token usage logs (prompt/output/total and related token fields) in progress detail
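
One common way to build a cancelable flow is the standard `AbortController`: the cancel button calls `abort()`, and the flow checks the signal between steps. The sketch below is deliberately synchronous to stay short, and `runCancelable` and `StepSketch` are hypothetical names. In a real asynchronous flow the same signal could also be passed to `fetch(url, { signal })` so an in-flight request is aborted as well.

```typescript
// One unit of work in the flow, e.g. "prepare", "upload", "request".
interface StepSketch {
  label: string;
  run: () => void;
}

// Runs steps in order and stops as soon as the signal is aborted.
// Returns the labels of the steps that actually ran.
function runCancelable(steps: StepSketch[], signal: AbortSignal): string[] {
  const completed: string[] = [];
  for (const step of steps) {
    if (signal.aborted) {
      break; // cancel was requested; skip everything that remains
    }
    step.run();
    completed.push(step.label);
  }
  return completed;
}
```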

Version 0.3.0

Added gemini-3-flash-preview (default) model to settings
Enhanced Progress Tracking: Improved transcription process with detailed progress tracking and UI updates
- Enhanced progress panel with more detailed status information
- Better visual feedback during transcription process
- Improved error handling and status reporting
Updated Default Settings: Updated default settings with new model and refined prompt structure
- Optimized default model selection
- Improved prompt structure for better transcription quality

License

This project is licensed under the MIT License.

About

Transcribe audio linked or embedded in the active note with Google Gemini and insert structured Markdown containing the full transcript and summary at your cursor. Display a right-hand progress panel showing filename, upload/API timings, Gemini prompt/output/token logs and a cancel button during processing.

Tags: AI, Attachments, Audio

Details

Current version: 0.5.0
Last updated: 3 months ago
Created: 9 months ago
Updates: 6 releases
Downloads: 1k
Compatible with: Obsidian 0.15.0+
Platforms: Desktop, Mobile
License: MIT
Author: cha-yh (github.com/cha-yh)
