Hermes TTS

By thematthiasleitner · 505 downloads

Generate lightweight audio from a markdown note and prepend timestamped metadata with an embedded audio link.

Convert any Obsidian Markdown note into lightweight speech audio, then prepend a timestamped metadata callout with an embedded audio link.

What changed

The plugin now connects to TTS providers using the same API setup pattern as the Aloud plugin:

  • One Model Provider selector in settings.
  • Provider-specific fields shown only for the selected provider.
  • Voice selection is done via dropdowns for all major providers.
  • New Voice prompt section for optional speaking-style instructions.
  • Output is always normalized to MP3.
  • Character limit is no longer user-configurable (notes are processed without a fixed UI cap).
  • File name prefix and speech speed settings were removed to simplify configuration.
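
One way to model "one provider selector, provider-specific fields" is a discriminated union keyed on the provider. The sketch below is illustrative only: the type name, the field names (`apiKey`, `region`, `baseUrl`), and the provider identifiers other than `openai` are assumptions, not taken from the plugin source, and only three of the seven providers are shown.

```typescript
// Hypothetical settings shape; all field names are assumptions for illustration.
type ProviderSettings =
  | { provider: "openai"; apiKey: string; model: string; voice: string }
  | { provider: "azure"; apiKey: string; region: string; voice: string }
  | {
      provider: "openai-compatible";
      baseUrl: string;
      apiKey: string;
      model: string;
      voice: string;
    };

// Returns the names of the fields a settings tab would render for the
// selected provider: every key except the selector itself.
function visibleFields(settings: ProviderSettings): string[] {
  return Object.keys(settings).filter((key) => key !== "provider");
}
```

With this shape, switching the selector swaps the whole variant, so fields belonging to other providers cannot be rendered by accident.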

Supported providers

  • OpenAI
  • Google Gemini
  • Google Cloud Text-to-Speech
  • Azure Speech
  • ElevenLabs
  • AWS Polly
  • OpenAI-compatible endpoints (custom base URL)

Policy disclosures

  • Network access is required. The plugin sends note text to the selected external TTS provider.
  • External accounts and API keys are required for provider usage (OpenAI, Google, Azure, ElevenLabs, AWS, or compatible API).
  • The plugin does not include telemetry or ads.

Mobile compatibility

  • Hermes TTS is configured to load on mobile (isDesktopOnly: false).
  • The bundle is built for browser-compatible runtimes to support Obsidian mobile.
  • The plugin avoids regex lookbehind and Node-only Buffer usage in runtime paths for broader mobile compatibility.
  • Provider behavior may still vary by service/API/network conditions on mobile devices.
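
The two avoidance techniques above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: `base64ToBytes` and `splitSentences` are hypothetical helper names, and whether the plugin decodes base64 or splits sentences at all is an assumption. The first helper decodes base64 without Node's `Buffer`; the second splits text at sentence boundaries using a capture group instead of a lookbehind such as `(?<=[.!?])`.

```typescript
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode standard base64 into bytes without relying on Node's Buffer.
function base64ToBytes(b64: string): Uint8Array {
  // Drop padding, whitespace, and anything outside the base64 alphabet.
  const clean = b64.replace(/[^A-Za-z0-9+/]/g, "");
  const out = new Uint8Array(Math.floor((clean.length * 6) / 8));
  let buffer = 0; // rolling bit buffer, kept to 16 bits
  let bits = 0; // number of unread bits in the buffer
  let index = 0;
  for (let i = 0; i < clean.length; i++) {
    buffer = ((buffer << 6) | B64_ALPHABET.indexOf(clean.charAt(i))) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[index++] = (buffer >> bits) & 0xff;
    }
  }
  return out;
}

// Split after ., !, or ? followed by whitespace, without regex lookbehind:
// keep the punctuation via a capture group, then split on an inserted newline.
// Note that newlines already present in the text also act as split points.
function splitSentences(text: string): string[] {
  return text
    .replace(/([.!?])\s+/g, "$1\n")
    .split("\n")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}
```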

Voice dropdown behavior

  • OpenAI/Gemini: curated built-in voice dropdowns.
  • Google Cloud/Azure/ElevenLabs/AWS Polly: dropdowns with refresh buttons to fetch latest provider voices.
  • OpenAI-compatible: OpenAI-style voice dropdown.
  • Audio from all providers is normalized and saved as MP3.

Voice prompt behavior

  • The Voice prompt setting is global and optional.
  • OpenAI: sent as instructions only when using gpt-4o-mini-tts models (per API behavior).
  • Gemini: prepended as style notes before the transcript in the prompt.
  • Other providers currently ignore this field.
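
The routing rules above could look roughly like this. The function name, the `SpeechRequest` shape, the provider identifiers other than `openai`, and the exact "Style notes:" / "Transcript:" wording are assumptions made for illustration; only the rules themselves come from the list above.

```typescript
// Hypothetical provider identifiers; only "openai" appears in the README's metadata example.
type Provider =
  | "openai"
  | "gemini"
  | "google-cloud"
  | "azure"
  | "elevenlabs"
  | "aws-polly"
  | "openai-compatible";

interface SpeechRequest {
  input: string; // text handed to the provider
  instructions?: string; // only set where the provider accepts style instructions
}

function applyVoicePrompt(
  provider: Provider,
  model: string,
  transcript: string,
  voicePrompt: string,
): SpeechRequest {
  const prompt = voicePrompt.trim();
  if (prompt.length === 0) {
    return { input: transcript }; // the setting is optional
  }
  if (provider === "openai" && model.startsWith("gpt-4o-mini-tts")) {
    // Sent as a separate instructions field, only for gpt-4o-mini-tts models.
    return { input: transcript, instructions: prompt };
  }
  if (provider === "gemini") {
    // Prepended as style notes before the transcript (wording is an assumption).
    return { input: `Style notes: ${prompt}\n\nTranscript:\n${transcript}` };
  }
  return { input: transcript }; // other providers ignore the prompt
}
```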

Gemini reliability fallback

  • Gemini uses the official @google/genai SDK flow (matching the Aloud plugin's setup).
  • On Gemini 400 "tried to generate text" errors, the plugin retries in segmented transcript mode with rolling previous-context continuity.
  • If Gemini fails with transient errors and Google Cloud TTS is configured, generation automatically falls back to Google Cloud.
  • Metadata uses the provider that actually generated the audio.
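
A minimal sketch of the Gemini-to-Google-Cloud fallback, under stated assumptions: the two synthesizers are injected as plain functions, errors are assumed to carry a numeric `status`, and treating HTTP 429 and 5xx as "transient" is a guess at what the plugin considers transient. The segmented retry for 400 errors is not shown. The returned `provider` is what the metadata would record.

```typescript
// Hypothetical function type: takes text, resolves to encoded audio bytes.
type Synthesize = (text: string) => Promise<Uint8Array>;

interface GenerationResult {
  provider: string; // the provider that actually produced the audio
  audio: Uint8Array;
}

// Assumption: provider errors expose a numeric HTTP-like `status` field.
function isTransient(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const status = (err as { status?: unknown }).status;
  return typeof status === "number" && (status === 429 || status >= 500);
}

async function generateWithFallback(
  text: string,
  gemini: Synthesize,
  googleCloud?: Synthesize, // undefined when Google Cloud TTS is not configured
): Promise<GenerationResult> {
  try {
    return { provider: "gemini", audio: await gemini(text) };
  } catch (err) {
    if (googleCloud !== undefined && isTransient(err)) {
      return { provider: "google-cloud", audio: await googleCloud(text) };
    }
    throw err; // non-transient errors, or no fallback configured
  }
}
```

Because the result carries the provider name alongside the audio, the metadata block can report the provider that generated the file rather than the one selected in settings.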

Commands

  • Generate Hermes-TTS audio (current note)

Provider documentation

| Provider | API docs | Voice docs |
| --- | --- | --- |
| OpenAI | https://platform.openai.com/docs/guides/text-to-speech | https://platform.openai.com/docs/guides/text-to-speech#voice-options |
| Google Gemini | https://ai.google.dev/gemini-api/docs/speech-generation | https://ai.google.dev/gemini-api/docs/speech-generation#voices |
| Google Cloud TTS | https://cloud.google.com/text-to-speech/docs/reference/rest | https://cloud.google.com/text-to-speech/docs/list-voices-and-types |
| Azure Speech | https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech | https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts |
| ElevenLabs | https://elevenlabs.io/docs/api-reference/text-to-speech/convert | https://elevenlabs.io/docs/voices |
| AWS Polly | https://docs.aws.amazon.com/polly/latest/dg/API_SynthesizeSpeech.html | https://docs.aws.amazon.com/polly/latest/dg/voicelist.html |
| OpenAI-compatible | https://platform.openai.com/docs/api-reference/audio/createSpeech | https://platform.openai.com/docs/guides/text-to-speech#voice-options |

The same docs are also available from buttons in the plugin settings tab.

Metadata block format

The plugin prepends a callout block near the top of the note (after frontmatter if present). Metadata lines can be toggled in settings. The title is a clean timestamp. For example:

> [!tts]+ 2026-02-17 15:42:10.321
> generated_at: 2026-02-17T14:42:10.321Z
> source_note: [[02 Projects/My Note]]
> provider: openai
> provider_name: OpenAI
> model: gpt-4o-mini-tts
> voice: shimmer
> format: mp3
> mime_type: audio/mpeg
> source_characters_sent: 2412
> provider_docs: https://platform.openai.com/docs/guides/text-to-speech
> voice_docs: https://platform.openai.com/docs/guides/text-to-speech#voice-options
> audio_file: ![[Attachments/TTS Audio/my-note-20260217-154210.mp3]]
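
As a rough sketch of how such a block might be assembled and placed, the helpers below build the callout from a title and an ordered list of key/value pairs, then insert it after YAML frontmatter when the note starts with one. Both function names, the `TtsMetadata` type, and the frontmatter detection regex are assumptions, not the plugin's implementation; a metadata line that is toggled off in settings would simply be left out of `fields`.

```typescript
interface TtsMetadata {
  title: string; // e.g. "2026-02-17 15:42:10.321"
  fields: Array<[key: string, value: string]>; // only the lines enabled in settings
}

// Render the callout: a "[!tts]+" header line followed by one "> key: value" line per field.
function buildCallout(meta: TtsMetadata): string {
  const lines = [`> [!tts]+ ${meta.title}`];
  for (const [key, value] of meta.fields) {
    lines.push(`> ${key}: ${value}`);
  }
  return lines.join("\n");
}

// Insert the block at the top of the note, or directly after frontmatter
// when the note begins with a "---" ... "---" section.
function prependAfterFrontmatter(note: string, block: string): string {
  const match = /^---\r?\n(?:[\s\S]*?\r?\n)?---[ \t]*(?:\r?\n|$)/.exec(note);
  const head = match ? match[0] : "";
  const body = note.slice(head.length);
  // If the frontmatter ended at end-of-file, add the missing line break.
  const sep = head.length > 0 && !head.endsWith("\n") ? "\n" : "";
  return `${head}${sep}${block}\n\n${body}`;
}
```

A call such as `prependAfterFrontmatter(noteText, buildCallout(meta))` would then yield the new note content to write back to the vault.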

Build

npm ci
npm run build

Release assets expected by Obsidian:

  • manifest.json
  • main.js
  • styles.css

Scorecard

  • Score: 81%
  • Health: Excellent
  • Review: Caution

About

Convert any Obsidian Markdown note into lightweight MP3 speech and prepend a timestamped metadata callout with an embedded audio link. Select from multiple TTS providers with dropdown voice choices and optional speaking-style prompts; output is normalized to MP3 and built for mobile-compatible runtimes.

Categories: AI, Attachments, Files

Details

  • Current version: 0.1.0
  • Last updated: 3 months ago
  • Created: 3 months ago
  • Updates: 1 release
  • Downloads: 505
  • Compatible with: Obsidian 1.5.0+
  • Platforms: Desktop, Mobile
  • License: MIT
  • Author: thematthiasleitner (github.com/thematthiasleitner)