Hermes TTS

Generate lightweight audio from a markdown note and prepend timestamped metadata with an embedded audio link.

Convert any Obsidian Markdown note into lightweight speech audio, then prepend a timestamped metadata callout with an embedded audio link.

What changed

The plugin now follows an Aloud-style provider setup:

One Model Provider selector in settings.
Provider-specific fields shown only for the selected provider.
Voices are selected from dropdowns for all major providers.
New Voice prompt section for optional speaking-style instructions.
Output is always normalized to MP3.
Character limit is no longer user-configurable (notes are processed without a fixed UI cap).
File name prefix and speech speed settings were removed to simplify configuration.

Supported providers

OpenAI
Google Gemini
Google Cloud Text-to-Speech
Azure Speech
ElevenLabs
AWS Polly
OpenAI-compatible endpoints (custom base URL)

Policy disclosures

Network access is required. The plugin sends note text to the selected external TTS provider.
External accounts and API keys are required for provider usage (OpenAI, Google, Azure, ElevenLabs, AWS, or compatible API).
The plugin does not include telemetry or ads.

Mobile compatibility

Hermes TTS is configured to load on mobile (isDesktopOnly: false).
The bundle is built for browser-compatible runtimes to support Obsidian mobile.
The plugin avoids regex lookbehind and Node-only Buffer usage in runtime paths for broader mobile compatibility.
Provider behavior on mobile devices may still vary with the service, API, and network conditions.
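
To illustrate the Buffer and lookbehind points above, here is a minimal sketch of how base64 audio from a provider might be decoded without Node's Buffer, and how text might be split into sentences without a lookbehind pattern. Both function names are hypothetical and are not taken from the plugin source.

```typescript
// Hypothetical helpers; names are illustrative, not from the plugin source.

// Decode base64 audio without Node's Buffer, using the atob global that
// browsers and mobile webviews provide.
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Split text into sentences without a lookbehind such as /(?<=[.!?])\s+/,
// which older WebKit-based webviews fail to parse.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}
```

The sentence splitter matches each run of text together with its closing punctuation, so no lookbehind is needed to keep the punctuation attached.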

Voice dropdown behavior

OpenAI/Gemini: curated built-in voice dropdowns.
Google Cloud/Azure/ElevenLabs/AWS Polly: dropdowns with refresh buttons to fetch latest provider voices.
OpenAI-compatible: OpenAI-style voice dropdown.
Audio from all providers is normalized and saved as MP3.
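
The split between curated and refreshable voice lists can be pictured as a lookup table. This is only a sketch: the provider identifiers other than openai are assumptions, and treating the OpenAI-compatible dropdown as a curated list is an inference from the description above.

```typescript
// Hypothetical identifiers; only "openai" appears in the plugin's metadata example.
type ProviderId =
  | "openai"
  | "gemini"
  | "openai-compatible"
  | "google-cloud"
  | "azure"
  | "elevenlabs"
  | "aws-polly";

type VoiceListKind = "curated" | "refreshable";

// Curated lists ship with the plugin; refreshable lists are fetched from the provider.
const VOICE_LIST_KIND: Record<ProviderId, VoiceListKind> = {
  "openai": "curated",
  "gemini": "curated",
  "openai-compatible": "curated", // assumption: reuses the OpenAI-style list
  "google-cloud": "refreshable",
  "azure": "refreshable",
  "elevenlabs": "refreshable",
  "aws-polly": "refreshable",
};

// A refresh button only makes sense where the list comes from the provider.
function showsRefreshButton(provider: ProviderId): boolean {
  return VOICE_LIST_KIND[provider] === "refreshable";
}
```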

Voice prompt behavior

The Voice prompt setting is global and optional.
OpenAI: sent as instructions only when using gpt-4o-mini-tts models (per API behavior).
Gemini: prepended as style notes before the transcript in the prompt.
Other providers currently ignore this field.
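
The rules above can be sketched as a single function. The provider identifiers, the function name, and the exact layout of the Gemini prompt are assumptions for illustration. The instructions field is the OpenAI speech API parameter that the rule above refers to.

```typescript
// Hypothetical sketch; identifiers and prompt layout are assumptions.
type ProviderId =
  | "openai"
  | "gemini"
  | "google-cloud"
  | "azure"
  | "elevenlabs"
  | "aws-polly"
  | "openai-compatible";

interface PromptedRequest {
  input: string;          // text sent for synthesis
  instructions?: string;  // set only for OpenAI gpt-4o-mini-tts models
}

function applyVoicePrompt(
  provider: ProviderId,
  model: string,
  transcript: string,
  voicePrompt: string,
): PromptedRequest {
  const prompt = voicePrompt.trim();
  // The setting is optional: an empty prompt changes nothing.
  if (prompt === "") return { input: transcript };

  if (provider === "openai" && model.startsWith("gpt-4o-mini-tts")) {
    return { input: transcript, instructions: prompt };
  }
  if (provider === "gemini") {
    // Style notes go before the transcript in the prompt.
    return { input: `${prompt}\n\n${transcript}` };
  }
  // All other providers ignore the voice prompt.
  return { input: transcript };
}
```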

Gemini reliability fallback

Gemini uses the official @google/genai SDK flow (matching the Aloud plugin's setup).
When Gemini returns a 400 "tried to generate text" error, the plugin retries in segmented transcript mode, carrying rolling context from previous segments for continuity.
If Gemini fails with transient errors and Google Cloud TTS is configured, generation automatically falls back to Google Cloud.
Metadata uses the provider that actually generated the audio.
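
The retry and fallback order described above might look roughly like the sketch below. All function names and signatures are hypothetical. Which status codes count as transient, how previous context is passed to Gemini, and the naive byte concatenation of segment audio are simplifying assumptions rather than the plugin's actual behavior.

```typescript
// Hypothetical sketch; names, signatures, and the "transient" rule are assumptions.
type GeminiSynth = (text: string, previousContext?: string) => Promise<Uint8Array>;
type CloudSynth = (text: string) => Promise<Uint8Array>;

interface GenerationResult {
  provider: "gemini" | "google-cloud"; // the provider recorded in the metadata
  audio: Uint8Array;
}

function errorInfo(e: unknown): { status?: number; message: string } {
  const err = (e ?? {}) as { status?: number; message?: string };
  return { status: err.status, message: err.message ?? "" };
}

function isTextGenerationError(e: unknown): boolean {
  const { status, message } = errorInfo(e);
  return status === 400 && /tried to generate text/i.test(message);
}

// Assumption: rate limits and server errors are treated as transient.
function isTransientError(e: unknown): boolean {
  const { status } = errorInfo(e);
  return status === 429 || (status !== undefined && status >= 500);
}

// Simplification: real audio segments may need proper joining, not raw concatenation.
function concatBytes(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((n, p) => n + p.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

async function generateWithFallback(
  transcript: string,
  segments: string[],
  gemini: GeminiSynth,
  googleCloud?: CloudSynth,
): Promise<GenerationResult> {
  try {
    return { provider: "gemini", audio: await gemini(transcript) };
  } catch (e) {
    if (isTextGenerationError(e)) {
      // Segmented mode: each segment carries the previous one as context.
      const parts: Uint8Array[] = [];
      let previous: string | undefined;
      for (const segment of segments) {
        parts.push(await gemini(segment, previous));
        previous = segment;
      }
      return { provider: "gemini", audio: concatBytes(parts) };
    }
    if (isTransientError(e) && googleCloud) {
      return { provider: "google-cloud", audio: await googleCloud(transcript) };
    }
    throw e;
  }
}
```

Returning the provider alongside the audio is what lets the metadata name the service that actually produced the file.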

Commands

Generate Hermes-TTS audio (current note)

Provider documentation

Provider	API docs	Voice docs
OpenAI	https://platform.openai.com/docs/guides/text-to-speech	https://platform.openai.com/docs/guides/text-to-speech#voice-options
Google Gemini	https://ai.google.dev/gemini-api/docs/speech-generation	https://ai.google.dev/gemini-api/docs/speech-generation#voices
Google Cloud TTS	https://cloud.google.com/text-to-speech/docs/reference/rest	https://cloud.google.com/text-to-speech/docs/list-voices-and-types
Azure Speech	https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech	https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts
ElevenLabs	https://elevenlabs.io/docs/api-reference/text-to-speech/convert	https://elevenlabs.io/docs/voices
AWS Polly	https://docs.aws.amazon.com/polly/latest/dg/API_SynthesizeSpeech.html	https://docs.aws.amazon.com/polly/latest/dg/voicelist.html
OpenAI-compatible	https://platform.openai.com/docs/api-reference/audio/createSpeech	https://platform.openai.com/docs/guides/text-to-speech#voice-options

The same docs are also available from buttons in the plugin settings tab.

Metadata block format

The plugin prepends a callout block near the top of the note (after frontmatter if present). Metadata lines can be toggled in settings. The title is a clean timestamp. For example:

> [!tts]+ 2026-02-17 15:42:10.321
> generated_at: 2026-02-17T14:42:10.321Z
> source_note: [[02 Projects/My Note]]
> provider: openai
> provider_name: OpenAI
> model: gpt-4o-mini-tts
> voice: shimmer
> format: mp3
> mime_type: audio/mpeg
> source_characters_sent: 2412
> provider_docs: https://platform.openai.com/docs/guides/text-to-speech
> voice_docs: https://platform.openai.com/docs/guides/text-to-speech#voice-options
> audio_file: ![[Attachments/TTS Audio/my-note-20260217-154210.mp3]]
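
As a rough illustration of the format above, the sketch below builds a callout from a title and a list of fields, then places it after the frontmatter when one is present. The helper names and the set-based toggle are hypothetical, and the frontmatter detection is deliberately simple.

```typescript
// Hypothetical helpers; names and the toggle mechanism are illustrative.
type MetadataField = [string, string];

// Build the callout, skipping any field keys that are switched off.
function buildCallout(
  title: string,
  fields: MetadataField[],
  disabled: Set<string> = new Set<string>(),
): string {
  const lines = [`> [!tts]+ ${title}`];
  for (const [key, value] of fields) {
    if (!disabled.has(key)) lines.push(`> ${key}: ${value}`);
  }
  return lines.join("\n");
}

// Insert the block after a leading frontmatter section, or at the very top.
function prependAfterFrontmatter(note: string, block: string): string {
  if (note.startsWith("---\n")) {
    const closing = note.indexOf("\n---", 3);
    if (closing !== -1) {
      const lineEnd = note.indexOf("\n", closing + 1);
      const head = lineEnd === -1 ? `${note}\n` : note.slice(0, lineEnd + 1);
      const rest = lineEnd === -1 ? "" : note.slice(lineEnd + 1);
      return `${head}\n${block}\n\n${rest}`;
    }
  }
  return `${block}\n\n${note}`;
}
```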

Build

npm ci
npm run build

Release assets expected by Obsidian:

manifest.json
main.js
styles.css

Details

Current version: 0.1.0
Last updated: 3 months ago
Created: 3 months ago
Updates: 1 release
Downloads: 505
Compatible with: Obsidian 1.5.0+
Platforms: Desktop, Mobile
License: MIT
Author: thematthiasleitner (github.com/thematthiasleitner)
