Interactive voice interface powered by Gemini Live API. Speak to your vault, while you walk! Brainstorm, ideate, use your voice to create!
Hermes is a real-time, interactive voice interface for your Obsidian vault.
Powered by Google's Gemini 2.5 Flash Native Audio model through the Live API, it allows you to talk directly to your notes, perform file operations, and search the web, all via a low-latency voice channel.
It's like talking to OpenAI's ChatGPT voice interface, but with direct access to your vault, so you can structure and organize your thoughts and notes without touching a key. The next time you sit down at it, you'll have all the output to go on, in a structured way.
To this day, I have not found a tool that could do that, so I sat down vibe coding. 24 hours later, here I am with a talking voice agent at my disposal that uses my Google Cloud free tier.
"I just want to talk to my vault, while I walk, like through a vaulty-talkie?"
https://www.youtube.com/watch?v=FcX2EzMf8GY
The problem for me with OpenAI's voice interface has been that I can never find the stuff I talked about. There, the goal is not to produce or distill something; it is just a stream of data.
AI is good at generating text. Well, soulless slop, sometimes. Where are you in there? You are the curator. The builder. Who keeps the focus. Who cycles thoughts.
Only with your human sense of reality can we fight the slop.
So yeah - those long walks I had talking about stuff in the forest, might as well have productive outputs from now on!
I bet you're here for similar reasons.
"Most of our work happens in the tram"
I guess there are other knowledge workers like that out there.
If you think this is useful for you, or this is what you've been waiting for, this is your lucky day too.
For your safety: it's 98% vibe coded over one weekend. I just tested it on my live vault. It looks pretty OK and does not accidentally delete all the files...
I certainly DO recommend some safety measures/backups before using it! Something like using a git repository, Syncthing file history, or whatever your weapon of choice is.
I most certainly do not take any responsibility for giving accidental commands for deleting ALL the files in your repo!
Also, be mindful, you're giving access to your notes to a tech giant.
But as some wise person said, post Orwell: "The price of privacy is the loss of convenience."
So yeah, Google's LLM will read your notes. That's why you can talk to them.
I looked into self-hosted solutions, but our home hardware is just not there yet.
Until then, the world is going by.
- read_file / create_file / update_file: Full file lifecycle management
- edit_file: Targeted line-based modifications
- search_keyword / search_regexp: Global vault searching
- internet_search: Real-time web grounding via Google Search
- generate_image_from_context: AI-powered image generation
- topic_switch: Automatic conversation archiving

With the topic_switch tool, Hermes can summarize segments of your conversation and save them as markdown notes in your vault.

Currently available in Beta via BRAT:
https://github.com/symunona/obsidian-hermes

1. Download main.js, manifest.json, and styles.css - under releases
2. Put them in hermes-voice-assistant in your vault's .obsidian/plugins/ directory

B: you can just check it out from the git repo if you're reading this.
- main.ts: The entry point for the Obsidian plugin
- HermesMainViewObsidian.tsx: The Obsidian Workspace Leaf that hosts the React application
- App.tsx: The root React component managing state and the voice session
- services/voiceInterface.ts: Core logic for managing the @google/genai Live session, audio encoding/decoding, and tool execution
- services/commands.ts: Tool registry and execution engine
- tools/: Individual tool definitions (declarations and execution logic)
- components/: UI components (Chat, Tool Results, Settings, Kernel Log)
- persistence/: Settings and data persistence layer
- utils/: Utility functions for audio processing, environment detection, etc.

I also started working on an Obsidian-independent version, an app that can just run standalone. Maybe the next free weekend I have.
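To make the "audio encoding/decoding" part of the structure above more concrete: Web Audio produces float samples in the range -1..1, while the Gemini Live API works with raw 16-bit PCM. Below is a minimal sketch of that conversion in both directions. It is not the plugin's actual code; the function names floatTo16BitPCM and pcm16ToFloat are made up for illustration, and the real logic in services/voiceInterface.ts and utils/ may look quite different.

```typescript
// Sketch only: not the plugin's actual code. Function names are hypothetical.

// Convert Web Audio float samples (range -1..1) to 16-bit signed PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Negative values scale towards -32768, positive values towards 32767.
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return pcm;
}

// Convert 16-bit PCM back to floats, e.g. for playback through Web Audio.
function pcm16ToFloat(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = (pcm[i] ?? 0) / 0x8000;
  }
  return out;
}
```

The clamp matters in practice: microphone input can overshoot the -1..1 range, and without clamping those samples would overflow into loud clicks.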
# Install dependencies
pnpm install
# Start the build watcher (for development)
pnpm run dev
# Build for production
pnpm run build
# Build CSS only
pnpm run build-css
# Watch CSS changes
pnpm run watch-css
# Serve standalone version
pnpm run serve
# Development with standalone
pnpm run dev-standalone
# Build standalone version
pnpm run build-standalone
The plugin includes 20+ tools for vault operations:
File Management:
- read_file: Read file contents
- create_file: Create new files
- update_file: Update entire file content
- edit_file: Line-based editing
- delete_file: Delete files
- rename_file: Rename files
- move_file: Move files between directories

Directory Operations:
- list_directory: List directory contents
- list_vault_files: List all vault files with filtering
- get_folder_tree: Get folder structure
- dirlist: Quick directory listing
- create_directory: Create new directories

Search & Replace:
- search_keyword: Search for text patterns
- search_regexp: Regex-based search
- search_replace_file: Search and replace in single file
- search_replace_global: Global search and replace

Advanced Features:
- internet_search: Web search via Google
- generate_image_from_context: AI image generation
- topic_switch: Archive conversation segments
- end_conversation: Graceful session termination

To add a new capability to Hermes:
1. Create tools/[tool_name].ts
2. Export a declaration (OpenAPI schema) and an execute function
3. Add it to the TOOLS registry in services/commands.ts
4. Update utils/defaultPrompt.ts

I was thinking of adding the capacity to pull in ANY MCP, that'd make sense. But that's not for the first weekend.
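The steps for adding a tool could look roughly like the sketch below: one module that exports a declaration plus an execute function, and a registry that looks tools up by name. Everything here is an assumption for illustration. The ToolDeclaration and Tool interfaces, the word_count tool, and the runTool helper are hypothetical, and the real shapes in tools/ and services/commands.ts may differ.

```typescript
// Sketch only: the interfaces and the word_count tool are hypothetical.
interface ToolDeclaration {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

interface Tool {
  declaration: ToolDeclaration;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

// Pure helper so the counting logic is easy to check on its own.
function countWords(text: string): number {
  return text.trim().split(/\s+/).filter((w) => w.length > 0).length;
}

// Would live in tools/word_count.ts (hypothetical tool name).
const wordCount: Tool = {
  declaration: {
    name: "word_count",
    description: "Count the words in a piece of text.",
    parameters: {
      type: "object",
      properties: {
        text: { type: "string", description: "Text to count words in" },
      },
      required: ["text"],
    },
  },
  execute: async (args) => {
    // Arguments come from the model, so validate instead of trusting them.
    const text = typeof args.text === "string" ? args.text : "";
    return `${countWords(text)} words`;
  },
};

// Would live in services/commands.ts: a name -> tool lookup table.
const TOOLS: Record<string, Tool> = {
  [wordCount.declaration.name]: wordCount,
};

// Dispatch a tool call by name and return a text result for the model.
async function runTool(name: string, args: Record<string, unknown>): Promise<string> {
  const tool = TOOLS[name];
  if (!tool) return `Unknown tool: ${name}`;
  return tool.execute(args);
}
```

Returning a plain string for unknown tools, rather than throwing, keeps the voice session alive when the model asks for something that does not exist.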
- @google/genai (Gemini 2.5 Flash Native Audio)
- marked for md rendering
- esbuild for plugin, Vite for standalone (WIP)

Speak naturally to Hermes to:
Fine tune it under settings!
It'd be good to have per-folder instructions, right? PRs welcome.
- list_vault_files with limits for better performance - I tried working around it by doing smart chunking, but it's not a fully solved problem.
- Mobile support: I did not test. Will do.
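On the list_vault_files point, one common way to keep large listings manageable is to hand the model one page at a time and tell it where the next page starts. The sketch below shows that idea on a plain array of paths. It is an assumption about how limits could work, not the plugin's chunking logic; listFilesPage and its limit, offset, and prefix parameters are hypothetical names.

```typescript
// Sketch only: a hypothetical paging helper, not the plugin's implementation.
interface ListPage {
  files: string[];
  total: number; // number of matching files across all pages
  nextOffset: number | null; // null when there is nothing left to fetch
}

// Filter paths by an optional folder prefix, then return a single page.
function listFilesPage(allPaths: string[], limit: number, offset = 0, prefix = ""): ListPage {
  // Sort so that paging is stable between calls.
  const matching = allPaths.filter((p) => p.startsWith(prefix)).sort();
  const start = Math.max(0, offset);
  const size = Math.max(1, limit);
  const files = matching.slice(start, start + size);
  const end = start + files.length;
  return {
    files,
    total: matching.length,
    nextOffset: end < matching.length ? end : null,
  };
}
```

Reporting total and nextOffset in the result lets the model decide whether to ask for more, instead of receiving the whole vault in one response.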
See LICENSE
Hermes: Bridging the gap between thought and file.