OCR Extractor

jritzi, 3k downloads

Extract text from PDFs, images, documents, etc. and store it as Markdown in your notes.

OCR Extractor is a simple Obsidian plugin that uses OCR to extract text from PDFs, documents, images, etc. embedded in your notes. Different OCR services (free or paid, local or cloud-based) are available, depending on your needs.

Following Obsidian's philosophy of storing data in an open, future-proof file format, the extracted text is added below the embedded attachment as an expandable callout. This means that the text will be searchable via Obsidian's built-in search, other search plugins, and even your operating system's native file search.
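
As a rough illustration of what "stored as an expandable callout" means on disk, the sketch below wraps some text in Obsidian's collapsed callout syntax (`> [!type]- Title`). The callout type (`note`), the title, and the `to_callout` helper are assumptions made for this example; the plugin's actual output may look different.

```shell
# Hypothetical helper: wrap text from stdin in a collapsed Obsidian callout.
# "[!note]-" (type "note", collapsed by default) is an assumed format,
# not necessarily the one the plugin writes.
to_callout() {
  printf '> [!note]- %s\n' "$1"   # callout header with the given title
  sed 's/^/> /'                   # prefix every line of the body with "> "
}

# Sample text standing in for OCR output
printf 'Invoice 2024-001\nTotal: 42.00 EUR\n' | to_callout "Extracted text"
```

Because the result is plain Markdown inside the note, an ordinary file search such as `grep -rl "Invoice 2024-001" /path/to/vault` (with a placeholder vault path) should find it without Obsidian running.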

Usage

Click on the ribbon icon (or use the command palette) and select one of the two options:

  1. Extract text in current note
  2. Extract text in all notes (not available on mobile)

When extracting from all notes, progress is shown in the status bar. To stop the operation, click the status bar item and select "Cancel".

OCR services

Depending on your needs, you can choose which OCR service to use. Select the service in the plugin settings and follow the setup steps below.

Tesseract

Tesseract (the default option) is a popular open-source OCR engine. It has some limitations: it only supports English text, it can only process PDFs and images, and it can be slower and less accurate than the other options. However, it's completely free and runs locally, so your data is never sent to a third-party provider. This option requires no additional setup.

Mistral OCR

Mistral OCR is a powerful AI model for extracting text from complex documents and converting it to Markdown. It supports many different languages and file types. This option requires a paid Mistral AI account (at the time of writing, it costs $2 per 1000 pages processed). Attachments are sent to Mistral's OCR service for text extraction (see their privacy policy).

First, you need to create a Mistral AI account. Follow the steps in their Quickstart guide:

  1. Create an account
  2. Add payment information
  3. Recommended: Set a monthly spending limit, to avoid any unexpected charges
  4. Create an API key

Then, enter your API key in the plugin settings.
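
When choosing a monthly spending limit, the quoted price can be turned into a quick estimate. The sketch below assumes the rate mentioned above ($2 per 1000 pages), which may have changed since this was written; `estimate_cost` is a hypothetical helper, not part of the plugin.

```shell
# Hypothetical helper: estimate Mistral OCR cost in USD for a page count.
# Assumes 2 USD per 1000 pages, the rate quoted in the text above.
estimate_cost() {
  awk -v pages="$1" 'BEGIN { printf "%.2f\n", pages * 2 / 1000 }'
}

estimate_cost 2500   # prints 5.00
```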

Custom command

For advanced use cases, you can provide a custom command that will be used to process attachments. This can be used, for example, to extract text with an OCR model running locally, a script that uses a third-party API (that isn't supported natively by the plugin), or Tesseract with a custom configuration.

Enter your custom command in the plugin settings, where {input} is the path to the input attachment file and {output} is the path to the produced Markdown or text file containing the extracted text. To skip an unsupported attachment, don't create the output file. For example:

tesseract {input} - -l eng+spa > {output}

Click the "Test" button to run the command on a sample image with text and confirm it correctly extracts the text. If the custom command only supports images, enable the setting to convert PDFs to PNGs before processing.
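
For anything beyond a one-liner, it may be easier to point the custom command at a small wrapper script. The following is a minimal sketch under several assumptions: the script name `ocr-wrapper.sh`, the list of accepted extensions, and the way it is invoked are illustrative, and it is not confirmed here how the plugin quotes `{input}` and `{output}`. It follows the rule above by creating no output file for attachments it does not handle, and it writes to a temporary file first so that a failed run leaves no partial output behind.

```shell
#!/bin/sh
# ocr-wrapper.sh (hypothetical name): wraps Tesseract for use as a custom command.

# Succeeds only for file extensions this sketch treats as OCR-able images.
is_supported() {
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    png|jpg|jpeg|tif|tiff) return 0 ;;
    *) return 1 ;;
  esac
}

run_ocr() {
  input=$1
  output=$2
  # Unsupported attachment: return without creating the output file.
  is_supported "$input" || return 0
  tmp="$output.tmp"
  # Same Tesseract invocation as the example above (English + Spanish).
  if tesseract "$input" - -l eng+spa > "$tmp"; then
    mv "$tmp" "$output"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Run only when both paths are given, e.g.: sh ocr-wrapper.sh {input} {output}
if [ "$#" -eq 2 ]; then
  run_ocr "$1" "$2"
fi
```

With the script saved somewhere on disk, the custom command setting could then be something like `sh /path/to/ocr-wrapper.sh {input} {output}`, where the path is a placeholder.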

Custom commands are not supported on mobile. If a custom command is configured, the plugin uses Tesseract on mobile instead.

Contributing

For details on how to report a bug, share a feature request, or contribute code, see the Contribution Guidelines. To report a security issue, see the Security Policy.

Translations

OCR Extractor is available in several languages. To request a new language (or to suggest an improvement for an existing translation), start a discussion.

License

OCR Extractor is licensed under the MIT License.

Scorecard: 79% (Health: Excellent, Review: Satisfactory)
About
Extract text from images, PDFs, and other embedded attachments directly into your notes using OCR. Extracted text is added as expandable callouts after each attachment for full-text searchability. Choose local (Tesseract) or cloud (Mistral) OCR.
Tags: OCR, Attachments, PDF
Details
  • Current version: 2.2.2
  • Last updated: 3 days ago
  • Created: 9 months ago
  • Updates: 11 releases
  • Downloads: 3k
  • Compatible with: Obsidian 1.11.4+
  • Platforms: Desktop, Mobile
  • License: MIT
Author
jritzi (github.com/jritzi)

Related plugins

Text Extractor

A (companion) plugin to facilitate the extraction of text from images (OCR) and PDFs.

OCR-AI

Convert PDFs to rich Markdown, including images and OCR, using the Marker API.

Image Context Menus

Image context menus (mostly on right click): Copy to clipboard, Open in default app, Show in system explorer, Reveal file in navigation, Open in new tab.

Annotator

Read and annotate PDFs and EPUB files.

Ink

Hand write or draw directly between paragraphs using a digital pen, stylus, or Apple pencil.

Pandoc Plugin

Commands to export to Pandoc-supported formats like DOCX, ePub and PDF.

Local Images Plus

A reincarnation of Local Images to download images in Markdown notes to local storage.

Local GPT

Local Ollama and OpenAI-like GPT's assistance for maximum privacy and offline access.

Consistent Attachments and Links

Move note attachments and update links automatically.

Attachment Management

Customize attachment path, auto-rename attachments, etc.