By mohrjonas · 22k downloads

Obsidian OCR uses optical character recognition to let you search for text in your images and PDFs.
- tesseract for OCR
- imagemagick for PDF to PNG conversion

❗ Make sure the executables are in your path. If you don't know how, look here: https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/ ❗
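To check whether both executables can actually be found, a lookup along the following lines may help. This is a minimal sketch and not part of the plugin: `findExecutable` is a hypothetical helper, and the executable name `magick` assumes ImageMagick 7 (older installations ship `convert` instead).

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper, not part of the plugin: look for a file with the given
// name in some extra directories first, then in every directory on PATH.
// It only checks that a regular file exists, not that it is executable.
function findExecutable(
    name: string,
    extraDirs: string[] = [],
    envPath: string = process.env.PATH ?? "",
): string | undefined {
    // On Windows, executables usually carry an extension such as ".exe".
    const extensions = process.platform === "win32" ? [".exe", ".cmd", ".bat", ""] : [""];
    const dirs = [...extraDirs, ...envPath.split(path.delimiter)].filter((d) => d.length > 0);
    for (const dir of dirs) {
        for (const ext of extensions) {
            const candidate = path.join(dir, name + ext);
            if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
                return candidate;
            }
        }
    }
    return undefined;
}

// Report which of the two dependencies could be located.
for (const tool of ["tesseract", "magick"]) {
    console.log(tool, "->", findExecutable(tool) ?? "not found on PATH");
}
```

If one of the tools is reported as not found, add its directory to your path or pass it as an extra directory.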
`C:\Program Files\Tesseract-OCR\tessdata`

❗ The automatic installation feature is still in very early development. Expect bugs. ❗

ℹ️ Obsidian OCR uses Chocolatey for automatic installation. ℹ️
In the settings, press the automatic install button.

- `brew install tesseract`
- `brew install tesseract-lang`
- `brew install imagemagick`

For the path: check where the binaries are located and add these to `/private/etc/paths` (I also added them to `~/.zshrc`, not sure if that is needed).

- `brew list tesseract`, in my case: `/opt/homebrew/Cellar/tesseract/5.2.0/bin/`
- `brew list tesseract-lang`, in my case: `/opt/homebrew/Cellar/tesseract/5.2.0/bin/`
- `brew list imagemagick`, in my case: `/opt/homebrew/Cellar/imagemagick/7.1.0-43/bin/`

- `sudo apt install -y tesseract-ocr imagemagick`
- Language packages: `tesseract-ocr-<lang>`

❗ The automatic installation feature is still in very early development. Expect bugs. ❗
In the settings, press the automatic install button.

- `sudo pacman -S tesseract imagemagick`
- Language packages: `tesseract-data-<lang>`

❗ The automatic installation feature is still in very early development. Expect bugs. ❗
In the settings, press the automatic install button.

If Obsidian is running via the Flatpak installation (such as provided by default in Pop!_OS), this plugin will not operate. Flatpak sandboxing changes the file paths, so even providing host access will still be problematic. If you have a Flatpak installation, you will need to reinstall Obsidian via a different method to use this plugin.
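If you are unsure whether your Obsidian runs inside Flatpak, a check like the following can tell. It is a rough sketch, not something the plugin is known to do: `isRunningInFlatpak` is a hypothetical helper that relies on the `FLATPAK_ID` environment variable and the `/.flatpak-info` file that Flatpak provides inside its sandbox.

```typescript
import * as fs from "node:fs";

// Hypothetical check, not part of the plugin: Flatpak exposes the application
// ID in the FLATPAK_ID environment variable and places a /.flatpak-info file
// inside the sandbox. Either one is treated as a sign of a Flatpak install.
function isRunningInFlatpak(
    env: Record<string, string | undefined> = process.env,
    fileExists: (p: string) => boolean = fs.existsSync,
): boolean {
    return Boolean(env["FLATPAK_ID"]) || fileExists("/.flatpak-info");
}

if (isRunningInFlatpak()) {
    console.warn("Obsidian appears to run inside Flatpak; the external OCR tools will likely not be reachable.");
}
```

The environment and the file check are passed in as parameters only so the function can be tried out without a real sandbox.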

Use the `Search OCR` command or the magnifying-glass icon in the ribbon to open the search menu.


Obsidian OCR offers a variety of settings you can configure yourself.
| Name | Description | Default |
|---|---|---|
| Max OCR Processes | The maximum number of OCR processes running at the same time. Increasing this speeds up indexing but also increases CPU usage | 1 |
| Max caching processes | The maximum number of caching processes running at the same time. Increasing this speeds up caching but also increases CPU usage | 10 |
| OCR Image | Decides whether or not images (.png, .jpg, .jpeg) are OCRed | true |
| OCR PDF | Decides whether or not PDFs (.pdf) are OCRed | true |
| Image density | The density of generated PNGs, in dpi. Increasing this helps to OCR smaller text, but increases CPU usage | 300 |
| Image quality | The quality of generated PNGs. Increasing this helps to OCR smaller text, but increases CPU usage | 98 |
| Additional imagemagick args | Additional command-line arguments passed to imagemagick when converting a PDF to PNG(s) | |
| Additional search paths | Additional paths that will be searched when looking for external dependencies. Useful when installing into custom directories | |
| OCR Provider | The OCR provider that will be used. See below for a description of providers | NoOp |
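The defaults in the table can be pictured as a typed settings object. The sketch below is illustrative only: the interface, its field names and `buildConvertArgs` are hypothetical, and how the plugin actually invokes imagemagick is an assumption. Only the `-density` and `-quality` options themselves are standard ImageMagick options.

```typescript
// Hypothetical shape; the plugin's real settings object may use other names.
interface OcrSettings {
    maxOcrProcesses: number;
    maxCachingProcesses: number;
    ocrImage: boolean;
    ocrPdf: boolean;
    imageDensity: number; // dpi
    imageQuality: number;
    additionalImagemagickArgs: string;
    additionalSearchPaths: string;
    ocrProvider: string;
}

// Default values as listed in the settings table.
const DEFAULT_SETTINGS: OcrSettings = {
    maxOcrProcesses: 1,
    maxCachingProcesses: 10,
    ocrImage: true,
    ocrPdf: true,
    imageDensity: 300,
    imageQuality: 98,
    additionalImagemagickArgs: "",
    additionalSearchPaths: "",
    ocrProvider: "NoOp",
};

// Assumed mapping from settings to an imagemagick argument list for a
// PDF to PNG conversion; extra arguments are split on whitespace.
function buildConvertArgs(settings: OcrSettings, pdfPath: string, pngPath: string): string[] {
    const extra = settings.additionalImagemagickArgs.split(/\s+/).filter((a) => a.length > 0);
    return [
        "-density", String(settings.imageDensity),
        "-quality", String(settings.imageQuality),
        ...extra,
        pdfPath,
        pngPath,
    ];
}

console.log(buildConvertArgs(DEFAULT_SETTINGS, "scan.pdf", "scan.png").join(" "));
```

Placing `-density` before the input file matters, because ImageMagick applies it when rasterizing the PDF.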

The following OCR providers are available:

| Name | Description |
|---|---|
| NoOp | The NoOp (no operation) provider does, as the name implies, nothing and is only a dummy provider. To get real OCR capability, you have to switch to another provider |
| Tesseract | OCR provider using tesseract to OCR documents locally on your computer |

By default, tesseract offers two languages to choose from: `eng` and `osd`. `osd` stands for orientation and script detection and is therefore not useful for our use.

ℹ After switching your language, only newly indexed documents use the new language. You can reindex your already added documents by using the `Delete all transcripts` command. ℹ
Tesseract supports both languages (langs) and scripts for text recognition.
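To see which languages your installation offers, you can run `tesseract --list-langs`. The sketch below filters that output down to usable entries. `parseTesseractLanguages` is a hypothetical helper, and the sample header line is an assumption, since its exact wording differs between tesseract versions.

```typescript
// Hypothetical helper: turn the text printed by `tesseract --list-langs`
// into a list of usable language or script names.
function parseTesseractLanguages(output: string): string[] {
    return output
        .split(/\r?\n/)
        .map((line) => line.trim())
        // Skip blank lines and the header line ("List of available languages ...").
        .filter((line) => line.length > 0 && !line.startsWith("List of"))
        // osd is orientation and script detection, not a text language.
        .filter((line) => line !== "osd");
}

// Assumed sample output; the real header may look different.
const sample = 'List of available languages in "/usr/share/tessdata/" (3):\ndeu\neng\nosd\n';
console.log(parseTesseractLanguages(sample)); // deu and eng remain
```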
To add a custom OCR provider, create a new class that extends `OCRProvider` and register it using `OCRProviderManager.registerOCRProviders(new MyCustomProvider())`.
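As a rough illustration of that pattern, the sketch below defines stand-ins for `OCRProvider` and `OCRProviderManager` so that it runs on its own. Only the two class names and `registerOCRProviders` come from the text above. The members `getProviderName`, `performOCR` and `getByName` are assumptions, and the plugin's real abstract class may require different methods.

```typescript
// Stand-ins so the sketch is self-contained. In the plugin these classes
// already exist; their real members may differ from what is shown here.
abstract class OCRProvider {
    abstract getProviderName(): string; // hypothetical member
    abstract performOCR(imagePaths: string[]): Promise<string[]>; // hypothetical member
}

class OCRProviderManager {
    private static providers = new Map<string, OCRProvider>();

    // Register one or more providers under their names.
    static registerOCRProviders(...providers: OCRProvider[]): void {
        for (const provider of providers) {
            this.providers.set(provider.getProviderName(), provider);
        }
    }

    // Hypothetical lookup, added only to show that registration worked.
    static getByName(name: string): OCRProvider | undefined {
        return this.providers.get(name);
    }
}

class MyCustomProvider extends OCRProvider {
    getProviderName(): string {
        return "MyCustomProvider";
    }

    async performOCR(imagePaths: string[]): Promise<string[]> {
        // Placeholder: a real provider would run OCR here and return one
        // transcript per input image.
        return imagePaths.map((p) => `transcript of ${p}`);
    }
}

OCRProviderManager.registerOCRProviders(new MyCustomProvider());
```

In the plugin itself you would import the existing classes instead of redefining them, and check the abstract class for the methods it actually requires.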