Convert PDFs to rich Markdown, including images and OCR, using the Marker API.

Welcome to this Obsidian PDF to Markdown Converter! This plugin brings the power of advanced PDF conversion directly into your Obsidian vault. By leveraging the capabilities of Marker through a self-hosted API, the hosted solution on datalab.to, or the powerful MistralAI OCR capabilities, this plugin offers a seamless way to transform your PDFs into rich, formatted Markdown files, with support for tables, formulas and more!
[!IMPORTANT] This plugin needs one of the following to work: a self-hosted Marker API endpoint, a paid Datalab account, the Marker Python API, or a free MistralAI API key. Without one of these, the plugin can't convert anything.
You can find the related repositories and services here:
If you enjoy this plugin, feel free to star the repository and share it with others! If you want to support development, consider buying me a coffee:
To use this plugin, you'll need:
| Solution | Pros | Cons |
|---|---|---|
| MistralAI (recommended) | Completely free, excellent results in testing, easy setup with just an API key | Uploads your files to Mistral's servers (stored for at least 24h) |
| Hosted on datalab.to | No setup required, fast and reliable, supports the developer and is easily accessible from anywhere | Costs a few dollars |
| Self-Hosted via Docker | Full control over the conversion process, no API costs | Requires a powerful machine; setup can be complex for beginners |
| Self-Hosted via Python | Easy to set up, no Docker required | Not all features available |
[!NOTE] MistralAI Privacy Consideration: When using the MistralAI endpoint, your PDFs will be uploaded to Mistral's servers for processing. These files are stored for at least 24 hours. If you have sensitive documents, consider using a self-hosted solution instead.
You can convert PDFs to Markdown in multiple ways:
Folder Integration: If a folder with the same name as your PDF already exists, the plugin will ask if you want to integrate the new files into this existing folder. This allows you to update or add to already converted documents.
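The folder check described above can be sketched roughly as follows. This is only an illustration: `outputFolderFor` and `planConversion` are hypothetical names, and the sketch assumes the output folder sits next to the PDF and is named after it without the `.pdf` extension. The plugin's actual logic may differ.

```typescript
// Illustrative sketch only: these helper names are hypothetical, not the plugin's API.

/** Derive the output folder path by dropping the ".pdf" extension from the PDF path. */
function outputFolderFor(pdfPath: string): string {
  return pdfPath.replace(/\.pdf$/i, "");
}

/** Report the target folder and whether the user should be asked about integration. */
function planConversion(
  pdfPath: string,
  existingFolders: readonly string[]
): { folder: string; askToIntegrate: boolean } {
  const folder = outputFolderFor(pdfPath);
  // Ask only when a folder with the same name as the PDF already exists.
  return { folder, askToIntegrate: existingFolders.indexOf(folder) !== -1 };
}
```

For example, `planConversion("Papers/attention.pdf", ["Papers/attention"])` reports that the user should be asked, because `Papers/attention` already exists.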
| Setting | Default | Description |
|---|---|---|
| API Endpoint | 'selfhosted' | Select the API endpoint to use: 'Datalab', 'Selfhosted', 'Python API', or 'MistralAI' |
| Marker API Endpoint | 'localhost:8000' | The endpoint to use for the Marker API. Only shown when 'Selfhosted' is selected as the API endpoint. |
| Python API Endpoint | 'localhost:8001' | The endpoint to use for the Python API. Only shown when 'Python API' is selected as the API endpoint. |
| Datalab API Key | - | Enter your Datalab API key. Only shown when 'Datalab' is selected as the API endpoint. |
| MistralAI API Key | - | Enter your MistralAI API key. Only shown when 'MistralAI' is selected as the API endpoint. |
| Languages | 'en' | The languages to use if OCR is needed, separated by commas. Only shown when 'Datalab' is selected as the API endpoint. |
| Force OCR | false | Force OCR (activate this when auto-detect often fails; make sure to set the correct languages). Only shown when 'Datalab' is selected as the API endpoint. |
| Paginate | false | Add horizontal rules between each page. Available for both Datalab and MistralAI endpoints. |
| Image Limit | 0 | Maximum number of images to extract (0 for no limit). Only shown when 'MistralAI' is selected. |
| Image Minimum Size | 0 | Minimum height and width of images to extract (0 for no minimum). Only shown when 'MistralAI' is selected. |
| Move PDF to Folder | false | Move the PDF to the folder after conversion. |
| Create Asset Subfolder | true | Create an asset subfolder for images. |
| Extract Content | 'all' | Select the content to extract from the PDF. Options: 'Extract everything', 'Text Only', 'Images Only'. |
| Write Metadata | false | Write metadata as frontmatter in the Markdown file. |
| Delete Original PDF | false | Delete the original PDF after conversion. |
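As a rough mental model, the settings table can be read as a typed object with defaults. The sketch below is an assumption for illustration: the field names, the `ApiEndpoint` identifiers, and the `'text'` / `'images'` values are hypothetical and not taken from the plugin's source. Only the defaults and the "only shown when" rules come from the table.

```typescript
// Hypothetical shape mirroring the settings table; names are illustrative only.
type ApiEndpoint = "datalab" | "selfhosted" | "python-api" | "mistralai";

interface ConverterSettings {
  apiEndpoint: ApiEndpoint;
  markerEndpoint: string;
  pythonEndpoint: string;
  datalabApiKey: string;
  mistralApiKey: string;
  languages: string; // comma-separated, e.g. "en,de"
  forceOcr: boolean;
  paginate: boolean;
  imageLimit: number; // 0 means no limit
  imageMinSize: number; // 0 means no minimum
  movePdfToFolder: boolean;
  createAssetSubfolder: boolean;
  extractContent: "all" | "text" | "images"; // "text"/"images" are assumed values
  writeMetadata: boolean;
  deleteOriginal: boolean;
}

// Defaults as listed in the table; API keys have no default, shown here as "".
const DEFAULT_SETTINGS: ConverterSettings = {
  apiEndpoint: "selfhosted",
  markerEndpoint: "localhost:8000",
  pythonEndpoint: "localhost:8001",
  datalabApiKey: "",
  mistralApiKey: "",
  languages: "en",
  forceOcr: false,
  paginate: false,
  imageLimit: 0,
  imageMinSize: 0,
  movePdfToFolder: false,
  createAssetSubfolder: true,
  extractContent: "all",
  writeMetadata: false,
  deleteOriginal: false,
};

/** Endpoint-specific settings the table says are shown for a given endpoint. */
function visibleEndpointSettings(endpoint: ApiEndpoint): Array<keyof ConverterSettings> {
  switch (endpoint) {
    case "selfhosted":
      return ["markerEndpoint"];
    case "python-api":
      return ["pythonEndpoint"];
    case "datalab":
      return ["datalabApiKey", "languages", "forceOcr", "paginate"];
    case "mistralai":
      return ["mistralApiKey", "paginate", "imageLimit", "imageMinSize"];
  }
}
```

Settings that are not endpoint-specific, such as Move PDF to Folder or Write Metadata, are left out of `visibleEndpointSettings` because the table lists no visibility condition for them.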
This plugin wouldn't be possible without the incredible work of:
A huge thank you to these projects for their contributions to the community!
If you encounter issues related to the plugin itself, please open an issue in this repository. For problems with the conversion process or API, please refer to the Marker and Marker API repositories.
[!NOTE] When converting multiple files at once, be patient as the process can take a significant amount of time depending on the size and complexity of your PDFs. For very large batches, consider processing them in smaller groups.
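If you script your own workflow around large batches, splitting the file list into smaller groups could look like the sketch below. `chunk` is a hypothetical helper written for this example, not something the plugin provides, and the group size of 2 is arbitrary.

```typescript
// Hypothetical helper: split a list of files into groups of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (Math.floor(size) !== size || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Five PDFs in groups of two yields three groups; the last holds the remainder.
const pdfGroups = chunk(["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"], 2);
```

Each group can then be converted in turn, so one slow or failing batch does not hold up everything else.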
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
Happy converting! 📚➡️📝