Axelle Abbadie · 52 downloads
Pseudonymize and correct interactional transcripts (Jefferson, ICOR, SRT, CHAT/CHA). Designed for qualitative researchers in linguistics and conversation analysis.
An Obsidian plugin for pseudonymizing and correcting interactional transcripts, intended for researchers in linguistics and conversation analysis. It fills a gap: existing tools (Sonal, Whispurge) do not integrate into a note-taking and analysis environment, and few applications support multimodal transcription conventions (Jefferson, ICOR).
Français — README.fr.md
noScribe (.html, .vtt) or raw transcript (.srt, .cha, .md, .txt)
↓ automatic import (audio file imported alongside)
Obsidian — native Markdown editing (**S00** [HH:MM:SS] : text)
↓ pseudonymization (manual · NER · dictionary scan)
Annotated source file + correspondence table + word timestamps (.words.json)
↓ export
Pseudonymized transcript (.pseudonymized.md / .pseudonymized.vtt)
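The Markdown turn format shown in the pipeline (`**S00** [HH:MM:SS] : text`) can be read back with a small parser. The following is a minimal sketch, not the plugin's actual code: the `Turn` interface, the `TURN_LINE` pattern and the `parseTurn` function are hypothetical names, and the sketch assumes two-digit hours, minutes and seconds.

```typescript
// Hypothetical shape for one transcript turn; not the plugin's actual type.
interface Turn {
  speaker: string;
  seconds: number; // timestamp converted to seconds from the start
  text: string;
}

// Matches lines such as "**S00** [00:01:23] : bonjour".
const TURN_LINE = /^\*\*(.+?)\*\*\s*\[(\d{2}):(\d{2}):(\d{2})\]\s*:\s?(.*)$/;

// Returns the parsed turn, or null when the line is not a turn line.
function parseTurn(line: string): Turn | null {
  const m = TURN_LINE.exec(line);
  if (m === null) return null;
  const [, speaker, hh, mm, ss, text] = m;
  return {
    speaker,
    seconds: Number(hh) * 3600 + Number(mm) * 60 + Number(ss),
    text,
  };
}
```

Lines that do not follow the turn format (headers, blank lines, comments) simply yield `null`, so a caller can keep them untouched.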
Two approaches, freely combined:
Pseudonyms are wrapped in {{...}} markers in both the source file and the exports, so that they remain visually distinct from raw data. Correspondence tables are never included in exports.
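As an illustration of the marker convention, here is a minimal sketch of applying a set of rules to a text and wrapping each pseudonym in `{{...}}`. It is not the plugin's engine: the `Rule` interface and the `pseudonymize` function are hypothetical, and the whole-word matching based on Unicode letters is an assumption.

```typescript
// Hypothetical minimal rule; the plugin's MappingRule type is likely richer.
interface Rule {
  source: string;      // term found in the raw transcript
  replacement: string; // pseudonym to write instead
}

// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function pseudonymize(text: string, rules: Rule[]): string {
  if (rules.length === 0) return text;
  // Longest sources first, so that at a given position the longest rule is tried first.
  const ordered = [...rules].sort((a, b) => b.source.length - a.source.length);
  const bySource = new Map<string, string>();
  for (const r of ordered) bySource.set(r.source, r.replacement);
  const alternatives = ordered.map((r) => escapeRegExp(r.source)).join("|");
  // Lookarounds reject matches glued to a letter or digit ("Jean" inside "Jeanne").
  const pattern = new RegExp(
    "(?<![\\p{L}\\p{N}])(?:" + alternatives + ")(?![\\p{L}\\p{N}])",
    "gu",
  );
  // Single pass: text that was already replaced is never scanned again.
  return text.replace(pattern, (match) => "{{" + (bySource.get(match) ?? match) + "}}");
}
```

With the rules `Jean → Paul` and `Saint-Jean-de-Luz → Ville_1`, the place name is replaced as a whole and the `Jean` inside it is left alone, because the regular expression consumes the compound before it can reach the embedded first name.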
A setup wizard opens automatically on first load:
The wizard can be relaunched at any time: Settings → Pseudonymizer Tool → Setup wizard.
| Format | Extension | Notes |
|---|---|---|
| noScribe HTML | .html | Qt Rich Text from noScribe: speaker labels, word timestamps, audio path |
| noScribe VTT | .vtt | noScribe v0.7 output; also standard Whisper WebVTT with word timestamps |
| Timestamped subtitles | .srt | Whisper / AI output; timestamps and structure preserved |
| CHAT / CLAN | .cha, .chat | @, *, % lines preserved |
| Annotated Markdown | .md | Jefferson or ICOR conventions |
| Plain text | .txt | No convention markers |
All formats are automatically converted to Markdown on import. Alongside the .md, the plugin creates:
- <basename>.mapping.json: pseudonymization rules
- <basename>.words.json: word-level timestamps (noScribe / Whisper only), used for VTT re-export

Install the Data Files Editor plugin to view mapping JSON files directly in your vault.
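The naming scheme for the companion files can be sketched as follows. This only illustrates the `<basename>` convention described above; the `Sidecars` interface and `sidecarPaths` function are hypothetical, and the sketch assumes the companion files sit in the same folder as the converted Markdown file.

```typescript
// Hypothetical container for the paths derived from one imported transcript.
interface Sidecars {
  markdown: string; // converted transcript
  mapping: string;  // pseudonymization rules
  words: string;    // word-level timestamps (noScribe / Whisper only)
}

function sidecarPaths(importedPath: string): Sidecars {
  const slash = importedPath.lastIndexOf("/");
  const dot = importedPath.lastIndexOf(".");
  // Strip the extension only if the dot belongs to the file name, not to a folder or a dotfile.
  const base = dot > slash + 1 ? importedPath.slice(0, dot) : importedPath;
  return {
    markdown: base + ".md",
    mapping: base + ".mapping.json",
    words: base + ".words.json",
  };
}
```

For `corpus/interview01.vtt`, this yields `corpus/interview01.md`, `corpus/interview01.mapping.json` and `corpus/interview01.words.json`.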
Via the Obsidian community plugins directory (under review):
Manual installation from the latest GitHub release:
1. Download main.js, manifest.json and styles.css.
2. Copy them to .obsidian/plugins/pseudonymizer-tool/ in your vault.

WASM files (NER) are downloaded automatically by the wizard on first launch if you enable automatic detection. They can also be downloaded manually from the release.
All actions are available via right-click on a selection in the editor:
| Action | Description |
|---|---|
| Pseudonymize | Replace the term (this occurrence or all) with {{...}} markers |
| Pseudonymize with Prof. Baptiste Coulmont | Queries coulmont.com — suggests sociologically equivalent first names (same social background, same decade) |
| Create a rule | Saves the correspondence in the mapping JSON without modifying the text |
| Edit rule | Modifies an existing rule (available on orange or green highlighted terms) |
| Cancel pseudonymization | Restores the original term for this occurrence (available on green terms) |
The named entity recognition engine detects first names, surnames, places and institutions without a pre-existing list, using syntactic context. Unlike a dictionary approach, it distinguishes "Florence" the person from "Florence" the city. Sub-terms of a known compound entity are automatically filtered (if "Saint-Jean-de-Luz" is a rule, "Jean" and "Luz" do not appear as NER candidates).
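The sub-term filter can be pictured with a short sketch. This is an assumption about how such a filter might work, not the plugin's implementation: `subTerms` and `filterCandidates` are hypothetical names, compounds are split on spaces, hyphens and apostrophes, and the comparison ignores case.

```typescript
// Splits a compound such as "Saint-Jean-de-Luz" into its parts.
function subTerms(entity: string): string[] {
  return entity.split(/[\s\-']+/).filter((part) => part.length > 0);
}

// Removes candidates that are merely a part of an already known compound entity.
function filterCandidates(candidates: string[], knownEntities: string[]): string[] {
  const blocked = new Set<string>();
  for (const entity of knownEntities) {
    const parts = subTerms(entity);
    if (parts.length < 2) continue; // a single word is not a compound
    for (const part of parts) blocked.add(part.toLowerCase());
  }
  return candidates.filter((c) => !blocked.has(c.toLowerCase()));
}
```

With "Saint-Jean-de-Luz" as a known rule, the candidates "Jean" and "Luz" are dropped while "Florence" is kept. This sketch filters globally; a real implementation would probably check that the candidate actually occurs inside the compound at that position, so that a speaker named Jean elsewhere in the transcript is still detected.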
Usage:
Model: Xenova/bert-base-multilingual-cased-ner-hrl via transformers.js. 100% local execution. One-time downloads on first use: WASM (19 MB) + NER model (66 MB). Works offline after the first download.
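Token-classification models of this kind typically return one prediction per token, tagged in the BIO scheme, which then has to be merged into entity candidates and filtered by confidence. The sketch below shows one way to do this under stated assumptions: `TokenPrediction`, `Candidate` and `mergeTokens` are hypothetical names, the input shape is a simplified stand-in for the model output, and the score of a candidate is taken to be the lowest score among its tokens.

```typescript
// Simplified, assumed shape of one model prediction.
interface TokenPrediction {
  word: string;   // token text; a "##" prefix marks a word-piece continuation
  entity: string; // BIO tag such as "B-PER", "I-LOC" or "O"
  score: number;  // model confidence between 0 and 1
}

interface Candidate {
  text: string;
  type: string;  // e.g. "PER" or "LOC"
  score: number; // lowest token score in the span
}

function mergeTokens(tokens: TokenPrediction[], threshold: number): Candidate[] {
  const merged: Candidate[] = [];
  let current: Candidate | null = null;
  for (const tok of tokens) {
    const prefix = tok.entity.slice(0, 1); // "B", "I" or "O"
    const type = tok.entity.slice(2);      // "" for "O"
    const isPiece = tok.word.startsWith("##");
    if (current !== null && (isPiece || (prefix === "I" && type === current.type))) {
      // Continue the open entity: glue word pieces, space-separate whole tokens.
      current.text += isPiece ? tok.word.slice(2) : " " + tok.word;
      current.score = Math.min(current.score, tok.score);
    } else {
      if (current !== null) merged.push(current);
      current = prefix === "B" || prefix === "I"
        ? { text: tok.word, type, score: tok.score }
        : null;
    }
  }
  if (current !== null) merged.push(current);
  // Keep only candidates at or above the confidence threshold.
  return merged.filter((c) => c.score >= threshold);
}
```

Raising the threshold trades recall for precision: fewer false candidates are shown, at the cost of missing some names. Rebuilding the text by joining tokens with spaces is a simplification; working from character offsets in the original text would preserve hyphens and apostrophes exactly.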
Settings (NER tab in the panel):
Accessible via the ribbon icon or Ctrl+P → Pseudonymization: open panel.
| Tab | Content |
|---|---|
| Mappings | Active rules · Edit · Delete · Add · Scan file · Exceptions section |
| Dictionaries | Mini cards · Dictionary scan · Local import |
| Exports | Create pseudonymized version · Export correspondence table · Re-export as VTT/SRT/CHAT |
| NER | Visible if NER enabled · Identify candidates · Confidence threshold · Function words |
| Corpus | Files by class · Class management · Move files · Final export destination |
Highlighting is active in all open files, including .pseudonymized.* export files, which automatically inherit rules from the source file.
| Colour | Meaning |
|---|---|
| 🟠 Orange + outline | Source term still present — to be pseudonymized |
| 🟢 Green + underline | Pseudonym applied directly in the file |
| 🔵 Blue + outline | NER candidate — no rule yet |
| 🔴 Red + underline | Exception — specific occurrence explicitly ignored (context-aware; persisted in mapping) |
In exported files, pseudonyms are wrapped in {{Pierre}} markers to distinguish them from raw data (enabled by default, configurable in settings).
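The context-aware exceptions (red highlighting) rely on a record of the ignored text together with its surroundings. A minimal sketch of how such a check might look is given below. The field names come from the `IgnoredOccurrence` type mentioned in this README; the `isIgnored` function and its exact matching logic are assumptions.

```typescript
// Fields named in the README; the matching logic below is an assumption.
interface IgnoredOccurrence {
  text: string;
  contextBefore: string;
  contextAfter: string;
}

// Returns true when the occurrence at [start, end) matches a stored exception,
// i.e. same text and same immediate context on both sides.
function isIgnored(
  doc: string,
  start: number,
  end: number,
  ignored: IgnoredOccurrence[],
): boolean {
  const text = doc.slice(start, end);
  return ignored.some(
    (occ) =>
      occ.text === text &&
      doc.slice(0, start).endsWith(occ.contextBefore) &&
      doc.slice(end).startsWith(occ.contextAfter),
  );
}
```

Because the exception is identified by its context rather than by a character offset, it keeps pointing at the same occurrence when text is inserted or deleted elsewhere in the file.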
- file.mapping.json · folder.mapping.json · vault.mapping.json
- IgnoredOccurrence {text, contextBefore, contextAfter} stored in the rule
- Saint-Jean-de-Luz > Jean)
- SPECS.md §5

Dictionaries provide replacement candidates and feed automatic detection. They are hosted in a dedicated repository and downloaded into your vault; no transcript text ever leaves Obsidian.
Installing from the catalogue:
Usage:
Ville_1, Métropole_2…)

Available dictionary: French communes (GeoAPI INSEE)
Dedicated repository: core-hn/pseudobsidian-dictionaries — contributions welcome.
The plugin helps you structure your corpus in folders before starting work. Use the command Organize corpus (Ctrl+P) to define classes (sub-folders). Each class automatically creates mirrored folders for transcriptions, mapping tables and exports. When adding a transcription, you are prompted to select a class.
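The mirrored folder layout can be illustrated with a short sketch. The root folder names used here (`transcriptions`, `mappings`, `exports`) are assumptions chosen for the example, as is the `classFolders` function; the plugin may name and nest these folders differently.

```typescript
// Assumed root folders, one per kind of file kept for each class.
const ROOTS = ["transcriptions", "mappings", "exports"] as const;

// Returns the folders that would mirror one class under each root.
function classFolders(className: string): string[] {
  // Replace characters that are not allowed in file names on common systems.
  const safe = className.trim().replace(/[\\/:*?"<>|]/g, "_");
  return ROOTS.map((root) => root + "/" + safe);
}
```

A class named "Interviews 2024" would thus map to `transcriptions/Interviews 2024`, `mappings/Interviews 2024` and `exports/Interviews 2024`.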
We recommend one Obsidian vault per corpus. This keeps mapping files, dictionaries and exports together with their transcriptions, and makes archiving or sharing a corpus straightforward.
coulmont.com/bac to suggest equivalent names. B. Coulmont states on his website that searches are not logged.

```bash
git clone https://gitlab.huma-num.fr/aabbadie/pseudobsidian-ization.git
cd pseudobsidian-ization
npm install
npm run dev            # watch build (esbuild)
npm test               # Jest test suite
npm run build          # production build
npm run deploy         # build + copy to test_vault/
npm run build:cities   # regenerate assets/cities.dict.json from GeoAPI INSEE
```
Repository structure:
```
src/
├── main.ts          # Obsidian entry point
├── settings.ts      # Persistent settings
├── types.ts         # Shared types (MappingRule, IgnoredOccurrence, …)
├── i18n/            # Internationalization (en, fr)
├── parsers/         # SrtParser, ChatParser, VttParser,
│                    # NoScribeHtmlParser, NoScribeVttParser, TranscriptConverter
├── mappings/        # MappingStore, ScopeResolver
├── pseudonymizer/   # Engine, ReplacementPlanner, SpanProtector, Redaction
├── scanner/         # OccurrenceScanner, OnnxNerScanner
├── dictionaries/    # DictionaryLoader
└── ui/              # PseudonymizationView, modals (incl. OccurrencesContextModal),
                     # CM6 highlighting (PseudonymHighlighter)
```
| Phase | Status | Description |
|---|---|---|
| 0–6 | ✅ | Parsers · Engine · Commands · Scopes · Highlighting · Validation |
| 7 — Coulmont | ✅ | Equivalent first name suggestions · JSON/CSV import |
| 8 — Side panel | ✅ | 3 tabs · Embedded NER · Wizard · Cancellation · Export highlighting |
| 9 — Structured dictionaries | ✅ | Format v1.1 · DictionaryLoader · Dictionary scan · Review modal · French communes |
| 10 — Refinement & noScribe | 🔄 | i18n · Corpus UI · noScribe import · per-occurrence scan · exceptions · rename cascade · export destination |
| 11 — EMCA functions | ⏳ | Turn navigation · Jefferson/ICOR correction · ELAN export |
See ROADMAP.md for the full phase breakdown and planned features.
Contributions are welcome, particularly:
Please open an issue before submitting a pull request for significant features.
GPL 3.0
The Beerware License (Revision 42)
Axelle Abbadie wrote this code. You can do whatever you want with it
as long as you keep this notice. If we meet someday, and you think
it was worth it, you can buy me a beer.
This plugin is made to be modified. If your fieldwork involves particular transcription conventions, a regional dialect, multilingual corpora or institution-specific export formats, adapt the code to your needs.
- bert-base-multilingual-cased-ner-hrl, used for automatic entity detection.
- transformers.js library, enabling local execution in Obsidian without a Python dependency.