personal-wiki/skills/wiki-ingest/SKILL.md
Daniel 3ad61b984d feat: v1.1 — URL ingestion, vision, delta tracking, 3 new skills, auto-commit
Skills (new):
- skills/obsidian-markdown/ — full Obsidian Flavored Markdown syntax reference
  (wikilinks, embeds, callouts, properties, math, Mermaid)
- skills/obsidian-bases/ — Obsidian Bases (.base files) with correct filters/views/
  formulas syntax (sourced from kepano/obsidian-skills authoritative spec)
- skills/defuddle/ — web page cleaner; strips ads/nav before URL ingestion,
  saves 40-60% tokens on web articles

wiki-ingest upgrades:
- URL ingestion: pass https:// directly, auto-fetches + runs defuddle if available
- Image/vision ingestion: .png/.jpg/.gif etc → Claude reads → description saved
  to .raw/ → standard ingest pipeline
- Delta tracking: .raw/.manifest.json tracks hash per source, skips unchanged files

wiki-query upgrades:
- Quick mode (query quick:) — hot.md + index only, ~1500 tokens
- Standard mode — existing behaviour, 3-5 pages
- Deep mode (query deep:) — full wiki + optional web search supplement

hooks:
- PostToolUse auto-commit: every Write/Edit to wiki/ or .raw/ triggers
  git add + commit automatically, vault always versioned

fixes:
- Removed invalid allowed-tools field from all 10 SKILL.md files
  (not a valid skill frontmatter attribute per spec; was silently ignored)
- Canvas SKILL.md now references json-canvas open standard and kepano/obsidian-skills

wiki research:
- Ecosystem research: 16+ Claude+Obsidian projects mapped and filed
- New pages: comparisons/claude-obsidian-ecosystem, concepts/cherry-picks,
  entities/ (6 new), sources/claude-obsidian-ecosystem-research
- Cherry-picks roadmap filed at wiki/concepts/cherry-picks.md

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:25:00 +03:00

---
name: wiki-ingest
description: "Ingest sources into the Obsidian wiki vault. Reads a source, extracts entities and concepts, creates or updates wiki pages, cross-references, and logs the operation. Supports files, URLs, and batch mode. Triggers on: ingest, process this source, add this to the wiki, read and file this, batch ingest, ingest all of these, ingest this url."
---
# wiki-ingest — Source Ingestion
Read the source. Write the wiki. Cross-reference everything. A single source typically touches 8-15 wiki pages.
**Syntax standard**: Write all Obsidian Markdown using proper Obsidian Flavored Markdown — wikilinks as `[[Note Name]]`, callouts as `> [!type] Title`, embeds as `![[file]]`, properties as YAML frontmatter. If kepano/obsidian-skills is installed, its `obsidian-markdown` skill is the authoritative syntax reference.
---
## Delta Tracking
Before ingesting any file, check `.raw/.manifest.json` to avoid re-processing unchanged sources.
```bash
# Check if manifest exists
[ -f .raw/.manifest.json ] && echo "exists" || echo "no manifest yet"
```
**Manifest format** (create if missing):
```json
{
"sources": {
".raw/articles/article-slug-2026-04-08.md": {
"hash": "abc123",
"ingested_at": "2026-04-08",
"pages_created": ["wiki/sources/article-slug.md", "wiki/entities/Person.md"],
"pages_updated": ["wiki/index.md"]
}
}
}
```
**Before ingesting a file:**
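1. Compute a hash: `md5sum [file] | cut -d' ' -f1` (or `sha256sum`; on macOS systems without these commands, `md5 -q [file]` or `shasum -a 256 [file] | cut -d' ' -f1`). Use the same algorithm on every run so stored and fresh hashes stay comparable.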
2. Check if the path exists in `.manifest.json` with the same hash.
3. If hash matches — skip. Report: "Already ingested (unchanged). Use `force` to re-ingest."
4. If missing or hash differs — proceed with ingest.
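The pre-ingest check above can be sketched in shell. This is a minimal sketch, assuming `jq` and `sha256sum` are installed; the function name `needs_ingest`, the `MANIFEST` variable, and the optional `force` argument are illustrative and not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide whether a source file needs (re-)ingesting.
# Assumes jq and sha256sum are available; the manifest path follows the text above.
MANIFEST="${MANIFEST:-.raw/.manifest.json}"

# Exit status 0 = ingest needed, 1 = unchanged (skip).
needs_ingest() {
  local file="$1" mode="${2:-}" current recorded
  # "force" bypasses the delta check entirely.
  [ "$mode" = "force" ] && return 0
  # No manifest yet: every source counts as new.
  [ -f "$MANIFEST" ] || return 0
  current="$(sha256sum "$file" | cut -d' ' -f1)"
  # Look up the recorded hash; prints nothing when the path is not in the manifest.
  recorded="$(jq -r --arg p "$file" '.sources[$p].hash // empty' "$MANIFEST")"
  [ "$current" != "$recorded" ]
}
```

A caller might use it as `needs_ingest "$file" || echo "Already ingested (unchanged). Use force to re-ingest."`.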
**After ingesting a file:**
1. Record `{hash, ingested_at, pages_created, pages_updated}` in `.manifest.json`.
2. Write the updated manifest back.
Skip delta checking if the user says "force ingest" or "re-ingest".
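Recording an ingest in the manifest could look like the following. It is a sketch under stated assumptions: `jq` and `sha256sum` are installed, the page lists are passed as JSON array strings, and `record_ingest` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: record one ingested source in the manifest.
# Assumes jq and sha256sum; page lists arrive as JSON arrays, e.g. '["wiki/index.md"]'.
MANIFEST="${MANIFEST:-.raw/.manifest.json}"

record_ingest() {
  local file="$1" created="$2" updated="$3" hash tmp status=0
  hash="$(sha256sum "$file" | cut -d' ' -f1)"
  # Create an empty manifest on first use.
  mkdir -p "$(dirname "$MANIFEST")"
  [ -f "$MANIFEST" ] || echo '{"sources": {}}' > "$MANIFEST"
  tmp="$(mktemp)"
  # Build the new manifest in a temp file so a failed jq run never truncates the original.
  jq --arg p "$file" --arg h "$hash" --arg d "$(date +%F)" \
     --argjson c "$created" --argjson u "$updated" \
     '.sources[$p] = {hash: $h, ingested_at: $d, pages_created: $c, pages_updated: $u}' \
     "$MANIFEST" > "$tmp" && cat "$tmp" > "$MANIFEST" || status=1
  rm -f "$tmp"
  return "$status"
}
```

For example: `record_ingest ".raw/articles/article-slug-2026-04-08.md" '["wiki/sources/article-slug.md"]' '["wiki/index.md"]'`.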
---
## URL Ingestion
Trigger: user passes a URL starting with `https://`.
Steps:
1. **Fetch** the page using WebFetch.
2. **Clean** (optional): if `defuddle` is available (`which defuddle 2>/dev/null`), run `defuddle parse [url] --md` to strip ads, nav, and clutter, which typically saves 40-60% of tokens. Fall back to raw WebFetch output if it is not installed.
3. **Derive slug** from the URL path (last segment, lowercased, spaces→hyphens, strip query strings).
4. **Save** to `.raw/articles/[slug]-[YYYY-MM-DD].md` with a frontmatter header:
```markdown
---
source_url: [url]
fetched: [YYYY-MM-DD]
---
```
5. Proceed with **Single Source Ingest** starting at step 2 (file is now in `.raw/`).
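Steps 3 and 4 can be sketched in shell. The helper names `url_slug` and `save_article` are hypothetical, the already-fetched page body is assumed to sit in a local file, and stripping `#fragment` suffixes and `%20` escapes is an added assumption beyond what the steps state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a slug from a URL (step 3).
url_slug() {
  local url="$1" slug
  url="${url%%[?#]*}"   # strip query string (and fragment, an added assumption)
  url="${url%/}"        # drop a single trailing slash
  slug="${url##*/}"     # keep the last path segment
  # Lowercase, then turn spaces (literal or %20-encoded) into hyphens.
  printf '%s\n' "$slug" | tr '[:upper:]' '[:lower:]' | sed -e 's/%20/-/g' -e 's/ /-/g'
}

# Hypothetical helper: write frontmatter plus the fetched body to .raw/articles/ (step 4).
# Prints the path of the saved file.
save_article() {
  local url="$1" body_file="$2" today out
  today="$(date +%F)"
  out=".raw/articles/$(url_slug "$url")-$today.md"
  mkdir -p .raw/articles
  {
    printf -- '---\nsource_url: %s\nfetched: %s\n---\n' "$url" "$today"
    cat "$body_file"
  } > "$out"
  printf '%s\n' "$out"
}
```

Here `url_slug 'https://example.com/blog/My%20Post?utm=1'` prints `my-post`.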
---
## Image / Vision Ingestion
Trigger: user passes an image file path (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.svg`, `.avif`).
Steps:
1. **Read** the image file using the Read tool — Claude can process images natively.
2. **Describe** the image contents: extract all text (OCR), identify key concepts, entities, diagrams, and data visible in the image.
3. **Save** the description to `.raw/images/[slug]-[YYYY-MM-DD].md`:
```markdown
---
source_type: image
original_file: [original path]
fetched: [YYYY-MM-DD]
---
# Image: [slug]
[Full description of image contents, transcribed text, entities visible, etc.]
```
4. Copy the image to `_attachments/images/[slug].[ext]` if it's not already in the vault.
5. Proceed with **Single Source Ingest** on the saved description file.
Use cases: whiteboard photos, screenshots, diagrams, infographics, document scans.
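The file-handling parts of this flow (recognizing the trigger and step 4) might be sketched as below. The description itself is produced by the model, not by a script; `is_image` and `copy_attachment` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the path has one of the image extensions listed above.
is_image() {
  local ext="${1##*.}"
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    png|jpg|jpeg|gif|webp|svg|avif) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: copy the image into the vault unless that name already exists (step 4).
# Prints the destination path.
copy_attachment() {
  local src="$1" slug="$2" ext dest
  ext="${src##*.}"
  dest="_attachments/images/$slug.$ext"
  mkdir -p _attachments/images
  [ -e "$dest" ] || cp "$src" "$dest"
  printf '%s\n' "$dest"
}
```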
---
## Single Source Ingest
Trigger: user drops a file into `.raw/` or pastes content.
Steps:
1. **Read** the source completely. Do not skim.
2. **Discuss** key takeaways with the user. Ask: "What should I emphasize? How granular?" Skip this if the user says "just ingest it."
3. **Create** source summary in `wiki/sources/`. Use the source frontmatter schema from `references/frontmatter.md`.
4. **Create or update** entity pages for every person, org, product, and repo mentioned. One page per entity.
5. **Create or update** concept pages for significant ideas and frameworks.
6. **Update** relevant domain page(s) and their `_index.md` sub-indexes.
7. **Update** `wiki/overview.md` if the big picture changed.
8. **Update** `wiki/index.md`. Add entries for all new pages.
9. **Update** `wiki/hot.md` with this ingest's context.
10. **Prepend** a new entry to `wiki/log.md` (newest entries go at the TOP):
```markdown
## [YYYY-MM-DD] ingest | Source Title
- Source: `.raw/articles/filename.md`
- Summary: [[Source Title]]
- Pages created: [[Page 1]], [[Page 2]]
- Pages updated: [[Page 3]], [[Page 4]]
- Key insight: One sentence on what is new.
```
11. **Check for contradictions.** If new info conflicts with existing pages, add `> [!contradiction]` callouts on both pages.
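Because step 10 puts new log entries at the top of the file rather than the bottom, a plain `>>` append is not enough. A minimal sketch, assuming the log contains only entries (no title header that must stay first) and using the hypothetical name `prepend_log`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: put a new entry at the top of the log, above older entries.
prepend_log() {
  local log="$1" entry="$2" tmp
  tmp="$(mktemp)"
  {
    # New entry first, followed by a blank separator line.
    printf '%s\n\n' "$entry"
    # Then whatever the log already held, if it exists.
    if [ -f "$log" ]; then cat "$log"; fi
  } > "$tmp"
  # Copy back with cat so the log keeps its original permissions.
  cat "$tmp" > "$log"
  rm -f "$tmp"
}
```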
---
## Batch Ingest
Trigger: user drops multiple files or says "ingest all of these."
Steps:
1. List all files to process. Confirm with user before starting.
2. Process each source following the single ingest flow, but skip its index, hot cache, and log updates (those happen once, in step 4). Defer cross-referencing between sources until step 3.
3. After all sources: do a cross-reference pass. Look for connections between the newly ingested sources.
4. Update index, hot cache, and log once at the end (not per-source).
5. Report: "Processed N sources. Created X pages, updated Y pages. Here are the key connections I found."
Batch ingest is less interactive. For 30+ sources, expect significant processing time. Check in with the user after every 10 sources.
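The pacing of a batch run (a check-in after every 10 sources, one summary at the end) can be illustrated with a small driver loop. This is only a sketch: `ingest_one` is a placeholder standing in for the single ingest flow, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Placeholder for the single-source flow; a real run would do the actual ingest here.
ingest_one() { printf 'ingested %s\n' "$1"; }

# Hypothetical batch driver: process every argument, pausing for a check-in every 10 sources.
batch_ingest() {
  local count=0 file
  for file in "$@"; do
    ingest_one "$file"
    count=$((count + 1))
    if [ $((count % 10)) -eq 0 ]; then
      printf 'checkpoint after %d sources\n' "$count"
    fi
  done
  # Index, hot cache, and log updates would happen once here, not per source.
  printf 'Processed %d sources.\n' "$count"
}
```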
---
## Context Window Discipline
Token budget matters. Follow these rules during ingest:
- Read `wiki/hot.md` first. If it contains the relevant context, don't re-read full pages.
- Read `wiki/index.md` to find existing pages before creating new ones.
- Read only 3-5 existing pages per ingest. If you need 10+, you are reading too broadly.
- Use PATCH for surgical edits. Never re-read an entire file just to update one field.
- Keep wiki pages short. 100-300 lines max. If a page grows beyond 300 lines, split it.
- Use search (`/search/simple/`) to find specific content without reading full pages.
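The 300-line ceiling is easy to audit mechanically. A minimal sketch, assuming pages live under `wiki/` as `.md` files; the function name `oversized_pages` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical audit: list wiki pages longer than a line limit (300 per the rule above).
# Prints "path<TAB>line-count" for each offending page.
oversized_pages() {
  local root="${1:-wiki}" limit="${2:-300}" page lines
  find "$root" -type f -name '*.md' | sort | while IFS= read -r page; do
    # Arithmetic expansion strips any padding that wc adds on some platforms.
    lines=$(( $(wc -l < "$page") ))
    if [ "$lines" -gt "$limit" ]; then
      printf '%s\t%d\n' "$page" "$lines"
    fi
  done
}
```

Any page it reports is a candidate for splitting.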
---
## Contradictions
When new info contradicts an existing wiki page:
On the existing page, add:
```markdown
> [!contradiction] Conflict with [[New Source]]
> [[Existing Page]] claims X. [[New Source]] says Y.
> Needs resolution — check dates, context, and primary sources.
```
On the new source summary, reference it:
```markdown
> [!contradiction] Contradicts [[Existing Page]]
> This source says Y, but existing wiki says X. See [[Existing Page]] for details.
```
Do not silently overwrite old claims. Flag and let the user decide.
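Flagging rather than overwriting means the callout is appended and the existing text is left untouched. A sketch of that for the existing page, with the hypothetical name `flag_contradiction` and the claim texts passed in by the caller:

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a contradiction callout to an existing wiki page.
# Existing content is never rewritten; the callout is added at the end.
flag_contradiction() {
  local page_file="$1" page_name="$2" source_name="$3" old_claim="$4" new_claim="$5"
  {
    printf '\n> [!contradiction] Conflict with [[%s]]\n' "$source_name"
    printf '> [[%s]] claims %s. [[%s]] says %s.\n' \
      "$page_name" "$old_claim" "$source_name" "$new_claim"
    printf '> Needs resolution: check dates, context, and primary sources.\n'
  } >> "$page_file"
}
```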
---
## What Not to Do
- Do not modify anything in `.raw/`. These are immutable source documents.
- Do not create duplicate pages. Always check the index and search before creating.
- Do not skip the log entry. Every ingest must be recorded.
- Do not skip the hot cache update. It is what keeps future sessions fast.
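The duplicate check can be approximated with a case-insensitive search of the index before creating a page. This sketch assumes the index lists pages as wikilinks (`[[Name]]` or `[[Name|alias]]`); the function name `page_exists` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the index already links to a page with this name.
# Matches both [[Name]] and aliased [[Name|alias]] links, ignoring case.
page_exists() {
  local name="$1" index="${2:-wiki/index.md}"
  grep -qiF -e "[[${name}]]" -e "[[${name}|" "$index" 2>/dev/null
}
```

A caller might guard page creation with `page_exists "Ada Lovelace" || echo "safe to create"`.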