Skills (new):
- skills/obsidian-markdown/ - full Obsidian Flavored Markdown syntax reference (wikilinks, embeds, callouts, properties, math, Mermaid)
- skills/obsidian-bases/ - Obsidian Bases (.base files) with correct filters/views/formulas syntax (sourced from kepano/obsidian-skills authoritative spec)
- skills/defuddle/ - web page cleaner; strips ads/nav before URL ingestion, saves 40-60% tokens on web articles
wiki-ingest upgrades:
- URL ingestion: pass https:// directly, auto-fetches + runs defuddle if available
- Image/vision ingestion: .png/.jpg/.gif etc → Claude reads → description saved to .raw/ → standard ingest pipeline
- Delta tracking: .raw/.manifest.json tracks hash per source, skips unchanged files
wiki-query upgrades:
- Quick mode (query quick:) - hot.md + index only, ~1500 tokens
- Standard mode - existing behaviour, 3-5 pages
- Deep mode (query deep:) - full wiki + optional web search supplement
hooks:
- PostToolUse auto-commit: every Write/Edit to wiki/ or .raw/ triggers git add + commit automatically, vault always versioned
fixes:
- Removed invalid allowed-tools field from all 10 SKILL.md files (not a valid skill frontmatter attribute per spec; was silently ignored)
- Canvas SKILL.md now references json-canvas open standard and kepano/obsidian-skills
wiki research:
- Ecosystem research: 16+ Claude+Obsidian projects mapped and filed
- New pages: comparisons/claude-obsidian-ecosystem, concepts/cherry-picks, entities/ (6 new), sources/claude-obsidian-ecosystem-research
- Cherry-picks roadmap filed at wiki/concepts/cherry-picks.md
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
| name | description |
|---|---|
| defuddle | Strip clutter from web pages before ingesting into the wiki. Removes ads, navigation, headers, footers, and boilerplate — leaving clean readable markdown that saves 40-60% tokens. Triggers on: defuddle, clean this page, strip this url, fetch and clean, clean web content before ingesting. |
defuddle — Web Page Cleaner
Defuddle extracts the meaningful content from a web page and drops everything else: ads, cookie banners, nav bars, related articles, footers, social sharing buttons. What remains is the article body as clean markdown.
Run defuddle before ingesting any URL. It is optional but strongly recommended: on typical web articles it cuts token usage by 40-60% and produces cleaner wiki pages.
Install
npm install -g defuddle-cli
Verify: defuddle --version
Usage
Clean a URL directly
defuddle https://example.com/article
Outputs clean markdown to stdout.
Save to .raw/
defuddle https://example.com/article > .raw/articles/article-slug-$(date +%Y-%m-%d).md
Save with a frontmatter header
To record where the content came from, write the source URL and fetch date as frontmatter ahead of the defuddle output, all in one pass:
SLUG="article-slug-$(date +%Y-%m-%d)"
URL="https://example.com/article"
{ echo "---"; echo "source_url: $URL"; echo "fetched: $(date +%Y-%m-%d)"; echo "---"; echo ""; defuddle "$URL"; } > ".raw/articles/$SLUG.md"
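The one-liner above creates the output file even when defuddle fails, which leaves a file containing only frontmatter. A minimal sketch of a safer variant follows. It assumes the same `defuddle <url>` invocation used in this document; `save_clean_article` and its arguments are hypothetical names introduced for illustration only.

```shell
# Sketch: save a cleaned article with frontmatter, creating the final file
# only if defuddle succeeds. save_clean_article is a hypothetical helper,
# not part of defuddle or /wiki-ingest.
save_clean_article() {
  local url="$1" slug="$2" out_dir="${3:-.raw/articles}"
  local today tmp out
  today="$(date +%Y-%m-%d)"
  out="$out_dir/$slug-$today.md"

  mkdir -p "$out_dir" || return 1
  tmp="$(mktemp)" || return 1

  # Fetch into a temp file first so a failed run never leaves a stub file.
  if ! defuddle "$url" > "$tmp"; then
    rm -f "$tmp"
    echo "defuddle failed for $url" >&2
    return 1
  fi

  # Frontmatter first, then a blank line, then the cleaned content.
  {
    printf -- '---\n'
    printf 'source_url: %s\n' "$url"
    printf 'fetched: %s\n' "$today"
    printf -- '---\n\n'
    cat "$tmp"
  } > "$out"
  rm -f "$tmp"

  # Print the saved path so the caller can pass it on to ingest.
  printf '%s\n' "$out"
}
```

Calling `save_clean_article https://example.com/article article-slug` would write to `.raw/articles/` under a dated file name and print that path on stdout.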
Clean a local HTML file
defuddle page.html
When to Use
Use defuddle when:
- Ingesting a news article, blog post, or documentation page from a URL
- The page has a lot of surrounding content (most web pages do)
- You want to stay within token budget on a long article
Skip defuddle when:
- The source is already a clean markdown or PDF file
- The page is a dashboard, app, or structured data (defuddle expects article-style content)
- defuddle is not installed and the article is short enough to process raw
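The file-type part of these rules can be sketched as a small shell check. This is a rough heuristic under stated assumptions: `needs_defuddle` is a hypothetical helper, it looks only at the URL scheme and file extension, and it cannot tell a dashboard or app from an article, so that judgment still has to be made by hand.

```shell
# Sketch: decide from the source name alone whether cleaning is worthwhile.
# needs_defuddle is a hypothetical helper; returns 0 (yes) or 1 (skip).
needs_defuddle() {
  case "$1" in
    # Already clean markdown or a PDF: skip, even when served over HTTP.
    *.md|*.markdown|*.pdf) return 1 ;;
    # Web pages and local HTML files are the intended input.
    http://*|https://*|*.html|*.htm) return 0 ;;
    # Anything else: leave as-is.
    *) return 1 ;;
  esac
}
```

For example, `needs_defuddle https://example.com/article && echo "clean it first"` prints the message, while a path such as `notes.md` does not.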
Fallback
To check whether defuddle is installed:
command -v defuddle 2>/dev/null || echo "not installed"
If it is not installed, use WebFetch directly. The content will be less clean but still workable.
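The check and the fallback can be combined into one step. The sketch below assumes the `defuddle <url>` invocation used in this document; `fetch_clean` is a hypothetical name. WebFetch is an agent tool, not a shell command, so the function only signals through its exit status that the caller should fall back to it.

```shell
# Sketch: clean a URL with defuddle when it is available; otherwise return
# a distinct status so the caller knows to fall back to WebFetch.
# fetch_clean is a hypothetical helper name.
fetch_clean() {
  local url="$1"
  if command -v defuddle >/dev/null 2>&1; then
    defuddle "$url"
  else
    echo "defuddle not installed; fall back to WebFetch for $url" >&2
    return 127
  fi
}
```

Exit status 127 mirrors the shell's own "command not found" convention, which keeps a missing install distinguishable from a defuddle run that failed for another reason.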
Integration with /wiki-ingest
The /wiki-ingest skill checks for defuddle automatically when a URL is passed. You do not need to run defuddle manually before ingesting a URL — the ingest skill will call it if available.
To clean and save a page manually before ingesting:
- Run the save command from "Save to .raw/" above
- Then ingest the saved file:
ingest .raw/articles/[slug].md