feat: v1.1 — URL ingestion, vision, delta tracking, 3 new skills, auto-commit

Skills (new):
- skills/obsidian-markdown/ - full Obsidian Flavored Markdown syntax reference (wikilinks, embeds, callouts, properties, math, Mermaid)
- skills/obsidian-bases/ - Obsidian Bases (.base files) with correct filters/views/formulas syntax (sourced from kepano/obsidian-skills authoritative spec)
- skills/defuddle/ - web page cleaner; strips ads/nav before URL ingestion, saves 40-60% tokens on web articles

wiki-ingest upgrades:
- URL ingestion: pass https:// directly, auto-fetches + runs defuddle if available
- Image/vision ingestion: .png/.jpg/.gif etc → Claude reads → description saved to .raw/ → standard ingest pipeline
- Delta tracking: .raw/.manifest.json tracks hash per source, skips unchanged files

wiki-query upgrades:
- Quick mode (query quick:) - hot.md + index only, ~1500 tokens
- Standard mode - existing behaviour, 3-5 pages
- Deep mode (query deep:) - full wiki + optional web search supplement

hooks:
- PostToolUse auto-commit: every Write/Edit to wiki/ or .raw/ triggers git add + commit automatically, vault always versioned

fixes:
- Removed invalid allowed-tools field from all 10 SKILL.md files (not a valid skill frontmatter attribute per spec; was silently ignored)
- Canvas SKILL.md now references json-canvas open standard and kepano/obsidian-skills

wiki research:
- Ecosystem research: 16+ Claude+Obsidian projects mapped and filed
- New pages: comparisons/claude-obsidian-ecosystem, concepts/cherry-picks, entities/ (6 new), sources/claude-obsidian-ecosystem-research
- Cherry-picks roadmap filed at wiki/concepts/cherry-picks.md

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

83
skills/defuddle/SKILL.md
Normal file
@@ -0,0 +1,83 @@
---
name: defuddle
description: "Strip clutter from web pages before ingesting into the wiki. Removes ads, navigation, headers, footers, and boilerplate, leaving clean readable markdown that saves 40-60% tokens. Triggers on: defuddle, clean this page, strip this url, fetch and clean, clean web content before ingesting."
---

# defuddle — Web Page Cleaner

Defuddle extracts the meaningful content from a web page and drops everything else: ads, cookie banners, nav bars, related articles, footers, social sharing buttons. What remains is the article body as clean markdown.

Use this before any URL ingestion. It is optional but strongly recommended: it cuts token usage by 40-60% on typical web articles and produces cleaner wiki pages.

---

## Install

```bash
npm install -g defuddle-cli
```

Verify: `defuddle --version`

---

## Usage
### Clean a URL directly
```bash
# "parse" fetches and extracts the page; --md emits markdown instead of HTML
defuddle parse https://example.com/article --md
```
Outputs clean markdown to stdout.

### Save to .raw/
```bash
defuddle parse https://example.com/article --md > ".raw/articles/article-slug-$(date +%Y-%m-%d).md"
```

### Add frontmatter header after saving
To record where the content came from, write the source URL and fetch date as frontmatter ahead of the defuddle output:
```bash
URL="https://example.com/article"
SLUG="article-slug-$(date +%Y-%m-%d)"
{ echo "---"; echo "source_url: $URL"; echo "fetched: $(date +%Y-%m-%d)"; echo "---"; echo ""; defuddle parse "$URL" --md; } > ".raw/articles/$SLUG.md"
```

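The one-liner above writes the frontmatter before defuddle has produced anything, so a failed fetch can leave a header-only file in `.raw/`. A minimal sketch of a safer variant follows. `save_clean_page` is a hypothetical helper name, not part of this skill, and the `defuddle parse <url> --md` invocation is an assumption about the installed CLI; adjust it to match `defuddle --help` on your machine.

```bash
# Sketch only: save_clean_page is a hypothetical helper, not part of the skill.
# Usage: save_clean_page <url> <slug> [output-dir]
save_clean_page() {
  local url="$1" slug="$2" out_dir="${3:-.raw/articles}"
  local today out_file tmp_file
  today="$(date +%Y-%m-%d)"
  out_file="${out_dir}/${slug}-${today}.md"

  mkdir -p "$out_dir" || return 1
  tmp_file="$(mktemp)" || return 1

  # Clean into a temp file first, so a failed fetch never leaves a
  # frontmatter-only file behind. The CLI invocation is an assumption.
  if ! defuddle parse "$url" --md > "$tmp_file"; then
    rm -f "$tmp_file"
    echo "defuddle failed for ${url}" >&2
    return 1
  fi

  # Frontmatter first, then the cleaned article body.
  {
    printf -- '---\n'
    printf 'source_url: %s\n' "$url"
    printf 'fetched: %s\n' "$today"
    printf -- '---\n\n'
    cat "$tmp_file"
  } > "$out_file"

  rm -f "$tmp_file"
  printf '%s\n' "$out_file"   # report where the page was saved
}
```

Called as `save_clean_page https://example.com/article article-slug`, it prints the path of the file it wrote, which can then be passed to `ingest`.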
### Clean a local HTML file
```bash
defuddle parse page.html --md
```

---

## When to Use

**Use defuddle when:**
- Ingesting a news article, blog post, or documentation page from a URL
- The page has a lot of surrounding content (most web pages do)
- You want to stay within token budget on a long article

**Skip defuddle when:**
- The source is already a clean markdown or PDF file
- The page is a dashboard, app, or structured data (defuddle expects article-style content)
- defuddle is not installed and the article is short enough to process raw

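The checklist above can be roughed out as a small gate function. This is only a sketch: `should_defuddle` is a hypothetical name, it looks at nothing but the URL scheme and the file extension, and it cannot detect dashboards or apps, so treat a "yes" as a default rather than a verdict.

```bash
# Sketch only: should_defuddle is a hypothetical helper.
# Returns 0 (yes) when the source looks like a web page worth cleaning,
# 1 (no) when it is already clean or is not HTML at all.
should_defuddle() {
  local src="$1"
  case "$src" in
    *.md|*.markdown|*.pdf)           return 1 ;;  # already clean, skip
    http://*|https://*|*.html|*.htm) return 0 ;;  # web page or saved HTML
    *)                               return 1 ;;  # unknown, leave untouched
  esac
}

# Example: should_defuddle "https://example.com/article" && echo "clean it first"
```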

---

## Fallback

If defuddle is not installed, use WebFetch directly. The content will be less clean but still workable.

To check whether defuddle is installed:

```bash
which defuddle 2>/dev/null || echo "not installed"
```

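One way to wire this check into a script is sketched below. `have_defuddle` and `fetch_page` are hypothetical helpers. WebFetch is a Claude tool with no shell equivalent, so the sketch substitutes `curl` as a stand-in for the raw fetch, and the `defuddle parse <url> --md` invocation is likewise an assumption about the installed CLI. `command -v` is used in place of `which` because it is a shell builtin and behaves the same across POSIX shells.

```bash
# Sketch only: have_defuddle and fetch_page are hypothetical helpers.
have_defuddle() {
  command -v defuddle >/dev/null 2>&1
}

# Print the page to stdout: cleaned markdown when defuddle is available,
# raw HTML otherwise (curl stands in for WebFetch here).
fetch_page() {
  local url="$1"
  if have_defuddle; then
    defuddle parse "$url" --md
  else
    echo "defuddle not installed; fetching raw page instead" >&2
    curl -fsSL "$url"
  fi
}
```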

---

## Integration with /wiki-ingest

The `/wiki-ingest` skill checks for defuddle automatically when a URL is passed. You do not need to run defuddle manually before ingesting a URL; the ingest skill will call it if available.

To manually clean a page and save it before ingesting:
1. Run the save command from "Save to .raw/" above
2. Then: `ingest .raw/articles/[slug].md`
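The `[slug]` placeholder is left to the reader. As an illustration only, the hypothetical `slug_from_url` helper below derives one from the last path segment of the URL. The naming rule (lowercase, hyphen-separated, `.html`/`.htm` stripped) is an assumption, not a convention defined by `/wiki-ingest`.

```bash
# Sketch only: slug_from_url is a hypothetical helper.
slug_from_url() {
  local url="$1" path slug
  path="${url%%[?#]*}"   # drop the query string and fragment
  path="${path%/}"       # drop one trailing slash
  slug="${path##*/}"     # keep the last path segment
  slug="${slug%.html}"   # drop a trailing .html ...
  slug="${slug%.htm}"    # ... or .htm
  # Lowercase, collapse runs of other characters to "-", trim edge hyphens.
  printf '%s\n' "$slug" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Example: slug_from_url "https://example.com/blog/My_First-Post.html?utm=1"
```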