AgenticCode/openspec/changes/archive/2026-04-06-few-shot-prompt-infrastructure/design.md

## Context

The extraction agent (Semantic Kernel with auto-invoke tools) needs few-shot examples to reliably map sales emails to TradeItem JSON. The `update-extraction-schema` change provides the real schema and tools. This change adds the prompting infrastructure and a dedicated endpoint.

Up to 100 input/output example pairs are available. Research shows 3-5 well-chosen examples are optimal for few-shot prompting — more can degrade performance by consuming context and introducing noise.

## Goals / Non-Goals

**Goals:**
- Load curated few-shot examples from disk and assemble a reusable ChatHistory prefix
- Provide a fixed instruction template for extraction (not user-editable)
- Create a dedicated extraction endpoint with the correct prompt and tools
- Keep the general chat endpoint unchanged

**Non-Goals:**
- Dynamic example selection (RAG-like similarity matching) — future enhancement
- Email upload UI (separate change: `email-upload-ux`)
- Building or curating the actual example content (user provides these)
- Evaluation pipeline for the ~95 non-few-shot examples (future work)

## Decisions

### 1. Examples as conversation turns (not system prompt string)

**Decision:** Inject few-shot examples as alternating User/Assistant messages in the ChatHistory, after the system message and before the real email.

**Why:** Chat models treat conversation turns as prior context — the model "sees" the examples as things it already did correctly. This is more effective than embedding examples in the system prompt string, where they're treated as instructions rather than demonstrated behavior.

### 2. Examples loaded once at startup, cached as ChatHistory prefix

**Decision:** `FewShotService` reads example files at startup, builds a `ChatHistory` prefix (system message + example turns), and caches it as a singleton. Each extraction request clones this prefix and appends the real email.

**Why:** Example files don't change at runtime. Loading once avoids repeated disk I/O. Cloning the cached prefix is cheap (ChatHistory is a list of message objects).

**Alternative considered:** Load examples per-request. Rejected — unnecessary I/O for static data.

### 3. Instruction template as an embedded text file

**Decision:** Store the extraction instruction template as a text file at `examples/extraction/instruction-template.txt`, loaded by FewShotService alongside the examples.

**Why:** Keeps the prompt text editable without recompilation. Co-located with the examples it references. Not in appsettings.json because it's multi-line prose, not configuration.

### 4. Separate extraction endpoint, not a mode flag on /api/chat

**Decision:** `POST /api/chat/extract` as a new controller action, separate from `POST /api/chat`.

**Why:** The extraction path uses a completely different ChatHistory (few-shot prefix, not user system prompt), different tools (extraction plugins only), and a different request DTO. A mode flag on the existing endpoint would add branching complexity. Separate endpoints make each path clear.

**Alternative considered:** Mode flag on ChatRequest (e.g. `"mode": "extract"`). Rejected — the request shapes diverge enough to warrant separate DTOs and endpoints.

### 5. ExtractionRequest includes conversation messages for follow-up

**Decision:** `ExtractionRequest` contains `EmailHtml` (string, the email to extract) plus `Messages` (list, optional follow-up conversation for disambiguation).

**Why:** After initial extraction, the agent may ask disambiguation questions. Follow-up user replies need to be sent back with the full conversation context so the agent can continue. The first request has only `EmailHtml`; subsequent requests include the growing `Messages` list.

### 6. Example folder uses numbered subdirectories

**Decision:** `examples/extraction/few-shot/01/input.html + output.json`, `02/`, etc.

**Why:** Numbered prefixes control ordering in the ChatHistory. Each subdirectory is one example, keeping input and output together. Easy to add/remove/reorder examples by renaming directories.

## Risks / Trade-offs

**[Example quality determines extraction quality]** → Poorly chosen few-shot examples will mislead the model. Mitigation: document selection criteria (diversity of swap structures, currencies, breakclause values). The user curates the 3-5 examples.

**[Instruction template drift]** → If the schema changes, the instruction template must be updated manually. Mitigation: the template references the TradeItem field names explicitly, making it obvious when they're out of sync.

**[ChatHistory size with few-shot examples]** → Each example adds ~2-5KB of tokens. With 5 examples that's ~10-25KB, well within model context limits. Not a risk at current scale but would be if dynamic selection adds more examples later.