AgenticCode/openspec/specs/agent-extraction/spec.md at 5b027eb0dbd29aedc501a2f54b155939aa6e307c

local 5b027eb0db feat: add extraction schema, sidebar nav, few-shot prompting, and prompt settings

Overhaul extraction pipeline with new TradeItem model, conversation flow,
and dedicated extraction endpoint. Add sidebar navigation with NavMenu
component and landing page. Introduce few-shot prompting service and
tests. Add prompt settings and email upload specs. Update OpenSpec
tooling with improved export-spec and extract-feature commands. Archive
completed changes and export full specs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-06 23:39:23 +01:00

Purpose

Define the autonomous agent-driven extraction pipeline: structured field extraction from natural language, schema-based validation via tool calling, autonomous retry logic, and human-in-the-loop clarification.

Requirements

Requirement: Structured field extraction from natural language

The agent SHALL extract a predefined set of key-value pairs from user-provided natural language text (e.g., email content) and return them as a structured JSON object.

Scenario: All fields extracted successfully

WHEN the user sends a message containing natural language with all required information
THEN the agent returns a JSON object with all predefined fields populated from the text

Scenario: Partial extraction

WHEN the user sends a message that contains some but not all required fields
THEN the agent extracts available fields and leaves missing fields as null
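
The partial-extraction scenario can be illustrated with a small normalization step. This is a minimal sketch, assuming the agent's raw output arrives as a JSON string for a single item; the helper name `normalize` is hypothetical, and the field names are the ones listed in the schema requirement of this spec.

```python
import json

# Field names taken from the spec's TradeItem schema.
FIELDS = (
    "valuedate", "counterparty", "legal_entity", "trade_id",
    "display_ccy", "pv", "breakclause",
)

def normalize(raw_json: str) -> dict:
    """Return a dict with every predefined field; absent fields become None (JSON null)."""
    data = json.loads(raw_json)
    # Unknown keys are dropped, missing keys are filled with None.
    return {name: data.get(name) for name in FIELDS}

partial = normalize('{"counterparty": "ACME", "pv": 12.5}')
print(json.dumps(partial, indent=2))
```

Filling missing keys explicitly, rather than omitting them, keeps the output shape stable so downstream validation can tell "not found in the text" apart from "never attempted".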

Requirement: Predefined extraction schema

The system SHALL define the extraction schema as a TradeItem class with fields: valuedate, counterparty, legal_entity, trade_id, display_ccy, pv, breakclause. Extraction output SHALL be wrapped in an ExtractionResult containing a List<TradeItem>. All extraction output MUST conform to this schema.

Scenario: Output conforms to schema

WHEN the agent produces extracted fields from an email
THEN every item in the output is a valid TradeItem with all required fields matching expected types

Scenario: Multiple items from one email

WHEN the agent extracts data from an email containing multiple trade legs
THEN the output ExtractionResult contains one TradeItem per trade leg
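
One way to picture the schema is the sketch below. The class and field names come from this requirement; the field types (strings everywhere except a numeric `pv`), the `items` attribute name, and the `type_errors` helper are assumptions made for illustration, since the spec does not state them.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional

@dataclass
class TradeItem:
    # Every field is optional so that partial extraction can leave it as None.
    valuedate: Optional[str] = None      # assumed to be a date string
    counterparty: Optional[str] = None
    legal_entity: Optional[str] = None
    trade_id: Optional[str] = None
    display_ccy: Optional[str] = None    # assumed to be a currency code
    pv: Optional[float] = None           # assumed to be numeric
    breakclause: Optional[str] = None

@dataclass
class ExtractionResult:
    # One TradeItem per trade leg found in the email.
    items: List[TradeItem] = field(default_factory=list)

# Assumed expected types; any field not listed here is expected to be a string.
EXPECTED_TYPES = {"pv": (int, float)}

def type_errors(item: TradeItem) -> List[str]:
    """Return the names of populated fields whose value has an unexpected type."""
    errors = []
    for f in fields(item):
        value = getattr(item, f.name)
        if value is None:
            continue  # missing values are handled separately from type checks
        expected = EXPECTED_TYPES.get(f.name, str)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f.name)
    return errors

result = ExtractionResult(items=[TradeItem(trade_id="T-1", pv=10.0), TradeItem(trade_id="T-2")])
print([type_errors(item) for item in result.items])
```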

Requirement: Autonomous validation via tool calling

The agent SHALL validate extracted fields by calling external API tools exposed as Semantic Kernel functions. Validation tools include counterparty lookup, trade validation, currency validation, and schema validation. Each tool returns structured results that the agent reasons about.

Scenario: Validation passes

WHEN the agent calls the schema validation tool with a complete and correct ExtractionResult
THEN the tool returns a success result and the agent returns the final output to the user

Scenario: Validation fails with fixable errors

WHEN a validation tool returns errors for missing or malformed fields
THEN the agent re-reads the source text and attempts to fix the extraction without user intervention

Scenario: Counterparty disambiguation required

WHEN the counterparty lookup tool returns multiple candidate (counterparty, legal_entity) tuples
THEN the agent presents the candidates to the user as a numbered list in the chat and waits for the user to select one before completing the extraction
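
The disambiguation scenario can be sketched as follows. This is an illustration only: the in-memory `DIRECTORY`, the `status` values, and both function names are hypothetical stand-ins for the external lookup API, and the sketch does not use any Semantic Kernel calls.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, str]  # (counterparty, legal_entity)

# Hypothetical data standing in for the external counterparty lookup API.
DIRECTORY: Dict[str, List[Candidate]] = {
    "acme": [("ACME", "ACME Bank plc"), ("ACME", "ACME Securities Ltd")],
    "globex": [("Globex", "Globex AG")],
}

def lookup_counterparty(name: str) -> dict:
    """Return a structured result the agent can reason about."""
    candidates = DIRECTORY.get(name.strip().lower(), [])
    if len(candidates) == 1:
        status = "resolved"
    elif candidates:
        status = "ambiguous"   # more than one match: ask the user
    else:
        status = "not_found"
    return {"status": status, "candidates": candidates}

def format_candidates(candidates: List[Candidate]) -> str:
    """Render candidates as the numbered list shown to the user in the chat."""
    lines = [f"{i}. {cp} / {le}" for i, (cp, le) in enumerate(candidates, start=1)]
    return "Multiple matches found. Please reply with a number:\n" + "\n".join(lines)

outcome = lookup_counterparty("Acme")
if outcome["status"] == "ambiguous":
    print(format_candidates(outcome["candidates"]))
```

Returning an explicit status alongside the candidates lets the agent branch on the result without having to infer ambiguity from the list length.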

Requirement: Autonomous retry with iteration cap

The agent SHALL retry extraction autonomously when validation fails, making up to 3 attempts in total (the initial attempt plus up to 2 retries). After exhausting these attempts, the agent MUST escalate to the user.

Scenario: Agent retries and succeeds

WHEN validation fails on the first attempt but the error is recoverable
THEN the agent retries extraction and calls validation again, up to 3 total attempts

Scenario: Agent exhausts retries and escalates

WHEN validation fails after 3 attempts
THEN the agent sends a natural language message to the user identifying the specific fields it could not resolve and asking for clarification
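
The retry behaviour above can be sketched as a bounded loop. This is a minimal sketch under stated assumptions: `extract` and `validate` are hypothetical callables supplied by the caller, `validate` is assumed to return a list of unresolved field names (empty on success), and the returned dictionary shape is illustrative.

```python
from typing import Callable, List

MAX_ATTEMPTS = 3  # iteration cap from the requirement

def run_extraction(
    text: str,
    extract: Callable[[str, List[str]], dict],
    validate: Callable[[dict], List[str]],
) -> dict:
    """Extract and validate, retrying until validation passes or the cap is reached."""
    errors: List[str] = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # Previous validation errors are passed back so the next attempt can target them.
        result = extract(text, errors)
        errors = validate(result)
        if not errors:
            return {"status": "ok", "result": result, "attempts": attempt}
    # Cap reached: escalate with the specific fields that remain unresolved.
    message = (
        "I could not resolve these fields: " + ", ".join(errors)
        + ". Could you clarify them?"
    )
    return {"status": "escalated", "message": message, "attempts": MAX_ATTEMPTS}

# Stub callables for demonstration: the extraction never finds a pv value.
outcome = run_extraction(
    "email body",
    extract=lambda text, errors: {"pv": None},
    validate=lambda result: [] if result["pv"] is not None else ["pv"],
)
print(outcome["status"], outcome["attempts"])
```

Feeding the previous errors into the next attempt is one design choice among several; an agent could equally keep them in its conversation history.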

Requirement: Human-in-the-loop clarification

When the agent escalates to the user, the user SHALL be able to provide the missing information in natural language, and the agent SHALL incorporate the clarification and re-attempt extraction. Disambiguation of counterparty/legal_entity tuples is a specific case of human-in-the-loop clarification.

Scenario: User provides clarification

WHEN the agent asks for clarification about missing fields and the user responds
THEN the agent incorporates the user's response into the conversation context and produces an updated extraction

Scenario: User selects counterparty from candidates

WHEN the agent presents a numbered list of counterparty/legal_entity candidates and the user replies with a selection
THEN the agent populates the legal_entity field on all relevant TradeItems and proceeds with validation

Scenario: Clarification via normal chat

WHEN the agent escalates for clarification
THEN the clarification request appears as a regular assistant message in the chat UI, and the user responds via the normal chat input
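
The counterparty selection scenario could be handled along these lines. The sketch assumes the user replies with a bare number, that items are plain dictionaries, and that "relevant" means items sharing the selected counterparty name; all three are assumptions, as is the `apply_selection` name. In practice the agent would likely interpret a free-text reply itself rather than parse it this strictly.

```python
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[str, str]  # (counterparty, legal_entity)

def apply_selection(
    reply: str,
    candidates: List[Candidate],
    items: List[Dict[str, Optional[str]]],
) -> bool:
    """Apply a numbered selection; return False if the reply is not a valid choice."""
    try:
        index = int(reply.strip().rstrip("."))
    except ValueError:
        return False  # not a number: the agent should ask again
    if not 1 <= index <= len(candidates):
        return False  # number outside the presented list
    counterparty, legal_entity = candidates[index - 1]
    for item in items:
        # Assumption: an item is relevant when its counterparty matches the selection.
        if item.get("counterparty") == counterparty:
            item["legal_entity"] = legal_entity
    return True

candidates = [("ACME", "ACME Bank plc"), ("ACME", "ACME Securities Ltd")]
items = [
    {"counterparty": "ACME", "legal_entity": None},
    {"counterparty": "Globex", "legal_entity": None},
]
if apply_selection("2", candidates, items):
    print(items)
```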
