Overhaul extraction pipeline with new TradeItem model, conversation flow, and dedicated extraction endpoint. Add sidebar navigation with NavMenu component and landing page. Introduce few-shot prompting service and tests. Add prompt settings and email upload specs. Update OpenSpec tooling with improved export-spec and extract-feature commands. Archive completed changes and export full specs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3.8 KiB
Purpose
Define the streaming AI response pipeline — backend chat endpoint using Semantic Kernel, SSE delivery to the WASM client, configuration, and error handling.
Requirements
Requirement: Chat endpoint proxies to Responses API
The API backend SHALL expose POST /api/chat that accepts a ChatRequest containing messages, an optional system prompt, and optional model settings. The request is processed using a Semantic Kernel chat completion service. When a system prompt is provided, it SHALL be added as the first system message in the ChatHistory. When model settings are provided, non-null values SHALL be applied to the execution settings. A separate POST /api/chat/extract endpoint SHALL handle extraction-specific requests with few-shot prompting.
Scenario: Successful chat request with system prompt
- WHEN the client sends a POST to
/api/chatwith messages and a system prompt - THEN the API creates a ChatHistory with the system prompt as the first message, followed by the conversation messages, and processes them through Semantic Kernel
Scenario: Successful chat request with model settings
- WHEN the client sends a POST to
/api/chatwith messages and model settings (e.g., Temperature=0.3) - THEN the API applies the settings to OpenAIPromptExecutionSettings before calling the Semantic Kernel
Scenario: Successful chat request without optional fields
- WHEN the client sends a POST to
/api/chatwith only messages (no system prompt, no settings) - THEN the API processes the request with default behavior (no system message, default execution settings)
Scenario: Extraction request routed to dedicated endpoint
- WHEN the client sends a POST to
/api/chat/extractwith email HTML - THEN the API uses the few-shot ChatHistory prefix and extraction tools instead of the general chat configuration
Requirement: Streaming response delivery
The API backend SHALL stream the Semantic Kernel's chat completion response back to the WASM client as text/event-stream, forwarding text content so the client can render tokens incrementally. The SSE event format MUST remain data: {"text":"..."}\n\n for text deltas and data: [DONE]\n\n for completion.
Scenario: Tokens stream to client
- WHEN the Semantic Kernel emits streaming chat message content
- THEN the backend forwards each content chunk as an SSE event to the client containing the text fragment
Scenario: Stream completes
- WHEN the Semantic Kernel streaming response completes
- THEN the backend signals stream completion to the client with
data: [DONE]\n\n
Requirement: Configurable proxy target
The CLIProxyAPI base URL and model name SHALL be configurable via appsettings.json in the API project, not hardcoded. These values are used to configure the Semantic Kernel OpenAI connector.
Scenario: Configuration read at startup
- WHEN the API starts
- THEN it reads
ResponsesApi:BaseUrlandResponsesApi:Modelfrom configuration to configure the Semantic Kernel
Requirement: Client streams from backend
The WASM client SHALL call POST /api/chat with SetBrowserResponseStreamingEnabled(true) and HttpCompletionOption.ResponseHeadersRead, then iterate the SSE stream to update the UI token by token.
Scenario: Client reads streaming response
- WHEN the client sends a chat request
- THEN it reads the response stream incrementally and appends each text delta to the assistant message in real time
Requirement: Error propagation
If the LLM service returns an error or is unreachable, the API backend SHALL return an error SSE event and the client SHALL display the error to the user.
Scenario: LLM service unreachable
- WHEN the CLIProxyAPI proxy is not running
- THEN the client displays an error message instead of an assistant response