Files
AgenticCode/openspec/specs/chat-streaming/spec.md
local 5b027eb0db feat: add extraction schema, sidebar nav, few-shot prompting, and prompt settings
Overhaul extraction pipeline with new TradeItem model, conversation flow,
and dedicated extraction endpoint. Add sidebar navigation with NavMenu
component and landing page. Introduce few-shot prompting service and
tests. Add prompt settings and email upload specs. Update OpenSpec
tooling with improved export-spec and extract-feature commands. Archive
completed changes and export full specs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:39:23 +01:00

3.8 KiB

Purpose

Define the streaming AI response pipeline — backend chat endpoint using Semantic Kernel, SSE delivery to the WASM client, configuration, and error handling.

Requirements

Requirement: Chat endpoint proxies to Responses API

The API backend SHALL expose POST /api/chat that accepts a ChatRequest containing messages, an optional system prompt, and optional model settings. The request is processed using a Semantic Kernel chat completion service. When a system prompt is provided, it SHALL be added as the first system message in the ChatHistory. When model settings are provided, non-null values SHALL be applied to the execution settings. A separate POST /api/chat/extract endpoint SHALL handle extraction-specific requests with few-shot prompting.

Scenario: Successful chat request with system prompt

  • WHEN the client sends a POST to /api/chat with messages and a system prompt
  • THEN the API creates a ChatHistory with the system prompt as the first message, followed by the conversation messages, and processes them through Semantic Kernel

Scenario: Successful chat request with model settings

  • WHEN the client sends a POST to /api/chat with messages and model settings (e.g., Temperature=0.3)
  • THEN the API applies the settings to OpenAIPromptExecutionSettings before calling the Semantic Kernel

Scenario: Successful chat request without optional fields

  • WHEN the client sends a POST to /api/chat with only messages (no system prompt, no settings)
  • THEN the API processes the request with default behavior (no system message, default execution settings)

Scenario: Extraction request routed to dedicated endpoint

  • WHEN the client sends a POST to /api/chat/extract with email HTML
  • THEN the API uses the few-shot ChatHistory prefix and extraction tools instead of the general chat configuration

Requirement: Streaming response delivery

The API backend SHALL stream the Semantic Kernel's chat completion response back to the WASM client as text/event-stream, forwarding text content so the client can render tokens incrementally. The SSE event format MUST remain data: {"text":"..."}\n\n for text deltas and data: [DONE]\n\n for completion.

Scenario: Tokens stream to client

  • WHEN the Semantic Kernel emits streaming chat message content
  • THEN the backend forwards each content chunk as an SSE event to the client containing the text fragment

Scenario: Stream completes

  • WHEN the Semantic Kernel streaming response completes
  • THEN the backend signals stream completion to the client with data: [DONE]\n\n

Requirement: Configurable proxy target

The CLIProxyAPI base URL and model name SHALL be configurable via appsettings.json in the API project, not hardcoded. These values are used to configure the Semantic Kernel OpenAI connector.

Scenario: Configuration read at startup

  • WHEN the API starts
  • THEN it reads ResponsesApi:BaseUrl and ResponsesApi:Model from configuration to configure the Semantic Kernel

Requirement: Client streams from backend

The WASM client SHALL call POST /api/chat with SetBrowserResponseStreamingEnabled(true) and HttpCompletionOption.ResponseHeadersRead, then iterate the SSE stream to update the UI token by token.

Scenario: Client reads streaming response

  • WHEN the client sends a chat request
  • THEN it reads the response stream incrementally and appends each text delta to the assistant message in real time

Requirement: Error propagation

If the LLM service returns an error or is unreachable, the API backend SHALL return an error SSE event and the client SHALL display the error to the user.

Scenario: LLM service unreachable

  • WHEN the CLIProxyAPI proxy is not running
  • THEN the client displays an error message instead of an assistant response