AgenticCode/openspec/specs/chat-streaming/spec.md

## Purpose

Define the streaming AI response pipeline — backend chat endpoint using Semantic Kernel, SSE delivery to the WASM client, configuration, and error handling.

## Requirements

### Requirement: Chat endpoint proxies to Responses API

The API backend SHALL expose `POST /api/chat` that accepts a `ChatRequest` containing messages, an optional system prompt, and optional model settings. The request is processed using a Semantic Kernel chat completion service. When a system prompt is provided, it SHALL be added as the first system message in the ChatHistory. When model settings are provided, non-null values SHALL be applied to the execution settings. A separate `POST /api/chat/extract` endpoint SHALL handle extraction-specific requests with few-shot prompting.

#### Scenario: Successful chat request with system prompt

- **WHEN** the client sends a POST to `/api/chat` with messages and a system prompt
- **THEN** the API creates a ChatHistory with the system prompt as the first message, followed by the conversation messages, and processes them through Semantic Kernel

#### Scenario: Successful chat request with model settings

- **WHEN** the client sends a POST to `/api/chat` with messages and model settings (e.g., Temperature=0.3)
- **THEN** the API applies the settings to OpenAIPromptExecutionSettings before calling the Semantic Kernel

#### Scenario: Successful chat request without optional fields

- **WHEN** the client sends a POST to `/api/chat` with only messages (no system prompt, no settings)
- **THEN** the API processes the request with default behavior (no system message, default execution settings)

#### Scenario: Extraction request routed to dedicated endpoint

- **WHEN** the client sends a POST to `/api/chat/extract` with email HTML
- **THEN** the API uses the few-shot ChatHistory prefix and extraction tools instead of the general chat configuration

### Requirement: Streaming response delivery

The API backend SHALL stream the Semantic Kernel's chat completion response back to the WASM client as `text/event-stream`, forwarding text content so the client can render tokens incrementally. The SSE event format MUST remain `data: {"text":"..."}\n\n` for text deltas and `data: [DONE]\n\n` for completion.

#### Scenario: Tokens stream to client

- **WHEN** the Semantic Kernel emits streaming chat message content
- **THEN** the backend forwards each content chunk as an SSE event to the client containing the text fragment

#### Scenario: Stream completes

- **WHEN** the Semantic Kernel streaming response completes
- **THEN** the backend signals stream completion to the client with `data: [DONE]\n\n`

### Requirement: Configurable proxy target

The CLIProxyAPI base URL and model name SHALL be configurable via `appsettings.json` in the API project, not hardcoded. These values are used to configure the Semantic Kernel OpenAI connector.

#### Scenario: Configuration read at startup

- **WHEN** the API starts
- **THEN** it reads `ResponsesApi:BaseUrl` and `ResponsesApi:Model` from configuration to configure the Semantic Kernel

### Requirement: Client streams from backend

The WASM client SHALL call `POST /api/chat` with `SetBrowserResponseStreamingEnabled(true)` and `HttpCompletionOption.ResponseHeadersRead`, then iterate the SSE stream to update the UI token by token.

#### Scenario: Client reads streaming response

- **WHEN** the client sends a chat request
- **THEN** it reads the response stream incrementally and appends each text delta to the assistant message in real time

### Requirement: Error propagation

If the LLM service returns an error or is unreachable, the API backend SHALL return an error SSE event and the client SHALL display the error to the user.

#### Scenario: LLM service unreachable

- **WHEN** the CLIProxyAPI proxy is not running
- **THEN** the client displays an error message instead of an assistant response