## Context

The chat UI has a hardcoded response stub. A local OpenAI-compatible proxy at `localhost:8317` serves the Responses API (`POST /v1/responses`) with Claude models. The existing architecture has a WASM client calling the API backend; we add a new endpoint that proxies to the Responses API and streams tokens back.

The Responses API streaming format uses SSE with events like `response.output_text.delta` carrying a `delta` field with text fragments.

## Goals / Non-Goals

**Goals:**

- Wire real AI responses through the existing client → backend → proxy chain
- Stream tokens to the UI for responsive feel
- Keep the proxy URL and model configurable (server-side only)
- Show a thinking indicator while waiting for first token

**Non-Goals:**

- Conversation history / multi-turn context (future phase)
- Model selection UI (future phase)
- Retry logic or rate limiting
- Markdown rendering of responses (future phase, via Markdig)

## Decisions

### Decision 1: Backend proxies the Responses API

The WASM client cannot call `localhost:8317` directly (different origin, and we keep external service URLs server-side). The API backend gets a new `ChatController` that:

1. Receives messages from the client
2. Forwards them to `POST /v1/responses` with `"stream": true`
3. Reads the SSE stream, extracts `response.output_text.delta` events
4. Re-emits the text deltas as simple SSE events to the client

**Format for client SSE**: `data: {"text": ""}\n\n` for each token, and `data: [DONE]\n\n` at the end. This is simpler than forwarding the full Responses API event structure.

**Alternative considered**: Having the client call `localhost:8317` directly via CORS. Rejected because it breaks the architecture constraint of keeping external URLs server-side.

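Steps 3 and 4 of Decision 1 amount to a small stream transformation. The following is a minimal, language-neutral sketch of that logic in Python, not the `ChatController` code itself. The function name `reemit_deltas` is hypothetical, and the sketch assumes each upstream `data:` payload is a JSON object that names its event in a `type` field.

```python
import json
from typing import Iterable, Iterator

DELTA_EVENT = "response.output_text.delta"


def reemit_deltas(upstream_lines: Iterable[str]) -> Iterator[str]:
    """Turn upstream SSE lines into the simplified client SSE frames."""
    for raw in upstream_lines:
        line = raw.rstrip("\r\n")
        # Only `data:` lines carry payloads; skip `event:` lines and blanks.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore anything that is not JSON
        # Assumption: the payload names its event in a "type" field.
        if isinstance(event, dict) and event.get("type") == DELTA_EVENT:
            frame = json.dumps({"text": event.get("delta", "")})
            yield "data: " + frame + "\n\n"
    # Upstream has ended: signal completion to the client.
    yield "data: [DONE]\n\n"
```

Every non-delta event is dropped, so the client only ever sees `{"text": ...}` frames followed by the `[DONE]` sentinel.
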
### Decision 2: Client-side streaming with SetBrowserResponseStreamingEnabled

Per the stack spec, the client uses:

- `SetBrowserResponseStreamingEnabled(true)` on the `HttpRequestMessage`
- `HttpCompletionOption.ResponseHeadersRead` to start reading before the full response arrives
- Line-by-line iteration of the response stream

This avoids any JavaScript interop for streaming.

### Decision 3: Simple DTOs in Shared project

Add `ChatRequest` (list of messages) and keep the existing `ChatMessage` model. The SSE parsing happens in `ChatApiClient`; no DTO is needed for individual stream events since they're parsed inline.

### Decision 4: Configuration via appsettings.json

The API's `appsettings.json` gets:

```json
{
  "ResponsesApi": {
    "BaseUrl": "http://localhost:8317",
    "Model": "claude-sonnet-4-6"
  }
}
```

This is the API project's appsettings (server-side, not exposed to the browser).

## Risks / Trade-offs

- [Proxy adds latency] → Minimal for localhost; acceptable tradeoff for keeping URLs server-side
- [No conversation history] → Intentional; each request is single-turn for now. Multi-turn comes in a future phase.
- [No retry on stream failure] → If the stream breaks mid-response, the partial text stays visible and an error is shown. Good enough for phase 1.
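
The stream-failure behavior in the last risk can be illustrated against the client SSE format from Decision 1. This is a rough sketch in Python of the parsing logic only, not the `ChatApiClient` implementation. The function name `collect_reply` is hypothetical, and treating a missing `[DONE]` sentinel as a failure is an assumption.

```python
import json
from typing import Iterable, Tuple


def collect_reply(client_lines: Iterable[str]) -> Tuple[str, bool]:
    """Return (text_so_far, completed) from the simplified client SSE stream."""
    parts = []
    try:
        for raw in client_lines:
            line = raw.strip()
            if not line.startswith("data:"):
                continue  # blank separator lines between frames
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return "".join(parts), True
            parts.append(json.loads(payload)["text"])
    except (OSError, ValueError, KeyError, TypeError):
        # Dropped connection or malformed frame: keep the partial text.
        pass
    # Assumption: a stream that ends without [DONE] counts as incomplete.
    return "".join(parts), False
```

A caller would render the returned text either way and show the error state only when `completed` is `False`.
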