AgenticCode/openspec/changes/archive/2026-04-04-wire-responses-api/design.md at 7a5c22593a45971ed24a7b3d0394f278964b6705

Files

local 00e7df2802 feat: wire chat UI to Responses API with streaming

Add ChatController that proxies POST /api/chat to the local Responses API
(localhost:8317/v1/responses) with SSE streaming. Client reads tokens via
SetBrowserResponseStreamingEnabled and renders them incrementally. Includes
thinking indicator, input disabled during streaming, and error handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-04 01:54:28 +01:00

3.0 KiB

Raw Blame History

Context

The chat UI has a hardcoded response stub. A local OpenAI-compatible proxy at localhost:8317 serves the Responses API (POST /v1/responses) with Claude models. The existing architecture has a WASM client calling the API backend — we add a new endpoint that proxies to the Responses API and streams tokens back.

The Responses API streaming format uses SSE with events like response.output_text.delta carrying a delta field with text fragments.

Goals / Non-Goals

Goals:

Wire real AI responses through the existing client → backend → proxy chain
Stream tokens to the UI for responsive feel
Keep the proxy URL and model configurable (server-side only)
Show a thinking indicator while waiting for first token

Non-Goals:

Conversation history / multi-turn context (future phase)
Model selection UI (future phase)
Retry logic or rate limiting
Markdown rendering of responses (future phase — Markdig)

Decisions

Decision 1: Backend proxies the Responses API

The WASM client cannot call localhost:8317 directly (different origin, and we keep external service URLs server-side). The API backend gets a new ChatController that:

Receives messages from the client
Forwards them to POST /v1/responses with "stream": true
Reads the SSE stream, extracts response.output_text.delta events
Re-emits the text deltas as simple SSE events to the client

Format for client SSE: data: {"text": "<delta>"}\n\n for each token, and data: [DONE]\n\n at the end. This is simpler than forwarding the full Responses API event structure.

Alternative considered: Having the client call localhost:8317 directly via CORS. Rejected — breaks the architecture constraint of keeping external URLs server-side.

Decision 2: Client-side streaming with SetBrowserResponseStreamingEnabled

Per the stack spec, the client uses:

SetBrowserResponseStreamingEnabled(true) on the HttpRequestMessage
HttpCompletionOption.ResponseHeadersRead to start reading before the full response arrives
Line-by-line iteration of the response stream

This avoids any JavaScript interop for streaming.

Decision 3: Simple DTOs in Shared project

Add ChatRequest (list of messages) and keep the existing ChatMessage model. The SSE parsing happens in ChatApiClient — no DTO needed for individual stream events since they're parsed inline.

Decision 4: Configuration via appsettings.json

The API's appsettings.json gets:

{
  "ResponsesApi": {
    "BaseUrl": "http://localhost:8317",
    "Model": "claude-sonnet-4-6"
  }
}

This is the API project's appsettings (server-side, not exposed to the browser).

Risks / Trade-offs

[Proxy adds latency] → Minimal for localhost; acceptable tradeoff for keeping URLs server-side
[No conversation history] → Intentional; each request is single-turn for now. Multi-turn comes in a future phase.
[No retry on stream failure] → If the stream breaks mid-response, the partial text stays visible and an error is shown. Good enough for phase 1.

3.0 KiB Raw Blame History