feat: wire chat UI to Responses API with streaming

Add ChatController that proxies POST /api/chat to the local Responses API (localhost:8317/v1/responses) with SSE streaming. Client reads tokens via SetBrowserResponseStreamingEnabled and renders them incrementally. Includes thinking indicator, input disabled during streaming, and error handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:54:28 +01:00
parent 1614a61617
commit 00e7df2802
15 changed files with 500 additions and 69 deletions
--- a/openspec/changes/archive/2026-04-04-wire-responses-api/.openspec.yaml
+++ b/openspec/changes/archive/2026-04-04-wire-responses-api/.openspec.yaml
--- a/openspec/changes/archive/2026-04-04-wire-responses-api/design.md
+++ b/openspec/changes/archive/2026-04-04-wire-responses-api/design.md
--- a/openspec/changes/archive/2026-04-04-wire-responses-api/proposal.md
+++ b/openspec/changes/archive/2026-04-04-wire-responses-api/proposal.md
--- a/openspec/changes/archive/2026-04-04-wire-responses-api/specs/chat-streaming/spec.md
+++ b/openspec/changes/archive/2026-04-04-wire-responses-api/specs/chat-streaming/spec.md
--- a/openspec/changes/archive/2026-04-04-wire-responses-api/specs/chat-ui/spec.md
+++ b/openspec/changes/archive/2026-04-04-wire-responses-api/specs/chat-ui/spec.md
--- a/openspec/changes/archive/2026-04-04-wire-responses-api/tasks.md
+++ b/openspec/changes/archive/2026-04-04-wire-responses-api/tasks.md
@@ -0,0 +1,29 @@
+## 1. Shared Models
+
+- [x] 1.1 Create ChatRequest.cs in ChatAgent.Shared/Models with a Messages list property
+
+## 2. API Backend
+
+- [x] 2.1 Add appsettings.json to ChatAgent.Api with ResponsesApi:BaseUrl and ResponsesApi:Model
+- [x] 2.2 Register an HttpClient for the Responses API proxy in Api Program.cs
+- [x] 2.3 Create ChatController with POST /api/chat that proxies to the Responses API with streaming
+- [x] 2.4 Parse Responses API SSE stream, extract response.output_text.delta events, re-emit as simplified SSE to client
+
+## 3. Client Streaming
+
+- [x] 3.1 Add a streaming SendChatAsync method to ChatApiClient that uses SetBrowserResponseStreamingEnabled and HttpCompletionOption.ResponseHeadersRead
+- [x] 3.2 Parse the simplified SSE stream line-by-line, yielding text deltas
+
+## 4. Chat Page Updates
+
+- [x] 4.1 Replace hardcoded response in Chat.razor with a call to ChatApiClient.SendChatAsync
+- [x] 4.2 Append tokens to the assistant message incrementally with StateHasChanged after each delta
+- [x] 4.3 Add a thinking indicator shown until the first token arrives
+- [x] 4.4 Disable input field and send button while streaming is in progress
+- [x] 4.5 Handle errors — display error message if API call fails
+- [x] 4.6 Auto-scroll during streaming (not just at the end)
+
+## 5. Verify
+
+- [x] 5.1 Run dotnet build to confirm no errors
+- [ ] 5.2 Manually verify: send a message, see streaming response from Claude
--- a/openspec/changes/wire-responses-api/tasks.md
+++ b/openspec/changes/wire-responses-api/tasks.md
@@ -1,29 +0,0 @@
-## 1. Shared Models
-
- [ ] 1.1 Create ChatRequest.cs in ChatAgent.Shared/Models with a Messages list property
-
-## 2. API Backend
-
- [ ] 2.1 Add appsettings.json to ChatAgent.Api with ResponsesApi:BaseUrl and ResponsesApi:Model
- [ ] 2.2 Register an HttpClient for the Responses API proxy in Api Program.cs
- [ ] 2.3 Create ChatController with POST /api/chat that proxies to the Responses API with streaming
- [ ] 2.4 Parse Responses API SSE stream, extract response.output_text.delta events, re-emit as simplified SSE to client
-
-## 3. Client Streaming
-
- [ ] 3.1 Add a streaming SendChatAsync method to ChatApiClient that uses SetBrowserResponseStreamingEnabled and HttpCompletionOption.ResponseHeadersRead
- [ ] 3.2 Parse the simplified SSE stream line-by-line, yielding text deltas
-
-## 4. Chat Page Updates
-
- [ ] 4.1 Replace hardcoded response in Chat.razor with a call to ChatApiClient.SendChatAsync
- [ ] 4.2 Append tokens to the assistant message incrementally with StateHasChanged after each delta
- [ ] 4.3 Add a thinking indicator shown until the first token arrives
- [ ] 4.4 Disable input field and send button while streaming is in progress
- [ ] 4.5 Handle errors — display error message if API call fails
- [ ] 4.6 Auto-scroll during streaming (not just at the end)
-
-## 5. Verify
-
- [ ] 5.1 Run dotnet build to confirm no errors
- [ ] 5.2 Manually verify: send a message, see streaming response from Claude
--- a/openspec/specs/chat-streaming/spec.md
+++ b/openspec/specs/chat-streaming/spec.md
@@ -0,0 +1,55 @@
+## Purpose
+
+Define the streaming AI response pipeline — backend proxy to the Responses API, SSE delivery to the WASM client, configuration, and error handling.
+
+## Requirements
+
+### Requirement: Chat endpoint proxies to Responses API
+
+The API backend SHALL expose `POST /api/chat` that accepts a list of messages and proxies the request to the local Responses API at a configurable base URL using the `POST /v1/responses` endpoint.
+
+#### Scenario: Successful proxy request
+
+- **WHEN** the client sends a POST to `/api/chat` with a message list
+- **THEN** the API forwards the messages to the Responses API with the configured model and returns the response
+
+### Requirement: Streaming response delivery
+
+The API backend SHALL stream the Responses API's SSE events back to the WASM client as `text/event-stream`, forwarding `response.output_text.delta` events so the client can render tokens incrementally.
+
+#### Scenario: Tokens stream to client
+
+- **WHEN** the Responses API emits `response.output_text.delta` events
+- **THEN** the backend forwards each delta as an SSE event to the client containing the text fragment
+
+#### Scenario: Stream completes
+
+- **WHEN** the Responses API emits `response.completed`
+- **THEN** the backend signals stream completion to the client
+
+### Requirement: Configurable proxy target
+
+The Responses API base URL and model name SHALL be configurable via `appsettings.json` in the API project, not hardcoded.
+
+#### Scenario: Configuration read at startup
+
+- **WHEN** the API starts
+- **THEN** it reads `ResponsesApi:BaseUrl` and `ResponsesApi:Model` from configuration
+
+### Requirement: Client streams from backend
+
+The WASM client SHALL call `POST /api/chat` with `SetBrowserResponseStreamingEnabled(true)` and `HttpCompletionOption.ResponseHeadersRead`, then iterate the SSE stream to update the UI token by token.
+
+#### Scenario: Client reads streaming response
+
+- **WHEN** the client sends a chat request
+- **THEN** it reads the response stream incrementally and appends each text delta to the assistant message in real time
+
+### Requirement: Error propagation
+
+If the Responses API returns an error or is unreachable, the API backend SHALL return an appropriate HTTP error status and the client SHALL display the error to the user.
+
+#### Scenario: Proxy unreachable
+
+- **WHEN** the Responses API is not running
+- **THEN** the client displays an error message instead of an assistant response
--- a/openspec/specs/chat-ui/spec.md
+++ b/openspec/specs/chat-ui/spec.md
@@ -42,14 +42,33 @@ The chat page SHALL provide a text input area at the bottom of the page where th
 - **WHEN** the user attempts to send an empty or whitespace-only message
 - **THEN** nothing is sent and no message is added

-### Requirement: Hardcoded response
+#### Scenario: Input disabled during streaming

-In this phase, the assistant SHALL reply with a hardcoded message to every user input. This stubs the AI integration point for future phases.
+- **WHEN** the assistant is currently streaming a response
+- **THEN** the input field and send button are disabled until streaming completes

-#### Scenario: Bot replies to any input
+### Requirement: Thinking indicator
+
+The chat page SHALL show a visual indicator while waiting for the first token from the assistant.
+
+#### Scenario: Indicator shown during wait
+
+- **WHEN** the user sends a message and the assistant has not yet started streaming
+- **THEN** a thinking indicator (e.g., animated dots) is shown in the assistant message area
+
+#### Scenario: Indicator replaced by content
+
+- **WHEN** the first token arrives from the stream
+- **THEN** the thinking indicator is replaced by the streamed text
+
+### Requirement: Streaming AI response
+
+The assistant SHALL reply with a real AI response streamed from the backend API. Tokens appear incrementally as they arrive.
+
+#### Scenario: Bot replies with streamed AI response

 - **WHEN** the user sends any message
- **THEN** the assistant replies with a hardcoded response (e.g., "This is a placeholder response. AI integration coming soon!")
+- **THEN** the assistant message appears and grows token by token as the stream delivers text

 ### Requirement: Auto-scroll

--- a/src/ChatAgent.Api/Controllers/ChatController.cs
+++ b/src/ChatAgent.Api/Controllers/ChatController.cs
@@ -0,0 +1,183 @@
+// ChatController.cs -- Proxies chat requests to the Responses API with streaming.
+//
+// This controller receives messages from the WASM client, forwards them to the
+// local Responses API (OpenAI-compatible) at a configurable URL, and streams
+// the response tokens back as Server-Sent Events (SSE).
+//
+// Key concepts demonstrated:
+// - IHttpClientFactory named client injection for external API calls
+// - IConfiguration for reading appsettings.json values
+// - SSE streaming response from ASP.NET Core (text/event-stream)
+// - Parsing upstream SSE events and re-emitting simplified events to the client
+
+using System.Text;
+using System.Text.Json;
+using ChatAgent.Shared.Models;
+using Microsoft.AspNetCore.Mvc;
+
+namespace ChatAgent.Api.Controllers
+{
+    /// <summary>
+    /// Proxies chat requests to the Responses API and streams tokens back to the client.
+    /// The Responses API URL and model are configured in appsettings.json under "ResponsesApi".
+    /// </summary>
+    [ApiController]
+    [Route("api/[controller]")]
+    public class ChatController : ControllerBase
+    {
+        private readonly IHttpClientFactory _httpClientFactory;
+        private readonly IConfiguration _configuration;
+
+        public ChatController(IHttpClientFactory httpClientFactory, IConfiguration configuration)
+        {
+            _httpClientFactory = httpClientFactory;
+            _configuration = configuration;
+        }
+
+        /// <summary>
+        /// POST /api/chat -- Accepts a ChatRequest with messages, forwards to the Responses API
+        /// with streaming enabled, and re-emits text deltas as simplified SSE events.
+        ///
+        /// Client SSE format:
+        ///   data: {"text":"token here"}\n\n   -- for each text delta
+        ///   data: [DONE]\n\n                   -- when streaming completes
+        ///   data: {"error":"message"}\n\n       -- if an error occurs
+        /// </summary>
+        [HttpPost]
+        public async Task Post([FromBody] ChatRequest request)
+        {
+            // Set the response content type to SSE so the client knows to read it as a stream.
+            // "text/event-stream" is the standard MIME type for Server-Sent Events.
+            Response.ContentType = "text/event-stream";
+            Response.Headers["Cache-Control"] = "no-cache";
+
+            try
+            {
+                var client = _httpClientFactory.CreateClient("ResponsesApi");
+                var model = _configuration["ResponsesApi:Model"] ?? "claude-sonnet-4-6";
+
+                // Build the Responses API request payload.
+                // The Responses API expects "input" (array of role/content objects) and "model".
+                // "stream": true enables SSE streaming of token deltas.
+                var inputMessages = request.Messages.Select(m => new
+                {
+                    role = m.Role,
+                    content = m.Content
+                }).ToArray();
+
+                var payload = new
+                {
+                    model,
+                    input = inputMessages,
+                    stream = true
+                };
+
+                var jsonPayload = JsonSerializer.Serialize(payload);
+                var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");
+
+                // Use HttpCompletionOption.ResponseHeadersRead so we start reading the stream
+                // as soon as headers arrive, rather than waiting for the full response body.
+                using var upstreamRequest = new HttpRequestMessage(HttpMethod.Post, "/v1/responses")
+                {
+                    Content = content
+                };
+
+                using var upstreamResponse = await client.SendAsync(
+                    upstreamRequest,
+                    HttpCompletionOption.ResponseHeadersRead,
+                    HttpContext.RequestAborted);
+
+                if (!upstreamResponse.IsSuccessStatusCode)
+                {
+                    var errorBody = await upstreamResponse.Content.ReadAsStringAsync();
+                    await WriteSSEAsync($"{{\"error\":\"Responses API returned {upstreamResponse.StatusCode}: {EscapeJson(errorBody)}\"}}");
+                    await WriteSSEAsync("[DONE]");
+                    return;
+                }
+
+                // Read the upstream SSE stream line by line, extract text deltas,
+                // and re-emit them as simplified SSE events to the client.
+                using var stream = await upstreamResponse.Content.ReadAsStreamAsync();
+                using var reader = new StreamReader(stream);
+
+                // Use ReadLineAsync and check for null instead of reader.EndOfStream,
+                // because EndOfStream performs a synchronous read which is not supported
+                // in ASP.NET Core's async pipeline.
+                string? line;
+                while ((line = await reader.ReadLineAsync()) != null)
+                {
+                    // SSE format: "data: {json}" lines, separated by blank lines.
+                    // We only care about lines starting with "data: ".
+                    if (!line.StartsWith("data: "))
+                        continue;
+
+                    var data = line.Substring(6); // strip "data: " prefix
+
+                    // Parse the JSON to find response.output_text.delta events.
+                    // These carry the actual text tokens in the "delta" field.
+                    try
+                    {
+                        using var doc = JsonDocument.Parse(data);
+                        var root = doc.RootElement;
+
+                        if (root.TryGetProperty("type", out var typeElement))
+                        {
+                            var eventType = typeElement.GetString();
+
+                            if (eventType == "response.output_text.delta")
+                            {
+                                // Extract the text delta and send it to the client
+                                if (root.TryGetProperty("delta", out var deltaElement))
+                                {
+                                    var delta = deltaElement.GetString() ?? "";
+                                    await WriteSSEAsync($"{{\"text\":{JsonSerializer.Serialize(delta)}}}");
+                                    await Response.Body.FlushAsync();
+                                }
+                            }
+                            else if (eventType == "response.completed")
+                            {
+                                // Stream is done
+                                await WriteSSEAsync("[DONE]");
+                                return;
+                            }
+                        }
+                    }
+                    catch (JsonException)
+                    {
+                        // Skip malformed JSON lines
+                    }
+                }
+
+                // If we exit the loop without seeing response.completed, still signal done
+                await WriteSSEAsync("[DONE]");
+            }
+            catch (HttpRequestException ex)
+            {
+                await WriteSSEAsync($"{{\"error\":{JsonSerializer.Serialize($"Failed to reach Responses API: {ex.Message}")}}}");
+                await WriteSSEAsync("[DONE]");
+            }
+            catch (TaskCanceledException)
+            {
+                // Client disconnected — nothing to do
+            }
+        }
+
+        /// <summary>
+        /// Writes a single SSE event to the response stream.
+        /// SSE format: "data: {payload}\n\n"
+        /// </summary>
+        private async Task WriteSSEAsync(string data)
+        {
+            await Response.WriteAsync($"data: {data}\n\n");
+            await Response.Body.FlushAsync();
+        }
+
+        /// <summary>
+        /// Escapes a string for embedding in JSON (handles quotes and backslashes).
+        /// </summary>
+        private static string EscapeJson(string s)
+        {
+            return s.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\n", "\\n");
+        }
+    }
+}
--- a/src/ChatAgent.Api/Program.cs
+++ b/src/ChatAgent.Api/Program.cs
@@ -16,6 +16,15 @@ var builder = WebApplication.CreateBuilder(args);
 // for explicit structure -- each controller is a separate file with clear routing (D-05).
 builder.Services.AddControllers();

+// Register a named HttpClient for proxying requests to the Responses API.
+// The base URL comes from appsettings.json (server-side config, not exposed to the browser).
+// IHttpClientFactory manages the underlying HttpMessageHandler lifetime.
+builder.Services.AddHttpClient("ResponsesApi", client =>
+{
+    var baseUrl = builder.Configuration["ResponsesApi:BaseUrl"] ?? "http://localhost:8317";
+    client.BaseAddress = new Uri(baseUrl);
+});
+
 // AddCors() registers Cross-Origin Resource Sharing services.
 // CORS is REQUIRED because the Blazor WASM client runs on a different origin
 // (https://localhost:5200) than this API (https://localhost:7100).
--- a/src/ChatAgent.Api/appsettings.json
+++ b/src/ChatAgent.Api/appsettings.json
@@ -5,5 +5,9 @@
      "Microsoft.AspNetCore": "Warning"
    }
  },
-  "AllowedHosts": "*"
+  "AllowedHosts": "*",
+  "ResponsesApi": {
+    "BaseUrl": "http://localhost:8317",
+    "Model": "claude-sonnet-4-6"
+  }
 }
--- a/src/ChatAgent.Client/Pages/Chat.razor
+++ b/src/ChatAgent.Client/Pages/Chat.razor
@@ -1,4 +1,4 @@
-@* Chat.razor -- The main chat interface.
+@* Chat.razor -- The main chat interface with streaming AI responses.

   This is the primary page of the application, mapped to the root route "/".
   It displays a vertically scrolling message list and a text input at the bottom,
@@ -9,9 +9,10 @@
   - Two-way binding with @bind-Value on MudTextField
   - Event handling with @onclick and OnKeyDown
   - List rendering with @foreach over a List<T>
-   - StateHasChanged() for manual re-render triggers
+   - StateHasChanged() for manual re-render triggers during streaming
   - IJSRuntime for calling JavaScript (auto-scroll)
-   - Conditional CSS classes based on data (user vs assistant styling)
+   - IAsyncEnumerable consumption for streaming API responses
+   - Conditional rendering for thinking indicator and error states
 *@

@page "/"
@@ -21,15 +22,17 @@
   built-in scroll API. *@
@inject IJSRuntime JS

+@* ChatApiClient is our typed HttpClient wrapper that handles API communication.
+   It was registered in Program.cs via AddHttpClient<ChatApiClient>. *@
+@inject ChatApiClient ApiClient
+
 <PageTitle>Chat Agent</PageTitle>

@* Chat container: uses flexbox to fill available height.
   The message area grows to fill space; the input stays pinned at the bottom. *@
 <div class="chat-container">

-    @* Message list: scrollable area that grows to fill available space.
-       The @ref directive captures a reference to this DOM element so we can
-       scroll it programmatically via JavaScript interop. *@
+    @* Message list: scrollable area that grows to fill available space. *@
    <div class="message-list" @ref="_messageListRef">
        @if (_messages.Count == 0)
        {
@@ -48,14 +51,23 @@
        else
        {
            @* Render each message as a MudPaper card.
-               @foreach iterates the list; Blazor re-renders this block when _messages changes.
               The CSS class changes based on Role to align user messages right, assistant left. *@
            @foreach (var message in _messages)
            {
                <div class="message-row @(message.Role == "user" ? "message-user" : "message-assistant")">
                    <MudPaper Class="@($"message-bubble {(message.Role == "user" ? "bubble-user" : "bubble-assistant")}")"
                              Elevation="0">
-                        <MudText Typo="Typo.body1">@message.Content</MudText>
+                        @if (message.Role == "assistant" && string.IsNullOrEmpty(message.Content) && _isStreaming)
+                        {
+                            @* Thinking indicator: shown while waiting for the first token.
+                               MudProgressCircular gives an animated spinner that disappears
+                               once the first text delta arrives. *@
+                            <MudProgressCircular Size="Size.Small" Indeterminate="true" />
+                        }
+                        else
+                        {
+                            <MudText Typo="Typo.body1">@message.Content</MudText>
+                        }
                    </MudPaper>
                </div>
            }
@@ -63,26 +75,25 @@
    </div>

    @* Input area: pinned at the bottom of the chat container.
-       MudTextField with an Adornment provides the send button inside the text field,
-       similar to ChatGPT's input design. *@
+       Disabled attribute prevents interaction while the assistant is streaming. *@
    <div class="input-area">
        <MudTextField @bind-Value="_userInput"
-                      Placeholder="Type a message..."
+                      Placeholder="@(_isStreaming ? "Waiting for response..." : "Type a message...")"
                      Variant="Variant.Outlined"
                      Adornment="Adornment.End"
                      AdornmentIcon="@Icons.Material.Filled.Send"
-                      AdornmentColor="Color.Primary"
+                      AdornmentColor="@(_isStreaming ? Color.Default : Color.Primary)"
                      OnAdornmentClick="SendMessage"
                      OnKeyDown="HandleKeyDown"
                      Immediate="true"
                      FullWidth="true"
-                      AutoFocus="true" />
+                      AutoFocus="true"
+                      Disabled="_isStreaming" />
    </div>
 </div>

@code {
    // The conversation messages, displayed in the message list.
-    // Using a simple List<T> since we only add to the end — no complex state management needed.
    private List<ChatMessage> _messages = new();

    // The current text in the input field. Bound two-way via @bind-Value.
@@ -91,26 +102,29 @@
    // DOM reference to the message list div, used for auto-scrolling via JS interop.
    private ElementReference _messageListRef;

+    // Tracks whether we are currently streaming a response from the API.
+    // Used to disable input and show the thinking indicator.
+    private bool _isStreaming = false;
+
    /// <summary>
    /// Handles the Enter key press to submit the message.
-    /// KeyboardEventArgs gives us the key that was pressed.
    /// </summary>
    private async Task HandleKeyDown(KeyboardEventArgs e)
    {
-        if (e.Key == "Enter" && !e.ShiftKey)
+        if (e.Key == "Enter" && !e.ShiftKey && !_isStreaming)
        {
            await SendMessage();
        }
    }

    /// <summary>
-    /// Sends the user's message and appends a hardcoded assistant response.
-    /// In future phases, this will call the API instead of using a hardcoded reply.
+    /// Sends the user's message and streams the AI response token by token.
+    /// Each token delta updates the assistant message and triggers a re-render.
    /// </summary>
    private async Task SendMessage()
    {
-        // Block empty or whitespace-only submissions
-        if (string.IsNullOrWhiteSpace(_userInput))
+        // Block empty or whitespace-only submissions, and prevent double-send during streaming
+        if (string.IsNullOrWhiteSpace(_userInput) || _isStreaming)
            return;

        // Add the user's message
@@ -121,33 +135,67 @@
            Timestamp = DateTime.UtcNow
        });

-        // Clear the input field
+        var userText = _userInput.Trim();
        _userInput = string.Empty;
+        _isStreaming = true;

-        // Add a hardcoded assistant response.
-        // This is the stub that will be replaced with an API call in the next phase.
-        _messages.Add(new ChatMessage
+        // Add an empty assistant message that will be filled token by token.
+        // The thinking indicator shows while Content is empty.
+        var assistantMessage = new ChatMessage
        {
            Role = "assistant",
-            Content = "This is a placeholder response. AI integration coming soon!",
+            Content = string.Empty,
            Timestamp = DateTime.UtcNow
-        });
+        };
+        _messages.Add(assistantMessage);

-        // StateHasChanged() tells Blazor to re-render this component.
-        // It's needed here because we modified _messages after the initial render cycle.
-        // Without this call, the new messages wouldn't appear until the next UI event.
        StateHasChanged();
-
-        // Auto-scroll to the bottom after rendering the new messages.
-        // We use a small delay to ensure the DOM has updated before scrolling.
-        await Task.Delay(50);
        await ScrollToBottom();
+
+        try
+        {
+            // Build the request with the current user message.
+            // Future phases will include full conversation history for multi-turn.
+            var request = new ChatRequest
+            {
+                Messages = new List<ChatMessage>
+                {
+                    new ChatMessage { Role = "user", Content = userText }
+                }
+            };
+
+            // Stream tokens from the API. IAsyncEnumerable yields each text delta
+            // as it arrives, allowing us to update the UI incrementally.
+            await foreach (var delta in ApiClient.SendChatStreamingAsync(request))
+            {
+                // Append each token to the assistant message content.
+                assistantMessage.Content += delta;
+
+                // StateHasChanged() triggers a re-render so the user sees each token appear.
+                // This is the core of the streaming UX — without it, the full response
+                // would only appear after the stream completes.
+                StateHasChanged();
+
+                // Auto-scroll during streaming so new content stays visible
+                await ScrollToBottom();
+            }
+        }
+        catch (Exception ex)
+        {
+            // If the API call fails, show the error in the assistant message.
+            assistantMessage.Content = $"Error: {ex.Message}";
+        }
+        finally
+        {
+            _isStreaming = false;
+            assistantMessage.Timestamp = DateTime.UtcNow;
+            StateHasChanged();
+            await ScrollToBottom();
+        }
    }

    /// <summary>
    /// Scrolls the message list to the bottom using JavaScript interop.
-    /// Blazor has no built-in scroll API, so we call a tiny JS snippet directly.
-    /// InvokeVoidAsync calls a JS function that returns nothing (void).
    /// </summary>
    private async Task ScrollToBottom()
    {
--- a/src/ChatAgent.Client/Services/ChatApiClient.cs
+++ b/src/ChatAgent.Client/Services/ChatApiClient.cs
@@ -11,7 +11,10 @@
 // The base URL is configured in Program.cs via AddHttpClient<ChatApiClient>.

 using System.Net.Http.Json;
+using System.Text;
+using System.Text.Json;
 using ChatAgent.Shared.Models;
+using Microsoft.AspNetCore.Components.WebAssembly.Http;

 namespace ChatAgent.Client.Services
 {
@@ -49,5 +52,94 @@ namespace ChatAgent.Client.Services
            // configured in Program.cs (e.g., https://localhost:7100/api/health).
            return await _httpClient.GetFromJsonAsync<HealthResponse>("api/health");
        }
+
+        /// <summary>
+        /// Sends a chat request to POST /api/chat and streams the response as an
+        /// async enumerable of text deltas. Each yielded string is a token fragment.
+        ///
+        /// Key Blazor WASM streaming concepts:
+        /// - SetBrowserResponseStreamingEnabled(true) tells the browser's Fetch API
+        ///   to make the response body readable as a stream (not buffered).
+        /// - HttpCompletionOption.ResponseHeadersRead means we start reading the
+        ///   stream as soon as HTTP headers arrive, not after the full body downloads.
+        /// - We parse the SSE format line by line, extracting "text" from each data event.
+        /// </summary>
+        public async IAsyncEnumerable<string> SendChatStreamingAsync(ChatRequest request)
+        {
+            // Build the HTTP request manually so we can set streaming options.
+            var jsonContent = JsonSerializer.Serialize(request);
+            var httpRequest = new HttpRequestMessage(HttpMethod.Post, "api/chat")
+            {
+                Content = new StringContent(jsonContent, Encoding.UTF8, "application/json")
+            };
+
+            // SetBrowserResponseStreamingEnabled is a Blazor WASM extension that tells
+            // the browser's Fetch API to expose the response as a ReadableStream.
+            // Without this, the browser buffers the entire response before .NET can read it.
+            httpRequest.SetBrowserResponseStreamingEnabled(true);
+
+            // ResponseHeadersRead: start processing as soon as headers arrive.
+            using var response = await _httpClient.SendAsync(
+                httpRequest,
+                HttpCompletionOption.ResponseHeadersRead);
+
+            response.EnsureSuccessStatusCode();
+
+            // Read the SSE stream line by line.
+            using var stream = await response.Content.ReadAsStreamAsync();
+            using var reader = new StreamReader(stream);
+
+            // Use ReadLineAsync and check for null instead of reader.EndOfStream,
+            // because EndOfStream performs a synchronous read which is not supported
+            // in Blazor WASM's async streaming pipeline.
+            string? line;
+            while ((line = await reader.ReadLineAsync()) != null)
+            {
+                // SSE lines starting with "data: " contain our payload.
+                if (!line.StartsWith("data: "))
+                    continue;
+
+                var data = line.Substring(6);
+
+                // "[DONE]" signals the end of the stream.
+                if (data == "[DONE]")
+                    yield break;
+
+                // Parse the simplified JSON event: {"text":"token"} or {"error":"message"}
+                // Note: C# does not allow yield inside try-catch, so we parse first
+                // and yield outside the try block.
+                string? parsedText = null;
+                string? parsedError = null;
+
+                try
+                {
+                    using var doc = JsonDocument.Parse(data);
+                    var root = doc.RootElement;
+
+                    if (root.TryGetProperty("error", out var errorElement))
+                    {
+                        parsedError = errorElement.GetString();
+                    }
+                    else if (root.TryGetProperty("text", out var textElement))
+                    {
+                        parsedText = textElement.GetString();
+                    }
+                }
+                catch (JsonException)
+                {
+                    // Skip malformed SSE data
+                }
+
+                if (parsedError != null)
+                {
+                    throw new HttpRequestException($"Chat API error: {parsedError}");
+                }
+
+                if (!string.IsNullOrEmpty(parsedText))
+                {
+                    yield return parsedText;
+                }
+            }
+        }
    }
 }
--- a/src/ChatAgent.Shared/Models/ChatRequest.cs
+++ b/src/ChatAgent.Shared/Models/ChatRequest.cs
@@ -0,0 +1,21 @@
+// ChatRequest.cs -- DTO sent from the WASM client to the API backend to request a chat response.
+//
+// This lives in ChatAgent.Shared so both client and API agree on the request shape.
+// The API backend uses this to build the Responses API request.
+
+namespace ChatAgent.Shared.Models
+{
+    /// <summary>
+    /// Request payload for POST /api/chat. Contains the conversation messages
+    /// to send to the AI model. Currently single-turn (one user message),
+    /// but the list structure supports multi-turn in future phases.
+    /// </summary>
+    public class ChatRequest
+    {
+        /// <summary>
+        /// The conversation messages to send. Each message has a Role ("user" or "assistant")
+        /// and Content (the text). The API forwards these to the Responses API.
+        /// </summary>
+        public List<ChatMessage> Messages { get; set; } = new();
+    }
+}