feat: migrate chat backend to Semantic Kernel with tool calling support

Replace manual HTTP proxy in ChatController with Semantic Kernel's OpenAI chat completion service pointed at CLIProxyAPI. Add extraction plugin with validation function for structured field extraction from natural language, enabling an agentic loop with auto-retry and human-in-the-loop escalation. - Add Microsoft.SemanticKernel 1.74.0 with OpenAI connector - Create ExtractedFields schema and ValidationResult models - Create ExtractionPlugin with [KernelFunction] validation - Rewrite ChatController to use IChatCompletionService streaming - Configure FunctionChoiceBehavior.Auto() for tool calling - Preserve existing SSE contract (client unchanged) - Update tests to mock SK services, add plugin and integration tests - Archive multi-turn-conversations and migrate-to-semantic-kernel changes - Sync specs for agent-extraction, semantic-kernel-integration, chat-streaming Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:59:13 +01:00
parent 3278a408b9
commit 471e9ce935
27 changed files with 1082 additions and 201 deletions
--- a/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/.openspec.yaml
+++ b/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/.openspec.yaml
--- a/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/design.md
+++ b/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/design.md
@@ -0,0 +1,77 @@
+## Context
+
+The chat backend currently proxies requests to a local CLIProxyAPI instance (OpenAI-compatible API at `localhost:8317`) via manual `HttpClient` calls and SSE parsing in `ChatController`. The architecture works for simple chat completion but has no abstraction for tool calling, function invocation, or agentic loops. The goal is to adopt Semantic Kernel as the AI orchestration layer to enable structured extraction with autonomous validation.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Replace manual HTTP proxy logic with Semantic Kernel's chat completion service
+- Enable tool/function calling via SK plugins
+- Implement an agentic extraction loop: extract → validate → retry (up to 3 times) → escalate to user
+- Preserve the existing SSE contract so the Blazor client requires no changes
+- Maintain inline tutorial comments explaining SK concepts
+
+**Non-Goals:**
+- Multi-agent orchestration (future — when Agent Framework reaches GA)
+- Changing the Blazor client or `ChatApiClient`
+- Adding new UI for structured output display (future change)
+- Replacing CLIProxyAPI — SK's OpenAI connector talks to it as-is
+- Authentication or multi-user support
+
+## Decisions
+
+### D1: Use SK's OpenAI chat completion connector pointed at CLIProxyAPI
+
+**Choice:** `Microsoft.SemanticKernel.Connectors.OpenAI` with `OpenAIChatCompletionService` configured to use `localhost:8317` as the endpoint.
+
+**Alternatives considered:**
+- SK Anthropic connector (talks to Anthropic API directly) — would bypass CLIProxyAPI and lose model-switching flexibility
+- Keep manual HttpClient alongside SK — defeats the purpose of the migration
+
+**Rationale:** CLIProxyAPI already provides an OpenAI-compatible interface. SK's OpenAI connector works with any OpenAI-compatible endpoint. No infrastructure change required.
+
+### D2: Register Kernel and plugins in DI via `Program.cs`
+
+**Choice:** Configure `Kernel` in `Program.cs` using `builder.Services.AddKernel()` and register plugins via DI. Inject `Kernel` into `ChatController`.
+
+**Rationale:** Follows ASP.NET Core conventions. The kernel is a singleton service with plugins registered at startup. Controller receives it via constructor injection, consistent with the existing pattern of injecting `IHttpClientFactory` and `IConfiguration`.
+
+### D3: Validation as a native SK plugin function
+
+**Choice:** Create an `ExtractionPlugin` class with `[KernelFunction]` methods: one for validation of extracted fields. The agent auto-invokes this via `ToolCallBehavior.AutoInvokeKernelFunctions`.
+
+**Alternatives considered:**
+- Manual tool call loop in controller code — loses SK's built-in retry/function-calling orchestration
+- Separate validation service outside SK — requires manual plumbing between LLM and validator
+
+**Rationale:** SK's auto-invocation handles the loop naturally. The LLM sees the validation function as a tool, calls it, reads the result, and decides whether to retry or escalate. This is the core value proposition of adopting SK.
+
+### D4: Iteration cap with human-in-the-loop escalation
+
+**Choice:** Configure `ToolCallBehavior.AutoInvokeKernelFunctions` with `MaximumAutoInvokeAttempts = 3`. If the agent exhausts retries without valid output, it returns a clarification request as a regular chat message to the user.
+
+**Rationale:** The iteration cap prevents runaway loops. The escalation path uses the existing chat UI — the agent simply asks for clarification in natural language, and the user responds in the next message. No special UI needed.
+
+### D5: Preserve SSE contract via streaming kernel invocation
+
+**Choice:** Use `kernel.InvokeStreamingAsync<StreamingChatMessageContent>()` (or `IChatCompletionService.GetStreamingChatMessageContentsAsync()`) and re-emit tokens as the same SSE format the client expects: `data: {"text":"..."}\n\n` and `data: [DONE]\n\n`.
+
+**Rationale:** The Blazor client's `ChatApiClient` parses this exact format. By keeping the SSE contract identical, the entire client codebase remains untouched.
+
+### D6: Predefined extraction schema as a strongly-typed C# class
+
+**Choice:** Define an `ExtractedFields` record/class in `ChatAgent.Shared.Models` with the fixed set of known fields. Validation logic checks for required fields and type correctness.
+
+**Rationale:** Single output type with fixed keys. A strongly-typed class gives compile-time safety, works with `System.Text.Json` serialization, and can carry data annotations for validation rules.
+
+## Risks / Trade-offs
+
+- **[SK OpenAI connector compatibility with CLIProxyAPI]** → CLIProxyAPI aims for OpenAI API parity but may have edge cases with tool calling responses. Mitigation: test tool calling end-to-end early; fall back to direct Anthropic connector if needed.
+- **[Streaming + tool calling interaction]** → When the agent calls a tool mid-stream, the streaming behavior may differ from pure chat completion. Mitigation: handle tool call chunks in the SSE bridge; may need to buffer during tool execution and resume streaming after.
+- **[SK version churn]** → Semantic Kernel is actively developed; APIs may evolve. Mitigation: pin to a specific stable version, document the version in stack spec.
+- **[Tutorial complexity increase]** → SK adds abstractions (kernel, plugins, functions) that need explaining. Mitigation: maintain inline comments for every SK concept, consistent with project convention.
+
+## Open Questions
+
+- What are the exact field names and types for `ExtractedFields`? (Need user input for the real schema — can use a placeholder for initial implementation.)
+- Should tool call status ("Validating output...") be surfaced to the client as a distinct SSE event type, or just as regular text tokens? (Current design: regular text, revisit in a future change if needed.)
--- a/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/proposal.md
+++ b/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/proposal.md
@@ -0,0 +1,28 @@
+## Why
+
+The chat backend currently proxies requests to an OpenAI-compatible API (CLIProxyAPI) via manual HttpClient calls and SSE parsing. As the agent evolves toward structured extraction with tool calling and autonomous validation loops, this manual plumbing becomes a liability. Semantic Kernel provides a production-ready abstraction for chat completion, tool/function calling, and auto-invocation — letting us focus on agent behavior rather than HTTP mechanics. Adopting it now establishes the foundation for the agentic workflow (natural language → structured extraction → tool-based validation → human-in-the-loop clarification).
+
+## What Changes
+
+- Replace manual HttpClient + SSE proxy in `ChatController` with Semantic Kernel's `OpenAIChatCompletionService` pointed at the existing CLIProxyAPI proxy
+- Add a validation plugin that the agent can call as a tool to validate extracted key-value output against a predefined schema
+- Introduce an agentic loop: the kernel autonomously retries extraction up to 2–3 times on validation failure, then escalates to the user for clarification
+- Keep the existing SSE contract to the Blazor client unchanged — `ChatApiClient` and `Chat.razor` are not modified
+- **BREAKING**: `ChatController` internals are rewritten; the manual Responses API proxy logic is removed entirely
+
+## Capabilities
+
+### New Capabilities
+- `agent-extraction`: Defines the structured field extraction behavior — predefined keys, validation rules, autonomous retry loop, and human-in-the-loop escalation
+- `semantic-kernel-integration`: Defines how Semantic Kernel is configured, registered, and wired into the API — kernel setup, OpenAI connector config, plugin registration
+
+### Modified Capabilities
+- `chat-streaming`: The streaming requirement changes from "proxy SSE from upstream API" to "stream Semantic Kernel chat completion responses as SSE" — same client contract, different server implementation
+
+## Impact
+
+- **ChatAgent.Api**: New NuGet dependencies (`Microsoft.SemanticKernel`), `Program.cs` service registration changes, `ChatController` rewritten
+- **ChatAgent.Api.Tests**: Existing `ChatControllerTests` need updating to mock Semantic Kernel services instead of upstream HTTP calls
+- **Dependencies**: Adds `Microsoft.SemanticKernel` and `Microsoft.SemanticKernel.Connectors.OpenAI` packages
+- **Infrastructure**: No change — still talks to CLIProxyAPI at `localhost:8317`
+- **Client**: No change — SSE contract preserved
--- a/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/specs/agent-extraction/spec.md
+++ b/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/specs/agent-extraction/spec.md
@@ -0,0 +1,66 @@
+## ADDED Requirements
+
+### Requirement: Structured field extraction from natural language
+
+The agent SHALL extract a predefined set of key-value pairs from user-provided natural language text (e.g., email content) and return them as a structured JSON object.
+
+#### Scenario: All fields extracted successfully
+
+- **WHEN** the user sends a message containing natural language with all required information
+- **THEN** the agent returns a JSON object with all predefined fields populated from the text
+
+#### Scenario: Partial extraction
+
+- **WHEN** the user sends a message that contains some but not all required fields
+- **THEN** the agent extracts available fields and leaves missing fields as null
+
+### Requirement: Predefined extraction schema
+
+The system SHALL define a fixed set of known field names and types as a strongly-typed C# class. All extraction output MUST conform to this schema.
+
+#### Scenario: Output conforms to schema
+
+- **WHEN** the agent produces extracted fields
+- **THEN** every key in the output matches a field defined in the schema and values match expected types
+
+### Requirement: Autonomous validation via tool calling
+
+The agent SHALL validate extracted fields by calling a validation tool function. The validation tool checks that all required fields are present and correctly typed.
+
+#### Scenario: Validation passes
+
+- **WHEN** the agent calls the validation tool with a complete and correct extraction
+- **THEN** the tool returns a success result and the agent returns the final output to the user
+
+#### Scenario: Validation fails with fixable errors
+
+- **WHEN** the validation tool returns errors for missing or malformed fields
+- **THEN** the agent re-reads the source text and attempts to fix the extraction without user intervention
+
+### Requirement: Autonomous retry with iteration cap
+
+The agent SHALL retry extraction autonomously up to 3 times when validation fails. After exhausting retries, the agent MUST escalate to the user.
+
+#### Scenario: Agent retries and succeeds
+
+- **WHEN** validation fails on the first attempt but the error is recoverable
+- **THEN** the agent retries extraction and calls validation again, up to 3 total attempts
+
+#### Scenario: Agent exhausts retries and escalates
+
+- **WHEN** validation fails after 3 attempts
+- **THEN** the agent sends a natural language message to the user identifying the specific fields it could not resolve and asking for clarification
+
+### Requirement: Human-in-the-loop clarification
+
+When the agent escalates to the user, the user SHALL be able to provide the missing information in natural language, and the agent SHALL incorporate the clarification and re-attempt extraction.
+
+#### Scenario: User provides clarification
+
+- **WHEN** the agent asks for clarification about missing fields and the user responds
+- **THEN** the agent incorporates the user's response into the conversation context and produces an updated extraction
+
+#### Scenario: Clarification via normal chat
+
+- **WHEN** the agent escalates for clarification
+- **THEN** the clarification request appears as a regular assistant message in the chat UI, and the user responds via the normal chat input
--- a/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/specs/chat-streaming/spec.md
+++ b/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/specs/chat-streaming/spec.md
@@ -0,0 +1,42 @@
+## MODIFIED Requirements
+
+### Requirement: Chat endpoint proxies to Responses API
+
+The API backend SHALL expose `POST /api/chat` that accepts a list of messages and processes them using a Semantic Kernel chat completion service. The kernel is configured with an OpenAI connector pointed at the existing CLIProxyAPI proxy.
+
+#### Scenario: Successful chat request
+
+- **WHEN** the client sends a POST to `/api/chat` with a message list
+- **THEN** the API processes the messages through the Semantic Kernel and returns the response
+
+### Requirement: Streaming response delivery
+
+The API backend SHALL stream the Semantic Kernel's chat completion response back to the WASM client as `text/event-stream`, forwarding text content so the client can render tokens incrementally. The SSE event format MUST remain `data: {"text":"..."}\n\n` for text deltas and `data: [DONE]\n\n` for completion.
+
+#### Scenario: Tokens stream to client
+
+- **WHEN** the Semantic Kernel emits streaming chat message content
+- **THEN** the backend forwards each content chunk as an SSE event to the client containing the text fragment
+
+#### Scenario: Stream completes
+
+- **WHEN** the Semantic Kernel streaming response completes
+- **THEN** the backend signals stream completion to the client with `data: [DONE]\n\n`
+
+### Requirement: Configurable proxy target
+
+The CLIProxyAPI base URL and model name SHALL be configurable via `appsettings.json` in the API project, not hardcoded. These values are used to configure the Semantic Kernel OpenAI connector.
+
+#### Scenario: Configuration read at startup
+
+- **WHEN** the API starts
+- **THEN** it reads `ResponsesApi:BaseUrl` and `ResponsesApi:Model` from configuration to configure the Semantic Kernel
+
+### Requirement: Error propagation
+
+If the LLM service returns an error or is unreachable, the API backend SHALL return an error SSE event and the client SHALL display the error to the user.
+
+#### Scenario: LLM service unreachable
+
+- **WHEN** the CLIProxyAPI proxy is not running
+- **THEN** the client displays an error message instead of an assistant response
--- a/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/specs/semantic-kernel-integration/spec.md
+++ b/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/specs/semantic-kernel-integration/spec.md
@@ -0,0 +1,42 @@
+## ADDED Requirements
+
+### Requirement: Semantic Kernel service registration
+
+The API backend SHALL register a Semantic Kernel `Kernel` instance in the ASP.NET Core DI container at startup, configured with an OpenAI chat completion connector.
+
+#### Scenario: Kernel registered at startup
+
+- **WHEN** the API application starts
+- **THEN** a `Kernel` instance is available for injection into controllers
+
+### Requirement: OpenAI connector targets CLIProxyAPI proxy
+
+The Semantic Kernel OpenAI chat completion service SHALL be configured to use the existing CLIProxyAPI proxy endpoint as its base URL, reading the URL and model name from `appsettings.json`.
+
+#### Scenario: Connector uses configured endpoint
+
+- **WHEN** the kernel makes a chat completion request
+- **THEN** it sends the request to the URL specified in `ResponsesApi:BaseUrl` configuration
+
+#### Scenario: Model from configuration
+
+- **WHEN** the kernel makes a chat completion request
+- **THEN** it uses the model name specified in `ResponsesApi:Model` configuration
+
+### Requirement: Plugin registration
+
+The API backend SHALL register extraction and validation plugins with the Kernel so they are available as tools for the LLM to invoke.
+
+#### Scenario: Plugins available as tools
+
+- **WHEN** the kernel is constructed
+- **THEN** all registered plugin functions appear in the tool list sent to the LLM
+
+### Requirement: Auto function calling
+
+The Kernel SHALL be configured with automatic function calling enabled, allowing the LLM to invoke registered plugin functions without manual dispatch code.
+
+#### Scenario: LLM invokes tool automatically
+
+- **WHEN** the LLM decides to call a registered function during chat completion
+- **THEN** the kernel automatically executes the function and returns the result to the LLM
--- a/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/tasks.md
+++ b/openspec/changes/archive/2026-04-04-migrate-to-semantic-kernel/tasks.md
@@ -0,0 +1,40 @@
+## 1. Add Semantic Kernel Dependencies
+
+- [x] 1.1 Add `Microsoft.SemanticKernel` and `Microsoft.SemanticKernel.Connectors.OpenAI` NuGet packages to `ChatAgent.Api`
+- [x] 1.2 Remove the `OpenAI` SDK package if no longer needed after migration
+
+## 2. Define Extraction Schema
+
+- [x] 2.1 Create `ExtractedFields` class in `ChatAgent.Shared/Models/` with the predefined set of key-value fields (placeholder fields until real schema is provided)
+- [x] 2.2 Create `ValidationResult` class in `ChatAgent.Shared/Models/` with `IsValid`, `Errors` properties
+
+## 3. Create Extraction Plugin
+
+- [x] 3.1 Create `ExtractionPlugin` class in `ChatAgent.Api/Plugins/` with a `[KernelFunction]` validation method that checks `ExtractedFields` for required fields and type correctness
+- [x] 3.2 Add inline tutorial comments explaining SK plugin concepts (`[KernelFunction]`, `[Description]`, auto-invocation)
+
+## 4. Wire Semantic Kernel in Program.cs
+
+- [x] 4.1 Register `OpenAIChatCompletionService` in DI using `ResponsesApi:BaseUrl` and `ResponsesApi:Model` from config
+- [x] 4.2 Register `Kernel` with `AddKernel()` and import `ExtractionPlugin`
+- [x] 4.3 Add inline tutorial comments explaining kernel setup, connectors, and plugin registration
+
+## 5. Rewrite ChatController
+
+- [x] 5.1 Replace `IHttpClientFactory` and `IConfiguration` injection with `Kernel` injection
+- [x] 5.2 Replace manual HTTP proxy logic with `IChatCompletionService.GetStreamingChatMessageContentsAsync()` using the conversation history from the request
+- [x] 5.3 Configure `OpenAIPromptExecutionSettings` with `FunctionChoiceBehavior.Auto()` and `autoInvokeMaxCallCount = 3`
+- [x] 5.4 Re-emit streaming content as the existing SSE format (`data: {"text":"..."}\n\n` and `data: [DONE]\n\n`)
+- [x] 5.5 Add inline tutorial comments explaining streaming chat completion, execution settings, and tool call behavior
+
+## 6. Update Tests
+
+- [x] 6.1 Update `ChatControllerTests` to mock `IChatCompletionService` instead of upstream HTTP calls
+- [x] 6.2 Add tests for the validation plugin (`ExtractionPlugin` returns correct pass/fail results)
+- [x] 6.3 Add a test verifying the agent escalates to the user after max retries
+
+## 7. Verify
+
+- [x] 7.1 Run `dotnet build` to confirm no errors
+- [x] 7.2 Run `dotnet test` to confirm all tests pass
+- [ ] 7.3 Manual smoke test: send a chat message and verify streaming still works end-to-end through SK
--- a/openspec/changes/archive/2026-04-04-multi-turn-conversations/.openspec.yaml
+++ b/openspec/changes/archive/2026-04-04-multi-turn-conversations/.openspec.yaml
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-04-04
--- a/openspec/changes/archive/2026-04-04-multi-turn-conversations/design.md
+++ b/openspec/changes/archive/2026-04-04-multi-turn-conversations/design.md
--- a/openspec/changes/archive/2026-04-04-multi-turn-conversations/proposal.md
+++ b/openspec/changes/archive/2026-04-04-multi-turn-conversations/proposal.md
--- a/openspec/changes/archive/2026-04-04-multi-turn-conversations/specs/chat-ui/spec.md
+++ b/openspec/changes/archive/2026-04-04-multi-turn-conversations/specs/chat-ui/spec.md
--- a/openspec/changes/archive/2026-04-04-multi-turn-conversations/tasks.md
+++ b/openspec/changes/archive/2026-04-04-multi-turn-conversations/tasks.md
--- a/openspec/specs/agent-extraction/spec.md
+++ b/openspec/specs/agent-extraction/spec.md
@@ -0,0 +1,70 @@
+## Purpose
+
+Define the autonomous agent-driven extraction pipeline — structured field extraction from natural language, schema-based validation via tool calling, autonomous retry logic, and human-in-the-loop clarification.
+
+## Requirements
+
+### Requirement: Structured field extraction from natural language
+
+The agent SHALL extract a predefined set of key-value pairs from user-provided natural language text (e.g., email content) and return them as a structured JSON object.
+
+#### Scenario: All fields extracted successfully
+
+- **WHEN** the user sends a message containing natural language with all required information
+- **THEN** the agent returns a JSON object with all predefined fields populated from the text
+
+#### Scenario: Partial extraction
+
+- **WHEN** the user sends a message that contains some but not all required fields
+- **THEN** the agent extracts available fields and leaves missing fields as null
+
+### Requirement: Predefined extraction schema
+
+The system SHALL define a fixed set of known field names and types as a strongly-typed C# class. All extraction output MUST conform to this schema.
+
+#### Scenario: Output conforms to schema
+
+- **WHEN** the agent produces extracted fields
+- **THEN** every key in the output matches a field defined in the schema and values match expected types
+
+### Requirement: Autonomous validation via tool calling
+
+The agent SHALL validate extracted fields by calling a validation tool function. The validation tool checks that all required fields are present and correctly typed.
+
+#### Scenario: Validation passes
+
+- **WHEN** the agent calls the validation tool with a complete and correct extraction
+- **THEN** the tool returns a success result and the agent returns the final output to the user
+
+#### Scenario: Validation fails with fixable errors
+
+- **WHEN** the validation tool returns errors for missing or malformed fields
+- **THEN** the agent re-reads the source text and attempts to fix the extraction without user intervention
+
+### Requirement: Autonomous retry with iteration cap
+
+The agent SHALL retry extraction autonomously up to 3 times when validation fails. After exhausting retries, the agent MUST escalate to the user.
+
+#### Scenario: Agent retries and succeeds
+
+- **WHEN** validation fails on the first attempt but the error is recoverable
+- **THEN** the agent retries extraction and calls validation again, up to 3 total attempts
+
+#### Scenario: Agent exhausts retries and escalates
+
+- **WHEN** validation fails after 3 attempts
+- **THEN** the agent sends a natural language message to the user identifying the specific fields it could not resolve and asking for clarification
+
+### Requirement: Human-in-the-loop clarification
+
+When the agent escalates to the user, the user SHALL be able to provide the missing information in natural language, and the agent SHALL incorporate the clarification and re-attempt extraction.
+
+#### Scenario: User provides clarification
+
+- **WHEN** the agent asks for clarification about missing fields and the user responds
+- **THEN** the agent incorporates the user's response into the conversation context and produces an updated extraction
+
+#### Scenario: Clarification via normal chat
+
+- **WHEN** the agent escalates for clarification
+- **THEN** the clarification request appears as a regular assistant message in the chat UI, and the user responds via the normal chat input
--- a/openspec/specs/chat-streaming/spec.md
+++ b/openspec/specs/chat-streaming/spec.md
@@ -1,40 +1,40 @@
 ## Purpose

-Define the streaming AI response pipeline — backend proxy to the Responses API, SSE delivery to the WASM client, configuration, and error handling.
+Define the streaming AI response pipeline — backend chat endpoint using Semantic Kernel, SSE delivery to the WASM client, configuration, and error handling.

 ## Requirements

 ### Requirement: Chat endpoint proxies to Responses API

-The API backend SHALL expose `POST /api/chat` that accepts a list of messages and proxies the request to the local Responses API at a configurable base URL using the `POST /v1/responses` endpoint.
+The API backend SHALL expose `POST /api/chat` that accepts a list of messages and processes them using a Semantic Kernel chat completion service. The kernel is configured with an OpenAI connector pointed at the existing CLIProxyAPI proxy.

-#### Scenario: Successful proxy request
+#### Scenario: Successful chat request

 - **WHEN** the client sends a POST to `/api/chat` with a message list
- **THEN** the API forwards the messages to the Responses API with the configured model and returns the response
+- **THEN** the API processes the messages through the Semantic Kernel and returns the response

 ### Requirement: Streaming response delivery

-The API backend SHALL stream the Responses API's SSE events back to the WASM client as `text/event-stream`, forwarding `response.output_text.delta` events so the client can render tokens incrementally.
+The API backend SHALL stream the Semantic Kernel's chat completion response back to the WASM client as `text/event-stream`, forwarding text content so the client can render tokens incrementally. The SSE event format MUST remain `data: {"text":"..."}\n\n` for text deltas and `data: [DONE]\n\n` for completion.

 #### Scenario: Tokens stream to client

- **WHEN** the Responses API emits `response.output_text.delta` events
- **THEN** the backend forwards each delta as an SSE event to the client containing the text fragment
+- **WHEN** the Semantic Kernel emits streaming chat message content
+- **THEN** the backend forwards each content chunk as an SSE event to the client containing the text fragment

 #### Scenario: Stream completes

- **WHEN** the Responses API emits `response.completed`
- **THEN** the backend signals stream completion to the client
+- **WHEN** the Semantic Kernel streaming response completes
+- **THEN** the backend signals stream completion to the client with `data: [DONE]\n\n`

 ### Requirement: Configurable proxy target

-The Responses API base URL and model name SHALL be configurable via `appsettings.json` in the API project, not hardcoded.
+The CLIProxyAPI base URL and model name SHALL be configurable via `appsettings.json` in the API project, not hardcoded. These values are used to configure the Semantic Kernel OpenAI connector.

 #### Scenario: Configuration read at startup

 - **WHEN** the API starts
- **THEN** it reads `ResponsesApi:BaseUrl` and `ResponsesApi:Model` from configuration
+- **THEN** it reads `ResponsesApi:BaseUrl` and `ResponsesApi:Model` from configuration to configure the Semantic Kernel

 ### Requirement: Client streams from backend

@@ -47,9 +47,9 @@ The WASM client SHALL call `POST /api/chat` with `SetBrowserResponseStreamingEna

 ### Requirement: Error propagation

-If the Responses API returns an error or is unreachable, the API backend SHALL return an appropriate HTTP error status and the client SHALL display the error to the user.
+If the LLM service returns an error or is unreachable, the API backend SHALL return an error SSE event and the client SHALL display the error to the user.

-#### Scenario: Proxy unreachable
+#### Scenario: LLM service unreachable

- **WHEN** the Responses API is not running
+- **WHEN** the CLIProxyAPI proxy is not running
 - **THEN** the client displays an error message instead of an assistant response
--- a/openspec/specs/chat-ui/spec.md
+++ b/openspec/specs/chat-ui/spec.md
@@ -63,13 +63,32 @@ The chat page SHALL show a visual indicator while waiting for the first token fr

 ### Requirement: Streaming AI response

-The assistant SHALL reply with a real AI response streamed from the backend API. Tokens appear incrementally as they arrive.
+The assistant SHALL reply with a real AI response streamed from the backend API, using the full conversation history as context. Tokens appear incrementally as they arrive.

 #### Scenario: Bot replies with streamed AI response

 - **WHEN** the user sends any message
 - **THEN** the assistant message appears and grows token by token as the stream delivers text

+#### Scenario: Full history sent with each request
+
+- **WHEN** the user sends a message after prior exchanges
+- **THEN** all previous user and assistant messages are included in the API request so the AI has conversational context
+
+### Requirement: New chat button
+
+The chat page SHALL provide a button to clear the current conversation and start a new one.
+
+#### Scenario: User starts a new chat
+
+- **WHEN** the user clicks the "New Chat" button
+- **THEN** all messages are cleared and the empty state is shown
+
+#### Scenario: New chat button disabled during streaming
+
+- **WHEN** the assistant is currently streaming a response
+- **THEN** the "New Chat" button is disabled
+
 ### Requirement: Auto-scroll

 The message list SHALL automatically scroll to the newest message when a new message is added.
--- a/openspec/specs/semantic-kernel-integration/spec.md
+++ b/openspec/specs/semantic-kernel-integration/spec.md
@@ -0,0 +1,46 @@
+## Purpose
+
+Define the Semantic Kernel integration layer — kernel registration, OpenAI connector configuration, plugin registration, and automatic function calling.
+
+## Requirements
+
+### Requirement: Semantic Kernel service registration
+
+The API backend SHALL register a Semantic Kernel `Kernel` instance in the ASP.NET Core DI container at startup, configured with an OpenAI chat completion connector.
+
+#### Scenario: Kernel registered at startup
+
+- **WHEN** the API application starts
+- **THEN** a `Kernel` instance is available for injection into controllers
+
+### Requirement: OpenAI connector targets CLIProxyAPI proxy
+
+The Semantic Kernel OpenAI chat completion service SHALL be configured to use the existing CLIProxyAPI proxy endpoint as its base URL, reading the URL and model name from `appsettings.json`.
+
+#### Scenario: Connector uses configured endpoint
+
+- **WHEN** the kernel makes a chat completion request
+- **THEN** it sends the request to the URL specified in `ResponsesApi:BaseUrl` configuration
+
+#### Scenario: Model from configuration
+
+- **WHEN** the kernel makes a chat completion request
+- **THEN** it uses the model name specified in `ResponsesApi:Model` configuration
+
+### Requirement: Plugin registration
+
+The API backend SHALL register extraction and validation plugins with the Kernel so they are available as tools for the LLM to invoke.
+
+#### Scenario: Plugins available as tools
+
+- **WHEN** the kernel is constructed
+- **THEN** all registered plugin functions appear in the tool list sent to the LLM
+
+### Requirement: Auto function calling
+
+The Kernel SHALL be configured with automatic function calling enabled, allowing the LLM to invoke registered plugin functions without manual dispatch code.
+
+#### Scenario: LLM invokes tool automatically
+
+- **WHEN** the LLM decides to call a registered function during chat completion
+- **THEN** the kernel automatically executes the function and returns the result to the LLM