feat: migrate chat backend to Semantic Kernel with tool calling support

Replace manual HTTP proxy in ChatController with Semantic Kernel's OpenAI chat completion service pointed at CLIProxyAPI. Add extraction plugin with validation function for structured field extraction from natural language, enabling an agentic loop with auto-retry and human-in-the-loop escalation.

- Add Microsoft.SemanticKernel 1.74.0 with OpenAI connector
- Create ExtractedFields schema and ValidationResult models
- Create ExtractionPlugin with [KernelFunction] validation
- Rewrite ChatController to use IChatCompletionService streaming
- Configure FunctionChoiceBehavior.Auto() for tool calling
- Preserve existing SSE contract (client unchanged)
- Update tests to mock SK services, add plugin and integration tests
- Archive multi-turn-conversations and migrate-to-semantic-kernel changes
- Sync specs for agent-extraction, semantic-kernel-integration, chat-streaming

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:59:13 +01:00
parent 3278a408b9
commit 471e9ce935
27 changed files with 1082 additions and 201 deletions
--- a/openspec/specs/chat-streaming/spec.md
+++ b/openspec/specs/chat-streaming/spec.md
@@ -1,40 +1,40 @@
 ## Purpose

-Define the streaming AI response pipeline — backend proxy to the Responses API, SSE delivery to the WASM client, configuration, and error handling.
+Define the streaming AI response pipeline — backend chat endpoint using Semantic Kernel, SSE delivery to the WASM client, configuration, and error handling.

 ## Requirements

 ### Requirement: Chat endpoint proxies to Responses API

-The API backend SHALL expose `POST /api/chat` that accepts a list of messages and proxies the request to the local Responses API at a configurable base URL using the `POST /v1/responses` endpoint.
+The API backend SHALL expose `POST /api/chat` that accepts a list of messages and processes them using a Semantic Kernel chat completion service. The kernel is configured with an OpenAI connector pointed at the existing CLIProxyAPI proxy.

-#### Scenario: Successful proxy request
+#### Scenario: Successful chat request

 - **WHEN** the client sends a POST to `/api/chat` with a message list
-- **THEN** the API forwards the messages to the Responses API with the configured model and returns the response
+- **THEN** the API processes the messages through the Semantic Kernel and returns the response

 ### Requirement: Streaming response delivery

-The API backend SHALL stream the Responses API's SSE events back to the WASM client as `text/event-stream`, forwarding `response.output_text.delta` events so the client can render tokens incrementally.
+The API backend SHALL stream the Semantic Kernel's chat completion response back to the WASM client as `text/event-stream`, forwarding text content so the client can render tokens incrementally. The SSE event format MUST remain `data: {"text":"..."}\n\n` for text deltas and `data: [DONE]\n\n` for completion.

 #### Scenario: Tokens stream to client

-- **WHEN** the Responses API emits `response.output_text.delta` events
-- **THEN** the backend forwards each delta as an SSE event to the client containing the text fragment
+- **WHEN** the Semantic Kernel emits streaming chat message content
+- **THEN** the backend forwards each content chunk as an SSE event to the client containing the text fragment

 #### Scenario: Stream completes

-- **WHEN** the Responses API emits `response.completed`
-- **THEN** the backend signals stream completion to the client
+- **WHEN** the Semantic Kernel streaming response completes
+- **THEN** the backend signals stream completion to the client with `data: [DONE]\n\n`

 ### Requirement: Configurable proxy target

-The Responses API base URL and model name SHALL be configurable via `appsettings.json` in the API project, not hardcoded.
+The CLIProxyAPI base URL and model name SHALL be configurable via `appsettings.json` in the API project, not hardcoded. These values are used to configure the Semantic Kernel OpenAI connector.

 #### Scenario: Configuration read at startup

 - **WHEN** the API starts
-- **THEN** it reads `ResponsesApi:BaseUrl` and `ResponsesApi:Model` from configuration
+- **THEN** it reads `ResponsesApi:BaseUrl` and `ResponsesApi:Model` from configuration to configure the Semantic Kernel

 ### Requirement: Client streams from backend

@@ -47,9 +47,9 @@ The WASM client SHALL call `POST /api/chat` with `SetBrowserResponseStreamingEna

 ### Requirement: Error propagation

-If the Responses API returns an error or is unreachable, the API backend SHALL return an appropriate HTTP error status and the client SHALL display the error to the user.
+If the LLM service returns an error or is unreachable, the API backend SHALL return an error SSE event and the client SHALL display the error to the user.

-#### Scenario: Proxy unreachable
+#### Scenario: LLM service unreachable

-- **WHEN** the Responses API is not running
+- **WHEN** the CLIProxyAPI proxy is not running
 - **THEN** the client displays an error message instead of an assistant response
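The updated chat-streaming spec requires a specific SSE contract: `data: {"text":"..."}\n\n` for each text delta and `data: [DONE]\n\n` on completion. The short, language-neutral sketch below illustrates that contract in Python. It is not the C# ChatController code from this commit. The names `encode_sse` and `decode_sse` are hypothetical. Skipping empty deltas is also an assumption, for example chunks that carry no text while a tool call is in flight. Compact JSON separators are used so the payload matches the spec's `{"text":"..."}` shape exactly.

```python
import json
from typing import Iterable, Iterator

DONE_SENTINEL = "[DONE]"


def encode_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Frame each text delta as `data: {"text":"..."}\\n\\n`, then emit `data: [DONE]\\n\\n`."""
    for text in chunks:
        if not text:
            # Assumption: empty deltas carry no text and are not forwarded.
            continue
        # Compact separators give {"text":"..."} with no spaces, as in the spec.
        payload = json.dumps({"text": text}, separators=(",", ":"))
        yield "data: " + payload + "\n\n"
    yield "data: " + DONE_SENTINEL + "\n\n"


def decode_sse(stream: str) -> list[str]:
    """Recover the text deltas from a full event stream, stopping at [DONE]."""
    deltas: list[str] = []
    for event in stream.split("\n\n"):
        if not event.startswith("data: "):
            continue  # ignore blank trailing segments and non-data lines
        payload = event[len("data: "):]
        if payload == DONE_SENTINEL:
            break
        deltas.append(json.loads(payload)["text"])
    return deltas


if __name__ == "__main__":
    wire = "".join(encode_sse(["Hel", "", "lo\n\nworld"]))
    print(repr(wire))
    print(decode_sse(wire))
```

`json.dumps` escapes newline characters inside the text. As a result, a delta that contains a blank line cannot be mistaken for the `\n\n` event terminator. This is the reason each delta is framed as JSON rather than as raw text.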