Replace manual HTTP proxy in ChatController with Semantic Kernel's OpenAI chat completion service pointed at CLIProxyAPI. Add extraction plugin with validation function for structured field extraction from natural language, enabling an agentic loop with auto-retry and human-in-the-loop escalation.

- Add Microsoft.SemanticKernel 1.74.0 with OpenAI connector
- Create ExtractedFields schema and ValidationResult models
- Create ExtractionPlugin with [KernelFunction] validation
- Rewrite ChatController to use IChatCompletionService streaming
- Configure FunctionChoiceBehavior.Auto() for tool calling
- Preserve existing SSE contract (client unchanged)
- Update tests to mock SK services, add plugin and integration tests
- Archive multi-turn-conversations and migrate-to-semantic-kernel changes
- Sync specs for agent-extraction, semantic-kernel-integration, chat-streaming

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Context

The chat backend currently proxies requests to a local CLIProxyAPI instance (OpenAI-compatible API at `localhost:8317`) via manual `HttpClient` calls and SSE parsing in `ChatController`. The architecture works for simple chat completion but has no abstraction for tool calling, function invocation, or agentic loops. The goal is to adopt Semantic Kernel as the AI orchestration layer to enable structured extraction with autonomous validation.
## Goals / Non-Goals

**Goals:**
- Replace manual HTTP proxy logic with Semantic Kernel's chat completion service
- Enable tool/function calling via SK plugins
- Implement an agentic extraction loop: extract → validate → retry (up to 3 times) → escalate to user
- Preserve the existing SSE contract so the Blazor client requires no changes
- Maintain inline tutorial comments explaining SK concepts

**Non-Goals:**
- Multi-agent orchestration (future, once Agent Framework reaches GA)
- Changing the Blazor client or `ChatApiClient`
- Adding new UI for structured output display (future change)
- Replacing CLIProxyAPI: SK's OpenAI connector talks to it as-is
- Authentication or multi-user support
## Decisions

### D1: Use SK's OpenAI chat completion connector pointed at CLIProxyAPI

**Choice:** `Microsoft.SemanticKernel.Connectors.OpenAI` with `OpenAIChatCompletionService` configured to use `localhost:8317` as the endpoint.

**Alternatives considered:**
- SK Anthropic connector (talks to the Anthropic API directly): would bypass CLIProxyAPI and lose model-switching flexibility
- Keep manual `HttpClient` alongside SK: defeats the purpose of the migration

**Rationale:** CLIProxyAPI already provides an OpenAI-compatible interface, and SK's OpenAI connector accepts a custom endpoint, so it can target any OpenAI-compatible server. No infrastructure change is required.
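A minimal sketch of the D1 wiring, assuming the connector's endpoint overload of `AddOpenAIChatCompletion`. The model id, API key, and the `/v1` path suffix are placeholders rather than values taken from this project, and some SK releases flag this overload as experimental (`SKEXP0010`), so the pinned version may need a warning suppression.

```csharp
using System;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Assumption: CLIProxyAPI serves its OpenAI-compatible routes under /v1.
var endpoint = new Uri("http://localhost:8317/v1");

var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(
    modelId: "placeholder-model",   // hypothetical: whichever model id CLIProxyAPI routes
    endpoint: endpoint,
    apiKey: "placeholder-key");     // hypothetical: a local proxy may ignore this value
Kernel kernel = builder.Build();

// Smoke test: one non-streaming round trip through the proxy.
var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("Reply with the single word: pong");
var reply = await chat.GetChatMessageContentAsync(history, kernel: kernel);
Console.WriteLine(reply.Content);
```

Running a check like this before touching `ChatController` is one way to confirm that the proxy and the connector agree on the basic request shape.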
### D2: Register Kernel and plugins in DI via `Program.cs`

**Choice:** Configure `Kernel` in `Program.cs` using `builder.Services.AddKernel()` and register plugins via DI. Inject `Kernel` into `ChatController`.

**Rationale:** Follows ASP.NET Core conventions. `AddKernel()` registers `Kernel` as a transient service, so each controller instance receives its own kernel built from the plugins registered at startup. The controller receives it via constructor injection, consistent with the existing pattern of injecting `IHttpClientFactory` and `IConfiguration`.
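The registration in D2 could look roughly like the sketch below. The configuration keys (`Llm:Model`, `Llm:Endpoint`, `Llm:ApiKey`) and their fallback values are hypothetical, and the `ExtractionPlugin` here is only a stub so the file compiles on its own.

```csharp
using System;
using System.ComponentModel;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();

// AddKernel() registers Kernel in DI and returns a builder for connectors and plugins.
IKernelBuilder kernelBuilder = builder.Services.AddKernel();
kernelBuilder.AddOpenAIChatCompletion(
    modelId: builder.Configuration["Llm:Model"] ?? "placeholder-model",
    endpoint: new Uri(builder.Configuration["Llm:Endpoint"] ?? "http://localhost:8317/v1"),
    apiKey: builder.Configuration["Llm:ApiKey"] ?? "placeholder-key");

// Plugins added here are available to every Kernel resolved from DI.
kernelBuilder.Plugins.AddFromType<ExtractionPlugin>();

var app = builder.Build();
app.MapControllers();
app.Run();

// Stub so the sketch compiles on its own; the real plugin is described in D3.
public sealed class ExtractionPlugin
{
    [KernelFunction, Description("Placeholder validation function.")]
    public bool Validate(string json) => !string.IsNullOrWhiteSpace(json);
}
```

With this in place, `ChatController` can take a `Kernel` constructor parameter in the same way it takes `IConfiguration`.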
### D3: Validation as a native SK plugin function

**Choice:** Create an `ExtractionPlugin` class that exposes a `[KernelFunction]` method for validating extracted fields. The agent auto-invokes this via `ToolCallBehavior.AutoInvokeKernelFunctions` (`FunctionChoiceBehavior.Auto()` in current SK releases).

**Alternatives considered:**
- Manual tool call loop in controller code: loses SK's built-in retry/function-calling orchestration
- Separate validation service outside SK: requires manual plumbing between LLM and validator

**Rationale:** SK's auto-invocation handles the loop naturally. The LLM sees the validation function as a tool, calls it, reads the result, and decides whether to retry or escalate. This is the core value proposition of adopting SK.
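As an illustration of D3, here is a sketch of a validation plugin. The three fields (`CustomerName`, `Email`, `Quantity`), the rules applied to them, and the function name `validate_extraction` are all placeholders, since the real schema is still an open question. It also assumes the pinned SK version binds the tool-call JSON to a complex parameter type.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using Microsoft.SemanticKernel;

// Placeholder schema: the real field names are still an open question.
public sealed record ExtractedFields(string? CustomerName, string? Email, int? Quantity);

public sealed record ValidationResult(bool IsValid, IReadOnlyList<string> Errors);

public sealed class ExtractionPlugin
{
    [KernelFunction("validate_extraction")]
    [Description("Checks extracted fields and reports every problem found.")]
    public ValidationResult Validate(
        [Description("Fields extracted from the user's message.")] ExtractedFields fields)
    {
        var errors = new List<string>();

        if (string.IsNullOrWhiteSpace(fields.CustomerName))
            errors.Add("CustomerName is required.");
        if (string.IsNullOrWhiteSpace(fields.Email) || !fields.Email.Contains('@'))
            errors.Add("Email is required and must contain '@'.");
        if (fields.Quantity is null or < 1)
            errors.Add("Quantity is required and must be at least 1.");

        // The model reads these messages in the tool result and can correct its next attempt.
        return new ValidationResult(errors.Count == 0, errors);
    }
}
```

Returning every error at once, rather than stopping at the first, gives the model the most information per retry.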
### D4: Iteration cap with human-in-the-loop escalation

**Choice:** Configure `ToolCallBehavior.AutoInvokeKernelFunctions` with `MaximumAutoInvokeAttempts = 3`. If the agent exhausts its retries without producing valid output, it returns a clarification request to the user as a regular chat message.

**Rationale:** The iteration cap prevents runaway loops. The escalation path uses the existing chat UI: the agent simply asks for clarification in natural language, and the user responds in the next message. No special UI is needed.
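D4 names `MaximumAutoInvokeAttempts`; whether the pinned SK version exposes that as a public setting is worth checking. As an alternative (a swapped-in technique, not the mechanism this design names), the cap can be sketched with an `IAutoFunctionInvocationFilter`. The class name `AttemptCapFilter` and the helper `IsFinalAttempt` are hypothetical, and the sketch treats each model round trip as one attempt.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public sealed class AttemptCapFilter : IAutoFunctionInvocationFilter
{
    private readonly int _maxAttempts;

    public AttemptCapFilter(int maxAttempts = 3) => _maxAttempts = maxAttempts;

    // Pure helper: requestSequenceIndex is zero-based, so index 2 is the third attempt.
    public static bool IsFinalAttempt(int requestSequenceIndex, int maxAttempts) =>
        requestSequenceIndex >= maxAttempts - 1;

    public async Task OnAutoFunctionInvocationAsync(
        AutoFunctionInvocationContext context,
        Func<AutoFunctionInvocationContext, Task> next)
    {
        // Run the tool first so the latest attempt is still validated.
        await next(context);

        if (IsFinalAttempt(context.RequestSequenceIndex, _maxAttempts))
        {
            // Ask SK to stop auto-invoking; control returns to the caller after this round.
            context.Terminate = true;
        }
    }
}
```

The filter can be attached with `kernel.AutoFunctionInvocationFilters.Add(new AttemptCapFilter(3))`. Once it terminates the loop, the controller still has to turn the last validation errors into the clarification message; that step is not shown here.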
### D5: Preserve SSE contract via streaming kernel invocation

**Choice:** Use `kernel.InvokeStreamingAsync<StreamingChatMessageContent>()` (or `IChatCompletionService.GetStreamingChatMessageContentsAsync()`) and re-emit tokens as the same SSE format the client expects: `data: {"text":"..."}\n\n` and `data: [DONE]\n\n`.

**Rationale:** The Blazor client's `ChatApiClient` parses this exact format. By keeping the SSE contract identical, the entire client codebase remains untouched.
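A sketch of the SSE bridge from D5, using the `IChatCompletionService` streaming route. The `SseBridge` class and its method names are hypothetical. Skipping chunks with empty text is one simple way to keep tool-call traffic out of the client stream, not necessarily how the controller handles it.

```csharp
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

public static class SseBridge
{
    public const string Done = "data: [DONE]\n\n";

    // Serializing through System.Text.Json keeps quotes and newlines in the token escaped.
    public static string Frame(string text)
    {
        string payload = JsonSerializer.Serialize(new { text });
        return "data: " + payload + "\n\n";
    }

    public static async Task StreamAsync(
        HttpResponse response, Kernel kernel, ChatHistory history, CancellationToken ct)
    {
        response.ContentType = "text/event-stream";
        var chat = kernel.GetRequiredService<IChatCompletionService>();
        var settings = new OpenAIPromptExecutionSettings
        {
            FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
        };

        await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(
            history, settings, kernel, ct))
        {
            // Chunks that only carry tool-call data have no text; skip them.
            if (string.IsNullOrEmpty(chunk.Content)) continue;
            await response.WriteAsync(Frame(chunk.Content), ct);
            await response.Body.FlushAsync(ct);
        }

        await response.WriteAsync(Done, ct);
        await response.Body.FlushAsync(ct);
    }
}
```

Flushing after each frame matters for SSE: without it the server may buffer several tokens and the client sees them arrive in bursts.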
### D6: Predefined extraction schema as a strongly-typed C# class

**Choice:** Define an `ExtractedFields` record/class in `ChatAgent.Shared.Models` with the fixed set of known fields. Validation logic checks for required fields and type correctness.

**Rationale:** The extraction produces a single output type with fixed keys. A strongly-typed class gives compile-time safety, works with `System.Text.Json` serialization, and can carry data annotations for validation rules.
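To make D6 concrete, the sketch below deserializes a payload into a typed class and evaluates its data annotations. The property names, the sample JSON, and the chosen attributes are placeholders, not the real schema. Note that `ValidationResult` here is the `System.ComponentModel.DataAnnotations` type, not the project's own model of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Text.Json;

// Hypothetical payload; the property names are placeholders, not the real schema.
const string json = "{\"customerName\":\"Ada\",\"email\":\"not-an-email\",\"quantity\":0}";

var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
ExtractedFields fields = JsonSerializer.Deserialize<ExtractedFields>(json, options)!;

// Data annotations are evaluated explicitly here; nothing runs them automatically.
var problems = new List<ValidationResult>();
bool isValid = Validator.TryValidateObject(
    fields, new ValidationContext(fields), problems, validateAllProperties: true);

Console.WriteLine(isValid);
foreach (var problem in problems) Console.WriteLine(problem.ErrorMessage);

public sealed class ExtractedFields
{
    [Required] public string? CustomerName { get; init; }
    [Required, EmailAddress] public string? Email { get; init; }
    [Required, Range(1, 1000)] public int? Quantity { get; init; }
}
```

In this sample the email and quantity both fail their rules, so the validator reports two problems while the name passes.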
## Risks / Trade-offs

- **[SK OpenAI connector compatibility with CLIProxyAPI]** → CLIProxyAPI aims for OpenAI API parity but may have edge cases with tool-calling responses. Mitigation: test tool calling end-to-end early; fall back to the direct Anthropic connector if needed.
- **[Streaming + tool calling interaction]** → When the agent calls a tool mid-stream, the streaming behavior may differ from pure chat completion. Mitigation: handle tool call chunks in the SSE bridge; it may be necessary to buffer during tool execution and resume streaming afterwards.
- **[SK version churn]** → Semantic Kernel is actively developed; APIs may evolve. Mitigation: pin to a specific stable version and document the version in the stack spec.
- **[Tutorial complexity increase]** → SK adds abstractions (kernel, plugins, functions) that need explaining. Mitigation: maintain inline comments for every SK concept, consistent with project convention.
## Open Questions

- What are the exact field names and types for `ExtractedFields`? (Needs user input for the real schema; a placeholder can be used for the initial implementation.)
- Should tool call status ("Validating output...") be surfaced to the client as a distinct SSE event type, or just as regular text tokens? (Current design: regular text; revisit in a future change if needed.)