fix: update export bundle for Azure OpenAI and add streaming diagnostics

Replace CLIProxyAPI/local proxy references with Azure OpenAI using DefaultAzureCredential and tenant ID auth. Add Critical Pattern #8 for SSE buffering diagnostics with timestamped curl test. Add streaming verification tasks (T6b, T15) and troubleshooting entries for Azure AD auth, RBAC, response compression, and proxy buffering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 01:42:38 +01:00
parent d46b179221
commit 956ec243c5
3 changed files with 256 additions and 59 deletions
--- a/openspec/exports/nlxva-pricer-spec.md
+++ b/openspec/exports/nlxva-pricer-spec.md
@@ -58,6 +58,8 @@ This feature is a GUEST in CRC. Existing code, patterns, and conventions take ab

 Add to `CRC.Server`:
 - `Microsoft.SemanticKernel` (latest stable, >=1.x)
+- `Microsoft.SemanticKernel.Connectors.AzureOpenAI` (for Azure OpenAI connector)
+- `Azure.Identity` (for `DefaultAzureCredential` — CRC may already have this)
 - `Markdig` 1.1.1 (if CRC.Client doesn't already have it — check first)

 No new packages for CRC.Client or CRC.Shared (MudBlazor already present).
@@ -73,7 +75,7 @@ CRC.Server (ASP.NET Core)
    ├── NlxvaPricerController
    │     ├── POST /api/nlxva-pricer/chat      (general chat)
    │     └── POST /api/nlxva-pricer/extract    (email extraction)
-    │           Uses: Semantic Kernel → CLIProxyAPI (OpenAI-compatible proxy)
+    │           Uses: Semantic Kernel → Azure OpenAI (via DefaultAzureCredential)
    │           Uses: ExtractionPlugin (tool calling)
    │           Uses: FewShotService (example loading)
    ├── Services/
@@ -232,9 +234,9 @@ data: {"error":"message"}\n\n       ← on failure (followed by [DONE])
 ```json
 {
  "NlxvaPricer": {
-    "LlmBaseUrl": "http://localhost:8317/v1",
-    "LlmModel": "claude-sonnet-4-6",
-    "LlmApiKey": "not-needed",
+    "AzureOpenAIEndpoint": "https://your-resource.openai.azure.com/",
+    "DeploymentName": "gpt4o-prod",
+    "TenantId": "<your-azure-ad-tenant-id>",
    "FewShotPath": "examples/extraction"
  },
  "ExternalApis": {
@@ -245,6 +247,11 @@ data: {"error":"message"}\n\n       ← on failure (followed by [DONE])
 }
 ```

+If using API key auth instead of Azure AD, replace `TenantId` with:
+```json
+    "ApiKey": "<your-azure-openai-api-key>"
+```
+
 ## Critical Patterns

 ### 1. SSE streaming in Blazor WASM — DO NOT use `reader.EndOfStream`
@@ -276,16 +283,33 @@ while ((line = await reader.ReadLineAsync()) != null)   // ← NOT EndOfStream
 `SetBrowserResponseStreamingEnabled(true)` is a Blazor WASM extension that tells the browser Fetch API
 to expose the response as a ReadableStream. Without it, the browser buffers the entire response.

-### 2. Semantic Kernel base URL must include `/v1`
+### 2. Azure OpenAI: use deployment name, NOT model name; NO `/v1` suffix

-**Why:** The OpenAI SDK appends `chat/completions` directly to the base URL.
-Without `/v1`, requests hit `/chat/completions` instead of `/v1/chat/completions` → 404.
+**Why:** Semantic Kernel registers Azure OpenAI with `AddAzureOpenAIChatCompletion()`, not `AddOpenAIChatCompletion()`.
+The endpoint is your Azure resource URL with no `/v1` suffix; the Azure SDK constructs the request path internally.
+The `deploymentName` is the name you gave the deployment in the Azure portal, not the model name.
+Auth defaults to `DefaultAzureCredential` with the tenant ID rather than an API key.

 ```csharp
-builder.Services.AddOpenAIChatCompletion(
-    modelId: model,
-    endpoint: new Uri("http://localhost:8317/v1"),  // ← MUST include /v1
-    apiKey: "not-needed");
+using Azure.Identity;
+
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: builder.Configuration["NlxvaPricer:DeploymentName"] ?? "gpt4o-prod",
+    endpoint: builder.Configuration["NlxvaPricer:AzureOpenAIEndpoint"]
+        ?? "https://your-resource.openai.azure.com/",
+    credentials: new DefaultAzureCredential(
+        new DefaultAzureCredentialOptions
+        {
+            TenantId = builder.Configuration["NlxvaPricer:TenantId"]
+        }));
+```
+
+If using an API key instead of Azure AD:
+```csharp
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: "gpt4o-prod",
+    endpoint: "https://your-resource.openai.azure.com/",
+    apiKey: builder.Configuration["NlxvaPricer:ApiKey"]);
 ```

 ### 3. Layout height depends on AppBar height
@@ -371,18 +395,72 @@ if (!string.IsNullOrEmpty(parsedText))
    yield return parsedText;
 ```

+### 8. SSE response buffering — verify both streaming hops
+
+**Why:** The architecture has two streaming hops: Azure OpenAI → CRC.Server → Browser.
+If anything buffers on either hop, the user sees no tokens until the full response completes.
+Common buffers: response compression middleware, reverse proxies (NGINX/IIS), Azure API Management.
+
+**Diagnostic endpoint (add temporarily, remove after verifying):**
+```csharp
+[HttpGet("stream-test")]
+public async Task StreamTest()
+{
+    Response.ContentType = "text/event-stream";
+    Response.Headers["Cache-Control"] = "no-cache";
+    Response.Headers["X-Accel-Buffering"] = "no";  // NGINX hint
+
+    var chatService = _kernel.GetRequiredService<IChatCompletionService>();
+    var history = new ChatHistory();
+    history.AddUserMessage("Count from 1 to 10, one number per line.");
+
+    var sw = System.Diagnostics.Stopwatch.StartNew();
+    await foreach (var chunk in chatService.GetStreamingChatMessageContentsAsync(history))
+    {
+        if (!string.IsNullOrEmpty(chunk.Content))
+        {
+            await Response.WriteAsync($"data: [{sw.ElapsedMilliseconds}ms] {chunk.Content}\n\n");
+            await Response.Body.FlushAsync();
+        }
+    }
+    await Response.WriteAsync("data: [DONE]\n\n");
+}
+```
+
+**Test with:** `curl -N https://localhost:7100/api/nlxva-pricer/stream-test`
+- Timestamps spread over seconds = streaming works
+- All timestamps clustered at the end = something is buffering
+
+**If CRC.Server uses `UseResponseCompression()`, exclude SSE:**
+```csharp
+Response.Headers["Content-Encoding"] = "identity";  // opt out per-response; set before the first write to the body
+```
+
+**Response headers to always set on SSE endpoints:**
+```csharp
+Response.ContentType = "text/event-stream";
+Response.Headers["Cache-Control"] = "no-cache";
+Response.Headers["X-Accel-Buffering"] = "no";       // prevents NGINX buffering
+```
+
 ## Wiring

 ### CRC.Server DI registration order (add to existing Program.cs / Startup.cs)

 ```csharp
-// 1. Semantic Kernel — OpenAI-compatible connector
-var llmBaseUrl = builder.Configuration["NlxvaPricer:LlmBaseUrl"] ?? "http://localhost:8317/v1";
-var llmModel = builder.Configuration["NlxvaPricer:LlmModel"] ?? "claude-sonnet-4-6";
-builder.Services.AddOpenAIChatCompletion(
-    modelId: llmModel,
-    endpoint: new Uri(llmBaseUrl),
-    apiKey: builder.Configuration["NlxvaPricer:LlmApiKey"] ?? "not-needed");
+// 1. Semantic Kernel — Azure OpenAI connector with Azure AD auth
+using Azure.Identity;  // place with the other using directives at the top of the file
+
+var azureEndpoint = builder.Configuration["NlxvaPricer:AzureOpenAIEndpoint"]
+    ?? "https://your-resource.openai.azure.com/";
+var deploymentName = builder.Configuration["NlxvaPricer:DeploymentName"] ?? "gpt4o-prod";
+var tenantId = builder.Configuration["NlxvaPricer:TenantId"];
+
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: deploymentName,
+    endpoint: azureEndpoint,
+    credentials: new DefaultAzureCredential(
+        new DefaultAzureCredentialOptions { TenantId = tenantId }));
 builder.Services.AddKernel();

 // 2. External API typed HttpClients