fix: update export bundle for Azure OpenAI and add streaming diagnostics

Replace CLIProxyAPI/local proxy references with Azure OpenAI using DefaultAzureCredential and tenant ID auth. Add Critical Pattern #8 for SSE buffering diagnostics with a timestamped curl test. Add streaming verification tasks (T6b, T15) and troubleshooting entries for Azure AD auth, RBAC, response compression, and proxy buffering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 01:42:38 +01:00
parent d46b179221
commit 956ec243c5
3 changed files with 256 additions and 59 deletions
--- a/openspec/exports/nlxva-pricer-openspec.md
+++ b/openspec/exports/nlxva-pricer-openspec.md
@@ -62,7 +62,7 @@ This feature automates extraction with AI + tool-calling validation, reducing er
 ## Scope
 - New page at /nlxva-pricer, new MudNavLink in existing NavMenu
 - New controller with 2 endpoints (chat + extract), same SSE streaming contract
-- Semantic Kernel integration with OpenAI-compatible proxy
+- Semantic Kernel integration with Azure OpenAI (Azure AD auth via tenant ID)
 - Few-shot prompting infrastructure (instruction template + 3 examples)
 - External API clients for counterparty/trade/currency validation
 - Client-side markdown rendering with XSS sanitization
@@ -89,6 +89,14 @@ This feature automates extraction with AI + tool-calling validation, reducing er
 With raw HttpClient we'd need to manually parse tool-call JSON, dispatch functions,
 and feed results back. SK handles this loop automatically via FunctionChoiceBehavior.Auto().

+## Architecture Decision: Azure OpenAI with DefaultAzureCredential
+
+**Why:** The sandbox environment uses Azure OpenAI with an Azure AD tenant ID.
+SK's `AddAzureOpenAIChatCompletion()` with `DefaultAzureCredential` authenticates against
+the same Azure AD tenant CRC already uses, so there are no API keys to manage: locally it
+picks up the developer's `az login` token, and in production it can use managed identity.
+The endpoint URL must NOT include `/v1`; the Azure SDK builds the
+`/openai/deployments/<deployment>/...` path itself.
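+
+For intuition only, here is a sketch of the classic deployment-based route that the Azure
+SDK builds for chat completions. It assumes the
+`/openai/deployments/{deployment}/chat/completions?api-version=...` path shape;
+`BuildChatCompletionsUri` is a hypothetical helper and `2024-06-01` is only an example
+api-version. You never call anything like this yourself, but it shows why a `/v1` suffix
+on the endpoint yields a path Azure does not serve, and why the deployment name (not the
+model name) is what ends up in the URL.
+
+```csharp
+using System;
+
+// Hypothetical helper mirroring the classic Azure OpenAI route. The Azure SDK builds
+// this URL internally; this exists only to illustrate where a /v1 suffix would land.
+static Uri BuildChatCompletionsUri(string endpoint, string deploymentName, string apiVersion)
+{
+    // Ensure a trailing slash so the relative path is appended rather than substituted.
+    var baseUri = new Uri(endpoint.EndsWith('/') ? endpoint : endpoint + "/");
+    var path = $"openai/deployments/{Uri.EscapeDataString(deploymentName)}" +
+               $"/chat/completions?api-version={Uri.EscapeDataString(apiVersion)}";
+    return new Uri(baseUri, path);
+}
+
+var good = BuildChatCompletionsUri("https://your-resource.openai.azure.com/", "gpt4o-prod", "2024-06-01");
+var bad = BuildChatCompletionsUri("https://your-resource.openai.azure.com/v1", "gpt4o-prod", "2024-06-01");
+
+Console.WriteLine(good.AbsolutePath); // /openai/deployments/gpt4o-prod/chat/completions
+Console.WriteLine(bad.AbsolutePath);  // /v1/openai/deployments/gpt4o-prod/chat/completions
+```
+
+In practice, pass the bare resource URL and let `AddAzureOpenAIChatCompletion()` do this.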
+
 ## Architecture Decision: SSE streaming over WebSocket

 **Why:** SSE is simpler (unidirectional server→client), works through HTTP proxies,
@@ -133,10 +141,25 @@ in the Razor component. CRC's existing Fluxor infrastructure is untouched.
 CRC.Server may need its CORS policy updated to allow SSE streaming (Content-Type: text/event-stream)
 to the CRC.Client origin. Verify existing policy covers this.

+## Risk: SSE response buffering
+
+Two streaming hops: Azure OpenAI → CRC.Server → Browser. Buffering at any point kills
+streaming UX. Common culprits: response compression middleware (`UseResponseCompression()`),
+reverse proxies (NGINX, IIS), Azure API Management in front of Azure OpenAI.
+Use the diagnostic stream-test endpoint (see Critical Pattern #8 in reference spec)
+to verify both hops stream correctly before building the UI.
+
 ## Risk: Semantic Kernel version compatibility

 CRC targets .NET 8.0. Ensure the SK NuGet package version is compatible with .NET 8.
-Current stable SK packages support .NET 8+.
+Current stable SK packages support .NET 8+. This feature also needs the
+`Microsoft.SemanticKernel.Connectors.AzureOpenAI` and `Azure.Identity` packages.
+
+## Risk: Azure AD token acquisition
+
+`DefaultAzureCredential` tries multiple auth methods in sequence. On a developer machine,
+it typically falls through to the Azure CLI login (`az login --tenant <tenant-id>`). Tokens
+are acquired lazily, so if the developer hasn't run `az login`, SK fails with an auth error
+at the first LLM call, not at startup.

 ## Risk: Large file uploads

@@ -153,7 +176,7 @@ Email HTML files are read entirely into memory (max 10MB guard). For typical sal

 ## Phase 1: Foundation (Server)

-- [ ] **T1: Add NuGet packages** — Add `Microsoft.SemanticKernel` to CRC.Server. Add `Markdig` 1.1.1 to CRC.Client (if not already present). Verify .NET 8 compatibility.
+- [ ] **T1: Add NuGet packages** — Add `Microsoft.SemanticKernel`, `Microsoft.SemanticKernel.Connectors.AzureOpenAI`, and `Azure.Identity` to CRC.Server. Add `Markdig` 1.1.1 to CRC.Client (if not already present). CRC may already have `Azure.Identity` — check first. Verify .NET 8 compatibility.

 - [ ] **T2: Add shared DTOs** — Create in CRC.Shared: `NlxvaChatMessage`, `NlxvaChatRequest`, `NlxvaModelSettings`, `NlxvaExtractionRequest`, `NlxvaExtractionResult`, `TradeItem` (with `[JsonPropertyName]` snake_case), `NlxvaValidationResult`, `NlxvaCandidateMatch`. See Contracts section in reference spec for exact shapes.

@@ -163,7 +186,9 @@ Email HTML files are read entirely into memory (max 10MB guard). For typical sal

 - [ ] **T5: Add FewShotService** — Create in CRC.Server/Services: `FewShotService` that loads instruction template + few-shot examples from disk. Caches ChatHistory prefix. Methods: `CloneWithEmail()`, `CloneWithEmailAndMessages()`. Register as Singleton. Copy examples/ folder to CRC.Server root.

-- [ ] **T6: Register Semantic Kernel** — In CRC.Server DI: `AddOpenAIChatCompletion()` + `AddKernel()`. Base URL MUST include `/v1`. Config from `NlxvaPricer:*` keys in appsettings.json. See Critical Pattern #2.
+- [ ] **T6: Register Semantic Kernel** — In CRC.Server DI: `AddAzureOpenAIChatCompletion()` with `DefaultAzureCredential` (tenant ID from config) + `AddKernel()`. Endpoint is Azure OpenAI resource URL (NO `/v1`). Use deployment name, NOT model name. Config from `NlxvaPricer:*` keys in appsettings.json. See Critical Pattern #2.
+
+- [ ] **T6b: Verify streaming hop 1** — Add temporary `stream-test` diagnostic endpoint (see Critical Pattern #8). Run `curl -N` against it. Verify timestamps are spread across seconds (not clustered). Check for response compression middleware interference. Remove diagnostic endpoint after verification.

 - [ ] **T7: Add NlxvaPricerController** — Create controller with `POST /api/nlxva-pricer/chat` and `POST /api/nlxva-pricer/extract`. Both stream SSE. Chat endpoint: builds ChatHistory from messages + optional system prompt + model settings. Extract endpoint: uses FewShotService prefix. Both import ExtractionPlugin per-request and enable `FunctionChoiceBehavior.Auto()`. See Critical Pattern #6.

@@ -183,9 +208,11 @@ Email HTML files are read entirely into memory (max 10MB guard). For typical sal

 ## Phase 3: Verify

-- [ ] **T14: Config** — Add `NlxvaPricer` and `ExternalApis` sections to CRC.Server appsettings.json. Ensure CORS allows CRC.Client origin for SSE responses.
+- [ ] **T14: Config** — Add `NlxvaPricer` (AzureOpenAIEndpoint, DeploymentName, TenantId, FewShotPath) and `ExternalApis` sections to CRC.Server appsettings.json. Ensure CORS allows CRC.Client origin for SSE responses. Ensure developer has run `az login --tenant <tenant-id>`.

-- [ ] **T15: Smoke test** — Build both projects. Navigate to /nlxva-pricer. Send a chat message → verify streaming. Upload an example email HTML → verify extraction streams. Verify New Chat resets. Verify drag-drop visual feedback.
+- [ ] **T15: Verify streaming end-to-end** — Run `curl -N` against `/api/nlxva-pricer/chat` to verify hop 2 (server → client) streams correctly. Check browser Network tab EventStream view for incremental token delivery. If response compression is enabled, verify SSE endpoints opt out.
+
+- [ ] **T16: Smoke test** — Build both projects. Navigate to /nlxva-pricer. Send a chat message → verify streaming tokens appear incrementally. Upload an example email HTML → verify extraction streams. Verify New Chat resets. Verify drag-drop visual feedback.

 ## Implementation Notes

--- a/openspec/exports/nlxva-pricer-porting-guide.md
+++ b/openspec/exports/nlxva-pricer-porting-guide.md
@@ -24,9 +24,9 @@ You have three companion documents for this port:

 The Natural Language XVA Pricer is a chat-based interface that lets the CVA desk interact with an AI agent to price trades using natural language. It serves two modes: **general chat** (ask questions about XVA pricing, get explanations) and **email extraction** (upload a sales email, get structured trade data back as JSON).

-The data flows like this: The user types a message or drops an email `.html` file onto the chat area. The Blazor WASM client sends the request to the ASP.NET Core backend via HTTP POST. The backend processes it through **Microsoft Semantic Kernel** — an AI orchestration framework that connects to an OpenAI-compatible LLM proxy (CLIProxyAPI running locally). For extraction requests, the backend prepends **few-shot examples** (real email → expected JSON pairs loaded from disk) to teach the model the expected output format. The LLM can autonomously call **validation tools** (counterparty lookup, trade ID validation, currency validation, schema validation) via SK's automatic function calling. The response streams back token-by-token as **Server-Sent Events (SSE)**, and the client renders each token into the chat UI with **markdown formatting** and **XSS sanitization**.
+The data flows like this: The user types a message or drops an email `.html` file onto the chat area. The Blazor WASM client sends the request to the ASP.NET Core backend via HTTP POST. The backend processes it through **Microsoft Semantic Kernel** — an AI orchestration framework that connects to **Azure OpenAI** using Azure AD authentication (the same tenant CRC already uses). For extraction requests, the backend prepends **few-shot examples** (real email → expected JSON pairs loaded from disk) to teach the model the expected output format. The LLM can autonomously call **validation tools** (counterparty lookup, trade ID validation, currency validation, schema validation) via SK's automatic function calling. The response streams back token-by-token as **Server-Sent Events (SSE)**, and the client renders each token into the chat UI with **markdown formatting** and **XSS sanitization**.

-The external dependencies are: (1) a CLIProxyAPI proxy for LLM access (any OpenAI-compatible endpoint works), (2) three external APIs for validation (counterparty, trade, currency) — these are the existing CRC backend services that CRC.Server already integrates with, and (3) the `Markdig` NuGet package for markdown rendering plus `Microsoft.SemanticKernel` for LLM orchestration.
+The external dependencies are: (1) an Azure OpenAI resource with a deployed model (authenticated via Azure AD tenant ID), (2) three external APIs for validation (counterparty, trade, currency), which are the existing CRC backend services that CRC.Server already integrates with, and (3) NuGet packages: `Markdig` for markdown rendering, `Microsoft.SemanticKernel` plus `Microsoft.SemanticKernel.Connectors.AzureOpenAI` for LLM orchestration, and `Azure.Identity` for `DefaultAzureCredential`.

 **The one thing you must understand**: this feature is an isolated page. It doesn't need Fluxor, doesn't modify CRC's data layer, and doesn't touch the Pricer/MarketData/XVA/Sales pages. It adds a controller, some services, a page, and a nav link. If something goes wrong during porting, the blast radius is limited to the new files.

@@ -34,20 +34,25 @@ The external dependencies are: (1) a CLIProxyAPI proxy for LLM access (any OpenA

 ## Design Decisions (Detailed)

-### 1. Semantic Kernel over raw HttpClient for LLM communication
+### 1. Semantic Kernel with Azure OpenAI for LLM communication

-**What we chose:** Microsoft Semantic Kernel (SK) as the AI orchestration layer.
+**What we chose:** Microsoft Semantic Kernel (SK) as the AI orchestration layer, connecting to Azure OpenAI via Azure AD authentication.

 **Why:** The core value isn't just chat — it's the **extraction agent loop**. The agent extracts trade data, calls validation tools, interprets results, retries with fixes, and escalates to the user. Without SK, you'd need to: (a) manually parse the LLM's tool-call JSON from the streaming response, (b) dispatch to the correct C# function, (c) serialize the result, (d) feed it back to the LLM, (e) handle the loop termination. SK does all of this with one line: `FunctionChoiceBehavior.Auto()`. It turns ~200 lines of manual orchestration into zero.

+Azure OpenAI is the LLM backend because CRC's sandbox environment provides it with an Azure AD tenant. `DefaultAzureCredential` authenticates against the same tenant CRC already uses, so there are no separate API keys to manage. On a developer's machine it typically uses the `az login` token; in production it can use managed identity.
+
 **What we rejected:**
 - **Raw HttpClient + manual SSE parsing** — This was the original Phase 2 approach. It works for simple chat but doesn't support tool calling without writing a full agent loop. Rejected when we added extraction tools.
 - **LangChain/.NET equivalent** — Considered briefly. SK is Microsoft's official offering, has first-class .NET support, and integrates cleanly with ASP.NET Core DI. LangChain's .NET port was less mature.
- **Azure OpenAI Service directly** — CRC's network may not allow direct Azure OpenAI access from the server. CLIProxyAPI acts as a local proxy, and SK's OpenAI connector targets any OpenAI-compatible endpoint.
+- **OpenAI direct (non-Azure)** — CRC's network may not allow direct OpenAI access. Azure OpenAI is within the corporate Azure tenant, which is already permitted.
+- **API key auth** — Simpler to configure but keys need rotation and secure storage. Azure AD tokens are automatic and tied to the developer/service identity.

-**When you'd revisit this:** If CRC moves to Azure OpenAI with managed identity auth, you'd swap `AddOpenAIChatCompletion()` for `AddAzureOpenAIChatCompletion()`. SK makes this a one-line change.
+**When you'd revisit this:** If the Azure OpenAI resource is decommissioned or you need a different model provider, swap `AddAzureOpenAIChatCompletion()` for `AddOpenAIChatCompletion()` — SK abstracts the difference. Everything downstream (controller, plugins, streaming) stays identical.

-**Target adaptation:** CRC uses Scrutor for assembly scanning. SK's `AddKernel()` and `AddOpenAIChatCompletion()` are explicit registrations that coexist with Scrutor — no conflict. But verify that Scrutor doesn't auto-register ExtractionPlugin before your manual `AddScoped<ExtractionPlugin>()` call (it could if it scans the Plugins namespace). If it does, you'll get the plugin registered without its HttpClient dependencies. Check by looking at CRC's Scrutor scan filters.
+**Target adaptation:** CRC uses Scrutor for assembly scanning. SK's `AddKernel()` and `AddAzureOpenAIChatCompletion()` are explicit registrations that coexist with Scrutor — no conflict. But verify that Scrutor doesn't auto-register ExtractionPlugin before your manual `AddScoped<ExtractionPlugin>()` call (it could if it scans the Plugins namespace). If it does, you'll get the plugin registered without its HttpClient dependencies. Check by looking at CRC's Scrutor scan filters.
+
+**Azure AD prerequisite:** Developers must run `az login --tenant <tenant-id>` before starting CRC.Server. `DefaultAzureCredential` acquires tokens lazily, so a missing login surfaces at the first LLM call (not at startup). The error lists every credential it tried, including "EnvironmentCredential" and "ManagedIdentityCredential" failures, which can be confusing. For local development the fix is almost always `az login`.

 ---

@@ -200,11 +205,14 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`

 **Step-by-step:**
 1. Add `Microsoft.SemanticKernel` to `CRC.Server.csproj`
-2. Add `Markdig` to `CRC.Client.csproj` (check if it's already there: `grep -i markdig CRC.Client.csproj`)
-3. Run `dotnet restore CRC.sln`
+2. Add `Microsoft.SemanticKernel.Connectors.AzureOpenAI` to `CRC.Server.csproj`
+3. Add `Azure.Identity` to `CRC.Server.csproj` (check first: `grep -i Azure.Identity CRC.Server.csproj` — CRC may already have it since it uses Azure AD)
+4. Add `Markdig` to `CRC.Client.csproj` (check if it's already there: `grep -i markdig CRC.Client.csproj`)
+5. Run `dotnet restore CRC.sln`

 **Expected friction on target:**
 - **GV Artifactory may not have `Microsoft.SemanticKernel`**. SK is a relatively new package. If it's not mirrored in the internal feed, you'll need to either: request it be added to Artifactory, or temporarily add nuget.org as a source in `nuget.config` (check with your team if this is allowed).
+- **Azure.Identity version conflict**. If CRC already has `Azure.Identity` at a different version, the SK transitive dependency may conflict. Run `dotnet list CRC.Server package --include-transitive | grep Azure.Identity` to check.
 - **Version pinning**. CRC uses `RestorePackagesWithLockFile=true` — after installing, commit the updated `packages.lock.json`.

 **Verify it works:**
@@ -348,37 +356,101 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`

 ---

-### T6: Register Semantic Kernel
+### T6: Register Semantic Kernel with Azure OpenAI

-**Prerequisites:** T1 (NuGet package installed).
+**Prerequisites:** T1 (NuGet packages installed). Developer has run `az login --tenant <tenant-id>`.

-**Context:** This registers the SK Kernel and OpenAI chat completion connector in DI. The connector works with any OpenAI-compatible API, so we point it at CLIProxyAPI (a local proxy that routes to Claude/GPT).
+**Context:** This registers the SK Kernel and Azure OpenAI chat completion connector in DI. Unlike the source project (which used a local proxy), the CRC sandbox uses Azure OpenAI with Azure AD authentication. The key differences: use `AddAzureOpenAIChatCompletion()` (not `AddOpenAIChatCompletion()`), use deployment name (not model name), endpoint has NO `/v1` suffix, and auth uses `DefaultAzureCredential` with the tenant ID.

 **Step-by-step:**
-1. Add `using Microsoft.SemanticKernel;` to the startup file
-2. Read config values from `NlxvaPricer:*` section
-3. Register: `AddOpenAIChatCompletion()` then `AddKernel()`
-4. The base URL **MUST** include `/v1` — this is the most common misconfiguration
+1. Add `using Microsoft.SemanticKernel;` and `using Azure.Identity;` to the startup file
+2. Read config values from `NlxvaPricer:*` section (AzureOpenAIEndpoint, DeploymentName, TenantId)
+3. Register: `AddAzureOpenAIChatCompletion()` then `AddKernel()`
+4. The endpoint is the Azure resource URL — do NOT add `/v1` (the Azure SDK handles path construction)
+5. Use `DefaultAzureCredential` with the tenant ID
+
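+```csharp
+// These using directives go at the top of Program.cs.
+using Azure.Identity;
+using Microsoft.SemanticKernel;
+
+// Read a required key; fail fast at startup with a clear message if it is missing.
+string RequiredConfig(string key) =>
+    builder.Configuration[key]
+        ?? throw new InvalidOperationException($"Missing configuration value '{key}'.");
+
+var azureEndpoint = RequiredConfig("NlxvaPricer:AzureOpenAIEndpoint"); // resource URL, no /v1
+var deploymentName = RequiredConfig("NlxvaPricer:DeploymentName");    // deployment name, not model name
+var tenantId = RequiredConfig("NlxvaPricer:TenantId");
+
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: deploymentName,
+    endpoint: azureEndpoint,
+    credentials: new DefaultAzureCredential(
+        new DefaultAzureCredentialOptions { TenantId = tenantId }));
+builder.Services.AddKernel();
+```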

 **Expected friction on target:**
-- **CLIProxyAPI availability**: The proxy must be running on the target machine at the configured URL. If CRC's server runs on a different machine than the developer's laptop (where CLIProxyAPI runs), you'll need network routing or to deploy CLIProxyAPI alongside CRC.
-- **API key**: CLIProxyAPI may not check the key, but the SK OpenAI connector requires a non-empty string. Use `"not-needed"` as a placeholder.
+- **`az login` not done**: `DefaultAzureCredential` tries multiple auth methods in sequence (environment vars → managed identity → Visual Studio → Azure CLI → etc.). On a developer machine, it relies on Azure CLI. If the developer hasn't run `az login --tenant <tenant-id>`, the error at runtime will be a confusing `CredentialUnavailableException` listing all the methods it tried. The fix is always: `az login --tenant <tenant-id>`.
+- **Deployment name vs model name**: In Azure portal, you deploy a model (e.g., `gpt-4o`) and give the deployment a name (e.g., `gpt4o-prod`). You pass the **deployment name** to SK, not the model name. Ask your Azure admin for the deployment name.
+- **Azure RBAC permissions**: The developer's Azure AD identity needs the "Cognitive Services OpenAI User" role on the Azure OpenAI resource. Without it, you'll get a 403.

 **Verify it works:**
-- `dotnet build` succeeds (SK NuGet resolved correctly)
+- `dotnet build` succeeds
 - At runtime: inject `Kernel` into a test controller and verify it resolves
-- Quick smoke test: call `kernel.GetRequiredService<IChatCompletionService>()` — should not throw
+- Quick smoke test: `kernel.GetRequiredService<IChatCompletionService>()` — should not throw
+- Full test: the diagnostic stream-test endpoint (see T6b below)

 **If it breaks — diagnostic checklist:**
-- Symptom: 404 on LLM requests
-  Cause: Base URL missing `/v1`
-  Fix: Change `http://localhost:8317` to `http://localhost:8317/v1`
-- Symptom: `HttpRequestException: Connection refused`
-  Cause: CLIProxyAPI not running
-  Fix: Start CLIProxyAPI on the target machine, verify with `curl http://localhost:8317/v1/models`
+- Symptom: `CredentialUnavailableException` with "DefaultAzureCredential failed to retrieve a token"
+  Cause: Developer not logged in to Azure CLI
+  Fix: Run `az login --tenant <tenant-id>`, then restart CRC.Server
+- Symptom: HTTP 403 Forbidden from Azure OpenAI
+  Cause: Azure AD identity lacks "Cognitive Services OpenAI User" role
+  Fix: Ask Azure admin to grant the role on the Azure OpenAI resource
+- Symptom: HTTP 404 on Azure OpenAI endpoint
+  Cause: Wrong deployment name, or deployment doesn't exist
+  Fix: Verify deployment name in Azure portal → Azure OpenAI → Deployments
 - Symptom: `InvalidOperationException: No service for type IChatCompletionService`
-  Cause: `AddOpenAIChatCompletion()` not called before `AddKernel()`
-  Fix: Ensure registration order: OpenAIChatCompletion first, then Kernel
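+  Cause: `AddAzureOpenAIChatCompletion()` was never called, or runs only in a code path that doesn't execute (e.g. inside an environment check). Registration order relative to `AddKernel()` does not matter; the Kernel resolves the chat service lazily from DI
+  Fix: Ensure `AddAzureOpenAIChatCompletion()` is called unconditionally on `builder.Services`, alongside `AddKernel()`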
+
+---
+
+### T6b: Verify streaming hop 1 (Azure OpenAI → CRC.Server)
+
+**Prerequisites:** T6 (SK registered), T7 (controller exists — or add the diagnostic endpoint to any controller temporarily).
+
+**Context:** Before building the full UI, verify that tokens actually stream from Azure OpenAI through CRC.Server. This catches buffering issues early (response compression middleware, Azure API Management, corporate proxies).
+
+**Step-by-step:**
+1. Add a temporary diagnostic endpoint to NlxvaPricerController (see Critical Pattern #8 in export-spec)
+2. Run: `curl -N https://localhost:7100/api/nlxva-pricer/stream-test`
+3. Watch the timestamps in the output
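+
+The timestamping half of that diagnostic can be sketched without ASP.NET. This is not
+Critical Pattern #8 itself: `WriteTimestampedEventsAsync` and `FakeTokensAsync` are
+hypothetical, a `TextWriter` stands in for the response body, and a fake 100 ms token
+source stands in for SK's streaming call. Each chunk becomes one
+`data: [<elapsed>ms] <token>` event, flushed immediately, followed by the SSE contract's
+`[DONE]` terminator.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Diagnostics;
+using System.IO;
+using System.Threading.Tasks;
+
+// Hypothetical helper: the timestamping core of a stream-test endpoint. In a real endpoint,
+// `tokens` would come from SK's streaming API and `writer` would wrap the response body.
+static async Task WriteTimestampedEventsAsync(IAsyncEnumerable<string> tokens, TextWriter writer)
+{
+    var clock = Stopwatch.StartNew();
+    await foreach (var token in tokens)
+    {
+        // Escape newlines so each chunk stays on a single "data:" line.
+        var safe = token.Replace("\n", "\\n");
+        await writer.WriteAsync($"data: [{clock.ElapsedMilliseconds}ms] {safe}\n\n");
+        await writer.FlushAsync(); // push each event out immediately instead of batching
+    }
+    await writer.WriteAsync("data: [DONE]\n\n");
+    await writer.FlushAsync();
+}
+
+// Fake token source standing in for the model: one chunk every 100 ms.
+static async IAsyncEnumerable<string> FakeTokensAsync()
+{
+    foreach (var token in new[] { "1", "2", "3" })
+    {
+        await Task.Delay(100);
+        yield return token;
+    }
+}
+
+var output = new StringWriter();
+await WriteTimestampedEventsAsync(FakeTokensAsync(), output);
+Console.Write(output.ToString());
+```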
+
+**What correct streaming looks like:**
+```
+data: [450ms] 1          ← timestamps spread across seconds
+data: [620ms]
+data: [780ms] 2
+data: [950ms]
+data: [1100ms] 3
+```
+
+**What buffered streaming looks like:**
+```
+data: [8200ms] 1         ← all timestamps clustered at the end
+data: [8201ms]
+data: [8202ms] 2
+data: [8203ms]
+```
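+
+To check the curl output mechanically instead of by eye, a minimal sketch: `ParseElapsedMs`
+and `LooksBuffered` are hypothetical helpers, and the 50 ms window is an assumed threshold,
+not a measured one. Use a prompt that produces many tokens (e.g. "count to 20"), since a
+short, fast answer can look clustered even when nothing is buffering it.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Globalization;
+using System.Linq;
+using System.Text.RegularExpressions;
+
+// Hypothetical helper: extracts the elapsed-ms value from lines like "data: [450ms] 1".
+// Lines without a timestamp (e.g. "data: [DONE]") are skipped.
+static List<long> ParseElapsedMs(IEnumerable<string> lines)
+{
+    var pattern = new Regex(@"^data: \[(\d+)ms\]");
+    var elapsed = new List<long>();
+    foreach (var line in lines)
+    {
+        var match = pattern.Match(line);
+        if (match.Success)
+            elapsed.Add(long.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture));
+    }
+    return elapsed;
+}
+
+// Heuristic with an assumed threshold: if every event arrived within `windowMs`,
+// something between Azure OpenAI and curl probably held the stream back.
+static bool LooksBuffered(IReadOnlyList<long> elapsedMs, long windowMs = 50)
+{
+    if (elapsedMs.Count < 2) return false; // too few events to judge
+    return elapsedMs.Max() - elapsedMs.Min() <= windowMs;
+}
+
+var streamed = ParseElapsedMs(new[]
+{
+    "data: [450ms] 1", "data: [620ms]", "data: [780ms] 2", "data: [950ms]", "data: [1100ms] 3",
+});
+var buffered = ParseElapsedMs(new[]
+{
+    "data: [8200ms] 1", "data: [8201ms]", "data: [8202ms] 2", "data: [8203ms]",
+});
+
+Console.WriteLine(LooksBuffered(streamed)); // False
+Console.WriteLine(LooksBuffered(buffered)); // True
+```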
+
+**If buffered — check these in order:**
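+1. **Response compression middleware**: `text/event-stream` is not in ASP.NET Core's default compressed MIME types, but if CRC.Server calls `app.UseResponseCompression()` with that type added (or a broad MIME list), compression can hold back small SSE chunks. Set `Response.Headers["Content-Encoding"] = "identity";` in the controller before the first write; the middleware skips responses that already have a `Content-Encoding` header.
+2. **Azure API Management (APIM)**: If APIM sits in front of the Azure OpenAI resource, it buffers responses by default. Set `buffer-response="false"` on the `forward-request` policy.
+3. **Corporate HTTPS proxy**: Check the `HTTPS_PROXY` environment variable on the server (`echo $HTTPS_PROXY` in bash, `$env:HTTPS_PROXY` in PowerShell). You may need a proxy bypass for `*.openai.azure.com`.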
+4. **IIS**: If CRC runs under IIS, add `responseBufferLimit="0"` in web.config.
+
+**Always set these headers on SSE endpoints:**
+```csharp
+Response.ContentType = "text/event-stream";
+Response.Headers["Cache-Control"] = "no-cache";
+Response.Headers["X-Accel-Buffering"] = "no";  // prevents NGINX buffering
+```
+
+**Clean up:** Remove the diagnostic endpoint after verification.

 ---

@@ -561,10 +633,11 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`

 **Step-by-step checklist:**

-- [ ] `NlxvaPricer:LlmBaseUrl` in CRC.Server `appsettings.json` — default `http://localhost:8317/v1`
-- [ ] `NlxvaPricer:LlmModel` in CRC.Server `appsettings.json` — default `claude-sonnet-4-6`
-- [ ] `NlxvaPricer:LlmApiKey` in CRC.Server `appsettings.json` — default `not-needed`
+- [ ] `NlxvaPricer:AzureOpenAIEndpoint` in CRC.Server `appsettings.json` — e.g., `https://your-resource.openai.azure.com/` — **no `/v1`**. What happens if missing: SK registration fails at startup
+- [ ] `NlxvaPricer:DeploymentName` in CRC.Server `appsettings.json` — the Azure deployment name (not model name). Get from Azure portal → Azure OpenAI → Deployments
+- [ ] `NlxvaPricer:TenantId` in CRC.Server `appsettings.json` — Azure AD tenant ID. Same tenant CRC uses for Microsoft.Identity.Web auth
 - [ ] `NlxvaPricer:FewShotPath` in CRC.Server `appsettings.json` — default `examples/extraction`
+- [ ] Developer has run `az login --tenant <tenant-id>` — `DefaultAzureCredential` needs this. Failure shows at first LLM call, not at startup
 - [ ] `ExternalApis:CounterpartyBaseUrl` — default `http://localhost:5000/api/counterparty` (or use CRC's existing)
 - [ ] `ExternalApis:TradeBaseUrl` — default `http://localhost:5000/api/trade` (or use CRC's existing)
 - [ ] `ExternalApis:CurrencyBaseUrl` — default `http://localhost:5000/api/currency` (or use CRC's existing)
@@ -579,7 +652,7 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`
 **Full verification sequence:**

 1. `dotnet build --configuration release CRC.sln` — 0 errors, 0 new warnings
-2. Start CLIProxyAPI on target machine
+2. Ensure developer has run `az login --tenant <tenant-id>`
 3. Start CRC.Server
 4. Navigate to CRC.Client in browser
 5. Verify "NL XVA Pricer" appears in sidebar
@@ -598,7 +671,7 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`

 | # | Symptom | Likely Cause | Fix |
 |---|---|---|---|
-| 1 | 404 on `/v1/chat/completions` | Base URL missing `/v1` suffix | Set `NlxvaPricer:LlmBaseUrl` to `http://localhost:8317/v1` |
+| 1 | 404 on Azure OpenAI endpoint | Wrong deployment name or endpoint URL | Verify deployment name in Azure portal; endpoint should be `https://<resource>.openai.azure.com/` with NO `/v1` |
 | 2 | CORS 403 in browser console | CORS policy doesn't cover CRC.Client origin or `text/event-stream` | Add CRC.Client origin with `AllowAnyHeader()` in CORS config |
 | 3 | No streaming — entire response at once | `SetBrowserResponseStreamingEnabled(true)` missing on client | Add to HttpRequestMessage before SendAsync |
 | 4 | `NotSupportedException: Synchronous operations` | Using `reader.EndOfStream` in WASM | Replace with `while ((line = await ReadLineAsync()) != null)` |
@@ -611,7 +684,10 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`
 | 11 | `FileNotFoundException` for instruction-template.txt | Examples folder not at ContentRootPath | Log ContentRootPath; verify examples location; update FewShotPath config |
 | 12 | Empty few-shot examples (only system message) | Subdirectory structure wrong | Verify `examples/extraction/few-shot/01/input.html` exists |
 | 13 | `NuGet restore error` for SemanticKernel | Package not in GV Artifactory feed | Request mirroring or temporary nuget.org source |
-| 14 | `HttpRequestException: Connection refused` | CLIProxyAPI not running | Start proxy; verify with `curl http://localhost:8317/v1/models` |
+| 14 | `CredentialUnavailableException` from DefaultAzureCredential | Developer not logged in via Azure CLI | Run `az login --tenant <tenant-id>`, restart CRC.Server |
+| 14b | HTTP 403 from Azure OpenAI | Azure AD identity lacks role | Grant "Cognitive Services OpenAI User" on the Azure OpenAI resource |
+| 14c | All tokens arrive at once (no streaming) | Response compression or proxy buffering | Use stream-test diagnostic endpoint; check `UseResponseCompression()`; set `X-Accel-Buffering: no` header |
+| 14d | Streaming works in curl but not in browser | Response compression only applied for browser Accept-Encoding | Add `Response.Headers["Content-Encoding"] = "identity"` in SSE endpoints |
 | 15 | Drag-drop file not triggering extraction | `file-drop.js` not loaded | Check `<script>` tag in index.html; check browser console for JS errors |
 | 16 | `window.fileDrop is undefined` | Script loaded after Blazor framework init | Move `<script>` tag before `_framework/blazor.webassembly.js` |
 | 17 | `JsonException` when parsing SSE data | SSE line doesn't match expected format | Add logging for raw SSE lines; check server-side WriteSSEAsync format |
@@ -627,6 +703,20 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`
 - **NuGet source:** Available on nuget.org. If CRC's GV Artifactory doesn't mirror it, this is a blocker — request mirroring.
 - **Size:** ~5MB total with dependencies

+### Microsoft.SemanticKernel.Connectors.AzureOpenAI
+- **Why needed:** Azure OpenAI-specific connector for SK (provides `AddAzureOpenAIChatCompletion()`)
+- **.NET compatibility:** Same as core SK package
+- **Transitive dependencies:** Pulls in `Azure.AI.OpenAI` SDK
+- **NuGet source:** Same as core SK — nuget.org
+- **Note:** This is separate from the core SK package. Without it, only `AddOpenAIChatCompletion()` is available (for non-Azure endpoints).
+
+### Azure.Identity
+- **Why needed:** Provides `DefaultAzureCredential` for Azure AD authentication to Azure OpenAI
+- **.NET compatibility:** .NET Standard 2.0+ (compatible with everything)
+- **CRC likely already has this** — it uses `Microsoft.Identity.Web` for Azure AD auth. Check `grep Azure.Identity CRC.Server.csproj`.
+- **Version conflicts:** If CRC has an older version, SK may pull in a newer one. Usually compatible, but verify with `dotnet build`.
+- **NuGet source:** Available on nuget.org and commonly mirrored in enterprise feeds
+
 ### Markdig (1.1.1)
 - **Why needed:** Markdown → HTML conversion for rendering LLM responses
 - **.NET compatibility:** .NET Standard 2.0+ (compatible with everything)
@@ -664,8 +754,10 @@ If the feature needs to be removed:

 **NuGet packages to remove:**
 - `Microsoft.SemanticKernel` from CRC.Server
+- `Microsoft.SemanticKernel.Connectors.AzureOpenAI` from CRC.Server
+- `Azure.Identity` from CRC.Server (only if not used by other CRC features — likely IS used, so leave it)
 - `Markdig` from CRC.Client (if not used by other features)

 **Config keys to remove:**
-- `NlxvaPricer:*` section from `appsettings.json`
+- `NlxvaPricer:*` section (AzureOpenAIEndpoint, DeploymentName, TenantId, FewShotPath) from `appsettings.json`
 - `ExternalApis:*` section (if only used by this feature)
--- a/openspec/exports/nlxva-pricer-spec.md
+++ b/openspec/exports/nlxva-pricer-spec.md
@@ -58,6 +58,8 @@ This feature is a GUEST in CRC. Existing code, patterns, and conventions take ab

 Add to `CRC.Server`:
 - `Microsoft.SemanticKernel` (latest stable, >=1.x)
+- `Microsoft.SemanticKernel.Connectors.AzureOpenAI` (for Azure OpenAI connector)
+- `Azure.Identity` (for `DefaultAzureCredential` — CRC may already have this)
 - `Markdig` 1.1.1 (if CRC.Client doesn't already have it — check first)

 No new packages for CRC.Client or CRC.Shared (MudBlazor already present).
@@ -73,7 +75,7 @@ CRC.Server (ASP.NET Core)
    ├── NlxvaPricerController
    │     ├── POST /api/nlxva-pricer/chat      (general chat)
    │     └── POST /api/nlxva-pricer/extract    (email extraction)
-    │           Uses: Semantic Kernel → CLIProxyAPI (OpenAI-compatible proxy)
+    │           Uses: Semantic Kernel → Azure OpenAI (via DefaultAzureCredential)
    │           Uses: ExtractionPlugin (tool calling)
    │           Uses: FewShotService (example loading)
    ├── Services/
@@ -232,9 +234,9 @@ data: {"error":"message"}\n\n       ← on failure (followed by [DONE])
 ```json
 {
  "NlxvaPricer": {
-    "LlmBaseUrl": "http://localhost:8317/v1",
-    "LlmModel": "claude-sonnet-4-6",
-    "LlmApiKey": "not-needed",
+    "AzureOpenAIEndpoint": "https://your-resource.openai.azure.com/",
+    "DeploymentName": "gpt4o-prod",
+    "TenantId": "<your-azure-ad-tenant-id>",
    "FewShotPath": "examples/extraction"
  },
  "ExternalApis": {
@@ -245,6 +247,11 @@ data: {"error":"message"}\n\n       ← on failure (followed by [DONE])
 }
 ```

+If using API key auth instead of Azure AD, replace `TenantId` with:
+```json
+    "ApiKey": "<your-azure-openai-api-key>"
+```
+
 ## Critical Patterns

 ### 1. SSE streaming in Blazor WASM — DO NOT use `reader.EndOfStream`
@@ -276,16 +283,33 @@ while ((line = await reader.ReadLineAsync()) != null)   // ← NOT EndOfStream
 `SetBrowserResponseStreamingEnabled(true)` is a Blazor WASM extension that tells the browser Fetch API
 to expose the response as a ReadableStream. Without it, the browser buffers the entire response.

-### 2. Semantic Kernel base URL must include `/v1`
+### 2. Azure OpenAI: use deployment name, NOT model name; NO `/v1` suffix

-**Why:** The OpenAI SDK appends `chat/completions` directly to the base URL.
-Without `/v1`, requests hit `/chat/completions` instead of `/v1/chat/completions` → 404.
+**Why:** Azure OpenAI uses `AddAzureOpenAIChatCompletion()`, not `AddOpenAIChatCompletion()`.
+The endpoint is your Azure resource URL (no `/v1` — the Azure SDK constructs the path internally).
+The `deploymentName` is the name you gave the deployment in the Azure portal, not the model name.
+Auth uses `DefaultAzureCredential` with the tenant ID, not an API key; the signed-in identity needs the `Cognitive Services OpenAI User` role on the resource.

 ```csharp
-builder.Services.AddOpenAIChatCompletion(
-    modelId: model,
-    endpoint: new Uri("http://localhost:8317/v1"),  // ← MUST include /v1
-    apiKey: "not-needed");
+using Azure.Identity;
+using Microsoft.SemanticKernel;
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: builder.Configuration["NlxvaPricer:DeploymentName"] ?? "gpt4o-prod",
+    endpoint: builder.Configuration["NlxvaPricer:AzureOpenAIEndpoint"]
+        ?? throw new InvalidOperationException("NlxvaPricer:AzureOpenAIEndpoint is not configured"),
+    credentials: new DefaultAzureCredential(
+        new DefaultAzureCredentialOptions
+        {
+            TenantId = builder.Configuration["NlxvaPricer:TenantId"]
+        }));
+```
+
+If using API key auth instead of Azure AD:
+```csharp
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: "gpt4o-prod",
+    endpoint: "https://your-resource.openai.azure.com/",
+    apiKey: builder.Configuration["NlxvaPricer:ApiKey"] ?? throw new InvalidOperationException("NlxvaPricer:ApiKey is not configured"));
 ```
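+
+A fail-fast startup check can catch the endpoint problem this pattern warns about (an OpenAI-style `/v1` suffix), plus a placeholder value left over from the sample config, before the first chat request fails. This is a minimal sketch; `AzureOpenAIEndpointCheck` is a hypothetical helper, not part of CRC or Semantic Kernel:
+
+```csharp
+using System;
+
+// Hypothetical helper (not part of CRC or Semantic Kernel): validates the
+// configured Azure OpenAI endpoint before the connector is registered.
+public static class AzureOpenAIEndpointCheck
+{
+    private const string PlaceholderHost = "your-resource.openai.azure.com";
+
+    public static Uri Validate(string? endpoint)
+    {
+        if (string.IsNullOrWhiteSpace(endpoint))
+            throw new InvalidOperationException("NlxvaPricer:AzureOpenAIEndpoint is not set.");
+
+        // Must be an absolute HTTPS URL such as https://<resource>.openai.azure.com/
+        if (!Uri.TryCreate(endpoint, UriKind.Absolute, out var uri) || uri.Scheme != Uri.UriSchemeHttps)
+            throw new InvalidOperationException($"Endpoint must be an absolute https URL: '{endpoint}'.");
+
+        // The Azure SDK builds the request path itself, so a /v1 suffix is a misconfiguration.
+        if (uri.AbsolutePath.TrimEnd('/').EndsWith("/v1", StringComparison.OrdinalIgnoreCase))
+            throw new InvalidOperationException($"Remove the /v1 suffix from '{endpoint}'.");
+
+        // The sample appsettings value was never replaced with the real resource name.
+        if (string.Equals(uri.Host, PlaceholderHost, StringComparison.OrdinalIgnoreCase))
+            throw new InvalidOperationException("Endpoint still uses the placeholder resource name.");
+
+        return uri;
+    }
+}
+```
+
+Usage (with the class in its own file): `endpoint: AzureOpenAIEndpointCheck.Validate(builder.Configuration["NlxvaPricer:AzureOpenAIEndpoint"]).AbsoluteUri`.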

 ### 3. Layout height depends on AppBar height
@@ -371,18 +395,72 @@ if (!string.IsNullOrEmpty(parsedText))
    yield return parsedText;
 ```

+### 8. SSE response buffering — verify both streaming hops
+
+**Why:** The architecture has two streaming hops: Azure OpenAI → CRC.Server → Browser.
+If anything buffers in either hop, the user sees no tokens until the full response completes.
+Common buffers: response compression middleware, reverse proxies (NGINX/IIS), Azure API Management.
+
+**Diagnostic endpoint (add temporarily, remove after verifying):**
+```csharp
+[HttpGet("stream-test")]
+public async Task StreamTest()
+{
+    Response.ContentType = "text/event-stream";
+    Response.Headers["Cache-Control"] = "no-cache";
+    Response.Headers["X-Accel-Buffering"] = "no";  // NGINX hint
+
+    var chatService = _kernel.GetRequiredService<IChatCompletionService>();
+    var history = new ChatHistory();
+    history.AddUserMessage("Count from 1 to 10, one number per line.");
+
+    var sw = System.Diagnostics.Stopwatch.StartNew();
+    await foreach (var chunk in chatService.GetStreamingChatMessageContentsAsync(history))
+    {
+        if (!string.IsNullOrEmpty(chunk.Content))
+        {
+            await Response.WriteAsync($"data: [{sw.ElapsedMilliseconds}ms] {chunk.Content}\n\n");
+            await Response.Body.FlushAsync();
+        }
+    }
+    await Response.WriteAsync("data: [DONE]\n\n");
+}
+```
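+
+The diagnostic prompt asks for one number per line, so a chunk can contain `\n`. Written raw into a single `data:` line, everything after the newline lands on a line without the `data:` prefix, which an SSE parser does not treat as event data. The SSE format sends a multi-line payload as one `data:` line per payload line, and the receiver rejoins them with `\n`. A minimal sketch of that framing (`SseFraming` is a hypothetical helper, not an ASP.NET Core API):
+
+```csharp
+using System.Text;
+
+// Hypothetical helper: frames one SSE event so multi-line payloads survive,
+// emitting one "data:" field per payload line.
+public static class SseFraming
+{
+    public static string FormatEvent(string payload)
+    {
+        var sb = new StringBuilder();
+        // Normalize CRLF and lone CR to LF, since SSE treats all three as line breaks.
+        var lines = payload.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n');
+        foreach (var line in lines)
+        {
+            sb.Append("data: ").Append(line).Append('\n');
+        }
+        // The blank line terminates the event.
+        sb.Append('\n');
+        return sb.ToString();
+    }
+}
+```
+
+In the diagnostic loop this would replace the raw write: `await Response.WriteAsync(SseFraming.FormatEvent($"[{sw.ElapsedMilliseconds}ms] {chunk.Content}"));`. Before using it on the real endpoints, check that the client reader from Pattern 1 joins consecutive `data:` lines.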
+
+**Test with:** `curl -N https://localhost:7100/api/nlxva-pricer/stream-test`
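+- `[Nms]` values increasing chunk by chunk = hop 1 (Azure OpenAI → CRC.Server) streams; all values within a few ms of each other = hop 1 buffers
+- The `[Nms]` values are measured on the server, so also watch curl itself: increasing values that print all at once = hop 2 buffers (compression, proxy). Repeat the test through the proxied URL, not only against Kestrel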
+
+**If CRC.Server uses `UseResponseCompression()`, exclude SSE:**
+```csharp
+Response.Headers["Content-Encoding"] = "identity";  // opt out per-response
+```
+
+**Response headers to always set on SSE endpoints:**
+```csharp
+Response.ContentType = "text/event-stream";
+Response.Headers["Cache-Control"] = "no-cache";
+Response.Headers["X-Accel-Buffering"] = "no";       // prevents NGINX buffering
+```
+
 ## Wiring

 ### CRC.Server DI registration order (add to existing Program.cs / Startup.cs)

 ```csharp
-// 1. Semantic Kernel — OpenAI-compatible connector
-var llmBaseUrl = builder.Configuration["NlxvaPricer:LlmBaseUrl"] ?? "http://localhost:8317/v1";
-var llmModel = builder.Configuration["NlxvaPricer:LlmModel"] ?? "claude-sonnet-4-6";
-builder.Services.AddOpenAIChatCompletion(
-    modelId: llmModel,
-    endpoint: new Uri(llmBaseUrl),
-    apiKey: builder.Configuration["NlxvaPricer:LlmApiKey"] ?? "not-needed");
+// 1. Semantic Kernel — Azure OpenAI connector with Azure AD auth
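+// Add 'using Azure.Identity;' and 'using Microsoft.SemanticKernel;' at the top of Program.cs (usings must precede statements)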
+
+var azureEndpoint = builder.Configuration["NlxvaPricer:AzureOpenAIEndpoint"]
+    ?? throw new InvalidOperationException("NlxvaPricer:AzureOpenAIEndpoint is not configured");
+var deploymentName = builder.Configuration["NlxvaPricer:DeploymentName"] ?? "gpt4o-prod";
+var tenantId = builder.Configuration["NlxvaPricer:TenantId"];
+
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: deploymentName,
+    endpoint: azureEndpoint,
+    credentials: new DefaultAzureCredential(
+        new DefaultAzureCredentialOptions { TenantId = tenantId }));
 builder.Services.AddKernel();

 // 2. External API typed HttpClients