
Author SHA1 Message Date
local
956ec243c5 fix: update export bundle for Azure OpenAI and add streaming diagnostics
Replace CLIProxyAPI/local proxy references with Azure OpenAI using
DefaultAzureCredential and tenant ID auth. Add Critical Pattern #8
for SSE buffering diagnostics with timestamped curl test. Add
streaming verification tasks (T6b, T15) and troubleshooting entries
for Azure AD auth, RBAC, response compression, and proxy buffering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 01:42:38 +01:00
3 changed files with 256 additions and 59 deletions


@@ -62,7 +62,7 @@ This feature automates extraction with AI + tool-calling validation, reducing er
## Scope ## Scope
- New page at /nlxva-pricer, new MudNavLink in existing NavMenu - New page at /nlxva-pricer, new MudNavLink in existing NavMenu
- New controller with 2 endpoints (chat + extract), same SSE streaming contract - New controller with 2 endpoints (chat + extract), same SSE streaming contract
- Semantic Kernel integration with OpenAI-compatible proxy - Semantic Kernel integration with Azure OpenAI (Azure AD auth via tenant ID)
- Few-shot prompting infrastructure (instruction template + 3 examples) - Few-shot prompting infrastructure (instruction template + 3 examples)
- External API clients for counterparty/trade/currency validation - External API clients for counterparty/trade/currency validation
- Client-side markdown rendering with XSS sanitization - Client-side markdown rendering with XSS sanitization
@@ -89,6 +89,14 @@ This feature automates extraction with AI + tool-calling validation, reducing er
With raw HttpClient we'd need to manually parse tool-call JSON, dispatch functions, With raw HttpClient we'd need to manually parse tool-call JSON, dispatch functions,
and feed results back. SK handles this loop automatically via FunctionChoiceBehavior.Auto(). and feed results back. SK handles this loop automatically via FunctionChoiceBehavior.Auto().
## Architecture Decision: Azure OpenAI with DefaultAzureCredential
**Why:** The sandbox environment uses Azure OpenAI with an Azure AD tenant ID.
SK's `AddAzureOpenAIChatCompletion()` with `DefaultAzureCredential` integrates
with CRC's existing Azure AD auth. No API keys to manage — uses the developer's
`az login` token locally and managed identity in production. The endpoint URL
does NOT need `/v1` (Azure SDK constructs the path internally).
## Architecture Decision: SSE streaming over WebSocket ## Architecture Decision: SSE streaming over WebSocket
**Why:** SSE is simpler (unidirectional server→client), works through HTTP proxies, **Why:** SSE is simpler (unidirectional server→client), works through HTTP proxies,
@@ -133,10 +141,25 @@ in the Razor component. CRC's existing Fluxor infrastructure is untouched.
CRC.Server may need its CORS policy updated to allow SSE streaming (Content-Type: text/event-stream) CRC.Server may need its CORS policy updated to allow SSE streaming (Content-Type: text/event-stream)
to the CRC.Client origin. Verify existing policy covers this. to the CRC.Client origin. Verify existing policy covers this.
## Risk: SSE response buffering
Two streaming hops: Azure OpenAI → CRC.Server → Browser. Buffering at any point kills
streaming UX. Common culprits: response compression middleware (`UseResponseCompression()`),
reverse proxies (NGINX, IIS), Azure API Management in front of Azure OpenAI.
Use the diagnostic stream-test endpoint (see Critical Pattern #8 in reference spec)
to verify both hops stream correctly before building the UI.
## Risk: Semantic Kernel version compatibility ## Risk: Semantic Kernel version compatibility
CRC targets .NET 8.0. Ensure the SK NuGet package version is compatible with .NET 8. CRC targets .NET 8.0. Ensure the SK NuGet package version is compatible with .NET 8.
Current stable SK packages support .NET 8+. Current stable SK packages support .NET 8+. Also need `Microsoft.SemanticKernel.Connectors.AzureOpenAI`
and `Azure.Identity` packages.
## Risk: Azure AD token acquisition
`DefaultAzureCredential` tries multiple auth methods in sequence. On a developer machine,
it uses Azure CLI login (`az login --tenant <tenant-id>`). If the developer hasn't run
`az login`, SK will fail with an auth error at the first LLM call, not at startup.
## Risk: Large file uploads ## Risk: Large file uploads
@@ -153,7 +176,7 @@ Email HTML files are read entirely into memory (max 10MB guard). For typical sal
## Phase 1: Foundation (Server) ## Phase 1: Foundation (Server)
- [ ] **T1: Add NuGet packages** — Add `Microsoft.SemanticKernel` to CRC.Server. Add `Markdig` 1.1.1 to CRC.Client (if not already present). Verify .NET 8 compatibility. - [ ] **T1: Add NuGet packages** — Add `Microsoft.SemanticKernel`, `Microsoft.SemanticKernel.Connectors.AzureOpenAI`, and `Azure.Identity` to CRC.Server. Add `Markdig` 1.1.1 to CRC.Client (if not already present). CRC may already have `Azure.Identity` — check first. Verify .NET 8 compatibility.
- [ ] **T2: Add shared DTOs** — Create in CRC.Shared: `NlxvaChatMessage`, `NlxvaChatRequest`, `NlxvaModelSettings`, `NlxvaExtractionRequest`, `NlxvaExtractionResult`, `TradeItem` (with `[JsonPropertyName]` snake_case), `NlxvaValidationResult`, `NlxvaCandidateMatch`. See Contracts section in reference spec for exact shapes. - [ ] **T2: Add shared DTOs** — Create in CRC.Shared: `NlxvaChatMessage`, `NlxvaChatRequest`, `NlxvaModelSettings`, `NlxvaExtractionRequest`, `NlxvaExtractionResult`, `TradeItem` (with `[JsonPropertyName]` snake_case), `NlxvaValidationResult`, `NlxvaCandidateMatch`. See Contracts section in reference spec for exact shapes.
@@ -163,7 +186,9 @@ Email HTML files are read entirely into memory (max 10MB guard). For typical sal
- [ ] **T5: Add FewShotService** — Create in CRC.Server/Services: `FewShotService` that loads instruction template + few-shot examples from disk. Caches ChatHistory prefix. Methods: `CloneWithEmail()`, `CloneWithEmailAndMessages()`. Register as Singleton. Copy examples/ folder to CRC.Server root. - [ ] **T5: Add FewShotService** — Create in CRC.Server/Services: `FewShotService` that loads instruction template + few-shot examples from disk. Caches ChatHistory prefix. Methods: `CloneWithEmail()`, `CloneWithEmailAndMessages()`. Register as Singleton. Copy examples/ folder to CRC.Server root.
- [ ] **T6: Register Semantic Kernel** — In CRC.Server DI: `AddOpenAIChatCompletion()` + `AddKernel()`. Base URL MUST include `/v1`. Config from `NlxvaPricer:*` keys in appsettings.json. See Critical Pattern #2. - [ ] **T6: Register Semantic Kernel** — In CRC.Server DI: `AddAzureOpenAIChatCompletion()` with `DefaultAzureCredential` (tenant ID from config) + `AddKernel()`. Endpoint is Azure OpenAI resource URL (NO `/v1`). Use deployment name, NOT model name. Config from `NlxvaPricer:*` keys in appsettings.json. See Critical Pattern #2.
- [ ] **T6b: Verify streaming hop 1** — Add temporary `stream-test` diagnostic endpoint (see Critical Pattern #8). Run `curl -N` against it. Verify timestamps are spread across seconds (not clustered). Check for response compression middleware interference. Remove diagnostic endpoint after verification.
- [ ] **T7: Add NlxvaPricerController** — Create controller with `POST /api/nlxva-pricer/chat` and `POST /api/nlxva-pricer/extract`. Both stream SSE. Chat endpoint: builds ChatHistory from messages + optional system prompt + model settings. Extract endpoint: uses FewShotService prefix. Both import ExtractionPlugin per-request and enable `FunctionChoiceBehavior.Auto()`. See Critical Pattern #6. - [ ] **T7: Add NlxvaPricerController** — Create controller with `POST /api/nlxva-pricer/chat` and `POST /api/nlxva-pricer/extract`. Both stream SSE. Chat endpoint: builds ChatHistory from messages + optional system prompt + model settings. Extract endpoint: uses FewShotService prefix. Both import ExtractionPlugin per-request and enable `FunctionChoiceBehavior.Auto()`. See Critical Pattern #6.
@@ -183,9 +208,11 @@ Email HTML files are read entirely into memory (max 10MB guard). For typical sal
## Phase 3: Verify ## Phase 3: Verify
- [ ] **T14: Config** — Add `NlxvaPricer` and `ExternalApis` sections to CRC.Server appsettings.json. Ensure CORS allows CRC.Client origin for SSE responses. - [ ] **T14: Config** — Add `NlxvaPricer` (AzureOpenAIEndpoint, DeploymentName, TenantId, FewShotPath) and `ExternalApis` sections to CRC.Server appsettings.json. Ensure CORS allows CRC.Client origin for SSE responses. Ensure developer has run `az login --tenant <tenant-id>`.
- [ ] **T15: Smoke test** — Build both projects. Navigate to /nlxva-pricer. Send a chat message → verify streaming. Upload an example email HTML → verify extraction streams. Verify New Chat resets. Verify drag-drop visual feedback. - [ ] **T15: Verify streaming end-to-end** — Run `curl -N` against `/api/nlxva-pricer/chat` to verify hop 2 (server → client) streams correctly. Check browser Network tab EventStream view for incremental token delivery. If response compression is enabled, verify SSE endpoints opt out.
- [ ] **T16: Smoke test** — Build both projects. Navigate to /nlxva-pricer. Send a chat message → verify streaming tokens appear incrementally. Upload an example email HTML → verify extraction streams. Verify New Chat resets. Verify drag-drop visual feedback.
## Implementation Notes ## Implementation Notes


@@ -24,9 +24,9 @@ You have three companion documents for this port:
The Natural Language XVA Pricer is a chat-based interface that lets the CVA desk interact with an AI agent to price trades using natural language. It serves two modes: **general chat** (ask questions about XVA pricing, get explanations) and **email extraction** (upload a sales email, get structured trade data back as JSON). The Natural Language XVA Pricer is a chat-based interface that lets the CVA desk interact with an AI agent to price trades using natural language. It serves two modes: **general chat** (ask questions about XVA pricing, get explanations) and **email extraction** (upload a sales email, get structured trade data back as JSON).
The data flows like this: The user types a message or drops an email `.html` file onto the chat area. The Blazor WASM client sends the request to the ASP.NET Core backend via HTTP POST. The backend processes it through **Microsoft Semantic Kernel** — an AI orchestration framework that connects to an OpenAI-compatible LLM proxy (CLIProxyAPI running locally). For extraction requests, the backend prepends **few-shot examples** (real email → expected JSON pairs loaded from disk) to teach the model the expected output format. The LLM can autonomously call **validation tools** (counterparty lookup, trade ID validation, currency validation, schema validation) via SK's automatic function calling. The response streams back token-by-token as **Server-Sent Events (SSE)**, and the client renders each token into the chat UI with **markdown formatting** and **XSS sanitization**. The data flows like this: The user types a message or drops an email `.html` file onto the chat area. The Blazor WASM client sends the request to the ASP.NET Core backend via HTTP POST. The backend processes it through **Microsoft Semantic Kernel** — an AI orchestration framework that connects to **Azure OpenAI** using Azure AD authentication (the same tenant CRC already uses). For extraction requests, the backend prepends **few-shot examples** (real email → expected JSON pairs loaded from disk) to teach the model the expected output format. The LLM can autonomously call **validation tools** (counterparty lookup, trade ID validation, currency validation, schema validation) via SK's automatic function calling. The response streams back token-by-token as **Server-Sent Events (SSE)**, and the client renders each token into the chat UI with **markdown formatting** and **XSS sanitization**.
The external dependencies are: (1) a CLIProxyAPI proxy for LLM access (any OpenAI-compatible endpoint works), (2) three external APIs for validation (counterparty, trade, currency) — these are the existing CRC backend services that CRC.Server already integrates with, and (3) the `Markdig` NuGet package for markdown rendering plus `Microsoft.SemanticKernel` for LLM orchestration. The external dependencies are: (1) an Azure OpenAI resource with a deployed model (authenticated via Azure AD tenant ID), (2) three external APIs for validation (counterparty, trade, currency) — these are the existing CRC backend services that CRC.Server already integrates with, and (3) the `Markdig` NuGet package for markdown rendering plus `Microsoft.SemanticKernel` for LLM orchestration.
**The one thing you must understand**: this feature is an isolated page. It doesn't need Fluxor, doesn't modify CRC's data layer, and doesn't touch the Pricer/MarketData/XVA/Sales pages. It adds a controller, some services, a page, and a nav link. If something goes wrong during porting, the blast radius is limited to the new files. **The one thing you must understand**: this feature is an isolated page. It doesn't need Fluxor, doesn't modify CRC's data layer, and doesn't touch the Pricer/MarketData/XVA/Sales pages. It adds a controller, some services, a page, and a nav link. If something goes wrong during porting, the blast radius is limited to the new files.
@@ -34,20 +34,25 @@ The external dependencies are: (1) a CLIProxyAPI proxy for LLM access (any OpenA
## Design Decisions (Detailed) ## Design Decisions (Detailed)
### 1. Semantic Kernel over raw HttpClient for LLM communication ### 1. Semantic Kernel with Azure OpenAI for LLM communication
**What we chose:** Microsoft Semantic Kernel (SK) as the AI orchestration layer. **What we chose:** Microsoft Semantic Kernel (SK) as the AI orchestration layer, connecting to Azure OpenAI via Azure AD authentication.
**Why:** The core value isn't just chat — it's the **extraction agent loop**. The agent extracts trade data, calls validation tools, interprets results, retries with fixes, and escalates to the user. Without SK, you'd need to: (a) manually parse the LLM's tool-call JSON from the streaming response, (b) dispatch to the correct C# function, (c) serialize the result, (d) feed it back to the LLM, (e) handle the loop termination. SK does all of this with one line: `FunctionChoiceBehavior.Auto()`. It turns ~200 lines of manual orchestration into zero. **Why:** The core value isn't just chat — it's the **extraction agent loop**. The agent extracts trade data, calls validation tools, interprets results, retries with fixes, and escalates to the user. Without SK, you'd need to: (a) manually parse the LLM's tool-call JSON from the streaming response, (b) dispatch to the correct C# function, (c) serialize the result, (d) feed it back to the LLM, (e) handle the loop termination. SK does all of this with one line: `FunctionChoiceBehavior.Auto()`. It turns ~200 lines of manual orchestration into zero.
Azure OpenAI is the LLM backend because CRC's sandbox environment provides it with an Azure AD tenant. `DefaultAzureCredential` integrates with CRC's existing Azure AD auth — no separate API keys to manage. On a developer's machine it uses the `az login` token; in production it can use managed identity.
**What we rejected:** **What we rejected:**
- **Raw HttpClient + manual SSE parsing** — This was the original Phase 2 approach. It works for simple chat but doesn't support tool calling without writing a full agent loop. Rejected when we added extraction tools. - **Raw HttpClient + manual SSE parsing** — This was the original Phase 2 approach. It works for simple chat but doesn't support tool calling without writing a full agent loop. Rejected when we added extraction tools.
- **LangChain/.NET equivalent** — Considered briefly. SK is Microsoft's official offering, has first-class .NET support, and integrates cleanly with ASP.NET Core DI. LangChain's .NET port was less mature. - **LangChain/.NET equivalent** — Considered briefly. SK is Microsoft's official offering, has first-class .NET support, and integrates cleanly with ASP.NET Core DI. LangChain's .NET port was less mature.
- **Azure OpenAI Service directly** — CRC's network may not allow direct Azure OpenAI access from the server. CLIProxyAPI acts as a local proxy, and SK's OpenAI connector targets any OpenAI-compatible endpoint. - **OpenAI direct (non-Azure)** — CRC's network may not allow direct OpenAI access. Azure OpenAI is within the corporate Azure tenant, which is already permitted.
- **API key auth** — Simpler to configure but keys need rotation and secure storage. Azure AD tokens are automatic and tied to the developer/service identity.
**When you'd revisit this:** If CRC moves to Azure OpenAI with managed identity auth, you'd swap `AddOpenAIChatCompletion()` for `AddAzureOpenAIChatCompletion()`. SK makes this a one-line change. **When you'd revisit this:** If the Azure OpenAI resource is decommissioned or you need a different model provider, swap `AddAzureOpenAIChatCompletion()` for `AddOpenAIChatCompletion()`. SK abstracts the difference. Everything downstream (controller, plugins, streaming) stays identical.
**Target adaptation:** CRC uses Scrutor for assembly scanning. SK's `AddKernel()` and `AddOpenAIChatCompletion()` are explicit registrations that coexist with Scrutor — no conflict. But verify that Scrutor doesn't auto-register ExtractionPlugin before your manual `AddScoped<ExtractionPlugin>()` call (it could if it scans the Plugins namespace). If it does, you'll get the plugin registered without its HttpClient dependencies. Check by looking at CRC's Scrutor scan filters. **Target adaptation:** CRC uses Scrutor for assembly scanning. SK's `AddKernel()` and `AddAzureOpenAIChatCompletion()` are explicit registrations that coexist with Scrutor — no conflict. But verify that Scrutor doesn't auto-register ExtractionPlugin before your manual `AddScoped<ExtractionPlugin>()` call (it could if it scans the Plugins namespace). If it does, you'll get the plugin registered without its HttpClient dependencies. Check by looking at CRC's Scrutor scan filters.
**Azure AD prerequisite:** Developers must run `az login --tenant <tenant-id>` before starting CRC.Server. If no token is available, `DefaultAzureCredential` fails at the first LLM call rather than at startup, and the error message lists "ManagedIdentityCredential" and "EnvironmentCredential" failures, which can be confusing. On a developer machine the fix is `az login --tenant <tenant-id>`.
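
Because the credential failure only surfaces at the first LLM call, it may help to probe for a token once at startup. The following is a minimal sketch, not part of the source project: the class name `AzureAdStartupProbe` is hypothetical, and the scope `https://cognitiveservices.azure.com/.default` is the Azure AD scope commonly used for Azure OpenAI, assumed here rather than taken from the spec.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Core;
using Azure.Identity;

// Hypothetical helper: requests one token at startup so a missing `az login`
// is reported immediately instead of at the first LLM call.
public static class AzureAdStartupProbe
{
    // Assumed scope for Azure OpenAI (Cognitive Services) tokens.
    private static readonly string[] Scopes = { "https://cognitiveservices.azure.com/.default" };

    public static async Task EnsureTokenAsync(string tenantId, CancellationToken ct = default)
    {
        var credential = new DefaultAzureCredential(
            new DefaultAzureCredentialOptions { TenantId = tenantId });
        try
        {
            // Walks the same credential chain that the SK connector would use later.
            await credential.GetTokenAsync(new TokenRequestContext(Scopes), ct);
        }
        catch (AuthenticationFailedException ex)
        {
            // CredentialUnavailableException derives from AuthenticationFailedException.
            throw new InvalidOperationException(
                "Azure AD token unavailable. Run `az login --tenant <tenant-id>` and restart CRC.Server.", ex);
        }
    }
}
```

Calling `await AzureAdStartupProbe.EnsureTokenAsync(tenantId)` before `app.Run()` would turn the late, confusing error into an early one. Whether failing startup is acceptable in production (where managed identity is expected) is a team decision.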
--- ---
@@ -200,11 +205,14 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`
**Step-by-step:** **Step-by-step:**
1. Add `Microsoft.SemanticKernel` to `CRC.Server.csproj` 1. Add `Microsoft.SemanticKernel` to `CRC.Server.csproj`
2. Add `Markdig` to `CRC.Client.csproj` (check if it's already there: `grep -i markdig CRC.Client.csproj`) 2. Add `Microsoft.SemanticKernel.Connectors.AzureOpenAI` to `CRC.Server.csproj`
3. Run `dotnet restore CRC.sln` 3. Add `Azure.Identity` to `CRC.Server.csproj` (check first: `grep -i Azure.Identity CRC.Server.csproj` — CRC may already have it since it uses Azure AD)
4. Add `Markdig` to `CRC.Client.csproj` (check if it's already there: `grep -i markdig CRC.Client.csproj`)
5. Run `dotnet restore CRC.sln`
**Expected friction on target:** **Expected friction on target:**
- **GV Artifactory may not have `Microsoft.SemanticKernel`**. SK is a relatively new package. If it's not mirrored in the internal feed, you'll need to either: request it be added to Artifactory, or temporarily add nuget.org as a source in `nuget.config` (check with your team if this is allowed). - **GV Artifactory may not have `Microsoft.SemanticKernel`**. SK is a relatively new package. If it's not mirrored in the internal feed, you'll need to either: request it be added to Artifactory, or temporarily add nuget.org as a source in `nuget.config` (check with your team if this is allowed).
- **Azure.Identity version conflict**. If CRC already has `Azure.Identity` at a different version, the SK transitive dependency may conflict. Run `dotnet list CRC.Server package --include-transitive | grep Azure.Identity` to check.
- **Version pinning**. CRC uses `RestorePackagesWithLockFile=true` — after installing, commit the updated `packages.lock.json`. - **Version pinning**. CRC uses `RestorePackagesWithLockFile=true` — after installing, commit the updated `packages.lock.json`.
**Verify it works:** **Verify it works:**
@@ -348,37 +356,101 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`
--- ---
### T6: Register Semantic Kernel ### T6: Register Semantic Kernel with Azure OpenAI
**Prerequisites:** T1 (NuGet package installed). **Prerequisites:** T1 (NuGet packages installed). Developer has run `az login --tenant <tenant-id>`.
**Context:** This registers the SK Kernel and OpenAI chat completion connector in DI. The connector works with any OpenAI-compatible API, so we point it at CLIProxyAPI (a local proxy that routes to Claude/GPT). **Context:** This registers the SK Kernel and Azure OpenAI chat completion connector in DI. Unlike the source project (which used a local proxy), the CRC sandbox uses Azure OpenAI with Azure AD authentication. The key differences: use `AddAzureOpenAIChatCompletion()` (not `AddOpenAIChatCompletion()`), use deployment name (not model name), endpoint has NO `/v1` suffix, and auth uses `DefaultAzureCredential` with the tenant ID.
**Step-by-step:** **Step-by-step:**
1. Add `using Microsoft.SemanticKernel;` to the startup file 1. Add `using Microsoft.SemanticKernel;` and `using Azure.Identity;` to the startup file
2. Read config values from `NlxvaPricer:*` section 2. Read config values from `NlxvaPricer:*` section (AzureOpenAIEndpoint, DeploymentName, TenantId)
3. Register: `AddOpenAIChatCompletion()` then `AddKernel()` 3. Register: `AddAzureOpenAIChatCompletion()` then `AddKernel()`
4. The base URL **MUST** include `/v1` — this is the most common misconfiguration 4. The endpoint is the Azure resource URL — do NOT add `/v1` (the Azure SDK handles path construction)
5. Use `DefaultAzureCredential` with the tenant ID
```csharp
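// Requires `using Microsoft.SemanticKernel;` and `using Azure.Identity;` (step 1).

// Read a required setting; throw at startup if it is missing.
string Require(string key) => builder.Configuration[key]
    ?? throw new InvalidOperationException($"Missing config value: {key}");

var azureEndpoint = Require("NlxvaPricer:AzureOpenAIEndpoint"); // resource URL, no /v1
var deploymentName = Require("NlxvaPricer:DeploymentName");     // deployment name, not model name
var tenantId = Require("NlxvaPricer:TenantId");

// Register the Azure OpenAI connector with Azure AD auth, then the Kernel.
builder.Services.AddAzureOpenAIChatCompletion(
    deploymentName: deploymentName,
    endpoint: azureEndpoint,
    credentials: new DefaultAzureCredential(
        new DefaultAzureCredentialOptions { TenantId = tenantId }));
builder.Services.AddKernel();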
```
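
Step 4 is easy to get wrong when carrying over the old proxy configuration, which required `/v1`. A small sketch of a startup sanity check follows; `AzureOpenAIConfigCheck` and `DescribeEndpointProblem` are hypothetical names introduced for illustration, not part of Semantic Kernel or the source project.

```csharp
using System;

// Hypothetical helper: sanity-checks the NlxvaPricer:AzureOpenAIEndpoint value.
public static class AzureOpenAIConfigCheck
{
    // Returns null when the endpoint looks usable, otherwise a short problem description.
    public static string? DescribeEndpointProblem(string? endpoint)
    {
        if (string.IsNullOrWhiteSpace(endpoint))
            return "NlxvaPricer:AzureOpenAIEndpoint is missing";

        // The Azure resource URL is expected to be an absolute https URL.
        if (!Uri.TryCreate(endpoint, UriKind.Absolute, out var uri) || uri.Scheme != Uri.UriSchemeHttps)
            return "endpoint must be an absolute https URL";

        // A trailing /v1 is a leftover from the OpenAI-compatible proxy setup.
        if (uri.AbsolutePath.TrimEnd('/').EndsWith("/v1", StringComparison.OrdinalIgnoreCase))
            return "endpoint must not end with /v1 (the Azure SDK builds the path)";

        return null;
    }
}
```

One possible use is to call it right after reading config and throw if the result is non-null, so a stale `http://localhost:8317/v1` value is rejected before the first request.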
**Expected friction on target:** **Expected friction on target:**
- **CLIProxyAPI availability**: The proxy must be running on the target machine at the configured URL. If CRC's server runs on a different machine than the developer's laptop (where CLIProxyAPI runs), you'll need network routing or to deploy CLIProxyAPI alongside CRC. - **`az login` not done**: `DefaultAzureCredential` tries multiple auth methods in sequence (environment vars → managed identity → Visual Studio → Azure CLI → etc.). On a developer machine, it relies on Azure CLI. If the developer hasn't run `az login --tenant <tenant-id>`, the error at runtime will be a confusing `CredentialUnavailableException` listing all the methods it tried. The fix is always: `az login --tenant <tenant-id>`.
- **API key**: CLIProxyAPI may not check the key, but the SK OpenAI connector requires a non-empty string. Use `"not-needed"` as a placeholder. - **Deployment name vs model name**: In Azure portal, you deploy a model (e.g., `gpt-4o`) and give the deployment a name (e.g., `gpt4o-prod`). You pass the **deployment name** to SK, not the model name. Ask your Azure admin for the deployment name.
- **Azure RBAC permissions**: The developer's Azure AD identity needs the "Cognitive Services OpenAI User" role on the Azure OpenAI resource. Without it, you'll get a 403.
**Verify it works:** **Verify it works:**
- `dotnet build` succeeds (SK NuGet resolved correctly) - `dotnet build` succeeds
- At runtime: inject `Kernel` into a test controller and verify it resolves - At runtime: inject `Kernel` into a test controller and verify it resolves
- Quick smoke test: call `kernel.GetRequiredService<IChatCompletionService>()` — should not throw - Quick smoke test: `kernel.GetRequiredService<IChatCompletionService>()` — should not throw
- Full test: the diagnostic stream-test endpoint (see T6b below)
**If it breaks — diagnostic checklist:** **If it breaks — diagnostic checklist:**
- Symptom: 404 on LLM requests - Symptom: `CredentialUnavailableException` with "DefaultAzureCredential failed to retrieve a token"
Cause: Base URL missing `/v1` Cause: Developer not logged in to Azure CLI
Fix: Change `http://localhost:8317` to `http://localhost:8317/v1` Fix: Run `az login --tenant <tenant-id>`, then restart CRC.Server
- Symptom: `HttpRequestException: Connection refused` - Symptom: HTTP 403 Forbidden from Azure OpenAI
Cause: CLIProxyAPI not running Cause: Azure AD identity lacks "Cognitive Services OpenAI User" role
Fix: Start CLIProxyAPI on the target machine, verify with `curl http://localhost:8317/v1/models` Fix: Ask Azure admin to grant the role on the Azure OpenAI resource
- Symptom: HTTP 404 on Azure OpenAI endpoint
Cause: Wrong deployment name, or deployment doesn't exist
Fix: Verify deployment name in Azure portal → Azure OpenAI → Deployments
- Symptom: `InvalidOperationException: No service for type IChatCompletionService` - Symptom: `InvalidOperationException: No service for type IChatCompletionService`
Cause: `AddOpenAIChatCompletion()` not called before `AddKernel()` Cause: `AddAzureOpenAIChatCompletion()` was not registered on the service collection (services are resolved lazily, so its order relative to `AddKernel()` should not matter)
Fix: Ensure registration order: OpenAIChatCompletion first, then Kernel Fix: Ensure both `AddAzureOpenAIChatCompletion()` and `AddKernel()` run during startup registration
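
The checklist above can be sketched as a small triage helper that maps a caught exception to the matching fix. This is an illustration only: `LlmErrorTriage` is a hypothetical name, and it assumes the failure reaches the caller either as `CredentialUnavailableException` (Azure.Identity) or as Semantic Kernel's `HttpOperationException` with a status code; in practice the exception may arrive wrapped.

```csharp
using System;
using System.Net;
using Azure.Identity;
using Microsoft.SemanticKernel;

// Hypothetical helper: turns the diagnostic checklist into a lookup.
public static class LlmErrorTriage
{
    public static string Describe(Exception ex) => ex switch
    {
        // Developer not logged in to Azure CLI.
        CredentialUnavailableException =>
            "Not logged in: run `az login --tenant <tenant-id>`, then restart CRC.Server.",
        // Identity lacks the required role on the Azure OpenAI resource.
        HttpOperationException { StatusCode: HttpStatusCode.Forbidden } =>
            "403: ask the Azure admin to grant \"Cognitive Services OpenAI User\" on the resource.",
        // Wrong deployment name, or the deployment does not exist.
        HttpOperationException { StatusCode: HttpStatusCode.NotFound } =>
            "404: verify the deployment name in Azure portal, Azure OpenAI, Deployments.",
        _ => "Unrecognized failure: " + ex.Message
    };
}
```

Logging `LlmErrorTriage.Describe(ex)` from the controller's catch block would put the suggested fix next to the raw error in the server log.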
---
### T6b: Verify streaming hop 1 (Azure OpenAI → CRC.Server)
**Prerequisites:** T6 (SK registered), T7 (controller exists — or add the diagnostic endpoint to any controller temporarily).
**Context:** Before building the full UI, verify that tokens actually stream from Azure OpenAI through CRC.Server. This catches buffering issues early (response compression middleware, Azure API Management, corporate proxies).
**Step-by-step:**
1. Add a temporary diagnostic endpoint to NlxvaPricerController (see Critical Pattern #8 in export-spec)
2. Run: `curl -N https://localhost:7100/api/nlxva-pricer/stream-test`
3. Watch the timestamps in the output
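
The real diagnostic endpoint is defined in Critical Pattern #8 of the export-spec, which is not reproduced here. As a rough sketch of what such an endpoint might look like, assuming a separate hypothetical `StreamTestController` and an arbitrary test prompt:

```csharp
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical temporary controller; remove after verification.
[ApiController]
[Route("api/nlxva-pricer")]
public class StreamTestController : ControllerBase
{
    private readonly Kernel _kernel;
    public StreamTestController(Kernel kernel) => _kernel = kernel;

    [HttpGet("stream-test")]
    public async Task StreamTest(CancellationToken ct)
    {
        Response.ContentType = "text/event-stream";
        Response.Headers["Cache-Control"] = "no-cache";
        Response.Headers["X-Accel-Buffering"] = "no";

        var chat = _kernel.GetRequiredService<IChatCompletionService>();
        var history = new ChatHistory();
        history.AddUserMessage("Count from 1 to 5, one number per line."); // arbitrary prompt

        // Stamp each chunk with elapsed server time so clustering is visible in curl.
        var clock = Stopwatch.StartNew();
        await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(
                           history, kernel: _kernel, cancellationToken: ct))
        {
            // Diagnostic only: chunks containing newlines are not escaped for SSE.
            await Response.WriteAsync($"data: [{clock.ElapsedMilliseconds}ms] {chunk.Content}\n\n", ct);
            await Response.Body.FlushAsync(ct);
        }
    }
}
```

Flushing after every chunk matters here: without it, the server itself could batch the writes and mask where the buffering actually occurs.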
**What correct streaming looks like:**
```
data: [450ms] 1 ← timestamps spread across seconds
data: [620ms]
data: [780ms] 2
data: [950ms]
data: [1100ms] 3
```
**What buffered streaming looks like:**
```
data: [8200ms] 1 ← all timestamps clustered at the end
data: [8201ms]
data: [8202ms] 2
data: [8203ms]
```
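
Comparing the two outputs by eye works, but the check can also be sketched in code. The helper below is hypothetical (`StreamTimingCheck` is not part of the spec), and the 10% threshold is an arbitrary illustrative value: it flags output as buffered when the spread between the first and last chunk is small relative to the wait before the first chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: classifies captured `curl -N` output from the stream-test endpoint.
public static class StreamTimingCheck
{
    // Matches the "data: [450ms]" prefix used in the sample output.
    private static readonly Regex Stamp = new Regex(@"^data:\s*\[(\d+)ms\]", RegexOptions.Compiled);

    // Extracts the millisecond timestamps, skipping lines without one.
    public static List<long> ParseTimestamps(IEnumerable<string> lines) =>
        lines.Select(l => Stamp.Match(l))
             .Where(m => m.Success)
             .Select(m => long.Parse(m.Groups[1].Value))
             .ToList();

    // True when chunks arrive clustered: the first-to-last spread is under
    // `ratio` times the delay before the first chunk.
    public static bool LooksBuffered(IReadOnlyList<long> stamps, double ratio = 0.10)
    {
        if (stamps.Count < 2) return false; // not enough data to judge
        long spread = stamps[stamps.Count - 1] - stamps[0];
        return spread < stamps[0] * ratio;
    }
}
```

With the sample outputs above, the first set (450 ms to 1100 ms) is classified as streaming and the second (8200 ms to 8203 ms) as buffered.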
**If buffered — check these in order:**
1. **Response compression middleware**: If CRC.Server has `app.UseResponseCompression()`, it buffers SSE to compress. Add `Response.Headers["Content-Encoding"] = "identity";` in the controller to opt out.
2. **Azure API Management (APIM)**: If APIM sits in front of the Azure OpenAI resource, it buffers by default. Need `forward-request` policy with `buffer-response="false"`.
3. **Corporate HTTPS proxy**: Check `echo $HTTPS_PROXY` on the server. May need proxy bypass for `*.openai.azure.com`.
4. **IIS**: If CRC runs under IIS, add `responseBufferLimit="0"` in web.config.
**Always set these headers on SSE endpoints:**
```csharp
Response.ContentType = "text/event-stream";
Response.Headers["Cache-Control"] = "no-cache";
Response.Headers["X-Accel-Buffering"] = "no"; // prevents NGINX buffering
```
**Cleanup:** Remove the diagnostic endpoint after verification.
--- ---
@@ -561,10 +633,11 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`
**Step-by-step checklist:** **Step-by-step checklist:**
- [ ] `NlxvaPricer:LlmBaseUrl` in CRC.Server `appsettings.json` — default `http://localhost:8317/v1` - [ ] `NlxvaPricer:AzureOpenAIEndpoint` in CRC.Server `appsettings.json` — e.g., `https://your-resource.openai.azure.com/` — **no `/v1`**. What happens if missing: SK registration fails at startup
- [ ] `NlxvaPricer:LlmModel` in CRC.Server `appsettings.json` — default `claude-sonnet-4-6` - [ ] `NlxvaPricer:DeploymentName` in CRC.Server `appsettings.json` — the Azure deployment name (not model name). Get from Azure portal → Azure OpenAI → Deployments
- [ ] `NlxvaPricer:LlmApiKey` in CRC.Server `appsettings.json` — default `not-needed` - [ ] `NlxvaPricer:TenantId` in CRC.Server `appsettings.json` — Azure AD tenant ID. Same tenant CRC uses for Microsoft.Identity.Web auth
- [ ] `NlxvaPricer:FewShotPath` in CRC.Server `appsettings.json` — default `examples/extraction` - [ ] `NlxvaPricer:FewShotPath` in CRC.Server `appsettings.json` — default `examples/extraction`
- [ ] Developer has run `az login --tenant <tenant-id>` — `DefaultAzureCredential` needs this. Failure shows at first LLM call, not at startup
- [ ] `ExternalApis:CounterpartyBaseUrl` — default `http://localhost:5000/api/counterparty` (or use CRC's existing) - [ ] `ExternalApis:CounterpartyBaseUrl` — default `http://localhost:5000/api/counterparty` (or use CRC's existing)
- [ ] `ExternalApis:TradeBaseUrl` — default `http://localhost:5000/api/trade` (or use CRC's existing) - [ ] `ExternalApis:TradeBaseUrl` — default `http://localhost:5000/api/trade` (or use CRC's existing)
- [ ] `ExternalApis:CurrencyBaseUrl` — default `http://localhost:5000/api/currency` (or use CRC's existing) - [ ] `ExternalApis:CurrencyBaseUrl` — default `http://localhost:5000/api/currency` (or use CRC's existing)
@@ -579,7 +652,7 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`
**Full verification sequence:** **Full verification sequence:**
1. `dotnet build --configuration release CRC.sln` — 0 errors, 0 new warnings 1. `dotnet build --configuration release CRC.sln` — 0 errors, 0 new warnings
2. Start CLIProxyAPI on target machine 2. Ensure developer has run `az login --tenant <tenant-id>`
3. Start CRC.Server 3. Start CRC.Server
4. Navigate to CRC.Client in browser 4. Navigate to CRC.Client in browser
5. Verify "NL XVA Pricer" appears in sidebar 5. Verify "NL XVA Pricer" appears in sidebar
@@ -598,7 +671,7 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`
| # | Symptom | Likely Cause | Fix | | # | Symptom | Likely Cause | Fix |
|---|---|---|---| |---|---|---|---|
| 1 | 404 on `/v1/chat/completions` | Base URL missing `/v1` suffix | Set `NlxvaPricer:LlmBaseUrl` to `http://localhost:8317/v1` | | 1 | 404 on Azure OpenAI endpoint | Wrong deployment name or endpoint URL | Verify deployment name in Azure portal; endpoint should be `https://<resource>.openai.azure.com/` with NO `/v1` |
| 2 | CORS 403 in browser console | CORS policy doesn't cover CRC.Client origin or `text/event-stream` | Add CRC.Client origin with `AllowAnyHeader()` in CORS config | | 2 | CORS 403 in browser console | CORS policy doesn't cover CRC.Client origin or `text/event-stream` | Add CRC.Client origin with `AllowAnyHeader()` in CORS config |
| 3 | No streaming — entire response at once | `SetBrowserResponseStreamingEnabled(true)` missing on client | Add to HttpRequestMessage before SendAsync | | 3 | No streaming — entire response at once | `SetBrowserResponseStreamingEnabled(true)` missing on client | Add to HttpRequestMessage before SendAsync |
| 4 | `NotSupportedException: Synchronous operations` | Using `reader.EndOfStream` in WASM | Replace with `while ((line = await ReadLineAsync()) != null)` | | 4 | `NotSupportedException: Synchronous operations` | Using `reader.EndOfStream` in WASM | Replace with `while ((line = await ReadLineAsync()) != null)` |
@@ -611,7 +684,10 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`
 | 11 | `FileNotFoundException` for instruction-template.txt | Examples folder not at ContentRootPath | Log ContentRootPath; verify examples location; update FewShotPath config |
 | 12 | Empty few-shot examples (only system message) | Subdirectory structure wrong | Verify `examples/extraction/few-shot/01/input.html` exists |
 | 13 | `NuGet restore error` for SemanticKernel | Package not in GV Artifactory feed | Request mirroring or temporary nuget.org source |
-| 14 | `HttpRequestException: Connection refused` | CLIProxyAPI not running | Start proxy; verify with `curl http://localhost:8317/v1/models` |
+| 14 | `CredentialUnavailableException` from DefaultAzureCredential | Developer not logged in via Azure CLI | Run `az login --tenant <tenant-id>`, restart CRC.Server |
+| 14b | HTTP 403 from Azure OpenAI | Azure AD identity lacks role | Grant "Cognitive Services OpenAI User" on the Azure OpenAI resource |
+| 14c | All tokens arrive at once (no streaming) | Response compression or proxy buffering | Use stream-test diagnostic endpoint; check `UseResponseCompression()`; set `X-Accel-Buffering: no` header |
+| 14d | Streaming works in curl but not in browser | Response compression only applied for browser Accept-Encoding | Add `Response.Headers["Content-Encoding"] = "identity"` in SSE endpoints |
 | 15 | Drag-drop file not triggering extraction | `file-drop.js` not loaded | Check `<script>` tag in index.html; check browser console for JS errors |
 | 16 | `window.fileDrop is undefined` | Script loaded after Blazor framework init | Move `<script>` tag before `_framework/blazor.webassembly.js` |
 | 17 | `JsonException` when parsing SSE data | SSE line doesn't match expected format | Add logging for raw SSE lines; check server-side WriteSSEAsync format |
@@ -627,6 +703,20 @@ Check CRC's MudBlazor version first: `grep MudBlazor CRC.Client.csproj`
 - **NuGet source:** Available on nuget.org. If CRC's GV Artifactory doesn't mirror it, this is a blocker — request mirroring.
 - **Size:** ~5MB total with dependencies
+### Microsoft.SemanticKernel.Connectors.AzureOpenAI
+- **Why needed:** Azure OpenAI-specific connector for SK (provides `AddAzureOpenAIChatCompletion()`)
+- **.NET compatibility:** Same as core SK package
+- **Transitive dependencies:** Pulls in `Azure.AI.OpenAI` SDK
+- **NuGet source:** Same as core SK — nuget.org
+- **Note:** This is separate from the core SK package. Without it, only `AddOpenAIChatCompletion()` is available (for non-Azure endpoints).
+### Azure.Identity
+- **Why needed:** Provides `DefaultAzureCredential` for Azure AD authentication to Azure OpenAI
+- **.NET compatibility:** .NET Standard 2.0+ (compatible with everything)
+- **CRC likely already has this** — it uses `Microsoft.Identity.Web` for Azure AD auth. Check `grep Azure.Identity CRC.Server.csproj`.
+- **Version conflicts:** If CRC has an older version, SK may pull in a newer one. Usually compatible, but verify with `dotnet build`.
+- **NuGet source:** Available on nuget.org and commonly mirrored in enterprise feeds
 ### Markdig (1.1.1)
 - **Why needed:** Markdown → HTML conversion for rendering LLM responses
 - **.NET compatibility:** .NET Standard 2.0+ (compatible with everything)
@@ -664,8 +754,10 @@ If the feature needs to be removed:
 **NuGet packages to remove:**
 - `Microsoft.SemanticKernel` from CRC.Server
+- `Microsoft.SemanticKernel.Connectors.AzureOpenAI` from CRC.Server
+- `Azure.Identity` from CRC.Server (only if not used by other CRC features — likely IS used, so leave it)
 - `Markdig` from CRC.Client (if not used by other features)
 **Config keys to remove:**
-- `NlxvaPricer:*` section from `appsettings.json`
+- `NlxvaPricer:*` section (AzureOpenAIEndpoint, DeploymentName, TenantId, FewShotPath) from `appsettings.json`
 - `ExternalApis:*` section (if only used by this feature)

@@ -58,6 +58,8 @@ This feature is a GUEST in CRC. Existing code, patterns, and conventions take ab
 Add to `CRC.Server`:
 - `Microsoft.SemanticKernel` (latest stable, >=1.x)
+- `Microsoft.SemanticKernel.Connectors.AzureOpenAI` (for Azure OpenAI connector)
+- `Azure.Identity` (for `DefaultAzureCredential` — CRC may already have this)
 - `Markdig` 1.1.1 (if CRC.Client doesn't already have it — check first)
 No new packages for CRC.Client or CRC.Shared (MudBlazor already present).
@@ -73,7 +75,7 @@ CRC.Server (ASP.NET Core)
 ├── NlxvaPricerController
 │ ├── POST /api/nlxva-pricer/chat (general chat)
 │ └── POST /api/nlxva-pricer/extract (email extraction)
-│ Uses: Semantic Kernel → CLIProxyAPI (OpenAI-compatible proxy)
+│ Uses: Semantic Kernel → Azure OpenAI (via DefaultAzureCredential)
 │ Uses: ExtractionPlugin (tool calling)
 │ Uses: FewShotService (example loading)
 ├── Services/
@@ -232,9 +234,9 @@ data: {"error":"message"}\n\n ← on failure (followed by [DONE])
 ```json
 {
   "NlxvaPricer": {
-    "LlmBaseUrl": "http://localhost:8317/v1",
-    "LlmModel": "claude-sonnet-4-6",
-    "LlmApiKey": "not-needed",
+    "AzureOpenAIEndpoint": "https://your-resource.openai.azure.com/",
+    "DeploymentName": "gpt4o-prod",
+    "TenantId": "<your-azure-ad-tenant-id>",
     "FewShotPath": "examples/extraction"
   },
   "ExternalApis": {
@@ -245,6 +247,11 @@ data: {"error":"message"}\n\n ← on failure (followed by [DONE])
 }
 ```
+If using API key auth instead of Azure AD, replace `TenantId` with:
+```json
+"ApiKey": "<your-azure-openai-api-key>"
+```
 ## Critical Patterns
 ### 1. SSE streaming in Blazor WASM — DO NOT use `reader.EndOfStream`
@@ -276,16 +283,33 @@ while ((line = await reader.ReadLineAsync()) != null) // ← NOT EndOfStream
 `SetBrowserResponseStreamingEnabled(true)` is a Blazor WASM extension that tells the browser Fetch API
 to expose the response as a ReadableStream. Without it, the browser buffers the entire response.
-### 2. Semantic Kernel base URL must include `/v1`
+### 2. Azure OpenAI: use deployment name, NOT model name; NO `/v1` suffix
-**Why:** The OpenAI SDK appends `chat/completions` directly to the base URL.
-Without `/v1`, requests hit `/chat/completions` instead of `/v1/chat/completions` → 404.
+**Why:** Azure OpenAI uses `AddAzureOpenAIChatCompletion()`, not `AddOpenAIChatCompletion()`.
+The endpoint is your Azure resource URL (no `/v1` — the Azure SDK constructs the path internally).
+The `deploymentName` is the name you gave the deployment in Azure portal, not the model name.
+Auth uses `DefaultAzureCredential` with the tenant ID, not an API key.
 ```csharp
-builder.Services.AddOpenAIChatCompletion(
-    modelId: model,
-    endpoint: new Uri("http://localhost:8317/v1"), // ← MUST include /v1
-    apiKey: "not-needed");
+using Azure.Identity;
+
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: builder.Configuration["NlxvaPricer:DeploymentName"] ?? "gpt4o-prod",
+    endpoint: builder.Configuration["NlxvaPricer:AzureOpenAIEndpoint"]
+        ?? "https://your-resource.openai.azure.com/",
+    credentials: new DefaultAzureCredential(
+        new DefaultAzureCredentialOptions
+        {
+            TenantId = builder.Configuration["NlxvaPricer:TenantId"]
+        }));
+```
+If using API key instead of Azure AD:
+```csharp
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: "gpt4o-prod",
+    endpoint: "https://your-resource.openai.azure.com/",
+    apiKey: builder.Configuration["NlxvaPricer:ApiKey"]);
 ```
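The Azure AD and API-key registrations differ only in the credential argument, so the choice can be driven by configuration. The following is a minimal sketch, not CRC's actual wiring: the `NlxvaPricerLlmRegistration` class and `AddNlxvaPricerLlm` method are hypothetical names, and the rule "use the API key when `NlxvaPricer:ApiKey` is set, otherwise use `DefaultAzureCredential`" is an assumption made for illustration. It also throws on a missing endpoint or deployment name rather than falling back to placeholder values.

```csharp
using System;
using Azure.Identity;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;

// Hypothetical helper; the class and method names are illustrative only.
public static class NlxvaPricerLlmRegistration
{
    public static IServiceCollection AddNlxvaPricerLlm(
        this IServiceCollection services, IConfiguration config)
    {
        var section = config.GetSection("NlxvaPricer");

        // Fail fast: a placeholder endpoint would only surface later as a confusing 404.
        var endpoint = section["AzureOpenAIEndpoint"]
            ?? throw new InvalidOperationException("NlxvaPricer:AzureOpenAIEndpoint is not configured.");
        var deploymentName = section["DeploymentName"]
            ?? throw new InvalidOperationException("NlxvaPricer:DeploymentName is not configured.");
        var apiKey = section["ApiKey"];

        if (!string.IsNullOrWhiteSpace(apiKey))
        {
            // Assumed rule: an explicit API key takes precedence over Azure AD.
            services.AddAzureOpenAIChatCompletion(
                deploymentName: deploymentName,
                endpoint: endpoint,
                apiKey: apiKey);
        }
        else
        {
            // Azure AD path; TenantId may be null, in which case the credential's default applies.
            services.AddAzureOpenAIChatCompletion(
                deploymentName: deploymentName,
                endpoint: endpoint,
                credentials: new DefaultAzureCredential(
                    new DefaultAzureCredentialOptions { TenantId = section["TenantId"] }));
        }

        return services;
    }
}
```

It would be called as `builder.Services.AddNlxvaPricerLlm(builder.Configuration);`. When pasting registration code into a top-level `Program.cs`, note that `using` directives must sit at the top of the file, before any statements.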
 ### 3. Layout height depends on AppBar height
@@ -371,18 +395,72 @@ if (!string.IsNullOrEmpty(parsedText))
 yield return parsedText;
 ```
+### 8. SSE response buffering — verify both streaming hops
+**Why:** The architecture has two streaming hops: Azure OpenAI → CRC.Server → Browser.
+If anything buffers in either hop, the user sees no tokens until the full response completes.
+Common buffers: response compression middleware, reverse proxies (NGINX/IIS), Azure API Management.
+**Diagnostic endpoint (add temporarily, remove after verifying):**
+```csharp
+[HttpGet("stream-test")]
+public async Task StreamTest()
+{
+    Response.ContentType = "text/event-stream";
+    Response.Headers["Cache-Control"] = "no-cache";
+    Response.Headers["X-Accel-Buffering"] = "no"; // NGINX hint
+    var chatService = _kernel.GetRequiredService<IChatCompletionService>();
+    var history = new ChatHistory();
+    history.AddUserMessage("Count from 1 to 10, one number per line.");
+    var sw = System.Diagnostics.Stopwatch.StartNew();
+    await foreach (var chunk in chatService.GetStreamingChatMessageContentsAsync(history))
+    {
+        if (!string.IsNullOrEmpty(chunk.Content))
+        {
+            await Response.WriteAsync($"data: [{sw.ElapsedMilliseconds}ms] {chunk.Content}\n\n");
+            await Response.Body.FlushAsync();
+        }
+    }
+    await Response.WriteAsync("data: [DONE]\n\n");
+}
+```
+**Test with:** `curl -N https://localhost:7100/api/nlxva-pricer/stream-test`
+- Timestamps spread over seconds = streaming works
+- All timestamps clustered at the end = something is buffering
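The bracketed timestamps in the diagnostic output are taken on the server at the moment each chunk is received from Azure OpenAI, so on their own they describe the first hop. Buffering on the second hop (CRC.Server to client) shows up only in when the lines arrive. A small console probe can log both side by side. This is a sketch under assumptions: `StreamProbe` and `TryParseServerMs` are hypothetical names, the default URL is simply the one from the `curl` command, and the endpoint is assumed to be reachable without authentication.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical diagnostic client; not part of CRC.
public static class StreamProbe
{
    // Extracts the server-side stopwatch value from a line such as "data: [123ms] 4".
    public static bool TryParseServerMs(string line, out long serverMs)
    {
        serverMs = 0;
        var match = Regex.Match(line, @"^data: \[(\d+)ms\]");
        return match.Success && long.TryParse(match.Groups[1].Value, out serverMs);
    }

    public static async Task Main(string[] args)
    {
        // Assumed default; pass the real URL as the first argument.
        var url = args.Length > 0
            ? args[0]
            : "https://localhost:7100/api/nlxva-pricer/stream-test";

        using var client = new HttpClient();
        var clock = Stopwatch.StartNew();

        // ResponseHeadersRead returns once headers arrive instead of buffering the whole body.
        using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();

        using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            // Print the server timestamp next to the local arrival time for each chunk.
            if (TryParseServerMs(line, out var serverMs))
                Console.WriteLine($"server={serverMs}ms client={clock.ElapsedMilliseconds}ms");
        }
    }
}
```

Compare the spacing between successive values, not the absolute numbers, since the two clocks start at different moments. If the `server` values are spread out but the `client` values arrive in one burst, the buffering sits between CRC.Server and the client; if the `server` values are themselves clustered, look upstream of CRC.Server.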
+**If CRC.Server uses `UseResponseCompression()`, exclude SSE:**
+```csharp
+Response.Headers["Content-Encoding"] = "identity"; // opt out per-response
+```
+**Response headers to always set on SSE endpoints:**
+```csharp
+Response.ContentType = "text/event-stream";
+Response.Headers["Cache-Control"] = "no-cache";
+Response.Headers["X-Accel-Buffering"] = "no"; // prevents NGINX buffering
+```
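Because every SSE endpoint needs the same three headers and a flush after each event, one option is to put both in a small helper so that individual actions cannot forget either. The sketch below is illustrative only: `SseResponseExtensions`, `PrepareForSse`, and `WriteSseDataAsync` are hypothetical names, and it assumes each payload is a single line (for example serialized JSON), since a raw newline inside `data` would end the SSE field early.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

// Hypothetical helper; names are illustrative and not part of CRC.
public static class SseResponseExtensions
{
    // Sets the headers listed above. Call once, before the first write.
    public static void PrepareForSse(this HttpResponse response)
    {
        response.ContentType = "text/event-stream";
        response.Headers["Cache-Control"] = "no-cache";
        response.Headers["X-Accel-Buffering"] = "no"; // hint for NGINX-style proxies
    }

    // Writes one "data:" event and flushes so the chunk leaves the server immediately.
    // Assumes `data` contains no raw newlines.
    public static async Task WriteSseDataAsync(
        this HttpResponse response, string data, CancellationToken ct = default)
    {
        await response.WriteAsync($"data: {data}\n\n", ct);
        await response.Body.FlushAsync(ct);
    }
}
```

Inside a controller action this would read `Response.PrepareForSse();` followed by `await Response.WriteSseDataAsync(json, ct);` for each chunk. Passing the request's cancellation token lets the loop stop when the browser disconnects.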
 ## Wiring
 ### CRC.Server DI registration order (add to existing Program.cs / Startup.cs)
 ```csharp
-// 1. Semantic Kernel — OpenAI-compatible connector
-var llmBaseUrl = builder.Configuration["NlxvaPricer:LlmBaseUrl"] ?? "http://localhost:8317/v1";
-var llmModel = builder.Configuration["NlxvaPricer:LlmModel"] ?? "claude-sonnet-4-6";
-builder.Services.AddOpenAIChatCompletion(
-    modelId: llmModel,
-    endpoint: new Uri(llmBaseUrl),
-    apiKey: builder.Configuration["NlxvaPricer:LlmApiKey"] ?? "not-needed");
+// 1. Semantic Kernel — Azure OpenAI connector with Azure AD auth
+using Azure.Identity;
+
+var azureEndpoint = builder.Configuration["NlxvaPricer:AzureOpenAIEndpoint"]
+    ?? "https://your-resource.openai.azure.com/";
+var deploymentName = builder.Configuration["NlxvaPricer:DeploymentName"] ?? "gpt4o-prod";
+var tenantId = builder.Configuration["NlxvaPricer:TenantId"];
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: deploymentName,
+    endpoint: azureEndpoint,
+    credentials: new DefaultAzureCredential(
+        new DefaultAzureCredentialOptions { TenantId = tenantId }));
 builder.Services.AddKernel();
 // 2. External API typed HttpClients