fix: update export bundle for Azure OpenAI and add streaming diagnostics

Replace CLIProxyAPI/local proxy references with Azure OpenAI using DefaultAzureCredential and tenant ID auth. Add Critical Pattern #8 for SSE buffering diagnostics with a timestamped curl test. Add streaming verification tasks (T6b, T15) and troubleshooting entries for Azure AD auth, RBAC, response compression, and proxy buffering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@@ -58,6 +58,8 @@ This feature is a GUEST in CRC. Existing code, patterns, and conventions take ab
 
 Add to `CRC.Server`:
 - `Microsoft.SemanticKernel` (latest stable, >=1.x)
+- `Microsoft.SemanticKernel.Connectors.AzureOpenAI` (for Azure OpenAI connector)
+- `Azure.Identity` (for `DefaultAzureCredential` — CRC may already have this)
 - `Markdig` 1.1.1 (if CRC.Client doesn't already have it — check first)
 
 No new packages for CRC.Client or CRC.Shared (MudBlazor already present).
@@ -73,7 +75,7 @@ CRC.Server (ASP.NET Core)
 ├── NlxvaPricerController
 │ ├── POST /api/nlxva-pricer/chat (general chat)
 │ └── POST /api/nlxva-pricer/extract (email extraction)
-│ Uses: Semantic Kernel → CLIProxyAPI (OpenAI-compatible proxy)
+│ Uses: Semantic Kernel → Azure OpenAI (via DefaultAzureCredential)
 │ Uses: ExtractionPlugin (tool calling)
 │ Uses: FewShotService (example loading)
 ├── Services/
@@ -232,9 +234,9 @@ data: {"error":"message"}\n\n ← on failure (followed by [DONE])
 ```json
 {
   "NlxvaPricer": {
-    "LlmBaseUrl": "http://localhost:8317/v1",
-    "LlmModel": "claude-sonnet-4-6",
-    "LlmApiKey": "not-needed",
+    "AzureOpenAIEndpoint": "https://your-resource.openai.azure.com/",
+    "DeploymentName": "gpt4o-prod",
+    "TenantId": "<your-azure-ad-tenant-id>",
     "FewShotPath": "examples/extraction"
   },
   "ExternalApis": {
@@ -245,6 +247,11 @@ data: {"error":"message"}\n\n ← on failure (followed by [DONE])
 }
 ```
 
+If using API key auth instead of Azure AD, replace `TenantId` with:
+```json
+"ApiKey": "<your-azure-openai-api-key>"
+```
+
 ## Critical Patterns
 
 ### 1. SSE streaming in Blazor WASM — DO NOT use `reader.EndOfStream`
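Critical Pattern #1 says to drive the SSE read loop from the `ReadLineAsync()` result rather than `reader.EndOfStream`. A minimal sketch of such a loop follows, assuming the `data:` / `[DONE]` framing quoted in the hunk headers. `ReadSseDataAsync` is a hypothetical helper name, not part of CRC, and a `StringReader` can stand in for the response stream when trying it out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper (not from CRC): collects the payload of every "data:" line
// until the "[DONE]" sentinel or the end of the stream.
static async Task<List<string>> ReadSseDataAsync(TextReader reader)
{
    var payloads = new List<string>();
    string? line;

    // Loop on the ReadLineAsync result; EndOfStream is never consulted.
    while ((line = await reader.ReadLineAsync()) != null)
    {
        if (!line.StartsWith("data:", StringComparison.Ordinal))
            continue; // blank separator lines and other SSE fields are skipped

        var payload = line.Substring("data:".Length);
        if (payload.StartsWith(" ", StringComparison.Ordinal))
            payload = payload.Substring(1); // drop the single space after the colon

        if (payload == "[DONE]")
            break; // end-of-stream sentinel

        payloads.Add(payload);
    }

    return payloads;
}
```

An error event such as `data: {"error":"message"}` comes back as an ordinary payload here; deciding how to surface it is left to the caller.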
@@ -276,16 +283,33 @@ while ((line = await reader.ReadLineAsync()) != null) // ← NOT EndOfStream
 `SetBrowserResponseStreamingEnabled(true)` is a Blazor WASM extension that tells the browser Fetch API
 to expose the response as a ReadableStream. Without it, the browser buffers the entire response.
 
-### 2. Semantic Kernel base URL must include `/v1`
+### 2. Azure OpenAI: use deployment name, NOT model name; NO `/v1` suffix
 
-**Why:** The OpenAI SDK appends `chat/completions` directly to the base URL.
-Without `/v1`, requests hit `/chat/completions` instead of `/v1/chat/completions` → 404.
+**Why:** Azure OpenAI uses `AddAzureOpenAIChatCompletion()`, not `AddOpenAIChatCompletion()`.
+The endpoint is your Azure resource URL (no `/v1` — the Azure SDK constructs the path internally).
+The `deploymentName` is the name you gave the deployment in Azure portal, not the model name.
+Auth uses `DefaultAzureCredential` with the tenant ID, not an API key.
 
 ```csharp
-builder.Services.AddOpenAIChatCompletion(
-    modelId: model,
-    endpoint: new Uri("http://localhost:8317/v1"), // ← MUST include /v1
-    apiKey: "not-needed");
+using Azure.Identity;
+
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: builder.Configuration["NlxvaPricer:DeploymentName"] ?? "gpt4o-prod",
+    endpoint: builder.Configuration["NlxvaPricer:AzureOpenAIEndpoint"]
+        ?? "https://your-resource.openai.azure.com/",
+    credentials: new DefaultAzureCredential(
+        new DefaultAzureCredentialOptions
+        {
+            TenantId = builder.Configuration["NlxvaPricer:TenantId"]
+        }));
 ```
 
+If using API key instead of Azure AD:
+```csharp
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: "gpt4o-prod",
+    endpoint: "https://your-resource.openai.azure.com/",
+    apiKey: builder.Configuration["NlxvaPricer:ApiKey"]);
+```
+
 ### 3. Layout height depends on AppBar height
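Because this commit flips the rule from "must include `/v1`" to "no `/v1` suffix", a stale `LlmBaseUrl`-style value is an easy mistake to carry over into `AzureOpenAIEndpoint`. The sketch below shows one way a startup check could catch it early. `ValidateAzureOpenAIEndpoint` is a hypothetical helper, not part of Semantic Kernel or CRC, and it only inspects the URL shape.

```csharp
using System;

// Hypothetical guard (not from CRC or Semantic Kernel): rejects endpoint values
// that still carry the "/v1" suffix used by the old OpenAI-compatible proxy setup.
static string ValidateAzureOpenAIEndpoint(string endpoint)
{
    if (!Uri.TryCreate(endpoint, UriKind.Absolute, out var uri))
        throw new ArgumentException($"Expected an absolute URL, got '{endpoint}'.");

    // "/v1" and "/v1/" are both treated as the stale suffix.
    var path = uri.AbsolutePath.TrimEnd('/');
    if (path.EndsWith("/v1", StringComparison.OrdinalIgnoreCase))
        throw new ArgumentException(
            "Azure OpenAI endpoint must be the resource URL without a /v1 suffix.");

    return endpoint;
}
```

Under this assumption the value read from `NlxvaPricer:AzureOpenAIEndpoint` would be passed through the guard before being handed to `AddAzureOpenAIChatCompletion`.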
@@ -371,18 +395,72 @@ if (!string.IsNullOrEmpty(parsedText))
     yield return parsedText;
 ```
 
+### 8. SSE response buffering — verify both streaming hops
+
+**Why:** The architecture has two streaming hops: Azure OpenAI → CRC.Server → Browser.
+If anything buffers in either hop, the user sees no tokens until the full response completes.
+Common buffers: response compression middleware, reverse proxies (NGINX/IIS), Azure API Management.
+
+**Diagnostic endpoint (add temporarily, remove after verifying):**
+```csharp
+[HttpGet("stream-test")]
+public async Task StreamTest()
+{
+    Response.ContentType = "text/event-stream";
+    Response.Headers["Cache-Control"] = "no-cache";
+    Response.Headers["X-Accel-Buffering"] = "no"; // NGINX hint
+
+    var chatService = _kernel.GetRequiredService<IChatCompletionService>();
+    var history = new ChatHistory();
+    history.AddUserMessage("Count from 1 to 10, one number per line.");
+
+    var sw = System.Diagnostics.Stopwatch.StartNew();
+    await foreach (var chunk in chatService.GetStreamingChatMessageContentsAsync(history))
+    {
+        if (!string.IsNullOrEmpty(chunk.Content))
+        {
+            await Response.WriteAsync($"data: [{sw.ElapsedMilliseconds}ms] {chunk.Content}\n\n");
+            await Response.Body.FlushAsync();
+        }
+    }
+    await Response.WriteAsync("data: [DONE]\n\n");
+}
+```
+
+**Test with:** `curl -N https://localhost:7100/api/nlxva-pricer/stream-test`
+- Timestamps spread over seconds = streaming works
+- All timestamps clustered at the end = something is buffering
+
+**If CRC.Server uses `UseResponseCompression()`, exclude SSE:**
+```csharp
+Response.Headers["Content-Encoding"] = "identity"; // opt out per-response
+```
+
+**Response headers to always set on SSE endpoints:**
+```csharp
+Response.ContentType = "text/event-stream";
+Response.Headers["Cache-Control"] = "no-cache";
+Response.Headers["X-Accel-Buffering"] = "no"; // prevents NGINX buffering
+```
+
 ## Wiring
 
 ### CRC.Server DI registration order (add to existing Program.cs / Startup.cs)
 
 ```csharp
-// 1. Semantic Kernel — OpenAI-compatible connector
-var llmBaseUrl = builder.Configuration["NlxvaPricer:LlmBaseUrl"] ?? "http://localhost:8317/v1";
-var llmModel = builder.Configuration["NlxvaPricer:LlmModel"] ?? "claude-sonnet-4-6";
-builder.Services.AddOpenAIChatCompletion(
-    modelId: llmModel,
-    endpoint: new Uri(llmBaseUrl),
-    apiKey: builder.Configuration["NlxvaPricer:LlmApiKey"] ?? "not-needed");
+// 1. Semantic Kernel — Azure OpenAI connector with Azure AD auth
+using Azure.Identity;
+
+var azureEndpoint = builder.Configuration["NlxvaPricer:AzureOpenAIEndpoint"]
+    ?? "https://your-resource.openai.azure.com/";
+var deploymentName = builder.Configuration["NlxvaPricer:DeploymentName"] ?? "gpt4o-prod";
+var tenantId = builder.Configuration["NlxvaPricer:TenantId"];
+
+builder.Services.AddAzureOpenAIChatCompletion(
+    deploymentName: deploymentName,
+    endpoint: azureEndpoint,
+    credentials: new DefaultAzureCredential(
+        new DefaultAzureCredentialOptions { TenantId = tenantId }));
 builder.Services.AddKernel();
 
 // 2. External API typed HttpClients
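The "spread vs. clustered" reading of the `stream-test` output in Critical Pattern #8 can be checked mechanically. The sketch below parses the `[<n>ms]` stamps from captured `curl -N` output and flags a run whose stamps all fall within a short window. `ParseTimestamps`, `LooksBuffered`, and the 500 ms default are illustrative assumptions, not part of CRC. The stamps are taken on the server, so they mainly reflect the Azure OpenAI → CRC.Server hop; to judge the CRC.Server → Browser hop, client-side arrival times would need to be fed to the same check instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the "[<n>ms]" stamps written by the stream-test endpoint.
static List<long> ParseTimestamps(string curlOutput) =>
    Regex.Matches(curlOutput, @"^data: \[(\d+)ms\]", RegexOptions.Multiline)
         .Cast<Match>()
         .Select(m => long.Parse(m.Groups[1].Value))
         .ToList();

// Hypothetical heuristic: treats the run as buffered when the first and last stamp
// are closer together than minSpreadMs (500 ms is an arbitrary illustrative default).
static bool LooksBuffered(IReadOnlyList<long> stampsMs, long minSpreadMs = 500)
{
    if (stampsMs.Count < 2)
        return false; // not enough chunks to judge

    return stampsMs[stampsMs.Count - 1] - stampsMs[0] < minSpreadMs;
}
```

The threshold should be tuned to the prompt: a ten-line counting reply normally streams over a second or more, so a spread of a few milliseconds points to buffering upstream of the stopwatch.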