Project Research Summary

Project: Blazor WebAssembly AI Chat Application

Domain: Single-user personal AI chat web app (.NET / C# / OpenAI GPT)

Researched: 2026-03-27

Confidence: HIGH

Executive Summary

This is a single-user personal AI chat application built on Blazor WebAssembly with an ASP.NET Core backend. The project has a dual purpose: functioning as a useful personal tool and serving as a tutorial-quality reference implementation for Blazor WASM patterns. The recommended architecture is a strict two-project split — a standalone Blazor WASM client and a separate ASP.NET Core Minimal API server — reflecting a breaking change in .NET 8+ that removed the hosted Blazor WASM template. The client runs entirely in the browser; the server holds secrets, calls OpenAI, and manages disk persistence. This boundary is non-negotiable and must be established before any feature code is written.

The core technical challenge is streaming. OpenAI's token-by-token streaming requires explicit opt-in in Blazor WASM (SetBrowserResponseStreamingEnabled(true)) that the SDK does not set by default — the stream silently falls back to buffered delivery with no error. Combined with the need to call StateHasChanged() on every token to update the UI, streaming is the highest-risk implementation step and must be validated early. All other features — conversation management, markdown rendering, copy-to-clipboard — are well-understood patterns with clear .NET implementations.

The key risk profile is concentrated in Phase 1 (architecture foundation) and the streaming phase. Three "never" mistakes — putting the API key in WASM, writing file I/O in the WASM project, and calling OpenAI directly from WASM — must be locked out architecturally before feature development begins. Once those boundaries are established, the remainder of the v1 feature set follows a clear dependency chain from storage to conversations to streaming to UI polish.

Key Findings

Recommended Stack

The stack is .NET 9 throughout: a blazorwasm standalone client and a webapi backend in a single solution, connected by HTTP and SSE. There are no exotic dependencies — the official OpenAI NuGet package (2.9.1, published by OpenAI directly) handles AI calls, Markdig (1.1.1) handles markdown-to-HTML conversion, MudBlazor (9.2.0) provides Material Design UI components with zero JavaScript dependencies, and System.Text.Json (built-in) handles JSON serialization and file storage. All versions are confirmed compatible with .NET 9.

The most important stack decision is what to exclude: do not use OpenAI-DotNet (unofficial community package), Microsoft.SemanticKernel (excessive abstraction for v1), Newtonsoft.Json (superseded by System.Text.Json), Blazored.LocalStorage (wrong persistence layer for this architecture), or JSInterop for streaming (WASM has a native streaming opt-in that avoids it).

Core technologies:

.NET 9 SDK + C# 13: Runtime and language; a stable Standard Term Support (STS) release, not an LTS one; both project types target net9.0
Blazor WebAssembly (standalone): Client SPA — non-negotiable per project constraints; runs in-browser with no server round-trip for UI
ASP.NET Core Minimal API: Backend proxy — required to keep the OpenAI key server-side and to handle disk I/O that WASM cannot perform
OpenAI 2.9.1 (official): AI calls — CompleteChatStreamingAsync() with await foreach; the only correct .NET SDK choice
Markdig 1.1.1: Markdown rendering — de facto .NET standard; CommonMark-compliant; renders via @((MarkupString)html) with no JS interop
MudBlazor 9.2.0: UI components — pure C#, zero JS dependencies, comprehensive chat-friendly components, free license
System.Text.Json (built-in): Persistence — serialize conversations to JSON files on the server; no extra dependency
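
The CompleteChatStreamingAsync() with await foreach pattern named above might look roughly like the following on the server. This is a minimal sketch, not the project's actual service: the class shape, the StreamReplyAsync name, and the default model string are assumptions made for illustration. Only ChatClient, CompleteChatStreamingAsync, and ContentUpdate come from the official OpenAI package. Note that OpenAI.Chat.ChatMessage is the SDK's own type, distinct from the project's shared ChatMessage model.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using OpenAI.Chat;

// Hypothetical server-side wrapper around the official SDK.
public sealed class OpenAiService
{
    private readonly ChatClient _chat;

    // The API key is read on the server (user-secrets or environment), never in WASM.
    public OpenAiService(string apiKey, string model = "gpt-4o")
        => _chat = new ChatClient(model, apiKey);

    // Yields text fragments as they arrive so an endpoint can forward them.
    public async IAsyncEnumerable<string> StreamReplyAsync(
        IReadOnlyList<OpenAI.Chat.ChatMessage> history,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (StreamingChatCompletionUpdate update in
            _chat.CompleteChatStreamingAsync(history, cancellationToken: ct))
        {
            // One update can carry zero or more content parts.
            foreach (ChatMessageContentPart part in update.ContentUpdate)
            {
                if (!string.IsNullOrEmpty(part.Text))
                    yield return part.Text;
            }
        }
    }
}
```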

Expected Features

The feature set is well-defined by comparison to ChatGPT and Claude as reference products. The v1 scope is intentionally constrained to what makes the app genuinely usable, with explicit anti-features documented to prevent scope creep during implementation.

Must have (table stakes):

Send message / receive streaming response — core loop; blocking responses are unacceptable by 2026 standards
Markdown rendering with syntax-highlighted code blocks: GPT responses routinely contain markdown, so raw output is effectively unusable
Multiple named conversations with create / switch / delete — without this, the app is a single disposable thread
JSON file persistence across sessions — conversations must survive page refresh to be useful
Auto-scroll to latest message and loading indicator — baseline polish that makes the app feel complete
Copy-to-clipboard on code blocks — high-frequency action for the developer-focused target user
Input disabled during streaming and send-on-Enter — prevents double-submit and matches chat conventions
Tutorial-style inline code comments — the project's defining purpose as a learning resource

Should have (competitive, v1.x):

Auto-generated conversation titles — reduces naming friction; single GPT summarization call
System prompt / persona configuration — power-user feature; natural extension once multi-conversation works
Model selector (GPT-4o vs GPT-4o-mini) — cost/quality tradeoff; low implementation cost
Export conversation to markdown/text — low complexity, occasional high value

Defer (v2+):

Message edit and regenerate — medium complexity; wait until core loop is solid
Token usage display — streaming completion handling required; not blocking
LangChain / agentic workflows, RAG, MCP server integration — explicitly v2 per project intent
Voice input/output, image uploads, multi-user auth, PWA — all documented anti-features with clear rationale

Feature dependencies are explicit: JSON storage must precede conversation management, which must precede conversation switching. A basic blocking API call must precede streaming. Markdown must precede syntax highlighting, which must precede copy-to-clipboard.

Architecture Approach

The architecture is a strict two-tier system: a Blazor WASM SPA in the browser communicating with an ASP.NET Core Minimal API via HTTP and SSE. State on the client is managed by a singleton ConversationStateService that raises OnChange events — components subscribe in OnInitialized and unsubscribe in Dispose. There is a Shared library project that holds Conversation and ChatMessage models used by both tiers, eliminating duplicate DTOs.

Components are kept intentionally thin (data in via [Parameter], actions out via EventCallback<T>). All logic lives in services. This is explicitly stated as a tutorial goal — fat components teach bad habits and are hard to explain.
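
As an illustration of the state pattern described above, here is a minimal sketch of a state service that raises OnChange and keys its state by conversation ID. The member names (GetOrCreate, SetActive, AddMessage) and the record shape are assumptions, not the project's actual API.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Illustrative model; the real shared model may carry more fields.
public sealed record ChatMessage(string Role, string Content);

public sealed class ConversationState
{
    public List<ChatMessage> Messages { get; } = new();
    public bool IsStreaming { get; set; }
}

public sealed class ConversationStateService
{
    // Keyed by conversation ID so switching conversations never overwrites state.
    private readonly Dictionary<Guid, ConversationState> _states = new();

    public Guid? ActiveId { get; private set; }

    // Components subscribe in OnInitialized and unsubscribe in Dispose.
    public event Action? OnChange;

    public ConversationState GetOrCreate(Guid id)
    {
        if (!_states.TryGetValue(id, out var state))
        {
            state = new ConversationState();
            _states[id] = state;
        }
        return state;
    }

    public void SetActive(Guid id)
    {
        GetOrCreate(id);
        ActiveId = id;
        OnChange?.Invoke();
    }

    public void AddMessage(Guid id, ChatMessage message)
    {
        GetOrCreate(id).Messages.Add(message);
        OnChange?.Invoke();
    }
}
```

A component would typically attach a handler that calls StateHasChanged() and detach it in Dispose, so a disposed component is never asked to re-render.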

Major components:

ConversationStateService (WASM singleton) — active conversation, message list, streaming flag; raises OnChange for all subscribed components
ChatApiClient (WASM scoped service) — wraps HttpClient, handles SSE stream reading with SetBrowserResponseStreamingEnabled(true) and ResponseHeadersRead
OpenAiService (server scoped) — wraps official OpenAI SDK, returns IAsyncEnumerable<string> of tokens to endpoint handlers
ConversationRepository (server singleton) — reads/writes JSON files under a configurable data directory; uses SemaphoreSlim(1,1) for write serialization
ChatEndpoints + ConversationEndpoints (server Minimal API) — thin HTTP layer wiring services to routes; SSE streaming endpoint proxies tokens to client
Leaf UI components: MessageBubble, ChatInput, ConversationList, MessageList — pure display, no service calls
Container component: ChatPage — composes all child components, owns the route (@page "/chat/{id?}")

Build order: Shared models → ConversationRepository → OpenAiService → server endpoints → ChatApiClient → ConversationStateService → leaf UI → container UI. This maps directly to implementation phases.
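
The ConversationRepository described in the component list could be sketched as below. This assumes one JSON file per conversation named by its ID, which the research does not specify; the write-to-temp-then-move step is likewise an added assumption. Only the SemaphoreSlim(1,1) write serialization and the System.Text.Json usage come from the text above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Illustrative models; the real ones live in the Shared project.
public sealed record ChatMessage(string Role, string Content);
public sealed record Conversation(Guid Id, string Title, List<ChatMessage> Messages);

public sealed class ConversationRepository
{
    private readonly string _dataDir;
    // One writer at a time; concurrent requests queue instead of corrupting files.
    private readonly SemaphoreSlim _writeLock = new(1, 1);

    public ConversationRepository(string dataDir)
    {
        _dataDir = dataDir;
        Directory.CreateDirectory(_dataDir);
    }

    private string PathFor(Guid id) => Path.Combine(_dataDir, $"{id}.json");

    public async Task SaveAsync(Conversation conversation, CancellationToken ct = default)
    {
        await _writeLock.WaitAsync(ct);
        try
        {
            // Write to a temp file first so a crash mid-write leaves the old file intact.
            var target = PathFor(conversation.Id);
            var tmp = target + ".tmp";
            await File.WriteAllTextAsync(tmp, JsonSerializer.Serialize(conversation), ct);
            File.Move(tmp, target, overwrite: true);
        }
        finally
        {
            _writeLock.Release();
        }
    }

    public async Task<Conversation?> LoadAsync(Guid id, CancellationToken ct = default)
    {
        var path = PathFor(id);
        if (!File.Exists(path)) return null;
        var json = await File.ReadAllTextAsync(path, ct);
        return JsonSerializer.Deserialize<Conversation>(json);
    }

    public async Task<bool> DeleteAsync(Guid id, CancellationToken ct = default)
    {
        await _writeLock.WaitAsync(ct);
        try
        {
            var path = PathFor(id);
            if (!File.Exists(path)) return false;
            File.Delete(path);
            return true;
        }
        finally
        {
            _writeLock.Release();
        }
    }
}
```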

Critical Pitfalls

Streaming silently broken in WASM (Pitfall 1 + 4): Two distinct failure modes that appear identical: (a) the OpenAI SDK does not set SetBrowserResponseStreamingEnabled(true), so the browser buffers the entire response; (b) StateHasChanged() is not called per-token, so Blazor batches all renders until the stream completes. Both produce the same symptom: tokens appear all at once. Fix: opt in to browser response streaming where the request leaves the browser (the custom BlazorHttpClientTransport, or SetBrowserResponseStreamingEnabled(true) in the WASM ChatApiClient; the server's own HttpClient is not subject to browser buffering), and call StateHasChanged() + await Task.Yield() explicitly inside the await foreach token loop. Throttle to ~50ms intervals to prevent UI thread starvation at GPT-4o token speeds.
API key exposure in WASM (Pitfall 2) — wwwroot/appsettings.json is a static file served to any browser visitor. dotnet user-secrets in WASM projects are embedded in the published bundle in plaintext. The key must live exclusively in the server project, accessed via server-side user-secrets or environment variables. This boundary must be established in Phase 1 and never crossed.
File I/O in WASM project (Pitfall 5) — System.IO.File compiles in WASM but writes to an in-memory virtual filesystem that resets on every page refresh. All persistence must go through backend API endpoints. Reinforce the same architectural boundary as the API key rule.
Scoped DI = Singleton in WASM (Pitfall 3) — In Blazor WASM there is exactly one DI scope for the tab lifetime. A service registered as Scoped never resets. Design ConversationStateService to hold a collection keyed by conversation ID, not mutable "current conversation" fields.
IL trimming breaks Release builds (Pitfall 6) — Debug builds do not trim; published builds do. JSON serialization properties, DI-resolved types, and JSInterop callbacks can be silently stripped. Use [JsonSerializable] source generators on all model types and run dotnet publish once in Phase 1 to catch trim warnings while the surface is small.
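
To make the first pitfall concrete, the following sketch shows a WASM-side client that opts in to browser response streaming and reads the body incrementally. The route "api/chat/stream", the payload type, and the assumption that each token arrives as a single "data: " line are all hypothetical. SetBrowserResponseStreamingEnabled comes from the Microsoft.AspNetCore.Components.WebAssembly.Http namespace, which a Blazor WASM project already references.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Net.Http.Json;
using System.Runtime.CompilerServices;
using System.Threading;
using Microsoft.AspNetCore.Components.WebAssembly.Http;

public sealed class ChatApiClient
{
    private readonly HttpClient _http;

    public ChatApiClient(HttpClient http) => _http = http;

    // Route and payload shape are placeholders for illustration.
    public async IAsyncEnumerable<string> StreamAsync(
        object payload,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, "api/chat/stream")
        {
            Content = JsonContent.Create(payload)
        };

        // Without this opt-in the browser delivers the whole body at once.
        request.SetBrowserResponseStreamingEnabled(true);

        // Return as soon as headers arrive instead of buffering the body.
        using var response = await _http.SendAsync(
            request, HttpCompletionOption.ResponseHeadersRead, ct);
        response.EnsureSuccessStatusCode();

        await using var stream = await response.Content.ReadAsStreamAsync(ct);
        using var reader = new StreamReader(stream);

        string? line;
        while ((line = await reader.ReadLineAsync(ct)) is not null)
        {
            // Blank separator lines and other SSE fields are skipped.
            if (line.StartsWith("data: ", StringComparison.Ordinal))
                yield return line["data: ".Length..];
        }
    }
}
```

The second failure mode is independent of this code: the component consuming StreamAsync still has to call StateHasChanged() inside its own await foreach loop.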

Implications for Roadmap

Based on combined research, the architecture dependency chain and pitfall prevention requirements suggest five phases:

Phase 1: Architecture Foundation

Rationale: Three critical "never" mistakes (API key in WASM, file I/O in WASM, direct OpenAI call from WASM) must be architecturally locked before any feature code is written. The WASM/backend split is the load-bearing constraint everything else depends on. This phase also establishes the Shared models library which both tiers need immediately.

Delivers: Working solution structure with two projects + shared library; CORS configured; basic HTTP connectivity verified WASM-to-server; dotnet publish tested once to catch IL trim warnings early; placeholder endpoints in place; no OpenAI calls yet.

Addresses: Project scaffolding, solution structure (FEATURES.md scaffolding prerequisite)

Avoids: API key exposure (Pitfall 2), file I/O in WASM (Pitfall 5), direct OpenAI calls from WASM (Architecture anti-pattern 1), IL trimming surprises (Pitfall 6)
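
One way to keep the shared models trim-safe from Phase 1 onward is a System.Text.Json source-generation context, sketched below. The AppJsonContext and ConversationJson names and the record shapes are assumptions for illustration. The generated AppJsonContext.Default.Conversation metadata replaces the reflection-based lookup that trimming can break.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Illustrative models; the real ones live in the Shared project.
public sealed record ChatMessage(string Role, string Content);
public sealed record Conversation(Guid Id, string Title, List<ChatMessage> Messages);

// The source generator emits serialization metadata for each listed type at build time.
[JsonSerializable(typeof(Conversation))]
[JsonSerializable(typeof(List<Conversation>))]
public partial class AppJsonContext : JsonSerializerContext
{
}

public static class ConversationJson
{
    // Passing the generated type info avoids reflection over model properties.
    public static string Serialize(Conversation conversation) =>
        JsonSerializer.Serialize(conversation, AppJsonContext.Default.Conversation);

    public static Conversation? Deserialize(string json) =>
        JsonSerializer.Deserialize(json, AppJsonContext.Default.Conversation);
}
```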

Phase 2: Conversation Storage and Management

Rationale: JSON file storage is the prerequisite for every conversation-related feature. Per the feature dependency graph: [JSON File Storage] → [Multiple Conversations] → [Create/Switch/Delete/Persist]. This phase must come before any AI integration because the persistence layer needs to exist before we can store AI responses.

Delivers: ConversationRepository with full CRUD, ConversationEndpoints wired to HTTP routes, ConversationList sidebar component, create/switch/delete conversations working, conversation history persisted to disk and loaded on startup. The app has no AI yet but has a working conversation management UI.

Uses: System.Text.Json built-in, SemaphoreSlim(1,1) for write serialization, MudBlazor for sidebar components

Implements: ConversationRepository, ConversationEndpoints, ConversationStateService (initial version), ConversationList.razor

Avoids: Scoped DI state leaks (Pitfall 3): design ConversationStateService with Dictionary<Guid, ConversationState> from the start

Phase 3: Basic AI Chat (Non-Streaming)

Rationale: Per the feature dependency chain, a working blocking API call must be established before streaming is layered on top. Building non-streaming first validates the full request/response shape, CORS, error handling, and conversation history construction without the added complexity of SSE. This is the correct learning sequence for a tutorial project.

Delivers: Full chat loop working end-to-end: user sends message → backend calls OpenAI → response appended to conversation → conversation saved to disk. All without streaming. Markdown rendering added here because GPT responses with raw markdown are effectively unusable and would make all testing painful.

Uses: OpenAI 2.9.1 SDK, Markdig 1.1.1, MudBlazor chat components

Implements: OpenAiService, ChatEndpoints (non-streaming POST), ChatApiClient (basic POST), MessageBubble.razor with @((MarkupString)html) rendering, ChatInput.razor

Avoids: Markdown XSS via raw MarkupString (PITFALLS integration gotchas): sanitize or accept risk explicitly in code comments

Phase 4: Streaming Responses

Rationale: Streaming is the highest-risk implementation step. Research identified two independent failure modes (transport not set, StateHasChanged not called) that produce identical symptoms. Addressing this in its own phase means streaming can be diagnosed and debugged in isolation, without other variables. All streaming-specific patterns (BlazorHttpClientTransport, SSE endpoint, ResponseHeadersRead, per-token StateHasChanged with throttling) are introduced and documented here.

Delivers: Token-by-token streaming from OpenAI through the backend SSE endpoint to the WASM UI. Loading indicator shown immediately on send, hidden on first token. Auto-scroll to latest message. Input disabled during streaming. Cancel button wired to CancellationToken. Stream throttling (~50ms) to prevent UI thread starvation.

Uses: SetBrowserResponseStreamingEnabled(true), HttpCompletionOption.ResponseHeadersRead, text/event-stream SSE frames, await Task.Yield() in token loop

Implements: Streaming ChatEndpoints, updated ChatApiClient with stream reader, updated MessageList and ChatPage with streaming state

Avoids: Streaming silently broken (Pitfall 1), UI freeze without StateHasChanged (Pitfall 4), UI thread starvation from unthrottled renders (PITFALLS performance traps)
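
The ~50ms stream throttling could be isolated in a small helper like the one sketched here, so the interval stays a tunable constant. RenderThrottle and its injectable clock are hypothetical names chosen for this example; the time-based strategy is one of the two options the research leaves open (timer vs counter).

```csharp
#nullable enable
using System;
using System.Diagnostics;

// Decides whether enough time has passed to justify another render.
public sealed class RenderThrottle
{
    private readonly TimeSpan _minInterval;
    private readonly Func<TimeSpan> _now;
    private TimeSpan _lastRender;
    private bool _hasRendered;

    // The clock is injectable so the throttle can be tested without real delays.
    public RenderThrottle(TimeSpan minInterval, Func<TimeSpan>? clock = null)
    {
        _minInterval = minInterval;
        if (clock is null)
        {
            var stopwatch = Stopwatch.StartNew();
            clock = () => stopwatch.Elapsed;
        }
        _now = clock;
    }

    // True on the first call, then at most once per interval.
    public bool ShouldRender()
    {
        var now = _now();
        if (_hasRendered && now - _lastRender < _minInterval) return false;
        _lastRender = now;
        _hasRendered = true;
        return true;
    }
}

// Intended use inside a component's token loop (not compiled here):
//
//   await foreach (var token in api.StreamAsync(payload, ct))
//   {
//       buffer.Append(token);
//       if (throttle.ShouldRender()) { StateHasChanged(); await Task.Yield(); }
//   }
//   StateHasChanged(); // final flush so the last tokens are never dropped
```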

Phase 5: Polish and v1.x Features

Rationale: Once the core loop (storage + AI + streaming) is solid, the remaining v1.x features are all low-to-medium complexity additions that build on the established foundation. Grouping them together allows the tutorial narrative to focus on "extending a working app" rather than "getting the basics right."

Delivers: Auto-generated conversation titles (GPT summarization call after first exchange), syntax-highlighted code blocks (Markdown.ColorCode Markdig pipeline extension), copy-to-clipboard on code blocks (JS interop via navigator.clipboard.writeText), responsive layout for mobile, error handling with user-visible messages, model selector dropdown (GPT-4o vs GPT-4o-mini). Optional v1.x additions: system prompt configuration, export conversation.

Uses: Markdown.ColorCode NuGet package (base package, NOT CSharpToColoredHtml which breaks WASM), navigator.clipboard JS interop

Implements: Updated MarkdownPipeline with ColorCode extension, ClipboardService.cs JS interop wrapper, settings model for model selection

Phase Ordering Rationale

Architecture before features prevents the three hardest-to-recover-from mistakes (API key exposure, WASM file I/O, wrong project boundaries) from being baked in.
Storage before AI follows the feature dependency graph exactly: conversations need a home before AI responses can be stored in them.
Non-streaming before streaming validates the full request/response shape with simpler code, making streaming easier to debug when it is introduced.
Streaming as its own phase isolates the highest-risk technical challenge. Combined with the tutorial purpose, this also makes for a clear "here is how streaming actually works in Blazor WASM" chapter.
Polish last respects the single-responsibility of each phase and avoids complexity interleaving.

Research Flags

Phases likely needing deeper research during planning (i.e., run /gsd:research-phase):

Phase 4 (Streaming): The BlazorHttpClientTransport workaround and SSE frame format have multiple interacting constraints. Phase planning should re-verify the current state of openai-dotnet issue #65 and confirm whether .NET 9.x patch releases have changed the default behavior. Token throttling strategy (timer vs counter) also warrants a concrete recommendation.
Phase 5 (Markdown.ColorCode + JS Interop): The WASM compatibility note (base Markdown.ColorCode works; CSharpToColoredHtml does not) was sourced from community reports. Verify against the current NuGet package version before implementing.
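
As a starting point for the SSE frame-format question flagged for Phase 4, here is one possible framing, sketched under assumptions: each token is JSON-encoded into a single data: line so embedded newlines cannot split a frame, and a "[DONE]" sentinel marks the end of the stream. Both choices are hypothetical for this app's own endpoint and are not taken from the research.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

public static class SseFrames
{
    // One event per token: a "data:" line followed by a blank line.
    // JSON string encoding escapes newlines, keeping the payload on one line.
    public static string Format(string token) =>
        $"data: {JsonSerializer.Serialize(token)}\n\n";

    // Reads frames back into tokens; stops at the assumed "[DONE]" sentinel.
    public static IEnumerable<string> Parse(TextReader reader)
    {
        string? line;
        while ((line = reader.ReadLine()) is not null)
        {
            // Blank separators and non-data fields are ignored.
            if (!line.StartsWith("data:", StringComparison.Ordinal)) continue;

            var payload = line["data:".Length..].TrimStart(' ');
            if (payload == "[DONE]") yield break;

            yield return JsonSerializer.Deserialize<string>(payload) ?? string.Empty;
        }
    }
}
```

The server endpoint would write Format(token) for each token and flush after every frame; the client would feed its response stream into Parse or an async equivalent.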

Phases with standard patterns (skip research-phase):

Phase 1 (Architecture Foundation): The two-project solution structure and CORS setup are fully documented in official Microsoft docs. No novel patterns.
Phase 2 (Conversation Storage): Repository pattern with JSON file I/O is a standard .NET pattern. SemaphoreSlim for single-writer serialization is well-documented.
Phase 3 (Basic AI Chat): OpenAI SDK usage for non-streaming chat completions is documented in the official SDK repo with examples. Markdig integration in Blazor has multiple tutorial references.

Confidence Assessment

| Area | Confidence | Notes |
| --- | --- | --- |
| Stack | HIGH | All package versions verified on nuget.org; official SDK confirmed by OpenAI .NET Blog post; version compatibility table verified against published TFM support |
| Features | HIGH | Feature set cross-referenced against live ChatGPT and Claude UX; OpenAI streaming API docs consulted; Blazor-specific constraints verified |
| Architecture | HIGH | Microsoft official Blazor docs + verified community implementations (PalmHill.BlazorChat reference); all patterns confirmed with working code samples |
| Pitfalls | HIGH | Critical pitfalls sourced from official GitHub issue tracker (`openai-dotnet` #65, `aspnetcore` #43098), Microsoft Q&A, and documented production experience |

Overall confidence: HIGH

Gaps to Address

Streaming transport behavior in .NET 9 patch releases: The SetBrowserResponseStreamingEnabled(true) workaround is confirmed required in .NET 9 and becomes default in .NET 10. There is a possibility a .NET 9.x patch release may have changed this behavior. Verify at the start of Phase 4 by checking the official .NET 9 breaking change notes.
StateHasChanged throttling threshold: Research recommends ~50ms or every N tokens, but the optimal value depends on GPT-4o's actual token delivery rate and the target device's rendering performance. Treat as a tunable constant in code rather than a magic number.
XSS risk of rendering GPT output as MarkupString: This is a known accepted risk for a single-user personal tool. Document the decision explicitly in the code (tutorial purpose) rather than leaving it as a silent assumption. Consider adding Markdig's DisableHtml() pipeline option as a low-friction mitigation.
CORS configuration for deployment: Research covered localhost development CORS. If the app is ever deployed (even to a home server), the CORS origin list needs updating. Document this as a deployment note in Phase 1.
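
The DisableHtml() mitigation mentioned above could be wired in as follows. This is a sketch assuming a static MarkdownRenderer helper, a name invented for this example. It is a mitigation rather than a full sanitizer, so the accepted-risk note in the code comments still applies.

```csharp
using Markdig;

public static class MarkdownRenderer
{
    // Built once and reused; DisableHtml() stops raw HTML in the input
    // from being passed through into the rendered output.
    private static readonly MarkdownPipeline Pipeline =
        new MarkdownPipelineBuilder()
            .DisableHtml()
            .Build();

    // Accepted risk (single-user tool): the result is rendered via MarkupString.
    public static string ToHtml(string markdown) =>
        Markdown.ToHtml(markdown ?? string.Empty, Pipeline);
}
```

In a component the result would be rendered as @((MarkupString)MarkdownRenderer.ToHtml(message.Content)), where message.Content stands in for whatever property holds the reply text.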

Sources

Research completed: 2026-03-27

Ready for roadmap: yes
