feat: enable rich text display for assistant messages

Add markdown-to-HTML rendering for assistant messages using Markdig with
HTML sanitization. Includes cached rendering to avoid lag during streaming,
styled markdown elements (code blocks, tables, lists, blockquotes) within
chat bubbles, and 18 unit tests covering rendering and XSS prevention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 01:29:58 +01:00
parent d3300c7db9
commit 7a5c22593a
14 changed files with 735 additions and 5 deletions

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-04-05

@@ -0,0 +1,57 @@
## Context
Assistant messages from the LLM contain markdown formatting (bold, code blocks, lists, headings) but are currently rendered as plain text via `<MudText>@message.Content</MudText>`. Markdig 1.1.1 is already in the stack spec but not yet wired up. The Blazor WASM client needs a rendering pipeline that converts markdown to HTML and displays it safely inside chat bubbles.
## Goals / Non-Goals
**Goals:**
- Render assistant message markdown as formatted HTML (headings, bold, italic, code, lists, tables, links)
- Sanitize HTML output to prevent XSS from LLM-generated content
- Style rendered elements to look good inside MudBlazor chat bubbles
- Maintain streaming performance — re-render as tokens arrive without flicker
**Non-Goals:**
- Rendering user messages as markdown (they stay plain text)
- Syntax highlighting for code blocks (future enhancement)
- LaTeX/math rendering
- Image rendering from markdown
## Decisions
### D1: Use Markdig for markdown-to-HTML conversion
Markdig is already in the stack spec. It's the standard .NET markdown library, runs in WASM, and ships with GFM (GitHub Flavored Markdown) extensions such as pipe tables and task lists, which are enabled on the pipeline via `UseAdvancedExtensions()`.
**Alternative**: Custom regex-based parsing — rejected, fragile and incomplete.
### D2: Render via `MarkupString` in Blazor
Blazor's `MarkupString` struct renders raw HTML in the component tree. We wrap the Markdig HTML output in `MarkupString` to display it. This replaces `@message.Content` with `@((MarkupString)renderedHtml)` for assistant messages only.
**Alternative**: Use a third-party Blazor markdown component — rejected, adds a dependency when Markdig + MarkupString achieves the same result with less coupling.
### D3: HTML sanitization via allowlist approach
LLM output is untrusted. After Markdig converts markdown to HTML, we strip any tags/attributes not on an allowlist. Allowed: `p, h1-h6, strong, em, code, pre, ul, ol, li, a (href only), table, thead, tbody, tr, th, td, br, blockquote`. This prevents script injection without a heavy sanitizer dependency.
**Alternative**: Use a NuGet sanitizer package (e.g., HtmlSanitizer) — viable but adds a dependency for a simple allowlist. If the allowlist proves insufficient, revisit.
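One gap that a tag/attribute allowlist leaves open is the value of `href` itself: `href="javascript:..."` passes a check that only looks at attribute names. A minimal sketch of a scheme allowlist follows. `HrefPolicy` is a hypothetical helper, not part of this change, and it assumes only `http`, `https`, and `mailto` links are wanted.

```csharp
using System;

// Hypothetical helper (not part of this change): decides whether an href value
// may be kept on an <a> tag. Assumes only http, https and mailto are wanted.
public static class HrefPolicy
{
    private static readonly string[] AllowedSchemes = { "http", "https", "mailto" };

    public static bool IsAllowed(string href)
    {
        if (string.IsNullOrWhiteSpace(href))
            return false;

        // Relative URLs and unparseable values are rejected outright; only
        // absolute URLs with an explicitly allowed scheme pass.
        if (!Uri.TryCreate(href.Trim(), UriKind.Absolute, out var uri))
            return false;

        // Uri.Scheme is always lower case, so an ordinal lookup is enough.
        return Array.IndexOf(AllowedSchemes, uri.Scheme) >= 0;
    }
}
```

A sanitizer could call `HrefPolicy.IsAllowed(value)` before keeping an `href` attribute and drop the attribute otherwise. Rejecting relative URLs is a deliberate simplification here; LLM output rarely contains them.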
### D4: Create a MarkdownService for the conversion pipeline
A `MarkdownService` class encapsulates the Markdig pipeline configuration, HTML sanitization, and caching. Registered as singleton in DI. This keeps the rendering logic out of the Razor component and makes it testable.
### D5: Incremental rendering during streaming
During streaming, the assistant message content grows token by token. Each time `StateHasChanged()` is called, the full content string is re-converted through Markdig. This is acceptable because:
- Markdig is fast (sub-millisecond for typical message sizes)
- Messages rarely exceed a few KB
- No manual DOM handling is needed: Blazor's renderer detects the changed markup and updates the bubble
If performance becomes an issue, we can cache the last-known-good HTML prefix and only re-render the delta.
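A simpler first step than prefix caching is to skip conversion when the content has not changed since the last render, since `StateHasChanged()` can fire without new tokens having arrived. A minimal sketch, assuming a hypothetical `LastRenderCache` wrapper; the `convert` delegate stands in for whatever markdown-to-HTML call is in use:

```csharp
using System;

// Hypothetical helper: remembers the most recent input and its rendered output,
// so repeated renders of unchanged content do not re-run the conversion.
public sealed class LastRenderCache
{
    private readonly Func<string, string> _convert;
    private bool _hasValue;
    private string _lastInput = string.Empty;
    private string _lastOutput = string.Empty;

    public LastRenderCache(Func<string, string> convert) => _convert = convert;

    public string Render(string markdown)
    {
        // Ordinal comparison: re-convert only if the text actually changed.
        if (!_hasValue || !string.Equals(markdown, _lastInput, StringComparison.Ordinal))
        {
            _lastOutput = _convert(markdown);
            _lastInput = markdown;
            _hasValue = true;
        }
        return _lastOutput;
    }
}
```

One instance per streaming message would be enough; completed messages can keep their final HTML and never call the converter again.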
## Risks / Trade-offs
- **[XSS from LLM output]** → Mitigated by HTML sanitization allowlist. The allowlist is conservative — only structural tags, no script/style/event handlers.
- **[Streaming flicker]** → Low risk. Blazor's diffing minimizes DOM updates. If observed, add debouncing to StateHasChanged calls.
- **[Markdig WASM size]** → Markdig adds ~200KB to the WASM bundle. Already accepted in stack spec.
- **[Incomplete markdown support]** → Markdig covers GFM-style syntax (pipe tables, task lists, auto-links) once `UseAdvancedExtensions()` is enabled. Edge cases (nested tables, complex HTML blocks) may render imperfectly but are rare in LLM output.
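
The debouncing mitigation listed for streaming flicker could be as small as a time-based gate in front of `StateHasChanged()`. Strictly speaking this is throttling (at most one render per interval) rather than debouncing. The sketch makes two assumptions: `RenderThrottle` is a hypothetical name, and the caller still forces one final render when the stream ends so the last tokens are not dropped.

```csharp
using System;

// Hypothetical gate: lets a render through at most once per interval.
public sealed class RenderThrottle
{
    private readonly TimeSpan _minInterval;
    private TimeSpan? _lastRender;

    public RenderThrottle(TimeSpan minInterval) => _minInterval = minInterval;

    // 'now' is a monotonic reading (for example Stopwatch.Elapsed), passed in
    // so the gate can be unit tested without real delays.
    public bool ShouldRender(TimeSpan now)
    {
        if (_lastRender is TimeSpan last && now - last < _minInterval)
            return false;

        _lastRender = now;
        return true;
    }
}
```

In the streaming loop this would wrap the existing call, for example `if (throttle.ShouldRender(stopwatch.Elapsed)) StateHasChanged();`, with the interval chosen by measurement rather than guessed.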

@@ -0,0 +1,24 @@
## Why
Assistant messages currently render as plain text — markdown formatting (bold, code blocks, lists, headings) from the LLM appears as raw characters. With Semantic Kernel and tool calling now in place, responses are increasingly structured and harder to read without proper rendering. Markdig 1.1.1 is already in the stack but not wired up.
## What Changes
- Render assistant message content as HTML by converting markdown via Markdig
- Sanitize rendered HTML to prevent XSS (the LLM output is untrusted content)
- Style rendered markdown elements (code blocks, lists, tables) to fit the chat bubble aesthetic
- Keep user messages as plain text (they are short inputs, not markdown)
## Capabilities
### New Capabilities
- `rich-text-display`: Markdown-to-HTML rendering pipeline for assistant messages, including sanitization and styling
### Modified Capabilities
- `chat-ui`: Assistant message display changes from plain text to rendered markdown
## Impact
- ChatAgent.Client: Chat.razor message rendering, new markdown service, CSS additions
- Dependencies: Markdig already in stack spec; may need an HTML sanitizer package
- No backend changes — this is purely client-side rendering

@@ -0,0 +1,20 @@
## MODIFIED Requirements
### Requirement: Message display
The chat page SHALL display messages in a vertically scrolling list, with each message showing the sender role (user or assistant), the message content, and a visual distinction between user and assistant messages (e.g., alignment, color, or avatar). Assistant messages SHALL render content as formatted HTML converted from markdown; user messages SHALL display as plain text.
#### Scenario: User message displayed
- **WHEN** the user sends a message
- **THEN** the message appears in the message list aligned or styled to indicate it is from the user, rendered as plain text
#### Scenario: Assistant message displayed
- **WHEN** the assistant responds
- **THEN** the response appears in the message list with distinct styling from user messages, with markdown content rendered as formatted HTML
#### Scenario: Message ordering
- **WHEN** multiple messages exist in the conversation
- **THEN** messages are displayed in chronological order, oldest at top

@@ -0,0 +1,91 @@
## ADDED Requirements
### Requirement: Markdown rendering for assistant messages
The system SHALL convert assistant message content from markdown to formatted HTML using Markdig, displaying headings, bold, italic, code blocks, lists, tables, links, and blockquotes with proper visual formatting.
#### Scenario: Markdown bold and italic rendered
- **WHEN** an assistant message contains `**bold**` or `*italic*` text
- **THEN** the text is displayed with bold or italic formatting respectively
#### Scenario: Code block rendered
- **WHEN** an assistant message contains a fenced code block (triple backticks)
- **THEN** the code is displayed in a monospace font within a visually distinct block
#### Scenario: Inline code rendered
- **WHEN** an assistant message contains inline code (single backticks)
- **THEN** the code is displayed in a monospace font with a subtle background
#### Scenario: List rendered
- **WHEN** an assistant message contains a markdown list (ordered or unordered)
- **THEN** the list is displayed with proper indentation and bullet/number markers
#### Scenario: Heading rendered
- **WHEN** an assistant message contains markdown headings (# through ######)
- **THEN** the headings are displayed with appropriate size and weight
#### Scenario: Link rendered
- **WHEN** an assistant message contains a markdown link `[text](url)`
- **THEN** the link is displayed as a clickable hyperlink opening in a new tab
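
The design's allowlist keeps only `href` on anchors, so a `target` attribute has to be added after sanitization for links to open in a new tab. A minimal sketch, assuming a hypothetical `LinkTargets` helper that runs on already-sanitized HTML in which every anchor has the form `<a href="...">`:

```csharp
using System.Text.RegularExpressions;

// Hypothetical post-processing step (not part of this change). It must only be
// applied to sanitized HTML, because it does not validate anything itself.
public static class LinkTargets
{
    // "<a" followed by whitespace, so tags such as <abbr> are left alone.
    private static readonly Regex AnchorOpen = new(@"<a\s", RegexOptions.IgnoreCase);

    public static string OpenInNewTab(string sanitizedHtml) =>
        AnchorOpen.Replace(
            sanitizedHtml,
            "<a target=\"_blank\" rel=\"noopener noreferrer\" ");
}
```

`rel="noopener noreferrer"` is included so the opened page cannot reach back into the chat window through `window.opener`.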
#### Scenario: Table rendered
- **WHEN** an assistant message contains a markdown table
- **THEN** the table is displayed with borders, header row styling, and proper alignment
### Requirement: HTML sanitization
The system SHALL sanitize all HTML output from the markdown renderer to prevent cross-site scripting (XSS) attacks from LLM-generated content.
#### Scenario: Script tags stripped
- **WHEN** assistant message content contains `<script>` tags
- **THEN** the script tags and their content are removed from the rendered output
#### Scenario: Event handlers stripped
- **WHEN** assistant message content contains HTML attributes like `onclick` or `onerror`
- **THEN** the attributes are removed from the rendered output
#### Scenario: Safe tags preserved
- **WHEN** assistant message content contains allowed structural HTML (p, strong, em, code, pre, ul, ol, li, a, table elements, blockquote, br, h1-h6)
- **THEN** the tags are preserved in the rendered output
### Requirement: Markdown styling within chat bubbles
The system SHALL style rendered markdown elements to be visually consistent with the MudBlazor chat bubble theme.
#### Scenario: Code block styled in bubble
- **WHEN** a code block is rendered inside an assistant chat bubble
- **THEN** it has a distinct background color, padding, border-radius, and does not overflow the bubble width (horizontal scroll if needed)
#### Scenario: Paragraph spacing in bubble
- **WHEN** multiple paragraphs are rendered inside an assistant chat bubble
- **THEN** paragraphs have appropriate spacing without excessive gaps
### Requirement: Streaming compatibility
The system SHALL re-render markdown as new tokens arrive during streaming without visual glitches.
#### Scenario: Partial markdown rendered during streaming
- **WHEN** tokens are arriving and the current content contains incomplete markdown (e.g., a code block not yet closed)
- **THEN** the content is rendered with best-effort formatting and updates smoothly as more tokens complete the markdown structure
### Requirement: User messages remain plain text
The system SHALL continue to render user messages as plain text without markdown processing.
#### Scenario: User message not processed as markdown
- **WHEN** a user message contains markdown syntax like `**bold**`
- **THEN** the raw text `**bold**` is displayed, not formatted bold text

@@ -0,0 +1,26 @@
## 1. Markdown Service
- [x] 1.1 Add Markdig NuGet package to ChatAgent.Client project
- [x] 1.2 Create MarkdownService class with a ConfigureMarkdigPipeline method using AdvancedExtensions (GFM tables, task lists, pipe tables)
- [x] 1.3 Add ConvertToHtml method that takes a markdown string and returns sanitized HTML
- [x] 1.4 Implement HTML sanitization via tag/attribute allowlist (p, h1-h6, strong, em, code, pre, ul, ol, li, a[href], table, thead, tbody, tr, th, td, br, blockquote)
- [x] 1.5 Register MarkdownService as singleton in Program.cs
## 2. Chat Component Integration
- [x] 2.1 Inject MarkdownService into Chat.razor
- [x] 2.2 For assistant messages, replace `@message.Content` with `@((MarkupString)MarkdownService.ConvertToHtml(message.Content))`
- [x] 2.3 Keep user messages rendering as plain text via `@message.Content`
- [x] 2.4 Verify streaming still works — markdown re-renders as tokens arrive
## 3. CSS Styling
- [x] 3.1 Add CSS for rendered markdown inside assistant bubbles: code blocks (background, padding, border-radius, overflow-x auto), inline code (background, padding), paragraphs (margin control), lists (indentation), tables (borders, header styling), blockquotes (left border, padding), links (color, underline)
- [x] 3.2 Ensure code blocks do not overflow bubble width — add horizontal scroll
- [x] 3.3 Ensure last paragraph/element in bubble has no bottom margin (tight fit)
## 4. Testing
- [x] 4.1 Add unit tests for MarkdownService: bold/italic, code blocks, lists, headings, tables, links
- [x] 4.2 Add sanitization tests: script tags stripped, event handlers stripped, safe tags preserved
- [x] 4.3 Manual test: send a message asking LLM to respond with markdown formatting and verify rendering

@@ -6,17 +6,17 @@ Define the chat interface — message display, input handling, auto-scroll, and
### Requirement: Message display
The chat page SHALL display messages in a vertically scrolling list, with each message showing the sender role (user or assistant), the message content, and a visual distinction between user and assistant messages (e.g., alignment, color, or avatar).
The chat page SHALL display messages in a vertically scrolling list, with each message showing the sender role (user or assistant), the message content, and a visual distinction between user and assistant messages (e.g., alignment, color, or avatar). Assistant messages SHALL render content as formatted HTML converted from markdown; user messages SHALL display as plain text.
#### Scenario: User message displayed
- **WHEN** the user sends a message
- **THEN** the message appears in the message list aligned or styled to indicate it is from the user
- **THEN** the message appears in the message list aligned or styled to indicate it is from the user, rendered as plain text
#### Scenario: Assistant message displayed
- **WHEN** the assistant responds
- **THEN** the response appears in the message list with distinct styling from user messages (different alignment, color, or avatar)
- **THEN** the response appears in the message list with distinct styling from user messages, with markdown content rendered as formatted HTML
#### Scenario: Message ordering

@@ -0,0 +1,95 @@
## Purpose
Define rich text rendering for assistant messages — markdown-to-HTML conversion, sanitization, styling, and streaming compatibility.
## Requirements
### Requirement: Markdown rendering for assistant messages
The system SHALL convert assistant message content from markdown to formatted HTML using Markdig, displaying headings, bold, italic, code blocks, lists, tables, links, and blockquotes with proper visual formatting.
#### Scenario: Markdown bold and italic rendered
- **WHEN** an assistant message contains `**bold**` or `*italic*` text
- **THEN** the text is displayed with bold or italic formatting respectively
#### Scenario: Code block rendered
- **WHEN** an assistant message contains a fenced code block (triple backticks)
- **THEN** the code is displayed in a monospace font within a visually distinct block
#### Scenario: Inline code rendered
- **WHEN** an assistant message contains inline code (single backticks)
- **THEN** the code is displayed in a monospace font with a subtle background
#### Scenario: List rendered
- **WHEN** an assistant message contains a markdown list (ordered or unordered)
- **THEN** the list is displayed with proper indentation and bullet/number markers
#### Scenario: Heading rendered
- **WHEN** an assistant message contains markdown headings (# through ######)
- **THEN** the headings are displayed with appropriate size and weight
#### Scenario: Link rendered
- **WHEN** an assistant message contains a markdown link `[text](url)`
- **THEN** the link is displayed as a clickable hyperlink opening in a new tab
#### Scenario: Table rendered
- **WHEN** an assistant message contains a markdown table
- **THEN** the table is displayed with borders, header row styling, and proper alignment
### Requirement: HTML sanitization
The system SHALL sanitize all HTML output from the markdown renderer to prevent cross-site scripting (XSS) attacks from LLM-generated content.
#### Scenario: Script tags stripped
- **WHEN** assistant message content contains `<script>` tags
- **THEN** the script tags and their content are removed from the rendered output
#### Scenario: Event handlers stripped
- **WHEN** assistant message content contains HTML attributes like `onclick` or `onerror`
- **THEN** the attributes are removed from the rendered output
#### Scenario: Safe tags preserved
- **WHEN** assistant message content contains allowed structural HTML (p, strong, em, code, pre, ul, ol, li, a, table elements, blockquote, br, h1-h6)
- **THEN** the tags are preserved in the rendered output
### Requirement: Markdown styling within chat bubbles
The system SHALL style rendered markdown elements to be visually consistent with the MudBlazor chat bubble theme.
#### Scenario: Code block styled in bubble
- **WHEN** a code block is rendered inside an assistant chat bubble
- **THEN** it has a distinct background color, padding, border-radius, and does not overflow the bubble width (horizontal scroll if needed)
#### Scenario: Paragraph spacing in bubble
- **WHEN** multiple paragraphs are rendered inside an assistant chat bubble
- **THEN** paragraphs have appropriate spacing without excessive gaps
### Requirement: Streaming compatibility
The system SHALL re-render markdown as new tokens arrive during streaming without visual glitches.
#### Scenario: Partial markdown rendered during streaming
- **WHEN** tokens are arriving and the current content contains incomplete markdown (e.g., a code block not yet closed)
- **THEN** the content is rendered with best-effort formatting and updates smoothly as more tokens complete the markdown structure
### Requirement: User messages remain plain text
The system SHALL continue to render user messages as plain text without markdown processing.
#### Scenario: User message not processed as markdown
- **WHEN** a user message contains markdown syntax like `**bold**`
- **THEN** the raw text `**bold**` is displayed, not formatted bold text

@@ -10,11 +10,12 @@
<PackageReference Include="Microsoft.AspNetCore.Components.WebAssembly" Version="9.0.14" />
<PackageReference Include="Microsoft.AspNetCore.Components.WebAssembly.DevServer" Version="9.0.14" PrivateAssets="all" />
<PackageReference Include="Microsoft.Extensions.Http" Version="9.0.4" />
<PackageReference Include="Markdig" Version="1.1.1" />
<PackageReference Include="MudBlazor" Version="9.2.0" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\ChatAgent.Shared\ChatAgent.Shared.csproj" />
</ItemGroup>
</Project>

@@ -26,6 +26,11 @@
It was registered in Program.cs via AddHttpClient<ChatApiClient>. *@
@inject ChatApiClient ApiClient
@* MarkdownService converts markdown to sanitized HTML for rendering assistant messages.
We use it to transform the raw LLM output into formatted HTML before display. *@
@using ChatAgent.Client.Services
@inject MarkdownService Markdown
<PageTitle>Chat Agent</PageTitle>
@* Chat container: uses flexbox to fill available height.
@@ -64,8 +69,20 @@
once the first text delta arrives. *@
<MudProgressCircular Size="Size.Small" Indeterminate="true" />
}
else if (message.Role == "assistant")
{
@* Assistant messages are rendered as HTML converted from markdown.
MarkupString tells Blazor to treat the string as raw HTML rather
than escaping it. The MarkdownService sanitizes the HTML to prevent XSS.
We use GetRenderedHtml() to cache completed messages and avoid
re-running Markdig on every StateHasChanged during streaming. *@
<div class="markdown-body">
@((MarkupString)GetRenderedHtml(message))
</div>
}
else
{
@* User messages stay as plain text — no markdown processing. *@
<MudText Typo="Typo.body1">@message.Content</MudText>
}
</MudPaper>
@@ -119,12 +136,34 @@
// Used to disable input and show the thinking indicator.
private bool _isStreaming = false;
// Cache of rendered HTML for completed assistant messages. Without this cache,
// every StateHasChanged() during streaming would re-run Markdig on ALL messages,
// causing noticeable lag as the conversation grows. Only the actively streaming
// message is re-rendered; completed messages use their cached HTML.
private readonly Dictionary<ChatMessage, string> _renderedHtmlCache = new();
/// <summary>
/// Returns cached rendered HTML for completed messages, or renders fresh
/// for the actively streaming message. This avoids re-running Markdig on
/// every message during each StateHasChanged call.
/// </summary>
private string GetRenderedHtml(ChatMessage message)
{
if (_renderedHtmlCache.TryGetValue(message, out var cached))
return cached;
// Not cached — this is either the streaming message or first render.
// Render it fresh. It will be cached once streaming completes.
return Markdown.ConvertToHtml(message.Content);
}
/// <summary>
/// Clears all messages to start a new conversation.
/// </summary>
private void NewChat()
{
_messages.Clear();
_renderedHtmlCache.Clear();
_userInput = string.Empty;
}
@@ -213,6 +252,10 @@
{
_isStreaming = false;
assistantMessage.Timestamp = DateTime.UtcNow;
// Cache the final rendered HTML so future re-renders don't re-run Markdig
_renderedHtmlCache[assistantMessage] = Markdown.ConvertToHtml(assistantMessage.Content);
StateHasChanged();
await ScrollToBottom();
}

@@ -78,6 +78,99 @@
background-color: var(--mud-palette-background);
}
/* --- Markdown rendering styles for assistant messages ---
* These target HTML elements produced by Markdig inside the .markdown-body wrapper.
* ::deep is required because HTML injected via MarkupString does not receive this
* component's CSS isolation scope attribute, so scoped selectors without ::deep
* would not match the rendered elements. */
/* Paragraphs: tighten spacing inside bubbles */
::deep .markdown-body p {
margin: 0 0 0.5rem 0;
}
/* Remove bottom margin on the last element to keep bubble tight */
::deep .markdown-body > *:last-child {
margin-bottom: 0;
}
/* Code blocks: distinct background with horizontal scroll for wide content */
::deep .markdown-body pre {
background-color: var(--mud-palette-background-gray);
padding: 0.75rem;
border-radius: 0.375rem;
overflow-x: auto;
margin: 0.5rem 0;
}
::deep .markdown-body pre code {
background: none;
padding: 0;
font-size: 0.85rem;
font-family: 'Cascadia Code', 'Fira Code', 'Consolas', monospace;
}
/* Inline code: subtle background to distinguish from surrounding text */
::deep .markdown-body code {
background-color: var(--mud-palette-background-gray);
padding: 0.125rem 0.375rem;
border-radius: 0.25rem;
font-size: 0.875em;
font-family: 'Cascadia Code', 'Fira Code', 'Consolas', monospace;
}
/* Lists: proper indentation inside bubbles */
::deep .markdown-body ul,
::deep .markdown-body ol {
margin: 0.5rem 0;
padding-left: 1.5rem;
}
::deep .markdown-body li {
margin-bottom: 0.25rem;
}
/* Tables: borders and header styling */
::deep .markdown-body table {
border-collapse: collapse;
margin: 0.5rem 0;
width: 100%;
font-size: 0.875rem;
}
::deep .markdown-body th,
::deep .markdown-body td {
border: 1px solid var(--mud-palette-lines-default);
padding: 0.375rem 0.625rem;
text-align: left;
}
::deep .markdown-body th {
background-color: var(--mud-palette-background-gray);
font-weight: 600;
}
/* Blockquotes: left border accent */
::deep .markdown-body blockquote {
border-left: 3px solid var(--mud-palette-primary);
margin: 0.5rem 0;
padding: 0.25rem 0 0.25rem 0.75rem;
color: var(--mud-palette-text-secondary);
}
/* Links: themed color */
::deep .markdown-body a {
color: var(--mud-palette-primary);
text-decoration: underline;
}
/* Headings: scale down for bubble context */
::deep .markdown-body h1 { font-size: 1.25rem; font-weight: 600; margin: 0.5rem 0 0.25rem 0; }
::deep .markdown-body h2 { font-size: 1.125rem; font-weight: 600; margin: 0.5rem 0 0.25rem 0; }
::deep .markdown-body h3 { font-size: 1rem; font-weight: 600; margin: 0.5rem 0 0.25rem 0; }
::deep .markdown-body h4,
::deep .markdown-body h5,
::deep .markdown-body h6 { font-size: 0.875rem; font-weight: 600; margin: 0.5rem 0 0.25rem 0; }
/* Action row above the text input (New Chat button) */
.input-actions {
display: flex;

@@ -43,6 +43,10 @@ var apiBaseUrl = isHttps
// into the DI container. This is required before any MudBlazor component will work.
builder.Services.AddMudServices();
// MarkdownService converts markdown to sanitized HTML for rendering assistant messages.
// Registered as singleton because the Markdig pipeline is immutable and thread-safe.
builder.Services.AddSingleton<ChatAgent.Client.Services.MarkdownService>();
// AddHttpClient<ChatApiClient> registers a typed HttpClient using IHttpClientFactory.
// IHttpClientFactory manages the underlying HttpMessageHandler lifetime to prevent
// socket exhaustion (a common problem with raw HttpClient in long-running apps).

@@ -0,0 +1,134 @@
// MarkdownService.cs -- Converts markdown text to sanitized HTML for display.
//
// This service wraps Markdig (a .NET markdown processor) and adds HTML sanitization
// to safely render LLM-generated content in the browser. LLM output is untrusted —
// it could contain <script> tags or event handlers that would execute in the browser
// if rendered as raw HTML. The sanitization step strips everything not on the allowlist.
//
// Key concepts demonstrated:
// - Markdig pipeline configuration with extension methods
// - HTML sanitization via tag/attribute allowlist (defense-in-depth for XSS)
// - MarkupString in Blazor for rendering pre-sanitized HTML without escaping
// - Singleton service registration (the pipeline is thread-safe and reusable)
using System.Text.RegularExpressions;
using Markdig;
namespace ChatAgent.Client.Services
{
/// <summary>
/// Converts markdown to sanitized HTML. Registered as a singleton because
/// the Markdig pipeline is immutable and thread-safe after construction.
/// </summary>
public class MarkdownService
{
// The Markdig pipeline is configured once and reused for all conversions.
// UseAdvancedExtensions() enables GFM tables, pipe tables, task lists,
// auto-links, and other commonly used markdown extensions.
private readonly MarkdownPipeline _pipeline;
// Tags allowed in the sanitized output. Anything not on this list is stripped.
// This is the core XSS defense — even if Markdig produces unexpected HTML,
// only these structural tags survive.
private static readonly HashSet<string> AllowedTags = new(StringComparer.OrdinalIgnoreCase)
{
"p", "h1", "h2", "h3", "h4", "h5", "h6",
"strong", "em", "code", "pre",
"ul", "ol", "li",
"a",
"table", "thead", "tbody", "tr", "th", "td",
"br", "blockquote"
};
// Attributes allowed on specific tags. Only href on <a> tags is permitted.
// All other attributes (including event handlers like onclick) are stripped.
private static readonly Dictionary<string, HashSet<string>> AllowedAttributes =
new(StringComparer.OrdinalIgnoreCase)
{
{ "a", new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "href" } }
};
// Regex to match script and style blocks including their content.
// These are stripped entirely — both tags and inner content — because
// they can execute code in the browser.
private static readonly Regex ScriptStyleRegex = new(
@"<(script|style)\b[^>]*>[\s\S]*?</\1>",
RegexOptions.Compiled | RegexOptions.IgnoreCase);
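// Regex to match HTML tags (opening, closing, and self-closing).
// Used by the sanitizer to find and filter tags in the Markdig output.
// The tag name runs up to the first whitespace, '/' or '>', mirroring how
// browsers tokenize tags, so names such as "x-y" or "svg:a" are matched (and
// then rejected by the allowlist) instead of slipping past a \w+ pattern.
private static readonly Regex TagRegex = new(
@"<(/?)([a-zA-Z][^\s/>]*)([^>]*)>",
RegexOptions.Compiled | RegexOptions.Singleline);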
// Regex to match individual HTML attributes within a tag.
// Captures the attribute name so we can check it against the allowlist.
private static readonly Regex AttrRegex = new(
@"(\w+)\s*=\s*(?:""[^""]*""|'[^']*'|\S+)",
RegexOptions.Compiled);
public MarkdownService()
{
_pipeline = new MarkdownPipelineBuilder()
.UseAdvancedExtensions()
.Build();
}
/// <summary>
/// Converts a markdown string to sanitized HTML.
/// Safe to use with MarkupString in Blazor for rendering.
/// </summary>
public string ConvertToHtml(string markdown)
{
if (string.IsNullOrEmpty(markdown))
return string.Empty;
var rawHtml = Markdown.ToHtml(markdown, _pipeline);
return SanitizeHtml(rawHtml);
}
/// <summary>
/// Strips HTML tags and attributes not on the allowlist.
/// This is a conservative sanitizer — it only keeps structural tags
/// needed for markdown rendering and removes everything else.
/// </summary>
internal string SanitizeHtml(string html)
{
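// First pass: remove script and style blocks entirely (tags + content)
html = ScriptStyleRegex.Replace(html, string.Empty);
// Second pass: filter remaining tags against the allowlist
MatchEvaluator filterTag = match =>
{
var isClosing = match.Groups[1].Value == "/";
var tagName = match.Groups[2].Value;
var attrs = match.Groups[3].Value;
// Strip tags not on the allowlist entirely
if (!AllowedTags.Contains(tagName))
return string.Empty;
if (isClosing)
return $"</{tagName}>";
// Filter attributes: only keep those on the allowlist for this tag
var sanitizedAttrs = string.Empty;
if (!string.IsNullOrWhiteSpace(attrs) &&
AllowedAttributes.TryGetValue(tagName, out var allowed))
{
var attrMatches = AttrRegex.Matches(attrs);
foreach (Match attrMatch in attrMatches)
{
var attrName = attrMatch.Groups[1].Value;
if (!allowed.Contains(attrName))
continue;
// An allowed attribute name is not enough for href: a value such as
// "javascript:..." would run script on click. Keep the attribute only
// when its value literally starts with a known-safe scheme.
if (attrName.Equals("href", StringComparison.OrdinalIgnoreCase))
{
var attrValue = attrMatch.Value
.Substring(attrMatch.Value.IndexOf('=') + 1)
.Trim().Trim('"', '\'').TrimStart();
var isSafeUrl =
attrValue.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
attrValue.StartsWith("https://", StringComparison.OrdinalIgnoreCase) ||
attrValue.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase);
if (!isSafeUrl)
continue;
}
sanitizedAttrs += " " + attrMatch.Value;
}
}
return $"<{tagName}{sanitizedAttrs}>";
};
// Removing a disallowed tag can splice the text on either side of it into
// a new tag, so repeat the filter until the output stops changing.
string previous;
do
{
previous = html;
html = TagRegex.Replace(html, filterTag);
} while (html != previous);
return html;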
}
}
}

@@ -0,0 +1,140 @@
using ChatAgent.Client.Services;
namespace ChatAgent.Client.Tests;
/// <summary>
/// Tests for MarkdownService covering markdown rendering and HTML sanitization.
/// Each test verifies a specific markdown element or sanitization rule from the
/// rich-text-display spec.
/// </summary>
public class MarkdownServiceTests
{
private readonly MarkdownService _sut = new();
// --- Markdown rendering tests ---
[Fact]
public void ConvertToHtml_BoldText_RendersStrong()
{
var result = _sut.ConvertToHtml("**bold text**");
Assert.Contains("<strong>bold text</strong>", result);
}
[Fact]
public void ConvertToHtml_ItalicText_RendersEm()
{
var result = _sut.ConvertToHtml("*italic text*");
Assert.Contains("<em>italic text</em>", result);
}
[Fact]
public void ConvertToHtml_FencedCodeBlock_RendersPreCode()
{
var result = _sut.ConvertToHtml("```\nvar x = 1;\n```");
Assert.Contains("<pre>", result);
Assert.Contains("<code>", result);
Assert.Contains("var x = 1;", result);
}
[Fact]
public void ConvertToHtml_InlineCode_RendersCode()
{
var result = _sut.ConvertToHtml("Use `Console.WriteLine` here");
Assert.Contains("<code>Console.WriteLine</code>", result);
}
[Fact]
public void ConvertToHtml_UnorderedList_RendersUlLi()
{
var result = _sut.ConvertToHtml("- item one\n- item two");
Assert.Contains("<ul>", result);
Assert.Contains("<li>item one</li>", result);
Assert.Contains("<li>item two</li>", result);
}
[Fact]
public void ConvertToHtml_OrderedList_RendersOlLi()
{
var result = _sut.ConvertToHtml("1. first\n2. second");
Assert.Contains("<ol>", result);
Assert.Contains("<li>first</li>", result);
}
[Fact]
public void ConvertToHtml_Heading_RendersH2()
{
var result = _sut.ConvertToHtml("## My Heading");
Assert.Contains("<h2>My Heading</h2>", result);
}
[Fact]
public void ConvertToHtml_Table_RendersTableElements()
{
var markdown = "| Name | Value |\n|------|-------|\n| A | 1 |";
var result = _sut.ConvertToHtml(markdown);
Assert.Contains("<table>", result);
Assert.Contains("<th>", result);
Assert.Contains("<td>", result);
}
[Fact]
public void ConvertToHtml_Link_RendersAnchorWithHref()
{
var result = _sut.ConvertToHtml("[click here](https://example.com)");
Assert.Contains("<a", result);
Assert.Contains("href=\"https://example.com\"", result);
Assert.Contains("click here</a>", result);
}
[Fact]
public void ConvertToHtml_EmptyString_ReturnsEmpty()
{
Assert.Equal(string.Empty, _sut.ConvertToHtml(""));
Assert.Equal(string.Empty, _sut.ConvertToHtml(null!));
}
// --- Sanitization tests ---
[Fact]
public void ConvertToHtml_ScriptTag_IsStripped()
{
var result = _sut.ConvertToHtml("Hello <script>alert('xss')</script> world");
Assert.DoesNotContain("<script>", result);
Assert.DoesNotContain("alert", result);
}
[Fact]
public void ConvertToHtml_EventHandler_IsStripped()
{
var result = _sut.ConvertToHtml("<div onclick=\"alert('xss')\">test</div>");
Assert.DoesNotContain("onclick", result);
Assert.DoesNotContain("alert", result);
}
[Fact]
public void ConvertToHtml_ImgTag_IsStripped()
{
var result = _sut.ConvertToHtml("<img src=\"x\" onerror=\"alert('xss')\">");
Assert.DoesNotContain("<img", result);
Assert.DoesNotContain("onerror", result);
}
[Fact]
public void ConvertToHtml_SafeTagsPreserved()
{
var result = _sut.ConvertToHtml("**bold** and *italic* and `code`");
Assert.Contains("<strong>", result);
Assert.Contains("<em>", result);
Assert.Contains("<code>", result);
}
[Fact]
public void ConvertToHtml_AnchorOnlyKeepsHref()
{
// Markdig generates links from markdown — verify only href survives
var result = _sut.ConvertToHtml("[test](https://example.com)");
Assert.Contains("href=", result);
Assert.DoesNotContain("onclick", result);
Assert.DoesNotContain("style", result);
}
}