Add support for writing HTML literals using UTF8 strings#12848

Draft
DamianEdwards wants to merge 6 commits into main from damianedwards/utf8-html-literals-redux

DamianEdwards (Member) commented Mar 1, 2026

Summary

Implements #8429 by enabling UTF-8 HTML literal emission for legacy .cshtml code generation when the generated type's inherited base class has a callable:

void WriteLiteral(ReadOnlySpan<byte> utf8HtmlLiteral)

When that overload is available to the generated type, Razor emits HTML literals as C# UTF-8 literals ("..."u8), allowing direct binding to the byte-span overload. If the overload is not available, generation remains the existing string-literal path.

Implementation

  • Adds Utf8WriteLiteralDetectionPass for legacy documents to detect UTF-8 WriteLiteral capability from the inherited base type.
  • Adds compilation helper logic to find an accessible WriteLiteral(ReadOnlySpan<byte>) overload across the inheritance chain.
  • Ensures source-generator execution has the metadata references needed for the same capability detection behavior.
  • Preserves existing UTF-8 literal writing flow in code generation once the option is enabled.
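
The inheritance-chain lookup described above can be illustrated with a small runtime-reflection analogue. This is only a sketch: the PR performs the check at compile time against the compilation, not with reflection, and `Utf8WriteLiteralProbe`, `PlainBase`, `Utf8Base`, and `DerivedPage` are hypothetical names. "Accessible" is simplified here to public, protected, or protected internal.

```csharp
using System;
using System.Reflection;

// Hypothetical helper: a reflection analogue of the capability check,
// not the compiler-side implementation in this PR.
public static class Utf8WriteLiteralProbe
{
    public static bool HasUtf8WriteLiteral(Type pageType)
    {
        // Walk from the page type up through each base type.
        for (Type t = pageType; t != null; t = t.BaseType)
        {
            // Look only at methods declared on this level, matching the exact parameter list.
            MethodInfo m = t.GetMethod(
                "WriteLiteral",
                BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly,
                null,
                new[] { typeof(ReadOnlySpan<byte>) },
                null);

            // Simplified accessibility rule: callable from a derived type in another assembly.
            if (m != null
                && m.ReturnType == typeof(void)
                && (m.IsPublic || m.IsFamily || m.IsFamilyOrAssembly))
            {
                return true;
            }
        }

        return false;
    }
}

// Hypothetical types used to exercise the probe.
public class PlainBase
{
    public void WriteLiteral(string htmlLiteral) { }
}

public class Utf8Base : PlainBase
{
    protected void WriteLiteral(ReadOnlySpan<byte> utf8HtmlLiteral) { }
}

public class DerivedPage : Utf8Base { }
```

With these types, `Utf8WriteLiteralProbe.HasUtf8WriteLiteral(typeof(DerivedPage))` finds the overload declared on `Utf8Base`, while `typeof(PlainBase)` yields `false` because only the string overload exists.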

Behavior

Given:

@inherits MyUtf8PageBase
<h1>Hello World</h1>

and a base type containing:

public void WriteLiteral(ReadOnlySpan<byte> utf8HtmlLiteral)

generated HTML literal calls are emitted using UTF-8 string literals ("..."u8).
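
As a rough illustration of how the emitted call binds, here is a hand-written stand-in for the base type and the generated page. Only the `WriteLiteral(ReadOnlySpan<byte>)` signature comes from this PR; the `MemoryStream` buffer, the string overload, `Execute`, `RenderToString`, and `HelloPage` are assumptions made for the sketch, and the real generated class has a different shape.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical base type for the sketch.
public abstract class MyUtf8PageBase
{
    private readonly MemoryStream _output = new MemoryStream();

    // Byte-span overload: the literal is already UTF-8, so it is copied straight to the output.
    public void WriteLiteral(ReadOnlySpan<byte> utf8HtmlLiteral) => _output.Write(utf8HtmlLiteral);

    // String overload: the UTF-16 string has to be encoded to UTF-8 at runtime.
    public void WriteLiteral(string htmlLiteral) => _output.Write(Encoding.UTF8.GetBytes(htmlLiteral));

    public abstract void Execute();

    public string RenderToString()
    {
        Execute();
        return Encoding.UTF8.GetString(_output.ToArray());
    }
}

// Hand-written approximation of what the generated page body would contain.
public sealed class HelloPage : MyUtf8PageBase
{
    public override void Execute()
    {
        // A "..."u8 literal has type ReadOnlySpan<byte>, so this binds to the byte-span overload.
        WriteLiteral("<h1>Hello World</h1>"u8);
    }
}
```

Calling `new HelloPage().RenderToString()` returns the original markup, with the literal passing through the byte-span overload and no runtime encoding step on the write path.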

Tests

  • MVC integration coverage validates:
    • overload present => emits UTF-8 literals
    • overload absent => keeps string-literal emission
  • Source generator tests validate the same positive/negative behavior.
  • Supporting language/compiler tests are updated for this implementation.

Implements the @utf8HtmlLiterals directive (with boolean token) that when
enabled causes the Razor compiler to emit HTML literal blocks as C# UTF-8
string literals ("..."u8) instead of regular string literals.

This allows the page's base class to provide a WriteLiteral(ReadOnlySpan<byte>)
overload that writes pre-encoded UTF-8 bytes directly to the output, avoiding
runtime UTF-16 to UTF-8 encoding and associated memory allocations.

Key changes:
- Add WriteHtmlUtf8StringLiterals flag to RazorCodeGenerationOptions
- Add Utf8HtmlLiteralsDirective and Utf8HtmlLiteralsDirectivePass
- Register directive for Legacy (.cshtml) files, gated on Version_11_0
- Modify CodeWriterExtensions to append u8 suffix when flag is set
- Modify RuntimeNodeWriter to pass flag from options to code writer
- Use documentNode.Options in lowering phase (respects directive passes)
- Relax directive keyword validation to allow digits (not just letters)

Fixes #8429

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

davidwengier (Member) commented:

Since this is for .NET 11 anyway, seems like there's plenty of time to get the ROS overload into the runtime. Should also ideally detect whether such an overload exists or not, an error if not, then people can polyfill easily on older runtimes.

Also should probably have a LDM about this :)

DamianEdwards (Member, Author) commented:

> Since this is for .NET 11 anyway, seems like there's plenty of time to get the ROS overload into the runtime.

The intent wasn't to get an overload into the runtime, at least not at this time. While we certainly could do that, it would make it slower in that case, not faster, as it would then convert from UTF8 bytes to string to place in MVC's output buffering infrastructure, which then turns it back to UTF8 bytes again when writing to the response. Overhauling MVC's output writing to support UTF8 bytes is a much larger undertaking, but of course this change to Razor would be required first.

For now, the goal here is to enable other .cshtml-based scenarios (i.e. non-MVC) to leverage this support and get the performance benefits, e.g. Razor Slices.

> Should also ideally detect whether such an overload exists or not, an error if not, then people can polyfill easily on older runtimes.

We don't do this for other directives when custom base classes are being used AFAIK, e.g. if I use @inherits to set the base class to a type with no methods at all, the Razor compiler will simply emit code that doesn't compile due to missing members to call/overload, i.e. the *.cshtml contract as to what's assumed to exist on the base types is implicit.

> Also should probably have a LDM about this :)

Didn't realize we discussed Razor compiler stuff there now, cool. LMK what the process is.

davidwengier (Member) commented Mar 2, 2026

> For now, the goal here is to enable other .cshtml-based scenarios (i.e. non-MVC)

That at least answers my other (unasked) question about why this is .cshtml only.

> Didn't realize we discussed Razor compiler stuff there now, cool. LMK what the process is.

Oh, I don't mean the C# LDM. There has been one Razor LDM meeting so far, and I was asleep at the time, but the plan is for there to at least be some committee that can sign off on things, I believe.

> We don't do this for other directives when custom base classes are being used AFAIK

I know we don't, but IMO that is not a good thing, and something we should be better about in future. BUT this is also something we can discuss at LDM and see if anyone else agrees with me :)

Replace @utf8HtmlLiterals directive wiring with automatic detection based on whether the inherited base type exposes a callable WriteLiteral(ReadOnlySpan<byte>) overload. Update compiler/source-generator plumbing and tests accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DamianEdwards changed the title from "Add @utf8HtmlLiterals directive for opt-in UTF-8 HTML string literals" to "Add support for WriteLiteral accepting UTF8 string literals" on Mar 3, 2026
DamianEdwards changed the title from "Add support for WriteLiteral accepting UTF8 string literals" to "Add support for writing HTML literals using UTF8 strings" on Mar 3, 2026
DamianEdwards and others added 4 commits March 5, 2026 08:53
Resolve post-conflict source generator breakage by removing the unresolved suppression call, aligning option provider tuple shape, restoring cshtml test execution against output compilation, and updating incremental step expectations for compilation-dependent options.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DamianEdwards requested a review from a team on March 5, 2026 20:31