Add @utf8HtmlLiterals directive for opt-in UTF-8 HTML string literals #12848

Draft
DamianEdwards wants to merge 1 commit into main from
damianedwards/utf8-html-literals-redux

@DamianEdwards
Member

Summary

Implements the @utf8HtmlLiterals directive (#8429). When enabled, it causes the Razor compiler to emit HTML literal blocks as C# UTF-8 string literals ("..."u8) instead of regular string literals.

Motivation

HTML content in .cshtml files is emitted as WriteLiteral("html content") calls using regular C# string literals (UTF-16). At runtime, these strings must be encoded to UTF-8 on every request, causing measurable overhead in high-performance scenarios. C# UTF-8 string literals allow the compiler to pre-encode the bytes, eliminating runtime encoding and reducing memory allocations.
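To make the difference concrete, here is a minimal sketch (not taken from this PR) contrasting a regular literal that is encoded at run time with a UTF-8 literal that the C# compiler encodes at build time. The class and method names are hypothetical, and the snippet assumes C# 11 or later.

```csharp
using System;
using System.Text;

// Hypothetical demo type; not part of the Razor compiler or this PR.
public static class Utf8LiteralDemo
{
    // Regular literal: stored as UTF-16, so producing UTF-8 bytes
    // means encoding (and allocating a byte[]) on every call.
    public static byte[] EncodeAtRuntime() =>
        Encoding.UTF8.GetBytes("<h1>Hello World</h1>");

    // UTF-8 literal (C# 11+): the compiler writes the encoded bytes into
    // the assembly and the returned span refers to that static data.
    public static ReadOnlySpan<byte> PreEncoded() =>
        "<h1>Hello World</h1>"u8;
}
```

Both methods yield the same byte sequence; only the second avoids the per-call encoding step.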

Usage

@inherits MyUtf8PageBase
@utf8HtmlLiterals true

<html>
<body>
    <h1>Hello World</h1>
    <p>Current time: @DateTime.Now</p>
</body>
</html>

Generates:

WriteLiteral("<html>\r\n<body>\r\n    <h1>Hello World</h1>\r\n    <p>Current time: "u8);
Write(DateTime.Now);
WriteLiteral("</p>\r\n</body>\r\n</html>"u8);

The page base class must provide a WriteLiteral(ReadOnlySpan<byte>) overload:

public abstract class MyUtf8PageBase : PageBase
{
    public void WriteLiteral(ReadOnlySpan<byte> value)
    {
        WriteLiteral(Encoding.UTF8.GetString(value));
    }
}
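The overload above decodes the bytes back to a string, which keeps existing output paths working but gives up the encoding savings. As a rough sketch of what a byte-oriented base class might look like instead, assuming the page writes to an `IBufferWriter<byte>`: the names `Utf8OutputPageBase` and `DemoPage` are hypothetical, and this is not MVC's page base or code from this PR.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical stand-alone base class that writes straight to a byte buffer.
public abstract class Utf8OutputPageBase
{
    private readonly IBufferWriter<byte> _output;

    protected Utf8OutputPageBase(IBufferWriter<byte> output) => _output = output;

    // Target of "..."u8 literals: the bytes are already UTF-8, so copy them through.
    public void WriteLiteral(ReadOnlySpan<byte> value) => _output.Write(value);

    // Fallback for regular string literals: encode directly into the buffer.
    public void WriteLiteral(string value) => Encoding.UTF8.GetBytes(value.AsSpan(), _output);
}

// Stands in for a generated page class, mixing both literal kinds.
public sealed class DemoPage : Utf8OutputPageBase
{
    public DemoPage(IBufferWriter<byte> output) : base(output) { }

    public void Execute()
    {
        WriteLiteral("<p>"u8);
        WriteLiteral("hi");
        WriteLiteral("</p>"u8);
    }
}
```

Running `new DemoPage(writer).Execute()` against an `ArrayBufferWriter<byte>` leaves the UTF-8 bytes of `<p>hi</p>` in the writer, with no intermediate string created for the two `u8` literals.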

Key design decisions

  • Directive style: Single directive with boolean token (@utf8HtmlLiterals true/false), consistent with @preservewhitespace
  • File kind: Legacy only (.cshtml — Razor Pages/MVC Views)
  • Language version gate: Version_11_0 (Preview) — prevents premature shipping
  • Inheritance: Works via _ViewImports.cshtml — enable globally, disable per-page

Changes

  • Add WriteHtmlUtf8StringLiterals flag to RazorCodeGenerationOptions (immutable Flags enum pattern)
  • Add Utf8HtmlLiteralsDirective and Utf8HtmlLiteralsDirectivePass
  • Register directive for Legacy file kind, gated on Version_11_0
  • Modify CodeWriterExtensions to append u8 suffix when flag is set
  • Modify RuntimeNodeWriter to pass flag from options to code writer
  • Use documentNode.Options in lowering phase (respects directive pass modifications)
  • Relax directive keyword validation to allow digits (previously only letters)
  • Unit tests for UTF-8 HTML content rendering
  • MVC integration test with baselines
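The u8 suffix change in the list above can be pictured with a small sketch. This is an illustration only, not the actual CodeWriterExtensions code: the helper name `ToCSharpLiteral` and its escaping rules are assumptions made for the example.

```csharp
using System.Text;

// Hypothetical helper; the real code writer in the Razor compiler differs.
public static class LiteralWriterSketch
{
    // Renders HTML text as a C# string literal, appending "u8" when the
    // (assumed) UTF-8 flag is set.
    public static string ToCSharpLiteral(string html, bool utf8)
    {
        var sb = new StringBuilder("\"");
        foreach (var c in html)
        {
            switch (c)
            {
                case '\\': sb.Append("\\\\"); break;
                case '"': sb.Append("\\\""); break;
                case '\r': sb.Append("\\r"); break;
                case '\n': sb.Append("\\n"); break;
                case '\t': sb.Append("\\t"); break;
                default: sb.Append(c); break;
            }
        }
        sb.Append('"');
        if (utf8)
        {
            sb.Append("u8"); // the only difference between the two output modes
        }
        return sb.ToString();
    }
}
```

With the flag off the helper produces an ordinary literal; with it on, the same escaped text gains the `u8` suffix, which matches the shape of the generated `WriteLiteral(...)` calls shown under Usage.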

Fixes #8429

Implements the @utf8HtmlLiterals directive (with boolean token) that when
enabled causes the Razor compiler to emit HTML literal blocks as C# UTF-8
string literals ("..."u8) instead of regular string literals.

This allows the page's base class to provide a WriteLiteral(ReadOnlySpan<byte>)
overload that writes pre-encoded UTF-8 bytes directly to the output, avoiding
runtime UTF-16 to UTF-8 encoding and associated memory allocations.

Key changes:
- Add WriteHtmlUtf8StringLiterals flag to RazorCodeGenerationOptions
- Add Utf8HtmlLiteralsDirective and Utf8HtmlLiteralsDirectivePass
- Register directive for Legacy (.cshtml) files, gated on Version_11_0
- Modify CodeWriterExtensions to append u8 suffix when flag is set
- Modify RuntimeNodeWriter to pass flag from options to code writer
- Use documentNode.Options in lowering phase (respects directive passes)
- Relax directive keyword validation to allow digits (not just letters)

Fixes #8429

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@davidwengier
Member

Since this is for .NET 11 anyway, seems like there's plenty of time to get the ROS (ReadOnlySpan<byte>) overload into the runtime. Should also ideally detect whether such an overload exists, and report an error if not; then people can polyfill easily on older runtimes.

Also should probably have a LDM about this :)

@DamianEdwards
Member Author

> Since this is for .NET 11 anyway, seems like there's plenty of time to get the ROS overload into the runtime.

The intent wasn't to get an overload into the runtime, at least not at this time. While we certainly could do that, it would make it slower in that case, not faster, as it would then convert from UTF8 bytes to string to place in MVC's output buffering infrastructure, which then turns it back to UTF8 bytes again when writing to the response. Overhauling MVC's output writing to support UTF8 bytes is a much larger undertaking, but of course this change to Razor would be required first.

For now, the goal here is to enable other .cshtml-based scenarios (i.e. non-MVC) to leverage this support and get the performance benefits, e.g. Razor Slices.

> Should also ideally detect whether such an overload exists, and report an error if not; then people can polyfill easily on older runtimes.

We don't do this for other directives when custom base classes are being used AFAIK, e.g. if I use @inherits to set the base class to a type with no methods at all, the Razor compiler will simply emit code that doesn't compile due to missing members to call/overload, i.e. the *.cshtml contract as to what's assumed to exist on the base types is implicit.

> Also should probably have a LDM about this :)

Didn't realize we discussed Razor compiler stuff there now, cool. LMK what the process is.

@davidwengier
Member

davidwengier commented Mar 2, 2026

> For now, the goal here is to enable other .cshtml-based scenarios (i.e. non-MVC)

That at least answers my other (unasked) question about why this is .cshtml only.

> Didn't realize we discussed Razor compiler stuff there now, cool. LMK what the process is.

Oh, I don't mean the C# LDM. There has been one Razor LDM meeting so far, and I was asleep at the time, but the plan is for there to at least be some committee that can sign off on things, I believe.

> We don't do this for other directives when custom base classes are being used AFAIK

I know we don't, but IMO that is not a good thing, and something we should be better about in future. BUT this is also something we can discuss at LDM and see if anyone else agrees with me :)

Development

Successfully merging this pull request may close these issues.

Add ability to opt-in to HTML literals being written as UTF8 string literals in generated class files