Skip to content

[cDAC] Add data descriptor field type annotations#126163

Open
max-charlamb wants to merge 3 commits intomainfrom
dev/max-charlamb-/runtime-cdac-typing
Open

[cDAC] Add data descriptor field type annotations#126163
max-charlamb wants to merge 3 commits intomainfrom
dev/max-charlamb-/runtime-cdac-typing

Conversation

@max-charlamb
Copy link
Member

@max-charlamb max-charlamb commented Mar 26, 2026

Summary

Add a type system to the cDAC data descriptor infrastructure. Field types in CDAC_TYPE_FIELD that were previously specified as C comments (/*uint32*/) are now expressed as preprocessor defines (T_UINT32, T_POINTER, TYPE(GCHandle), etc.).

Changes

Native-side type defines (wrappeddatadescriptor.inc):

  • Primitive types: T_UINT8, T_UINT16, T_UINT32, T_UINT64, T_INT8...T_INT64, T_NUINT, T_NINT, T_POINTER, T_BOOL
  • Struct types: TYPE(name) for types declared with CDAC_TYPE_BEGIN in the same descriptor
  • Cross-descriptor types: EXTERN_TYPE(name) (not validated locally)
  • Array types: T_ARRAY(type) (expands to pointer in the blob)
  • In debug/checked builds, these expand to type name strings in the blob. In release builds, they expand to nothing.

Compile-time validation (cdactypevalidation.inc):

  • In debug builds, validates that every TYPE(name) reference matches a CDAC_TYPE_BEGIN declaration via static_assert.
  • Uses two passes: first collects all declared types, then validates references. Handles forward references.

MSVC string literal splitting (ContractDescriptorSourceFileEmitter.cs):

  • Splits long JSON descriptor strings into adjacent C string literals (under 2000 bytes each) for MSVC compatibility. The C standard guarantees adjacent literals are concatenated at compile time.

Descriptor updates

  • VM datadescriptor.inc: All ~200 fields and ~50 globals converted to type defines
  • GC datadescriptor.inc: All fields and globals converted, with EXTERN_TYPE(GCAllocContext) for the one cross-descriptor reference

Note

This content was generated with the assistance of GitHub Copilot.

Copilot AI review requested due to automatic review settings March 26, 2026 18:59
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

@github-actions

This comment has been minimized.

Remove documentation for CDAC_DATADESCRIPTOR_INC override mechanism that was
never implemented in wrappeddatadescriptor.inc. The include always uses
'datadescriptor.inc' resolved via build-configured include directories.

Fix comment typo in datadescriptor.cpp: T(GCHandle) -> TYPE(GCHandle).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor

🤖 Copilot Code Review — PR #126163

Note

This review was generated by Copilot using Claude Opus 4.6. GPT-5.4 was launched as a parallel reviewer but did not complete within the 10-minute timeout.

Holistic Assessment

Motivation: The PR addresses a real deficiency — field type annotations in CDAC_TYPE_FIELD were previously ad-hoc C comments (/*uint32*/) invisible to the preprocessor and providing zero validation. Converting them to preprocessor defines enables compile-time validation, machine-readable type information in debug builds, and consistent annotation across all descriptors. The motivation is sound and well-justified.

Approach: The approach is well-designed and idiomatic for this codebase's preprocessor-heavy data descriptor infrastructure. The T_*/TYPE()/EXTERN_TYPE()/T_ARRAY() macro system is clean, the CDAC_STRINGIFY double-indirection correctly forces macro expansion before stringification, and the cdactypevalidation.inc two-pass validation using static_assert on tag structs is effective. The MSVC string literal splitting is a practical fix for a real compiler limitation.

Summary: ✅ LGTM. The core type annotation system and validation logic are correct and well-structured. The prior review's two issues (undocumented CDAC_DATADESCRIPTOR_INC and T(GCHandle) typo) have been fixed in the latest commit. I found one minor behavioral nuance worth noting (see below) but it is not blocking. All changed paths have been verified for correctness across debug/release configurations.


Detailed Findings

✅ Preprocessor Macro System — Correct Across All Configurations

Verified the full macro expansion chain in both debug and release builds:

  • Debug: CDAC_STRINGIFY(T_UINT32)CDAC_STRINGIFY_IMPL(uint32)"uint32"
  • Release: CDAC_STRINGIFY(T_UINT32)CDAC_STRINGIFY_IMPL()""
  • Struct types: CDAC_STRINGIFY(TYPE(GCHandle))"GCHandle" (debug), "" (release) ✓
  • Nested array: CDAC_STRINGIFY(T_ARRAY(TYPE(StressLogModuleDesc)))"pointer" (debug), "" (release) ✓

The #ifndef T_UINT8 guard in wrappeddatadescriptor.inc correctly ensures type defines persist across multiple inclusions while allowing cdactypevalidation.inc to override them temporarily for validation (since it #undefs all defines when done).

String pool size calculations (CDacStringPoolSizes via sizeof(CDAC_STRINGIFY(...))) are consistent with the NamesPool string data construction — verified that C string literal concatenation semantics produce matching byte counts.

✅ Compile-Time Validation — Sound Design

The two-pass approach in cdactypevalidation.inc is correct:

  • Pass 1: Declares struct cdac_type_tag_<name> { char dummy; }; for each CDAC_TYPE_BEGIN. These are complete types with sizeof = 1.
  • Pass 2: TYPE(name) expands to struct cdac_type_tag_<name>. For declared types, sizeof(...) > 0 passes. For undeclared types, the compiler fails on incomplete type (with the static_assert message as context).
  • EXTERN_TYPE and T_* primitives expand to char (always valid). T_ARRAY passes through to its inner type.

Verified that no CDAC_GLOBAL uses TYPE() references, so the scope of validation (only CDAC_TYPE_FIELD) is sufficient — globals only use primitive type annotations.

Verified all TYPE() references in both descriptors resolve to locally declared types. EXTERN_TYPE(GCAllocContext) in the GC descriptor correctly references a VM-declared type.

✅ MSVC String Literal Splitting — Correct

The chunking logic in ContractDescriptorSourceFileEmitter.cs is correct:

  • Chunks at 2000 bytes (safely under the ~2048 MSVC limit and 4095 C standard limit).
  • Avoids splitting mid-\" escape by backing up when escaped[chunkEnd - 1] == '\\'.
  • The CStringEscape regex only escapes "\", so \ only appears as part of \" sequences — the single-char backup logic is sufficient.
  • Adjacent C string literals ("chunk1" "chunk2") concatenate at compile time per the C standard.

✅ Mechanical Conversion — Complete and Consistent

Verified that all 579 CDAC_TYPE_FIELD entries across both descriptors (529 in VM, 50 in GC) and all CDAC_GLOBAL entries now use the new type annotation macros. No comment-style annotations (/*type*/) remain. No field names, offsets, or other parameters were changed — only the type annotation column was modified.

✅ Prior Review Issues — Fixed

The two issues flagged by the prior Copilot review have been addressed in commit 51a22246:

  1. CDAC_DATADESCRIPTOR_INC documented-but-unimplemented feature — documentation removed ✓
  2. T(GCHandle) comment typo — fixed to TYPE(GCHandle)

💡 Release Blob Behavioral Nuance — Informational

A few fields in the old code used bare (non-comment) type names (e.g., GCHandle, pointer, SLink) which were stringified into the release blob. These fields previously had DataType.GCHandle / DataType.pointer etc. in the managed reader. With the new macros, ALL fields have empty type names in release, so these few fields now resolve to DataType.Unknown.

This is harmless — the managed reader's FieldInfo.TypeName is string? (explicitly nullable), GetDataType("") returns DataType.Unknown gracefully, and no contract implementation depends on field-level DataType values for correctness. The old behavior was inconsistent (most fields were empty, a few had types), and the new behavior is uniformly clean. Debug/checked builds retain full type information.

This is not blocking — just worth noting for awareness.

✅ Documentation — Comprehensive and Accurate

The README updates accurately describe the new type annotation system, including the macro reference table, compile-time validation mechanism, and examples. The CDAC_DATADESCRIPTOR_INC section that described an unimplemented feature has been correctly removed.

Generated by Code Review for issue #126163 ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant