as/codegen#338

Draft
yuri91 wants to merge 85 commits into master from as/codegen

Conversation

@yuri91
Member

@yuri91 yuri91 commented Jan 8, 2026

Just a draft to track the status of the AS codegen work

@yuri91 yuri91 force-pushed the as/codegen branch 2 times, most recently from 00178ad to b669cdd Compare January 14, 2026 17:01
@yuri91 yuri91 added and removed the hydra label (Run hydra on this PR) Jan 15, 2026
@yuri91 yuri91 force-pushed the as/codegen branch 2 times, most recently from 036ff2f to fbcc4b5 Compare April 17, 2026 13:02
@yuri91 yuri91 added the hydra label (Run hydra on this PR) Apr 17, 2026
@yuri91 yuri91 force-pushed the as/codegen branch 2 times, most recently from a12703a to c89941f Compare April 20, 2026 15:05
yuri91 and others added 21 commits April 21, 2026 17:32
Introduce a small helper that maps a data address space to the
corresponding function address space:

  Default   -> Default
  GenericJS -> Client   (functions live in the Client AS on the
                         JS side, not in GenericJS)
  Wasm      -> Wasm

Available in both a strongly-typed CheerpAS overload and an
`unsigned` overload. Used by the coroutine and later by
ItaniumCXXABI / LowerGlobalDtors / CodeGenModule / the asan
path to pick the correct AS for function pointers.
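
The mapping above can be sketched as a small standalone function. This is only an illustration: the enum below is a stand-in for the real CheerpAS type, its numeric values are placeholders, the name `functionASForDataAS` is hypothetical, and the handling of inputs other than the three listed cases is an assumption.

```cpp
#include <cassert>

// Stand-in for the real CheerpAS enum. The numeric values are placeholders
// chosen for this sketch, not the ones used by the compiler.
enum class CheerpAS : unsigned { Default = 0, Client = 1, GenericJS = 2, Wasm = 3 };

// Map a data address space to the address space its functions live in.
// The name is hypothetical; the commit message does not show the real one.
constexpr CheerpAS functionASForDataAS(CheerpAS dataAS) {
  switch (dataAS) {
  case CheerpAS::GenericJS:
    return CheerpAS::Client; // JS-side functions live in Client, not GenericJS
  case CheerpAS::Wasm:
    return CheerpAS::Wasm;
  default:
    return CheerpAS::Default; // assumption: anything else maps to Default
  }
}

// `unsigned` overload that forwards to the strongly-typed one.
constexpr unsigned functionASForDataAS(unsigned dataAS) {
  return static_cast<unsigned>(
      functionASForDataAS(static_cast<CheerpAS>(dataAS)));
}

// Compile-time checks of the three documented cases.
static_assert(functionASForDataAS(CheerpAS::Default) == CheerpAS::Default, "");
static_assert(functionASForDataAS(CheerpAS::GenericJS) == CheerpAS::Client, "");
static_assert(functionASForDataAS(CheerpAS::Wasm) == CheerpAS::Wasm, "");
```

Keeping the `unsigned` overload as a thin forwarder means the mapping is written once, so the two entry points cannot drift apart.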

The helper wraps Function::Create with an explicit Cheerp AS, since
Module::getOrInsertFunction picks the module's default AS.

CheerpNativeRewriterPass rewrites C++ `new ClientType(...)` patterns
in IR. For client-transparent JS host types it turns the allocation
into a stack slot (alloca of pointer-to-T) that the constructor
writes into and the caller loads from; for non-transparent cases
it emits a synthetic `cheerpCreate<Name>` wrapper that invokes the
constructor and returns `this`. Thread Cheerp address spaces
through both paths:

- rewriteNativeAllocationUsers: the alloca lives in GenericJS (the
  slot is JS-side stack), and its element type is `T*` in the
  Client AS (the pointer it holds references a client-side object).
- rewriteIfNativeConstructorCall: the bitcast feeding the store
  into that slot takes the GenericJS AS to match the alloca.
- getReturningConstructor: create the synthetic `cheerpCreate<Name>`
  wrapper in the Client AS via cheerp::getOrCreateFunction, rather
  than the default AS from M.getOrInsertFunction.

TODO (ordering, Phase 2): this commit calls cheerp::getOrCreateFunction,
whose definition currently lives in a later commit
(FreeAndDeleteRemoval: make __genericjs__free trampoline AS-aware).
Reorder so the helper is introduced before its first use.

Clang was returning LangAS::Default for string/constant literals
in Cheerp mode, and the matching section-based "asmjs" detection
in LLVM's IRBuilder fell back to the default AS whenever a caller
didn't have a section set. Thread Cheerp address spaces through
every site that materializes a constant string:

- Clang CodeGenModule::GetGlobalConstantAddressSpace: return
  LangAS::cheerp_wasm for cheerp-wasm targets, LangAS::cheerp_genericjs
  otherwise.
- Clang CodeGenModule::castStringLiteralToDefaultAddressSpace:
  skip the default-AS cast in Cheerp mode; the AS is already
  correct.
- Clang CodeGenModule::GenerateStringLiteral: override AddrSpace
  to CheerpAS::Wasm on cheerp-wasm so the produced GlobalVariable
  lands in the wasm AS even on paths where the numeric AS derived
  from the LangAS lookup differs.
- LLVM IRBuilderBase::CreateGlobalString: detect "this string
  belongs in asmjs" from the AS argument (CheerpAS::Wasm +
  cheerp-wasm triple) instead of inspecting the basic block's
  parent function section, so the section is derived from the AS
  rather than the reverse.
- LLVM OMPIRBuilder::getOrCreateSrcLocStr: pick the Cheerp AS by
  target triple when creating OpenMP source-location strings
  (previously always AS 0).
- LLVM SimplifyLibCalls: thread the source string's AS into
  CreateGlobalString/CreateGlobalStringPtr in optimizeStringNCpy
  and optimizePrintFString so generated scratch strings inherit
  the caller's AS.
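
As a rough illustration of two of the bullets above, the sketch below models choosing the constant address space from the target and deriving the "asmjs" section from the AS argument. It is a toy model, not LLVM or Clang code: `ToyGlobalString`, `constantASForTarget`, and `createGlobalString` are hypothetical names, and the enum values are placeholders.

```cpp
#include <cassert>
#include <string>

// Stand-in for the real CheerpAS enum; numeric values are placeholders.
enum class CheerpAS : unsigned { Default = 0, Client = 1, GenericJS = 2, Wasm = 3 };

// Toy replacement for a GlobalVariable holding a string literal.
struct ToyGlobalString {
  std::string text;
  CheerpAS addrSpace;
  std::string section; // empty means "no explicit section"
};

// Models the GetGlobalConstantAddressSpace bullet: Wasm AS on cheerp-wasm
// targets, GenericJS AS otherwise.
constexpr CheerpAS constantASForTarget(bool isCheerpWasm) {
  return isCheerpWasm ? CheerpAS::Wasm : CheerpAS::GenericJS;
}

// Models the CreateGlobalString bullet: the section is decided from the AS
// argument and the triple, never from the enclosing function's section.
inline ToyGlobalString createGlobalString(const std::string &text, CheerpAS as,
                                          bool isCheerpWasmTriple) {
  ToyGlobalString g{text, as, ""};
  if (as == CheerpAS::Wasm && isCheerpWasmTriple)
    g.section = "asmjs";
  return g;
}
```

In this model a caller that has no section set still gets the right placement, because only the AS and the triple are consulted.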
yuri91 and others added 29 commits April 24, 2026 11:06
Prepare CGExpr/CGExprScalar for typed-ptrcast on AS boundaries:

- EmitStoreOfScalar: if the value's AS doesn't match the address
  element type's AS, insert a cast ("as_decay") so the store is
  well-typed. Currently uses Builder.CreateAddrSpaceCast; TODO:
  migrate to cheerp_typed_ptrcast (the preferred form used by the
  later CGCall boundary code) so the AS transition is explicit in
  IR and the upstream analyses see a single canonical intrinsic
  rather than a mix of addrspacecast and typed_ptrcast.
- VisitCastExpr: don't trip the "different address spaces" assert
  in Cheerp mode; emit PointerBitCastOrAddrSpaceCast in the
  fallback path. Same TODO applies — prefer cheerp_typed_ptrcast
  over a plain addrspacecast once the call sites are consistent.
- GetUserCastIntrinsic: take the concrete llvm::Type of the source
  pointer so the intrinsic's src type preserves the caller's AS
  instead of being re-derived from the QualType.
- EmitPointerWithAlignment / EmitCastLValue: thread the source
  pointer's LLVM type into GetUserCastIntrinsic.

Thread Cheerp address spaces through blocks codegen:
- _NSConcrete{Global,Stack}Block in correct AS
- block descriptor GV + BlockDescriptorType pointer AS
- generated block invoke / copy-helper / destroy-helper functions
  get the right AS and an asmjs section when targeting wasm
- pointer cast to generic block literal uses the right AS
- arrangeBuiltinFunction{Call,Declaration} threads isCheerpWasm
- DeclClonePragmaWeak propagates the Cheerp mode attr

TODO: split out the sanitizer ctor/dtor AS changes
(AddressSanitizer.cpp, ModuleUtils.cpp) into their own commit;
they were drive-by and are not about blocks.

We just do the bare minimum to make the tests pass.

Replace plain addrspacecast with the cheerp_typed_ptrcast intrinsic
on the boundaries where a call-site pointer AS can differ from the
callee's expected AS:

- CGCall: call arguments and direct-return values now go through
  cheerp_typed_ptrcast when the AS mismatches; drop the sret-arg
  addrspacecast that is already handled upstream.
- CGExprScalar: user cast lowering uses cheerp_typed_ptrcast when
  the src/dst AS differ, falling back to the user-cast intrinsic
  otherwise.
- CodeGenFunction: give the "result.ptr" alloca the same AS as the
  return value so the stored pointer matches.
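
The boundary rule described above (cast only when the address spaces differ, and use one canonical cast form) can be sketched with a toy builder. Everything here is hypothetical: `Value`, `ToyBuilder`, and `coerceToAS` are illustration-only names, and the string recorded for the cast does not reflect the real signature of cheerp_typed_ptrcast, which these commit messages do not show.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy pointer value: a name plus the address space it points into.
struct Value {
  std::string name;
  unsigned addrSpace;
};

// Toy builder that records the casts it would emit.
struct ToyBuilder {
  std::vector<std::string> emitted;

  // Return a value in `expectedAS`, emitting a cast only on a mismatch.
  Value coerceToAS(const Value &v, unsigned expectedAS) {
    if (v.addrSpace == expectedAS)
      return v; // already well-typed, nothing to emit
    emitted.push_back("cheerp_typed_ptrcast " + v.name + " : as" +
                      std::to_string(v.addrSpace) + " -> as" +
                      std::to_string(expectedAS));
    return Value{v.name + ".cast", expectedAS};
  }
};
```

A call-lowering routine in this model would run every pointer argument and the direct return value through `coerceToAS`, so each AS transition shows up as a single recognizable operation.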

TLS data lives in wasm linear memory and must be in CheerpAS::Wasm.
A few helpers were still being emitted in AS 0, crashing later casts.

- ItaniumCXXABI: _ZTW thread wrappers now use the function AS that
  matches the variable's data AS; __tls_guard is always in the Wasm AS
  with section "asmjs". The Wasm AS is the only option that works in
  TUs mixing JS and Wasm thread-locals, since JS can read Wasm memory
  but not the reverse; pure-JS thread-locals are stripped by
  CheerpLowerAtomicPass.
- ThreadLocalLowering: __getThreadLocalAddress wrapper now returns
  i8* in Wasm AS and lives in the Wasm function AS. Assert instead
  of bitcasting when the intrinsic's AS doesn't match.

Asan was inserting allocas and memory-intrinsic callbacks in AS 0,
which then crashed AllocaLowering / IdenticalCodeFolding when the
surrounding wasm code was in CheerpAS::Wasm.

- createAllocaForLayout and createDynamicAllocasInitStorage: emit
  MyAlloca / DynamicAllocaLayout in Wasm AS for cheerp-wasm.
- handleDynamicAllocaCall: replacement alloca inherits the original
  alloca's AS so the downstream inttoptr/uses stay consistent.
- instrumentMemIntrinsic + initializeCallbacks: __asan_memcpy /
  memmove / memset are declared with i8* in Wasm AS, and the argument
  pointer casts target the same AS, removing the bogus addrspacecast
  to AS 0 on every memcpy/memmove/memset.
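
The two alloca rules above can be sketched as follows. This is a toy model under stated assumptions: `ToyAlloca`, `layoutAllocaAS`, and `replaceDynamicAlloca` are hypothetical names, `kWasmAS` is a placeholder value, and keeping AS 0 on non-wasm targets is assumed from the "previously AS 0" behaviour rather than stated in the commit.

```cpp
#include <cassert>

// Placeholder for the numeric value of CheerpAS::Wasm.
constexpr unsigned kWasmAS = 3;

// Toy alloca: only the address space and size matter for this sketch.
struct ToyAlloca {
  unsigned addrSpace;
  unsigned long size;
};

// Layout allocas created by the sanitizer go to the Wasm AS on cheerp-wasm.
// Assumption: other targets keep AS 0.
constexpr unsigned layoutAllocaAS(bool isCheerpWasm) {
  return isCheerpWasm ? kWasmAS : 0u;
}

// A replacement for a dynamic alloca inherits the original's AS, so users of
// the old pointer stay consistent; only the size changes.
constexpr ToyAlloca replaceDynamicAlloca(ToyAlloca original,
                                         unsigned long paddedSize) {
  return ToyAlloca{original.addrSpace, paddedSize};
}

static_assert(layoutAllocaAS(true) == kWasmAS, "");
static_assert(replaceDynamicAlloca(ToyAlloca{kWasmAS, 16}, 64).addrSpace == kWasmAS, "");
```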
2 participants