ZJIT: Fold LoadField on frozen objects to constants #911

tobi · 2025-12-10T20:20:22Z

Summary

This PR adds a compile-time optimization that folds LoadField instructions on frozen objects into constants. When reading instance variables from frozen constant objects (via attr_reader/attr_accessor), the JIT can now resolve the value at compile time rather than at runtime.

Before

v25:HeapObject[VALUE(0x1008)] = GuardShape v20, 0x1048
v26:BasicObject = LoadField v25, :@a@0x1049

After

v25:HeapObject[VALUE(0x1008)] = GuardShape v20, 0x1048
v27:Fixnum[1] = Const Value(1)

This enables further optimizations like constant propagation and arithmetic folding. For example, accessing two fields from a frozen object and adding them can now be fully folded to a single constant.
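
To make the pattern concrete, here is a minimal Ruby sketch of the kind of code this optimization targets. The `Point` class, the `ORIGIN_OFFSET` constant, and the `offset_sum` method are hypothetical names chosen for illustration; they are not taken from the PR's tests, and whether a given build actually folds this code would need to be checked with `--zjit-dump-hir`.

```ruby
# Hypothetical example; Point, ORIGIN_OFFSET and offset_sum are illustration-only names.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# A frozen object held in a constant: its instance variables can no longer be reassigned.
ORIGIN_OFFSET = Point.new(1, 2).freeze

def offset_sum
  # Each attr_reader call reads a field of a known frozen object, so both reads
  # are candidates for folding; the addition could then be folded as well.
  ORIGIN_OFFSET.x + ORIGIN_OFFSET.y
end

offset_sum
```

If both field reads fold to constants, the addition has two constant operands, which is the cascading case described in the test plan.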

Test plan

Added 10 unit tests covering:
- Basic frozen constant objects with various value types (Fixnum, String, Symbol, nil, true/false)
- Multiple instance variables (correct offset handling)
- Negative cases: unfrozen objects and dynamic receivers correctly remain as LoadField
- Nested access with arithmetic (verifies constant propagation cascades)
Verified optimization works manually with --zjit-dump-hir

While profiling `Monitor#synchronize` and `Mutex#synchronize` I noticed a fairly significant amount of time spent in `rb_check_typeddata`. By implementing a fast path that assumes the object is valid and that can be inlined, it does make a significant difference:

Before:

```
Mutex 13.548M (± 3.6%) i/s (73.81 ns/i) - 68.566M in 5.067444
Monitor 10.497M (± 6.5%) i/s (95.27 ns/i) - 52.529M in 5.032698s
```

After:

```
Mutex 20.887M (± 0.3%) i/s (47.88 ns/i) - 106.021M in 5.075989s
Monitor 16.245M (±13.3%) i/s (61.56 ns/i) - 80.705M in 5.099680s
```

```ruby
require 'bundler/inline'

gemfile do
  gem "benchmark-ips"
end

mutex = Mutex.new
require "monitor"
monitor = Monitor.new

Benchmark.ips do |x|
  x.report("Mutex") { mutex.synchronize { } }
  x.report("Monitor") { monitor.synchronize { } }
end
```

… by heredocs

See https://bugs.ruby-lang.org/issues/21756.

Ripper fails to parse this, but prism actually also doesn't handle it correctly. When heredocs are used, even in lowercase percent arrays there can be multiple `STRING_CONTENT` tokens. We need to concatenate them. Luckily we don't need to handle as many cases as in uppercase arrays, where interpolation is allowed.

ruby/prism@211677000e

Attempt to fix the following SEGV:

```
ruby(gc_mark) ../src/gc/default/default.c:4429
ruby(gc_mark_children+0x45) [0x560b380bf8b5] ../src/gc/default/default.c:4625
ruby(gc_mark_stacked_objects) ../src/gc/default/default.c:4647
ruby(gc_mark_stacked_objects_all) ../src/gc/default/default.c:4685
ruby(gc_marks_rest) ../src/gc/default/default.c:5707
ruby(gc_marks+0x4e7) [0x560b380c41c1] ../src/gc/default/default.c:5821
ruby(gc_start) ../src/gc/default/default.c:6502
ruby(heap_prepare+0xa4) [0x560b380c4efc] ../src/gc/default/default.c:2074
ruby(heap_next_free_page) ../src/gc/default/default.c:2289
ruby(newobj_cache_miss) ../src/gc/default/default.c:2396
ruby(RB_SPECIAL_CONST_P+0x0) [0x560b380c5df4] ../src/gc/default/default.c:2420
ruby(RB_BUILTIN_TYPE) ../src/include/ruby/internal/value_type.h:184
ruby(newobj_init) ../src/gc/default/default.c:2136
ruby(rb_gc_impl_new_obj) ../src/gc/default/default.c:2500
ruby(newobj_of) ../src/gc.c:996
ruby(rb_imemo_new+0x37) [0x560b380d8bed] ../src/imemo.c:46
ruby(imemo_fields_new) ../src/imemo.c:105
ruby(rb_imemo_fields_new) ../src/imemo.c:120
```

I have no reproduction, but my understanding based on the backtrace and error is that GC is triggered inside `newobj_init`, causing the new object to be marked while in an incomplete state.

I believe the fix is to pass the `shape_id` down to `newobj_init` so it can be set before the GC has a chance to trigger.

This commit adds a specialized instruction iterator to the assembler with a custom "peek" method. The reason is that we want to add basic blocks to LIR. When we split instructions, we need to add any new instructions to the correct basic block. The custom iterator will maintain the correct basic block inside the assembler, that way when we push any new instructions they will be appended to the correct place.

In the past, parse.y and ripper had different `value_expr` definitions, so `value_expr` did nothing for ripper.

```c
// parse.y
#define value_expr(node) value_expr_gen(p, (node))

// ripper
#define value_expr(node) ((void)(node))
```

However, Rearchitect Ripper (89cfc15) removed the `value_expr` definition for ripper, so this commit removes the now-needless parse.y macro and uses `value_expr_gen` directly.

In the past, parse.y and ripper had different `new_nil` definitions, so `new_nil` returned `nil` for ripper.

```c
// parse.y
#define new_nil(loc) NEW_NIL(loc)

// ripper
#define new_nil(loc) Qnil
```

However, Rearchitect Ripper (89cfc15) removed the `new_nil` definition for ripper, so this commit removes the now-needless parse.y macro and uses `NEW_NIL` directly.

Fixes: ruby#685

This feature can easily break how you use other gems like factory_bot or prawn.

ruby/psych#747 (comment)

> But I kind of think we should leave `psych/y` around. If people really want to use it they could require the file.

If you miss the function in Kernel, you can require it interactively or add it to `.irbrc`:

```ruby
require 'psych/y'
```

ruby/psych@f1610b3f05

Since we do a decent job of pre-sizing objects, don't handle the case where we would need to re-size an object. Also don't handle too-complex shapes.

lobsters stats before:

```
Top-20 calls to C functions from JIT code (79.4% of total 90,051,140):
  rb_vm_opt_send_without_block: 19,762,433 (21.9%)
  rb_vm_setinstancevariable: 7,698,314 ( 8.5%)
  rb_hash_aref: 6,767,461 ( 7.5%)
  rb_vm_env_write: 5,373,080 ( 6.0%)
  rb_vm_send: 5,049,229 ( 5.6%)
  rb_vm_getinstancevariable: 4,535,259 ( 5.0%)
  rb_obj_is_kind_of: 3,746,306 ( 4.2%)
  rb_ivar_get_at_no_ractor_check: 3,745,237 ( 4.2%)
  rb_vm_invokesuper: 3,037,467 ( 3.4%)
  rb_ary_entry: 2,351,983 ( 2.6%)
  rb_vm_opt_getconstant_path: 1,344,740 ( 1.5%)
  rb_vm_invokeblock: 1,184,474 ( 1.3%)
  Hash#[]=: 1,064,288 ( 1.2%)
  rb_gc_writebarrier: 1,006,972 ( 1.1%)
  rb_ec_ary_new_from_values: 902,687 ( 1.0%)
  fetch: 898,667 ( 1.0%)
  rb_str_buf_append: 833,787 ( 0.9%)
  rb_class_allocate_instance: 822,024 ( 0.9%)
  Hash#fetch: 699,580 ( 0.8%)
  _bi20: 682,068 ( 0.8%)
Top-4 setivar fallback reasons (100.0% of total 7,732,326):
  shape_transition: 6,032,109 (78.0%)
  not_monomorphic: 1,469,300 (19.0%)
  not_t_object: 172,636 ( 2.2%)
  too_complex: 58,281 ( 0.8%)
```

lobsters stats after:

```
Top-20 calls to C functions from JIT code (79.0% of total 88,322,656):
  rb_vm_opt_send_without_block: 19,777,880 (22.4%)
  rb_hash_aref: 6,771,589 ( 7.7%)
  rb_vm_env_write: 5,372,789 ( 6.1%)
  rb_gc_writebarrier: 5,195,527 ( 5.9%)
  rb_vm_send: 5,049,145 ( 5.7%)
  rb_vm_getinstancevariable: 4,538,485 ( 5.1%)
  rb_obj_is_kind_of: 3,746,241 ( 4.2%)
  rb_ivar_get_at_no_ractor_check: 3,745,172 ( 4.2%)
  rb_vm_invokesuper: 3,037,157 ( 3.4%)
  rb_ary_entry: 2,351,968 ( 2.7%)
  rb_vm_setinstancevariable: 1,703,337 ( 1.9%)
  rb_vm_opt_getconstant_path: 1,344,730 ( 1.5%)
  rb_vm_invokeblock: 1,184,290 ( 1.3%)
  Hash#[]=: 1,061,868 ( 1.2%)
  rb_ec_ary_new_from_values: 902,666 ( 1.0%)
  fetch: 898,666 ( 1.0%)
  rb_str_buf_append: 833,784 ( 0.9%)
  rb_class_allocate_instance: 821,778 ( 0.9%)
  Hash#fetch: 755,913 ( 0.9%)
Top-4 setivar fallback reasons (100.0% of total 1,703,337):
  not_monomorphic: 1,472,405 (86.4%)
  not_t_object: 172,629 (10.1%)
  too_complex: 58,281 ( 3.4%)
  new_shape_needs_extension: 22 ( 0.0%)
```

I also noticed that primitive printing in HIR was broken so I fixed that.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>

… increase: -

### TL;DR

Bundler is heavily limited by the connection pool, which manages a single connection. By increasing the number of connections, we can drastically speed up the installation process when many gems need to be downloaded and installed.

### Benchmark

There are various factors that are hard to control, such as compilation time and network speed, but after dozens of tests I can consistently get around a 70% speed increase when downloading and installing 472 gems, most having no native extensions (on purpose).

```
# Before
bundle install 28.60s user 12.70s system 179% cpu 23.014 total

# After
bundle install 30.09s user 15.90s system 281% cpu 16.317 total
```

You can find in this gist how this was benchmarked and the Gemfile used: https://gist.github.com/Edouard-chin/c8e39148c0cdf324dae827716fbe24a0

### Context

A while ago in #869, Aaron introduced a connection pool which greatly improved Bundler speed. It was noted in the PR description that managing one connection was already good enough and it wasn't clear whether we needed more connections. Aaron also had the intuition that we may need to increase the pool for downloading gems, and he was right.

> We need to study how RubyGems uses connections and make a decision
> based on request usage (e.g. only use one connection for many small
> requests like bundler API, and maybe many connections for
> downloading gems)

When Bundler downloads and installs gems in parallel https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/installer/parallel_installer.rb#L128 most threads have to wait for the only connection in the pool to be available, which is not efficient.

### Solution

This commit modifies the pool size for the fetcher that Bundler uses. The RubyGems fetcher will continue to use a single connection.

The Bundler fetcher is used in 2 places.

1. When downloading gems https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/source/rubygems.rb#L481-L484
2. When grabbing the index (not the compact index) using the `bundle install --full-index` flag. https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/fetcher/index.rb#L9

Having more connections in 2) is not useful, but tweaking the size based on where the fetcher is used is a bit tricky, so I opted to modify it at the class level.

I fiddled with the pool size and found that 5 seems to be the sweet spot, at least for my environment.

ruby/rubygems@6063fd9963
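
The contention described in this commit message can be modelled with a small Ruby sketch. `FakePool` and `download_all` are hypothetical stand-ins written for illustration; they are not Bundler's fetcher or its connection pool, and the `sleep` only simulates network time. The sketch shows that the pool size caps how many worker threads can hold a connection at once, whatever the number of workers.

```ruby
# Hypothetical model; FakePool and download_all are illustration-only names.
class FakePool
  attr_reader :peak_in_use

  def initialize(size)
    @connections = Queue.new
    size.times { |i| @connections << "conn-#{i}" }
    @lock = Mutex.new
    @in_use = 0
    @peak_in_use = 0
  end

  def with_connection
    conn = @connections.pop # blocks while every connection is checked out
    @lock.synchronize do
      @in_use += 1
      @peak_in_use = [@peak_in_use, @in_use].max
    end
    yield conn
  ensure
    if conn
      @lock.synchronize { @in_use -= 1 }
      @connections << conn # hand the connection back to waiting threads
    end
  end
end

# Runs `workers` threads that each pull gem names from a shared job queue.
def download_all(pool, gems, workers: 8)
  jobs = Queue.new
  gems.each { |name| jobs << name }
  jobs.close # pop returns nil once the queue is drained

  done = Queue.new
  workers.times.map {
    Thread.new do
      while (name = jobs.pop)
        pool.with_connection do |conn|
          sleep 0.001 # simulated network time
          done << [name, conn]
        end
      end
    end
  }.each(&:join)
  done.size
end

pool = FakePool.new(5)
download_all(pool, (1..20).map { |i| "gem-#{i}" })
pool.peak_in_use # never exceeds the pool size
```

With `FakePool.new(1)` the eight workers take turns on one connection, which is the situation the commit message describes for parallel installs.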

This refactors the concurrent set to examine and reserve a slot via CAS with the hash, before then doing the same with the key. This allows us to use an extra bit from the hash as a "continuation bit" which marks whether we have ever probed past this key while inserting. When that bit isn't set on deletion we can clear the field instead of placing a tombstone.
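
The continuation-bit idea can be sketched with a single-threaded Ruby model. This is only an illustration under loud assumptions: `ProbeSet`, its `hasher` block, and the separate `continued` flag are hypothetical, there is no CAS or concurrency here, and the real change stores the bit inside the hash word rather than in its own field. For simplicity the sketch also never reuses tombstones on insert.

```ruby
# Single-threaded model only; ProbeSet is a hypothetical name for illustration.
class ProbeSet
  EMPTY = Object.new.freeze     # free slot: probing stops here
  TOMBSTONE = Object.new.freeze # deleted slot that probes must still walk past
  Slot = Struct.new(:key, :continued)

  def initialize(capacity, &hasher)
    @slots = Array.new(capacity) { Slot.new(EMPTY, false) }
    @hasher = hasher || :hash.to_proc
  end

  def insert(key)
    each_probe(key) do |slot|
      return false if slot.key == key
      if slot.key.equal?(EMPTY)
        slot.key = key
        return true
      end
      # We are probing past this slot, so remember that a chain runs through it.
      slot.continued = true
    end
    raise "set is full"
  end

  def include?(key)
    each_probe(key) do |slot|
      return true if slot.key == key
      return false if slot.key.equal?(EMPTY)
    end
    false
  end

  def delete(key)
    each_probe(key) do |slot|
      return false if slot.key.equal?(EMPTY)
      next unless slot.key == key
      # No insert ever probed past this slot: it is safe to clear it outright.
      slot.key = slot.continued ? TOMBSTONE : EMPTY
      return true
    end
    false
  end

  def tombstones
    @slots.count { |slot| slot.key.equal?(TOMBSTONE) }
  end

  private

  # Linear probing: visit every slot once, starting at the key's home index.
  def each_probe(key)
    start = @hasher.call(key) % @slots.size
    @slots.size.times { |i| yield @slots[(start + i) % @slots.size] }
  end
end
```

Passing a block such as `{ |_key| 0 }` forces every key to collide, which makes the difference between clearing a slot and leaving a tombstone easy to observe.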

This removes all allocations from the find_or_insert loop, which requires us to start the search over after calling the provided create function. In exchange, that allows us to assume that all threads inserting concurrently will get the same view of the GC state, and so should all be attempting to clear and reuse a slot containing a garbage object.

Fix: ruby/forwardable#35

[Bug #21708]

Trying to compile code to check if a method can use the delegation fastpath is a bit wasteful and causes `RUBYOPT=-d` output to be full of misleading errors.

It's simpler and faster to use a simple regexp to do the same check.

ruby/forwardable@de1fbd182e
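
As a rough illustration of checking a method name with a regexp instead of compiling code, here is a minimal sketch. The `SIMPLE_METHOD_NAME` pattern and the `simple_method_name?` helper are assumptions made for this example; the regexp used in the actual forwardable commit may differ.

```ruby
# Hypothetical pattern: a plain identifier, optionally ending in ?, ! or =.
# Operator-style names such as [] or + do not match.
SIMPLE_METHOD_NAME = /\A[_a-zA-Z]\w*[?!=]?\z/

# Returns true when the name looks like an ordinary method name.
def simple_method_name?(name)
  SIMPLE_METHOD_NAME.match?(name.to_s)
end

simple_method_name?(:each_pair)
simple_method_name?(:[])
```

A pure string check like this evaluates no code, so it cannot raise the kind of exceptions that show up as noise under `-d`.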

When accessing instance variables from frozen objects via attr_reader/attr_accessor, fold the LoadField instruction to a constant at compile time. This enables further optimizations like constant propagation.

- Add fold_getinstancevariable_frozen optimization in Function::optimize
- Check if receiver type has a known ruby_object() that is frozen
- Read the field value at compile time and replace with Const instruction
- Add 10 unit tests covering various value types (fixnum, string, symbol, nil, true/false) and negative cases (unfrozen, dynamic receiver)

tekknolagi · 2025-12-10T20:33:14Z

Closing in favor of ruby#15483

byroot and others added 30 commits December 3, 2025 14:13

[ruby/json] Fix duplicated test_unsafe_load_with_options test case

05383a1

ruby/json@7b62fac525

Rename rb_obj_exivar_p -> rb_obj_gen_fields_p

5770c18

The "EXIVAR" terminology has been replaced by "gen fields" AKA "generic fields". Exivar implies variable, but generic fields include more than just variables, e.g. `object_id`.

fstring_concurrent_set_create: only assert the string has no ivars

b78db63

The NEWOBJ tracepoint can generate an object_id, that's alright, what we don't want is actual instance variables.

[ruby/json] Fix handling of depth

208271e

ruby/json@ccca602274

[ruby/json] Release 2.17.0

94581b1

ruby/json@4bdb2d14fe

Update default gems list at 94581b1 [ci skip]

20fc8af

[ruby/prism] Follow repo move from oracle/truffleruby to truffleruby/…

d7dffcd

…truffleruby ruby/prism@c8e1b11120

wb-protect autoload_const

f9cd94f

[ruby/prism] Fix wrong error message for lower percent i arrays

d5c7cf0

Not so sure how to trigger it but this is definitely more correct. ruby/prism@1bc8ec5e5d

ZJIT: Optimize NewArray to use rb_ec_ary_new_from_values (ruby#15391)

fd02356

gc.c: check if the struct has fields before marking the fields_obj

8d1a6bc

If GC triggers in the middle of `struct_alloc`, and the struct has more than 3 elements, then the `fields_obj` reference is garbage. We must first check the shape to know if it was actually initialized.

Group malloc counters together

9913d8d

Track small malloc/free changes in thread local

a773bbf

ZJIT: Use the custom iterator

d7e55f8

This commit uses the custom instruction iterator in arm64 / x86_64 instruction splitting. Once we introduce basic blocks to LIR, the custom iterator will ensure that instructions are added to the correct place.

Remove spurious obj != klass check in shape_get_next

612a668

This should never be true. I added an `rb_bug` in case it was and it wasn't true in any of btest or test-all. Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>

Move imemo fields check out of shape_get_next

f167073

Not every caller (for example, YJIT) actually needs to pass the object. YJIT (and, in the future, ZJIT) only need to pass the class.

YJIT: Pass class and shape ID directly instead of object

b43e66d

[ruby/psych] Add option to disable symbol parsing

0e7e685

ruby/psych@4e9d08c285

ZJIT: Only use make_equal_to for instructions with output

c764269

It's used as an alternative to find-and-replace, so we should have nothing to replace.

ZJIT: Fix definite assignment to work with multiple entry blocks

19f0df0

ZJIT: Inline Kernel#class (ruby#15397)

3efd8c6

We generally know the receiver's class from profile info. I see 600k of these when running lobsters.

BurdetteLamar and others added 27 commits December 10, 2025 15:16

[ruby/stringio] [DOC] Fix link

668fe01

ruby/stringio@e2d24ae8d7

[ruby/stringio] [DOC] Tweaks for StringIO.getbyte

f623fcc

(ruby/stringio#188) ruby/stringio@66360ee5f1

[ruby/stringio] [DOC] Tweaks for StringIO#gets

5bc65db

(ruby/stringio#190) ruby/stringio@77209fac20

[ruby/stringio] [DOC] Tweaks for StringIO#each_line

b4a1f17

(ruby/stringio#165) Adds to "Position": pos inside a character. Makes a couple of minor corrections. --------- ruby/stringio@ff332abafa Co-authored-by: Sutou Kouhei <kou@cozmixng.org>

[ruby/stringio] [DOC] Doc for StringIO.size

6ec5c5f

(ruby/stringio#171) ruby/stringio@95a111017a

[ruby/win32-registry] v0.1.2

254653d

ruby/win32-registry@2a6ab00f67

Update default gems list at 254653d [ci skip]

a8b7fb7

[ruby/optparse] v0.8.1

8e87f20

ruby/optparse@f2e31e81a5

Update default gems list at 8e87f20 [ci skip]

492b1c7

Fix typo and shadowing

375025a

[ruby/forwardable] v1.4.0

e8a5527

ruby/forwardable@0257b590c2

Update default gems list at e8a5527 [ci skip]

ef4490d

Monitor: avoid repeated calls to rb_fiber_current()

c5608ab

That call is surprisingly expensive, so doing it once in `#synchronize` and then passing the fiber to enter and exit saves quite a few cycles.

Modernize Monitor TypedData

6777d10

Make it embedded and compaction aware.

[ruby/json] Add a specific error for unescaped newlines

023c6d8

It's the most likely control character so it's worth giving a better error message for it. ruby/json@1da3fd9233

Fix typos in comment of rb_current_execution_context()

2b66fc7

ZJIT: Check if shape is too complex before reading ivar by index (rub…

ed18a21

…y#15478) This fixes a crash when the new shape after a transition is too complex; we need to check that it's not complex before trying to read by index.

ZJIT: Exclude failing ruby-bench benchmarks (ruby#15479)

1eb10ca

Always treat encoding as TYPEDDATA

41ee658

Encodings are RTypedData, not the deprecated RData. Although the structures are compatible we should use the correct API.

ubuntu.yml: Add a ruby-bench job without ZJIT (ruby#15480)

330ddcc

Run zjit-test-update

3bc97b9

Add a test that we don't fold non-BasicObject

72ab0c3

Small cleanups

54379ac

tekknolagi closed this Dec 10, 2025

tekknolagi deleted the zjit-frozen-constant-folding branch December 10, 2025 21:18


20 participants
