Conversation

@tobi
Member

@tobi tobi commented Dec 10, 2025

Summary

This PR adds a compile-time optimization that folds LoadField instructions on frozen objects into constants. When reading instance variables from frozen constant objects (via attr_reader/attr_accessor), the JIT can now resolve the value at compile time rather than at runtime.

Before

v25:HeapObject[VALUE(0x1008)] = GuardShape v20, 0x1048
v26:BasicObject = LoadField v25, :@a@0x1049

After

v25:HeapObject[VALUE(0x1008)] = GuardShape v20, 0x1048
v27:Fixnum[1] = Const Value(1)

This enables further optimizations like constant propagation and arithmetic folding. For example, accessing two fields from a frozen object and adding them can now be fully folded to a single constant.
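
As a rough illustration of the kind of Ruby code this targets, here is a minimal sketch. The names `Point`, `OFFSET`, and `sum_fields` are hypothetical and not taken from the PR's tests. The sketch only shows the source-level pattern (a frozen object held in a constant and read through `attr_reader`); whether a given build actually folds it is something to confirm with `--zjit-dump-hir`.

```ruby
# Hypothetical example; class and constant names are illustrative only.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# A frozen object referenced through a constant: its instance variables
# can no longer be reassigned, which is the precondition described above.
OFFSET = Point.new(1, 2).freeze

# Both reads go through attr_reader on a frozen constant receiver, so in
# principle the whole expression could be reduced to one constant.
def sum_fields
  OFFSET.x + OFFSET.y
end

puts sum_fields # prints 3
```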

Test plan

  • Added 10 unit tests covering:
    • Basic frozen constant objects with various value types (Fixnum, String, Symbol, nil, true/false)
    • Multiple instance variables (correct offset handling)
    • Negative cases: unfrozen objects and dynamic receivers correctly remain as LoadField
    • Nested access with arithmetic (verifies constant propagation cascades)
  • Verified optimization works manually with --zjit-dump-hir

byroot and others added 30 commits December 3, 2025 14:13
The "EXIVAR" terminology has been replaced by "gen fields"
AKA "generic fields".

Exivar implies variable, but generic fields include more than
just variables, e.g. `object_id`.
The NEWOBJ tracepoint can generate an object_id, that's alright,
what we don't want is actual instance variables.
While profiling `Monitor#synchronize` and `Mutex#synchronize`
I noticed a fairly significant amount of time spent in
`rb_check_typeddata`.

By implementing a fast path that assumes the object is valid
and that can be inlined, it does make a significant difference:

Before:

```
  Mutex     13.548M (± 3.6%) i/s   (73.81 ns/i) -     68.566M in   5.067444
Monitor     10.497M (± 6.5%) i/s   (95.27 ns/i) -     52.529M in   5.032698s
```

After:

```
  Mutex     20.887M (± 0.3%) i/s   (47.88 ns/i) -    106.021M in   5.075989s
Monitor     16.245M (±13.3%) i/s   (61.56 ns/i) -     80.705M in   5.099680s
```

```ruby
require 'bundler/inline'

gemfile do
  gem "benchmark-ips"
end

mutex = Mutex.new
require "monitor"
monitor = Monitor.new

Benchmark.ips do |x|
  x.report("Mutex") { mutex.synchronize { } }
  x.report("Monitor") { monitor.synchronize { } }
end
```
… by heredocs

See https://bugs.ruby-lang.org/issues/21756. Ripper fails to parse this,
but prism actually also doesn't handle it correctly.

When heredocs are used, even in lowercase percent arrays there can be
multiple `STRING_CONTENT` tokens. We need to concat them.

Luckily we don't need to handle as many cases as in uppercase arrays where interpolation is allowed.

ruby/prism@211677000e
Not so sure how to trigger it but this is definitely more correct.

ruby/prism@1bc8ec5e5d
Attempt to fix the following SEGV:

```
ruby(gc_mark) ../src/gc/default/default.c:4429
ruby(gc_mark_children+0x45) [0x560b380bf8b5] ../src/gc/default/default.c:4625
ruby(gc_mark_stacked_objects) ../src/gc/default/default.c:4647
ruby(gc_mark_stacked_objects_all) ../src/gc/default/default.c:4685
ruby(gc_marks_rest) ../src/gc/default/default.c:5707
ruby(gc_marks+0x4e7) [0x560b380c41c1] ../src/gc/default/default.c:5821
ruby(gc_start) ../src/gc/default/default.c:6502
ruby(heap_prepare+0xa4) [0x560b380c4efc] ../src/gc/default/default.c:2074
ruby(heap_next_free_page) ../src/gc/default/default.c:2289
ruby(newobj_cache_miss) ../src/gc/default/default.c:2396
ruby(RB_SPECIAL_CONST_P+0x0) [0x560b380c5df4] ../src/gc/default/default.c:2420
ruby(RB_BUILTIN_TYPE) ../src/include/ruby/internal/value_type.h:184
ruby(newobj_init) ../src/gc/default/default.c:2136
ruby(rb_gc_impl_new_obj) ../src/gc/default/default.c:2500
ruby(newobj_of) ../src/gc.c:996
ruby(rb_imemo_new+0x37) [0x560b380d8bed] ../src/imemo.c:46
ruby(imemo_fields_new) ../src/imemo.c:105
ruby(rb_imemo_fields_new) ../src/imemo.c:120
```

I have no reproduction, but my understanding based on the backtrace
and error is that GC is triggered inside `newobj_init`, causing the
new object to be marked while in an incomplete state.

I believe the fix is to pass the `shape_id` down to `newobj_init`
so it can be set before the GC has a chance to trigger.
If GC triggers in the middle of `struct_alloc`, and the struct has more
than 3 elements, then the `fields_obj` reference is garbage.

We must first check the shape to know if it was actually initialized.
This commit adds a specialized instruction iterator to the assembler
with a custom "peek" method.  The reason is that we want to add basic
blocks to LIR.  When we split instructions, we need to add any new
instructions to the correct basic block.  The custom iterator will
maintain the correct basic block inside the assembler, that way when we
push any new instructions they will be appended to the correct place.
This commit uses the custom instruction iterator in arm64 / x86_64
instruction splitting.  Once we introduce basic blocks to LIR, the
custom iterator will ensure that instructions are added to the correct
place.
This should never be true. I added an `rb_bug` in case it was and it
wasn't true in any of btest or test-all.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Not every caller (for example, YJIT) actually needs to pass the object.
YJIT (and, in the future, ZJIT) only need to pass the class.
In the past parse.y and ripper had different `value_expr` definition
so that `value_expr` does nothing for ripper.

```c
// parse.y
#define value_expr(node) value_expr_gen(p, (node))

// ripper
#define value_expr(node) ((void)(node))
```

However, Rearchitect Ripper (89cfc15) removed the `value_expr`
definition for ripper, so this commit removes the now-needless
parse.y macro and uses `value_expr_gen` directly.
In the past parse.y and ripper had different `new_nil` definition
so that `new_nil` returns `nil` for ripper.

```c
// parse.y
#define new_nil(loc) NEW_NIL(loc)

// ripper
#define new_nil(loc) Qnil
```

However, Rearchitect Ripper (89cfc15) removed the `new_nil`
definition for ripper, so this commit removes the now-needless
parse.y macro and uses `NEW_NIL` directly.
Fixes: ruby#685

This feature can easily break how you use other gems like factory_bot or prawn.

ruby/psych#747 (comment)
> But I kind of think we should leave `psych/y` around. If people really want to use it they could require the file.

If you miss the function in Kernel, you can require it interactively or add it to `.irbrc`:
```ruby
require 'psych/y'
```

ruby/psych@f1610b3f05
It's used as an alternative to find-and-replace, so we should have
nothing to replace.
We generally know the receiver's class from profile info. I see 600k of these when running lobsters.
Since we do a decent job of pre-sizing objects, don't handle the case where we would need to re-size an object. Also don't handle too-complex shapes.

lobsters stats before:

```
Top-20 calls to C functions from JIT code (79.4% of total 90,051,140):
                             rb_vm_opt_send_without_block: 19,762,433 (21.9%)
                                rb_vm_setinstancevariable:  7,698,314 ( 8.5%)
                                             rb_hash_aref:  6,767,461 ( 7.5%)
                                          rb_vm_env_write:  5,373,080 ( 6.0%)
                                               rb_vm_send:  5,049,229 ( 5.6%)
                                rb_vm_getinstancevariable:  4,535,259 ( 5.0%)
                                        rb_obj_is_kind_of:  3,746,306 ( 4.2%)
                           rb_ivar_get_at_no_ractor_check:  3,745,237 ( 4.2%)
                                        rb_vm_invokesuper:  3,037,467 ( 3.4%)
                                             rb_ary_entry:  2,351,983 ( 2.6%)
                               rb_vm_opt_getconstant_path:  1,344,740 ( 1.5%)
                                        rb_vm_invokeblock:  1,184,474 ( 1.3%)
                                                 Hash#[]=:  1,064,288 ( 1.2%)
                                       rb_gc_writebarrier:  1,006,972 ( 1.1%)
                                rb_ec_ary_new_from_values:    902,687 ( 1.0%)
                                                    fetch:    898,667 ( 1.0%)
                                        rb_str_buf_append:    833,787 ( 0.9%)
                               rb_class_allocate_instance:    822,024 ( 0.9%)
                                               Hash#fetch:    699,580 ( 0.8%)
                                                    _bi20:    682,068 ( 0.8%)
Top-4 setivar fallback reasons (100.0% of total 7,732,326):
  shape_transition: 6,032,109 (78.0%)
   not_monomorphic: 1,469,300 (19.0%)
      not_t_object:   172,636 ( 2.2%)
       too_complex:    58,281 ( 0.8%)
```

lobsters stats after:

```
Top-20 calls to C functions from JIT code (79.0% of total 88,322,656):
                             rb_vm_opt_send_without_block: 19,777,880 (22.4%)
                                             rb_hash_aref:  6,771,589 ( 7.7%)
                                          rb_vm_env_write:  5,372,789 ( 6.1%)
                                       rb_gc_writebarrier:  5,195,527 ( 5.9%)
                                               rb_vm_send:  5,049,145 ( 5.7%)
                                rb_vm_getinstancevariable:  4,538,485 ( 5.1%)
                                        rb_obj_is_kind_of:  3,746,241 ( 4.2%)
                           rb_ivar_get_at_no_ractor_check:  3,745,172 ( 4.2%)
                                        rb_vm_invokesuper:  3,037,157 ( 3.4%)
                                             rb_ary_entry:  2,351,968 ( 2.7%)
                                rb_vm_setinstancevariable:  1,703,337 ( 1.9%)
                               rb_vm_opt_getconstant_path:  1,344,730 ( 1.5%)
                                        rb_vm_invokeblock:  1,184,290 ( 1.3%)
                                                 Hash#[]=:  1,061,868 ( 1.2%)
                                rb_ec_ary_new_from_values:    902,666 ( 1.0%)
                                                    fetch:    898,666 ( 1.0%)
                                        rb_str_buf_append:    833,784 ( 0.9%)
                               rb_class_allocate_instance:    821,778 ( 0.9%)
                                               Hash#fetch:    755,913 ( 0.9%)
Top-4 setivar fallback reasons (100.0% of total 1,703,337):
            not_monomorphic: 1,472,405 (86.4%)
               not_t_object:   172,629 (10.1%)
                too_complex:    58,281 ( 3.4%)
  new_shape_needs_extension:        22 ( 0.0%)
```

I also noticed that primitive printing in HIR was broken so I fixed that.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
… increase:

- ### TL;DR

  Bundler is heavily limited by the connection pool, which manages a
  single connection. By increasing the number of connections, we can
  drastically speed up the installation process when many gems need
  to be downloaded and installed.

  ### Benchmark

  There are various factors that are hard to control, such as
  compilation time and network speed, but after dozens of tests I
  can consistently get around 70% speed increase when downloading and
  installing 472 gems, most having no native extensions (on purpose).

  ```
  # Before
  bundle install  28.60s user 12.70s system 179% cpu 23.014 total

  # After
  bundle install  30.09s user 15.90s system 281% cpu 16.317 total
  ```

  This gist shows how this was benchmarked and the Gemfile
  used: https://gist.github.com/Edouard-chin/c8e39148c0cdf324dae827716fbe24a0

  ### Context

  A while ago in #869, Aaron introduced a connection pool which
  greatly improved Bundler speed. It was noted in the PR description
  that managing one connection was already good enough and it wasn't
  clear whether we needed more connections. Aaron also had the
  intuition that we may need to increase the pool for downloading
  gems and he was right.

  > We need to study how RubyGems uses connections and make a decision
  > based on request usage (e.g. only use one connection for many small
  > requests like bundler API, and maybe many connections for
  > downloading gems)

  When bundler downloads and installs gems in parallel https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/installer/parallel_installer.rb#L128
  most threads have to wait for the only connection in the pool to be
  available, which is not efficient.

  ### Solution

  This commit modifies the pool size for the fetcher that Bundler
  uses. RubyGems fetcher will continue to use a single connection.

  The bundler fetcher is used in 2 places.

  1. When downloading gems https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/source/rubygems.rb#L481-L484
  2. When grabbing the index (not the compact index) using the
    `bundle install --full-index` flag.
    https://github.com/ruby/rubygems/blob/4f85e02fdd89ee28852722dfed42a13c9f5c9193/bundler/lib/bundler/fetcher/index.rb#L9

  Having more connections in 2) is not useful, but tweaking the
  size based on where the fetcher is used is a bit tricky so I opted
  to modify it at the class level.
  I fiddled with the pool size and found that 5 seems to be the sweet
  spot, at least for my environment.
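
To make the bottleneck concrete, here is a small self-contained model of worker threads sharing a fixed-size pool. It is only a sketch: `FakePool`, `download_all`, and the `sleep` standing in for a network request are all hypothetical, and none of it is Bundler's or RubyGems' actual fetcher API.

```ruby
# Hypothetical model of a fixed-size connection pool; not Bundler's API.
class FakePool
  def initialize(size)
    @connections = Queue.new
    size.times { |i| @connections << "conn-#{i}" }
  end

  # Check out a connection, blocking until one is free, and always
  # hand it back to the pool afterwards.
  def with
    conn = @connections.pop
    yield conn
  ensure
    @connections << conn if conn
  end
end

# Run one simulated download per gem name across `workers` threads.
def download_all(pool, gem_names, workers: 5)
  jobs = Queue.new
  gem_names.each { |name| jobs << name }
  jobs.close # pop returns nil once the queue is drained

  results = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (name = jobs.pop)
        pool.with do |conn|
          sleep 0.01 # stand-in for the network request
          results << [name, conn]
        end
      end
    end
  end
  threads.each(&:join)
  Array.new(results.size) { results.pop }
end

names = (1..10).map { |i| "gem-#{i}" }
done = download_all(FakePool.new(5), names)
puts "downloaded #{done.size} gems using #{done.map(&:last).uniq.size} connection(s)"
```

With `FakePool.new(1)` every worker serializes on the single connection, which is the situation described above; a larger pool lets the simulated requests overlap.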

ruby/rubygems@6063fd9963
BurdetteLamar and others added 27 commits December 10, 2025 15:16
(ruby/stringio#165)

Adds to "Position":  pos inside a character.

Makes a couple of minor corrections.

---------

ruby/stringio@ff332abafa

Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
This refactors the concurrent set to examine and reserve a slot via CAS
with the hash, before then doing the same with the key.

This allows us to use an extra bit from the hash as a "continuation bit"
which marks whether we have ever probed past this key while inserting.
When that bit isn't set on deletion we can clear the field instead of
placing a tombstone.
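
The continuation-bit idea can be sketched with a toy, single-threaded linear-probing set. This is an assumption-laden model: `ToySet`, `Slot`, and `TOMBSTONE` are hypothetical names, it uses no CAS, it does not store hashes, and it accepts only Integer keys so that slot positions are deterministic. It is not the concurrent set's implementation.

```ruby
# Toy single-threaded model of a linear-probing set with a continuation bit.
# All names here are hypothetical; this is not CRuby's concurrent set.
class ToySet
  Slot = Struct.new(:key, :continuation)
  TOMBSTONE = Object.new.freeze

  def initialize(capacity)
    @slots = Array.new(capacity) { Slot.new(nil, false) }
  end

  def include?(key)
    each_probe(key) do |slot|
      return true if slot.key == key
      return false if slot.key.nil? # a truly empty slot ends the chain
    end
    false
  end

  def insert(key)
    return false if include?(key)
    each_probe(key) do |slot|
      if slot.key.nil? || slot.key.equal?(TOMBSTONE)
        slot.key = key
        return true
      end
      slot.continuation = true # an insert probed past this occupied slot
    end
    raise "set is full"
  end

  def delete(key)
    each_probe(key) do |slot|
      return false if slot.key.nil?
      next unless slot.key == key
      # If no insert ever probed past this slot, it can simply be cleared;
      # otherwise a tombstone keeps later entries reachable.
      slot.key = slot.continuation ? TOMBSTONE : nil
      return true
    end
    false
  end

  # Inspection helper: :empty, :tombstone, or :occupied.
  def slot_state(index)
    key = @slots[index].key
    return :empty if key.nil?
    key.equal?(TOMBSTONE) ? :tombstone : :occupied
  end

  private

  # Integer keys only, so the home slot is simply key modulo capacity.
  def each_probe(key)
    start = key % @slots.size
    @slots.size.times { |i| yield @slots[(start + i) % @slots.size] }
  end
end

set = ToySet.new(8)
set.insert(1) # lands in slot 1
set.insert(9) # collides with 1, lands in slot 2, marks slot 1
set.delete(9) # slot 2 was never probed past, so it is cleared
set.delete(1) # slot 1 was probed past, so it becomes a tombstone
puts set.slot_state(2) # prints empty
puts set.slot_state(1) # prints tombstone
```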
This removes all allocations from the find_or_insert loop, which
requires us to start the search over after calling the provided create
function.

In exchange that allows us to assume that all concurrently inserting
threads will get the same view of the GC state, and so should all be
attempting to clear and reuse a slot containing a garbage object.
Fix: ruby/forwardable#35
[Bug #21708]

Trying to compile code to check if a method can use the delegation
fastpath is a bit wasteful and causes `RUBYOPT=-d` to be full of
misleading errors.

It's simpler and faster to use a simple regexp to do the same check.
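
A regexp-based check of this kind might look like the sketch below. Both the pattern and the helper name `plain_method_name?` are hypothetical; the actual regexp used by forwardable is not shown here and may accept a different set of names.

```ruby
# Hypothetical pattern: an ASCII identifier with an optional ? or ! suffix,
# i.e. a name that can be written directly after a dot in source code.
PLAIN_METHOD_NAME = /\A[_a-zA-Z][_a-zA-Z0-9]*[?!]?\z/

# Returns true when the name matches the pattern, with no compilation
# step and therefore nothing for debug mode to report.
def plain_method_name?(name)
  PLAIN_METHOD_NAME.match?(name.to_s)
end

puts plain_method_name?(:size)  # prints true
puts plain_method_name?(:"[]") # prints false
```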

ruby/forwardable@de1fbd182e
That call is surprisingly expensive, so doing it once
in `#synchronize` and then passing the fiber to enter and exit
saves quite a few cycles.
Make it embedded and compaction aware.
It's the most likely control character so it's worth
giving a better error message for it.

ruby/json@1da3fd9233
…y#15478)

This fixes a crash when the new shape after a transition is too complex;
we need to check that it's not too complex before trying to read by index.
Encodings are RTypedData, not the deprecated RData. Although the
structures are compatible we should use the correct API.
When accessing instance variables from frozen objects via attr_reader/
attr_accessor, fold the LoadField instruction to a constant at compile
time. This enables further optimizations like constant propagation.

- Add fold_getinstancevariable_frozen optimization in Function::optimize
- Check if receiver type has a known ruby_object() that is frozen
- Read the field value at compile time and replace with Const instruction
- Add 10 unit tests covering various value types (fixnum, string, symbol,
  nil, true/false) and negative cases (unfrozen, dynamic receiver)
@tekknolagi

Closing in favor of ruby#15483

@tekknolagi tekknolagi closed this Dec 10, 2025
@tekknolagi tekknolagi deleted the zjit-frozen-constant-folding branch December 10, 2025 21:18