Skip file_exists() on the hot path: an OPcache-cached require needs no stat() #134
Open. koriym wants to merge 3 commits into `ray-di:1.x` from `koriym:skip-file-exists-opcache-hotpath`.
+313
−15
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| # DI strategy benchmark | ||
|
|
||
| > [!WARNING] | ||
| > **Compile-test only — not part of the library.** It is not autoloaded and is never | ||
| > used in production. It exists solely to measure the compiled injector against the | ||
| > reflection and `serialize()`d injectors. | ||
|
|
||
| `di_benchmark.php` builds the same object graph three ways — `Ray\Di\Injector` | ||
| (reflection), a `serialize()`d injector, and `Ray\Compiler\CompiledInjector` — and reports | ||
| cold-start and steady-state (per-build) cost. | ||
|
|
||
| ## Run | ||
|
|
||
| Requires `vendor/` (`composer install`). Use production-like settings (Xdebug off, OPcache on): | ||
|
|
||
| ```bash | ||
| php -d xdebug.mode=off -d opcache.enable_cli=1 -d opcache.validate_timestamps=0 benchmark/di_benchmark.php | ||
| ``` | ||
|
|
||
| ## Reading the output | ||
|
|
||
| The first line is an **OPcache self-check** — the numbers are only trustworthy when it says `(valid)`: | ||
|
|
||
| ```text | ||
| OPcache: hit 100.0%, 9 compiled scripts cached (valid) | ||
| ``` | ||
|
|
||
| If it prints `INVALID`, OPcache is not caching the freshly generated scripts, so `compiled` is | ||
| re-parsing and looks several times slower than it really is. The script back-dates the generated | ||
| files to avoid this automatically; if it still reports `INVALID`, re-run with | ||
| `-d opcache.file_update_protection=0`. | ||
|
|
||
| ## Background | ||
|
|
||
| For the three strategies, why OPcache is the prerequisite, the measured results, and the full list | ||
| of benchmarking pitfalls, see **[docs/performance.md](../docs/performance.md)**. |
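The README above asks for production-like settings (Xdebug off, OPcache on, timestamp validation off). A minimal sketch of a pre-flight check for those settings follows. `benchmarkSettingProblems()` is a hypothetical helper introduced here for illustration; it is not part of this PR or of Ray.Compiler, and it only inspects ini values, so it does not replace the `(valid)` self-check the benchmark prints.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical pre-flight helper (not part of this PR): lists reasons why the
 * given PHP settings would make a benchmark run unrepresentative.
 *
 * @param array<string, string|false> $ini          setting name => raw ini_get() value
 * @param bool                        $xdebugLoaded whether the xdebug extension is loaded
 *
 * @return list<string> human-readable problems; empty when the settings look fine
 */
function benchmarkSettingProblems(array $ini, bool $xdebugLoaded): array
{
    $problems = [];

    // ini_get() returns false when the setting (or the extension) is missing.
    if (! filter_var($ini['opcache.enable_cli'] ?? false, FILTER_VALIDATE_BOOLEAN)) {
        $problems[] = 'opcache.enable_cli is off: scripts are re-parsed on every require';
    }

    if (filter_var($ini['opcache.validate_timestamps'] ?? false, FILTER_VALIDATE_BOOLEAN)) {
        $problems[] = 'opcache.validate_timestamps is on: OPcache may stat() scripts to check for changes';
    }

    if ($xdebugLoaded && ($ini['xdebug.mode'] ?? false) !== 'off') {
        $problems[] = 'Xdebug is active: it inflates every timing';
    }

    return $problems;
}

// Collect the live settings and print one warning per problem found.
$settings = [];
foreach (['opcache.enable_cli', 'opcache.validate_timestamps', 'xdebug.mode'] as $name) {
    $settings[$name] = ini_get($name);
}

foreach (benchmarkSettingProblems($settings, extension_loaded('xdebug')) as $problem) {
    echo 'WARNING: ', $problem, PHP_EOL;
}
```

Passing the settings in as an array keeps the check a pure function, so it can be exercised without changing the interpreter configuration.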
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,120 @@ | ||
| <?php | ||
|
|
||
| /** | ||
| * DI strategy benchmark — COMPILE TEST ONLY. | ||
| * | ||
| * Compares three strategies for building objects from a Ray.Di module: | ||
| * 1. reflection : Ray\Di\Injector (runtime reflection, warm Container) | ||
| * 2. serialize : serialize/unserialize Injector (warm Container restored from a blob) | ||
| * 3. compiled : Ray\Compiler\CompiledInjector (pre-compiled PHP scripts) | ||
| * | ||
| * It measures the two regimes that matter: | ||
| * - cold start : build the Container once (reflection) / unserialize (serialize) / require (compiled) | ||
| * - steady state : build a PROTOTYPE object repeatedly (the "build-many" regime, e.g. Grapher / entity hydration) | ||
| * | ||
| * NOT part of the library. Not autoloaded. Do not use in production. | ||
| * See benchmark/README.md. | ||
| * | ||
| * Run with realistic settings (Xdebug off, OPcache on): | ||
| * php -d xdebug.mode=off -d opcache.enable_cli=1 -d opcache.validate_timestamps=0 benchmark/di_benchmark.php | ||
| */ | ||
|
|
||
| declare(strict_types=1); | ||
|
|
||
| use Ray\Compiler\CompiledInjector; | ||
| use Ray\Compiler\Compiler; | ||
| use Ray\Compiler\FakeCarInterface; | ||
| use Ray\Compiler\FakeCarModule; | ||
| use Ray\Di\Injector; | ||
|
|
||
| require __DIR__ . '/../vendor/autoload.php'; | ||
|
|
||
| const ITERATIONS = 50000; | ||
|
|
||
| $interface = FakeCarInterface::class; | ||
| $tmp = sys_get_temp_dir() . '/ray_compiler_bench_' . getmypid(); | ||
| $aopDir = $tmp . '/aop'; | ||
| $diDir = $tmp . '/di'; | ||
| @mkdir($aopDir, 0777, true); | ||
| @mkdir($diDir, 0777, true); | ||
|
|
||
| /** @return array{0: float, 1: mixed} elapsed milliseconds and the callback result */ | ||
| function measure(callable $fn): array | ||
| { | ||
| $start = hrtime(true); | ||
| $result = $fn(); | ||
|
|
||
| return [(hrtime(true) - $start) / 1e6, $result]; | ||
| } | ||
|
|
||
| /** @return float microseconds per operation */ | ||
| function steady(callable $fn, int $iterations): float | ||
| { | ||
| for ($i = 0; $i < 2000; $i++) { | ||
| $fn(); // warm up | ||
| } | ||
|
|
||
| $start = hrtime(true); | ||
| for ($i = 0; $i < $iterations; $i++) { | ||
| $fn(); | ||
| } | ||
|
|
||
| return (hrtime(true) - $start) / 1e3 / $iterations; | ||
| } | ||
|
|
||
| // --- 1. reflection: build the Container once, then build prototypes repeatedly --- | ||
| [$reflectColdMs, $injector] = measure(static fn (): Injector => new Injector(new FakeCarModule(), $aopDir)); | ||
| $reflectSteadyUs = steady(static fn () => $injector->getInstance($interface), ITERATIONS); | ||
|
|
||
| // --- 2. serialize: cache the warm injector, restore it from a blob --- | ||
| [$serializeMs, $blob] = measure(static fn (): string => serialize($injector)); | ||
| $blobKb = strlen($blob) / 1024; | ||
| [$unserializeMs, $restored] = measure(static fn () => unserialize($blob)); | ||
| assert($restored instanceof Injector); | ||
| $serializeSteadyUs = steady(static fn () => $restored->getInstance($interface), ITERATIONS); | ||
|
|
||
| // --- 3. compiled: compile offline, then build prototypes from scripts --- | ||
| // CRITICAL: the generated scripts must be served from OPcache to be representative | ||
| // of production. OPcache refuses to cache files younger than | ||
| // opcache.file_update_protection (default 2s), so a benchmark that compiles and | ||
| // measures immediately re-parses every script on every require and makes "compiled" | ||
| // look several times slower than it really is. Age the scripts past that window first. | ||
| [$compileMs] = measure(static fn () => (new Compiler())->compile(new FakeCarModule(), $diDir)); | ||
| $scriptCount = count((array) glob($diDir . '/*.php')); | ||
| $opcacheOn = function_exists('opcache_get_status') && (bool) ini_get('opcache.enable_cli'); | ||
| // Backdate the freshly written scripts so that OPcache accepts them. This mirrors | ||
| // production, where scripts are compiled at deploy time, long before being served. | ||
| $backdated = time() - 3600; | ||
| foreach ((array) glob($diDir . '/*.php') as $file) { | ||
| touch((string) $file, $backdated); | ||
| } | ||
| [$compiledColdMs, $compiled] = measure(static fn (): CompiledInjector => new CompiledInjector($diDir)); | ||
| $compiledSteadyUs = steady(static fn () => $compiled->getInstance($interface), ITERATIONS); | ||
|
|
||
| // OPcache self-check — if the compiled scripts are not cached, the result is invalid. | ||
| $opcacheNote = 'OPcache: off (compiled re-parses every call — not representative)'; | ||
| if ($opcacheOn) { | ||
| $status = opcache_get_status(true); | ||
| $cached = count(array_filter(array_keys((array) ($status['scripts'] ?? [])), static fn ($f): bool => str_contains((string) $f, $diDir))); | ||
| $rate = (float) ($status['opcache_statistics']['opcache_hit_rate'] ?? 0.0); | ||
| $opcacheNote = $cached > 0 | ||
| ? sprintf('OPcache: hit %.1f%%, %d compiled scripts cached (valid)', $rate, $cached) | ||
| : 'OPcache: 0 compiled scripts cached — INVALID, scripts are re-parsing. Re-run with -d opcache.file_update_protection=0'; | ||
| } | ||
|
|
||
| $peakMb = memory_get_peak_usage(true) / 1048576; | ||
|
|
||
| printf("Ray.Compiler DI benchmark — FakeCar prototype graph (ctor + 5 setters + AOP + singleton mirrors)\n"); | ||
| printf("iterations=%d php=%s opcache=%d xdebug=%d\n", ITERATIONS, PHP_VERSION, (int) ini_get('opcache.enable_cli'), (int) extension_loaded('xdebug')); | ||
| printf("%s\n\n", $opcacheNote); | ||
| printf("%-12s | %-22s | %-18s | %s\n", 'strategy', 'cold start', 'steady (build-many)', 'deploy artifact'); | ||
| printf("%s\n", str_repeat('-', 86)); | ||
| printf("%-12s | %-22s | %16.1f us | %s\n", 'reflection', sprintf('%.1f ms (build)', $reflectColdMs), $reflectSteadyUs, '-'); | ||
| printf("%-12s | %-22s | %16.1f us | %s\n", 'serialize', sprintf('%.2f ms (unserialize)', $unserializeMs), $serializeSteadyUs, sprintf('%.0f KB blob (ser %.1f ms)', $blobKb, $serializeMs)); | ||
| printf("%-12s | %-22s | %16.1f us | %s\n", 'compiled', sprintf('%.2f ms (new injector)', $compiledColdMs), $compiledSteadyUs, sprintf('%d scripts (compile %.0f ms)', $scriptCount, $compileMs)); | ||
| printf("\npeak memory: %.1f MB\n", $peakMb); | ||
|
|
||
| // cleanup | ||
| array_map('unlink', (array) glob($tmp . '/{,*/}*.*', GLOB_BRACE)); | ||
| @array_map('rmdir', [$aopDir, $diDir, $tmp]); |
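The `steady()` helper in the script above reports the mean over one long run. As a possible refinement, the sketch below times several batches and keeps the fastest one, which tends to be less sensitive to scheduler noise. `steadyBest()` and its `$batches` parameter are hypothetical names introduced here; they are not part of the PR.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical variant of steady(): time several batches and return the
 * fastest per-operation figure. Earlier batches double as warm-up because
 * only the minimum is kept.
 *
 * @return float microseconds per operation (best batch)
 */
function steadyBest(callable $fn, int $iterations, int $batches = 5): float
{
    if ($iterations < 1 || $batches < 1) {
        throw new InvalidArgumentException('iterations and batches must be >= 1');
    }

    $best = INF;
    for ($b = 0; $b < $batches; $b++) {
        $start = hrtime(true); // monotonic clock, nanoseconds
        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }

        // nanoseconds -> microseconds, then divide by the batch size
        $perOpUs = (hrtime(true) - $start) / 1e3 / $iterations;
        $best = min($best, $perOpUs);
    }

    return $best;
}

// Stand-in workload; in the benchmark this would be the getInstance() closure.
$us = steadyBest(static fn (): string => str_repeat('x', 64), 10000);
printf("best of 5 batches: %.3f us/op\n", $us);
```

Whether the minimum or the mean is the right statistic depends on what is being compared; the minimum approximates the cost with the least interference, while the mean reflects typical load.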
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,108 @@ | ||
| # Performance & OPcache | ||
|
|
||
| How Ray.Compiler is fast, the one prerequisite that makes it fast, and how to measure it | ||
| without fooling yourself. Distilled from benchmarking against `Ray\Di\Injector` (reflection) | ||
| and a `serialize()`d injector. | ||
|
|
||
| ## TL;DR | ||
|
|
||
| - **`CompiledInjector` is the fastest strategy — but only when its scripts are served from OPcache.** | ||
| - Without warm OPcache, every `require` **re-parses** the script and compiled looks **several times slower than it really is**: the measurements below show ~178 µs per build without warm OPcache vs ~22 µs with it (~8×). | ||
| - In production this is a non-issue: scripts are compiled at deploy time and OPcache keeps the opcodes in shared memory. In **benchmarks** it is the single biggest source of wrong numbers (see below). | ||
|
coderabbitai[bot] marked this conversation as resolved.
|
||
|
|
||
| ## The three strategies | ||
|
|
||
| | Strategy | What runs per object graph | Runtime cost | Notes | | ||
| |---|---|---|---| | ||
| | **reflection** (`Ray\Di\Injector`) | Build the whole `Container` from the module (annotation reading, binding resolution, AOP weaving), then instantiate via reflection | Container build is **hundreds of ms** for a large app, paid **every process** | Dev only. Untenable for shared-nothing (php-fpm). | | ||
| | **serialize** (`serialize()` the injector, `unserialize()` per request) | `unserialize()` reconstructs the `Container` object graph, then instantiate via reflection | Dominated by **`unserialize()`** — paid **every process** (a blob is data; it cannot live in shared OPcache) | Scales linearly with the binding set. | | ||
| | **compiled** (`CompiledInjector`) | `require` the few pre-generated scripts the graph touches, run flat `new`/setter code | **Lazy** — only the needed scripts; opcodes served from OPcache | Scripts/classes can be **preloaded** into shared memory across php-fpm workers. | | ||
|
|
||
| Key consequence: the heavy work (annotation reading, AOP class generation, binding analysis) is | ||
| done **once at compile/serialize time** for both `serialize` and `compiled`. What remains at runtime | ||
| is instantiation. `compiled` wins because flat, OPcache-cached opcodes beat reflection's dynamic | ||
| dispatch — and because it loads only the subset of bindings a given request actually uses. | ||
|
|
||
| ## Why OPcache is the prerequisite | ||
|
|
||
| A `require` of a script that is **already in OPcache** (with `opcache.validate_timestamps=0`, or a | ||
| warm realpath cache) does **no filesystem access and no parsing** — it just executes cached opcodes. | ||
| That is what makes compiled code fast. | ||
|
|
||
| Two settings decide whether that happens: | ||
|
|
||
| - **`opcache.validate_timestamps`** — set to `0` in production so OPcache never `stat()`s the file to | ||
| check for changes. | ||
| - **`opcache.file_update_protection`** (default **2 seconds**) — OPcache refuses to cache a file that | ||
| is *younger than this*, to avoid caching a half-written file. A process that **compiles and then | ||
| immediately runs** therefore re-parses every `require`. In production the gap between deploy-time | ||
| compilation and the first request is far larger than 2s, so this never bites; in a benchmark it | ||
| always does. | ||
|
|
||
| For php-fpm (shared-nothing), also note **preloading**: compiled scripts/classes can be loaded into | ||
| OPcache *shared memory* once and reused by every worker. A `serialize`d blob cannot — it is data and | ||
| must be `unserialize()`d into per-process memory on every request. | ||
|
|
||
| ## The `file_exists()` optimization | ||
|
|
||
| Because a cached `require` touches no filesystem, an eager `file_exists()` guard before it would be the | ||
| **only** `stat()` syscall left on the hot path — roughly **30% of the per-build cost** for a small | ||
| graph. So `prototype()` and `singleton()` `require` the script directly and check existence only on | ||
| failure: | ||
|
|
||
| ```php | ||
| try { | ||
| return require $file; // happy path: no stat(), just cached opcodes | ||
| } catch (Throwable $e) { | ||
| if (! file_exists($file)) { // stat() only on failure | ||
| throw new ScriptFileNotFound($file, 0, $e); | ||
| } | ||
| throw $e; // file exists -> error came from inside the script | ||
| } | ||
| ``` | ||
|
|
||
| PHP 8 makes a failed `require` a catchable `Error`, and `try`/`catch` is zero-cost when nothing is | ||
| thrown, so the happy path pays nothing while a missing script is still reported as the domain | ||
| `ScriptFileNotFound` (rather than a leaked generic `Error`). `CompiledInjector::getInstance()` keeps its | ||
| `file_exists()` pre-check (it reports unbound interfaces as `Unbound`); its redundant | ||
| `realpath($this->scriptDir)` — already canonicalised in the constructor — was removed. | ||
|
|
||
| ## Benchmarking correctly | ||
|
|
||
| `benchmark/di_benchmark.php` compares the three strategies and **prints the OPcache hit rate so you can | ||
| tell a valid run from a bogus one**. Pitfalls it (and you) must control for: | ||
|
|
||
| 1. **OPcache must actually cache the compiled scripts.** Back-date generated scripts | ||
| (`touch($file, time() - 3600)`) or run with `-d opcache.file_update_protection=0`. A `sleep()` does | ||
| **not** work on the CLI — OPcache's age check uses the request start time, not wall-clock. Always | ||
| confirm the benchmark reports `(valid)` / a non-zero cached-script count; if it prints `INVALID`, | ||
| the numbers are re-parsing artifacts. | ||
| 2. **Disable Xdebug** (`-d xdebug.mode=off`) — it inflates everything. | ||
| 3. **Watch for a stale global `opcache.preload`** in your `php.ini` — it pollutes shared memory and can | ||
| emit startup errors. Override it with an empty preload file. | ||
| 4. **Class-autoload warmth** — the first object graph in a process autoloads all its classes (a | ||
| one-time cost). Measure cold (fresh process) and warm (repeated build) separately; don't compare a | ||
| cold number against a warm one. | ||
| 5. **Singletons aren't "build-many."** A singleton root is built once per process — a tight loop over | ||
| it measures cache hits, not construction. Use a prototype root to measure per-build cost. | ||
| 6. **Object size matters.** A tiny graph hides the lazy-loading advantage of `compiled` over | ||
| `serialize`; benchmark a realistic root. | ||
|
|
||
| ## Measured (FakeCar graph, PHP 8.4, OPcache valid) | ||
|
|
||
| `ctor + 5 setters + AOP + singleton mirrors`, `N=50,000`, steady-state (warm) per build: | ||
|
|
||
| | Strategy | Cold start | Steady (per build) | | ||
| |---|---|---| | ||
| | reflection | ~11–25 ms (Container build) | ~46 µs | | ||
| | serialize | ~0.1 ms (unserialize) | ~47 µs | | ||
| | **compiled** | ~0.03 ms (new injector) | **~22 µs** | | ||
|
|
||
| `compiled` is ~2× faster per build than reflection/serialize once OPcache is warm. The same run | ||
| without warm OPcache shows `compiled` at ~178 µs per build: the re-parse trap. Always check the `(valid)` line. | ||
|
|
||
| Run it yourself: | ||
|
|
||
| ```bash | ||
| php -d xdebug.mode=off -d opcache.enable_cli=1 -d opcache.validate_timestamps=0 benchmark/di_benchmark.php | ||
| ``` | ||
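The "require first, check existence only on failure" pattern from the `file_exists()` optimization section can be tried in isolation. The sketch below is a self-contained approximation, assuming PHP 8+ (where a failed `require` throws `Error`). `ScriptFileNotFoundSketch` and `requireScript()` are hypothetical stand-ins; the library's real exception class and function signatures may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the library's domain exception.
final class ScriptFileNotFoundSketch extends RuntimeException
{
}

/**
 * Require a script without a file_exists() pre-check; stat() only on failure.
 *
 * @return mixed whatever the required script returns
 */
function requireScript(string $file): mixed
{
    try {
        return require $file; // happy path: no explicit existence check
    } catch (Throwable $e) {
        if (! file_exists($file)) {
            // Missing file: translate the generic Error into a domain exception.
            throw new ScriptFileNotFoundSketch($file, 0, $e);
        }

        throw $e; // the file exists, so the error came from inside the script
    }
}

// Usage: one script that exists and one path that does not.
$dir = sys_get_temp_dir() . '/require_sketch_' . getmypid();
is_dir($dir) || mkdir($dir, 0777, true);
$ok = $dir . '/ok.php';
file_put_contents($ok, '<?php return 42;');

var_dump(requireScript($ok));

try {
    requireScript($dir . '/missing.php');
} catch (ScriptFileNotFoundSketch $e) {
    echo 'missing script reported as ', get_class($e), PHP_EOL;
}

unlink($ok);
rmdir($dir);
```

The original `Error` is passed as the previous exception, so the underlying failure message stays available to whoever handles the domain exception.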