RFC: Pure-code source files via .phpc extension #22315
Open
hmennen90 wants to merge 1 commit into
Introduce a new opt-in file extension ".phpc" whose semantics are: the
file is parsed as pure PHP. The lexer enters ST_IN_SCRIPTING on the
first byte; no opening <?php is required. A leading UTF-8 BOM and a
CLI shebang line are silently skipped. Files whose name does not end
in ".phpc" take the historical code path unchanged.
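The prefix handling described above can be sketched in isolation. This is a minimal, standalone illustration and not the patch's code: the helper name `phpc_skip_prefix` and its parameters are hypothetical, and the real scanner works on its own buffer and cursor state rather than a plain byte array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): returns the number of leading
 * bytes to skip (UTF-8 BOM, then an optional shebang line) and stores the
 * 1-based line number at which scanning would begin. */
static size_t phpc_skip_prefix(const unsigned char *buf, size_t len,
                               int skip_shebang, unsigned *start_line)
{
    size_t pos = 0;
    *start_line = 1;

    /* UTF-8 BOM is the byte sequence EF BB BF. */
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0) {
        pos = 3;
    }

    /* Shebang: "#!" up to and including the trailing newline. */
    if (skip_shebang && len - pos >= 2
            && buf[pos] == '#' && buf[pos + 1] == '!') {
        while (pos < len && buf[pos] != '\n') {
            pos++;
        }
        if (pos < len) {
            pos++; /* consume the '\n' itself */
        }
        *start_line = 2; /* code begins on the line after the shebang */
    }

    return pos;
}
```

The caller would then start lexing at `buf + pos` in scripting mode. Returning an offset instead of mutating a cursor keeps the sketch easy to test on its own.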
Implementation lives in open_file_for_scanning: a byte-exact memcmp
against ".phpc" on the filename's tail decides between
BEGIN(ST_IN_SCRIPTING) and the classic BEGIN(SHEBANG)/BEGIN(INITIAL).
The change is strictly additive; no pre-existing test is modified.
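As a rough illustration of the byte-exact tail comparison, the check might look like the sketch below. The function name `has_phpc_suffix` is hypothetical and the patch's actual code may differ; only the use of `memcmp` against ".phpc" on the filename's tail is taken from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): true when `name` ends in the
 * exact byte sequence ".phpc". The comparison is case-sensitive. */
static bool has_phpc_suffix(const char *name, size_t len)
{
    static const char suffix[] = ".phpc";
    const size_t suffix_len = sizeof(suffix) - 1; /* exclude the NUL */

    return len >= suffix_len
        && memcmp(name + len - suffix_len, suffix, suffix_len) == 0;
}
```

Under this sketch, "foo.phpc" matches while "foo.phpcc" and "foo_phpc.php" do not, which is the behaviour the strict-extension test is described as checking. A true result would select BEGIN(ST_IN_SCRIPTING); anything else would fall through to the classic path.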
15 new .phpt tests in Zend/tests/phpc/ cover basic pure-PHP, mixed
.php/.phpc require chains, UTF-8 BOM, __halt_compiler() and the
__COMPILER_HALT_OFFSET__ constant, ?> drop-out, empty files,
declare(strict_types=1), namespaces+classes, eval() invariance,
token_get_all() invariance, CLI shebang scripts, literal <?php in
.phpc as a syntax error, and strict extension matching (".phpcc"
must NOT trigger pure mode).
Full regression run against Zend/, ext/tokenizer/, ext/standard/,
ext/spl/, ext/reflection/, ext/phar/ (9836 tests): 0 failures, 0
modifications to any pre-existing test.
RFC: https://wiki.php.net/rfc/optional_php_tags
Pre-RFC discussion: https://news-web.php.net/php.internals/131024
Summary
Reference implementation for the Pure-code source files via `.phpc` extension RFC.

A file whose name ends in `.phpc` is parsed as pure PHP: the lexer enters `ST_IN_SCRIPTING` on the first byte, with no opening `<?php` required. A leading UTF-8 BOM and an optional CLI shebang line are skipped. The `.php` extension and every other existing code-loading path are untouched.

Why a PR before the RFC vote?

Two reasons:

- [...] `__halt_compiler()`, `<?=` inside `.phpc`, `.phpcc` not matching, etc.): having a working patch lets reviewers verify those claims rather than take them on faith.

What's in the patch
`Zend/zend_language_scanner.l` (+53 / −2)

In `open_file_for_scanning`:

- Check whether `file_handle->opened_path` (or `filename` as fallback) ends in the byte sequence `.phpc` via `memcmp`.
- Skip a leading UTF-8 BOM (`0xEF 0xBB 0xBF`) if present.
- If `CG(skip_shebang)` is set and the next two bytes are `#!`, advance past the entire shebang line (incl. trailing `\n`); remember to start at line 2.
- Advance `SCNG(yy_cursor)` past whatever we just skipped.
- `BEGIN(ST_IN_SCRIPTING)`.
- Otherwise, the classic `BEGIN(SHEBANG)`/`BEGIN(INITIAL)` logic runs untouched.

The starting line number is propagated through to the existing `CG(zend_lineno) = …` reset at the function's tail (uses a local `phpc_start_lineno` to survive that reset).

The generated `Zend/zend_language_scanner.c` is `.gitignore`d, so it isn't part of this diff; `make` regenerates it via `re2c` at build time. Tested with re2c 4.5.1.

`Zend/tests/phpc/` (+15 tests, all new)

- `001_basic.phpt`: `.phpc` produces same output as classic `<?php`
- `002_php_unchanged.phpt`: `.php` stays template-shaped (BC sanity)
- `003_phpc_requires_php.phpt`: `.phpc` → `.php` require chain works
- `004_php_requires_phpc.phpt`: `.php` → `.phpc` require chain works
- `005_utf8_bom.phpt`: leading UTF-8 BOM in `.phpc` is silently skipped
- `006_halt_compiler.phpt`: `__halt_compiler()` works in `.phpc`; `__COMPILER_HALT_OFFSET__` populated
- `007_closing_tag.phpt`: `?>` in `.phpc` drops to inline output, mirroring `.php` semantics
- `008_empty.phpt`: empty `.phpc` file: no output, no error
- `009_declare_strict_types.phpt`: `declare(strict_types=1)` as first statement in `.phpc`
- `010_class_definition.phpt`: namespaces and classes in `.phpc`
- `011_eval_unchanged.phpt`: `eval()` (string-compile path) is independent of file extension
- `012_token_get_all_unchanged.phpt`: `token_get_all()` string path unchanged
- `013_shebang_main_script.phpt`: `#!`-script in `.phpc` works; `__LINE__` reports the line after the shebang
- `014_phpc_with_open_tag.phpt`: `<?php` inside `.phpc` is a parse error (not magic re-open)
- `015_php_with_phpc_substring.phpt`: `foo.phpcc` and `foo_phpc.php` are NOT `.phpc`

Each test creates a temporary `.phpc` (or `.php`) sibling file, `require`s it, and cleans it up via `register_shutdown_function`. No magic, easy to read.

Backward compatibility
Zero modifications to any pre-existing test in php-src.
Full regression run with this patch applied:
- `Zend/`
- `ext/tokenizer/`
- `ext/standard/`
- `ext/spl/`
- `ext/reflection/`
- `ext/phar/`

(4 pre-existing XFAILs are unchanged.)

This is the strongest BC guarantee the patch could carry: a `.phpc`-less codebase is byte-identical to the codebase before this PR.

Things the patch does NOT do
- `-p` / `--pure` CLI flag. That's a sister feature (also discussed in the pre-RFC thread) but kept out of this RFC's scope. Will be a follow-up.
- `.phpc` dispatch shares the `compile_file` path Phar entries already use, so it works, but no dedicated Phar fixture is shipped here. Happy to add one in review if requested.

Test plan
- `make` builds against `re2c 4.5.1`, `bison 3.8.2` (macOS Sonoma 25.5)
- `Zend/tests/phpc/`: 15/15 pass
- `Zend/` suite: 0 failures
- `ext/tokenizer/ ext/standard/ ext/spl/ ext/reflection/ ext/phar/` combined: 0 failures

How to review
Smallest meaningful diff is `Zend/zend_language_scanner.l` lines 567–620-ish. Everything else is regenerated artefact or new tests.

Quick smoke:
vs.