Utilities for generating PHP code.

## Normalizers

The normalizers generate readable PHP labels (class names, namespaces, property names, etc.) from valid UTF-8 strings,
[transliterating] them to ASCII and spelling out any invalid characters.

### Usage

The following code (forgive the Japanese - a certain translation tool tells me it means "Pet Store"):

```php
<?php

@@ -24,11 +24,13 @@ echo $namespace;
```

outputs:

```text
Petto\Shoppu
```

and:

```php
<?php

@@ -40,47 +42,48 @@ echo $property;
```

outputs:

```text
twoDollarBill
```

See the [tests] for more examples.

### Why?

You must **never** run code generated from untrusted user input. But there are a few cases where you do want to
_output_ code generated from (mostly) trusted input.

In my case, I need to generate classes and properties from an OpenAPI specification. There are no hard-and-fast rules
on the characters present, just a vague "it is RECOMMENDED to follow common programming naming conventions". Whatever
they are.

### How?

Each normalizer uses `ext-intl`'s [Transliterator] to turn the UTF-8 string into Latin-ASCII. Where a character has no
equivalent in ASCII (the "€" symbol is a good example), it uses the [Unicode name] of the character to spell it out (to
`Euro`, after some minor clean-up). For ASCII characters that are not valid in a PHP label, it provides its own
spell-outs. For instance, a backtick "&#96;" becomes `Backtick`.

Initial digits are also spelt out: "123foo" becomes `OneTwoThreeFoo`. Finally, reserved words are suffixed with a
user-supplied string so they don't mess things up. In the first usage example above, if we normalized "class" it would
become `ClassController`.
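
A rough sketch of those steps, in case it helps to see them spelled out. This is not the library's actual code: the function names (`sketchToAscii`, `sketchSpellOutLeadingDigits`, `sketchSuffixReserved`) and the tiny reserved-word list are made up for illustration, and it assumes `ext-intl` and `ext-mbstring` are loaded. The real normalizers also clean up the Unicode names further than this does.

```php
<?php

// Illustrative only: these helpers are hypothetical and are not the library's API.

/** Transliterate to ASCII, spelling out any character that is still non-ASCII afterwards. */
function sketchToAscii(string $input): string
{
    $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        throw new RuntimeException('Could not create transliterator');
    }

    $out = '';
    foreach (mb_str_split((string) $transliterator->transliterate($input), 1, 'UTF-8') as $char) {
        if (strlen($char) === 1) {
            // A single byte in UTF-8 means plain ASCII: keep it as-is.
            $out .= $char;
            continue;
        }
        // Still non-ASCII: fall back to the Unicode name, so "EURO SIGN" becomes "EuroSign".
        $name = strtolower((string) IntlChar::charName($char));
        $out .= ' ' . str_replace([' ', '-'], '', ucwords($name, ' -')) . ' ';
    }

    return $out;
}

/** Spell out leading digits: "123foo" becomes "OneTwoThreeFoo". */
function sketchSpellOutLeadingDigits(string $label): string
{
    $words = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight', 'Nine'];

    if (preg_match('/^([0-9]+)(.*)$/s', $label, $matches) !== 1) {
        return $label; // no leading digits, nothing to do
    }

    $spelled = '';
    foreach (str_split($matches[1]) as $digit) {
        $spelled .= $words[(int) $digit];
    }

    return $spelled . ucfirst($matches[2]);
}

/** Append a user-supplied suffix to reserved words so the label stays legal. */
function sketchSuffixReserved(string $label, string $suffix): string
{
    // A deliberately tiny subset of PHP's reserved words.
    $reserved = ['abstract', 'class', 'function', 'interface', 'namespace', 'return'];

    return in_array(strtolower($label), $reserved, true) ? $label . $suffix : $label;
}

echo sketchSpellOutLeadingDigits('123foo'), "\n";        // OneTwoThreeFoo
echo sketchSuffixReserved('Class', 'Controller'), "\n";  // ClassController
echo trim(sketchToAscii('Déjà vu')), "\n";               // Deja vu
```

Keeping the three steps as separate functions is just a choice for readability here; how the library actually orders and combines them is not something this sketch claims to know.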

The results may not be pretty. If for some mad reason your input contains `͖` - put your glasses on! - the label will
contain `CombiningRightArrowheadAndUpArrowheadBelow`. But it _is_ valid PHP, and stands a chance of being as unique as
the original. Which brings me to...

## Unique labelers

The normalization process reduces around a million Unicode code points down to just 162 ASCII characters. Then it
mangles the label further by stripping separators, reducing whitespace and turning it into camelCase, snake_case or
whatever your programming preference. It's gonna be lossy - nothing we can do about that.

The unique labelers' job is to add back lost uniqueness, using a `UniqueStrategyInterface` to decorate any non-unique
class names in the list it is given.

To guarantee uniqueness within a set of class name labels, use the `UniqueClassLabeler`:

```php
<?php

@@ -96,7 +99,8 @@ var_dump($unique);
```

outputs:

```text
array(3) {
  'Déjà vu' =>
  string(7) "DejaVu1"
@@ -107,10 +111,11 @@ array(3) {
}
```
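
To give a feel for what a number-suffix strategy does, here is a minimal sketch. It is a guess at the general idea, not the library's `NumberSuffix` implementation: `sketchNumberSuffix` is a hypothetical function that takes a map of original string to already-normalized label and numbers only the labels that collide. The input data is invented for the example.

```php
<?php

// Hypothetical helper for illustration; not part of the library.

/**
 * @param array<string, string> $labels original string => normalized label
 * @return array<string, string> original string => unique label
 */
function sketchNumberSuffix(array $labels): array
{
    // How many times does each normalized label occur?
    $counts = array_count_values($labels);
    $seen   = [];
    $unique = [];

    foreach ($labels as $original => $label) {
        if ($counts[$label] > 1) {
            // Only colliding labels get a suffix: 1, 2, 3... in input order.
            $seen[$label] = ($seen[$label] ?? 0) + 1;
            $label       .= $seen[$label];
        }
        $unique[$original] = $label;
    }

    return $unique;
}

$unique = sketchNumberSuffix([
    'Déjà vu' => 'DejaVu',
    'foo'     => 'Foo',
    'déjà vu' => 'DejaVu',
]);
// $unique is ['Déjà vu' => 'DejaVu1', 'foo' => 'Foo', 'déjà vu' => 'DejaVu2']
```

Note that this sketch does not check whether a suffixed label such as `DejaVu1` clashes with a label that was already in the list, which a real strategy would need to handle.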

There are labelers for each of the normalizers: `UniqueClassLabeler`, `UniqueConstantLabeler`, `UniquePropertyLabeler`
and `UniqueVariableLabeler`. Along with the `NumberSuffix` implementation of `UniqueStrategyInterface`, we provide a
`SpellOutOrdinalPrefix` strategy. Using that instead of `NumberSuffix` above would output:

```text
array(3) {
  'Déjà vu' =>
  string(11) "FirstDejaVu"
@@ -123,8 +128,7 @@ array(3) {
```

Kinda cute, but a bit verbose for my taste.

[transliterating]: https://unicode-org.github.io/icu/userguide/transforms/general/#script-transliteration
[tests]: ./test/AbstractNormalizerTest.php
[Transliterator]: https://www.php.net/manual/en/class.transliterator.php
[Unicode name]: https://unicode.org/charts/charindex.html