Commit 9350739

docs: document Gherkin parser compatibility mode (#219)
1 parent 478d99d commit 9350739

2 files changed

Lines changed: 176 additions & 0 deletions

File tree

user_guide/gherkin.rst

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,12 @@ real, human language telling you what code you should write.
1616
If you're still new to Behat, jump into the :doc:`/quick_start` first,
1717
then return here to learn more about Gherkin.
1818

19+
.. note::
20+
21+
You can configure whether Behat's Gherkin parsing is compatible with
22+
previous Behat versions, or with the official ``cucumber/gherkin``
23+
parsers. See :doc:`gherkin/parser_mode` for more details.
24+
1925
Gherkin Syntax
2026
--------------
2127

@@ -103,3 +109,9 @@ run:
103109
Behat the ability to have multilanguage features in one suite.
104110

105111
.. _`Business Readable, Domain Specific Language`: http://martinfowler.com/bliki/BusinessReadableDSL.html
112+
113+
.. toctree::
114+
:maxdepth: 2
115+
:hidden:
116+
117+
gherkin/parser_mode

user_guide/gherkin/parser_mode.rst

Lines changed: 164 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,164 @@
1+
Gherkin Compatibility Mode
2+
==========================
3+
4+
Behat uses the `behat/gherkin`_ library to parse your feature files into the data structures that
5+
Behat will use to execute them.
6+
7+
In most cases, this parses identically to `the official parsers provided by the Cucumber project`_.
8+
However, there are some small differences in how our parser has traditionally treated some specific
9+
syntax compared to the official parsers.
10+
11+
To resolve this, we have added a ``GherkinCompatibilityMode`` setting to the parser. This setting
12+
has two possible options:
13+
14+
* ``GherkinCompatibilityMode::LEGACY`` - match our previous behaviour. This is the default in Behat 3.x.
15+
* ``GherkinCompatibilityMode::GHERKIN_32`` - match the official parsers. This will become the default in Behat 4.0.
16+
17+
.. caution::
18+
``GherkinCompatibilityMode::GHERKIN_32`` is currently considered experimental. We expect that
19+
there will be more changes to how the parser behaves in this mode before we mark it as stable.
20+
21+
Configuring the parser mode
22+
---------------------------
23+
24+
In Behat >= 3.30, you can specify the parser compatibility mode for your project in
25+
your :doc:`/user_guide/configuration`:
26+
27+
.. code-block:: php
28+
29+
<?php
30+
use Behat\Config\Config;
use Behat\Config\GherkinOptions;
31+
use Behat\Config\Profile;
32+
use Behat\Gherkin\GherkinCompatibilityMode;
33+
34+
return (new Config())
35+
->withProfile((new Profile('default'))
36+
->withGherkinOptions((new GherkinOptions())
37+
->withCompatibilityMode(GherkinCompatibilityMode::GHERKIN_32)
38+
)
39+
)
40+
;
41+
42+
Differences between parser modes
43+
--------------------------------
44+
45+
Tables containing whitespace or escaped newlines
46+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
47+
48+
In ``GHERKIN_32`` mode, table cells can include newlines, which will be unescaped during parsing. Note that
49+
newlines are unescaped **after** we remove the cell padding.
50+
51+
For example, with the following table:
52+
53+
.. code-block:: gherkin
54+
55+
Given 3 lines of poetry on 5 lines:
56+
| \nraindrops--\nher last kiss\ngoodbye.\n |
57+
58+
In ``GHERKIN_32`` mode, the table will parse as:
59+
60+
.. code-block:: php
61+
62+
[
63+
[
64+
<<<TEXT
65+
66+
raindrops--
67+
her last kiss
68+
goodbye.
69+
70+
TEXT
71+
]
72+
]
73+
74+
In legacy mode, this would be parsed as ``'\nraindrops--\nher last kiss\ngoodbye.'``.
75+
76+
The other difference is in how the parser trims padding of table cells:
77+
78+
* In ``GHERKIN_32`` mode, all leading and trailing whitespace, including tabs and Unicode whitespace, is removed.
79+
* In ``LEGACY`` mode, only literal space characters are removed.
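
The two trimming rules, and the reason the order of trimming and unescaping matters, can be sketched
in plain PHP. The helper functions below are hypothetical and exist only for illustration; they are not
part of behat/gherkin, and the sketch only handles the ``\n`` escape:

.. code-block:: php

    <?php

    // Hypothetical helpers for illustration; they are not part of behat/gherkin.

    // LEGACY-style padding removal: only literal spaces are stripped.
    function trimSpacesOnly(string $cell): string
    {
        return trim($cell, ' ');
    }

    // GHERKIN_32-style padding removal: tabs and Unicode whitespace go too.
    function trimAllWhitespace(string $cell): string
    {
        return preg_replace('/^[\s\p{Z}]+|[\s\p{Z}]+$/u', '', $cell) ?? $cell;
    }

    // Unescape only after trimming, so real newlines at the edges survive.
    function trimThenUnescape(string $cell): string
    {
        return str_replace('\n', "\n", trimAllWhitespace($cell));
    }

    // A cell padded with a tab, a space and a no-break space.
    $padded = "\t value\u{00A0}";
    $spacesOnly = trimSpacesOnly($padded);       // tab and no-break space remain
    $allWhitespace = trimAllWhitespace($padded); // 'value'

    $poem = trimThenUnescape(' \nraindrops--\nher last kiss\ngoodbye.\n ');
    // $poem begins and ends with a real newline character.

If the escapes were unescaped before trimming, the leading and trailing newlines of the poem would be
treated as padding and lost.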
80+
81+
82+
Docstrings
83+
~~~~~~~~~~
84+
85+
Docstrings (which Behat has historically referred to as PyStrings) in feature files can contain escaped delimiters -
86+
for example:
87+
88+
.. code-block:: gherkin
89+
90+
And a DocString with escaped separator inside
91+
"""
92+
first line
93+
\"\"\"
94+
third line
95+
"""
96+
97+
In ``GHERKIN_32`` mode, the parser will unescape the delimiters - e.g. this will be parsed as:
98+
99+
.. code-block:: text
100+
101+
first line
102+
"""
103+
third line
104+
105+
In legacy mode, the parsed string is not unescaped, i.e. it still includes the literal ``\"\"\"`` text.
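
As a rough illustration of what unescaping means here, the following sketch uses a hypothetical
``unescapeDocStringDelimiters()`` helper. It is an assumption made for this example and does not
reproduce the parser's actual implementation:

.. code-block:: php

    <?php

    // Hypothetical helper for illustration; not the behat/gherkin implementation.
    // Replaces each escaped delimiter \"\"\" with a plain """.
    function unescapeDocStringDelimiters(string $content): string
    {
        return str_replace('\"\"\"', '"""', $content);
    }

    // The raw docstring body as it appears in the feature file.
    $raw = "first line\n" . '\"\"\"' . "\nthird line";

    // The second line of $parsed is now a plain """ without backslashes.
    $parsed = unescapeDocStringDelimiters($raw);

Step definitions that previously stripped the backslashes themselves may need to skip that step in
``GHERKIN_32`` mode.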
106+
107+
Parsing of tags
108+
~~~~~~~~~~~~~~~
109+
110+
In ``GHERKIN_32`` mode:
111+
112+
* Parsing fails if any tags contain whitespace (e.g. ``@some tag``). In legacy mode, these have triggered
113+
an ``E_USER_DEPRECATED`` notice since behat/gherkin v4.9.0.
114+
* The values returned by ``$node->getTags()`` will **include** the ``@`` prefix. In legacy mode,
115+
this was removed. This may affect custom hooks / event listeners that inspect the tag values at
116+
runtime.
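
If a hook or listener has to work in both modes, one option is to normalise the tag values before
comparing them. The ``normaliseTags()`` helper below is a hypothetical sketch, not something provided
by Behat:

.. code-block:: php

    <?php

    // Hypothetical helper: returns tag names without the leading "@",
    // whichever parser mode produced them.
    function normaliseTags(array $tags): array
    {
        return array_map(
            static fn (string $tag): string => ltrim($tag, '@'),
            $tags
        );
    }

    // Values as returned by getTags() in LEGACY mode.
    $fromLegacy = normaliseTags(['wip', 'slow']);

    // Values as returned by getTags() in GHERKIN_32 mode.
    $fromGherkin32 = normaliseTags(['@wip', '@slow']);

In a real hook you would pass in ``$node->getTags()`` and then check the result with ``in_array()``.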
117+
118+
119+
File language
120+
~~~~~~~~~~~~~
121+
122+
In ``GHERKIN_32`` mode, if a file includes a ``#language`` annotation:
123+
124+
* Any whitespace in or around the annotation will be ignored - so ``# language : fr`` will be
125+
recognised as a valid language annotation. In legacy mode, this would have been treated as a comment.
126+
* Parsing fails if the language is not recognised - so ``#language: no-such`` will cause an error.
127+
In legacy mode, this would have been ignored and parsing would continue in the default language.
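
The two rules above can be sketched as follows. The pattern, the ``detectLanguage()`` function and the
list of known languages are all assumptions made for this example; the parser's real pattern and error
type may differ:

.. code-block:: php

    <?php

    // Hypothetical sketch of whitespace-tolerant language detection.
    // Returns null when the line is not a language annotation at all.
    function detectLanguage(string $line, array $knownLanguages): ?string
    {
        $pattern = '/^\s*#\s*language\s*:\s*([a-zA-Z\-_]+)\s*$/';

        if (preg_match($pattern, $line, $matches) !== 1) {
            return null; // an ordinary comment or some other line
        }

        if (!in_array($matches[1], $knownLanguages, true)) {
            // Mirrors the GHERKIN_32 rule: unknown languages are an error.
            throw new InvalidArgumentException("Unknown language: {$matches[1]}");
        }

        return $matches[1];
    }

    $known = ['en', 'fr'];
    $spaced = detectLanguage('# language : fr', $known);   // 'fr'
    $compact = detectLanguage('#language: fr', $known);    // 'fr'
    $comment = detectLanguage('# just a comment', $known); // null

Calling ``detectLanguage('#language: no-such', $known)`` throws, which corresponds to the parse
failure described in the second bullet.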
128+
129+
Whitespace following step keywords
130+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
131+
132+
In ``GHERKIN_32`` mode, a space between a step keyword and the rest of the text is treated as part of the keyword. This
133+
is because in a small number of languages there is no space after the keyword.
134+
135+
With a step in English like ``Then something should happen``, if you call ``StepNode::getKeyword()`` then:
136+
137+
* In ``GHERKIN_32`` mode the return value will be ``'Then '``
138+
* In ``LEGACY`` mode the return value will be ``'Then'``
139+
140+
In a language that does not place spaces after the keyword (e.g. Japanese), the return value will be the same in both
141+
modes.
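
Code that compares keywords, for example in a custom formatter, can strip the trailing space so that it
behaves the same in both modes. This is a minimal sketch with a hypothetical helper name:

.. code-block:: php

    <?php

    // Hypothetical helper: removes the trailing whitespace that GHERKIN_32
    // mode keeps as part of the keyword.
    function keywordWithoutPadding(string $keyword): string
    {
        return rtrim($keyword);
    }

    $fromGherkin32 = keywordWithoutPadding('Then '); // value seen in GHERKIN_32 mode
    $fromLegacy = keywordWithoutPadding('Then');     // value seen in LEGACY mode

In practice the argument would be the return value of ``StepNode::getKeyword()``.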
142+
143+
Elements with descriptions
144+
~~~~~~~~~~~~~~~~~~~~~~~~~~
145+
146+
Gherkin syntax allows multi-line descriptions on ``Feature:``, ``Background:``, ``Scenario:``, ``Scenario Outline:``,
147+
and ``Examples:`` elements.
148+
149+
Historically, we only parsed the description separately for a ``Feature`` node. For other nodes, we parsed the full
150+
text as a multi-line title.
151+
152+
In ``GHERKIN_32`` mode, if one of the elements listed above has multi-line text, then:
153+
154+
* The first line (containing the keyword) will be parsed as the title.
155+
* Following lines will be parsed as the description.
156+
* Any blank lines between the title & description will be ignored (in legacy mode, these were included at the start of
157+
the description).
158+
* Any left padding will be removed from the first line of the description, but subsequent lines will have the same
159+
left padding / indentation as the feature file. In legacy mode, we attempted to left-trim all lines to match the
160+
indentation of the keyword.
161+
162+
163+
.. _`behat/gherkin`: https://github.com/Behat/Gherkin
164+
.. _`the official parsers provided by the Cucumber project`: https://github.com/cucumber/gherkin
