
Commit 3b0a3c4

[3.14] gh-138907: Support RFC 9309 in robotparser (GH-138908) (GH-149374)
* empty lines are always ignored instead of separating groups
* the "user-agent" line after a rule starts a new group
* groups matching the same user agent are now merged
* the rule with the longest match wins instead of the first matching rule
* in case of equal matches, the “Allow” rule wins over “Disallow”
* special characters “$” and “*” are now supported in rules
* prefer full match for user agent

(cherry picked from commit bc285e5)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
1 parent b05ee20 commit 3b0a3c4
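
The rule-selection behavior listed in the commit message (longest match wins, "Allow" wins ties, "$" and "*" are special) can be illustrated with a minimal sketch. This is not the code from this commit: the helper names `pattern_to_regex` and `is_allowed` are hypothetical, the rules are assumed to be already merged into one group for the matching user agent, pattern length is used as the measure of match specificity, and percent-encoding normalization is ignored.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any run of characters; a trailing "$" anchors the end
    of the path. Everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Decide whether `path` may be fetched.

    `rules` is a list of (allow, pattern) tuples for a single merged
    group (hypothetical representation, not the module's internal one).
    """
    best_len = -1
    best_allow = True  # assumption: no matching rule means "allowed"
    for allow, pattern in rules:
        if not pattern:
            continue  # assumption: an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            # Longest pattern wins; on equal length, Allow beats Disallow.
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow


rules = [(False, "/"), (True, "/public/"), (False, "/*.php$")]
print(is_allowed(rules, "/public/page.html"))
print(is_allowed(rules, "/private/page.html"))
print(is_allowed(rules, "/public/index.php"))
```

In this sketch the order of the rules does not matter, which is the main difference from a first-match strategy.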

4 files changed

Lines changed: 441 additions & 111 deletions


Doc/library/urllib.robotparser.rst

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 This module provides a single class, :class:`RobotFileParser`, which answers
 questions about whether or not a particular user agent can fetch a URL on the
 website that published the :file:`robots.txt` file. For more details on the
-structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
+structure of :file:`robots.txt` files, see :rfc:`9309`.


 .. class:: RobotFileParser(url='')
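
For context, the class documented in this hunk can be exercised without network access by feeding lines to `parse()`. The following is a small usage sketch; the user agent name `ExampleBot` and the `example.com` URLs are placeholders, and the rules are deliberately simple so the result should not depend on whether the RFC 9309 changes are present.

```python
from urllib.robotparser import RobotFileParser

# A tiny robots.txt with one group that applies to every user agent.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no HTTP request is made.
parser.parse(robots_txt.splitlines())

blocked = parser.can_fetch("ExampleBot", "https://example.com/private/data.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")
print(blocked, allowed)
```

For a live site, `set_url()` followed by `read()` would replace the `parse()` call.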
