Robots.txt

Robots.txt parser

Installation

composer require innmind/robots-txt

Usage

use Innmind\RobotsTxt\Parser;
use Innmind\OperatingSystem\Factory;
use Innmind\Url\Url;

$os = Factory::build();
$parse = Parser::of(
    $os->remote()->http(),
    'My user agent',
);
$robots = $parse(Url::of('https://github.com/robots.txt'))->match(
    static fn($robots) => $robots,
    static fn() => throw new \RuntimeException('robots.txt not found'),
);
$robots->disallows('My user agent', Url::of('/humans.txt')); // false: this path may be crawled
$robots->disallows('My user agent', Url::of('/any/other/url')); // true: every other path is disallowed

Note

Here only the path /humans.txt is allowed because, by default, GitHub disallows every user agent from crawling its website except for this file.
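
As a rough illustration of the kind of rules that lead to this result, a robots.txt along the following lines would allow a single file and disallow everything else for all user agents. This is a hypothetical sketch, not a copy of GitHub's actual file, and it assumes the common convention that the most specific matching rule wins.

```
# Hypothetical rules, for illustration only
User-agent: *
Allow: /humans.txt
Disallow: /
```

With rules shaped like these, a check for /humans.txt would be expected to report the path as not disallowed, while any other path falls under the blanket Disallow rule.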
