Sitemap Tracker

English · Deutsch

Crawls websites and generates standard-compliant sitemap.xml files. Uses Playwright for JavaScript rendering or httpx for fast HTTP crawling.

Screenshots

Main View

Sitemap Tree

Crawl History

Installation

One-Liner (Standalone, no Python required)

Linux / macOS:

curl -fsSL https://raw.githubusercontent.com/michaelblaess/sitemap-tracker/main/install.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/michaelblaess/sitemap-tracker/main/install.ps1 | iex

Usage

# Simple crawl (httpx mode, fast)
sitemap-tracker https://example.com

# With JavaScript rendering (Playwright)
sitemap-tracker https://example.com --render

# Save sitemap directly
sitemap-tracker https://example.com --output sitemap.xml

# Limit crawl depth
sitemap-tracker https://example.com --max-depth 5

# More concurrency
sitemap-tracker https://example.com --concurrency 16

# Ignore robots.txt
sitemap-tracker https://example.com --ignore-robots

# With cookies (e.g. for login)
sitemap-tracker https://example.com --cookie session=abc123

CLI Parameters

Parameter	Description	Default
`URL`	Start URL of the website	-
`--output`, `-o`	Output path for sitemap.xml	`sitemap_<host>_<timestamp>.xml`
`--max-depth`, `-d`	Maximum crawl depth	10
`--concurrency`, `-c`	Parallel requests	8
`--timeout`, `-t`	Timeout per page (seconds)	30
`--render`	Render JavaScript with Playwright	off
`--no-headless`	Browser visible (debugging)	off
`--ignore-robots`	Ignore robots.txt	off
`--user-agent`	Custom User-Agent	Chrome 131
`--cookie`	Set cookie (NAME=VALUE, multiple)	-

Keyboard Shortcuts (TUI)

Key	Function
`c`	Start crawl (URL dialog)
`x`	Cancel crawl / JSON error report
`m`	Save sitemap
`s`	Settings
`g`	Export form report (JSON)
`j`	JIRA table to clipboard
`e`	Show errors only
`f`	Sitemap diff
`d`	Copy URL details
`l`	Toggle log
`h`	History
`i`	Info dialog
`q`	Quit

Copying / exporting the log runs via right-click on the log panel. Hovering a shortcut in the footer reveals a full tooltip explaining what it does. URLs in the log, header and detail panel are clickable without holding Ctrl.

Features

Dual mode: httpx (fast, HTML only) or Playwright (JavaScript rendering)
robots.txt: Respected by default, --ignore-robots to disable
Auto-split: With >50,000 URLs, an automatic sitemap index with partial sitemaps
Priority: Automatically based on crawl depth (home page = 1.0)
lastmod: From HTTP Last-Modified header
URL normalization: Duplicates avoided through normalization
Form detection: <form> tags are detected, marked in the table and exportable as JSON
Dead-link source viewer: For every 4xx/5xx page, jump to the referring page's HTML and see the exact line that contains the broken link — Pygments-highlighted, the match line painted in a warm gold band. From there: open the source in your browser, copy a paste-ready snippet (broken URL + ±3 lines of context + line number) to the clipboard, or save the full HTML as evidence
Results context menu: Right-click on any results row for the five bulk actions (toggle errors-only filter, save sitemap as XML, save error report as JSON, copy JIRA table, generate forms report). 4xx/5xx rows additionally get a one-click jump into the broken-link source viewer
Live TUI: Progress, statistics and URL details in real time — results table and page tree split across two tabs
Sortable results: Click any column header (Status, HTTP, Depth, Links, Form, Time, Size, Date, URL) to sort — second click reverses. Active column gets a ▲/▼ marker
Date & size columns: Last-Modified date and page size are shown directly in the results table, side by side with the URL
Clickable links: URLs in the log, crawl header and detail panel open in your default browser on a single click (no Ctrl required); local result files (sitemap.xml, JSON reports) open in the OS default app
Page tree: Hierarchical view of all crawled URLs with HTTP status, dead-link and not-in-sitemap markers — embedded as a tab, siblings sorted alphabetically; the table's filter applies to the tree as well (matching nodes plus their ancestors stay visible)
URL dialog: c opens a dialog (pre-filled with the last URL) to enter or change the target URL — no restart needed
Crawl header: All crawl statistics — mode, robots.txt, concurrency, status codes, progress — grouped in one collapsible header
Page details: Selecting a URL shows grouped panels — page info, issues, tech stack, SEO/meta data and HTTP headers
Footer tooltips: Every shortcut shows a hover-tooltip explaining what it does — even the cryptic ones like JIRA table, sitemap diff or form report
Issue detection: Flags common problems per page — HTTP errors, missing/overlong title & description, missing H1/viewport/canonical, noindex, slow load, large page
Tech-stack detection: Detects the CMS, JS/CSS frameworks and server software of each page
Page preview: Optional in-terminal screenshot of the selected page (TGP/Sixel with half-block fallback) — toggle in settings
Resizable panels: Splitters to freely resize the URL table, log and stats panels
Log panel: Right-click context menu — copy, export to file, or hide
Settings dialog: Language, robots.txt, Playwright, page preview, concurrency, timeout and crawl depth — persisted across runs
Filter with history: Filter the URL table by URL/status; recent filter terms in a dropdown
Crawl history: Past crawls with date, URL, parameters and final stats (crawled / 2xx / errors); date in the UI's locale (DE: dd.MM.yyyy, EN: ISO)

Browser Strategy

System Chrome preferred (faster startup, less memory)
Bundled Chromium as fallback (included in standalone installation)

Privacy

Important: Crawling a website may be perceived as unusual traffic by the operator. Please note:

Inform the website operator before crawling, especially for large websites
Respect robots.txt (enabled by default)
Use reasonable concurrency and timeout values
This tool is intended for your own websites and authorized analyses

Development

Setup

git clone https://github.com/michaelblaess/sitemap-tracker.git
cd sitemap-tracker

# Windows
.\bootstrap.ps1

# Linux/macOS
./bootstrap.sh

Local Start

# Windows
.\run.ps1 https://example.com

# Linux/macOS
./run.sh https://example.com

Creating a Release

git tag vX.Y.Z
git push origin vX.Y.Z

GitHub Actions automatically builds executables for Windows, Linux and macOS.

License

Apache License 2.0 - see LICENSE

Name		Name	Last commit message	Last commit date
Latest commit History 126 Commits
.github/workflows		.github/workflows
assets		assets
demo		demo
docs		docs
src/sitemap_tracker		src/sitemap_tracker
tests		tests
.git-blame-ignore-revs		.git-blame-ignore-revs
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
DETAIL-PANEL-PLAN.md		DETAIL-PANEL-PLAN.md
LICENSE		LICENSE
PHASE-PLAN.md		PHASE-PLAN.md
README.de.md		README.de.md
README.md		README.md
bootstrap.ps1		bootstrap.ps1
bootstrap.sh		bootstrap.sh
compile-linux.sh		compile-linux.sh
compile-macos.sh		compile-macos.sh
compile-win64.ps1		compile-win64.ps1
install.ps1		install.ps1
install.sh		install.sh
pyproject.toml		pyproject.toml
run.ps1		run.ps1
run.sh		run.sh
tape.ps1		tape.ps1
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sitemap Tracker

Screenshots

Main View

Sitemap Tree

Crawl History

Installation

One-Liner (Standalone, no Python required)

Usage

CLI Parameters

Keyboard Shortcuts (TUI)

Features

Browser Strategy

Privacy

Development

Setup

Local Start

Creating a Release

License

About

Uh oh!

Releases 26

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Sitemap Tracker

Screenshots

Main View

Sitemap Tree

Crawl History

Installation

One-Liner (Standalone, no Python required)

Usage

CLI Parameters

Keyboard Shortcuts (TUI)

Features

Browser Strategy

Privacy

Development

Setup

Local Start

Creating a Release

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 26

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages