A comprehensive corpus of public Qlik QVD binary files and related source code (load scripts, workbooks, CSV ground-truth) collected from GitHub repositories.
The goal is to support open-source reverse-engineering of the QVD binary format. This project only collects files — it does not parse, execute, or redistribute proprietary Qlik tooling.

| Extension | Count | Description |
|---|---|---|
| `.qvd` | 1,145 | QlikView Data: XML header + columnar bit-stuffed binary payload |
| `.qvs` | 2,459 | Load scripts (STORE … INTO / LOAD … FROM) |
| `.qvw` | 1,251 | QlikView workbooks |
| `.qvf` | 822 | Qlik Sense applications |
| `.csv`/`.txt` | 853 | Paired data tables (ground-truth for parser validation) |
| `.xml`/`.json` | 242 | Paired metadata / schema files |

4,711 GitHub repositories scanned. 716 contain target files.
Source listings:
- Sources.md — Human-readable listing of all repositories with per-extension counts
- index.json — Machine-readable index with per-file paths, sizes, and SHA-256 hashes
If you just want the QVD files on your disk:

    # Clone the repo (only the index + scripts, not the 8 GB of data)
    git clone https://github.com/Sigilweaver/QVD-Sources.git
    cd QVD-Sources

    # Download everything listed in index.json
    uv run scripts/download.py

    # Or download only .qvd files
    uv run scripts/download.py --extension qvd

    # Or only .qvs load scripts
    uv run scripts/download.py --extension qvs

    # Preview without downloading
    uv run scripts/download.py --dry-run

    # Single repo
    uv run scripts/download.py --repo withdave/qlik

    # Verify SHA-256 hashes of files already on disk
    uv run scripts/download.py --verify

Files are saved to `downloads/{owner}/{repo}/{path}`, preserving the original directory structure.
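
The on-disk layout and the hash check behind `--verify` can be illustrated with a minimal sketch. This is not the code in `scripts/download.py`: the helper names `local_path`, `sha256_of`, and `verify` are hypothetical, and the expected hash is assumed to be a hex string such as an index entry might carry.

```python
import hashlib
from pathlib import Path


def local_path(root: Path, owner: str, repo: str, path: str) -> Path:
    """Map a repository file to {root}/{owner}/{repo}/{path} (hypothetical helper)."""
    return root / owner / repo / path


def sha256_of(file_path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never held fully in memory."""
    digest = hashlib.sha256()
    with file_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(file_path: Path, expected_sha256: str) -> bool:
    """Return True only when the file exists and its SHA-256 matches."""
    return file_path.is_file() and sha256_of(file_path) == expected_sha256.lower()
```

The file is only read as bytes and never executed, which matches the way the project treats downloaded content as untrusted.
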

    # Full run: discover new repos + check for updates + download
    uv run scripts/scan.py

    # Discovery only (no downloads, just find new candidate repos)
    uv run scripts/scan.py --discover-only

    # Re-check known repos only (skip discovery search)
    uv run scripts/scan.py --check-only

    uv run scripts/gen_index.py     # regenerate index.json from known_repos.json
    uv run scripts/gen_sources.py   # regenerate Sources.md
    uv run scripts/report.py        # print a summary to the terminal

- Discover: Search GitHub for repos containing QVD-related files using 36 repository-search queries and 7 code-search queries via the `gh` CLI. Qlik-owned organisations are filtered out.
- Check: For each candidate repo, resolve the current HEAD commit SHA. If the SHA matches the last check in `data/known_repos.json`, skip the repo entirely, with no further API calls. This makes repeated runs cheap.
- Download & classify: For repos with new commits, enumerate the git tree (`?recursive=1`). Record every file matching a target extension. Download via `raw.githubusercontent.com`, preserving the `owner/repo/path` structure. Record per-file metadata (path, size, SHA-256).
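
The Check and Download & classify phases might look roughly like the sketch below. It is an illustration under assumptions, not the logic in `scripts/scan.py`: the `head_sha` key and all function names are hypothetical, and `tree` is assumed to be the `tree` array that GitHub's git trees endpoint returns when called with `?recursive=1`.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

TARGET_EXTENSIONS = {".qvd", ".qvs", ".qvw", ".qvf", ".csv", ".txt", ".xml", ".json"}


def needs_rescan(known: dict, slug: str, head_sha: str) -> bool:
    """True when the repo is new or its HEAD moved since the last check.

    `known` stands in for the ledger; the "head_sha" key is an assumption.
    """
    entry = known.get(slug)
    return entry is None or entry.get("head_sha") != head_sha


def matching_files(tree: list[dict]) -> list[dict]:
    """Keep blob entries whose extension (case-insensitive) is a target."""
    return [
        item for item in tree
        if item.get("type") == "blob"
        and PurePosixPath(item["path"]).suffix.lower() in TARGET_EXTENSIONS
    ]


def raw_url(slug: str, sha: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL pinned to a specific commit."""
    return f"https://raw.githubusercontent.com/{slug}/{sha}/{quote(path)}"
```

Comparing a single commit SHA before touching the tree is what keeps a repeat run cheap: an unchanged repo costs one lookup and nothing more.
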

| Priority | Extensions | Rule |
|---|---|---|
| 1 | `.qvd` | Always collected |
| 2 | `.qvs`, `.qvw`, `.qvf` | Always collected ("Rosetta Stone" cross-references) |
| 2 | `.csv`, `.txt` | Collected only when the repo also contains a `.qvd` |
| 3 | `.xml`, `.json` | Collected only when the repo also contains a `.qvd` |

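
The collection rules in the table can be expressed as a small filter. This is a sketch of the stated rules only; the name `select_files` and the two set constants are hypothetical and are not taken from the project's scripts.

```python
from pathlib import PurePosixPath

ALWAYS = {".qvd", ".qvs", ".qvw", ".qvf"}
ONLY_WITH_QVD = {".csv", ".txt", ".xml", ".json"}


def _suffix(path: str) -> str:
    """Lower-cased file extension, so .QVD and .qvd are treated alike."""
    return PurePosixPath(path).suffix.lower()


def select_files(paths: list[str]) -> list[str]:
    """Apply the table's rules to all file paths found in one repository."""
    has_qvd = any(_suffix(p) == ".qvd" for p in paths)
    # Paired data and metadata only count when the repo holds at least one .qvd.
    wanted = (ALWAYS | ONLY_WITH_QVD) if has_qvd else ALWAYS
    return [p for p in paths if _suffix(p) in wanted]
```

For example, a repo holding `a.qvd`, `a.csv`, and `notes.md` yields the first two, while a repo holding only `load.qvs` and `a.csv` yields just the load script.
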
- All GitHub API traffic goes through the `gh` CLI.
- Downloads use `raw.githubusercontent.com` only. No cloning.
- No requests to `qlik.com`, `qlikview.com`, or `qliksense.com`.
- Downloaded files are never executed. All bytes are treated as untrusted.
- Only `data/` and `downloads/` are written at runtime.
- Rate limits: ≥2 s between repo-search pages, ≥3 s between code-search pages, 65 s back-off on 403 / rate-limit responses (one retry).
- Only `owner/repo` slugs, paths, sizes, hashes, and timestamps are logged.
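
The rate-limit policy above (a minimum gap between pages, then a 65 s back-off with one retry) could be sketched as follows. Everything named here is an assumption for illustration: `RateLimited`, `fetch_with_backoff`, and `paged` are hypothetical, and the caller's fetch function is assumed to raise `RateLimited` on a 403 / rate-limit response.

```python
import time
from typing import Callable

BACKOFF_SECONDS = 65


class RateLimited(Exception):
    """Raised by the caller's fetch function on a 403 / rate-limit response."""


def fetch_with_backoff(fetch: Callable[[], list],
                       sleep: Callable[[float], None] = time.sleep) -> list:
    """Call fetch once; on a rate limit, wait 65 s and retry exactly once."""
    try:
        return fetch()
    except RateLimited:
        sleep(BACKOFF_SECONDS)
        return fetch()  # a second failure propagates to the caller


def paged(fetch_page: Callable[[int], list], pages: int, min_gap: float,
          sleep: Callable[[float], None] = time.sleep) -> list:
    """Fetch pages 1..pages, pausing at least min_gap seconds between them."""
    results: list = []
    for page in range(1, pages + 1):
        if page > 1:
            sleep(min_gap)
        results.extend(fetch_with_backoff(lambda: fetch_page(page), sleep))
    return results
```

Passing `sleep` as a parameter is a design choice made for this sketch: it lets the pacing be checked with a recording stub instead of real waits. With the stated limits, `min_gap` would be 2 for repo-search pages and 3 for code-search pages.
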
QVD-Sources/
├── README.md # This file
├── Sources.md # Human-readable source listing
├── index.json # Machine-readable index (per-file paths + hashes)
├── LICENSE # MIT
├── pyproject.toml
├── scripts/
│ ├── scan.py # 3-phase scan orchestrator
│ ├── download.py # User-facing bulk downloader
│ ├── gen_index.py # Generate index.json from known_repos.json
│ ├── gen_sources.py # Generate Sources.md
│ ├── report.py # Terminal summary
│ ├── github.py # GitHub API + download helpers
│ └── state.py # Persistent state management
├── data/
│ └── known_repos.json # Committed — the repo/commit/file ledger
└── downloads/ # Git-ignored — downloaded files
The referenced repositories are owned by their respective authors and subject to their own licenses.