QVD-Sources

A comprehensive corpus of public Qlik QVD binary files and related source material (load scripts, workbooks, CSV ground truth) collected from GitHub repositories.

The goal is to support open-source reverse-engineering of the QVD binary format. This project only collects files — it does not parse, execute, or redistribute proprietary Qlik tooling.

Current Index

| Extension | Count | Description |
|---|---|---|
| .qvd | 1,145 | QlikView Data: XML header + columnar bit-stuffed binary payload |
| .qvs | 2,459 | Load scripts (STORE … INTO / LOAD … FROM) |
| .qvw | 1,251 | QlikView workbooks |
| .qvf | 822 | Qlik Sense applications |
| .csv/.txt | 853 | Paired data tables (ground-truth for parser validation) |
| .xml/.json | 242 | Paired metadata / schema files |

4,711 GitHub repositories scanned. 716 contain target files.

Source listings:

  • Sources.md — Human-readable listing of all repositories with per-extension counts
  • index.json — Machine-readable index with per-file paths, sizes, and SHA-256 hashes
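As a rough illustration of how the machine-readable index might be consumed, the sketch below filters index entries by extension and totals their sizes. The layout of index.json is not described in this README, so the flat list of records and the key names `path`, `size`, and `sha256` are assumptions made for illustration only.

```python
from pathlib import PurePosixPath

# Hypothetical records: the real index.json layout and key names may differ.
SAMPLE_INDEX = [
    {"path": "owner-a/repo-a/data/Sales.qvd", "size": 2048, "sha256": "ab" * 32},
    {"path": "owner-a/repo-a/scripts/load.qvs", "size": 512, "sha256": "cd" * 32},
    {"path": "owner-b/repo-b/Customers.QVD", "size": 4096, "sha256": "ef" * 32},
]


def filter_by_extension(entries, extension):
    """Return entries whose path ends with the given extension (case-insensitive)."""
    wanted = "." + extension.lower().lstrip(".")
    return [e for e in entries if PurePosixPath(e["path"]).suffix.lower() == wanted]


qvd_entries = filter_by_extension(SAMPLE_INDEX, "qvd")
total_bytes = sum(e["size"] for e in qvd_entries)
print(len(qvd_entries), "QVD files,", total_bytes, "bytes")
```

With a real checkout, the sample list would be replaced by the parsed contents of index.json, adjusted to whatever structure that file actually uses.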

Quick start — downloading files

If you just want the QVD files on your disk:

# Clone the repo (only the index + scripts, not the 8 GB of data)
git clone https://github.com/Sigilweaver/QVD-Sources.git
cd QVD-Sources

# Download everything listed in index.json
uv run scripts/download.py

# Or download only .qvd files
uv run scripts/download.py --extension qvd

# Or only .qvs load scripts
uv run scripts/download.py --extension qvs

# Preview without downloading
uv run scripts/download.py --dry-run

# Single repo
uv run scripts/download.py --repo withdave/qlik

# Verify SHA-256 hashes of files already on disk
uv run scripts/download.py --verify

Files are saved to downloads/{owner}/{repo}/{path}, preserving the original directory structure.
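The path layout and the hash check behind `--verify` can be sketched roughly as follows. This is not the code in scripts/download.py; the function names are hypothetical and only the downloads/{owner}/{repo}/{path} layout and the use of SHA-256 come from this README.

```python
import hashlib
from pathlib import Path


def local_path(owner, repo, path, root="downloads"):
    """Build downloads/{owner}/{repo}/{path}, mirroring the layout described above."""
    return Path(root) / owner / repo / path


def sha256_of(file_path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(file_path, expected_sha256):
    """Return True when the file on disk matches the recorded hash."""
    return sha256_of(file_path) == expected_sha256.lower()
```

A caller would look up the expected hash for each file in index.json and report any path for which `verify` returns False.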

Running the scanner (maintainer workflow)

Prerequisites

  • uv (installs Python automatically)
  • gh CLI installed and authenticated (gh auth login)

Scan

# Full run: discover new repos + check for updates + download
uv run scripts/scan.py

# Discovery only (no downloads — just find new candidate repos)
uv run scripts/scan.py --discover-only

# Re-check known repos only (skip discovery search)
uv run scripts/scan.py --check-only

Regenerate published files

uv run scripts/gen_index.py     # regenerate index.json from known_repos.json
uv run scripts/gen_sources.py   # regenerate Sources.md
uv run scripts/report.py        # print a summary to the terminal

Methodology

  1. Discover — Search GitHub for repos containing QVD-related files using 36 repository-search queries and 7 code-search queries via the gh CLI. Qlik-owned organisations are filtered out.

  2. Check — For each candidate repo, resolve the current HEAD commit SHA. If the SHA matches the last check in data/known_repos.json, skip entirely — no further API calls. This makes repeated runs cheap.

  3. Download & classify — For repos with new commits, enumerate the git tree (?recursive=1). Record every file matching a target extension. Download via raw.githubusercontent.com, preserving owner/repo/path structure. Record per-file metadata (path, size, SHA-256).
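Steps 2 and 3 can be sketched as two small pure functions. This is a minimal illustration, not the logic in scripts/scan.py: the ledger key `head_sha` and both function names are hypothetical, while the `tree`, `path`, `type`, and `size` fields follow the shape of a GitHub git-tree API response.

```python
TARGET_EXTENSIONS = (".qvd", ".qvs", ".qvw", ".qvf", ".csv", ".txt", ".xml", ".json")


def needs_rescan(known_repos, slug, head_sha):
    """Step 2: a repo is skipped when its HEAD SHA matches the last recorded check."""
    # "head_sha" is a hypothetical key name for the ledger entry.
    return known_repos.get(slug, {}).get("head_sha") != head_sha


def matching_files(tree_response):
    """Step 3: pick blobs with a target extension from a git-tree response."""
    return [
        {"path": item["path"], "size": item.get("size")}
        for item in tree_response.get("tree", [])
        if item.get("type") == "blob"
        and item["path"].lower().endswith(TARGET_EXTENSIONS)
    ]
```

Keeping both functions free of network calls means the skip decision and the extension filter can be tested against recorded responses without touching the API.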

Target file priorities

| Priority | Extensions | Rule |
|---|---|---|
| 1 | .qvd | Always collected |
| 2 | .qvs, .qvw, .qvf | Always collected ("Rosetta Stone" cross-references) |
| 2 | .csv, .txt | Collected only when the repo also contains a .qvd |
| 3 | .xml, .json | Collected only when the repo also contains a .qvd |
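The collection rules in this table amount to a two-tier filter, which the following sketch expresses over a list of file paths from one repository. The function names are hypothetical, and case-insensitive extension matching is an assumption rather than something stated here.

```python
import posixpath

ALWAYS = {".qvd", ".qvs", ".qvw", ".qvf"}
ONLY_WITH_QVD = {".csv", ".txt", ".xml", ".json"}


def extension(path):
    """Lower-cased file extension, including the leading dot."""
    return posixpath.splitext(path)[1].lower()


def select_files(paths):
    """Apply the collection rules to all file paths found in one repository."""
    has_qvd = any(extension(p) == ".qvd" for p in paths)
    # Paired data and metadata files only count when a .qvd is present.
    allowed = (ALWAYS | ONLY_WITH_QVD) if has_qvd else ALWAYS
    return [p for p in paths if extension(p) in allowed]
```

For example, a repository holding only `load.qvs` and `table.csv` would yield just `load.qvs`, while adding any `.qvd` file would bring `table.csv` into the result as well.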

Operational constraints

  • All GitHub API traffic goes through the gh CLI.
  • Downloads use raw.githubusercontent.com only. No cloning.
  • No requests to qlik.com, qlikview.com, or qliksense.com.
  • Downloaded files are never executed. All bytes are treated as untrusted.
  • Only data/ and downloads/ are written at runtime.
  • Rate limits: ≥2 s between repo-search pages, ≥3 s between code-search pages, 65 s back-off on 403 / rate-limit responses (one retry).
  • Only owner/repo slugs, paths, sizes, hashes, and timestamps are logged.
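The rate-limit rule above (a fixed pause between pages, plus one 65 s back-off and a single retry on a 403) could look roughly like this. It is a sketch under assumptions: `fetch` and `fetch_page` are hypothetical callables returning a `(status, body)` pair, and the real scanner goes through the gh CLI rather than this interface.

```python
import time

BACKOFF_SECONDS = 65  # back-off after a 403 / rate-limit response


def fetch_with_backoff(fetch, sleep=time.sleep):
    """Call fetch(); on HTTP 403, wait 65 s and retry exactly once."""
    status, body = fetch()
    if status == 403:
        sleep(BACKOFF_SECONDS)
        status, body = fetch()
    return status, body


def paged(fetch_page, pages, delay_seconds, sleep=time.sleep):
    """Fetch pages in order, pausing between consecutive requests."""
    results = []
    for index, page in enumerate(pages):
        if index > 0:
            sleep(delay_seconds)
        results.append(fetch_with_backoff(lambda: fetch_page(page), sleep))
    return results
```

Passing `sleep` as a parameter lets the timing behaviour be checked in tests without actually waiting; a repo-search loop would use `delay_seconds=2` and a code-search loop `delay_seconds=3`.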

Repo structure

QVD-Sources/
├── README.md              # This file
├── Sources.md             # Human-readable source listing
├── index.json             # Machine-readable index (per-file paths + hashes)
├── LICENSE                # MIT
├── pyproject.toml
├── scripts/
│   ├── scan.py            # 3-phase scan orchestrator
│   ├── download.py        # User-facing bulk downloader
│   ├── gen_index.py       # Generate index.json from known_repos.json
│   ├── gen_sources.py     # Generate Sources.md
│   ├── report.py          # Terminal summary
│   ├── github.py          # GitHub API + download helpers
│   └── state.py           # Persistent state management
├── data/
│   └── known_repos.json   # Committed — the repo/commit/file ledger
└── downloads/             # Git-ignored — downloaded files

License

MIT

The referenced repositories are owned by their respective authors and subject to their own licenses.
