3 changes: 3 additions & 0 deletions docs/advanced/debugging-results.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,9 @@ To inspect a site with high N_FAIL, query with `--format json` to see all fields
afquery query --db ./db/ --locus chr1:12345678 --format json
```

!!! tip "Identify failing samples"
    Use `afquery variant-info --db ./db/ --locus chr1:12345678` to see exactly which samples have FAIL status, along with their metadata (technology, phenotype codes). This helps determine whether failures cluster in a specific technology or sample subset. See [Variant Info](../guides/variant-info.md).

If N_FAIL is consistently high across many sites, check the variant calling pipeline and FILTER field settings in your VCFs.

---
23 changes: 10 additions & 13 deletions docs/advanced/filter-pass-tracking.md
Expand Up @@ -69,6 +69,16 @@ for r in results:

`N_FAIL` is always an `int` (default `0`).

### Identifying specific FAIL samples

To see which individual samples have FAIL status at a position, use `variant-info`:

```bash
afquery variant-info --db ./db/ --locus chr1:925952
```

Each carrier row shows its `filter` column as `PASS` or `FAIL`, along with sample metadata (technology, phenotype codes). This helps pinpoint whether failures cluster in a specific technology or sample group. See [Variant Info](../guides/variant-info.md) for full options.
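
To check for clustering programmatically, the TSV output can be tallied by technology. The sketch below is illustrative and not part of AFQuery: it assumes the column layout shown in the Variant Info guide, uses inline sample rows in place of a real `--format tsv` file, and the helper name `fail_counts_by_tech` is hypothetical.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the Variant Info guide (tab-separated).
# In practice, read the file written by `--format tsv` instead.
SAMPLE_TSV = (
    "sample_id\tsample_name\tsex\ttech\tphenotypes\tgenotype\tfilter\n"
    "3\tP003\tmale\tWGS\tE11.9,J45\thet\tPASS\n"
    "17\tP017\tfemale\tWES_kit_A\tE11.9\thom\tPASS\n"
    "42\tP042\tmale\tWGS\tI10\talt\tFAIL\n"
)


def fail_counts_by_tech(tsv_text):
    """Count FAIL carriers per technology (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["tech"] for row in reader if row["filter"] == "FAIL")


counts = fail_counts_by_tech(SAMPLE_TSV)
for tech, n in counts.most_common():
    print(f"{tech}\t{n}")
```

A technology that dominates this tally across many positions is a reasonable first place to look for a calling or FILTER configuration problem.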

---

## VCF Annotation
Expand All @@ -85,19 +95,6 @@ afquery annotate --db ./db/ --input variants.vcf --output annotated.vcf

---

## Schema Version Compatibility

The `fail_bitmap` and `N_FAIL` tracking requires schema version 2.0 or later. Databases created with older versions of AFQuery do not contain `fail_bitmap` data.

| Database schema | N_FAIL behavior |
|----------------|-----------------|
| ≥ 2.0 | `N_FAIL` is always an integer (0 or more) |
| < 2.0 (legacy) | `N_FAIL` is `None` in Python API results |

To upgrade a legacy database, rebuild it with `afquery create-db`. There is no in-place migration.

---

## PASS-Only Enforcement

AF reflects the quality-filtered allele frequency — the frequency of the alt allele among high-quality calls. This is appropriate for most clinical and research use cases. PASS-only ingestion is always enforced.
2 changes: 1 addition & 1 deletion docs/faq.md
Original file line number Diff line number Diff line change
Expand Up @@ -209,7 +209,7 @@ Without stratification, rare variant interpretation in mixed cohorts can be misl
AFQuery is purpose-built for fast subcohort AF computation and is not a general-purpose genomic database:

- **Not a joint genotyper**: AFQuery does not perform joint genotyping. Input VCFs should be individually called before ingestion.
- **Not a variant database**: AFQuery stores only genotype-level summaries (bitmaps). Individual sample genotypes cannot be retrieved from the database.
- **Not a genotype store**: AFQuery stores genotype summaries as bitmaps, not raw FORMAT fields. Use `variant-info` to list carriers and their genotype class (het/hom) at a specific position; for full per-sample VCF fields (GQ, DP, AD, etc.), consult the source VCFs.
- **No statistical genetics**: AFQuery does not compute Hardy-Weinberg equilibrium, population stratification, or other statistical genetics metrics.
- **Batch queries**: The `--from-file` batch mode supports variants across multiple chromosomes in a single call. Point queries (`--locus`) and region queries (`--region`) target a single position or range; for multi-position multi-chromosome lookups, use `--from-file`.
- **Cohort size limit**: Performance at >100K samples has not been validated. Memory requirements for the build phase scale with cohort size.
17 changes: 15 additions & 2 deletions docs/getting-started/quickstart.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,19 @@ See [Sample Filtering](../guides/sample-filtering.md) for the full include/exclu

---

## 5. Query a Region
## 5. Inspect Carriers (optional)

See which samples carry the variant you just queried:

```bash
afquery variant-info --db ./my_db/ --locus chr1:925952
```

This lists each carrier with their sex, technology, phenotype codes, genotype (het/hom), and FILTER status. See [Variant Info](../guides/variant-info.md) for details.

---

## 6. Query a Region

```bash
afquery query \
Expand All @@ -100,7 +112,7 @@ afquery query \

---

## 6. Annotate a VCF
## 7. Annotate a VCF

Given a VCF with variants you want to annotate:

Expand Down Expand Up @@ -130,5 +142,6 @@ The output VCF gains INFO fields (see [Annotate a VCF](../guides/annotate-vcf.md

- [Key Concepts](concepts.md) — understand how bitmaps, Parquet, and metadata filtering work together
- [Sample Filtering](../guides/sample-filtering.md) — full syntax for phenotype, sex, and technology filters
- [Variant Info](../guides/variant-info.md) — list carriers of any variant with metadata
- [Annotate a VCF](../guides/annotate-vcf.md) — annotation options, parallelism, and downstream usage
- [ACMG Criteria](../use-cases/acmg-use-cases.md) — applying local AF to BA1, PM2, and PS4
46 changes: 39 additions & 7 deletions docs/getting-started/tutorial.md
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,39 @@ chr1:946000 T>C AC=2 AN=20 AF=0.1000 n_eligible=10 N_HET=2 N_HOM_ALT=0 N_

---

## 5. Filter by Sex
## 5. Inspect Variant Carriers

After finding a variant of interest, use `variant-info` to see which specific samples carry it:

```bash
afquery variant-info --db ./demo_db/ --locus chr1:925952
```

Example output:

```
sample_id sample_name sex tech phenotypes genotype filter
--------- ----------- ------ ------ --------------- -------- ------
0 DEMO_001 female wgs E11.9,I10 het PASS
2 DEMO_003 male wgs E11.9 het PASS
4 DEMO_005 female wes_v1 E11.9,control het PASS
6 DEMO_007 male wes_v2 control het PASS
8 DEMO_009 female wgs E11.9 hom PASS
```

Each row is one carrier. The `genotype` column shows `het` (heterozygous), `hom` (homozygous alt), or `alt` (non-ref with FILTER≠PASS). The `filter` column indicates whether the call passed quality filters in the source VCF.

For machine-readable output, use `--format tsv`:

```bash
afquery variant-info --db ./demo_db/ --locus chr1:925952 --format tsv > carriers.tsv
```

See [Variant Info](../guides/variant-info.md) for full options including allele-specific queries and sample filtering.
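
As a quick cross-check, the carrier list can be summarised by genotype class. The sketch below is a minimal illustration, not AFQuery code: it re-types the example rows above as tab-separated text, and it assumes a diploid autosomal position where each PASS `het` carrier contributes one alt allele and each PASS `hom` carrier contributes two. The helper name `genotype_counts` is hypothetical.

```python
import csv
import io
from collections import Counter

# The tutorial's example carriers, written as they would appear with --format tsv.
CARRIERS_TSV = (
    "sample_id\tsample_name\tsex\ttech\tphenotypes\tgenotype\tfilter\n"
    "0\tDEMO_001\tfemale\twgs\tE11.9,I10\thet\tPASS\n"
    "2\tDEMO_003\tmale\twgs\tE11.9\thet\tPASS\n"
    "4\tDEMO_005\tfemale\twes_v1\tE11.9,control\thet\tPASS\n"
    "6\tDEMO_007\tmale\twes_v2\tcontrol\thet\tPASS\n"
    "8\tDEMO_009\tfemale\twgs\tE11.9\thom\tPASS\n"
)


def genotype_counts(tsv_text):
    """Count PASS carriers per genotype class (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["genotype"] for row in reader if row["filter"] == "PASS")


counts = genotype_counts(CARRIERS_TSV)
# Assumption: diploid autosome, so het = 1 alt allele and hom = 2.
implied_alt_alleles = counts["het"] + 2 * counts["hom"]
print(counts["het"], counts["hom"], implied_alt_alleles)
```

Under those assumptions, the implied alt allele count can be compared with the `AC` that `afquery query` reports for the same position and sample set.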

---

## 6. Filter by Sex

Query only female samples:

Expand Down Expand Up @@ -168,7 +200,7 @@ AN drops from 20 to 10 in both cases because only 5 samples are eligible. The AF

---

## 6. Filter by Phenotype
## 7. Filter by Phenotype

Query samples tagged with `E11.9`:

Expand Down Expand Up @@ -198,7 +230,7 @@ The `^` prefix means "exclude". Excluding controls removes 4 samples, leaving 6.

---

## 7. Filter by Technology
## 8. Filter by Technology

Restrict to WGS samples only:

Expand Down Expand Up @@ -259,7 +291,7 @@ No variants found for the given filters.

---

## 8. Combine Filters
## 9. Combine Filters

All filter dimensions compose with AND:

Expand All @@ -282,7 +314,7 @@ Only one sample meets all three criteria (DEMO_001: female, wgs, E11.9). With n_

---

## 9. Annotate a VCF
## 10. Annotate a VCF

Use one of the demo VCFs as input:

Expand Down Expand Up @@ -313,7 +345,7 @@ See [Annotate a VCF](../guides/annotate-vcf.md) for filtering and downstream usa

---

## 10. Bulk Export with Dump
## 11. Bulk Export with Dump

Export all variant frequencies to CSV:

Expand Down Expand Up @@ -350,7 +382,7 @@ This adds columns following the pattern `AC_{sex}_{tech}`, `AN_{sex}_{tech}`, `A

---

## 11. Interpret Results with ACMG Criteria
## 12. Interpret Results with ACMG Criteria

With the annotated VCF or query results, you can apply ACMG criteria:

163 changes: 163 additions & 0 deletions docs/guides/variant-info.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,163 @@
# Variant Info

`afquery variant-info` returns the list of samples carrying a specific variant, together with each sample's metadata: sex, sequencing technology, phenotype codes, genotype (`het`, `hom`, or `alt`), and FILTER status (PASS/FAIL).

This avoids the need to re-query raw VCF files when inspecting individual variant carriers.

---

## Basic usage

```bash
afquery variant-info --db ./db/ --locus chr1:925952
```

!!! tip
    `variant-info` is the natural next step after `query`: once you find a variant of interest, use it to see which specific samples carry it.

By default, all samples are queried and results are printed as an aligned text table:

```
sample_id sample_name sex tech phenotypes genotype filter
--------- ----------- ------ --------- ------------ -------- ------
3 P003 male WGS E11.9,J45 het PASS
17 P017 female WES_kit_A E11.9 hom PASS
42 P042 male WGS I10 alt FAIL
```

---

## Filtering to a specific allele

When multiple alleles exist at the same position, use `--ref` and `--alt` to restrict to one:

```bash
afquery variant-info --db ./db/ --locus chr17:41245466 --ref A --alt T
```

!!! note
    Without `--ref`/`--alt`, carriers for all alleles at the locus are returned and a warning is emitted if more than one allele is found. Specify both flags to disambiguate at multi-allelic sites.

---

## Sample filters

`variant-info` accepts the same sample filters as `query`:

```bash
# Only female carriers
afquery variant-info --db ./db/ --locus chr1:925952 --sex female

# Only carriers with phenotype E11.9
afquery variant-info --db ./db/ --locus chr1:925952 --phenotype E11.9

# Exclude phenotype I10
afquery variant-info --db ./db/ --locus chr1:925952 --phenotype ^I10

# Restrict to WGS samples
afquery variant-info --db ./db/ --locus chr1:925952 --tech WGS

# Combine filters
afquery variant-info --db ./db/ --locus chr1:925952 \
--sex female --phenotype E11.9 --tech WGS,WES_kit_A
```

See [Sample Filtering](sample-filtering.md) for the full filter syntax.

---

## Output formats

### TSV

Machine-readable tab-separated output, suitable for downstream processing:

```bash
afquery variant-info --db ./db/ --locus chr1:925952 --format tsv > carriers.tsv
```

```
sample_id sample_name sex tech phenotypes genotype filter
3 P003 male WGS E11.9,J45 het PASS
17 P017 female WES_kit_A E11.9 hom PASS
42 P042 male WGS I10 alt FAIL
```

### JSON

Structured output with variant metadata and a sample list:

```bash
afquery variant-info --db ./db/ --locus chr1:925952 --format json
```

```json
{
  "variant": {
    "chrom": "chr1",
    "pos": 925952,
    "ref": ".",
    "alt": "."
  },
  "samples": [
    {
      "sample_id": 3,
      "sample_name": "P003",
      "sex": "male",
      "tech": "WGS",
      "phenotypes": ["E11.9", "J45"],
      "genotype": "het",
      "filter": "PASS"
    },
    {
      "sample_id": 42,
      "sample_name": "P042",
      "sex": "male",
      "tech": "WGS",
      "phenotypes": ["I10"],
      "genotype": "alt",
      "filter": "FAIL"
    }
  ]
}
```

When `--ref` and `--alt` are specified, the `variant` block contains the actual alleles. Otherwise, `"."` is used as a placeholder.
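
The JSON form is convenient for scripted checks. The following sketch assumes only the structure shown in the example above and parses an inline copy of it rather than invoking the CLI. The variable names are illustrative.

```python
import json

# Inline copy of the example JSON output shown above.
SAMPLE_JSON = """
{
  "variant": {"chrom": "chr1", "pos": 925952, "ref": ".", "alt": "."},
  "samples": [
    {"sample_id": 3, "sample_name": "P003", "sex": "male", "tech": "WGS",
     "phenotypes": ["E11.9", "J45"], "genotype": "het", "filter": "PASS"},
    {"sample_id": 42, "sample_name": "P042", "sex": "male", "tech": "WGS",
     "phenotypes": ["I10"], "genotype": "alt", "filter": "FAIL"}
  ]
}
"""

report = json.loads(SAMPLE_JSON)

# Names of carriers whose call did not pass the source VCF's FILTER.
failing = [s["sample_name"] for s in report["samples"] if s["filter"] == "FAIL"]

# "." marks a placeholder: the query was not restricted with --ref/--alt.
variant = report["variant"]
alleles_known = variant["ref"] != "." and variant["alt"] != "."

print(failing, alleles_known)
```

Checking for the `"."` placeholder before using `ref` and `alt` downstream avoids treating an unrestricted locus query as a single-allele result.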

---

## Genotype values

| Value | Meaning |
|---|---|
| `het` | Heterozygous carrier, FILTER=PASS |
| `hom` | Homozygous alt carrier, FILTER=PASS |
| `alt` | Non-ref carrier with FILTER≠PASS (ploidy unknown) |
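
The table implies a fixed pairing between the `genotype` and `filter` columns, which a downstream script can use as a sanity check. This is a sketch under that reading of the table, with `FAIL` taken as the displayed value for FILTER≠PASS. The helper `inconsistent_rows` is hypothetical and not part of AFQuery.

```python
# Expected filter value for each genotype value, per the table above.
EXPECTED_FILTER = {"het": "PASS", "hom": "PASS", "alt": "FAIL"}


def inconsistent_rows(rows):
    """Return rows whose genotype/filter pair contradicts the table (hypothetical helper)."""
    return [r for r in rows if EXPECTED_FILTER.get(r["genotype"]) != r["filter"]]


rows = [
    {"sample_name": "P003", "genotype": "het", "filter": "PASS"},
    {"sample_name": "P017", "genotype": "hom", "filter": "PASS"},
    {"sample_name": "P042", "genotype": "alt", "filter": "FAIL"},
]

bad = inconsistent_rows(rows)
print(len(bad))
```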

---

## All options

| Option | Default | Description |
|---|---|---|
| `--db` | required | Path to database directory |
| `--locus` | required | `CHROM:POS` (e.g. `chr1:925952`) |
| `--ref` | — | Filter to specific reference allele |
| `--alt` | — | Filter to specific alternate allele |
| `--phenotype` | all | Include phenotype (repeatable; `^CODE` excludes) |
| `--sex` | `both` | `male`, `female`, or `both` |
| `--tech` | all | Include technology (repeatable; `^NAME` excludes) |
| `--format` | `text` | `text`, `tsv`, or `json` |
| `--no-warn` | off | Suppress `AfqueryWarning` messages |

See also [CLI Reference → variant-info](../reference/cli.md#variant-info).
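
When calling the command from a script, it can help to assemble the argument list in one place. The sketch below uses only the flags listed in the table above. The function `build_variant_info_argv` and its parameter names are hypothetical, and nothing is executed here.

```python
import shlex


def build_variant_info_argv(db, locus, ref=None, alt=None,
                            phenotypes=(), sex=None, techs=(), fmt=None):
    """Assemble an argv list for `afquery variant-info` (hypothetical helper)."""
    argv = ["afquery", "variant-info", "--db", db, "--locus", locus]
    if ref is not None:
        argv += ["--ref", ref]
    if alt is not None:
        argv += ["--alt", alt]
    for code in phenotypes:  # --phenotype is repeatable; "^CODE" excludes
        argv += ["--phenotype", code]
    if sex is not None:
        argv += ["--sex", sex]
    for name in techs:  # --tech is repeatable; "^NAME" excludes
        argv += ["--tech", name]
    if fmt is not None:
        argv += ["--format", fmt]
    return argv


argv = build_variant_info_argv(
    "./db/", "chr17:41245466", ref="A", alt="T",
    phenotypes=["E11.9", "^I10"], sex="female", fmt="json",
)
print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, capture_output=True, text=True, check=True)` avoids shell quoting issues with the `^` exclusion prefix.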

---

## Next Steps

- [Sample Filtering](sample-filtering.md) — full filter syntax for phenotype, sex, and technology
- [Understanding Output](../getting-started/understanding-output.md) — field definitions and special cases
- [FILTER=PASS Tracking](../advanced/filter-pass-tracking.md) — understanding FAIL genotypes
- [Python API → variant_info](../reference/python-api.md#variant_info) — programmatic access
- [ACMG Criteria](../use-cases/acmg-use-cases.md) — using carrier info for variant classification
6 changes: 6 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ AFQuery pre-indexes genotypes as [Roaring Bitmaps](https://roaringbitmap.org/) i
- **Server-less** — a directory of Parquet files + SQLite. Copy to share, no daemon required.
- **Ploidy-aware** — correct AN on chrX PAR/non-PAR, chrY, and chrM.
- **Technology-aware AN** — per-position capture BED intersection across WGS, WES kits, and panels.
- **Carrier lookup** — list samples carrying any variant with full metadata (sex, tech, phenotypes, genotype, FILTER status).
- **VCF annotation** — add `AFQUERY_AC/AN/AF/N_HET/N_HOM_ALT/N_HOM_REF/N_FAIL` INFO fields from any sample subset.
- **Audit changelog** — every database operation is recorded for reproducibility.

Expand Down Expand Up @@ -78,25 +79,30 @@ graph TD
E["Classify variants<br/>using ACMG criteria"]
F["Compare AF across<br/>groups"]

M["Find carriers of<br/>a variant"]

A -->|First time| G["5-Min Quickstart"]
A -->|Build| B
A -->|Query| C
A -->|Annotate| D
A -->|Classify| E
A -->|Compare| F
A -->|Carriers| M

B --> H["Create a Database"]
C --> I["Query Guide"]
D --> J["Annotate a VCF"]
E --> K["ACMG Criteria"]
F --> L["Cohort Stratification"]
M --> N["Variant Info"]

click G "getting-started/quickstart/"
click H "guides/create-database/"
click I "guides/query/"
click J "guides/annotate-vcf/"
click K "use-cases/acmg-use-cases/"
click L "use-cases/cohort-stratification/"
click N "guides/variant-info/"

style A fill:#e3f2fd
style G fill:#e8f5e9