diff --git a/docs/advanced/debugging-results.md b/docs/advanced/debugging-results.md
index f7b835d..256439b 100644
--- a/docs/advanced/debugging-results.md
+++ b/docs/advanced/debugging-results.md
@@ -54,6 +54,9 @@ To inspect a site with high N_FAIL, query with `--format json` to see all fields
afquery query --db ./db/ --locus chr1:12345678 --format json
```
+!!! tip "Identify failing samples"
+ Use `afquery variant-info --db ./db/ --locus chr1:12345678` to see exactly which samples have FAIL status and their metadata (technology, phenotype codes). This helps determine if failures cluster in a specific technology or sample subset. See [Variant Info](../guides/variant-info.md).
+
If N_FAIL is consistently high across many sites, check the variant calling pipeline and FILTER field settings in your VCFs.
---
diff --git a/docs/advanced/filter-pass-tracking.md b/docs/advanced/filter-pass-tracking.md
index 45875b3..cc18c61 100644
--- a/docs/advanced/filter-pass-tracking.md
+++ b/docs/advanced/filter-pass-tracking.md
@@ -69,6 +69,16 @@ for r in results:
`N_FAIL` is always an `int` (default `0`).
+### Identifying specific FAIL samples
+
+To see which individual samples have FAIL status at a position, use `variant-info`:
+
+```bash
+afquery variant-info --db ./db/ --locus chr1:925952
+```
+
+Each carrier row shows its `filter` column as `PASS` or `FAIL`, along with sample metadata (technology, phenotype codes). This helps pinpoint whether failures cluster in a specific technology or sample group. See [Variant Info](../guides/variant-info.md) for full options.
+
---
## VCF Annotation
@@ -85,19 +95,6 @@ afquery annotate --db ./db/ --input variants.vcf --output annotated.vcf
---
-## Schema Version Compatibility
-
-The `fail_bitmap` and `N_FAIL` tracking requires schema version 2.0 or later. Databases created with older versions of AFQuery do not contain `fail_bitmap` data.
-
-| Database schema | N_FAIL behavior |
-|----------------|-----------------|
-| ≥ 2.0 | `N_FAIL` is always an integer (0 or more) |
-| < 2.0 (legacy) | `N_FAIL` is `None` in Python API results |
-
-To upgrade a legacy database, rebuild it with `afquery create-db`. There is no in-place migration.
-
----
-
## PASS-Only Enforcement
AF reflects the quality-filtered allele frequency — the frequency of the alt allele among high-quality calls. This is appropriate for most clinical and research use cases. PASS-only ingestion is always enforced.
diff --git a/docs/faq.md b/docs/faq.md
index 2f554c0..bca7372 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -209,7 +209,7 @@ Without stratification, rare variant interpretation in mixed cohorts can be misl
AFQuery is purpose-built for fast subcohort AF computation and is not a general-purpose genomic database:
- **Not a joint genotyper**: AFQuery does not perform joint genotyping. Input VCFs should be individually called before ingestion.
-- **Not a variant database**: AFQuery stores only genotype-level summaries (bitmaps). Individual sample genotypes cannot be retrieved from the database.
+- **Not a genotype store**: AFQuery stores genotype summaries as bitmaps, not raw FORMAT fields. Use `variant-info` to list carriers and their genotype class (het/hom) at a specific position; for full per-sample VCF fields (GQ, DP, AD, etc.), consult the source VCFs.
- **No statistical genetics**: AFQuery does not compute Hardy-Weinberg equilibrium, population stratification, or other statistical genetics metrics.
- **Batch queries**: The `--from-file` batch mode supports variants across multiple chromosomes in a single call. Point queries (`--locus`) and region queries (`--region`) target a single position or range; for multi-position multi-chromosome lookups, use `--from-file`.
- **Cohort size limit**: Performance at >100K samples has not been validated. Memory requirements for the build phase scale with cohort size.
diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md
index 56f813a..cb9b1bd 100644
--- a/docs/getting-started/quickstart.md
+++ b/docs/getting-started/quickstart.md
@@ -90,7 +90,19 @@ See [Sample Filtering](../guides/sample-filtering.md) for the full include/exclu
---
-## 5. Query a Region
+## 5. Inspect Carriers (optional)
+
+See which samples carry the variant you just queried:
+
+```bash
+afquery variant-info --db ./my_db/ --locus chr1:925952
+```
+
+This lists each carrier with their sex, technology, phenotype codes, genotype (het/hom), and FILTER status. See [Variant Info](../guides/variant-info.md) for details.
+
+---
+
+## 6. Query a Region
```bash
afquery query \
@@ -100,7 +112,7 @@ afquery query \
---
-## 6. Annotate a VCF
+## 7. Annotate a VCF
Given a VCF with variants you want to annotate:
@@ -130,5 +142,6 @@ The output VCF gains INFO fields (see [Annotate a VCF](../guides/annotate-vcf.md
- [Key Concepts](concepts.md) — understand how bitmaps, Parquet, and metadata filtering work together
- [Sample Filtering](../guides/sample-filtering.md) — full syntax for phenotype, sex, and technology filters
+- [Variant Info](../guides/variant-info.md) — list carriers of any variant with metadata
- [Annotate a VCF](../guides/annotate-vcf.md) — annotation options, parallelism, and downstream usage
- [ACMG Criteria](../use-cases/acmg-use-cases.md) — applying local AF to BA1, PM2, and PS4
diff --git a/docs/getting-started/tutorial.md b/docs/getting-started/tutorial.md
index f2c9fcc..92f6431 100644
--- a/docs/getting-started/tutorial.md
+++ b/docs/getting-started/tutorial.md
@@ -138,7 +138,39 @@ chr1:946000 T>C AC=2 AN=20 AF=0.1000 n_eligible=10 N_HET=2 N_HOM_ALT=0 N_
---
-## 5. Filter by Sex
+## 5. Inspect Variant Carriers
+
+After finding a variant of interest, use `variant-info` to see which specific samples carry it:
+
+```bash
+afquery variant-info --db ./demo_db/ --locus chr1:925952
+```
+
+Example output:
+
+```
+sample_id sample_name sex tech phenotypes genotype filter
+--------- ----------- ------ ------ --------------- -------- ------
+0 DEMO_001 female wgs E11.9,I10 het PASS
+2 DEMO_003 male wgs E11.9 het PASS
+4 DEMO_005 female wes_v1 E11.9,control het PASS
+6 DEMO_007 male wes_v2 control het PASS
+8 DEMO_009 female wgs E11.9 hom PASS
+```
+
+Each row is one carrier. The `genotype` column shows `het` (heterozygous), `hom` (homozygous alt), or `alt` (non-ref with FILTER≠PASS). The `filter` column indicates whether the call passed quality filters in the source VCF.
+
+For machine-readable output, use `--format tsv`:
+
+```bash
+afquery variant-info --db ./demo_db/ --locus chr1:925952 --format tsv > carriers.tsv
+```
+
+See [Variant Info](../guides/variant-info.md) for full options including allele-specific queries and sample filtering.
+
+---
+
+## 6. Filter by Sex
Query only female samples:
@@ -168,7 +200,7 @@ AN drops from 20 to 10 in both cases because only 5 samples are eligible. The AF
---
-## 6. Filter by Phenotype
+## 7. Filter by Phenotype
Query samples tagged with `E11.9`:
@@ -198,7 +230,7 @@ The `^` prefix means "exclude". Excluding controls removes 4 samples, leaving 6.
---
-## 7. Filter by Technology
+## 8. Filter by Technology
Restrict to WGS samples only:
@@ -259,7 +291,7 @@ No variants found for the given filters.
---
-## 8. Combine Filters
+## 9. Combine Filters
All filter dimensions compose with AND:
@@ -282,7 +314,7 @@ Only one sample meets all three criteria (DEMO_001: female, wgs, E11.9). With n_
---
-## 9. Annotate a VCF
+## 10. Annotate a VCF
Use one of the demo VCFs as input:
@@ -313,7 +345,7 @@ See [Annotate a VCF](../guides/annotate-vcf.md) for filtering and downstream usa
---
-## 10. Bulk Export with Dump
+## 11. Bulk Export with Dump
Export all variant frequencies to CSV:
@@ -350,7 +382,7 @@ This adds columns following the pattern `AC_{sex}_{tech}`, `AN_{sex}_{tech}`, `A
---
-## 11. Interpret Results with ACMG Criteria
+## 12. Interpret Results with ACMG Criteria
With the annotated VCF or query results, you can apply ACMG criteria:
diff --git a/docs/guides/variant-info.md b/docs/guides/variant-info.md
new file mode 100644
index 0000000..c4cde67
--- /dev/null
+++ b/docs/guides/variant-info.md
@@ -0,0 +1,163 @@
+# Variant Info
+
+`afquery variant-info` returns the list of samples carrying a specific variant, together with each sample's metadata: sex, sequencing technology, phenotype codes, genotype (het/hom), and FILTER status (PASS/FAIL).
+
+This avoids the need to re-query raw VCF files when inspecting individual variant carriers.
+
+---
+
+## Basic usage
+
+```bash
+afquery variant-info --db ./db/ --locus chr1:925952
+```
+
+!!! tip
+ `variant-info` is the natural next step after `query` — once you find a variant of interest, use it to see which specific samples carry it.
+
+By default all samples are queried and results are printed as an aligned text table:
+
+```
+sample_id sample_name sex tech phenotypes genotype filter
+--------- ----------- ------ --------- ------------ -------- ------
+3 P003 male WGS E11.9,J45 het PASS
+17 P017 female WES_kit_A E11.9 hom PASS
+42 P042 male WGS I10 alt FAIL
+```
+
+---
+
+## Filtering to a specific allele
+
+When multiple alleles exist at the same position, use `--ref` and `--alt` to restrict to one:
+
+```bash
+afquery variant-info --db ./db/ --locus chr17:41245466 --ref A --alt T
+```
+
+!!! note
+ Without `--ref`/`--alt`, carriers for all alleles at the locus are returned and a warning is emitted if more than one allele is found. Specify both flags to disambiguate at multi-allelic sites.
+
+---
+
+## Sample filters
+
+`variant-info` accepts the same sample filters as `query`:
+
+```bash
+# Only female carriers
+afquery variant-info --db ./db/ --locus chr1:925952 --sex female
+
+# Only carriers with phenotype E11.9
+afquery variant-info --db ./db/ --locus chr1:925952 --phenotype E11.9
+
+# Exclude phenotype I10
+afquery variant-info --db ./db/ --locus chr1:925952 --phenotype ^I10
+
+# Restrict to WGS samples
+afquery variant-info --db ./db/ --locus chr1:925952 --tech WGS
+
+# Combine filters
+afquery variant-info --db ./db/ --locus chr1:925952 \
+ --sex female --phenotype E11.9 --tech WGS,WES_kit_A
+```
+
+See [Sample Filtering](sample-filtering.md) for the full filter syntax.
+
+---
+
+## Output formats
+
+### TSV
+
+Machine-readable tab-separated output, suitable for downstream processing:
+
+```bash
+afquery variant-info --db ./db/ --locus chr1:925952 --format tsv > carriers.tsv
+```
+
+```
+sample_id sample_name sex tech phenotypes genotype filter
+3 P003 male WGS E11.9,J45 het PASS
+17 P017 female WES_kit_A E11.9 hom PASS
+42 P042 male WGS I10 alt FAIL
+```
+
+### JSON
+
+Structured output with variant metadata and a sample list:
+
+```bash
+afquery variant-info --db ./db/ --locus chr1:925952 --format json
+```
+
+```json
+{
+ "variant": {
+ "chrom": "chr1",
+ "pos": 925952,
+ "ref": ".",
+ "alt": "."
+ },
+ "samples": [
+ {
+ "sample_id": 3,
+ "sample_name": "P003",
+ "sex": "male",
+ "tech": "WGS",
+ "phenotypes": ["E11.9", "J45"],
+ "genotype": "het",
+ "filter": "PASS"
+ },
+ {
+ "sample_id": 42,
+ "sample_name": "P042",
+ "sex": "male",
+ "tech": "WGS",
+ "phenotypes": ["I10"],
+ "genotype": "alt",
+ "filter": "FAIL"
+ }
+ ]
+}
+```
+
+When `--ref` and `--alt` are specified, the `variant` block contains the actual alleles. Otherwise, `"."` is used as a placeholder.
+
+---
+
+## Genotype values
+
+| Value | Meaning |
+|---|---|
+| `het` | Heterozygous carrier, FILTER=PASS |
+| `hom` | Homozygous alt carrier, FILTER=PASS |
+| `alt` | Non-ref carrier with FILTER≠PASS (ploidy unknown) |
+
+---
+
+## All options
+
+| Option | Default | Description |
+|---|---|---|
+| `--db` | required | Path to database directory |
+| `--locus` | required | `CHROM:POS` (e.g. `chr1:925952`) |
+| `--ref` | — | Filter to specific reference allele |
+| `--alt` | — | Filter to specific alternate allele |
+| `--phenotype` | all | Include phenotype (repeatable; `^CODE` excludes) |
+| `--sex` | `both` | `male`, `female`, or `both` |
+| `--tech` | all | Include technology (repeatable; `^NAME` excludes) |
+| `--format` | `text` | `text`, `tsv`, or `json` |
+| `--no-warn` | off | Suppress `AfqueryWarning` messages |
+
+See also [CLI Reference → variant-info](../reference/cli.md#variant-info).
+
+---
+
+## Next Steps
+
+- [Sample Filtering](sample-filtering.md) — full filter syntax for phenotype, sex, and technology
+- [Understanding Output](../getting-started/understanding-output.md) — field definitions and special cases
+- [FILTER=PASS Tracking](../advanced/filter-pass-tracking.md) — understanding FAIL genotypes
+- [Python API → variant_info](../reference/python-api.md#variant_info) — programmatic access
+- [ACMG Criteria](../use-cases/acmg-use-cases.md) — using carrier info for variant classification
diff --git a/docs/index.md b/docs/index.md
index d206c52..a634a4d 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -30,6 +30,7 @@ AFQuery pre-indexes genotypes as [Roaring Bitmaps](https://roaringbitmap.org/) i
- **Server-less** — a directory of Parquet files + SQLite. Copy to share, no daemon required.
- **Ploidy-aware** — correct AN on chrX PAR/non-PAR, chrY, and chrM.
- **Technology-aware AN** — per-position capture BED intersection across WGS, WES kits, and panels.
+- **Carrier lookup** — list samples carrying any variant with full metadata (sex, tech, phenotypes, genotype, FILTER status).
- **VCF annotation** — add `AFQUERY_AC/AN/AF/N_HET/N_HOM_ALT/N_HOM_REF/N_FAIL` INFO fields from any sample subset.
- **Audit changelog** — every database operation is recorded for reproducibility.
@@ -78,18 +79,22 @@ graph TD
E["Classify variants using ACMG criteria"]
F["Compare AF across groups"]
+ M["Find carriers of a variant"]
+
A -->|First time| G["5-Min Quickstart"]
A -->|Build| B
A -->|Query| C
A -->|Annotate| D
A -->|Classify| E
A -->|Compare| F
+ A -->|Carriers| M
B --> H["Create a Database"]
C --> I["Query Guide"]
D --> J["Annotate a VCF"]
E --> K["ACMG Criteria"]
F --> L["Cohort Stratification"]
+ M --> N["Variant Info"]
click G "getting-started/quickstart/"
click H "guides/create-database/"
@@ -97,6 +102,7 @@ graph TD
click J "guides/annotate-vcf/"
click K "use-cases/acmg-use-cases/"
click L "use-cases/cohort-stratification/"
+ click N "guides/variant-info/"
style A fill:#e3f2fd
style G fill:#e8f5e9
diff --git a/docs/reference/cli.md b/docs/reference/cli.md
index 689c723..9ef24ba 100644
--- a/docs/reference/cli.md
+++ b/docs/reference/cli.md
@@ -104,6 +104,30 @@ afquery dump [OPTIONS]
---
+## variant-info
+
+List samples carrying a specific variant, with their metadata.
+
+```
+afquery variant-info [OPTIONS]
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `--db` | TEXT | **required** | Path to database directory |
+| `--locus` | TEXT | **required** | Single position as `CHROM:POS` (e.g., `chr1:925952`) |
+| `--ref` | TEXT | None | Filter to specific reference allele |
+| `--alt` | TEXT | None | Filter to specific alternate allele |
+| `--phenotype` | TEXT | None | Phenotype filter. Repeatable; comma-separated or multiple flags. Use `^` prefix to exclude. |
+| `--sex` | `male`\|`female`\|`both` | `both` | Restrict to specified sex |
+| `--tech` | TEXT | None | Technology filter. Repeatable; comma-separated or multiple flags. Use `^` prefix to exclude. |
+| `--format` | `text`\|`json`\|`tsv` | `text` | Output format |
+| `--no-warn` | flag | False | Suppress `AfqueryWarning` messages |
+
+Returns one row per carrier sample with genotype (`het`/`hom`/`alt`) and FILTER status (`PASS`/`FAIL`). Use `--ref`/`--alt` to restrict to a specific allele when multiple alleles share the same position.
+
+---
+
## update-db
Add samples, remove samples, update sample metadata, or compact the database.
diff --git a/docs/reference/glossary.md b/docs/reference/glossary.md
index 8851ad8..28cd6b7 100644
--- a/docs/reference/glossary.md
+++ b/docs/reference/glossary.md
@@ -20,6 +20,10 @@ See [Roaring Bitmap](#roaring-bitmap).
A 1 Mbp (1,000,000 base pair) genomic interval used to partition variant data within each chromosome. Bucket 0 covers positions 0–999,999; bucket 1 covers 1,000,000–1,999,999; and so on. See [Data Model](data-model.md).
+## Carrier
+
+A sample that has at least one copy of the alternate allele at a given position (heterozygous or homozygous alt). Use `variant-info` to list carriers with their metadata. See [Variant Info](../guides/variant-info.md).
+
## Capture Index
An interval tree (stored as a `.pkl` file) built from a BED file that defines the genomic regions covered by a whole-exome sequencing (WES) technology. Used to determine which WES samples are eligible at a given position. WGS samples do not require a capture index. See [Key Concepts](../getting-started/concepts.md#capture-index-wes).
diff --git a/docs/reference/python-api.md b/docs/reference/python-api.md
index 404ab4a..bd3033b 100644
--- a/docs/reference/python-api.md
+++ b/docs/reference/python-api.md
@@ -281,6 +281,60 @@ Export allele frequency data to CSV. If `output` is None, writes to stdout.
---
+### variant_info
+
+```python
+db.variant_info(
+ chrom: str,
+ pos: int,
+ ref: str | None = None,
+ alt: str | None = None,
+ phenotype: list[str] | None = None,
+ sex: str = "both",
+ tech: list[str] | None = None,
+) -> list[SampleCarrier]
+```
+
+Return all samples carrying the variant at the given position, with their metadata.
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `chrom` | str | Chromosome name (e.g., `"chr1"`, `"chrX"`) |
+| `pos` | int | 1-based genomic position |
+| `ref` | str \| None | Filter to specific reference allele |
+| `alt` | str \| None | Filter to specific alternate allele |
+| `phenotype` | list[str] \| None | Phenotype filter codes. Use `"^CODE"` prefix to exclude. |
+| `sex` | str | `"both"` (default), `"male"`, or `"female"` |
+| `tech` | list[str] \| None | Technology filter. Use `"^TECH"` prefix to exclude. |
+
+**Returns:** List of `SampleCarrier` objects sorted by `sample_id`. Empty list if no eligible carrier exists.
+
+**Example:**
+
+```python
+carriers = db.variant_info("chr1", pos=925952)
+for c in carriers:
+ print(f"{c.sample_name} {c.genotype} {c.tech_name} {c.phenotypes}")
+
+# With allele and sample filters
+carriers = db.variant_info(
+ chrom="chr17", pos=41245466, ref="A", alt="T",
+ phenotype=["E11.9"], sex="female", tech=["WGS"],
+)
+```
+
+The `variant_info` function is also available at the top level for one-off use:
+
+```python
+from afquery import variant_info
+
+carriers = variant_info("./db/", "chr1", 925952, ref="G", alt="A")
+```
+
+---
+
### add_samples
```python
@@ -415,6 +469,34 @@ class VariantKey:
---
+## SampleCarrier
+
+```python
+@dataclass
+class SampleCarrier:
+ sample_id: int # 0-based internal ID
+ sample_name: str # Name from manifest
+ sex: str # 'male' | 'female'
+ tech_name: str # Sequencing technology
+ phenotypes: list[str] # Sorted phenotype codes
+ genotype: str # 'het' | 'hom' | 'alt'
+ filter_pass: bool # False = FILTER≠PASS
+```
+
+Returned by `Database.variant_info()`. Each instance represents one sample carrying the queried variant.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `sample_id` | int | 0-based internal sample ID |
+| `sample_name` | str | Sample name from manifest |
+| `sex` | str | `"male"` or `"female"` |
+| `tech_name` | str | Sequencing technology name |
+| `phenotypes` | list[str] | Sorted list of phenotype codes |
+| `genotype` | str | `"het"` (heterozygous, PASS), `"hom"` (homozygous alt, PASS), or `"alt"` (non-ref, FILTER≠PASS) |
+| `filter_pass` | bool | `True` if FILTER=PASS, `False` otherwise |
+
+---
+
## SampleFilter
```python
@@ -488,4 +570,9 @@ batch_results = db.query_batch_multi(variants)
regions = [("chr1", 900000, 1000000), ("chrX", 5000000, 6000000)]
region_results = db.query_region_multi(regions)
print(f"Found {len(region_results)} variants across {len(regions)} regions")
+
+# Carrier lookup
+carriers = db.variant_info("chr1", pos=925952)
+for c in carriers:
+ print(f"{c.sample_name} {c.genotype} {c.tech_name} PASS={c.filter_pass}")
```
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index 6b93444..af9d921 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -119,16 +119,6 @@ afquery info --db ./db/ --samples | grep SAMP
---
-## Version Compatibility
-
-If you get `"Unsupported schema_version"` errors, upgrade AFQuery:
-
-```bash
-pip install --upgrade afquery
-```
-
----
-
## WES Technology Treated as WGS
**Symptom:** AN for WES samples is much higher than expected; positions outside the capture panel return results.
diff --git a/docs/use-cases/acmg-use-cases.md b/docs/use-cases/acmg-use-cases.md
index 02d306c..66132a9 100644
--- a/docs/use-cases/acmg-use-cases.md
+++ b/docs/use-cases/acmg-use-cases.md
@@ -133,6 +133,9 @@ if cases and controls:
print("Strong enrichment — PS4 applicable if case count is sufficient")
```
+!!! tip "Verify carriers"
+ Use `afquery variant-info` to list the specific samples carrying the variant and confirm their phenotype assignments match expectations. See [Variant Info](../guides/variant-info.md).
+
!!! warning "Statistical considerations"
The original ACMG/AMP framework (Richards et al., 2015) does not specify a minimum odds ratio for PS4. The OR > 5.0 threshold used here is a widely adopted convention, but ClinGen expert panels have proposed different thresholds depending on the gene/disease context (e.g., OR ≥ 6 for strong, OR ≥ 3 for moderate in hearing loss). The enrichment ratio above is a screening tool. For formal PS4 application, compute a Fisher's exact test on the 2×2 table of (carrier/non-carrier) × (case/control) and apply the thresholds relevant to your clinical context.
@@ -223,6 +226,7 @@ bcftools filter -i 'AFQUERY_N_FAIL > 0' annotated.vcf | head
## Next Steps
+- [Variant Info](../guides/variant-info.md) — list carriers with metadata to verify phenotype assignments
- [Clinical Prioritization](../use-cases/clinical-prioritization.md) — full annotation and filtering workflow
- [Pseudo-controls](../use-cases/pseudo-controls.md) — case vs. control AF comparison
- [Population-Specific AF](../use-cases/population-specific-af.md) — ancestry-stratified queries
diff --git a/docs/use-cases/clinical-prioritization.md b/docs/use-cases/clinical-prioritization.md
index 3b154b3..6dad977 100644
--- a/docs/use-cases/clinical-prioritization.md
+++ b/docs/use-cases/clinical-prioritization.md
@@ -114,8 +114,19 @@ A typical clinical pipeline retains ~1,500 rare/novel candidates after cohort AF
For detailed ACMG workflows with worked examples and AN threshold guidance, see [ACMG Criteria (BA1/PM2/PS4)](acmg-use-cases.md).
+### 6. Inspect carriers of rare candidates
+
+After identifying rare variants, use `variant-info` to see who carries each one — useful for confirming case/control enrichment or checking sample quality:
+
+```bash
+afquery variant-info --db ./db/ --locus chr2:166845670 --ref G --alt A
+```
+
+This lists each carrier with their phenotype codes, sequencing technology, genotype, and FILTER status. See [Variant Info](../guides/variant-info.md) for filtering options.
+
## Next Steps
+- [Variant Info](../guides/variant-info.md) — list carriers of any variant with metadata
- [Annotate a VCF](../guides/annotate-vcf.md) — full annotation options
- [FILTER=PASS Tracking](../advanced/filter-pass-tracking.md) — using N_FAIL in filtering
- [Population-Specific AF](population-specific-af.md) — local vs. gnomAD comparison
diff --git a/mkdocs.yml b/mkdocs.yml
index 6dfd8ff..d7eb52d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -84,6 +84,7 @@ nav:
- Query Allele Frequencies: guides/query.md
- Annotate a VCF: guides/annotate-vcf.md
- Bulk Export: guides/dump-export.md
+ - Variant Info: guides/variant-info.md
- Sample Filtering: guides/sample-filtering.md
- 🏥 Clinical Workflows:
- ACMG Criteria (BA1/PM2/PS4): use-cases/acmg-use-cases.md
diff --git a/src/afquery/__init__.py b/src/afquery/__init__.py
index 137c149..0103038 100644
--- a/src/afquery/__init__.py
+++ b/src/afquery/__init__.py
@@ -1,5 +1,6 @@
from afquery._version import __version__
from .database import Database
-from .models import QueryResult, AfqueryWarning
+from .models import QueryResult, SampleCarrier, AfqueryWarning
+from .variant_info import variant_info
-__all__ = ["Database", "QueryResult", "AfqueryWarning", "__version__"]
+__all__ = ["Database", "QueryResult", "SampleCarrier", "AfqueryWarning", "variant_info", "__version__"]
diff --git a/src/afquery/cli.py b/src/afquery/cli.py
index e13212b..18450a7 100644
--- a/src/afquery/cli.py
+++ b/src/afquery/cli.py
@@ -109,6 +109,57 @@ def _print_results(results, fmt: str) -> None:
)
+def _print_carriers(carriers, variant_key, fmt: str) -> None:
+ """Print variant_info results in the requested format."""
+ if fmt == "json":
+ out = {
+ "variant": {
+ "chrom": variant_key[0], "pos": variant_key[1],
+ "ref": variant_key[2], "alt": variant_key[3],
+ },
+ "samples": [
+ {
+ "sample_id": c.sample_id,
+ "sample_name": c.sample_name,
+ "sex": c.sex,
+ "tech": c.tech_name,
+ "phenotypes": c.phenotypes,
+ "genotype": c.genotype,
+ "filter": "PASS" if c.filter_pass else "FAIL",
+ }
+ for c in carriers
+ ],
+ }
+ click.echo(json.dumps(out, indent=2))
+ elif fmt == "tsv":
+ click.echo("sample_id\tsample_name\tsex\ttech\tphenotypes\tgenotype\tfilter")
+ for c in carriers:
+ click.echo(
+ f"{c.sample_id}\t{c.sample_name}\t{c.sex}\t{c.tech_name}\t"
+ f"{','.join(c.phenotypes)}\t{c.genotype}\t"
+ f"{'PASS' if c.filter_pass else 'FAIL'}"
+ )
+ else: # text
+ if not carriers:
+ click.echo("No carriers found for the given filters.")
+ return
+ headers = ["sample_id", "sample_name", "sex", "tech", "phenotypes", "genotype", "filter"]
+ rows = [
+ [
+ str(c.sample_id), c.sample_name, c.sex, c.tech_name,
+ ",".join(c.phenotypes) if c.phenotypes else ".",
+ c.genotype, "PASS" if c.filter_pass else "FAIL",
+ ]
+ for c in carriers
+ ]
+ widths = [max(len(h), max((len(r[i]) for r in rows), default=0)) for i, h in enumerate(headers)]
+ fmt_row = " ".join(f"{{:<{w}}}" for w in widths)
+ click.echo(fmt_row.format(*headers))
+ click.echo(" ".join("-" * w for w in widths))
+ for row in rows:
+ click.echo(fmt_row.format(*row))
+
+
@click.group()
def cli():
"""AFQuery: bitmap-indexed allele frequency engine for local genomic cohorts.
@@ -116,7 +167,7 @@ def cli():
Enables fast AC/AN/AF queries on user-defined subcohorts (phenotype, sex,
technology) without rescanning VCFs.
- Commands: query, annotate, dump, info, version, create-db, update-db, check, benchmark
+ Commands: query, variant-info, annotate, dump, info, version, create-db, update-db, check, benchmark
"""
@@ -178,6 +229,47 @@ def query(db, locus, region, from_file, phenotype, sex, ref, alt, tech, fmt, no_
_print_results(results, fmt)
+@cli.command("variant-info")
+@click.option("--db", required=True, help="Path to database directory.")
+@click.option("--locus", required=True, help="Position as CHROM:POS (e.g., chr1:925952).")
+@click.option("--ref", default=None, help="Filter to specific reference allele.")
+@click.option("--alt", default=None, help="Filter to specific alternate allele.")
+@click.option("--phenotype", multiple=True, help="Phenotype to include. Repeatable; comma-separated or multiple flags. Use ^ prefix to exclude. (default: include all samples)")
+@click.option("--sex", default="both", type=click.Choice(["male", "female", "both"]), help="Restrict to specified sex. Options: male, female, both. (default: both)")
+@click.option("--tech", multiple=True, help="Technology filter. Repeatable; comma-separated or multiple flags. Use ^ prefix to exclude. (default: include all samples)")
+@click.option("--format", "fmt", default="text", type=click.Choice(["text", "json", "tsv"]), help="Output format. Options: text, json, tsv. (default: text)")
+@click.option("--no-warn", is_flag=True, default=False, help="Suppress AfqueryWarning messages.")
+def variant_info_cmd(db, locus, ref, alt, phenotype, sex, tech, fmt, no_warn):
+ """List samples carrying a specific variant, with their metadata.
+
+ Returns one row per carrier sample with genotype (het/hom/alt) and FILTER
+ status (PASS/FAIL). Use --ref/--alt to restrict to a specific allele when
+ multiple alleles share the same position.
+
+ \b
+ Examples:
+ afquery variant-info --db ./db --locus chr1:925952
+ afquery variant-info --db ./db --locus chr1:925952 --ref A --alt T
+ afquery variant-info --db ./db --locus chrX:2700000 --sex female --format tsv
+ """
+ if no_warn:
+ import warnings
+ from .models import AfqueryWarning
+ warnings.filterwarnings("ignore", category=AfqueryWarning)
+
+ chrom, pos = _parse_locus(locus)
+ database = Database(db)
+ carriers = database.variant_info(
+ chrom=chrom, pos=pos,
+ ref=ref, alt=alt,
+ phenotype=_expand_tokens(phenotype), sex=sex,
+ tech=_expand_tokens(tech),
+ )
+
+ _variant_key = (chrom, pos, ref or ".", alt or ".")
+ _print_carriers(carriers, _variant_key, fmt)
+
+
@cli.command()
@click.option("--db", required=True, help="Path to database directory.")
@click.option("--input", "input_vcf", required=True, help="Input VCF file (plain or .gz).")
@@ -201,7 +293,7 @@ def annotate(db, input_vcf, output_vcf, phenotype, sex, tech, threads, verbose,
AFQUERY_N_HET heterozygous carrier count (per ALT)
AFQUERY_N_HOM_ALT homozygous alt count (per ALT)
AFQUERY_N_HOM_REF homozygous ref count (per ALT)
- AFQUERY_N_FAIL samples with FILTER!=PASS (v2 databases only)
+ AFQUERY_N_FAIL samples with FILTER!=PASS
"""
if no_warn:
import warnings
diff --git a/src/afquery/database.py b/src/afquery/database.py
index dbd3e99..12f2c95 100644
--- a/src/afquery/database.py
+++ b/src/afquery/database.py
@@ -2,7 +2,7 @@
import sqlite3
from pathlib import Path
-from .models import QueryParams, QueryResult, SampleFilter
+from .models import QueryParams, QueryResult, SampleCarrier, SampleFilter
from .query import QueryEngine
@@ -86,6 +86,40 @@ def query_region(
sf = self._make_filter(phenotype, sex, tech)
return self._engine.query_region(chrom, start, end, sf)
+ def variant_info(
+ self,
+ chrom: str,
+ pos: int,
+ ref: str | None = None,
+ alt: str | None = None,
+ phenotype: list[str] | None = None,
+ sex: str = "both",
+ tech: list[str] | None = None,
+ ) -> list[SampleCarrier]:
+ """Return all samples carrying a variant, with their metadata.
+
+ Each element includes sample name, sex, technology, phenotype codes,
+ genotype (``"het"``, ``"hom"``, or ``"alt"`` for FILTER≠PASS samples),
+ and FILTER status.
+
+ Args:
+ chrom: Chromosome (e.g. ``"chr1"`` or ``"1"``).
+ pos: Position, 1-based.
+ ref: Reference allele filter. If omitted, all alleles at *pos* are returned.
+ alt: Alternate allele filter. If omitted, all alleles at *pos* are returned.
+ phenotype: Phenotype filter tokens. Use ``"^CODE"`` prefix to exclude.
+ sex: ``"both"`` (default), ``"male"``, or ``"female"``.
+ tech: Technology filter tokens. Use ``"^TECH"`` prefix to exclude.
+
+ Returns:
+ List of :class:`~afquery.models.SampleCarrier` sorted by
+ ``sample_id``. Empty list if the variant is absent or no eligible
+ carrier exists.
+ """
+ sf = self._make_filter(phenotype, sex, tech)
+ params = QueryParams(chrom=chrom, pos=pos, filter=sf, ref=ref, alt=alt)
+ return self._engine.variant_info(params)
+
def query_region_multi(
self,
regions: list[tuple[str, int, int]],
diff --git a/src/afquery/models.py b/src/afquery/models.py
index 4b7cf3a..996a5ac 100644
--- a/src/afquery/models.py
+++ b/src/afquery/models.py
@@ -73,3 +73,15 @@ class QueryResult:
N_HOM_ALT: int = 0
N_HOM_REF: int = 0
N_FAIL: int = 0
+
+
+@dataclass
+class SampleCarrier:
+ """A sample carrying a given variant, with its associated metadata."""
+ sample_id: int
+ sample_name: str
+ sex: str # 'male' | 'female'
+ tech_name: str
+ phenotypes: list[str]
+ genotype: str # 'het' | 'hom' | 'alt' (FILTER≠PASS, ploidy unknown)
+ filter_pass: bool # False = FILTER≠PASS
diff --git a/src/afquery/query.py b/src/afquery/query.py
index 1f1dd76..450dd97 100644
--- a/src/afquery/query.py
+++ b/src/afquery/query.py
@@ -9,7 +9,7 @@
from .bitmaps import deserialize
from .capture import CaptureIndex, load_capture_indices
from .constants import normalize_chrom, ALL_CHROMS, CHROM_ORDER
-from .models import AfqueryWarning, QueryParams, QueryResult, VariantKey, Sample, Technology, SampleFilter
+from .models import AfqueryWarning, QueryParams, QueryResult, SampleCarrier, VariantKey, Sample, Technology, SampleFilter
from .ploidy import compute_AN, split_ploidy
BATCH_IN_THRESHOLD = 10_000
@@ -43,9 +43,18 @@ def __init__(self, db_path: str):
"SELECT tech_id, tech_name, bed_path FROM technologies"
).fetchall()
]
+
+ # Phenotype cache: sample_id → list of phenotype codes
+ self._sample_phenotypes: dict[int, list[str]] = {}
+ for row in con.execute(
+ "SELECT sample_id, phenotype_code FROM sample_phenotype"
+ ).fetchall():
+ self._sample_phenotypes.setdefault(row[0], []).append(row[1])
+
con.close()
self._samples = samples
+ self._samples_by_id: dict[int, Sample] = {s.sample_id: s for s in samples}
self._tech_map = {t.tech_id: t for t in techs}
sex_bms = self._bitmaps.get("sex", {})
self._male_bm = sex_bms.get("male", BitMap())
@@ -549,3 +558,108 @@ def query_region(
sample_bm = self._build_sample_bitmap(sf)
return self._query_region_inner(chrom, start, end, sample_bm)
+
+ def variant_info(self, params: QueryParams) -> list[SampleCarrier]:
+ """Return all samples carrying the variant at params.chrom:params.pos.
+
+ Each returned :class:`~afquery.models.SampleCarrier` includes the
+ sample's name, sex, technology, phenotype codes, genotype (het/hom/alt),
+ and FILTER status.
+
+ Args:
+ params: Query parameters (chrom, pos, optional ref/alt, sample filter).
+
+ Returns:
+ List of :class:`~afquery.models.SampleCarrier`, sorted by
+ ``sample_id``. Empty if the variant is absent or no eligible
+ carrier exists.
+ """
+ chrom = normalize_chrom(params.chrom)
+ pos = params.pos
+
+ if chrom not in self._all_known_chroms:
+ warnings.warn(
+ f"Chromosome {chrom!r} has no data in this database. "
+ f"Available: {sorted(self._all_known_chroms)!r}",
+ AfqueryWarning, stacklevel=3,
+ )
+ return []
+
+ sample_bm = self._build_sample_bitmap(params.filter)
+ parquet_path = self._parquet_path(chrom, pos)
+ if parquet_path is None:
+ return []
+
+ con = duckdb.connect(config={"temp_directory": "/tmp"})
+ rows = con.execute(
+ "SELECT pos, ref, alt, het_bitmap, hom_bitmap, fail_bitmap"
+ " FROM read_parquet(?) WHERE pos = ?",
+ [str(parquet_path), pos],
+ ).fetchall()
+ con.close()
+
+ if not rows:
+ return []
+
+ if params.ref is not None:
+ rows = [r for r in rows if r[1] == params.ref]
+ if params.alt is not None:
+ rows = [r for r in rows if r[2] == params.alt]
+ if not rows:
+ return []
+
+ # Warn if multiple alleles found without explicit ref/alt filter
+ if len(rows) > 1 and params.ref is None and params.alt is None:
+ alts = ", ".join(r[2] for r in rows)
+ warnings.warn(
+ f"Multiple alleles found at {chrom}:{pos} ({alts}). "
+ "Use --ref/--alt to restrict to a specific variant.",
+ AfqueryWarning, stacklevel=3,
+ )
+
+ carriers: list[SampleCarrier] = []
+ for row in rows:
+ row_pos, ref, alt, het_bytes, hom_bytes, fail_bytes = row
+ het_bm = deserialize(bytes(het_bytes))
+ hom_bm = deserialize(bytes(hom_bytes))
+ fail_bm = deserialize(bytes(fail_bytes))
+ het_elig = het_bm & sample_bm
+ hom_elig = hom_bm & sample_bm
+ fail_elig = fail_bm & sample_bm
+
+ seen: set[int] = set()
+ for sid in sorted(hom_elig):
+ seen.add(sid)
+ is_fail = sid in fail_elig
+ carriers.append(self._make_carrier(sid, "alt" if is_fail else "hom", not is_fail))
+ for sid in sorted(het_elig):
+ if sid not in seen:
+ seen.add(sid)
+ is_fail = sid in fail_elig
+ carriers.append(self._make_carrier(sid, "alt" if is_fail else "het", not is_fail))
+ for sid in sorted(fail_elig):
+ if sid not in seen:
+ seen.add(sid)
+ carriers.append(self._make_carrier(sid, "alt", False))
+
+ carriers.sort(key=lambda c: c.sample_id)
+ return carriers
+
+ def _make_carrier(
+ self,
+ sample_id: int,
+ genotype: str,
+ filter_pass: bool,
+ ) -> SampleCarrier:
+ s = self._samples_by_id[sample_id]
+ tech_name = self._tech_map[s.tech_id].tech_name
+ phenotypes = self._sample_phenotypes.get(sample_id, [])
+ return SampleCarrier(
+ sample_id=sample_id,
+ sample_name=s.sample_name,
+ sex=s.sex,
+ tech_name=tech_name,
+ phenotypes=sorted(phenotypes),
+ genotype=genotype,
+ filter_pass=filter_pass,
+ )
diff --git a/src/afquery/variant_info.py b/src/afquery/variant_info.py
new file mode 100644
index 0000000..b07fd0a
--- /dev/null
+++ b/src/afquery/variant_info.py
@@ -0,0 +1,49 @@
+"""variant_info — retrieve sample carriers for a specific variant.
+
+Thin convenience wrapper around :class:`~afquery.database.Database` for
+programmatic use without instantiating the class directly.
+"""
+from .database import Database
+from .models import SampleCarrier
+
+
+def variant_info(
+ db_path: str,
+ chrom: str,
+ pos: int,
+ ref: str | None = None,
+ alt: str | None = None,
+ phenotype: list[str] | None = None,
+ sex: str = "both",
+ tech: list[str] | None = None,
+) -> list[SampleCarrier]:
+ """Return all samples carrying the given variant, with their metadata.
+
+ Args:
+ db_path: Path to the AFQuery database directory.
+ chrom: Chromosome (e.g. ``"chr1"`` or ``"1"``).
+ pos: Position, 1-based.
+ ref: Reference allele filter. If omitted, all alleles at *pos* are returned.
+ alt: Alternate allele filter. If omitted, all alleles at *pos* are returned.
+ phenotype: Phenotype filter tokens. Use ``"^CODE"`` prefix to exclude.
+ sex: ``"both"`` (default), ``"male"``, or ``"female"``.
+ tech: Technology filter tokens. Use ``"^TECH"`` prefix to exclude.
+
+ Returns:
+ List of :class:`~afquery.models.SampleCarrier` sorted by
+ ``sample_id``. Empty list if the variant is absent or no eligible
+ carrier exists.
+ """
+ db = Database(db_path)
+ return db.variant_info(
+ chrom=chrom,
+ pos=pos,
+ ref=ref,
+ alt=alt,
+ phenotype=phenotype,
+ sex=sex,
+ tech=tech,
+ )
+
+
+__all__ = ["variant_info"]
diff --git a/tests/test_dump.py b/tests/test_dump.py
index 58a9f67..fc8ac78 100644
--- a/tests/test_dump.py
+++ b/tests/test_dump.py
@@ -142,7 +142,7 @@ def test_csv_header_base_columns(self, test_db):
assert "N_HET" in rows[0]
assert "N_HOM_ALT" in rows[0]
assert "N_HOM_REF" in rows[0]
- assert "N_FAIL" in rows[0] # v2 DB
+ assert "N_FAIL" in rows[0]
def test_ac_positive_filter(self, test_db):
"""By default all exported rows must have AC > 0."""
@@ -360,7 +360,7 @@ def test_by_sex_and_phenotype_cartesian_labels(self, test_db):
assert "AC_female_E11.9" in rows[0]
def test_n_fail_in_group_columns(self, test_db):
- """v2 DB: N_FAIL_