This document describes STAR-Flex, the Flex-specific module in STAR Suite.
STAR-Flex adds a pseudo-chromosome alignment pipeline for 10x Genomics Flex (Fixed RNA Profiling) samples. These samples use probes for transcript detection and RTL tags for sample multiplexing. A hybrid reference combines the regular genome with one synthetic chromosome per probe. STAR's native alignment machinery quantifies probe alignments and uses genomic hits to confirm matches and detect off-probe noise. The rest of the workflow diverges from the standard STARsolo workflow. The RTL tags sit on the same mate as the probe, not on the cell-barcode mate, so STAR's barcode/UMI correction and deduplication routines cannot be used. Instead, a fast inline path handles Flex-specific processing after alignment.
The Flex pipeline includes:
- Sample tag detection during alignment identifies multiplexed sample barcodes
- Inline hash capture stores CB/UMI/gene tuples directly in memory
- Cell Barcode (CB) correction applies 1MM pseudocount-based correction (Cell Ranger compatible)
- UMI correction uses clique-based 1MM deduplication
- Cell filtering via OrdMag (simple EmptyDrops) or full EmptyDrops per sample
- Tag occupancy filtering via Monte Carlo estimation of the expected distribution of samples per cell barcode
- MEX output produces raw and per-sample filtered matrices
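OrdMag is named after the classic Cell Ranger cell-calling rule, sketched below in shell:

1. Take the top N barcodes by UMI count, where N is the expected number of cells.
2. Find roughly the 99th percentile of their counts.
3. Call every barcode above one tenth of that value.

This sketch assumes that rule and is not the OrdMagStage code in libflex. The real stage may differ in details such as percentile interpolation or how N is chosen. The function name ordmag_threshold is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the classic Cell Ranger OrdMag rule (hypothetical helper; libflex's
# OrdMagStage may differ, e.g. in percentile interpolation or choice of N).
# stdin:  total UMI count per barcode for ONE sample tag, one number per line
# $1:     expected number of cells N for that tag
# stdout: threshold; barcodes with more UMIs than this are called as cells
ordmag_threshold() {
  sort -nr | awk -v n="$1" '
    NR <= n { top[NR] = $1 }                 # keep the N largest counts (descending)
    END {
      m = (NR < n) ? NR : n                  # fewer barcodes than N is allowed
      if (m == 0) { print 0; exit }
      idx = int(0.01 * m)                    # ~99th percentile sits 1% from the top
      if (idx < 1) idx = 1
      print top[idx] / 10                    # one order of magnitude below it
    }'
}
```

Because STAR-Flex filters per sample, the input should contain only one tag's barcodes. For example, pipe the UMI column of a hypothetical per-tag counts table into ordmag_threshold 3000.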
When --flex no (default), STAR behavior is identical to upstream.
The following features were originally developed in the STAR-Flex fork and are now
part of STAR-core. They work with all STAR modes (bulk, single-cell, Flex). See the
main suite README.md for full documentation and flags.
- Cutadapt-style trimming (--trimCutadapt Yes): See trimming docs.
- TranscriptVB quantification (--quantMode TranscriptVB): VB/EM transcript-level quantification with Salmon parity.
- SLAM-seq (--slamQuantMode 1): See slam/docs/SLAM_seq.md.
- Spill-to-disk BAM sorting (--outBAMsortMethod samtools): Bounded-RAM coordinate sorting. Works with Flex.
- Y-chromosome BAM/FASTQ splitting (--emitNoYBAM yes, --emitYNoYFastq yes): Split reads by chrY alignment. Developed for MorPHiC KOLF cell lines. Tested and validated with Flex in both sorted and unsorted modes (see tests/TEST_REPORT_Y_SPLIT_FLEX.md). See Y-chromosome BAM split docs.
- AutoIndex + CellRanger-style references: Optional reference download + integrity verification, CellRanger-style FASTA/GTF formatting, and automatic index creation in --genomeDir (--autoIndex, --forceIndex, --forceAllIndex).
- Transcriptome FASTA Generation: Generate transcriptome.fa during index creation for Salmon quantification parity and TranscriptVB error modeling. Eliminates the need to run gffread/rsem-prepare-reference separately.
- Flex Pipeline: Inline hash pipeline for 10x Genomics Flex (Fixed RNA Profiling) samples.
For complete parameter reference, see flex parameter docs (STAR-Flex-only flags) and upstream README.md (all other parameters).
For detailed technical documentation of the flex data flow and algorithms, see docs/flex_methodology.md.
```bash
STAR \
--genomeDir /path/to/flex_reference \
--readFilesIn R2.fastq.gz R1.fastq.gz \
--readFilesCommand zcat \
--soloType CB_UMI_Simple \
--soloCBwhitelist /path/to/737K-fixed-rna-profiling.txt \
--flex yes \
--soloFlexExpectedCellsPerTag 3000 \
--soloSampleWhitelist sample_whitelist.tsv \
--soloProbeList probe_list.txt \
--soloSampleProbes probe-barcodes-fixed-rna-profiling-rna.txt \
--soloSampleProbeOffset 68 \
--soloFlexOutputPrefix output/per_sample \
--soloMultiMappers Rescue \
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
--soloUMIfiltering MultiGeneUMI_CR \
--soloUMIdedup 1MM_CR \
--soloFeatures Gene \
--outFileNamePrefix output/
```

To split BAM output into Y and noY files:
```bash
STAR \
--genomeDir /path/to/flex_reference \
--readFilesIn R2.fastq.gz R1.fastq.gz \
--readFilesCommand zcat \
--soloType CB_UMI_Simple \
--soloCBwhitelist /path/to/737K-fixed-rna-profiling.txt \
--flex yes \
--soloFlexExpectedCellsPerTag 3000 \
--soloSampleWhitelist sample_whitelist.tsv \
--soloProbeList probe_list.txt \
--soloSampleProbes probe-barcodes-fixed-rna-profiling-rna.txt \
--soloSampleProbeOffset 68 \
--soloFlexOutputPrefix output/per_sample \
--soloMultiMappers Rescue \
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
--soloUMIfiltering MultiGeneUMI_CR \
--soloUMIdedup 1MM_CR \
--soloFeatures Gene \
--outSAMtype BAM SortedByCoordinate \
--emitNoYBAM yes \
--outFileNamePrefix output/
```

This produces:
- output/Aligned.sortedByCoord.out_Y.bam - reads with any Y-chromosome alignment
- output/Aligned.sortedByCoord.out_noY.bam - reads with no Y-chromosome alignments
- The primary BAM (output/Aligned.sortedByCoord.out.bam) is suppressed by default
To emit a read-name list for FASTQ filtering (with or without Y/noY BAMs):
```bash
STAR \
... \
--emitYReadNames yes \
--outFileNamePrefix output/
```

This writes output/Aligned.out_Y.names.txt by default (override with --YReadNamesOutput).
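The names list can be applied to a FASTQ with standard tools. The sketch below uses a hypothetical function, split_by_names, that routes each FASTQ record to a Y or noY file based on whether its normalized name is in the list. It assumes:

- uncompressed FASTQ with 4 lines per record;
- a list of bare read names, with no leading @ and no /1 or /2 suffix.

For routine use, the built-in --emitYNoYFastq option or the remove_y_reads tool is likely the better fit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split one uncompressed FASTQ into Y / noY files using a
# read-name list such as output/Aligned.out_Y.names.txt.
# Assumptions: 4-line FASTQ records; the list holds bare read names (no '@',
# no /1 or /2 suffix). Read order is preserved within each output file.
# Usage: split_by_names <names.txt> <in.fastq> <out_Y.fastq> <out_noY.fastq>
split_by_names() {
  awk -v yout="$3" -v nout="$4" '
    FILENAME == ARGV[1] { ynames[$1] = 1; next }   # first file: load the name set
    (FNR - 1) % 4 == 0 {                           # header line of each FASTQ record
      name = $1                                    # $1 drops any comment after whitespace
      sub(/^@/, "", name)                          # drop the FASTQ "@" marker
      sub(/\/[12]$/, "", name)                     # drop a trailing /1 or /2 mate suffix
      out = (name in ynames) ? yout : nout
    }
    { print > out }                                # the whole record follows its header
  ' "$1" "$2"
}
```

The FILENAME test is used instead of the common NR == FNR idiom so that an empty names list still works. With that idiom, an empty list would make every FASTQ line look like part of the first file.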
To emit Y/noY FASTQ files directly during alignment:
```bash
STAR \
... \
--emitYNoYFastq yes \
--emitYNoYFastqCompression gz \
--outFileNamePrefix output/
```

This creates FASTQs named after the input files, with _Y / _noY inserted before the last _R1 or _R2.
For example, Sample_R1_001.fastq.gz becomes Sample_Y_R1_001.fastq.gz and Sample_noY_R1_001.fastq.gz
(output written under the --outFileNamePrefix directory).
If no _R1/_R2 token is found, STAR falls back to Y_reads.mateN.fastq(.gz) and noY_reads.mateN.fastq(.gz) under the output prefix.
You can override names explicitly with --YFastqOutputPrefix and --noYFastqOutputPrefix.
If a separate barcode read is present (e.g., scRNA-seq R3), only the true mates (R1/R2) are emitted.
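The naming rule can be expressed as a small shell function. This sketch reproduces the documented behavior and is not STAR's code. The function name ynoy_name is hypothetical, and the fallback name assumes the default gz compression.

```bash
#!/usr/bin/env bash
# Sketch of the Y/noY FASTQ naming rule described above (not STAR's own code).
# Assumes the default gz compression for the fallback name.
# Usage: ynoy_name <input_path> <Y|noY> <mate_number>
ynoy_name() {
  local base
  base=$(basename "$1")
  if printf '%s\n' "$base" | grep -q '_R[12]'; then
    # Greedy (.*) makes sed match the LAST _R1/_R2 token; insert _Y/_noY before it
    printf '%s\n' "$base" | sed -E "s/(.*)(_R[12])/\1_$2\2/"
  else
    # No _R1/_R2 token: fall back to Y_reads.mateN / noY_reads.mateN
    printf '%s_reads.mate%s.fastq.gz\n' "$2" "$3"
  fi
}
```

For example, ynoy_name /data/Sample_R1_001.fastq.gz Y 1 yields Sample_Y_R1_001.fastq.gz, matching the example above.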
Edge cases to be aware of:
- If the reference has no Y contigs, the Y FASTQs are empty and a warning is logged.
- FASTA inputs produce .fa(.gz) outputs with > headers and no +/quality lines.
- With multiple input files per mate, output names are derived from the first file for each mate.
- --emitYNoYFastqCompression none writes uncompressed .fastq/.fa outputs.
- Unmapped reads are routed to noY.
You can use --emitYNoYFastq yes with --outSAMtype None to emit FASTQ files without BAM output.
To keep the primary BAM alongside the split files:
```bash
STAR \
... \
--emitNoYBAM yes \
--keepBAM yes \
--outFileNamePrefix output/
```

Note: The Y/noY split is a general-purpose core feature, developed to meet MorPHiC requirements for KOLF cell lines. It works with all modes: Flex, single-cell, and bulk RNA-seq. It was validated with Flex in both sorted and unsorted modes (see tests/TEST_REPORT_Y_SPLIT_FLEX.md). In single-cell mode, R1/R2 are not traditional paired-end mates, so each read is routed based on its own alignments. In bulk paired-end mode, if either mate has a Y-chromosome alignment, both mates are routed to _Y.bam.
| Input | Description |
|---|---|
| Flex reference genome | Hybrid genome with probe pseudo-chromosomes (see Building References) |
| CB whitelist | 10x barcode whitelist (e.g., 737K-fixed-rna-profiling.txt) |
| Sample whitelist | TSV mapping sample tag sequences to labels |
| Probe list | Gene list from probe set |
| Sample probe barcodes | 10x probe barcode sequences file |
| Flag | Default | Description |
|---|---|---|
| --flex | no | Enable flex pipeline (yes/no) |
| Flag | Default | Description |
|---|---|---|
| --emitNoYBAM | no | Enable Y-chromosome BAM splitting (yes/no). When enabled, emits two additional BAM files: <out>_noY.bam (reads with no Y-chromosome alignments) and <out>_Y.bam (reads with any Y-chromosome alignment). The primary BAM is suppressed by default unless --keepBAM yes is specified. |
| --emitYReadNames | no | Emit a list of read names with any Y-chromosome alignment (one per line). Can be used with or without Y/noY BAMs. |
| --emitYNoYFastq | no | Emit Y/noY FASTQ files directly during alignment (yes/no). |
| --emitYNoYFastqCompression | gz | Compression for Y/noY FASTQ output (gz/none). |
| --YFastqOutputPrefix | - | Optional: override output prefix for Y FASTQ files (default: derived from input name; falls back to Y_reads.mateN). |
| --noYFastqOutputPrefix | - | Optional: override output prefix for noY FASTQ files (default: derived from input name; falls back to noY_reads.mateN). |
| --keepBAM | no | Keep primary BAM output when --emitNoYBAM yes is enabled (yes/no) |
| --noYOutput | - | Optional: override default path for noY BAM output (default: <out>_noY.bam) |
| --YOutput | - | Optional: override default path for Y BAM output (default: <out>_Y.bam) |
| --YReadNamesOutput | - | Optional: override output path for Y read names list (default: <out>Aligned.out_Y.names.txt) |
| Flag | Default | Description |
|---|---|---|
| --soloSampleWhitelist | - | Path to sample tag whitelist TSV |
| --soloProbeList | auto | Path to probe gene list (auto-detected from the genome index if not specified) |
| --soloSampleProbes | - | Path to 10x sample probe barcodes |
| --soloSampleProbeOffset | 0 | Offset in read for sample probe sequence |
| --soloSampleSearchNearby | yes | Search nearby positions for sample tag |
| --soloSampleStrictMatch | no | Require strict match for sample tag |
| Flag | Default | Description |
|---|---|---|
| --soloFlexExpectedCellsPerTag | 0 | Expected cells per sample tag |
| --soloFlexExpectedCellsTotal | 0 | Total expected cells (alternative to per-tag) |
| --soloFlexAllowedTags | - | Optional: restrict to specific sample tags |
| --soloFlexOutputPrefix | - | Output prefix for per-sample MEX |
| Flag | Default | Description |
|---|---|---|
| --soloFlexEdNiters | 10000 | Monte Carlo simulation iterations |
| --soloFlexEdFdrThreshold | 0 (disabled) | FDR threshold for cell calling; if set (>0), the FDR gate is used |
| --soloFlexEdPvalueThreshold | 0.05 | Raw p-value threshold when the FDR gate is disabled (default behavior) |
| --soloFlexEdLower | 100 | Lower UMI bound for ambient profile |
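The two EmptyDrops thresholds act as alternative gates. A positive --soloFlexEdFdrThreshold switches to an FDR cutoff; otherwise raw p-values are compared against --soloFlexEdPvalueThreshold. The shell sketch below illustrates that switch and makes these assumptions:

- The FDR is a Benjamini-Hochberg adjustment, which is what DropletUtils' emptyDrops reports; libflex may compute it differently.
- The per-barcode p-values are already available from the Monte Carlo test.
- The name ed_gate is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the EmptyDrops calling gate from the table above (hypothetical helper).
# stdin: "<barcode>\t<p-value>" per candidate barcode, where the p-values would
# come from the Monte Carlo test (--soloFlexEdNiters iterations).
# Assumption: the FDR is a Benjamini-Hochberg adjustment, as in DropletUtils'
# emptyDrops; libflex may compute it differently.
# Usage: ed_gate <fdr_threshold> <pvalue_threshold>   (fdr 0 = gate disabled)
ed_gate() {
  sort -t $'\t' -k2,2g | awk -F '\t' -v fdr="$1" -v pth="$2" '
    { bc[NR] = $1; p[NR] = $2 }                   # p-values arrive in ascending order
    END {
      n = NR; q_min = 1
      for (i = n; i >= 1; i--) {                  # BH: q_i = min over j >= i of p_j * n / j
        q = p[i] * n / i
        if (q < q_min) q_min = q
        q_adj[i] = q_min
      }
      for (i = 1; i <= n; i++) {
        keep = (fdr > 0) ? (q_adj[i] <= fdr) : (p[i] <= pth)
        if (keep) print bc[i]                     # barcodes called as cells
      }
    }'
}
```

Consider three barcodes with p-values 0.001, 0.04 and 0.5:

- With the default raw-p gate (0.05), the first two pass.
- With an FDR gate of 0.05, only the first passes, because the second barcode's BH-adjusted value is 0.06.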
```
output/
├── Solo.out/Gene/raw/ # Raw MEX (all barcodes)
│ ├── barcodes.tsv
│ ├── features.tsv
│ └── matrix.mtx
├── per_sample/ # Per-sample filtered MEX (labels from whitelist)
│ ├── SampleA/Gene/filtered/
│ ├── SampleB/Gene/filtered/
│ └── flexfilter_summary.tsv # Cell calling statistics
├── Aligned.sortedByCoord.out_Y.bam # Y-chromosome reads (if --emitNoYBAM yes)
└── Aligned.sortedByCoord.out_noY.bam # Non-Y reads (if --emitNoYBAM yes)
```
When --emitNoYBAM yes is enabled:
- _Y.bam: contains all reads where any alignment (primary, secondary, or supplementary) touches a Y-chromosome contig
- _noY.bam: contains all reads with no Y-chromosome alignments
- The primary BAM (Aligned.sortedByCoord.out.bam or Aligned.out.bam) is suppressed by default unless --keepBAM yes is specified
- Works with both BAM Unsorted and BAM SortedByCoordinate output types
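The routing rule can be reproduced on SAM text to check a split. The sketch below marks a read as Y if any of its records names a Y contig in the RNAME column. It assumes Y contigs are named chrY or Y; the function name y_read_names is hypothetical.

- Both mates share a read name, so keying on the name also gives the bulk paired-end behavior in which both mates go to _Y.
- Unmapped reads (RNAME *) never match and so fall into noY.

```bash
#!/usr/bin/env bash
# Sketch of the routing rule above (hypothetical helper): a read is "Y" if ANY of
# its SAM records (primary, secondary or supplementary) has a Y contig as RNAME.
# Assumption: Y contigs are named chrY or Y; adapt the test for other naming.
# stdin: SAM records (header lines are ignored). stdout: unique Y read names.
y_read_names() {
  awk -F '\t' '!/^@/ && ($3 == "chrY" || $3 == "Y") { print $1 }' | sort -u
}
```

For example, run samtools view output/Aligned.sortedByCoord.out.bam and pipe the output into y_read_names. The primary BAM is only kept with --keepBAM yes. The resulting list should match the names in the _Y.bam output.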
The flex pipeline requires a hybrid reference genome that includes pseudo-chromosomes for probe sequences. We benchmarked hash-based gene assignment techniques as an alternative, which were faster but resulted in 15–20% sensitivity loss and required blacklisting and downstream QC to achieve parity with Cell Ranger. The pseudo-chromosome approach avoids these trade-offs by leveraging STAR's native alignment machinery.
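The reference can be built with STAR's genomeGenerate mode and --flexGeneProbeSet, which runs the probe preprocessing inline. Standalone scripts in scripts/ are also provided for custom workflows.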
```bash
STAR --runMode genomeGenerate \
--genomeDir /path/to/flex_index \
--genomeFastaFiles /path/to/genome.fa \
--sjdbGTFfile /path/to/genes.gtf \
--sjdbOverhang 100 \
--flexGeneProbeSet /path/to/Chromium_Human_Transcriptome_Probe_Set_v2.0.0_GRCh38-2024-A.csv \
--runThreadN 8
```

| Input | Description |
|---|---|
| --genomeFastaFiles | Base genome FASTA file |
| --sjdbGTFfile | Gene annotation GTF file (can be gzipped) |
| --flexGeneProbeSet | 10x Flex probe CSV file (50bp gene probes) |
| Flag | Default | Description |
|---|---|---|
| --flexGeneProbeSet | - | Path to 50bp gene probe CSV file |
| --flexGeneProbeLength | 50 | Expected probe length (fails if mismatch) |
```
flex_index/
├── probe_gene_list.txt # Unique gene IDs with probes (auto-detected for --soloProbeList)
├── flex_probe_artifacts/ # Probe processing artifacts
│ ├── filtered_probe_set.csv # Probes matching GTF genes
│ ├── probes_only.fa # Probe-only FASTA
│ ├── probes_only.gtf # Probe-only GTF entries
│ ├── genome.filtered.fa # Hybrid FASTA (used for indexing)
│ ├── genes.filtered.gtf # Hybrid GTF (used for indexing)
│ ├── probe_genes_exons.bed # Probe coordinates
│ ├── probe_list.txt # Unique gene IDs
│ └── metadata/
│ └── reference_manifest.json
├── Genome # Standard STAR index files
├── SA
├── SAindex
└── ... (other STAR index files)
```
The integrated preprocessor applies these filters:
- 50bp A/C/G/T only - Fails if any probe has invalid length or characters
- Skip DEPRECATED - Excludes probes marked as deprecated
- Gene match - Keeps only probes whose gene_id exists in the target GTF
- Deterministic ordering - Stable sort by gene_id then probe_id
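The four filters can be approximated with awk and sort, for example to inspect a probe set before indexing. This is a sketch, not the integrated preprocessor or scripts/filter_probes_to_gtf.sh. The function name filter_probes is hypothetical. Check these assumptions against your probe set file:

- comment lines start with #;
- a header row names the gene_id, probe_seq and probe_id columns;
- deprecated probes contain DEPRECATED in probe_id.

```bash
#!/usr/bin/env bash
# Sketch of the four preprocessor filters listed above; NOT the integrated code.
# Assumptions: comment lines start with '#'; the header row names the columns
# gene_id, probe_seq and probe_id; deprecated probes carry "DEPRECATED" in probe_id.
# Usage: filter_probes <probes.csv> <gene_ids.txt>   (gene_ids.txt: one GTF gene_id per line)
# Prints kept CSV rows (no header), sorted by gene_id then probe_id;
# returns non-zero if any probe is not 50bp of A/C/G/T.
filter_probes() (
  set -o pipefail
  awk -F ',' '
    FILENAME == ARGV[1] { in_gtf[$1] = 1; next }        # gene_ids present in the target GTF
    { sub(/\r$/, "") }                                   # tolerate CRLF line endings
    /^#/ { next }                                        # skip metadata comment lines
    !have_header { for (i = 1; i <= NF; i++) col[$i] = i; have_header = 1; next }
    {
      g = $(col["gene_id"]); s = $(col["probe_seq"]); id = $(col["probe_id"])
      if (length(s) != 50 || s !~ /^[ACGT]+$/) {        # 50bp A/C/G/T only: hard failure
        print "invalid probe: " id > "/dev/stderr"
        exit 1
      }
      if (id ~ /DEPRECATED/) next                        # skip deprecated probes
      if (g in in_gtf) printf "%s\t%s\t%s\n", g, id, $0  # gene match; prefix the sort keys
    }
  ' "$2" "$1" |
    LC_ALL=C sort -t $'\t' -k1,1 -k2,2 |
    cut -f3-
)
```

The function body runs in a subshell with pipefail, so a failing awk is not masked by the sort and cut stages that follow it. The C locale keeps the ordering byte-wise and therefore reproducible across machines.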
For custom workflows or debugging, standalone shell scripts are available:
```bash
# Filter probes and build hybrid reference
./scripts/filter_probes_to_gtf.sh \
--probe-set /path/to/probes.csv \
--gtf /path/to/genes.gtf.gz \
--base-fasta /path/to/genome.fa \
--output-dir ./probe_artifacts
```

The legacy build_filtered_reference.sh and make_filtered_star_index.sh scripts are also available. See scripts/README.md for details.
After building, use the index with the probe gene list:
```bash
# --soloProbeList is auto-detected from probe_gene_list.txt in the index directory
STAR \
--genomeDir /path/to/flex_index \
--flex yes \
... # other flex parameters
```

STAR-Flex includes an index-time workflow that reproduces the “CellRanger-style” reference preparation (download → integrity checks → format FASTA/GTF → genomeGenerate):
```bash
STAR --runMode genomeGenerate \
--genomeDir /path/to/index \
--autoIndex Yes \
--cellrangerStyleIndex Yes \
--autoCksumUpdate Yes \
--sjdbOverhang 100 \
--runThreadN 16
```

Key outputs and paths:
- Formatted inputs: ${genomeDir}/cellranger_ref/genome.fa, ${genomeDir}/cellranger_ref/genes.gtf
- Download cache (default): ${genomeDir}/cellranger_ref_cache (override with --cellrangerStyleCacheDir)
- Rebuild controls: --forceIndex Yes (re-index), --forceAllIndex Yes (re-download + re-index)
See autoindex docs for URL selection (--cellrangerRefRelease / --faUrl / --gtfUrl), checksum flags, and parity test scripts.
STAR-Flex can generate transcriptome.fa during index creation, eliminating the need for separate gffread/rsem-prepare-reference runs. A transcriptome FASTA is required for:
- Salmon quantification (identical output for parity)
- TranscriptVB error modeling (fragment length distribution estimation)
```bash
STAR --runMode genomeGenerate \
--genomeDir /path/to/index \
--genomeFastaFiles /path/to/genome.fa \
--sjdbGTFfile /path/to/genes.gtf \
--sjdbOverhang 100 \
--genomeGenerateTranscriptome Yes \
--runThreadN 8
```

This produces ${genomeDir}/transcriptome.fa alongside the standard index files.
| Flag | Default | Description |
|---|---|---|
| --genomeGenerateTranscriptome | No | Enable transcriptome FASTA generation (Yes/No) |
| --genomeGenerateTranscriptomeFasta | - | Custom output path (default: ${genomeDir}/transcriptome.fa) |
| --genomeGenerateTranscriptomeOverwrite | No | Overwrite existing file (Yes/No) |
When --cellrangerStyleIndex Yes, STAR-Flex formats the annotation inputs into ${genomeDir}/cellranger_ref/:
- ${genomeDir}/cellranger_ref/genome.fa
- ${genomeDir}/cellranger_ref/genes.gtf
When combined with --genomeGenerateTranscriptome Yes, the transcriptome is written to both:
- ${genomeDir}/transcriptome.fa (standard path)
- ${genomeDir}/cellranger_ref/transcriptome.fa (CellRanger-compatible path)
```bash
STAR --runMode genomeGenerate \
--genomeDir /path/to/index \
--genomeFastaFiles /path/to/genome.fa \
--sjdbGTFfile /path/to/genes.gtf \
--sjdbOverhang 100 \
--genomeGenerateTranscriptome Yes \
--cellrangerStyleIndex Yes \
--runThreadN 8
```

The transcriptome FASTA follows Salmon conventions:
- Headers: transcript IDs without version suffixes (e.g., >ENST00000456328, not >ENST00000456328.2)
- Line width: 70 characters
- Ordering: matches transcriptInfo.tab for Salmon parity
- Negative strand: exons concatenated in genomic order, then reverse-complemented
Test with the included chr21+chr22 subset:
```bash
./test/run_transcriptome_generation.sh --all
```

This runs:
- Synthetic tests: basic transcriptome generation with small fixtures
- Default path tests: validates ${genomeDir}/transcriptome.fa output
- CellRanger tests: real GENCODE chr21+chr22 with CellRanger filtering
A standalone tool run_flexfilter_mex is available for offline MEX processing. This allows re-running the OrdMag/EmptyDrops cell calling pipeline on existing composite MEX files without re-running STAR alignment.
Use cases:
- Parameter tuning (adjust expected cells, EmptyDrops thresholds)
- Reprocessing with different filtering settings
- Integration with non-STAR pipelines (any tool producing composite CB+TAG MEX)
- Batch reprocessing of archived STAR outputs
The tool is optional and not built by the default make STAR target:
```bash
cd source
make flexfilter
```

This produces tools/flexfilter/run_flexfilter_mex.
The tool expects a composite MEX directory containing:
- matrix.mtx - Matrix Market sparse matrix (or InlineHashDedup_matrix.mtx)
- barcodes.tsv - composite barcodes in CB16+TAG8 format (24 characters)
- features.tsv - gene IDs (tab-separated)
The composite barcode format concatenates the 16bp cell barcode with the 8bp sample tag:
```
AAACCCAAGAAACACTACGTACGT # CB16 (AAACCCAAGAAACACT) + TAG8 (ACGTACGT)
```
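Because the composite barcode has a fixed layout, it can be split with plain string slicing. The sketch below uses a hypothetical helper, split_composite, to turn each composite barcode into a cell barcode and a sample tag. Barcodes that are not 24 characters are reported on stderr instead of being silently mis-split.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split CB16+TAG8 composite barcodes into "<CB>\t<TAG>".
# stdin: a composite barcodes.tsv (barcode in column 1). Barcodes that are not
# exactly 24 characters are reported on stderr and skipped.
split_composite() {
  awk '{
    bc = $1
    if (length(bc) != 24) { print "unexpected barcode length: " bc > "/dev/stderr"; next }
    print substr(bc, 1, 16) "\t" substr(bc, 17, 8)   # CB16, then TAG8
  }'
}
```

To count barcodes per sample tag in a raw MEX directory, feed Solo.out/Gene/raw/barcodes.tsv into split_composite. Then take the second column with cut -f2 and tally it with sort | uniq -c.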
```bash
./tools/flexfilter/run_flexfilter_mex \
--mex-dir /path/to/Solo.out/Gene/raw \
--total-expected 12000 \
--output-prefix /path/to/filtered_output
```

| Parameter | Description |
|---|---|
| --mex-dir | Path to composite MEX directory (required) |
| --total-expected | Total expected cells across all samples (required) |
| --output-prefix | Output directory prefix (required) |
| --sample-whitelist | TSV file mapping sample names to tag sequences |
| --ed-lower-bound | Lower UMI bound for EmptyDrops (default: 500) |
| --ed-fdr | FDR threshold for EmptyDrops (default: 0.01) |
| --disable-occupancy | Skip occupancy post-filter (for testing) |
```
output_prefix/
├── SampleA/Gene/filtered/
│ ├── matrix.mtx
│ ├── barcodes.tsv
│ ├── features.tsv
│ └── EmptyDrops/
│ └── emptydrops_results.tsv
├── SampleB/Gene/filtered/
│ └── ...
└── flexfilter_summary.tsv
```
```bash
# Original STAR run produced Solo.out/Gene/raw/
# Reprocess with higher cell expectation
./tools/flexfilter/run_flexfilter_mex \
--mex-dir /storage/run1/Solo.out/Gene/raw \
--total-expected 20000 \
--output-prefix /storage/run1/refiltered_20k

# Or with explicit sample whitelist
./tools/flexfilter/run_flexfilter_mex \
--mex-dir /storage/run1/Solo.out/Gene/raw \
--sample-whitelist samples.tsv \
--total-expected 15000 \
--output-prefix /storage/run1/refiltered_explicit
```

Sample whitelist format (samples.tsv):
```
Sample_A ACGTACGT
Sample_B TGCATGCA
Sample_C GGCCGGCC
```
Labels in the first column are used verbatim for per-sample directories, and the order in the whitelist is preserved.
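Labels become directory names verbatim, so malformed or duplicated rows are worth catching before a run. The sketch below uses a hypothetical pre-flight check, check_whitelist. It assumes tab-separated label/tag rows and 8bp A/C/G/T tags, matching the TAG8 half of the composite barcode. The check rejects space-separated files, so relax it if your files use spaces.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a sample whitelist (samples.tsv).
# Assumptions: tab-separated "<label>\t<tag>" rows and 8bp A/C/G/T tags, matching
# the TAG8 half of the composite barcode. Labels become directory names, so
# duplicate labels (or tags) are rejected as well. Exit status 0 means OK.
check_whitelist() {
  awk -F '\t' '
    NF != 2 || length($2) != 8 || $2 !~ /^[ACGT]+$/ {
      printf "line %d: expected <label><TAB><8bp tag>: %s\n", NR, $0 > "/dev/stderr"; bad = 1
    }
    ($1 in seen_label) { printf "line %d: duplicate label %s\n", NR, $1 > "/dev/stderr"; bad = 1 }
    ($2 in seen_tag)   { printf "line %d: duplicate tag %s\n", NR, $2 > "/dev/stderr"; bad = 1 }
    { seen_label[$1] = 1; seen_tag[$2] = 1 }
    END { exit bad }
  ' "$1"
}
```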
```bash
# Requires tests/gold_standard/ fixtures
./tools/flexfilter/test_smoke.sh

# Validate output format
./tools/flexfilter/validate_output.py /path/to/output
```

See tools/flexfilter/README.md for complete CLI reference and advanced options.
remove_y_reads is a standalone C tool that splits FASTQ files based on the Y-only BAM produced by STAR's --emitNoYBAM feature. Given the _Y.bam output from STAR's Y-chromosome split, it partitions the original FASTQ files into Y/noY sets while preserving read order.
Use cases:
- Split FASTQ files after STAR alignment with Y/noY BAM output
- Prepare separate inputs for sex-specific analyses
- Filter out Y-chromosome reads from FASTQ files
- Downstream analysis requiring separate Y/non-Y FASTQs
Key features:
- Uses htslib for BAM reading and kseq.h for robust FASTQ parsing
- Dual-hash collision protection (FNV-1a + djb2) for read name lookup
- File-level threading with semaphore-bounded concurrency
- Preserves original read order in outputs
- Handles gzipped and uncompressed FASTQs
The tool is optional and not built by the default make STAR target:
```bash
cd source
make remove_y_reads
```

This produces tools/remove_y_reads/remove_y_reads.
Alternatively, build directly:
```bash
cd tools/remove_y_reads
make
```

Usage:

```bash
./tools/remove_y_reads/remove_y_reads \
-y Aligned.sortedByCoord.out_Y.bam \
--threads 4 \
--gzip-level 6 \
-o output_dir \
sample_R1.fastq.gz sample_R2.fastq.gz
```

Output: For each input FASTQ, produces <stem>_Y.fastq.gz and <stem>_noY.fastq.gz.
| Flag | Description |
|---|---|
| -y, --ybam | Y-only BAM file (required) |
| -o, --outdir | Output directory (default: alongside input) |
| -t, --threads | Number of parallel workers (default: 1) |
| -z, --gzip-level | Compression level 1-9 (default: 6) |
| -h, --help | Show help message |
- Read order preservation: Outputs maintain the same read order as input FASTQs
- Name normalization: automatically handles FASTQ name formats (strips @, /1, /2, and comments)
- Collision detection: uses hash + length to guard against rare hash collisions
- Multi-threaded: Process multiple FASTQ files in parallel (file-level parallelism)
- Gzip support: Handles both compressed and uncompressed FASTQ files
- Dynamic parsing: Uses kseq.h for robust parsing of arbitrarily long reads
```bash
# Step 1: Run STAR with Y/noY split
STAR \
--genomeDir /path/to/reference \
--readFilesIn R1.fastq.gz R2.fastq.gz \
--readFilesCommand zcat \
--outSAMtype BAM SortedByCoordinate \
--emitNoYBAM yes \
--outFileNamePrefix output/

# Step 2: Split original FASTQs based on Y BAM
./tools/remove_y_reads/remove_y_reads \
-y output/Aligned.sortedByCoord.out_Y.bam \
--threads 4 \
-o output/fastq_split \
R1.fastq.gz R2.fastq.gz

# Result: output/fastq_split/R1_Y.fastq.gz, R1_noY.fastq.gz, etc.
```

```bash
# Basic self-contained test
./tests/run_remove_y_reads_test.sh

# Comprehensive test (single-threaded, multithreaded, multiple files)
./tests/run_y_removal_comprehensive_test.sh
```

The test report is written to tests/TEST_REPORT_REMOVE_Y_FASTQ.md.
For detailed technical documentation, see docs/Y_CHROMOSOME_BAM_SPLIT.md.
Standard STAR build process:
```bash
cd source
make -j8
```

The flex objects are automatically included in the build.
See docs/TESTING_flex.md for detailed testing instructions.
Quick test:
```bash
./tests/run_flex_multisample_test.sh
```

Gold standard comparison files are bundled in tests/gold_standard/.
```
source/
├── libflex/ # Core flex filtering library
│ ├── FlexFilter.cpp/h # Main filter orchestration
│ ├── EmptyDropsMultinomial.cpp/h # Full EmptyDrops
│ ├── OrdMagStage.cpp/h # Simple EmptyDrops (OrdMag)
│ └── OccupancyGuard.cpp/h # Occupancy-based filtering
├── solo/
│ └── CbCorrector.cpp/h # CB correction with pseudocounts
├── SampleDetector.cpp/h # Sample tag detection
├── InlineCBCorrection.cpp/h # Inline hash CB correction
├── UMICorrector.cpp/h # Clique-based UMI correction
├── MexWriter.cpp/h # MEX matrix output
├── GeneResolver.cpp/h # Probe-to-gene mapping
├── SoloFeature_flexfilter.cpp # FlexFilter integration
├── SoloFeature_writeMexFromInlineHashDedup.cpp
└── UmiCodec.h # UMI encoding/decoding helpers
tools/flexfilter/ # Standalone FlexFilter CLI
├── run_flexfilter_mex.cpp # Main CLI wrapper
├── Makefile # Build configuration
├── README.md # CLI documentation
├── test_smoke.sh # Smoke test script
└── validate_output.py # Output validation script
tools/remove_y_reads/ # Standalone FASTQ Y-splitter CLI
├── remove_y_reads.c # Main implementation (C + htslib)
└── Makefile # Build configuration
```
- Baseline: STAR 2.7.11b
- When --flex no (default), behavior is identical to upstream STAR
- Upstream README.md and CHANGES.md are not modified