STAR-Flex: 10x Fixed RNA Profiling Pipeline

This document describes STAR-Flex, the Flex-specific module in STAR Suite.

Overview

STAR-Flex adds a pseudo-chromosome alignment pipeline for 10x Genomics Flex (Fixed RNA Profiling) samples, which use probes for transcript detection and RTL tags for multiplexing. A hybrid reference combines the regular genome with synthetic chromosomes for each probe. STAR's native alignment machinery quantifies probe alignments, and genomic hits are used to confirm matches and detect off-probe noise. The rest of the workflow diverges from the standard STARsolo workflow: RTL tags sit on the same mate as the probe (not the cell barcode), so STAR's barcode/UMI correction and deduplication routines cannot be used. Instead, a fast inline path handles Flex-specific processing after alignment.

The Flex pipeline includes:

Sample tag detection during alignment identifies multiplexed sample barcodes
Inline hash capture stores CB/UMI/gene tuples directly in memory
Cell Barcode (CB) correction applies 1MM pseudocount-based correction (Cell Ranger compatible)
UMI correction uses clique-based 1MM deduplication
Cell filtering via OrdMag (simple EmptyDrops) or full EmptyDrops per sample
Tag occupancy filtering via Monte Carlo estimation of the expected distribution of samples per cell barcode
MEX output produces raw and per-sample filtered matrices
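
As a rough illustration of the inline hash capture and MEX output steps listed above, the sketch below tallies reads per (CB, UMI, gene) tuple in an in-memory hash map and then counts distinct UMIs per (CB, gene) pair. It is a minimal Python sketch under stated assumptions, not STAR-Flex's C++ implementation: the function names are hypothetical, and CB/UMI correction, sample tags, and cell filtering are deliberately left out.

```python
from collections import Counter, defaultdict

def capture_tuples(records):
    """Tally reads per (cell barcode, UMI, gene) tuple in an in-memory hash map.

    `records` is an iterable of (cb, umi, gene) tuples (hypothetical input shape).
    """
    return Counter(records)

def umi_counts(tuple_counts):
    """Collapse captured tuples to distinct-UMI counts per (cell barcode, gene)."""
    umis = defaultdict(set)
    for cb, umi, gene in tuple_counts:
        umis[(cb, gene)].add(umi)
    return {key: len(value) for key, value in umis.items()}

# Toy input: two reads share the same CB/UMI/gene and count as one molecule.
reads = [
    ("AAAC", "UMI1", "GENE1"),
    ("AAAC", "UMI1", "GENE1"),
    ("AAAC", "UMI2", "GENE1"),
    ("CCCA", "UMI1", "GENE2"),
]
tuples = capture_tuples(reads)
matrix = umi_counts(tuples)
```

Here `tuples` keeps the per-tuple read counts, while `matrix` holds the molecule counts that a sparse MEX matrix would store.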

When --flex no (default), STAR behavior is identical to upstream.

Core Features Available in Flex Mode

The following features were originally developed in the STAR-Flex fork and are now part of STAR-core. They work with all STAR modes (bulk, single-cell, Flex). See the main suite README.md for full documentation and flags.

Cutadapt-style trimming (--trimCutadapt Yes): See trimming docs.
TranscriptVB quantification (--quantMode TranscriptVB): VB/EM transcript-level quantification with Salmon parity.
SLAM-seq (--slamQuantMode 1): See slam/docs/SLAM_seq.md.
Spill-to-disk BAM sorting (--outBAMsortMethod samtools): Bounded-RAM coordinate sorting. Works with Flex.
Y-chromosome BAM/FASTQ splitting (--emitNoYBAM yes, --emitYNoYFastq yes): Split reads by chrY alignment. Developed for MorPHiC KOLF cell lines. Tested and validated with Flex in both sorted and unsorted modes (see tests/TEST_REPORT_Y_SPLIT_FLEX.md). See Y-chromosome BAM split docs.

STAR-Flex Extras

Index-Time Features

AutoIndex + CellRanger-style references: Optional reference download + integrity verification, CellRanger-style FASTA/GTF formatting, and automatic index creation in --genomeDir (--autoIndex, --forceIndex, --forceAllIndex).
Transcriptome FASTA Generation: Generate transcriptome.fa during index creation for Salmon quantification parity and TranscriptVB error modeling. Eliminates the need to run gffread/rsem-prepare-reference separately.

Flex-Specific Features

Flex Pipeline: Inline hash pipeline for 10x Genomics Flex (Fixed RNA Profiling) samples.

For complete parameter reference, see flex parameter docs (STAR-Flex-only flags) and upstream README.md (all other parameters).

For detailed technical documentation of the flex data flow and algorithms, see docs/flex_methodology.md.

Quick Start

STAR \
  --genomeDir /path/to/flex_reference \
  --readFilesIn R2.fastq.gz R1.fastq.gz \
  --readFilesCommand zcat \
  --soloType CB_UMI_Simple \
  --soloCBwhitelist /path/to/737K-fixed-rna-profiling.txt \
  --flex yes \
  --soloFlexExpectedCellsPerTag 3000 \
  --soloSampleWhitelist sample_whitelist.tsv \
  --soloProbeList probe_list.txt \
  --soloSampleProbes probe-barcodes-fixed-rna-profiling-rna.txt \
  --soloSampleProbeOffset 68 \
  --soloFlexOutputPrefix output/per_sample \
  --soloMultiMappers Rescue \
  --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
  --soloUMIfiltering MultiGeneUMI_CR \
  --soloUMIdedup 1MM_CR \
  --soloFeatures Gene \
  --outFileNamePrefix output/

Example: Y-Chromosome BAM Split

To split BAM output into Y and noY files:

STAR \
  --genomeDir /path/to/flex_reference \
  --readFilesIn R2.fastq.gz R1.fastq.gz \
  --readFilesCommand zcat \
  --soloType CB_UMI_Simple \
  --soloCBwhitelist /path/to/737K-fixed-rna-profiling.txt \
  --flex yes \
  --soloFlexExpectedCellsPerTag 3000 \
  --soloSampleWhitelist sample_whitelist.tsv \
  --soloProbeList probe_list.txt \
  --soloSampleProbes probe-barcodes-fixed-rna-profiling-rna.txt \
  --soloSampleProbeOffset 68 \
  --soloFlexOutputPrefix output/per_sample \
  --soloMultiMappers Rescue \
  --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
  --soloUMIfiltering MultiGeneUMI_CR \
  --soloUMIdedup 1MM_CR \
  --soloFeatures Gene \
  --outSAMtype BAM SortedByCoordinate \
  --emitNoYBAM yes \
  --outFileNamePrefix output/

This produces:

output/Aligned.sortedByCoord.out_Y.bam - Reads with any Y-chromosome alignment
output/Aligned.sortedByCoord.out_noY.bam - Reads with no Y-chromosome alignments
Primary BAM (output/Aligned.sortedByCoord.out.bam) is suppressed by default

To emit a read-name list for FASTQ filtering (with or without Y/noY BAMs):

STAR \
  ... \
  --emitYReadNames yes \
  --outFileNamePrefix output/

This writes output/Aligned.out_Y.names.txt by default (override with --YReadNamesOutput).

To emit Y/noY FASTQ files directly during alignment:

STAR \
  ... \
  --emitYNoYFastq yes \
  --emitYNoYFastqCompression gz \
  --outFileNamePrefix output/

This creates FASTQs named after the input files, with _Y / _noY inserted before the last _R1 or _R2. For example, Sample_R1_001.fastq.gz becomes Sample_Y_R1_001.fastq.gz and Sample_noY_R1_001.fastq.gz (output written under the --outFileNamePrefix directory). If no _R1/_R2 token is found, STAR falls back to Y_reads.mateN.fastq(.gz) and noY_reads.mateN.fastq(.gz) under the output prefix. You can override names explicitly with --YFastqOutputPrefix and --noYFastqOutputPrefix. If a separate barcode read is present (e.g., scRNA-seq R3), only the true mates (R1/R2) are emitted.
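
The naming rule above can be sketched in a few lines of Python. This is an illustrative approximation only: the helper name `derive_split_names` is hypothetical, and STAR's actual token matching (for example, how it treats `_R1` inside a longer token) may differ.

```python
import re
from typing import Tuple

def derive_split_names(filename: str, mate: int, gz: bool = True) -> Tuple[str, str]:
    """Return (Y name, noY name) for one input FASTQ file name.

    Inserts _Y / _noY before the last _R1 or _R2 token; if no such token
    exists, falls back to Y_reads.mateN / noY_reads.mateN.
    """
    matches = list(re.finditer(r"_R[12]", filename))
    if matches:
        pos = matches[-1].start()  # position of the last _R1/_R2 token
        return (filename[:pos] + "_Y" + filename[pos:],
                filename[:pos] + "_noY" + filename[pos:])
    ext = ".fastq.gz" if gz else ".fastq"
    return (f"Y_reads.mate{mate}{ext}", f"noY_reads.mate{mate}{ext}")

named = derive_split_names("Sample_R1_001.fastq.gz", mate=1)
fallback = derive_split_names("reads.fq.gz", mate=2)
```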

Edge cases to be aware of:

If the reference has no Y contigs, the Y FASTQs are empty and a warning is logged.
FASTA inputs produce .fa(.gz) outputs with > headers and no +/quality lines.
Multiple input files per mate derive output names from the first file for each mate.
--emitYNoYFastqCompression none writes uncompressed .fastq/.fa outputs.
Unmapped reads are routed to noY.

You can use --emitYNoYFastq yes with --outSAMtype None to emit FASTQ files without BAM output.

To keep the primary BAM alongside the split files:

STAR \
  ... \
  --emitNoYBAM yes \
  --keepBAM yes \
  --outFileNamePrefix output/

Note: The Y/noY split is a general-purpose core feature developed for MorPHiC requirements for KOLF cell lines. It works with all modes: Flex, single-cell, and bulk RNA-seq. Validated with Flex in both sorted and unsorted modes (see tests/TEST_REPORT_Y_SPLIT_FLEX.md). In single-cell mode, R1/R2 are not traditional paired-end mates, so routing is based on each read's own alignments. In bulk paired-end mode, if either mate has a Y-chromosome alignment, both mates route to _Y.bam.
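
The routing rules in the note can be expressed as a small decision function. The Python sketch below is a simplified model, not STAR's code: the function names are hypothetical, and each read is represented only by the list of contig names its alignments touch (an empty list for an unmapped read).

```python
def route_read(alignment_contigs, y_contigs):
    """Route one read: Y if any of its alignments touches a Y contig, else noY.

    An unmapped read has no alignments and therefore routes to noY.
    """
    return "Y" if any(c in y_contigs for c in alignment_contigs) else "noY"

def route_pair(mate1_contigs, mate2_contigs, y_contigs):
    """Bulk paired-end rule: if either mate hits Y, both mates route to Y."""
    hits_y = (route_read(mate1_contigs, y_contigs) == "Y"
              or route_read(mate2_contigs, y_contigs) == "Y")
    return "Y" if hits_y else "noY"

Y_CONTIGS = {"chrY"}  # assumed contig name, for illustration only
multi = route_read(["chr1", "chrY"], Y_CONTIGS)
unmapped = route_read([], Y_CONTIGS)
pair = route_pair(["chr2"], ["chrY"], Y_CONTIGS)
```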

Required Inputs

Input	Description
Flex reference genome	Hybrid genome with probe pseudo-chromosomes (see Building References)
CB whitelist	10x barcode whitelist (e.g., `737K-fixed-rna-profiling.txt`)
Sample whitelist	TSV mapping sample tag sequences to labels
Probe list	Gene list from probe set
Sample probe barcodes	10x probe barcode sequences file

Parameters

Master Switch

Flag	Default	Description
`--flex`	`no`	Enable flex pipeline (`yes`/`no`)

Y-Chromosome BAM Split

Flag	Default	Description
`--emitNoYBAM`	`no`	Enable Y-chromosome BAM splitting (`yes`/`no`). When enabled, emits two additional BAM files: `<out>_noY.bam` (reads with no Y-chromosome alignments) and `<out>_Y.bam` (reads with any Y-chromosome alignment). Primary BAM is suppressed by default unless `--keepBAM yes` is specified.
`--emitYReadNames`	`no`	Emit list of read names with any Y-chromosome alignment (one per line). Can be used with or without Y/noY BAMs.
`--emitYNoYFastq`	`no`	Emit Y/noY FASTQ files directly during alignment (`yes`/`no`).
`--emitYNoYFastqCompression`	`gz`	Compression for Y/noY FASTQ output (`gz`/`none`).
`--YFastqOutputPrefix`	-	Optional: override output prefix for Y FASTQ files (default: derived from input name; falls back to `Y_reads.mateN`).
`--noYFastqOutputPrefix`	-	Optional: override output prefix for noY FASTQ files (default: derived from input name; falls back to `noY_reads.mateN`).
`--keepBAM`	`no`	Keep primary BAM output when `--emitNoYBAM yes` is enabled (`yes`/`no`)
`--noYOutput`	-	Optional: override default path for noY BAM output (default: `<out>_noY.bam`)
`--YOutput`	-	Optional: override default path for Y BAM output (default: `<out>_Y.bam`)
`--YReadNamesOutput`	-	Optional: override output path for Y read names list (default: `<out>Aligned.out_Y.names.txt`)

Sample Detection

Flag	Default	Description
`--soloSampleWhitelist`	-	Path to sample tag whitelist TSV
`--soloProbeList`	auto	Path to probe gene list (auto-detects from genome index if not specified)
`--soloSampleProbes`	-	Path to 10x sample probe barcodes
`--soloSampleProbeOffset`	0	Offset in read for sample probe sequence
`--soloSampleSearchNearby`	`yes`	Search nearby positions for sample tag
`--soloSampleStrictMatch`	`no`	Require strict match for sample tag
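
To make the offset and nearby-search options concrete, here is a hedged Python sketch of a tag lookup. It assumes an 8bp tag read from a 0-based offset and a search window of one base on either side; the window size, the function name, and the exact-match lookup are all assumptions for illustration and may not match STAR-Flex's detector.

```python
def detect_sample_tag(read, tag_to_label, offset, tag_len=8,
                      search_nearby=True, window=1):
    """Return the sample label whose tag is found at `offset` (or nearby), else None."""
    shifts = [0]
    if search_nearby:
        for d in range(1, window + 1):
            shifts += [-d, d]  # try the exact offset first, then neighbours
    for shift in shifts:
        start = offset + shift
        if start < 0:
            continue
        candidate = read[start:start + tag_len]
        if len(candidate) == tag_len and candidate in tag_to_label:
            return tag_to_label[candidate]
    return None

whitelist = {"ACGTACGT": "Sample_A"}  # toy whitelist
exact = detect_sample_tag("T" * 68 + "ACGTACGT" + "TT", whitelist, offset=68)
shifted = detect_sample_tag("T" * 69 + "ACGTACGT" + "T", whitelist, offset=68)
```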

FlexFilter (Cell Calling)

Flag	Default	Description
`--soloFlexExpectedCellsPerTag`	0	Expected cells per sample tag
`--soloFlexExpectedCellsTotal`	0	Total expected cells (alternative to per-tag)
`--soloFlexAllowedTags`	-	Optional: restrict to specific sample tags
`--soloFlexOutputPrefix`	-	Output prefix for per-sample MEX

EmptyDrops Parameters (Advanced)

Flag	Default	Description
`--soloFlexEdNiters`	10000	Monte Carlo simulation iterations
`--soloFlexEdFdrThreshold`	0 (disabled)	FDR threshold for cell calling; if set (>0), FDR gate is used
`--soloFlexEdPvalueThreshold`	0.05	Raw p-value threshold when FDR gate is disabled (default behavior)
`--soloFlexEdLower`	100	Lower UMI bound for ambient profile

Output Structure

output/
├── Solo.out/Gene/raw/          # Raw MEX (all barcodes)
│   ├── barcodes.tsv
│   ├── features.tsv
│   └── matrix.mtx
├── per_sample/                  # Per-sample filtered MEX (labels from whitelist)
│   ├── SampleA/Gene/filtered/
│   ├── SampleB/Gene/filtered/
│   └── flexfilter_summary.tsv   # Cell calling statistics
├── Aligned.sortedByCoord.out_Y.bam      # Y-chromosome reads (if --emitNoYBAM yes)
└── Aligned.sortedByCoord.out_noY.bam    # Non-Y reads (if --emitNoYBAM yes)

When --emitNoYBAM yes is enabled:

_Y.bam: Contains all reads where any alignment (primary, secondary, or supplementary) touches a Y-chromosome contig
_noY.bam: Contains all reads with no Y-chromosome alignments
Primary BAM (Aligned.sortedByCoord.out.bam or Aligned.out.bam) is suppressed by default unless --keepBAM yes is specified
Works with both BAM Unsorted and BAM SortedByCoordinate output types

Building References

The flex pipeline requires a hybrid reference genome that includes pseudo-chromosomes for probe sequences. We benchmarked hash-based gene assignment as an alternative: it was faster, but it lost 15–20% sensitivity and required blacklisting and downstream QC to reach parity with Cell Ranger. The pseudo-chromosome approach avoids these trade-offs by leveraging STAR's native alignment machinery.

References can be built directly with STAR (recommended) or with the standalone scripts provided in scripts/:

Integrated Index Generation (Recommended)

STAR --runMode genomeGenerate \
  --genomeDir /path/to/flex_index \
  --genomeFastaFiles /path/to/genome.fa \
  --sjdbGTFfile /path/to/genes.gtf \
  --sjdbOverhang 100 \
  --flexGeneProbeSet /path/to/Chromium_Human_Transcriptome_Probe_Set_v2.0.0_GRCh38-2024-A.csv \
  --runThreadN 8

Required Inputs

Input	Description
`--genomeFastaFiles`	Base genome FASTA file
`--sjdbGTFfile`	Gene annotation GTF file (can be gzipped)
`--flexGeneProbeSet`	10x Flex probe CSV file (50bp gene probes)

Flex Index Parameters

Flag	Default	Description
`--flexGeneProbeSet`	-	Path to 50bp gene probe CSV file
`--flexGeneProbeLength`	50	Expected probe length (fails if mismatch)

Output Structure

flex_index/
├── probe_gene_list.txt           # Unique gene IDs with probes (auto-detected for --soloProbeList)
├── flex_probe_artifacts/         # Probe processing artifacts
│   ├── filtered_probe_set.csv    # Probes matching GTF genes
│   ├── probes_only.fa            # Probe-only FASTA
│   ├── probes_only.gtf           # Probe-only GTF entries
│   ├── genome.filtered.fa        # Hybrid FASTA (used for indexing)
│   ├── genes.filtered.gtf        # Hybrid GTF (used for indexing)
│   ├── probe_genes_exons.bed     # Probe coordinates
│   ├── probe_list.txt            # Unique gene IDs
│   └── metadata/
│       └── reference_manifest.json
├── Genome                        # Standard STAR index files
├── SA
├── SAindex
└── ... (other STAR index files)

Probe Filtering Rules

The integrated preprocessor applies these filters:

50bp A/C/G/T only - Fails if any probe has invalid length or characters
Skip DEPRECATED - Excludes probes marked as deprecated
Gene match - Keeps only probes whose gene_id exists in the target GTF
Deterministic ordering - Stable sort by gene_id then probe_id
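
The four rules above can be approximated in Python as shown below. This is a sketch, not the integrated preprocessor: the dictionary keys (`gene_id`, `probe_id`, `probe_seq`), the way a deprecated probe is recognized (the string DEPRECATED in either ID), and the order in which the checks run are assumptions made for illustration.

```python
import re

VALID_PROBE = re.compile(r"[ACGT]{50}")

def filter_probes(probes, gtf_gene_ids):
    """Validate, filter, and deterministically order probe records."""
    kept = []
    for p in probes:
        # Rule 1: exactly 50 bases of A/C/G/T, otherwise fail hard.
        if not VALID_PROBE.fullmatch(p["probe_seq"]):
            raise ValueError(f"invalid probe sequence for {p['probe_id']}")
        # Rule 2: skip probes marked as deprecated (assumed marker).
        if "DEPRECATED" in p["probe_id"] or "DEPRECATED" in p["gene_id"]:
            continue
        # Rule 3: keep only probes whose gene exists in the target GTF.
        if p["gene_id"] not in gtf_gene_ids:
            continue
        kept.append(p)
    # Rule 4: sort by gene_id, then probe_id (Python's sort is stable).
    kept.sort(key=lambda p: (p["gene_id"], p["probe_id"]))
    return kept

SEQ = "ACGT" * 12 + "AC"  # 50 bases
probes = [
    {"gene_id": "G2", "probe_id": "G2|p1", "probe_seq": SEQ},
    {"gene_id": "G1", "probe_id": "G1|p2", "probe_seq": SEQ},
    {"gene_id": "G1", "probe_id": "G1|p1", "probe_seq": SEQ},
    {"gene_id": "DEPRECATED_G3", "probe_id": "G3|p1", "probe_seq": SEQ},
    {"gene_id": "G9", "probe_id": "G9|p1", "probe_seq": SEQ},
]
kept_ids = [p["probe_id"] for p in filter_probes(probes, {"G1", "G2"})]
```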

Alternative: Shell Scripts

For custom workflows or debugging, standalone shell scripts are available:

# Filter probes and build hybrid reference
./scripts/filter_probes_to_gtf.sh \
  --probe-set /path/to/probes.csv \
  --gtf /path/to/genes.gtf.gz \
  --base-fasta /path/to/genome.fa \
  --output-dir ./probe_artifacts

The legacy build_filtered_reference.sh and make_filtered_star_index.sh scripts are also available. See scripts/README.md for details.

Using the Flex Index

After building, use the index with the probe gene list:

STAR \
  --genomeDir /path/to/flex_index \
  --flex yes \
  ... # other flex parameters
  # --soloProbeList is auto-detected from probe_gene_list.txt in the index directory

AutoIndex + CellRanger-Style References

STAR-Flex includes an index-time workflow to reproduce the “CellRanger-style” reference preparation (download → integrity checks → format FASTA/GTF → genomeGenerate).

STAR --runMode genomeGenerate \
  --genomeDir /path/to/index \
  --autoIndex Yes \
  --cellrangerStyleIndex Yes \
  --autoCksumUpdate Yes \
  --sjdbOverhang 100 \
  --runThreadN 16

Key outputs and paths:

Formatted inputs: ${genomeDir}/cellranger_ref/genome.fa, ${genomeDir}/cellranger_ref/genes.gtf
Download cache (default): ${genomeDir}/cellranger_ref_cache (override with --cellrangerStyleCacheDir)
Rebuild controls: --forceIndex Yes (re-index), --forceAllIndex Yes (re-download + re-index)

See autoindex docs for URL selection (--cellrangerRefRelease / --faUrl / --gtfUrl), checksum flags, and parity test scripts.

Transcriptome FASTA Generation

STAR-Flex can generate transcriptome.fa during index creation, eliminating the need for separate gffread/rsem-prepare-reference runs. The transcriptome FASTA is required for:

Salmon quantification (identical output for parity)
TranscriptVB error modeling (fragment length distribution estimation)

Basic Usage

STAR --runMode genomeGenerate \
  --genomeDir /path/to/index \
  --genomeFastaFiles /path/to/genome.fa \
  --sjdbGTFfile /path/to/genes.gtf \
  --sjdbOverhang 100 \
  --genomeGenerateTranscriptome Yes \
  --runThreadN 8

This produces ${genomeDir}/transcriptome.fa alongside the standard index files.

Parameters

Flag	Default	Description
`--genomeGenerateTranscriptome`	`No`	Enable transcriptome FASTA generation (`Yes`/`No`)
`--genomeGenerateTranscriptomeFasta`	`-`	Custom output path (default: `${genomeDir}/transcriptome.fa`)
`--genomeGenerateTranscriptomeOverwrite`	`No`	Overwrite existing file (`Yes`/`No`)

CellRanger-Style Index

When --cellrangerStyleIndex Yes, STAR-Flex formats the annotation inputs into ${genomeDir}/cellranger_ref/:

${genomeDir}/cellranger_ref/genome.fa
${genomeDir}/cellranger_ref/genes.gtf

When combined with --genomeGenerateTranscriptome Yes, the transcriptome is written to both:

${genomeDir}/transcriptome.fa (standard path)
${genomeDir}/cellranger_ref/transcriptome.fa (CellRanger-compatible path)

STAR --runMode genomeGenerate \
  --genomeDir /path/to/index \
  --genomeFastaFiles /path/to/genome.fa \
  --sjdbGTFfile /path/to/genes.gtf \
  --sjdbOverhang 100 \
  --genomeGenerateTranscriptome Yes \
  --cellrangerStyleIndex Yes \
  --runThreadN 8

Output Format

The transcriptome FASTA follows Salmon conventions:

Headers: Transcript IDs without version suffixes (e.g., >ENST00000456328 not >ENST00000456328.2)
Line width: 70 characters
Ordering: Matches transcriptInfo.tab for Salmon parity
Negative strand: Exons concatenated in genomic order, then reverse-complemented
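
The conventions above can be illustrated with a short Python sketch that builds one FASTA record. It is not the STAR-Flex generator: the function name is hypothetical, exon coordinates are assumed to be 1-based inclusive (GTF style), and the version suffix is stripped by cutting at the first dot, which may be too simple for unusual IDs.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def transcript_record(transcript_id, strand, exons, chrom_seq, width=70):
    """Build one FASTA record for a transcript.

    exons: list of (start, end) pairs, 1-based inclusive, on one chromosome.
    """
    # Concatenate exons in genomic order.
    seq = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    # Negative strand: reverse-complement the concatenated sequence.
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    header = ">" + transcript_id.split(".")[0]  # drop the version suffix
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"

chrom = "AAACCCGGGTTT"
record = transcript_record("ENST00000456328.2", "-", [(7, 9), (1, 3)], chrom)
```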

Verification

Test with the included chr21+chr22 subset:

./test/run_transcriptome_generation.sh --all

This runs:

Synthetic tests: Basic transcriptome generation with small fixtures
Default path tests: Validates ${genomeDir}/transcriptome.fa output
CellRanger tests: Real GENCODE chr21+chr22 with CellRanger filtering

Standalone FlexFilter Tool

A standalone tool, run_flexfilter_mex, is available for offline MEX processing. It re-runs the OrdMag/EmptyDrops cell-calling pipeline on existing composite MEX files without repeating STAR alignment.

Use cases:

Parameter tuning (adjust expected cells, EmptyDrops thresholds)
Reprocessing with different filtering settings
Integration with non-STAR pipelines (any tool producing composite CB+TAG MEX)
Batch reprocessing of archived STAR outputs

Building

The tool is optional and not built by the default make STAR target:

cd source
make flexfilter

This produces tools/flexfilter/run_flexfilter_mex.

Input Requirements

The tool expects a composite MEX directory containing:

matrix.mtx - Matrix Market sparse matrix (or InlineHashDedup_matrix.mtx)
barcodes.tsv - Composite barcodes in CB16+TAG8 format (24 characters)
features.tsv - Gene IDs (tab-separated)

The composite barcode format concatenates the 16bp cell barcode with the 8bp sample tag:

AAACCCAAGAAACACTACGTACGT  # CB16 (AAACCCAAGAAACACT) + TAG8 (ACGTACGT)
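
For tools that need to consume or produce this format, splitting a composite barcode is a fixed-width slice. The Python helper below is a small sketch with a hypothetical name; it only checks the total length and does not validate the characters.

```python
CB_LEN = 16   # cell barcode length
TAG_LEN = 8   # sample tag length

def split_composite_barcode(barcode):
    """Split a 24-character composite barcode into (cell barcode, sample tag)."""
    if len(barcode) != CB_LEN + TAG_LEN:
        raise ValueError(f"expected {CB_LEN + TAG_LEN} characters, got {len(barcode)}")
    return barcode[:CB_LEN], barcode[CB_LEN:]

cb, tag = split_composite_barcode("AAACCCAAGAAACACTACGTACGT")
```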

Basic Usage

./tools/flexfilter/run_flexfilter_mex \
  --mex-dir /path/to/Solo.out/Gene/raw \
  --total-expected 12000 \
  --output-prefix /path/to/filtered_output

Key Parameters

Parameter	Description
`--mex-dir`	Path to composite MEX directory (required)
`--total-expected`	Total expected cells across all samples (required)
`--output-prefix`	Output directory prefix (required)
`--sample-whitelist`	TSV file mapping sample names to tag sequences
`--ed-lower-bound`	Lower UMI bound for EmptyDrops (default: 500)
`--ed-fdr`	FDR threshold for EmptyDrops (default: 0.01)
`--disable-occupancy`	Skip occupancy post-filter (for testing)

Output Structure

output_prefix/
├── SampleA/Gene/filtered/
│   ├── matrix.mtx
│   ├── barcodes.tsv
│   ├── features.tsv
│   └── EmptyDrops/
│       └── emptydrops_results.tsv
├── SampleB/Gene/filtered/
│   └── ...
└── flexfilter_summary.tsv

Example Workflow: Reprocess with Different Expected Cells

# Original STAR run produced Solo.out/Gene/raw/
# Reprocess with higher cell expectation
./tools/flexfilter/run_flexfilter_mex \
  --mex-dir /storage/run1/Solo.out/Gene/raw \
  --total-expected 20000 \
  --output-prefix /storage/run1/refiltered_20k

# Or with explicit sample whitelist
./tools/flexfilter/run_flexfilter_mex \
  --mex-dir /storage/run1/Solo.out/Gene/raw \
  --sample-whitelist samples.tsv \
  --total-expected 15000 \
  --output-prefix /storage/run1/refiltered_explicit

Sample whitelist format (samples.tsv):

Sample_A	ACGTACGT
Sample_B	TGCATGCA
Sample_C	GGCCGGCC

Labels in the first column are used verbatim for per-sample directories, and the order in the whitelist is preserved.
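
A whitelist in this format can be read while preserving order with a few lines of Python. This sketch assumes exactly two tab-separated columns and no header row; the function name is hypothetical, and the tool's own parser may be more lenient.

```python
def parse_sample_whitelist(text):
    """Parse 'label<TAB>tag' lines into an ordered list of (label, tag) pairs."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        label, tag = line.split("\t")  # raises ValueError on malformed rows
        samples.append((label, tag))
    return samples

whitelist_text = "Sample_A\tACGTACGT\nSample_B\tTGCATGCA\nSample_C\tGGCCGGCC\n"
samples = parse_sample_whitelist(whitelist_text)
```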

Testing

# Requires tests/gold_standard/ fixtures
./tools/flexfilter/test_smoke.sh

# Validate output format
./tools/flexfilter/validate_output.py /path/to/output

See tools/flexfilter/README.md for complete CLI reference and advanced options.

remove_y_reads - FASTQ Splitter

remove_y_reads is a standalone C tool that splits FASTQ files based on a Y-only BAM produced by STAR's --emitNoYBAM feature. Given the _Y.bam output from STAR's Y-chromosome split, it partitions the original FASTQ files into Y/noY sets while preserving read order.

Use cases:

Split FASTQ files after STAR alignment with Y/noY BAM output
Prepare separate inputs for sex-specific analyses
Filter out Y-chromosome reads from FASTQ files
Downstream analysis requiring separate Y/non-Y FASTQs

Key features:

Uses htslib for BAM reading and kseq.h for robust FASTQ parsing
Dual-hash collision protection (FNV-1a + djb2) for read name lookup
File-level threading with semaphore-bounded concurrency
Preserves original read order in outputs
Handles gzipped and uncompressed FASTQs

Building

The tool is optional and not built by the default make STAR target:

cd source
make remove_y_reads

This produces tools/remove_y_reads/remove_y_reads.

Alternatively, build directly:

cd tools/remove_y_reads
make

Basic Usage

./tools/remove_y_reads/remove_y_reads \
    -y Aligned.sortedByCoord.out_Y.bam \
    --threads 4 \
    --gzip-level 6 \
    -o output_dir \
    sample_R1.fastq.gz sample_R2.fastq.gz

Output: For each input FASTQ, produces <stem>_Y.fastq.gz and <stem>_noY.fastq.gz.

Key Parameters

Flag	Description
`-y, --ybam`	Y-only BAM file (required)
`-o, --outdir`	Output directory (default: alongside input)
`-t, --threads`	Number of parallel workers (default: 1)
`-z, --gzip-level`	Compression level 1-9 (default: 6)
`-h, --help`	Show help message

Features

Read order preservation: Outputs maintain the same read order as input FASTQs
Name normalization: Automatically handles FASTQ name formats (strips @, /1, /2, comments)
Collision detection: Uses hash + length to guard against rare hash collisions
Multi-threaded: Process multiple FASTQ files in parallel (file-level parallelism)
Gzip support: Handles both compressed and uncompressed FASTQ files
Dynamic parsing: Uses kseq.h for robust parsing of arbitrarily long reads
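
The name normalization and dual-hash lookup described above can be modelled in Python as follows. This is a conceptual sketch rather than a port of the C tool: the 64-bit widths, the function names, and the use of a set of hash pairs are assumptions, while the FNV-1a and djb2 constants are the standard published ones.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def normalize_read_name(header):
    """Strip a leading @, any comment after whitespace, and a trailing /1 or /2."""
    name = header.strip()
    if name.startswith("@"):
        name = name[1:]
    name = name.split(None, 1)[0] if name else name
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def fnv1a_64(data):
    """64-bit FNV-1a hash of a bytes object."""
    h = 0xCBF29CE484222325
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & MASK64
    return h

def djb2_64(data):
    """djb2 hash (h = h * 33 + byte), truncated to 64 bits."""
    h = 5381
    for b in data:
        h = (h * 33 + b) & MASK64
    return h

def read_key(name):
    """Pair of independent hashes used as the lookup key for one read name."""
    data = normalize_read_name(name).encode()
    return (fnv1a_64(data), djb2_64(data))

def split_by_y(fastq_names, y_bam_names):
    """Partition FASTQ read names into (Y, noY) lists, preserving input order."""
    y_keys = {read_key(n) for n in y_bam_names}
    y, noy = [], []
    for n in fastq_names:
        (y if read_key(n) in y_keys else noy).append(n)
    return y, noy

y_reads, noy_reads = split_by_y(["@r1/1 comment", "@r2/1", "@r3/1"], ["r2"])
```

Keying on two independent hashes means a false match requires both functions to collide for the same pair of names.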

Example Workflow

# Step 1: Run STAR with Y/noY split
STAR \
  --genomeDir /path/to/reference \
  --readFilesIn R1.fastq.gz R2.fastq.gz \
  --readFilesCommand zcat \
  --outSAMtype BAM SortedByCoordinate \
  --emitNoYBAM yes \
  --outFileNamePrefix output/

# Step 2: Split original FASTQs based on Y BAM
./tools/remove_y_reads/remove_y_reads \
    -y output/Aligned.sortedByCoord.out_Y.bam \
    --threads 4 \
    -o output/fastq_split \
    R1.fastq.gz R2.fastq.gz

# Result: output/fastq_split/R1_Y.fastq.gz, R1_noY.fastq.gz, etc.

Testing

# Basic self-contained test
./tests/run_remove_y_reads_test.sh

# Comprehensive test (single-threaded, multithreaded, multiple files)
./tests/run_y_removal_comprehensive_test.sh

Test report generated at tests/TEST_REPORT_REMOVE_Y_FASTQ.md.

For detailed technical documentation, see docs/Y_CHROMOSOME_BAM_SPLIT.md.

Building STAR-Flex

Standard STAR build process:

cd source
make -j8

The flex objects are automatically included in the build.

Testing

See docs/TESTING_flex.md for detailed testing instructions.

Quick test:

./tests/run_flex_multisample_test.sh

Gold standard comparison files are bundled in tests/gold_standard/.

Code Organization

source/
├── libflex/                    # Core flex filtering library
│   ├── FlexFilter.cpp/h        # Main filter orchestration
│   ├── EmptyDropsMultinomial.cpp/h  # Full EmptyDrops
│   ├── OrdMagStage.cpp/h       # Simple EmptyDrops (OrdMag)
│   └── OccupancyGuard.cpp/h    # Occupancy-based filtering
├── solo/
│   └── CbCorrector.cpp/h       # CB correction with pseudocounts
├── SampleDetector.cpp/h        # Sample tag detection
├── InlineCBCorrection.cpp/h    # Inline hash CB correction
├── UMICorrector.cpp/h          # Clique-based UMI correction
├── MexWriter.cpp/h             # MEX matrix output
├── GeneResolver.cpp/h          # Probe-to-gene mapping
├── SoloFeature_flexfilter.cpp  # FlexFilter integration
├── SoloFeature_writeMexFromInlineHashDedup.cpp
└── UmiCodec.h                  # UMI encoding/decoding helpers

tools/flexfilter/               # Standalone FlexFilter CLI
├── run_flexfilter_mex.cpp      # Main CLI wrapper
├── Makefile                    # Build configuration
├── README.md                   # CLI documentation
├── test_smoke.sh               # Smoke test script
└── validate_output.py          # Output validation script

tools/remove_y_reads/           # Standalone FASTQ Y-splitter CLI
├── remove_y_reads.c            # Main implementation (C + htslib)
└── Makefile                    # Build configuration

Compatibility

Baseline: STAR 2.7.11b
When --flex no (default), behavior is identical to upstream STAR
Upstream README.md and CHANGES.md are not modified
