feat: add optional FINE_TUNE_MODELS step for DL model fine-tuning #54
When `--enable_fine_tuning` is enabled, the pipeline:

1. Runs the standard flow through ASSEMBLE_EMPIRICAL_LIBRARY
2. Fine-tunes the RT/IM (optionally fragment) models on the empirical library
3. Re-generates the in-silico library using the tuned models (`--tokens`, `--rt-model`, `--im-model`)
4. Uses the tuned library for INDIVIDUAL_ANALYSIS and FINAL_QUANTIFICATION

New parameters:

- `--enable_fine_tuning` (default: false): enable the fine-tuning step
- `--tune_fr` (default: false): also fine-tune the fragmentation model
- `--tune_lr` (default: DIA-NN's 0.0005): fine-tuning learning rate

New module: `modules/local/diann/fine_tune_models/`

- Takes the empirical library + FASTA + diann_config
- Runs `diann --tune-lib --tune-rt --tune-im`
- Outputs: `dict.txt`, `rt.d0.pt`, `im.d0.pt` (optionally `fr.d0.pt`)

INSILICO_LIBRARY_GENERATION is updated with optional tuned-model inputs (pass `[]` when not fine-tuning, the actual files when fine-tuning).

Version guard: requires DIA-NN >= 2.0. Cannot be combined with `--skip_preliminary_analysis`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
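A rough sketch of what the new module could look like, based only on the flags and outputs listed above. This is not the PR's actual code: the input/output declarations are assumptions, and the learning-rate and fragment-model flags are omitted because their exact DIA-NN spellings are not shown here.

```nextflow
// Hypothetical sketch of modules/local/diann/fine_tune_models/main.nf.
// Input/output wiring is assumed; only the flags quoted in the PR are used.
process FINE_TUNE_MODELS {
    input:
    path empirical_library
    path fasta
    path diann_config

    output:
    path "dict.txt", emit: dict
    path "rt.d0.pt", emit: rt_model
    path "im.d0.pt", emit: im_model
    path "fr.d0.pt", emit: fr_model, optional: true

    script:
    """
    diann --cfg ${diann_config} \\
          --lib ${empirical_library} \\
          --fasta ${fasta} \\
          --tune-lib --tune-rt --tune-im
    """
}
```

The tuned `rt.d0.pt` / `im.d0.pt` outputs would then be passed to the library-generation step via `--rt-model` / `--im-model`.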
Per Vadim Demichev's recommendation, fine-tuning should run as a separate phase BEFORE the main analysis, not mid-pipeline.

Phase 0 (when `--enable_fine_tuning`):

1. INSILICO_LIBRARY_GENERATION (default models)
2. TUNE_PRELIMINARY_ANALYSIS (on the `--tune_n_files` subset)
3. TUNE_ASSEMBLE_LIBRARY (empirical library from the subset)
4. FINE_TUNE_MODELS (train RT/IM/fragment models)
5. TUNED_LIBRARY_GENERATION (re-generate with tuned models)

Phase 1 (always, the standard pipeline) uses the tuned in-silico library (or the default one if no fine-tuning): PRELIMINARY_ANALYSIS → ASSEMBLE → INDIVIDUAL → FINAL

New parameter: `--tune_n_files` (default: 3): number of files for the tuning search subset.

Uses Nextflow process aliases (TUNE_PRELIMINARY_ANALYSIS, TUNE_ASSEMBLE_LIBRARY, TUNED_LIBRARY_GENERATION) to avoid duplicate process invocation conflicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
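The alias mechanism mentioned above can be sketched roughly as follows. Module paths, channel names, and process signatures are assumptions for illustration, not the PR's actual wiring; only the alias names come from the description.

```nextflow
// Sketch of the Phase 0 wiring using Nextflow process aliases, so the same
// process definitions can be invoked again in Phase 1 without name conflicts.
// All include paths and channel shapes below are hypothetical.
include { PRELIMINARY_ANALYSIS as TUNE_PRELIMINARY_ANALYSIS } from '../modules/local/diann/preliminary_analysis'
include { ASSEMBLE_EMPIRICAL_LIBRARY as TUNE_ASSEMBLE_LIBRARY } from '../modules/local/diann/assemble_empirical_library'
include { INSILICO_LIBRARY_GENERATION as TUNED_LIBRARY_GENERATION } from '../modules/local/diann/insilico_library_generation'
include { FINE_TUNE_MODELS } from '../modules/local/diann/fine_tune_models'

workflow PHASE_0_FINE_TUNING {
    take:
    ms_files        // channel of raw MS files
    insilico_lib    // library generated with the default models

    main:
    // Restrict the tuning search to the first --tune_n_files files
    tune_subset = ms_files.take(params.tune_n_files)

    TUNE_PRELIMINARY_ANALYSIS(tune_subset, insilico_lib)
    TUNE_ASSEMBLE_LIBRARY(TUNE_PRELIMINARY_ANALYSIS.out.collect())
    FINE_TUNE_MODELS(TUNE_ASSEMBLE_LIBRARY.out)
    TUNED_LIBRARY_GENERATION(FINE_TUNE_MODELS.out.rt_model, FINE_TUNE_MODELS.out.im_model)

    emit:
    library = TUNED_LIBRARY_GENERATION.out
}
```

Aliasing is the standard Nextflow way to call one process definition from two places; without it, invoking PRELIMINARY_ANALYSIS in both Phase 0 and Phase 1 would raise a duplicate-invocation error.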
Summary
Adds an optional model fine-tuning step to the pipeline, eliminating the need for two separate pipeline runs when working with non-standard modifications.
How it works
When `--enable_fine_tuning` is enabled:

- FINE_TUNE_MODELS trains RT/IM models on the empirical library (`--tune-lib --tune-rt --tune-im`)
- TUNED_LIBRARY_GENERATION regenerates the in-silico library with the tuned models (`--tokens`, `--rt-model`, `--im-model`)

New parameters
| Parameter | Default |
| --- | --- |
| `--enable_fine_tuning` | `false` |
| `--tune_fr` | `false` |
| `--tune_lr` | `null` |

When to use
Fine-tuning is beneficial for non-standard modifications that the default deep-learning models were not trained on. Standard modifications (Phospho, Oxidation, Acetylation, etc.) do not need fine-tuning.
Version requirement
Requires DIA-NN >= 2.0. Cannot be combined with `--skip_preliminary_analysis`.

Test plan
- `--enable_fine_tuning false` (default): verify no FINE_TUNE step appears
- `--enable_fine_tuning true` on DIA-NN 2.x: verify tuning + library regeneration
- `--enable_fine_tuning true` on DIA-NN 1.8.1: verify the version guard error
- `--enable_fine_tuning true --skip_preliminary_analysis true`: verify the conflict error

🤖 Generated with Claude Code
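The version guard and parameter conflict exercised by the test plan above could be implemented roughly like this. This is a plausible sketch, not the PR's actual validation code; in particular, how `diann_version` is detected is assumed to happen elsewhere.

```nextflow
// Hypothetical parameter validation for the fine-tuning step.
// diann_version is assumed to be parsed from the DIA-NN binary elsewhere.
def validateFineTuningParams(Map params, String diann_version) {
    if (!params.enable_fine_tuning) return
    def major = diann_version.tokenize('.')[0].toInteger()
    if (major < 2) {
        error "--enable_fine_tuning requires DIA-NN >= 2.0, found ${diann_version}"
    }
    if (params.skip_preliminary_analysis) {
        error "--enable_fine_tuning cannot be combined with --skip_preliminary_analysis"
    }
}
```

With this shape, the DIA-NN 1.8.1 and `--skip_preliminary_analysis` test cases would each fail fast with a clear error before any process runs.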