
feat: add optional FINE_TUNE_MODELS step for DL model fine-tuning#54

Open
ypriverol wants to merge 3 commits into dev from feat/fine-tune-step

Conversation

@ypriverol
Member

Summary

Adds an optional model fine-tuning step to the pipeline, eliminating the need for two separate pipeline runs when working with non-standard modifications.

How it works

When --enable_fine_tuning is enabled:

INSILICO_LIBRARY → PRELIMINARY_ANALYSIS → ASSEMBLE_EMPIRICAL_LIBRARY
    → FINE_TUNE_MODELS → TUNED_LIBRARY_GENERATION
    → INDIVIDUAL_ANALYSIS → FINAL_QUANTIFICATION
  1. Standard flow produces the empirical library
  2. FINE_TUNE_MODELS trains RT/IM models on the empirical library (--tune-lib --tune-rt --tune-im)
  3. TUNED_LIBRARY_GENERATION regenerates the in-silico library with tuned models (--tokens, --rt-model, --im-model)
  4. The rest of the pipeline uses the tuned library
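The tuning invocation in step 2 can be sketched as follows. This is an illustrative helper, not the pipeline's actual code: the flags `--tune-lib`, `--tune-rt`, `--tune-im`, and `--tune-fr` come from this PR; the function name and paths are assumptions.

```python
def build_tune_command(empirical_lib, fasta, tune_fr=False):
    """Assemble the DIA-NN fine-tuning command line (illustrative sketch)."""
    cmd = [
        "diann",
        "--lib", empirical_lib,  # empirical library from ASSEMBLE_EMPIRICAL_LIBRARY
        "--fasta", fasta,
        "--tune-lib",            # enable library-based fine-tuning
        "--tune-rt",             # retrain the retention-time model
        "--tune-im",             # retrain the ion-mobility model
    ]
    if tune_fr:
        cmd.append("--tune-fr")  # optionally retrain the fragmentation model
    return cmd
```

With `--tune_fr false` (the default) the fragmentation model is left untouched, matching the quality-sensitive note in the parameter table below.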

New parameters

Parameter             Default  Description
--enable_fine_tuning  false    Enable fine-tuning step after empirical library assembly
--tune_fr             false    Also fine-tune the fragmentation model (quality-sensitive)
--tune_lr             null     Fine-tuning learning rate (DIA-NN default: 0.0005)
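Per the table, `--tune_lr` defaults to null, meaning DIA-NN falls back to its own default of 0.0005. A minimal sketch of that resolution (helper name and params shape are illustrative, not pipeline code):

```python
DIANN_DEFAULT_TUNE_LR = 0.0005  # DIA-NN's built-in default, per the table above

def resolve_tune_lr(params):
    """Return the learning rate DIA-NN will effectively use."""
    lr = params.get("tune_lr")
    return DIANN_DEFAULT_TUNE_LR if lr is None else lr
```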

When to use

Fine-tuning is beneficial for:

  • Custom chemical labels (mTRAQ, dimethyl, SILAC labels)
  • Exotic PTMs not in DIA-NN's built-in model
  • Unmodified cysteines

Standard modifications (Phospho, Oxidation, Acetylation, etc.) do not need fine-tuning.

Version requirement

Requires DIA-NN >= 2.0. Cannot be combined with --skip_preliminary_analysis.

Test plan

  • Run with --enable_fine_tuning false (default) — verify no FINE_TUNE step appears
  • Run with --enable_fine_tuning true on DIA-NN 2.x — verify tuning + library regeneration
  • Run with --enable_fine_tuning true on DIA-NN 1.8.1 — verify version guard error
  • Run with --enable_fine_tuning true --skip_preliminary_analysis true — verify conflict error

🤖 Generated with Claude Code

When --enable_fine_tuning is enabled, the pipeline:
1. Runs the standard flow through ASSEMBLE_EMPIRICAL_LIBRARY
2. Fine-tunes RT/IM (optionally fragment) models on the empirical library
3. Re-generates the in-silico library using tuned models (--tokens, --rt-model, --im-model)
4. Uses the tuned library for INDIVIDUAL_ANALYSIS and FINAL_QUANTIFICATION

New parameters:
  --enable_fine_tuning (default: false) — enable the fine-tuning step
  --tune_fr (default: false) — also fine-tune the fragmentation model
  --tune_lr (default: DIA-NN's 0.0005) — fine-tuning learning rate

New module: modules/local/diann/fine_tune_models/
  - Takes empirical library + FASTA + diann_config
  - Runs diann --tune-lib --tune-rt --tune-im
  - Outputs: dict.txt, rt.d0.pt, im.d0.pt (optionally fr.d0.pt)

INSILICO_LIBRARY_GENERATION updated with optional tuned model inputs
(pass [] when not fine-tuning, actual files when fine-tuning).

Version guard: requires DIA-NN >= 2.0.
Cannot be combined with --skip_preliminary_analysis.
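The two guards above could be expressed as a single check. This is a sketch with illustrative names, not the pipeline's actual validation code; only the constraints (DIA-NN >= 2.0, incompatibility with --skip_preliminary_analysis) come from the PR.

```python
def check_fine_tuning_config(diann_version, enable_fine_tuning, skip_preliminary_analysis):
    """Fail fast on configurations the fine-tuning step cannot support."""
    if not enable_fine_tuning:
        return  # nothing to validate when the step is disabled
    major = int(diann_version.split(".")[0])
    if major < 2:
        raise ValueError(
            f"--enable_fine_tuning requires DIA-NN >= 2.0 (found {diann_version})")
    if skip_preliminary_analysis:
        raise ValueError(
            "--enable_fine_tuning cannot be combined with --skip_preliminary_analysis")
```

This mirrors the test plan below: DIA-NN 1.8.1 with tuning enabled should hit the version guard, and enabling both tuning and --skip_preliminary_analysis should hit the conflict error.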

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@github-actions

github-actions bot commented Apr 13, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 034587e

✅ 106 tests passed
❔  19 tests were ignored
❗   4 tests had warnings
Details

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • files_exist - File not found: conf/igenomes_ignored.config
  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml


Run details

  • nf-core/tools version 3.5.2
  • Run at 2026-04-14 05:57:10

ypriverol and others added 2 commits April 13, 2026 20:53
Per Vadim Demichev's recommendation, fine-tuning should run as a
separate phase BEFORE the main analysis, not mid-pipeline:

Phase 0 (when --enable_fine_tuning):
  1. INSILICO_LIBRARY_GENERATION (default models)
  2. TUNE_PRELIMINARY_ANALYSIS (on --tune_n_files subset)
  3. TUNE_ASSEMBLE_LIBRARY (empirical library from subset)
  4. FINE_TUNE_MODELS (train RT/IM/fragment models)
  5. TUNED_LIBRARY_GENERATION (re-generate with tuned models)

Phase 1 (always, the standard pipeline):
  Uses tuned in-silico library (or default if no fine-tuning)
  PRELIMINARY_ANALYSIS → ASSEMBLE → INDIVIDUAL → FINAL

New parameter: --tune_n_files (default: 3) — number of files
for the tuning search subset.
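The subset selection behind --tune_n_files can be sketched as taking the first N MS files for the Phase 0 tuning search (helper name is illustrative, not pipeline code):

```python
def select_tuning_subset(ms_files, tune_n_files=3):
    """Pick the files used for the tuning search (Phase 0, step 2)."""
    return ms_files[:tune_n_files]
```

With fewer than N input files, the whole input set is used; the default of 3 keeps the tuning search cheap relative to the full Phase 1 run.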

Uses Nextflow process aliases (TUNE_PRELIMINARY_ANALYSIS,
TUNE_ASSEMBLE_LIBRARY, TUNED_LIBRARY_GENERATION) to avoid
duplicate process invocation conflicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>