
Refactor into unified schema, refactor Tool, CLI + CI/CD, new vis #60

Open
pdiakumis wants to merge 80 commits into main from dev



@pdiakumis
Collaborator

This 'mini' refactor involved a few more components than expected.

Schema redesign

  • Replace separate raw.yaml + tidy.yaml per tool with a single unified schema.yaml.
    Each table now has a flat columns list where each column carries its raw + tidy name, type, description, and a versions array.
    Config derives versioned raw and tidy views from this unified source on demand.
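
To illustrate the idea of deriving versioned raw and tidy views from one flat columns list, here is a minimal base R sketch. It is not the package's Config code: the field names (raw, tidy, type, description, versions) follow the description above, while the column entries, version labels, and the `schema_view()` helper are hypothetical.

```r
# Hypothetical in-memory form of one table from a unified schema.yaml.
# Field names follow the PR description; the concrete columns, types and
# version labels are made up for illustration.
schema <- list(
  tables = list(
    table1 = list(
      columns = list(
        list(raw = "SampleID", tidy = "sample_id", type = "char",
             description = "Sample identifier",
             versions = c("v1.0.0", "latest")),
        list(raw = "Chromosome", tidy = "chrom", type = "char",
             description = "Chromosome name",
             versions = c("v1.0.0", "latest")),
        list(raw = "QualScore", tidy = "qual", type = "float",
             description = "Quality score",
             versions = "latest")
      )
    )
  )
)

# Derive a versioned view: keep the columns listed for `version`, then pick
# either the raw or the tidy column name (hypothetical helper).
schema_view <- function(schema, table, version, view = c("raw", "tidy")) {
  view <- match.arg(view)
  cols <- schema$tables[[table]]$columns
  keep <- Filter(function(x) version %in% x$versions, cols)
  data.frame(
    field = vapply(keep, function(x) x[[view]], character(1)),
    type = vapply(keep, function(x) x$type, character(1)),
    stringsAsFactors = FALSE
  )
}

raw_v1 <- schema_view(schema, "table1", "v1.0.0", "raw")
tidy_latest <- schema_view(schema, "table1", "latest", "tidy")
```

Keeping both names on the same column entry means a rename or a version-specific column only has to be recorded once, rather than kept in sync across two files.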

Tool refactor

  • Refactor parse/tidy dispatch — subclasses can now override per-table parse_<tname>() and tidy_<tname>() methods; falls back to ftype-based dispatch otherwise. Saves a ton of duplicated code.
  • Add support for txt-nohead, csv-nohead-long, and csv file types. csv auto-dispatches to .parse_file() with delim = ",".
    I am questioning why on earth I used txt instead of tsv. Oh well.
  • list_files(): disambiguate files whose different basenames reduce to the same prefix after being matched by different patterns for the same table (e.g. *.flagstat and *.flag_counts.tsv both stripping to "sample1").
  • Tool$write, Workflow$write, Workflow$nemofy: add pfix_include argument (default FALSE) to control whether input_pfix is prepended to tidy tables.
  • Tool / Workflow: make input_id and output_id optional (default NULL); Workflow print enhanced.
  • nemo_metadata(): typecheck input/output ids.
  • Include raw_path in writer output.
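
The per-table override with ftype fallback described in the first bullet can be sketched roughly as follows. This is an assumption-laden illustration, not the Tool class itself: a plain list stands in for the R6 object, and `parse_table()` and `parse_by_ftype()` are hypothetical names. Only the `parse_<tname>` naming convention and the notion of an ftype come from the description.

```r
# Generic parsers keyed by file type (hypothetical fallback).
parse_by_ftype <- function(path, ftype) {
  switch(
    ftype,
    csv = utils::read.csv(path, stringsAsFactors = FALSE),
    txt = utils::read.delim(path, stringsAsFactors = FALSE),
    stop("Unsupported ftype: ", ftype)
  )
}

# Look for a table-specific parse_<tname> function first; fall back to the
# ftype-based parser when none is defined.
parse_table <- function(tool, tname, path, ftype) {
  custom <- tool[[paste0("parse_", tname)]]
  if (is.function(custom)) {
    return(custom(path)) # table-specific override wins
  }
  parse_by_ftype(path, ftype) # generic fallback
}

# A stand-in tool that overrides parsing for table2 only.
tool <- list(
  parse_table2 = function(path) {
    d <- utils::read.csv(path, stringsAsFactors = FALSE)
    d$source <- "custom"
    d
  }
)

tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), tmp)
generic <- parse_table(tool, "table1", tmp, ftype = "csv")
custom <- parse_table(tool, "table2", tmp, ftype = "csv")
```

Under this pattern a subclass only writes code for the tables that need special handling, which is where the reduction in duplicated code would come from.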

CLI overhaul

  • tidy: rename -i ID to --input_id; add --output_id and --ulid (mutually exclusive) for attaching an output run identifier; --ulid auto-generates a ULID.
  • tidy: add --prefix_include flag to opt in to an input_pfix column in outputs (previously always prepended).
  • Fix -q/--quiet to set NEMO_LOG_ENABLE=FALSE instead of using suppressMessages(); nemo_log() now re-checks the env var at call time so the flag takes effect after package load.
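
The -q/--quiet fix depends on the logger reading the environment variable when it is called rather than when the package loads. A minimal sketch of that behaviour, assuming hypothetical helpers `log_enabled()` and `log_msg()` in place of the real nemo_log():

```r
# Consult NEMO_LOG_ENABLE on every call instead of caching it at load time.
# Both helpers are hypothetical stand-ins for the package's logging code.
log_enabled <- function() {
  val <- Sys.getenv("NEMO_LOG_ENABLE", unset = "TRUE")
  !identical(toupper(val), "FALSE")
}

log_msg <- function(...) {
  if (!log_enabled()) {
    return(invisible(FALSE))
  }
  message(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), " ", ...)
  invisible(TRUE)
}

Sys.setenv(NEMO_LOG_ENABLE = "TRUE")
shown <- log_msg("parsing table1") # emitted
Sys.setenv(NEMO_LOG_ENABLE = "FALSE") # what -q/--quiet is described as setting
hidden <- log_msg("parsing table2") # suppressed
Sys.unsetenv("NEMO_LOG_ENABLE")
```

Because the check happens inside the logging call, a flag parsed after package load can still silence later messages.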

New functions

  • nemo_uml(): refactor UML generation - cleaner entry-point, improved docs (Refactor uml generation #51).
  • nemo_gha_mermaid(): reads local deploy.yaml and fetches reusable workflow YAMLs from tidywf/actions to generate a Mermaid flowchart of the full CI/CD pipeline (GitHub Actions mermaid diagram #52).
  • nemo_schema_reactable() / nemo_schemavis_data(): interactive reactable schema explorer.
  • nemo_schema_check(): validate a schema YAML against expected structure.

Reference implementations (Tool1 / Workflow1)

  • Add table4 (txt-nohead), table5 (csv-nohead-long), and table6 (csv) as examples of new file type support.
  • Workflow1: registered in CLI workflow dispatcher (nemoverse_wf_dispatch) for end-to-end CLI testing.
  • inst/scripts/file_to_yaml.R: refactored to generate schema.yaml skeletons from sample files; output goes to inst/config/tools/_tmp/.

Infrastructure / CI

Vignettes / Docs

  • New: cicd.qmd (CI/CD pipeline vignette), schema_walkthrough.qmd, schema_table.qmd.
  • Reorganised and renamed existing vignettes for clarity.
  • UML vignette updated to use new nemo_uml() entry-point.

Tests

  • Substantially expanded test coverage: Tool, Tool1, Workflow, Workflow1, CLI (list, tidy), parse, schema_check, schema_vis, gha.

pdiakumis and others added 30 commits April 22, 2026 00:12
* linkml: add tool1 schema

* linkml: add schema utils

* linkml: add schema vignette + schema_to_mermaid.R

* linkml: reorder tool1 schema

* add schema_versions for mermaid diagrams

* pkgdown fixes

* gha: refactor deploy workflow for dev and main branches

* pkgdown: use auto development mode

* Bump version: 0.0.3 => 0.0.3.9000

* r-ulid: grab from umccr conda channel

* makefile: add bump rule

* makefile: add bump rule

* Bump version: 0.0.3.9000 => 0.0.3.9001

* rattler-build upload anaconda: use channel, not label

* Bump version: 0.0.3.9001 => 0.0.3.9002

* gha conda: drop umccr prefix to find dev label

* Bump version: 0.0.3.9002 => 0.0.3.9003

* gha: use ssh-key for bot committing to protected branch

* Bump version: 0.0.3.9003 => 0.0.3.9004

* gha conda pkgdown: drop umccr prefix to find dev label

* Bump version: 0.0.3.9004 => 0.0.3.9005

* gha conda pkgdown: specify dev label

* Bump version: 0.0.3.9005 => 0.0.3.9006

* [bot] Updating conda-lock files (v0.0.3.9006)

* precommit: add air formatter

* add CLAUDE.md

* claude: add new nemotool skill

* "Claude PR Assistant workflow"

* "Claude Code Review workflow"

* Change GitHub + Anaconda orgs (#39)

* change gh org

* change anaconda org

* change anaconda org

* GitHub Actions: use GitHub app for branch protection override (#40)

* gha: use gh app for branch protection override

* gha: use app email

* gha: use same wf for dev + main (#41)

* GitHub Actions: use reusable workflows for conda + pkgdown (#42)

* gha: fix permissions (#43)

* Add GHA-based version bumping workflow (#44)

* Bump version: 0.0.3.9006 => 0.0.3.9007

* [bot] Updating conda-lock files (v0.0.3.9007)

* precommit update

* remove LinkML schema system (to be redesigned in separate PR)

* gha: remove auto claude code review workflow

* gha: restrict claude workflow to repo owners/collaborators/members

---------

Co-authored-by: GitHub Actions <actions@github.com>
Co-authored-by: tidywf-ci-bot[bot] <3171681+tidywf-ci-bot[bot]@users.noreply.github.com>
pdiakumis and others added 22 commits May 12, 2026 01:27
- metadata:
  - write to _metadata/metadata.json
  - include input filenames
- include pkg name in init for easier metadata access for children
  - Rename `-i ID` to `--input_id`
  - Add `--output_id`, and `--ulid` (mutually exclusive group);
    - `--ulid` auto-generates a ULID as the output ID
  - Add `--prefix_include` flag to opt in to `input_pfix` column in outputs
    (previously always prepended)
  - Fix `-q`/`--quiet` in both tidy and list subcommands to set
    NEMO_LOG_ENABLE=FALSE instead of using suppressMessages(); requires
    nemo_log() to re-check the env var at call time
  - Update docs
* vignettes: reuse installation doc-templates

* vignettes: cleanup

* bumpversion cleanup

* readme fix

* doc fix
@pdiakumis pdiakumis self-assigned this May 16, 2026
Copilot AI review requested due to automatic review settings May 16, 2026 05:46

Copilot AI left a comment


Pull request overview

This PR performs a broad nemo refactor around unified tool schemas, updated Tool/Workflow dispatch, CLI changes, schema visualization, CI/CD documentation, and refreshed package infrastructure.

Changes:

  • Replaces raw/tidy config split with unified schema.yaml and expands Tool1 examples/file types.
  • Adds/refactors CLI, metadata, schema check/visualization, UML, and GitHub Actions Mermaid helpers.
  • Updates documentation, pkgdown, conda/GitHub Actions deployment, generated Rd files, and tests.

Reviewed changes

Copilot reviewed 114 out of 120 changed files in this pull request and generated 10 comments.

File Description
DESCRIPTION Updates version, URLs, imports/suggests/remotes.
NAMESPACE Exports new CLI/schema/UML/GHA helpers.
R/Config.R Implements unified schema accessors/helpers.
R/Tool.R Refactors parsing/tidying/writing dispatch.
R/Tool1.R Updates Tool1 custom parse/tidy examples.
R/Workflow.R Updates workflow metadata/write behavior.
R/Workflow1.R Updates Workflow1 docs/examples.
R/cli.R Adjusts CLI setup.
R/cli_list.R Adds list CLI helpers/max/quiet handling.
R/cli_tidy.R Adds tidy CLI helpers and new IDs/options.
R/gha.R Adds GitHub Actions Mermaid generation.
R/log.R Makes logging respect env var dynamically.
R/metadata.R Refactors metadata IDs/input dirs.
R/parse.R Expands parser docs/examples for versions/no-header.
R/schema_check.R Adds schema validation helper.
R/schema_vis.R Adds schema reactable data/render helpers.
R/uml.R Adds UML generation helper.
R/utils.R Updates dispatch registry and type remap assertion.
README.qmd Updates README source links/install snippets.
README.md Updates generated README output.
Makefile Adds bump/check targets and install tweak.
LICENSE Updates license year metadata.
LICENSE.md Updates copyright year.
.Rbuildignore Ignores .claude.
.gitignore Updates Quarto/tmp ignore patterns.
.pre-commit-config.yaml Switches formatting hooks to air.
.bumpversion.toml Updates bump config/version targets.
.github/dependabot.yml Adds GitHub Actions Dependabot config.
.github/workflows/bump.yaml Points bump workflow to tidywf/actions.
.github/workflows/claude.yml Restricts Claude trigger author associations.
.github/workflows/deploy.yaml Replaces deploy flow with reusable workflows.
.claude/.gitignore Ignores Claude TODO.
.claude/CLAUDE.md Adds Claude project guidance.
.claude/skills/nemotool/SKILL.md Adds tool-scaffolding skill docs.
data-raw/fake_tool1.R Expands fake Tool1 data generation.
deploy/conda/env/yaml/bump.yaml Adds bump conda env.
deploy/conda/env/yaml/nemo.yaml Updates channel/version.
deploy/conda/env/yaml/pkgdown.yaml Updates pkgdown env dependencies.
deploy/conda/recipe/recipe.yaml Updates conda package metadata/dependencies.
inst/cli/nemo.R Uses imported nemo_cli.
inst/config/tools/tool1/raw.yaml Removes legacy raw schema.
inst/config/tools/tool1/tidy.yaml Removes legacy tidy schema.
inst/config/tools/tool1/schema.yaml Adds unified Tool1 schema.
inst/doc-templates/installation/_conda.qmd Adds parameterized conda install fragment.
inst/doc-templates/installation/_docker.qmd Adds parameterized Docker fragment.
inst/doc-templates/installation/_pixi.qmd Adds parameterized Pixi fragment.
inst/doc-templates/installation/_r.qmd Adds parameterized R install fragment.
inst/doc-templates/notes/cdk.md Adds CDK notes template.
inst/doc-templates/notes/pixi.md Updates Pixi channel docs.
inst/doc-templates/notes/rds.md Adds RDS notes template.
inst/doc-templates/notes/sql.md Adds SQL notes template.
inst/documentation/installation/_conda.qmd Removes old conda fragment.
inst/documentation/installation/_docker.qmd Removes old Docker fragment.
inst/documentation/installation/_installation.qmd Removes old aggregate install fragment.
inst/documentation/installation/_pixi.qmd Removes old Pixi fragment.
inst/documentation/installation/_r.qmd Removes old R fragment.
inst/extdata/tool1/latest/sampleA.tool1.table1.tsv Updates latest table1 fixture.
inst/extdata/tool1/latest/sampleA.tool1.table4.tsv Adds latest no-header fixture.
inst/extdata/tool1/latest/sampleA.tool1.table5.csv Adds latest long CSV fixture.
inst/extdata/tool1/latest/sampleA.tool1.table6.csv Adds latest CSV fixture.
inst/extdata/tool1/v1.0.0/sampleA.tool1.table2.tsv Adds v1 table2 fixture.
inst/extdata/tool1/v1.0.0/sampleA.tool1.table3.tsv Adds v1 key/value fixture.
inst/extdata/tool1/v1.0.0/sampleA.tool1.table4.tsv Adds v1 no-header fixture.
inst/extdata/tool1/v1.0.0/sampleA.tool1.table6.csv Adds v1 CSV fixture.
inst/extdata/tool1/v4.5.6/sampleA.tool1.table1.tsv Adds v4.5.6 table1 fixture.
inst/scripts/file_to_yaml.R Refactors schema skeleton generator.
inst/scripts/uml.R Removes old UML script.
man/Tool1.Rd Regenerates Tool1 docs.
man/Workflow.Rd Regenerates Workflow docs.
man/Workflow1.Rd Regenerates Workflow1 docs.
man/cli_list_add_args.Rd Adds CLI list docs.
man/cli_list_parse_args.Rd Adds CLI list parse docs.
man/cli_nemo_list.Rd Adds CLI list run docs.
man/cli_nemo_tidy.Rd Adds CLI tidy run docs.
man/cli_tidy_add_args.Rd Adds CLI tidy args docs.
man/cli_tidy_parse_args.Rd Adds CLI tidy parse docs.
man/config_prep_multi.Rd Updates config prep example.
man/config_prep_raw.Rd Updates config prep default type.
man/config_sort_versions.Rd Adds version-sort docs.
man/nemo-package.Rd Updates package links.
man/nemo_gha_mermaid.Rd Adds GHA Mermaid docs.
man/nemo_metadata.Rd Updates metadata docs.
man/nemo_schema_check.Rd Adds schema check docs.
man/nemo_schema_reactable.Rd Adds schema reactable docs.
man/nemo_schemavis_data.Rd Adds schema data docs.
man/nemo_uml.Rd Adds UML docs.
man/parse_file.Rd Updates parser examples.
man/parse_file_keyvalue.Rd Updates key/value parser examples.
man/parse_file_nohead.Rd Adds no-header parser examples.
man/reactable_schema.Rd Adds reactable schema docs.
man/schema_guess.Rd Updates schema guess example.
pkgdown/_pkgdown.yml Updates site URL/articles/reference.
pkgdown/extra.scss Adds Mermaid styling.
tests/testthat/test-roxytest-testexamples-Config.R Updates Config roxytest examples.
tests/testthat/test-roxytest-testexamples-Tool.R Adds Tool roxytest examples.
tests/testthat/test-roxytest-testexamples-Tool1.R Expands Tool1 tests.
tests/testthat/test-roxytest-testexamples-Workflow.R Expands Workflow tests.
tests/testthat/test-roxytest-testexamples-Workflow1.R Expands Workflow1 tests.
tests/testthat/test-roxytest-testexamples-cli_list.R Adds CLI list tests.
tests/testthat/test-roxytest-testexamples-cli_tidy.R Adds CLI tidy tests.
tests/testthat/test-roxytest-testexamples-gha.R Adds GHA Mermaid tests.
tests/testthat/test-roxytest-testexamples-parse.R Expands parse tests.
tests/testthat/test-roxytest-testexamples-schema_check.R Adds schema check tests.
tests/testthat/test-roxytest-testexamples-schema_vis.R Adds schema vis tests.
tests/testthat/test-roxytest-testexamples-utils.R Updates generated line reference.
vignettes/.gitignore Ignores rendered HTML.
vignettes/NEWS.qmd Adds dev changelog.
vignettes/cicd.qmd Adds CI/CD Mermaid vignette.
vignettes/contribute.qmd Updates contributor schema guidance.
vignettes/devnotes.qmd Updates repository URL.
vignettes/fig/uml/nemo.uml Removes generated UML source.
vignettes/installation.qmd Parameterizes installation vignette.
vignettes/notes.qmd Points notes to new templates.
vignettes/schema_table.qmd Adds schema table vignette.
vignettes/schema_walkthrough.qmd Adds schema walkthrough vignette.
vignettes/structure.Rmd Updates structure/schema docs.
vignettes/uml.qmd Renders UML via new helper.
Comments suppressed due to low confidence (3)

deploy/conda/recipe/recipe.yaml:62

  • The package code now uses stringr:: in file discovery, but the conda recipe does not include r-stringr in host/run requirements. The conda package can therefore install without the dependency and fail when Tool$list_files() is used.

R/gha.R:148

  • Step names are inserted directly into Mermaid node labels without escaping. A workflow step containing a double quote or other Mermaid label delimiter will generate invalid diagram syntax, and the input comes from YAML files this function reads dynamically.
.gha_render_steps <- function(steps, prefix, indent) {
  ids <- paste0(prefix, seq_along(steps))
  nodes <- paste0(indent, ids, '["', steps, '"]')
  edges <- if (length(ids) > 1) {
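
One possible way to address the escaping concern raised for R/gha.R is to encode problem characters before building the label. The sketch below is not the fix adopted in the PR: `mermaid_escape()` and `render_nodes()` are hypothetical helpers, and it relies on Mermaid's entity-code syntax (`#quot;` for a double quote, `#35;` for a hash).

```r
# Hypothetical helper: escape characters that would terminate or corrupt a
# quoted Mermaid node label, using Mermaid entity codes.
mermaid_escape <- function(x) {
  # Escape '#' first so the entity codes added below are left intact.
  x <- gsub("#", "#35;", x, fixed = TRUE)
  x <- gsub('"', "#quot;", x, fixed = TRUE)
  x
}

# Build node definitions with escaped labels (mirrors the shape of the
# snippet above, but is an illustrative stand-in).
render_nodes <- function(steps, prefix, indent) {
  ids <- paste0(prefix, seq_along(steps))
  paste0(indent, ids, '["', mermaid_escape(steps), '"]')
}

nodes <- render_nodes(c('Run "R CMD check"', "Build #1"), "s", "  ")
```

Escaping at the point where labels are assembled keeps the rest of the diagram generation unchanged and covers step names coming from any fetched workflow YAML.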

R/Config.R:374

  • This helper still generates the old raw-schema shape with a schema key, but Config$read() now only accepts unified schema.yaml files with tables.<table>.columns. Because config_prep_raw() is exported, callers using it to build configs will produce YAML that the refactored Config cannot read.
config_prep_raw <- function(path, name, descr, pat, type = "txt", v = "latest", ...) {
  schema <- config_prep_raw_schema(path = path, ...)
  attr(pat, "quoted") <- TRUE
  list(
    list(
      description = glue("'{descr}'"),
      pattern = pat,
      ftype = glue("'{type}'"),
      schema = list(schema) |> purrr::set_names(v)
    )


Comment thread DESCRIPTION
Comment thread .github/workflows/deploy.yaml
Comment thread R/gha.R
Comment thread README.md
Comment thread R/Config.R Outdated
Comment thread R/Tool.R
Comment thread .claude/skills/nemotool/SKILL.md
Comment thread README.qmd
Comment thread R/cli_tidy.R Outdated
Comment thread R/cli_tidy.R Outdated
