Skip to content

Comments

Updates to event dataclasses#45

Open
jlumpe wants to merge 11 commits intosnakemake:feat-event-dataclassesfrom
jlumpe:event-dataclass-updates
Open

Updates to event dataclasses#45
jlumpe wants to merge 11 commits intosnakemake:feat-event-dataclassesfrom
jlumpe:event-dataclass-updates

Conversation

@jlumpe
Copy link
Contributor

@jlumpe jlumpe commented Oct 18, 2025

This isn't quite finished and I was planning on submitting a pull request directly to main after the base one had been merged, but I'm adding it now for the purposes of discussion.

  • Added LogEventData base class.
  • Added explicit mapping between LogEvent enum values and LogEventData subclasses (LogEventData.event class attribute, and LOG_EVENT_CLASSES global var).
  • Added default implementation of from_record() class method that just inspects the dataclass fields:
    • Allows attributes to be missing from the record if the corresponding field has a default value, raises an exception otherwise.
    • Most subclasses just use the default implementation.
    • This changes the previous behavior in a few places, where a default value was applied in from_record() that was different than the dataclass field default.

Still to be done:

  • Add data_from_record() function which dispatches to proper subclass based on record.event.
    • Should the from_record() class method be made private in this case?
  • Add to_extra() method, which converts to the dict to be passed as the extra argument to a logging function.
    • Tests for round trip of data to/from LogRecord instance.

@coderabbitai
Copy link

coderabbitai bot commented Oct 18, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jlumpe
Copy link
Contributor Author

jlumpe commented Oct 18, 2025

Also updated ShellCmd.from_record() - falls back to record.cmd if record.shellcmd is not defined, this is used inconsistently in the base Snakemake repo as far as I can tell (only instances I found were in jobs.py and shell.py). Removed the renaming of name -> rule_name because those two instances don't set the attribute.

@cademirch
Copy link
Collaborator

Thanks! I've been working on integrating the event data classes upstream, and I'm wondering if it's worth keeping the log event enum? I'm thinking Handlers can just check the instance of the event. But I'm not sure if I'm missing something or changing for the sake of changing.

@jlumpe
Copy link
Contributor Author

jlumpe commented Oct 18, 2025

I think the enum is useful. In my specific case I'm using it to define the associated type when converting to/from JSON, I could see it being used in other similar ways. With the current setup it is also necessary to determine how to extract the data from a LogEvent instance.

@cademirch
Copy link
Collaborator

Looks good @jlumpe. I will try to review this ASAP. I know there are some fields I want to add to JobInfo to better handle retries/attempts (cademirch/snkmt#6)

@jlumpe
Copy link
Contributor Author

jlumpe commented Jan 12, 2026

Sorry, pushed the changes I have so far but they're still not quite ready. Here's a summary of what's included:

Summary of changes

  • Update conversion methods:
    • extra() method converts directly to extra dict passed to logging methods, facilitating easy use of dataclasses directly in logging code instead of relying on untyped dictionaries.
    • Extra dict includes event dataclass itself under 'event_data', making it accessible directly from the LogRecord once logging code is adapted.
    • LogEventData.from_record(), LogEventData.from_extra() dispatch to correct subclass based on event type.
  • Suffix all dataclass names with Event - can remove if desired, but some of them otherwise have very generic names like Error, Progress that aren't very clear when used in some base repo modules that have a ton of class imports.
  • Add some docstrings and more complete type annotations, removed deprecated typing aliases like List/Dict.
  • Add/alter dataclass attributes based on observed usage in main repo:
    • JobInfo (in Job.job_info()):
      • Change benchmark type from list[str] to str
      • Add Boolean attributes: local, is_checkpoint, is_handover (these default to None - should they always be bools?)
    • ShellCmd:
      • Made jobid optional because neither of the two instances of use in the base repo provide it.
    • DebugDag:
      • Change type of exception from Optional[str] to Optional[BaseException]
    • GroupInfo (in GroupJob.log_info()):
      • Change group_id from int to str
    • GroupError (in GroupJob.log_error()):
      • Change groupid from int to str
      • Change job_error_info from dict[str, Any] to list[dict[str, Any]]
  • Other updates to specific dataclasses:
    • RunInfo:
      • Allow use of stats in constructor as well as when creating from extra dictionary, this makes things easier in the code that logs the event.
      • Get values of per_rule_job_count and total_job_count from extra dictionary if present, to support round-trip.
      • Updated total_job_count default to sum of per_rule_job_count values.
    • RuleGraph: Add TypedDict description of rule graph

Potential issues

  • Some of the extra() method implementations do conversion of attribute values. This could possibly cause issues for existing loggers reading those attributes from the LogRecords:
    • ShellCmd: some records created with cmd attribute instead shellcmd, class normalizes to the latter.
    • JobInfo.resources: converted from NamedList to dict
  • JobInfo.priority attribute: typed as integer, use in main repo conditionally sets this to "highest". The default formatter doesn't use this property, possibly this condition should be removed.

Still to do

  • Tests are unfinished: finish adding example instances of all classes, add specific tests of classes that have specialized conversion logic.
  • I think the job/group ID attribute names should be made consistent, currently there is a mix of jobid and job_id. The extra dict conversion methods could be changed to make this backwards compatible.

Other thoughts

Currently the only use of the ERROR event is in snakemake.exceptions.print_exception(), which only provides the exception attribute. Generally, all of these attributes should be available from the LogRecord instance itself (if the call to the logging function includes exc_info, which it doesn't currently but I'm adding a PR for that), so I'm not sure if they're all that useful here. I think something better to include would be the linemaps dict used by the error handling functions to map line numbers back to the original source files. This would allow handlers to properly format exceptions (actually this should probably be included in the other error-related dataclasses as well).

@jlumpe
Copy link
Contributor Author

jlumpe commented Jan 12, 2026

It's not ready for a PR yet, but check out this commit for how the new dataclasses would be used in the main library.

@jlumpe jlumpe force-pushed the event-dataclass-updates branch from 3786656 to d71315c Compare January 16, 2026 06:12
@jlumpe
Copy link
Contributor Author

jlumpe commented Jan 16, 2026

This should be ready to merge.

I've gone ahead and renamed certain attributes for consistency, as was already done with JOB_STARTED. This can be removed if desired. It's done automatically by adding an "alias" item to the dataclass field's metadata, the old name will be retained on the log record for backwards compatibility.

@jlumpe
Copy link
Contributor Author

jlumpe commented Jan 16, 2026

Also made the conversion functions return None if the record has no event rather than raising an exception, realized this makes things easier when integrating into logger code in main package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants