Create _validate_map_items_post_run function#594
Open
dale-wahl wants to merge 1 commit into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Run a validation on
map_itemafter a newDataSetis created inBasicProcessor.Why? The existing
warn_unmappablewith the option to send a warning to admins would fire every time a dataset called iterate items (IF aprocessorobject is passed toiterate_itemswhich may be another issue worth addressing as it is no required and other than this only used to check for interruption). This meant every processor run on a DataSet with unmappable items would create an alert. I recently created a more robust alert in the Search worker but it felt like we were missing any other DataSet that might have a map_item issue (e.g. other datasources with changes to form or potentially processors that usemap_itemsuch as the LLM prompter).This is mostly unblocking (it will hold up the queue for the same processor type, but subsequent processors should run prior).
Error handling and alerting improvements:
_validate_map_item_post_runmethod in BasicProcessorimport_from_filemethod insearch.pyand simplified the per-item logging (that just updated DataSet logs now).Hopefully this streamlines error reporting and reduces redundant admin notifications.
This does have a downside: I left the current
map_itemcheck inSearch.import_from_file. We actually modify each item there so we are already iterating and checkingmap_itemthere as well allows us to update the status and clearly inform the user (which I cannot do in the validation function if I do it post finishing the dataset). So we iterate twice now through those DataSets. There may be a better solution for that so perhaps the review could give it a brief thought. This still seems cleaner and more comprehensive.