Merged
Conversation
Refactor code to improve readability and reference to existing Node attributes.
jayckaiser
approved these changes
Jun 14, 2024
jayckaiser
added a commit
that referenced
this pull request
Jun 14, 2024
* Update CHANGELOG and VERSION. * Hotfix/optional parquet sources (#86) * Update optional file check in FileSource to build an empty dataframe if an empty folder is passed. * Remove explicit file check in compile. * Re-add filesize check in FileSource.execute(). * Move FtpSource connect from compile to execute. * Fix attribute naming bug. * Fix bug. * Allow filepaths to be passed in optional FileSources, and check the existance of the path before loading the dataframe. * Update CHANGELOG. * fix add_columns typo in readme * update changelog * Feature/union all columns (#94) * Add 'fill_missing' optional field to UnionOperation that uses default Pandas concat logic without erroring out. Still raise a debug message when applicable. * Rename new field to 'fill_missing_columns' for clarity. * Update dataframe.py Rename fill_missing_columns to fill_missing. * Update dataframe.py * Update CHANGELOG.md * Update CHANGELOG.md * Rename UnionOperation's fill_missing field to fill_missing_columns; update README. * Git clone timeout when running `earthmover deps` (#93) * try using subprocess with timeout * Update error message * tweak timeouts * switch to makedirs * don't error if dir already exists * remove package path on failure * adjust deletes * typo * switch to rmtree * remove gitpython dependency * remove unused import * remove unused var * add optional git timeout config * reverse accidentally removed kwargs * add notes on git_auth_timeout config to readme * code cleanup * Update README. --------- Co-authored-by: jayckaiser <jayckaiser@gmail.com> * Update changelog. * Fix escape chars in output when `linearize: False` (#98) * fixes a bug where escape characters were present in the output file when linearize is False * remove unneeded Dask import * update return value and comment based on notes from Jay --------- Co-authored-by: Tom Reitz <treitz@edanalytics.org> * fixing a bug introduced in the last version where nested JSON would be loaded as a stringified Python dictionaty, which is difficult to use in downstream Jinja (#97) Co-authored-by: Tom Reitz <treitz@edanalytics.org> * Only write `earthmover_compiled.yaml` on compile, not run (#91) * only write to disk on compile, not run * update readme with change to earthmover_compiled.yaml * Add `earthmover clean` command and some CLI error handling (#87) * add 'clean' command and clean up CLI messaging * comment justifying dictionary * update changlog * remove skip_mkdir, make compiled_yaml_file a class attribute * replace dict with list of constntas --------- Co-authored-by: Jay Kaiser <jayckaiser@gmail.com> * Update CHANGELOG with new features. * Fix `__row_data__` in `add_columns` and `modify_columns` operations (#99) * fix __row_data__ in Jinja expressions of add_columns and modify_columns operations * update how __row_data__ is added to prefent an error about modifying row --------- Co-authored-by: Tom Reitz <treitz@edanalytics.org> * Feature: Refactor Destination Execute (#95) * Update config parsing to use ErrorHandler.assert_get_key() for all fields; move and unify Jinja template processing to execute. * Update destination.py * Update CHANGELOG. * makes destination template optional (#88) * makes destination template optional; when not specified, each row is turned into a JSON object where column names become object properties * implement changes based on feedback from Jay * bugfix * Minor cleanup. --------- Co-authored-by: Tom Reitz <treitz@edanalytics.org> Co-authored-by: jayckaiser <jayckaiser@gmail.com> * Update CHANGELOG. * adding the `debug` operation (#100) * adding debug operation * Update dataframe.py Refactor code to improve readability and reference to existing Node attributes. --------- Co-authored-by: Tom Reitz <treitz@edanalytics.org> Co-authored-by: Jay Kaiser <jayckaiser@gmail.com> * Use Node.full_name in Node.check_expectations(), instead of redefining the string manually. * Update CHANGELOG. * Feature/flatten operation whitespace cleanup (#101) * adding a flatten_operation * README tweak * implement changes based on feedback from Jay * Clean up comments and whitespace in new FlattenOperation. * Add print statements to debug tuple problem. * Minor cleanup. * Minor cleanup. * Add single quotes to strip and trim variables in FlattenOperation. * Fix single quote representation in trim_whitespace. --------- Co-authored-by: Tom Reitz <treitz@edanalytics.org> * Update CHANGELOG. --------- Co-authored-by: johncmerfeld <John.Merfeld@gmail.com> Co-authored-by: Samantha LeBlanc <56237580+sleblanc23@users.noreply.github.com> Co-authored-by: Tom Reitz <tom@tomreitz.com> Co-authored-by: Tom Reitz <treitz@edanalytics.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Per discussion with Jay following PR #89, we decided to split out the debug operation to a separate PR for now - this is it.
We will return to discussion of the
earthmover showcommand and how to best to implement it in the future.