adding the debug operation #100

Merged
jayckaiser merged 2 commits into rc/0.3.2 from feature/debug_operation on Jun 14, 2024

@tomreitz
Collaborator

Per discussion with Jay following PR #89, we decided to split out the debug operation into a separate PR for now. This is it.

We will return to the discussion of the `earthmover show` command, and how best to implement it, in the future.

Refactor code to improve readability and reference to existing Node attributes.
@jayckaiser merged commit 055c588 into rc/0.3.2 on Jun 14, 2024
@jayckaiser deleted the feature/debug_operation branch on June 14, 2024 20:58
jayckaiser added a commit that referenced this pull request Jun 14, 2024
* Update CHANGELOG and VERSION.

* Hotfix/optional parquet sources (#86)

* Update optional file check in FileSource to build an empty dataframe if an empty folder is passed.

* Remove explicit file check in compile.

* Re-add filesize check in FileSource.execute().

* Move FtpSource connect from compile to execute.

* Fix attribute naming bug.

* Fix bug.

* Allow filepaths to be passed in optional FileSources, and check the existence of the path before loading the dataframe.

* Update CHANGELOG.

* fix add_columns typo in readme

* update changelog

* Feature/union all columns (#94)

* Add 'fill_missing' optional field to UnionOperation that uses default Pandas concat logic without erroring out. Still raise a debug message when applicable.

* Rename new field to 'fill_missing_columns' for clarity.

* Update dataframe.py

Rename fill_missing_columns to fill_missing.

* Update dataframe.py

* Update CHANGELOG.md

* Update CHANGELOG.md

* Rename UnionOperation's fill_missing field to fill_missing_columns; update README.
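
The union change above relies on the default behavior of `pandas.concat`, which fills columns missing from one frame with NaN. A minimal sketch of that idea follows. The function `union_frames`, the error text, and the sample data are hypothetical and are not earthmover's actual implementation; only the `fill_missing_columns` name comes from the commit message.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def union_frames(frames, fill_missing_columns=False):
    """Stack DataFrames row-wise (hypothetical stand-in for a union step)."""
    expected = set(frames[0].columns)
    mismatched = any(set(frame.columns) != expected for frame in frames[1:])
    if mismatched:
        if not fill_missing_columns:
            # Without the opt-in flag, a column mismatch is treated as an error.
            raise ValueError("Sources to union have different columns")
        # With the flag set, only a debug message is emitted.
        logger.debug("Column mismatch in union; missing values become NaN")
    # Default concat uses an outer join on columns, so gaps are filled with NaN.
    return pd.concat(frames, ignore_index=True)


left = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
right = pd.DataFrame({"id": [3], "score": [0.5]})
result = union_frames([left, right], fill_missing_columns=True)
```

Here `result` has three rows and the columns `id`, `name`, and `score`, with NaN wherever a source frame lacked the column.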

* Git clone timeout when running `earthmover deps` (#93)

* try using subprocess with timeout

* Update error message

* tweak timeouts

* switch to makedirs

* don't error if dir already exists

* remove package path on failure

* adjust deletes

* typo

* switch to rmtree

* remove gitpython dependency

* remove unused import

* remove unused var

* add optional git timeout config

* reverse accidentally removed kwargs

* add notes on git_auth_timeout config to readme

* code cleanup

* Update README.

---------

Co-authored-by: jayckaiser <jayckaiser@gmail.com>
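
The steps listed for #93 (subprocess with a timeout, `makedirs`, tolerate an existing directory, `rmtree` on failure) can be sketched roughly as below. This is an illustration under assumptions, not the code from the PR: `fetch_package`, its parameters, and the error message are hypothetical, and how the `git_auth_timeout` setting is wired in is not shown in this conversation.

```python
import os
import shutil
import subprocess


def fetch_package(command, package_dir, timeout_seconds=60):
    """Run `command` to populate `package_dir`, removing it on failure.

    Hypothetical helper: `command` would be something like
    ["git", "clone", repo_url, package_dir].
    """
    # exist_ok=True means an already-existing directory is not an error.
    os.makedirs(package_dir, exist_ok=True)
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        subprocess.run(
            command, check=True, timeout=timeout_seconds, capture_output=True
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as err:
        # Remove the partially populated package path before reporting.
        shutil.rmtree(package_dir, ignore_errors=True)
        raise RuntimeError(
            f"Command failed or timed out after {timeout_seconds}s"
        ) from err
    return package_dir
```

Passing the command as a list avoids shell quoting problems, and shelling out to the `git` executable is consistent with the commit that removes the gitpython dependency.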

* Update changelog.

* Fix escape chars in output when `linearize: False` (#98)

* fixes a bug where escape characters were present in the output file when linearize is False

* remove unneeded Dask import

* update return value and comment based on notes from Jay

---------

Co-authored-by: Tom Reitz <treitz@edanalytics.org>

* fixing a bug introduced in the last version where nested JSON would be loaded as a stringified Python dictionary, which is difficult to use in downstream Jinja (#97)

Co-authored-by: Tom Reitz <treitz@edanalytics.org>
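
To illustrate why a stringified Python dictionary is awkward downstream, the sketch below compares `str()` on a nested value with keeping it as a dictionary. The sample record is made up, and this does not reproduce the fix in #97; it only shows the symptom the commit message describes.

```python
import json

# Made-up record with one nested object.
record = {"id": 1, "address": {"city": "Madison", "zip": "53703"}}

# Stringifying the nested value yields a Python repr with single quotes.
stringified = str(record["address"])

# That repr is not valid JSON, so it cannot simply be parsed back.
try:
    json.loads(stringified)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

# Kept as a dictionary, nested fields stay addressable by key,
# which is the kind of access a template expression needs.
city = record["address"]["city"]
```

With the string form, a template sees only one opaque string such as `{'city': 'Madison', 'zip': '53703'}` and cannot reach `city` without extra parsing.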

* Only write `earthmover_compiled.yaml` on compile, not run (#91)

* only write to disk on compile, not run

* update readme with change to earthmover_compiled.yaml

* Add `earthmover clean` command and some CLI error handling (#87)

* add 'clean' command and clean up CLI messaging

* comment justifying dictionary

* update changelog

* remove skip_mkdir, make compiled_yaml_file a class attribute

* replace dict with list of constants

---------

Co-authored-by: Jay Kaiser <jayckaiser@gmail.com>

* Update CHANGELOG with new features.

* Fix `__row_data__` in `add_columns` and `modify_columns` operations (#99)

* fix __row_data__ in Jinja expressions of add_columns and modify_columns operations

* update how __row_data__ is added to prevent an error about modifying row

---------

Co-authored-by: Tom Reitz <treitz@edanalytics.org>

* Feature: Refactor Destination Execute (#95)

* Update config parsing to use ErrorHandler.assert_get_key() for all fields; move and unify Jinja template processing to execute.

* Update destination.py

* Update CHANGELOG.

* makes destination template optional (#88)

* makes destination template optional; when not specified, each row is turned into a JSON object where column names become object properties

* implement changes based on feedback from Jay

* bugfix

* Minor cleanup.

---------

Co-authored-by: Tom Reitz <treitz@edanalytics.org>
Co-authored-by: jayckaiser <jayckaiser@gmail.com>
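
The template-less destination behavior described for #88 (each row becomes a JSON object whose properties are the column names) can be sketched as follows. `rows_to_json_lines` and the sample rows are hypothetical; earthmover's actual destination code is not shown in this conversation.

```python
import json


def rows_to_json_lines(rows):
    """Serialize each row (a dict of column name to value) as one JSON object."""
    return [json.dumps(row) for row in rows]


# Made-up rows; the dict keys play the role of column names.
rows = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": None},
]
lines = rows_to_json_lines(rows)
# lines[0] == '{"id": 1, "name": "Ada"}'
# lines[1] == '{"id": 2, "name": null}'
```

Writing one object per line keeps the output easy to stream, though whether the real destination uses exactly this layout is an assumption here.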

* Update CHANGELOG.

* adding the `debug` operation (#100)

* adding debug operation

* Update dataframe.py

Refactor code to improve readability and reference to existing Node attributes.

---------

Co-authored-by: Tom Reitz <treitz@edanalytics.org>
Co-authored-by: Jay Kaiser <jayckaiser@gmail.com>

* Use Node.full_name in Node.check_expectations(), instead of redefining the string manually.

* Update CHANGELOG.

* Feature/flatten operation whitespace cleanup (#101)

* adding a flatten_operation

* README tweak

* implement changes based on feedback from Jay

* Clean up comments and whitespace in new FlattenOperation.

* Add print statements to debug tuple problem.

* Minor cleanup.

* Minor cleanup.

* Add single quotes to strip and trim variables in FlattenOperation.

* Fix single quote representation in trim_whitespace.

---------

Co-authored-by: Tom Reitz <treitz@edanalytics.org>

* Update CHANGELOG.

---------

Co-authored-by: johncmerfeld <John.Merfeld@gmail.com>
Co-authored-by: Samantha LeBlanc <56237580+sleblanc23@users.noreply.github.com>
Co-authored-by: Tom Reitz <tom@tomreitz.com>
Co-authored-by: Tom Reitz <treitz@edanalytics.org>