feat: load data irrespective of column ordering #89
Open
muhammad-ammar wants to merge 4 commits into master from
muhammad-ammar
commented
Oct 17, 2023
if load_in_order:
    table_column_names = [name for name, __ in table_columns]
    columns_to_load = get_columns_load_order(s3_url, table, table_column_names)
    columns_load_order = '( {} )'.format(', '.join(columns_to_load))
Note to Reviewers: columns_load_order will be added to the LOAD DATA command below, but for now I am planning to merge the changes as they are. I will check the logs to see whether the current changes are working.
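For reviewers, here is a minimal sketch of how the column clause might eventually be placed in the statement. It is not the code in this PR: `build_load_statement`, `format_column_clause`, the `PREFIX` mode, the field terminator, and the single header line are all assumptions made for illustration. The only parts taken from the sources are the clause formatting from the diff above and the position of the `(col_name_or_user_var, ...)` list after `IGNORE ... LINES`, as described in the Aurora documentation.

```python
def format_column_clause(columns_to_load):
    """Format names as the '( a, b, c )' clause built in the diff above."""
    return '( {} )'.format(', '.join(columns_to_load))


def build_load_statement(s3_url, table, columns_to_load, header_lines=1):
    """Hypothetical helper: assemble a LOAD DATA FROM S3 statement.

    Assumes a comma-separated file under an S3 prefix with `header_lines`
    header rows to skip. Inputs are assumed to be trusted configuration
    values, since they are interpolated directly into the SQL text.
    """
    template = (
        "LOAD DATA FROM S3 PREFIX '{s3_url}' "
        "INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' "
        "IGNORE {header_lines} LINES "
        "{columns}"
    )
    return template.format(
        s3_url=s3_url,
        table=table,
        header_lines=header_lines,
        columns=format_column_clause(columns_to_load),
    )
```

With the column list present, the file's fields are assigned to the named columns rather than to the table's columns in schema order.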
JIRA: https://2u-internal.atlassian.net/browse/ENT-7602
Description: Changes in this PR are based on the col_name_or_user_var option. Please see https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html.
The load_s3_data_to_mysql task loads data from CSV files into MySQL tables. Currently, for data to transfer correctly, the order of columns in the CSV file must match the order of the table columns in the toml file in prefect-flows. If a new field is added at the start or in the middle of the existing CSV columns but is not present in the table schema in the toml file in prefect-flows, the data will be transferred incorrectly.
Changes in this PR will
Release Plan:
For now the new implementation is disabled and we are only logging details about the new changes.
We will release the whole work across multiple releases.
First release: Check the logs and see whether things are working as expected
Second release:
Make changes in warehouse-transforms to include column names in all the CSV files
Make changes in prefect-flows to handle column names in CSV files
Third release: Enable the new feature to transfer data from CSV files into MySQL without considering the order of columns
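To illustrate the ordering problem from the Description, the sketch below shows one way a load order could be derived from a CSV header line. It is an assumption-laden example, not the PR's `get_columns_load_order`, which takes an S3 URL and a table name. The helper name `resolve_load_order` and the `@unused_N` variable naming are hypothetical. The idea rests on the col_name_or_user_var option: a CSV field with no matching table column can be assigned to a user variable and so discarded.

```python
import csv
import io


def resolve_load_order(header_line, table_column_names):
    """Hypothetical helper: map each CSV header field to a load target.

    Fields that match a table column keep their name. Fields unknown to
    the table schema are routed to a throwaway user variable so they do
    not shift the remaining columns.
    """
    header = next(csv.reader(io.StringIO(header_line)))
    known = set(table_column_names)
    order = []
    for position, name in enumerate(header):
        name = name.strip()
        if name in known:
            order.append(name)
        else:
            # Assumed naming scheme for discarded fields.
            order.append('@unused_{}'.format(position))
    return order


# A new field appears in the middle of the CSV but not in the table schema.
print(resolve_load_order('id,new_field,name', ['name', 'id']))
```

Loading by position would write `new_field` values into the table's second column, whereas the derived order keeps `id` and `name` aligned with their columns.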