fix(bigframes): avoid exceptions for unnamed JSON columns in SQL Cell outputs #17367
tswast wants to merge 1 commit into
Code Review
This pull request introduces support for iloc column-based assignment (__setitem__) in DataFrames and refactors iloc indexing to dynamically enforce ordering only when row subsets or specific orderings are requested. It also updates JSON column serialization in _get_display_df to use the new iloc assignment, bypassing limitations with duplicate or non-string column names. Feedback on these changes includes a recommendation to use _assign_multi_items instead of assign in __setitem__ to prevent potential TypeErrors with non-string column labels, removing a useless expression (df._block.apply_analytic) in _get_display_df, and reverting a hardcoded development project ID in the generative AI notebook to a placeholder.
    col_label = self._dataframe.columns[col_offset]
    df = self._dataframe.assign(**{col_label: value})
    self._dataframe._set_block(df._get_block())
Using df.assign(**{col_label: value}) will raise a TypeError: assign() keywords must be strings if col_label is not a string (e.g., an integer or None). Since _assign_multi_items is already used in the other branches of this method and handles non-string/duplicate column names safely, we should use it here as well.
Suggested change:
    - col_label = self._dataframe.columns[col_offset]
    - df = self._dataframe.assign(**{col_label: value})
    - self._dataframe._set_block(df._get_block())
    + col_label = self._dataframe.columns[col_offset]
    + df = self._dataframe._assign_multi_items([col_label], value)
    + self._dataframe._set_block(df._get_block())
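The failure mode behind this suggestion is a general Python rule rather than anything specific to bigframes: `**` unpacking requires every key to be a string, so the call fails before the function body runs. A minimal sketch, assuming two hypothetical stand-ins (`assign` and `assign_items` are illustrative names, not the bigframes API):

```python
def assign(**kwargs):
    """Hypothetical stand-in for a keyword-based assign(); not the bigframes API."""
    return dict(kwargs)


def assign_items(labels, values):
    """Hypothetical positional alternative: labels are passed as ordinary values,
    so None, integers, and duplicates are all acceptable."""
    return list(zip(labels, values))


def try_assign(label, value):
    """Return the TypeError message if keyword unpacking fails, else None."""
    try:
        assign(**{label: value})
    except TypeError as exc:
        return str(exc)
    return None


# A string label works; an integer or None label fails at the call site.
string_label_error = try_assign("payload", [1, 2])
int_label_error = try_assign(0, [1, 2])
none_label_error = try_assign(None, [1, 2])

# Passing labels positionally avoids the restriction entirely.
pairs = assign_items([None, 0, 0], [[1, 2], [3, 4], [5, 6]])
```

Because the label never becomes a keyword in the positional form, unnamed (`None`) and duplicate labels pass through unchanged, which is the property the review relies on when recommending `_assign_multi_items`.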
    df._block.apply_analytic
    df.iloc[:, json_col_indexes] = cast(
      "import bigframes.pandas as bpd\n",
      "\n",
    - "PROJECT_ID = \"\" # @param {type:\"string\"}\n",
    + "PROJECT_ID = \"bigframes-dev\" # @param {type:\"string\"}\n",
It is generally better to keep PROJECT_ID as an empty string "" or a placeholder like "your-project-id" in public notebooks so that users are prompted to enter their own Google Cloud project ID, and to avoid hardcoding internal/development project names.
Suggested change:
    - "PROJECT_ID = \"bigframes-dev\" # @param {type:\"string\"}\n",
    + "PROJECT_ID = \"\" # @param {type:\"string\"}\n",
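If the notebook keeps an empty placeholder, a small guard can make the missing value obvious instead of failing later with a less direct error. This is only a sketch: `resolve_project_id` is a hypothetical helper, and falling back to the `GOOGLE_CLOUD_PROJECT` environment variable is an assumption about the user's setup, not something this PR does.

```python
import os


def resolve_project_id(explicit, environ=None):
    """Hypothetical helper: prefer the notebook's PROJECT_ID, then the environment.

    Raises ValueError when neither source provides a project ID.
    """
    environ = os.environ if environ is None else environ
    project_id = explicit.strip() or environ.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not project_id:
        raise ValueError("Set PROJECT_ID to your own Google Cloud project ID.")
    return project_id


def fails_when_unset():
    """Return True if an empty placeholder with no fallback is rejected."""
    try:
        resolve_project_id("", {})
    except ValueError:
        return True
    return False
```

With this in place, the notebook cell would call `resolve_project_id(PROJECT_ID)` once and use the result, so no internal project name needs to be committed.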