[ENH] Auto-convert categorical columns to string in attributes_arff_from_df #1490
Open
alphaleporus wants to merge 2 commits into openml:main
Metadata
attributes_arff_from_df
Details
What does this PR implement/fix?
This PR modifies `attributes_arff_from_df` to improve robustness when handling pandas DataFrames. Instead of immediately raising a `ValueError` when it encounters a categorical column with non-string values (e.g., integer-encoded categories), the function now attempts to convert the categories to strings automatically.

Why is this change necessary?
Currently, the function raises a `ValueError` if a user provides a DataFrame with valid data but integer-based categories (e.g., `[0, 1]`). This forces users to cast categories to strings manually before calling the function. This change improves the user experience by handling the conversion under the hood.

How can I reproduce the issue?
Create a DataFrame with integer categories and pass it to `attributes_arff_from_df`.
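
For reviewers, a minimal sketch of such a DataFrame and of the manual cast users currently have to apply may be helpful. It uses only pandas; `stringify_categories` is a hypothetical helper written for illustration and is not the code in this PR, and the import path mentioned in the comment is an assumption about where the function lives.

```python
import pandas as pd

# A DataFrame whose categorical column has integer categories, as described above.
df = pd.DataFrame({"label": pd.Categorical([0, 1, 0, 1])})


def stringify_categories(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy whose categorical columns use string categories."""
    out = frame.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.CategoricalDtype):
            # Rename each category to its string form; the order of categories is kept.
            out[name] = col.cat.rename_categories([str(c) for c in col.cat.categories])
    return out


converted = stringify_categories(df)
print(list(df["label"].cat.categories))
print(list(converted["label"].cat.categories))

# Assumption: the function is importable from openml.datasets.functions.
# Passing `df` is expected to raise ValueError before this PR, while
# `converted` should be accepted:
#   attributes_arff_from_df(converted)
```

With this PR, the manual cast would no longer be needed, since the function is described as attempting the same conversion itself.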