Fix handling of string columns in ModelMatrix #1085

@nalimilan

With the latest master, string columns trigger an error when used in a formula to build a ModelMatrix. We should probably treat them as categorical variables, either by converting them to a CategoricalArray or (even better) by building contrasts for them on the fly, without a copy.

Though I wonder what to do with other kinds of non-numeric columns. Raise an error? Treat them as categorical by default?
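
To make the "contrasts on the fly" idea concrete, here is a minimal sketch in plain Julia (no DataFrames internals). It assumes dummy (treatment) coding with the first sorted level as the reference. The name `dummy_columns` is hypothetical and is not an existing DataFrames function, and the method signature uses the `<:` shorthand from newer Julia versions.

```julia
# Hypothetical helper, NOT part of DataFrames: build dummy-coded columns
# directly from a vector of strings, without first converting it to a
# CategoricalArray.
function dummy_columns(col::AbstractVector{<:AbstractString})
    levels = sort(unique(col))      # the first level acts as the reference
    X = zeros(Float64, length(col), length(levels) - 1)
    for (j, lev) in enumerate(levels[2:end])
        for (i, value) in enumerate(col)
            # 1.0 where the row matches this non-reference level
            X[i, j] = value == lev ? 1.0 : 0.0
        end
    end
    return X, levels
end

B = ["M", "F", "F", "M"]
X, levels = dummy_columns(B)
# levels == ["F", "M"]; X is a 4×1 matrix with 1.0 in the rows where B == "M"
```

Only the sorted unique levels and the output matrix are allocated here; the string column itself is never copied or converted. A real fix would also need to handle nulls and user-specified contrasts, which this sketch ignores.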

@kleinschmidt Comments?

Reproducer:

julia> using DataFrames

julia> df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
4×2 DataFrames.DataFrame
│ Row │ A │ B   │
├─────┼───┼─────┤
│ 1   │ 1 │ "M" │
│ 2   │ 2 │ "F" │
│ 3   │ 3 │ "F" │
│ 4   │ 4 │ "M" │

julia> ModelMatrix(ModelFrame(A~B, df))
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
 in copy!(::Base.LinearFast, ::Array{Float64,2}, ::Base.LinearFast, ::Array{String,2}) at ./abstractarray.jl:575
 in modelmat_cols(::Type{Array{Float64,2}}, ::NullableArrays.NullableArray{String,1}) at /home/milan/.julia/DataFrames/src/statsmodels/formula.jl:349
 in #modelmat_cols#122(::Bool, ::Function, ::Type{Array{Float64,2}}, ::Symbol, ::DataFrames.ModelFrame) at /home/milan/.julia/DataFrames/src/statsmodels/formula.jl:342
 in (::DataFrames.#kw##modelmat_cols)(::Array{Any,1}, ::DataFrames.#modelmat_cols, ::Type{Array{Float64,2}}, ::Symbol, ::DataFrames.ModelFrame) at ./<missing>:0
 in DataFrames.ModelMatrix{Array{Float64,2}}(::DataFrames.ModelFrame) at /home/milan/.julia/DataFrames/src/statsmodels/formula.jl:478
 in DataFrames.ModelMatrix{T<:AbstractArray{T<:AbstractFloat,2}}(::DataFrames.ModelFrame) at /home/milan/.julia/DataFrames/src/statsmodels/formula.jl:501
