Get tests passing again, fix handling of string columns #13
Codecov Report
@@ Coverage Diff @@
##           master     #13     +/-  ##
==========================================
- Coverage   93.82%   93.11%   -0.72%
==========================================
  Files           5        5
  Lines         324      334     +10
==========================================
+ Hits          304      311      +7
- Misses         20       23      +3
src/modelframe.jl (outdated)
const DEFAULT_CONTRASTS = DummyCoding
_levels(x::Union{CategoricalArray, NullableCategoricalArray}) = levels(x)
Note to self: need to change this to unique, cf. JuliaData/DataFrames.jl#1147 (comment).
Actually that was quite easy. What is going to be harder is to handle DataArrays the same way, since unique(::PooledDataArray) does not return values in the order specified by levels. Since unique does not guarantee any particular order for levels, I suggest we change DataArrays to follow the ordering of levels (just like CategoricalArray). That way, we don't need any common API other than unique.
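To make the ordering issue concrete, here is a minimal sketch in plain Julia. `ToyPooled`, `values_of`, `unique_by_appearance` and `unique_by_levels` are hypothetical names made up for illustration; they are not the DataArrays or CategoricalArrays API. The sketch only contrasts "order of first appearance" with "order of the levels", and it assumes that levels not present in the data are dropped.

```julia
# Toy pooled vector (hypothetical; not DataArrays' PooledDataArray).
struct ToyPooled{T}
    refs::Vector{Int}   # index into `pool` for each element
    pool::Vector{T}     # levels, in their declared order
end

# Materialize the values the pooled vector represents.
values_of(p::ToyPooled) = p.pool[p.refs]

# Order of first appearance, as `unique` gives on a plain vector.
unique_by_appearance(p::ToyPooled) = unique(values_of(p))

# Order of the levels, keeping only levels that actually occur (an assumption).
unique_by_levels(p::ToyPooled) = p.pool[sort(unique(p.refs))]

p = ToyPooled([2, 1, 2, 3], ["a", "b", "c"])
unique_by_appearance(p)  # ["b", "a", "c"]
unique_by_levels(p)      # ["a", "b", "c"]
```

If `unique` followed the second behaviour for every categorical container, downstream code could rely on `unique` alone without a shared `levels` API, which is the point made in the comment above.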
README.md (outdated)
**Documentation:** [](https://juliastats.github.io/StatsModels.jl/latest)
- Originally part of [DataFrames.jl](http://github.com/JuliaStats/DataFrames.jl).
+ Originally part of [DataTables.jl](http://github.com/JuliaStats/DataTables.jl).
Technically both are true, but indeed that's not the most informative choice...
It might be worth expanding on this a little in the README, something along the lines of "this was once part of DataFrames, now it depends on DataTables, but it will eventually support generic tabular data stores, including both DataFrames and DataTables".
(Just to avoid confusion when people come here looking for help/information)
Yes. I would wait until things have stabilized a bit more, and then explain how the package is supposed to be used. Anyway we need generic interfaces very soon to support DataFrames.
5a3550d to 0c2536d
Julia 0.6 failure is fixed by JuliaLang/julia#20665.
All non-real columns are now considered as categorical, since conversion
to a float column will likely fail. Types which should be converted to float
will have to define is_categorical(::T) = false. Before this, Vector{String}
and NullableVector{String} columns triggered an error.
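As a rough illustration of the opt-out described above, the following sketch uses plain Julia dispatch. The method signatures (dispatching on the column's vector type) and the `Meters` type are assumptions made for this example; the commit message only states that such types define `is_categorical(::T) = false`.

```julia
# Hypothetical stand-in for the rule described above; not the package's source.
is_categorical(::AbstractVector{<:Real}) = false  # real-valued columns stay numeric
is_categorical(::AbstractVector) = true           # everything else is categorical

# A made-up number-like type that is not a subtype of Real
# but should still be converted to float.
struct Meters
    value::Float64
end
Base.float(m::Meters) = m.value

# Opt out of categorical handling for columns of that type.
is_categorical(::AbstractVector{Meters}) = false

is_categorical(["a", "b"])                  # true: strings are categorical
is_categorical([1.0, 2.0])                  # false: real column
is_categorical([Meters(1.0), Meters(2.5)])  # false: explicit opt-out
```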
Replace ContrastsMatrix(c::ContrastsMatrix, x::CategoricalArray) with
ContrastsMatrix(c::ContrastsMatrix, levels::AbstractVector), offering
equivalent functionality: it's more consistent with the ContrastsMatrix()
constructor, and more general since only the levels are actually needed.
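The claim that only the levels are needed can be sketched as follows. `dummy_coding` is a hypothetical helper written for this example, not the `ContrastsMatrix` constructor itself; it builds a dummy-coding (treatment) matrix from a plain vector of levels, assuming the first level is the base unless another one is given.

```julia
# Build a dummy-coding matrix from levels alone (hypothetical helper).
# Rows correspond to levels, columns to the non-base levels.
function dummy_coding(levels::AbstractVector; base = first(levels))
    base_idx = findfirst(isequal(base), levels)
    base_idx === nothing && throw(ArgumentError("base level not found in levels"))
    kept = [i for i in eachindex(levels) if i != base_idx]
    mat = zeros(length(levels), length(kept))
    for (col, row) in enumerate(kept)
        mat[row, col] = 1.0   # indicator for the level in this column
    end
    return mat, levels[kept]
end

mat, names = dummy_coding(["a", "b", "c"])
# mat is 3×2: row "a" is all zeros, rows "b" and "c" each have a single 1.0
# names == ["b", "c"]
```

Nothing here touches the data column, so the same call works whether the levels came from a `CategoricalArray`, a pooled array, or `unique` on a plain vector.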
The code is currently written to work with Nullable.
0c2536d to 739d478
I've squashed some commits and removed the hacks now that DataTables no longer exports conflicting function definitions. So I'd say it's good to go.
Move to DataTables temporarily since the code is written to work with Nullable for now. This allows porting JuliaData/DataFrames.jl@781bbbd to StatsModels (first commit), which is a first step towards supporting any kind of categorical input (i.e. both `DataArray` and `CategoricalArray`).
If this works, we could remove the conflicting API from DataTables so that we no longer need the `using` hacks. Then we could start relying only on the AbstractTable interface.