Say I want to get things like the model matrix and the coefficient names from a formula. As far as I'm aware, the shortest way to do that currently is:
using DataFrames
using StatsModels
df = DataFrame(x = rand(100), y = rand(100))
f = @formula(y ~ 1 + x)
f = apply_schema(f, schema(f, df))
modelmatrix(f, df)
coefnames(f)
I was wondering whether it would be possible to add a method that recovers the formula from the result of schema(f, df). That would let apply_schema() dispatch directly on a schema that already carries all the necessary information, so that one could write something like:
f = apply_schema(schema(@formula(y ~ 1 + x), df))
# Or
f = schema(@formula(y ~ 1 + x), df) |>
apply_schema
Alternatively, this would allow other methods to call apply_schema() implicitly on a schema. I am not familiar with the code base, so I'm not sure this makes sense, but in essence I am asking for ways to avoid passing the same objects repeatedly.
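To make the request concrete, here is a minimal user-side sketch of the kind of shortcut I have in mind. The name concrete_formula is hypothetical and not part of StatsModels; it only wraps the existing schema() and apply_schema() calls so that the formula and the data are each passed once. It assumes a StatsModels version where modelcols() and coefnames() accept the right-hand-side term of an applied formula.

```julia
using DataFrames
using StatsModels

# Hypothetical convenience wrapper (NOT part of StatsModels): builds the schema
# from the data and applies it to the formula in a single call.
concrete_formula(f::FormulaTerm, data) = apply_schema(f, schema(f, data))

df = DataFrame(x = rand(100), y = rand(100))
f = concrete_formula(@formula(y ~ 1 + x), df)

# Model matrix for the right-hand side: an intercept column and the x column.
X = modelcols(f.rhs, df)

# Coefficient names for the right-hand side terms.
cn = coefnames(f.rhs)
```

This does not change how the schema itself works; it just hides the repeated f and df arguments behind one function, which is roughly the ergonomics I am asking about.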