Refactor Sem multi-group support#317
Refactor Sem multi-group support#317alyst wants to merge 18 commits intoStructuralEquationModels:develfrom
Conversation
3c39941 to
32cea82
Compare
- for SemImplied require spec::SemSpec as positional - for SemLossFunction require implied argument
eb039a2 to
88a1ff0
Compare
88a1ff0 to
0406f29
Compare
| In this case, [`FiniteDiffWrapper`](@ref) method to generate a wrapper around the specific `SemLoss` term that only uses its objective | ||
| to calculate the gradient using the finite difference approximation. |
There was a problem hiding this comment.
| In this case, [`FiniteDiffWrapper`](@ref) method to generate a wrapper around the specific `SemLoss` term that only uses its objective | |
| to calculate the gradient using the finite difference approximation. | |
| In this case, [`FiniteDiffWrapper`](@ref) can be used to generate a wrapper around the specific `SemLoss` term. This wrapper only uses the `LossTerm`s objective, and calculates the gradient using finite difference approximation. |
|
@Maximilian-Stefan-Ernst It might be a nice idea to use copilot for catching typos, incorrect sentences, but also potential bugs. |
src/frontend/fit/fitmeasures/chi2.jl
Outdated
| function χ²(fit::SemFit, model::AbstractSemSingle) | ||
| check_single_lossfun(model; throw_error = true) | ||
| return χ²(model.loss.functions[1], fit::SemFit, model::AbstractSemSingle) | ||
| return χ²(typeof(term1), fit, model) |
There was a problem hiding this comment.
Is there a reason to pass typeof(term1) instead of term1? I personally find the syntax a bit cleaner without the extra typeof call.
There was a problem hiding this comment.
This call calculates chi2 for the entire model (which could be multi-group), I wanted to make it clear that no information from the term1 is used.
src/frontend/fit/fitmeasures/chi2.jl
Outdated
| ############################################################################################ | ||
| function χ²(fit::SemFit, model::AbstractSem) | ||
| terms = sem_terms(model) | ||
| isempty(terms) && return 0.0 |
There was a problem hiding this comment.
Maybe we should throw an error for a Sem with no terms?
There was a problem hiding this comment.
I think the Sem constructor should throw an exception if there are no SEM terms.
Returning 0 here seems legit if there is no data, but we can also change it into @assert.
Throwing an exception in situations like this will just add a lot of redundant code.
| # FIXME remove this implicit logic | ||
| # SemWLS only accepts vech-ed implied covariance | ||
| if isa(loss, Type) && (loss <: SemWLS) && !haskey(kwargs, :vech) | ||
| implied_kwargs = copy(kwargs) | ||
| implied_kwargs[:vech] = true | ||
| else | ||
| implied_kwargs = kwargs | ||
| end |
There was a problem hiding this comment.
I think before this was handled inside RAMSymbolic - is there a reason to move it here?
There was a problem hiding this comment.
My original idea was to move away from passing the kwargs... around.
It causes various issues, adds maintenance overhead, and interferes with the modularity/extensibility goal.
Instead, the constructors should support their specific keywords only.
This PR still retains kwargs... passthrough for compatibility, but moves all "parameter compatibility" logic to the Sem construction.
With the old logic the user only specifies the implied and loss types, and SEM.jl figures how to tweak their parameters to make them compatible.
RAMSymbolic/SemWLS is one such example, where RAMSymbolic decides how to construct itself knowing how it will be used.
In this PR, the implied constructors are agnostic of the loss functions they will be used in.
The loss constructor will throw an exception if the implied object is not compatible.
In principle the logic here could be removed -- the user should be able to easily fix the SEM construction based on the error message thrown by SemWLS.
deduplicate the correction scale methods and move to Sem.jl
remove update_observed!()
to suppress info about inv(obs_cov)
0406f29 to
0096211
Compare
0096211 to
05abcd9
Compare
This is a largest remaining part of #193, which changes some interfaces.
Refactoring of the SEM types
AbstractLossis the base type for all functionsSemLoss{O,I} <: AbstractLossis the base type for all SEM losses, it now requires to have observed::O and implied::I fieldSemLossctor should always be given observed and implied (positional),meanstructurekeyword is gone -- loss should always respect implied specification.LossTermis a thin wrapper aroundAbstractLossthat adds optional id of the loss term and optional weightSemis a container ofLossTermobjects (accessible vialoss_terms(sem), orloss_term(sem, id)), so it can handle multiple SEM terms (accessible viasem_terms(sem)-- subset ofloss_terms(sem), orsem_term(sem, id)).It replaces both the old
SemandSemEnsemble.AbstractSingleSem,AbstractSemCollectionandSemEnsembleare gone.Method changes
Multi-term SEMs could be created like
Or with weights specification
The new Sem() and loss-term constructors rely less on keyword arguments and more on positional arguments, but some keywords support is present.
update_observed!()was removed. It was only used byreplace_observed(),but otherwise in-place model modification with unclear semantics is error-prone.
replace_observed(sem, data)was simplified by removing support of additional keywords or requirement to pass SEM specification.It only creates a copy of the given
Semwith the observed data replaced,but implied and loss definitions intact.
Changing observed vars is not supported -- that is something use-case specific
that user should implement in their code.
check_single_lossfun()was renamed intocheck_same_semterm_type()asit better describes what it does. If check is successful, it returns the specific
subtype of
SemLoss.bootstrap()andse_bootstrap()usebootstrap!(acc::BootstrapAccumulator, ...)function to reduce code duplication
bootstrap()returnsBootstrapResult{T}for better type inferencefit_measures()now also accepts vector of functions, and includesCFIby default (DEFAULT_FIT_MEASURESconstant)test_fitmeasures()was tweaked to handle more repetitive code: calculating the subset of fit measures, and compairing this subset against lavaan refs, checking for measures that could not be applied to given loss types (SemWLS).