Background
The avant-garde model core (recurrent heterogeneous graph, LTI, ACT, Mamba-2 SSD, Titans) is stable and working. The HF wrapper layer (hf_model.py, trainer.py) has accumulated friction points that break standard HF workflows (Trainer, pipeline, AutoModelForCausalLM.from_pretrained, model.generate()). A previous patch fixed the redundant lm_head dead-weight bug; this issue completes the V5 integration.
Problems Identified
| # | Problem | Location | Impact |
|---|---|---|---|
| 1 | Weight tying bypasses HF standard mechanism | hf_model.py HelixPreTrainedModel | _tied_weights_keys = {} and get_expanded_tied_weights_keys override kill model.tie_weights() and from_pretrained weight restoration. |
| 2 | Custom generate_ext() instead of standard generate() | hf_model.py HelixForCausalLM | Cannot use pipeline("text-generation", ...) or standard StoppingCriteria; agent must re-implement stop strings, temperature, top-p manually. |
| 3 | Custom Trainer reinvents transformers.Trainer | trainer.py | Duplicates AMP, gradient accumulation, scheduler, logging. Loses DeepSpeed/FSDP support. |
| 4 | Dataset overlap masking at risk from standard collators | dataset.py / trainer.py | DataCollatorForLanguageModeling regenerates labels and destroys custom -100 overlap masks. |
| 5 | Tokenizer wrapper adds friction | tokenizer.py | HelixTokenizer is not a first-class PreTrainedTokenizer; passing it to HF Trainer requires ._backend indirection. |
| 6 | Auto-registration wrapped in silent try/except | hf_model.py bottom | Registration failures are swallowed. |
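To make problem 4 concrete, here is a minimal sketch of a collator that pads a batch but never regenerates labels from input_ids. It is not the project's actual collator: the function name collate_preserving_labels, the default pad_token_id of 0, and the use of plain Python lists instead of tensors are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def collate_preserving_labels(
    features: Sequence[Dict[str, List[int]]],
    pad_token_id: int = 0,  # assumption: the real pad id comes from the tokenizer
) -> Dict[str, List[List[int]]]:
    """Right-pad a batch without regenerating labels from input_ids."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for f in features:
        ids, labels = list(f["input_ids"]), list(f["labels"])
        if len(ids) != len(labels):
            raise ValueError("input_ids and labels must have the same length")
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        # Existing -100 entries (overlap heads) pass through untouched;
        # only the newly added padding positions are masked.
        batch["labels"].append(labels + [IGNORE_INDEX] * pad)
    return batch
```

A real collator would convert these lists to tensors before returning them. If the dataset already pads every sample to the same length, the padding step here is a no-op and the labels are returned exactly as the dataset produced them.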
Architectural Constraints (Non-Negotiable)
The following are sacred and must not be modified:
helix_lm/graph.py — heterogeneous graph wiring
helix_lm/recurrent.py — LTI injection, ACT halting, recurrence
helix_lm/nodes.py — all node implementations (attention variants, SwiGLU, SSM, Titans, gates)
helix_lm/mamba2.py — Mamba-2 SSD parallel scan
helix_lm/rope.py — rotary embeddings
The following behaviors must be preserved exactly:
- Document-aware chunking with overlap masking: DocumentAwareDataset must continue to return labels with -100 on overlap heads and padding tails. Standard DataCollatorForLanguageModeling must not be used; it overwrites these labels.
- No KV-cache / no past_key_values state: The recurrent graph re-initializes node_states = {} on every forward call. Generation must pass the full sequence (or last seq_len window) on every step, never just input_ids[:, -1:]. config.use_cache must remain False.
- Parameter count stability: After all fixes, HelixForCausalLM(HelixConfig.tiny(vocab_size=50257)) must report exactly 13,347,974 parameters.
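The no-KV-cache constraint can be sketched as a decoding loop that re-feeds the last seq_len tokens on every step. This is an illustration only, not the wrapper's generation code: greedy_decode_no_cache and logits_fn are hypothetical names, and logits_fn stands in for a full forward pass that returns the next-token logits for the final position of the window.

```python
from typing import Callable, List, Optional, Sequence


def greedy_decode_no_cache(
    logits_fn: Callable[[List[int]], Sequence[float]],  # hypothetical forward pass
    prompt_ids: List[int],
    max_new_tokens: int,
    seq_len: int,
    eos_token_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding that recomputes from the full window on every step."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # No state survives between forward calls, so the model must see
        # the whole context window again, never just the newest token.
        window = ids[-seq_len:]
        logits = logits_fn(window)
        # Greedy choice: index of the largest logit.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if eos_token_id is not None and next_id == eos_token_id:
            break
    return ids
```

The cost is one full forward pass over up to seq_len tokens per generated token, which is the price of keeping the recurrent graph stateless across calls.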
Proposed Fix Stages
See attached agent prompt package for executable stage-by-stage tasks.