HuggingFace Integration Friction: Weight Tying, Generation, Trainer, and Dataset Collator Alignment #17


## Background

The avant-garde model core (recurrent heterogeneous graph, LTI, ACT, Mamba-2 SSD, Titans) is stable and working. The HF wrapper layer (`hf_model.py`, `trainer.py`) has accumulated friction points that break standard HF workflows (`Trainer`, `pipeline`, `AutoModelForCausalLM.from_pretrained`, `model.generate()`). A previous patch fixed the redundant `lm_head` dead-weight bug; this issue completes the V5 integration.

## Problems Identified

| # | Problem | Location | Impact |
|---|---------|----------|--------|
| 1 | **Weight tying bypasses HF standard mechanism** | `hf_model.py` `HelixPreTrainedModel` | `_tied_weights_keys = {}` and the `get_expanded_tied_weights_keys` override disable `model.tie_weights()` and break tied-weight restoration in `from_pretrained`. |
| 2 | **Custom `generate_ext()` instead of standard `generate()`** | `hf_model.py` `HelixForCausalLM` | Cannot use `pipeline("text-generation", ...)` or standard `StoppingCriteria`; the agent must re-implement stop strings, temperature, and top-p sampling by hand. |
| 3 | **Custom `Trainer` reinvents `transformers.Trainer`** | `trainer.py` | Duplicates AMP, gradient accumulation, LR scheduling, and logging; loses DeepSpeed/FSDP support. |
| 4 | **Dataset overlap masking at risk from standard collators** | `dataset.py` / `trainer.py` | `DataCollatorForLanguageModeling` regenerates labels and destroys custom `-100` overlap masks. |
| 5 | **Tokenizer wrapper adds friction** | `tokenizer.py` | `HelixTokenizer` is not a first-class `PreTrainedTokenizer`; passing it to HF Trainer requires `._backend` indirection. |
| 6 | **Auto-registration wrapped in silent `try/except`** | `hf_model.py` bottom | Registration failures are swallowed. |
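Problem 4 comes down to one rule: the collator may pad, but it must never derive `labels` from `input_ids`. A minimal, framework-agnostic sketch of that rule follows. It assumes each dataset item is a dict of plain Python int lists with `input_ids` and `labels` (plus an optional `attention_mask`); the function name `overlap_preserving_collate` is hypothetical, and converting the lists to tensors (e.g. with `torch.tensor`) is left to the caller.

```python
from typing import Dict, List, Sequence

IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default


def overlap_preserving_collate(
    features: Sequence[Dict[str, List[int]]],
    pad_token_id: int,
) -> Dict[str, List[List[int]]]:
    """Right-pad a batch without rebuilding the labels the dataset produced.

    Unlike DataCollatorForLanguageModeling, this never copies input_ids into
    labels, so -100 overlap heads and padding tails survive unchanged.
    """
    max_len = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        ids, labels = list(f["input_ids"]), list(f["labels"])
        if len(ids) != len(labels):
            raise ValueError("input_ids and labels must have the same length")
        mask = list(f.get("attention_mask", [1] * len(ids)))
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * pad)
        batch["attention_mask"].append(mask + [0] * pad)
        # Existing labels are copied verbatim; only the new padding tail is masked.
        batch["labels"].append(labels + [IGNORE_INDEX] * pad)
    return batch


# Example: the first item carries a two-token overlap head masked with -100.
batch = overlap_preserving_collate(
    [
        {"input_ids": [5, 6, 7, 8], "labels": [-100, -100, 7, 8]},
        {"input_ids": [9, 10], "labels": [9, 10]},
    ],
    pad_token_id=0,
)
print(batch["labels"])  # prints [[-100, -100, 7, 8], [9, 10, -100, -100]]
```

If `DocumentAwareDataset` already emits fixed-length chunks (its padding tails suggest it does), `transformers.default_data_collator` should also be safe, since it stacks a `labels` key as-is rather than regenerating it; this is worth confirming against the installed `transformers` version.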

## Architectural Constraints (Non-Negotiable)

The following are **sacred** and must not be modified:
- `helix_lm/graph.py` — heterogeneous graph wiring
- `helix_lm/recurrent.py` — LTI injection, ACT halting, recurrence
- `helix_lm/nodes.py` — all node implementations (attention variants, SwiGLU, SSM, Titans, gates)
- `helix_lm/mamba2.py` — Mamba-2 SSD parallel scan
- `helix_lm/rope.py` — rotary embeddings

The following behaviors must be preserved exactly:
- **Document-aware chunking with overlap masking**: `DocumentAwareDataset` must continue to return `labels` with `-100` on overlap heads and padding tails. Standard `DataCollatorForLanguageModeling` **must not** be used; it overwrites these labels.
- **No KV-cache / no `past_key_values` state**: The recurrent graph re-initializes `node_states = {}` on every forward call. Generation must pass the **full sequence** (or the last `seq_len`-token window) on every step, never just `input_ids[:, -1:]`. `config.use_cache` must remain `False`.
- **Parameter count stability**: After all fixes, `HelixForCausalLM(HelixConfig.tiny(vocab_size=50257))` must report exactly **13,347,974** parameters.
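The no-cache constraint can be made concrete with a small decoding loop. This is a sketch, not the project's generation code: `generate_without_cache` and the `next_token_logits` callable (a stand-in for one forward pass of the model) are hypothetical, and decoding is greedy for brevity. The key line is the slice `tokens[-seq_len:]`: every step sends the whole context, clipped to the window, instead of only the newest token.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-in for one model forward pass: receives the full
# (windowed) token sequence and returns logits for the next token.
NextTokenLogits = Callable[[List[int]], Sequence[float]]


def generate_without_cache(
    next_token_logits: NextTokenLogits,
    prompt: List[int],
    max_new_tokens: int,
    seq_len: int,
    eos_token_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding that re-feeds the whole context on every step.

    There is no past_key_values to reuse, so each step sends the last
    `seq_len` tokens rather than just the most recent one.
    """
    if seq_len < 1:
        raise ValueError("seq_len must be positive")
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        window = tokens[-seq_len:]  # full sequence, clipped to the context window
        logits = next_token_logits(window)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        tokens.append(next_id)
        if eos_token_id is not None and next_id == eos_token_id:
            break
    return tokens


window_lengths: List[int] = []


def toy_model(window: List[int]) -> List[float]:
    """Toy 10-token 'model' that always prefers (last token + 1) mod 10."""
    window_lengths.append(len(window))
    return [1.0 if t == (window[-1] + 1) % 10 else 0.0 for t in range(10)]


print(generate_without_cache(toy_model, [1, 2, 3], max_new_tokens=4, seq_len=5))
# prints [1, 2, 3, 4, 5, 6, 7]
print(window_lengths)
# prints [3, 4, 5, 5]  (context grows, then stays clipped to seq_len)
```

When `generate_ext()` is replaced by the standard `generate()`, the same rule would presumably live in an override of `prepare_inputs_for_generation`, which should return the windowed `input_ids` rather than a one-token slice, with `use_cache=False` left in the config.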

## Proposed Fix Stages
See the attached agent prompt package for the executable, stage-by-stage tasks.
