Feature Details
Implement lightweight utilities that verify the structural integrity of feature batches before they hit the model. These checks should be fast, torch-friendly, and easy to call in both unit tests and runtime (debug mode). The goal is to catch silent bugs in sparse-lag formatting, padding, and concatenation.
The validators should cover (a minimal sketch follows this list):
- Tensor shapes (e.g., $(B, K, D_{val})$ for values, $(B, K)$ for lag IDs, $(B,)$ for ticker IDs).
- Mask semantics: pad_mask is boolean, True means “ignore/pad”, and no non-pad position appears after the last valid index.
- Alignment across tensors in the same batch (same $B$, same $K$).
- Dtype sanity (e.g., embedding indices are int64, values are float32/float64).
- Monotone padding: once padded, all subsequent positions in that row must be padded.
- Optional value checks: NaN/Inf guards on numeric features.
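A minimal sketch of what such a validator could look like. The name validate_batch, the argument names, and the strict/report-dict behavior (anticipating the checklist below) are assumptions, not the final API:

```python
import torch

def validate_batch(values, lag_ids, ticker_ids, pad_mask, strict=True):
    """Structural checks for a sparse-lag feature batch (hypothetical API).

    Expected layout, per the shapes above: values (B, K, D_val),
    lag_ids (B, K), ticker_ids (B,), pad_mask (B, K) boolean with
    True == "ignore/pad".
    """
    errors = []
    B, K = lag_ids.shape

    # Alignment: same B and K across all tensors in the batch.
    if values.shape[:2] != (B, K):
        errors.append(f"values shape {tuple(values.shape)} != (B={B}, K={K}, D_val)")
    if ticker_ids.shape != (B,):
        errors.append(f"ticker_ids shape {tuple(ticker_ids.shape)} != ({B},)")
    if pad_mask.shape != (B, K):
        errors.append(f"pad_mask shape {tuple(pad_mask.shape)} != ({B}, {K})")

    # Dtype sanity: embedding indices int64, values floating point.
    if lag_ids.dtype != torch.int64:
        errors.append(f"lag_ids dtype {lag_ids.dtype} != torch.int64")
    if ticker_ids.dtype != torch.int64:
        errors.append(f"ticker_ids dtype {ticker_ids.dtype} != torch.int64")
    if values.dtype not in (torch.float32, torch.float64):
        errors.append(f"values dtype {values.dtype} not float32/float64")

    # Mask semantics: boolean, True == pad.
    if pad_mask.dtype != torch.bool:
        errors.append(f"pad_mask dtype {pad_mask.dtype} != torch.bool")
    else:
        # Monotone padding: once a row is padded, every later position
        # must also be padded, i.e. the mask is non-decreasing along K.
        if not (pad_mask.int().diff(dim=1) >= 0).all():
            errors.append("pad_mask is ragged: non-pad position follows a pad")

    # Optional value checks: NaN/Inf guards on numeric features.
    if not torch.isfinite(values).all():
        errors.append("values contain NaN or Inf")

    if errors and strict:
        raise ValueError("; ".join(errors))
    return {"ok": not errors, "errors": errors}
```

Everything here is pure tensor ops, so the checks stay fast and torch-friendly enough to run per batch in debug mode.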
Affected Modules
As stated in the parent issue.
Implementation Checklist
- Support strict=False: return a report dict instead of raising.
- Hook into FeatureGen (behind a debug=True flag) to run per batch in debug mode.
- Tests (a sketch follows this list):
  • Happy paths: correct shapes/masks/dtypes pass.
  • Failure cases: mismatched K, ragged masks, wrong dtypes, NaNs/Infs, non-boolean masks.
  • Edge cases: K=1, empty after padding (all pad), mixed dtypes.
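A hedged pytest sketch of the happy-path and failure-case tests, built against the hypothetical validate_batch from Feature Details above; the helper and import path are illustrative, not existing code:

```python
import pytest
import torch

# Hypothetical import; validate_batch is the sketch under Feature Details.
from feature_validators import validate_batch

def make_batch(B=2, K=3, D_val=4):
    """Well-formed batch: aligned shapes, correct dtypes, monotone padding."""
    values = torch.zeros(B, K, D_val, dtype=torch.float32)
    lag_ids = torch.zeros(B, K, dtype=torch.int64)
    ticker_ids = torch.zeros(B, dtype=torch.int64)
    pad_mask = torch.zeros(B, K, dtype=torch.bool)
    pad_mask[:, -1] = True  # last position padded in every row
    return values, lag_ids, ticker_ids, pad_mask

def test_happy_path():
    report = validate_batch(*make_batch(), strict=False)
    assert report["ok"]

def test_ragged_mask_raises():
    values, lag_ids, ticker_ids, pad_mask = make_batch()
    pad_mask[0] = torch.tensor([False, True, False])  # non-pad after a pad
    with pytest.raises(ValueError):
        validate_batch(values, lag_ids, ticker_ids, pad_mask)

def test_nan_reported_when_not_strict():
    values, lag_ids, ticker_ids, pad_mask = make_batch()
    values[0, 0, 0] = float("nan")
    report = validate_batch(values, lag_ids, ticker_ids, pad_mask, strict=False)
    assert not report["ok"] and any("NaN" in e for e in report["errors"])
```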
Limitations
As stated in the parent issue.