Modernized Training Pipeline with Hydra & Architecture Search #2

Open
nakmuaycoder wants to merge 39 commits into feat/v2 from feat/training_pipeline

@nakmuaycoder
Owner

This first PR marks the transition of the classiAcceleration project from legacy scripts to a modern, modular, and production-ready PyTorch architecture. The primary objective is to ensure total reproducibility and streamline the search for optimized CNN architectures suitable for tiny embedded devices.

🎯 Key Changes:
- Hydra Migration: Replaced command-line arguments (argparse) with a hierarchical YAML configuration framework (ml/config/). This decouples model, data, and training parameters so each can be varied independently between experiments.
- Architecture Search Ready: The new TinyMLConvNet is now fully parameterizable (input channels, filter count). "Tiny" models (e.g., 4-8 filters) can be benchmarked against "standard" versions with a single CLI option.
- Modular Pipeline Composition: Implemented an online, vectorized pipeline including:
  - Random3DRotation (fast 3D spatial augmentation).
  - MinMaxNormalize (sensor-based normalization for a +/- 4g range).
  - VectorNorm (optional module that computes the L2 norm of XYZ for 1-channel models).
- Quality & Security Standards (CI):
  - GitHub Actions integration: Automated unit tests (pytest) and linting/formatting (ruff) on every PR.
  - Type Hinting: Python type hints added throughout to improve code readability and maintainability.
- TensorBoard Integration: Automatic logging of training metrics (Loss/Accuracy) and hyperparameter comparison (hparams) to visually track architectural tradeoffs.
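As a rough illustration of what the three pipeline stages compute, here is a minimal NumPy sketch. It is not the PR's implementation (the actual modules are PyTorch layers): the function names, the (batch, 3, time) layout, the random axis/angle sampling, and the mapping of [-4g, +4g] onto [0, 1] are all assumptions made for the example. The rotation uses Rodrigues' formula, R = I + sin(θ)K + (1 - cos(θ))K², where K is the skew-symmetric matrix of the unit rotation axis.

```python
import numpy as np

ACCEL_RANGE_G = 4.0  # assumed sensor full-scale range (+/- 4g)


def random_rotation_matrices(n: int, rng: np.random.Generator) -> np.ndarray:
    """Build n rotation matrices with Rodrigues' formula, shape (n, 3, 3)."""
    axis = rng.normal(size=(n, 3))
    axis /= np.linalg.norm(axis, axis=1, keepdims=True)  # unit rotation axes
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)

    kx, ky, kz = axis[:, 0], axis[:, 1], axis[:, 2]
    zeros = np.zeros(n)
    # Skew-symmetric cross-product matrix K for each axis, shape (n, 3, 3).
    K = np.stack(
        [
            np.stack([zeros, -kz, ky], axis=1),
            np.stack([kz, zeros, -kx], axis=1),
            np.stack([-ky, kx, zeros], axis=1),
        ],
        axis=1,
    )
    sin_t = np.sin(theta)[:, None, None]
    cos_t = np.cos(theta)[:, None, None]
    return np.eye(3) + sin_t * K + (1.0 - cos_t) * (K @ K)


def random_3d_rotation(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Rotate each (3, time) window by its own random rotation."""
    return random_rotation_matrices(x.shape[0], rng) @ x


def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Map the assumed [-4g, +4g] range onto [0, 1], clipping outliers."""
    return np.clip((x + ACCEL_RANGE_G) / (2.0 * ACCEL_RANGE_G), 0.0, 1.0)


def vector_norm(x: np.ndarray) -> np.ndarray:
    """Collapse XYZ into one magnitude channel: (batch, 3, T) -> (batch, 1, T)."""
    return np.linalg.norm(x, axis=1, keepdims=True)


rng = np.random.default_rng(0)
windows = rng.uniform(-1.0, 1.0, size=(8, 3, 64))  # 8 windows, XYZ, 64 samples
rotated = random_3d_rotation(windows, rng)
```

Because a rotation preserves vector length, the magnitude channel produced by `vector_norm` is the same before and after `random_3d_rotation`, which is why the rotation augmentation only matters for the 3-channel variant.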

@nakmuaycoder nakmuaycoder self-assigned this Apr 6, 2026
@nakmuaycoder
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modernizes the classiAcceleration project by transitioning from a legacy TensorFlow 1.x approach to a professional PyTorch workflow for TinyML. Key changes include the implementation of a 1D-CNN model architecture, a 3D rotation augmentation layer using Rodrigues' formula, and a structured preprocessing pipeline with comprehensive unit tests. Review feedback highlights critical bugs in the AccelLogParser regarding the truncation of multi-byte hexadecimal values and potential data corruption due to misaligned sensor axes when handling missing BLE packets.

Comment thread ml/src/parser.py Outdated
Comment thread ml/src/parser.py Outdated
Comment thread ml/src/parser.py Outdated
Comment thread ml/src/parser.py Outdated
@nakmuaycoder
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modernizes the TinyML Fitness Tracker by transitioning from a legacy TensorFlow 1.x script-based approach to a professional PyTorch workflow. Key additions include a 3D rotation augmentation layer using Rodrigues' formula, a parameterizable 1D-CNN architecture optimized for microcontrollers, and a robust data preprocessing pipeline. The PR also introduces a comprehensive testing suite, pre-commit hooks, and a unified Makefile for project management. Feedback focuses on improving numerical stability in normalization layers, addressing potential division-by-zero errors, and enhancing the robustness of the log parser against empty or malformed input files.
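The flagged lines in ml/src/model.py are not shown in this thread, so the following is only a generic illustration of the kind of guard the feedback refers to: clamping a denominator with a small epsilon so that a constant signal or a zero-length vector does not produce NaN or inf. The function names and the `eps` default are hypothetical.

```python
import numpy as np


def safe_min_max(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to [0, 1]; a constant row maps to zeros instead of NaN."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    return (x - lo) / np.maximum(hi - lo, eps)


def safe_unit_vector(v: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize vectors to unit length; a zero vector stays zero instead of NaN."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norm, eps)


flat = safe_min_max(np.full((1, 4), 2.5))   # constant signal
ramp = safe_min_max(np.array([[0.0, 5.0, 10.0]]))
zero_axis = safe_unit_vector(np.zeros((1, 3)))
```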

Comment thread ml/src/model.py Outdated
Comment thread ml/src/model.py Outdated
Comment thread ml/src/model.py
Comment thread ml/src/parser.py
nakmuaycoder and others added 8 commits April 6, 2026 13:16
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@nakmuaycoder
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modernizes the TinyML fitness tracker by migrating the stack from TensorFlow 1.x to PyTorch and introducing a professional, modular workflow. Key additions include a 1D-CNN architecture, vectorized 3D rotation augmentation, and a comprehensive testing suite. Feedback highlights a synchronization vulnerability in the BLE log parser that could lead to misaligned sensor data, memory inefficiencies in the dataset's windowing logic, and a recommendation to optimize the full training pipeline parameters to support future learnable augmentations.
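The windowing code in ml/src/dataset.py is not shown in this thread. As one possible way to address the memory concern, the sketch below builds sliding windows as strided views instead of copies, using NumPy's `sliding_window_view`. The function name, the (n_samples, n_channels) input layout, and the parameter names are assumptions for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(data: np.ndarray, window_size: int, stride: int) -> np.ndarray:
    """Return overlapping windows as a read-only view, shape (n_windows, channels, window_size)."""
    # sliding_window_view appends the window axis last: (n - w + 1, channels, w).
    views = sliding_window_view(data, window_shape=window_size, axis=0)
    # Keeping every `stride`-th window is still a view, so no sample is copied.
    return views[::stride]


data = np.arange(30, dtype=np.float32).reshape(10, 3)  # 10 samples, XYZ
windows = make_windows(data, window_size=4, stride=2)
```

The result is already channels-first, which is the layout a 1D convolution expects. A copy is made only when a window is materialized, for example when converting a single item to a tensor. `sliding_window_view` raises `ValueError` if the recording is shorter than one window, so short files need to be filtered beforehand.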

Comment thread ml/src/parser.py Outdated
Comment thread ml/src/train.py Outdated
Comment thread ml/src/dataset.py
nakmuaycoder and others added 3 commits April 6, 2026 21:16
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@nakmuaycoder
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modernizes the project by transitioning from a legacy TensorFlow 1.x approach to a professional PyTorch workflow for TinyML, including a modular architecture with Hydra configuration and Optuna-based architecture search. Key technical additions include a 3D rotation augmentation layer and a comprehensive testing suite. The code review identified a potential race condition in the training script's accuracy tracking and noted that per-file PCA fitting causes inconsistent feature mappings. Additionally, feedback suggested prioritizing log-based labels over filename fallbacks and adopting more robust date extraction logic in the parser.

Comment thread ml/src/train.py Outdated
Comment on lines +214 to +229
if os.path.exists(best_acc_file):
    try:
        with open(best_acc_file) as f:
            overall_best_acc = float(f.read().strip())
    except Exception:
        pass

if final_acc > overall_best_acc:
    overall_best_acc = final_acc
    best_model_path = os.path.join(best_dir, f"best_pipeline{suffix}.pth")
    torch.save(base_model.state_dict(), best_model_path)
    try:
        with open(best_acc_file, "w") as f:
            f.write(f"{overall_best_acc:.6f}\n")
    except Exception as e:
        print(f"Warning: Could not write overall best accuracy: {e}")

high

The logic for tracking the overall best accuracy across different trials by reading from and writing to a shared file (best_accuracy.txt) is not process-safe. If the hyperparameter search is executed in parallel (e.g., by setting n_jobs > 1 in the Hydra/Optuna configuration), multiple processes may attempt to access this file simultaneously, leading to race conditions and potential data corruption. Consider using a dedicated experiment tracking framework (like MLflow or Weights & Biases) or implementing a file locking mechanism if local tracking is preferred.
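If local tracking is kept, the file-locking option could look roughly like the sketch below. It uses only the standard library: a lock file created atomically with `O_CREAT | O_EXCL` serializes the read-compare-write sequence, and the new value is written to a temporary file and swapped in with `os.replace`. The helper names (`file_lock`, `update_best_accuracy`), the `.lock`/`.tmp` suffixes, and the timeout values are hypothetical and not part of the PR.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path: str, timeout: float = 10.0, poll: float = 0.05):
    """Cross-process lock: only one process can create the lock file at a time."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire lock {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def update_best_accuracy(best_acc_file: str, candidate: float) -> bool:
    """Record candidate if it beats the stored best; return True when it did."""
    with file_lock(best_acc_file + ".lock"):
        best = 0.0
        if os.path.exists(best_acc_file):
            try:
                with open(best_acc_file) as f:
                    best = float(f.read().strip())
            except ValueError:
                best = 0.0  # unreadable content is treated as "no best yet"
        if candidate <= best:
            return False
        # Write to a temp file, then swap it in so readers never see a partial file.
        tmp_path = best_acc_file + ".tmp"
        with open(tmp_path, "w") as f:
            f.write(f"{candidate:.6f}\n")
        os.replace(tmp_path, best_acc_file)
        return True
```

In a real training script the `torch.save(...)` of the best model would also need to happen while the lock is held, otherwise the accuracy file and the checkpoint can come from different trials. One limitation of this sketch: a process that crashes while holding the lock leaves the lock file behind, so a production version would need stale-lock handling or a dedicated locking library.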

Comment thread ml/src/dataset.py
Comment on lines +58 to +59
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=3))
data = pca_pipeline.fit_transform(data).astype(np.float32)

medium

Fitting a PCA pipeline on each file individually (fit_transform) results in inconsistent feature mappings across the dataset. The principal components will align with the directions of maximum variance within each specific recording, which may correspond to different physical axes depending on sensor orientation or movement patterns in that session. This can introduce significant noise and make it harder for the model to generalize. It is generally recommended to fit the PCA on the entire training set once and apply that fixed transformation to all samples, or rely on the 3D rotation augmentation to achieve orientation invariance.
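A minimal sketch of the "fit once, apply everywhere" option, reusing the same scikit-learn pipeline as the snippet above. The function names and the idea of passing recordings as a list of (n_samples, 3) arrays are assumptions for illustration; the random arrays stand in for real accelerometer recordings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_shared_pca(train_recordings: list) -> Pipeline:
    """Fit one StandardScaler + PCA pipeline on all training recordings stacked together."""
    stacked = np.concatenate(train_recordings, axis=0)  # (total_samples, 3)
    pipeline = make_pipeline(StandardScaler(), PCA(n_components=3))
    pipeline.fit(stacked)
    return pipeline


def apply_shared_pca(pipeline: Pipeline, recording: np.ndarray) -> np.ndarray:
    """Project a recording with the already fitted pipeline (no refitting)."""
    return pipeline.transform(recording).astype(np.float32)


rng = np.random.default_rng(0)
recordings = [rng.normal(size=(200, 3)), rng.normal(size=(150, 3))]  # placeholder data
shared = fit_shared_pca(recordings)
projected = [apply_shared_pca(shared, r) for r in recordings]
```

With this split, validation and test files are transformed with the pipeline fitted on the training set only, so every sample is expressed in the same component basis.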

Comment thread ml/src/parser.py
Comment on lines +98 to +99
if fallback_label == 0:
    current_label = parsed_label

medium

The current logic prioritizes the label derived from the filename over the label characteristic found within the log file. If fallback_label is non-zero (e.g., walk or run was detected in the filename), any label updates found in the log are ignored. Typically, the data captured within the log (Characteristic 2105) should be treated as the primary source of truth, with the filename serving as a fallback if the characteristic is missing.

                    current_label = parsed_label
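The priority order described in this comment can be stated as a small helper. This is only a sketch: `resolve_label` is a hypothetical function, and modelling "characteristic missing" as `None` is an assumption, since the parser's actual representation is not shown here.

```python
from typing import Optional


def resolve_label(parsed_label: Optional[int], fallback_label: int) -> int:
    """Prefer the label read from the log; use the filename label only when none was parsed."""
    if parsed_label is not None:
        return parsed_label
    return fallback_label


from_log = resolve_label(parsed_label=2, fallback_label=1)      # log wins over filename
from_name = resolve_label(parsed_label=None, fallback_label=1)  # nothing in the log
```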

Comment thread ml/src/parser.py
try:
    # Expecting legacy format LBX_LOGS_YYYY-MM-DD_...
    # Try to extract the date from the filename, e.g. LBX_LOGS_2020-11-20_...
    date_str = os.path.basename(path_log).split("_")[2].replace("-", "")

medium

The date extraction logic is highly dependent on a specific filename format (LBX_LOGS_YYYY-MM-DD_...). If the filename structure changes, this will raise an error. While the try-except block catches the failure, a more flexible approach (such as using a regular expression) would improve the robustness of the data ingestion pipeline against variations in naming conventions.
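A regular-expression version might look like the sketch below. It searches for the first YYYY-MM-DD pattern anywhere in the filename, checks that it is a real calendar date, and returns the same compact YYYYMMDD string as the current code. The function name `extract_log_date` and the choice to return `None` on failure are assumptions, not the PR's API.

```python
import os
import re
from datetime import date
from typing import Optional

_DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def extract_log_date(path_log: str) -> Optional[str]:
    """Return the first valid YYYY-MM-DD date in the filename as 'YYYYMMDD', else None."""
    match = _DATE_RE.search(os.path.basename(path_log))
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as month 13
    except ValueError:
        return None
    return f"{year:04d}{month:02d}{day:02d}"


legacy = extract_log_date("/data/LBX_LOGS_2020-11-20_walk.txt")
renamed = extract_log_date("walk_session_2020-11-20.log")
```

Unlike the positional `split("_")[2]`, this does not depend on where the date sits in the name, and the caller handles a missing date explicitly instead of relying on a broad `except`.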

@nakmuaycoder
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modernizes the classiAcceleration project by migrating from a legacy TensorFlow 1.x approach to a professional PyTorch workflow. Key additions include a parameterizable 1D-CNN architecture, a robust data augmentation pipeline using Rodrigues' rotation formula, and a modernized preprocessing CLI. The update also integrates Hydra and Optuna for advanced configuration and Neural Architecture Search, alongside a comprehensive unit testing suite. Review feedback identifies a critical race condition when tracking the best model across parallel trials and a discrepancy between the implementation and documentation regarding PCA alignment granularity. Additionally, it is recommended to log metrics from the best-performing epoch rather than the final state to ensure more accurate hyperparameter evaluation.

Comment thread ml/src/train.py Outdated
Comment on lines +237 to +258
best_dir = os.path.join(cfg.output_dir, "best")
os.makedirs(best_dir, exist_ok=True)
best_acc_file = os.path.join(best_dir, "best_accuracy.txt")

# Read overall best accuracy across runs/trials
overall_best_acc = 0.0
if os.path.exists(best_acc_file):
    try:
        with open(best_acc_file) as f:
            overall_best_acc = float(f.read().strip())
    except Exception:
        pass

if final_acc > overall_best_acc:
    overall_best_acc = final_acc
    best_model_path = os.path.join(best_dir, f"best_pipeline{suffix}.pth")
    torch.save(base_model.state_dict(), best_model_path)
    try:
        with open(best_acc_file, "w") as f:
            f.write(f"{overall_best_acc:.6f}\n")
    except Exception as e:
        print(f"Warning: Could not write overall best accuracy: {e}")

high

In a multirun environment with parallel jobs (n_jobs: 5), reading and writing to a shared best_accuracy.txt file and saving the best model to a shared directory without any file locking mechanism introduces a significant race condition. Multiple trials may attempt to update the overall best accuracy and model simultaneously, leading to corrupted files or incorrect selection of the best model across the search space.

Comment thread ml/src/dataset.py
Comment on lines +57 to +59
if self.use_pca:
    pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=3))
    data = pca_pipeline.fit_transform(data).astype(np.float32)

medium

There is a discrepancy between the implementation and the documentation. The README states that PCA alignment is performed 'on each window', but the code currently fits and transforms the PCA pipeline on the entire file at once. If the sensor orientation changes within a single log file, per-file PCA will fail to align coordinates correctly. Consider moving the PCA transformation inside the sliding window loop if window-wise alignment is required.
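If window-wise alignment is the intended behaviour, moving the fit inside the sliding-window loop could be sketched as follows. The function name, the `window_size`/`stride` parameters, and the (n_samples, 3) input layout are assumptions, since the dataset's actual loop is not shown in this thread; the random array stands in for one log file.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def pca_aligned_windows(data: np.ndarray, window_size: int, stride: int) -> np.ndarray:
    """Fit a fresh StandardScaler + PCA on every window, shape (n_windows, window_size, 3)."""
    aligned = []
    for start in range(0, len(data) - window_size + 1, stride):
        window = data[start:start + window_size]
        pipeline = make_pipeline(StandardScaler(), PCA(n_components=3))
        aligned.append(pipeline.fit_transform(window).astype(np.float32))
    if not aligned:
        return np.empty((0, window_size, 3), dtype=np.float32)
    return np.stack(aligned)


rng = np.random.default_rng(0)
recording = rng.normal(size=(100, 3))  # placeholder for one log file
windows = pca_aligned_windows(recording, window_size=32, stride=16)
```

Two caveats apply to this variant. Fitting per window costs one decomposition per window, and the sign of each principal component is arbitrary, so two windows of the same movement can come out mirrored along an axis unless the signs are fixed by an extra convention.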

Comment thread ml/src/train.py Outdated
    "use_norm": cfg.data.use_norm,
    "total_params": total_params,
}
writer.add_hparams(hparams, {"hparam/accuracy": best_acc, "hparam/f1": final_f1})

medium

The add_hparams call logs the final_f1 score from the last epoch of training. For a more accurate comparison of architectures during search, it is recommended to log the metrics (Accuracy and F1) corresponding to the best model found during the run, rather than the final state which might have overfitted or degraded.
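One way to follow this recommendation is to keep the accuracy and F1 of the same best epoch together and log that pair. The sketch below is framework-free; `EpochMetrics`, `best_epoch`, and the choice of F1 as the selection criterion are assumptions for illustration, and the per-epoch numbers are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class EpochMetrics:
    epoch: int
    accuracy: float
    f1: float


def best_epoch(history: Iterable[Tuple[float, float]]) -> EpochMetrics:
    """Return the metrics of the epoch with the highest F1 (ties keep the earliest)."""
    best = EpochMetrics(epoch=-1, accuracy=float("-inf"), f1=float("-inf"))
    for epoch, (accuracy, f1) in enumerate(history):
        if f1 > best.f1:
            best = EpochMetrics(epoch, accuracy, f1)
    return best


# Illustrative (accuracy, f1) per epoch; the last epochs have degraded.
history = [(0.70, 0.65), (0.82, 0.80), (0.85, 0.78), (0.79, 0.74)]
best = best_epoch(history)
metrics = {"hparam/accuracy": best.accuracy, "hparam/f1": best.f1}
# writer.add_hparams(hparams, metrics)  # same call as above, fed with best-epoch values
```

Taking both values from one epoch avoids mixing a best-epoch accuracy with a last-epoch F1, which is what the current `best_acc` / `final_f1` pair does. An empty history returns the placeholder with `epoch == -1`, which the caller should treat as "no result".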

… training metrics to use F1-score for early stopping and model selection