Modernized Training Pipeline with Hydra & Architecture Search by nakmuaycoder · Pull Request #2 · nakmuaycoder/classiAcceleration

nakmuaycoder · 2026-04-06T09:31:11Z

This first PR moves the classiAcceleration project from legacy scripts to a modern, modular, production-ready PyTorch architecture. The primary objectives are full reproducibility and a streamlined search for compact CNN architectures that fit tiny embedded devices.

🎯 Key Changes:
- Hydra migration: replaced command-line arguments (argparse) with a hierarchical YAML configuration framework (ml/config/). Models, data settings, and training parameters are configured independently, which keeps experiments clean.
- Architecture search ready: the new TinyMLConvNet is fully parameterizable (input channels, filter count), so "tiny" models (e.g., 4 to 8 filters) can be benchmarked against "standard" versions via a command-line option.
- Modular pipeline composition: an online, vectorized preprocessing pipeline including:
  - Random3DRotation: fast 3D spatial augmentation.
  - MinMaxNormalize: normalization based on the sensor's ±4 g range.
  - VectorNorm: optional module that computes the L2 norm of XYZ for 1-channel models.
- Quality & Security Standards (CI):
  - GitHub Actions integration: automated unit tests (pytest) and linting/formatting (ruff) on every PR.
  - Type hints: Python type hints throughout the codebase to improve readability and maintainability.
  - TensorBoard integration: automatic logging of training metrics (loss/accuracy) and hyperparameter comparison (hparams) to visually track architectural trade-offs.
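A minimal NumPy sketch of the three preprocessing steps listed above, for readers who want to see the arithmetic. It assumes a (seq_len, 3) window of XYZ samples in g and a [0, 1] target range for the fixed ±4 g scaling; the PR implements these as PyTorch layers (MinMaxNormalize, VectorNorm), and the function names here are illustrative only.

```python
import numpy as np


def minmax_normalize(x: np.ndarray, lo: float = -4.0, hi: float = 4.0) -> np.ndarray:
    """Fixed-range scaling: clip to the sensor range [lo, hi] g, then map it to [0, 1]."""
    return (np.clip(x, lo, hi) - lo) / (hi - lo)


def window_minmax_normalize(window: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Per-window, per-axis min-max scaling of a (seq_len, 3) window.

    eps keeps a flat signal (max == min) from producing 0/0 = NaN.
    """
    w_min = window.min(axis=0, keepdims=True)
    w_max = window.max(axis=0, keepdims=True)
    return (window - w_min) / (w_max - w_min + eps)


def vector_norm(window: np.ndarray) -> np.ndarray:
    """Collapse XYZ into one orientation-independent channel: sqrt(x^2 + y^2 + z^2)."""
    return np.linalg.norm(window, axis=-1, keepdims=True)
```

Fixed-range scaling keeps absolute intensity (useful to tell walking from running), while the window-wise variant discards it; which one suits the classifier is an empirical question for the search.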

nakmuaycoder · 2026-04-06T09:37:44Z

/gemini review

gemini-code-assist

Code Review

This pull request modernizes the classiAcceleration project by transitioning from a legacy TensorFlow 1.x approach to a professional PyTorch workflow for TinyML. Key changes include the implementation of a 1D-CNN model architecture, a 3D rotation augmentation layer using Rodrigues' formula, and a structured preprocessing pipeline with comprehensive unit tests. Review feedback highlights critical bugs in the AccelLogParser regarding the truncation of multi-byte hexadecimal values and potential data corruption due to misaligned sensor axes when handling missing BLE packets.
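The rotation augmentation mentioned in the review can be sketched with Rodrigues' formula, R = I + sin(θ)K + (1 - cos(θ))K², where K is the cross-product matrix of the unit rotation axis. The batched NumPy version below is an illustrative stand-in for the PR's Random3DRotation layer, not its code; the Gaussian-sampled axis and the uniform angle range are assumptions.

```python
import numpy as np


def rodrigues_matrices(axes: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Build a batch of 3x3 rotation matrices from (B, 3) axes and (B,) angles in radians."""
    k = axes / np.linalg.norm(axes, axis=1, keepdims=True)  # unit rotation axes
    kx, ky, kz = k[:, 0], k[:, 1], k[:, 2]
    zero = np.zeros_like(kx)
    # Cross-product (skew-symmetric) matrix K of each axis, shape (B, 3, 3).
    K = np.stack(
        [
            np.stack([zero, -kz, ky], axis=1),
            np.stack([kz, zero, -kx], axis=1),
            np.stack([-ky, kx, zero], axis=1),
        ],
        axis=1,
    )
    s = np.sin(angles)[:, None, None]
    c = np.cos(angles)[:, None, None]
    # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3)[None] + s * K + (1.0 - c) * (K @ K)


def random_rotate(x: np.ndarray, max_angle: float, rng: np.random.Generator) -> np.ndarray:
    """Rotate each (T, 3) sequence of a (B, T, 3) batch by its own random rotation."""
    batch = x.shape[0]
    axes = rng.normal(size=(batch, 3))  # isotropic random directions (assumption)
    angles = rng.uniform(-max_angle, max_angle, size=batch)
    R = rodrigues_matrices(axes, angles)
    # y[b, t] = R[b] @ x[b, t] for every time step, without a Python loop.
    return np.einsum("bij,btj->bti", R, x)
```

Because every sample gets its own matrix in one vectorized call, the cost stays a single batched contraction per training step.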

nakmuaycoder · 2026-04-06T10:18:40Z

/gemini review

gemini-code-assist

Code Review

This pull request modernizes the TinyML Fitness Tracker by transitioning from a legacy TensorFlow 1.x script-based approach to a professional PyTorch workflow. Key additions include a 3D rotation augmentation layer using Rodrigues' formula, a parameterizable 1D-CNN architecture optimized for microcontrollers, and a robust data preprocessing pipeline. The PR also introduces a comprehensive testing suite, pre-commit hooks, and a unified Makefile for project management. Feedback focuses on improving numerical stability in normalization layers, addressing potential division-by-zero errors, and enhancing the robustness of the log parser against empty or malformed input files.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

nakmuaycoder · 2026-04-06T19:08:55Z

/gemini review

gemini-code-assist

Code Review

This pull request modernizes the TinyML fitness tracker by migrating the stack from TensorFlow 1.x to PyTorch and introducing a professional, modular workflow. Key additions include a 1D-CNN architecture, vectorized 3D rotation augmentation, and a comprehensive testing suite. Feedback highlights a synchronization vulnerability in the BLE log parser that could lead to misaligned sensor data, memory inefficiencies in the dataset's windowing logic, and a recommendation to optimize the full training pipeline parameters to support future learnable augmentations.
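One common way to avoid the windowing memory overhead flagged above is to build windows as strided views instead of copies. A sketch, assuming recordings are stored as (T, channels) arrays; make_windows is a hypothetical helper, while sliding_window_view is NumPy's own function (available since NumPy 1.20).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(data: np.ndarray, seq_len: int, stride: int) -> np.ndarray:
    """Return windows of a (T, C) recording as a (n_windows, C, seq_len) view, without copying."""
    if data.shape[0] < seq_len:
        # Recording shorter than one window: no windows.
        return np.empty((0, data.shape[1], seq_len), dtype=data.dtype)
    # Every possible window start: shape (T - seq_len + 1, C, seq_len), read-only view.
    windows = sliding_window_view(data, seq_len, axis=0)
    # Keep every stride-th start; basic slicing still returns a view.
    return windows[::stride]
```

The (n_windows, C, seq_len) layout is already channels-first, as a 1D convolution expects. The view is read-only and shares memory with data, so copying one window at a time (for example with np.ascontiguousarray in a Dataset's __getitem__) keeps peak memory proportional to a single window rather than to the whole window set.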

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>


nakmuaycoder · 2026-05-25T12:04:53Z

/gemini review

gemini-code-assist

Code Review

This pull request modernizes the project by transitioning from a legacy TensorFlow 1.x approach to a professional PyTorch workflow for TinyML, including a modular architecture with Hydra configuration and Optuna-based architecture search. Key technical additions include a 3D rotation augmentation layer and a comprehensive testing suite. The code review identified a potential race condition in the training script's accuracy tracking and noted that per-file PCA fitting causes inconsistent feature mappings. Additionally, feedback suggested prioritizing log-based labels over filename fallbacks and adopting more robust date extraction logic in the parser.

gemini-code-assist · 2026-05-25T12:06:27Z

+            if os.path.exists(best_acc_file):
+                try:
+                    with open(best_acc_file) as f:
+                        overall_best_acc = float(f.read().strip())
+                except Exception:
+                    pass
+
+            if final_acc > overall_best_acc:
+                overall_best_acc = final_acc
+                best_model_path = os.path.join(best_dir, f"best_pipeline{suffix}.pth")
+                torch.save(base_model.state_dict(), best_model_path)
+                try:
+                    with open(best_acc_file, "w") as f:
+                        f.write(f"{overall_best_acc:.6f}\n")
+                except Exception as e:
+                    print(f"Warning: Could not write overall best accuracy: {e}")


The logic for tracking the overall best accuracy across different trials by reading from and writing to a shared file (best_accuracy.txt) is not process-safe. If the hyperparameter search is executed in parallel (e.g., by setting n_jobs > 1 in the Hydra/Optuna configuration), multiple processes may attempt to access this file simultaneously, leading to race conditions and potential data corruption. Consider using a dedicated experiment tracking framework (like MLflow or Weights & Biases) or implementing a file locking mechanism if local tracking is preferred.
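If local tracking is preferred, the read-compare-write sequence can be serialized with an advisory lock. A minimal POSIX-only sketch (fcntl is not available on Windows); update_best_accuracy and the .lock sidecar file are hypothetical names, and save_model stands in for the torch.save call in the PR.

```python
import fcntl
import os
import tempfile
from typing import Callable


def update_best_accuracy(
    best_acc_file: str, candidate_acc: float, save_model: Callable[[], None]
) -> bool:
    """Compare candidate_acc with the stored best under a lock; update both if it is higher.

    Returns True if this call saved a new best model.
    """
    lock_path = best_acc_file + ".lock"
    with open(lock_path, "w") as lock:
        # Exclusive advisory lock: other processes calling this helper block here.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            best = 0.0
            if os.path.exists(best_acc_file):
                with open(best_acc_file) as f:
                    try:
                        best = float(f.read().strip())
                    except ValueError:
                        best = 0.0  # empty or malformed file: treat as "no best yet"
            if candidate_acc <= best:
                return False
            save_model()  # e.g. torch.save(...) of the new best weights
            # Write to a temp file in the same directory, then rename over the target,
            # so a crash never leaves a truncated accuracy file behind.
            fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(best_acc_file) or ".")
            with os.fdopen(fd, "w") as f:
                f.write(f"{candidate_acc:.6f}\n")
            os.replace(tmp_path, best_acc_file)
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

The lock only protects processes that also take it, so every trial must go through the same helper; saving the model inside the locked section keeps the checkpoint and the recorded accuracy consistent.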

gemini-code-assist · 2026-05-25T12:06:27Z

+                pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=3))
+                data = pca_pipeline.fit_transform(data).astype(np.float32)


Fitting a PCA pipeline on each file individually (fit_transform) results in inconsistent feature mappings across the dataset. The principal components will align with the directions of maximum variance within each specific recording, which may correspond to different physical axes depending on sensor orientation or movement patterns in that session. This can introduce significant noise and make it harder for the model to generalize. It is generally recommended to fit the PCA on the entire training set once and apply that fixed transformation to all samples, or rely on the 3D rotation augmentation to achieve orientation invariance.
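A sketch of the fit-once approach suggested above, using the same scikit-learn pipeline as the PR but fitting it a single time on the concatenated training recordings. fit_global_pca and apply_pca are hypothetical helper names; how the training split is assembled is left to the dataset code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_global_pca(train_recordings: list, n_components: int = 3) -> Pipeline:
    """Fit StandardScaler + PCA once on all (T_i, 3) training recordings stacked together."""
    stacked = np.concatenate(train_recordings, axis=0)
    pipeline = make_pipeline(StandardScaler(), PCA(n_components=n_components))
    pipeline.fit(stacked)
    return pipeline


def apply_pca(pipeline: Pipeline, data: np.ndarray) -> np.ndarray:
    """Project one recording with the already-fitted pipeline; nothing is refitted."""
    return pipeline.transform(data).astype(np.float32)
```

Validation and test files should only go through apply_pca, so that no statistics from them leak into the fitted transform.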

gemini-code-assist · 2026-05-25T12:06:27Z

+                    if fallback_label == 0:
+                        current_label = parsed_label


The current logic prioritizes the label derived from the filename over the label characteristic found within the log file. If fallback_label is non-zero (e.g., walk or run was detected in the filename), any label updates found in the log are ignored. Typically, the data captured within the log (Characteristic 2105) should be treated as the primary source of truth, with the filename serving as a fallback if the characteristic is missing.

Suggested change (drop the fallback_label check so the value from the log always wins):

current_label = parsed_label
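A sketch of the suggested priority, assuming the parser reports None when characteristic 2105 is absent and 0 when no label is found in the filename. The keyword-to-label mapping below is hypothetical; the real ids live in the PR's parser.

```python
import os
import re
from typing import Optional

# Hypothetical keyword -> label ids, for illustration only.
FILENAME_LABELS = {"walk": 1, "run": 2}
UNKNOWN_LABEL = 0


def label_from_filename(path: str) -> int:
    """Fallback label: a keyword delimited by '_', '-', '.' or the start/end of the name."""
    name = os.path.basename(path).lower()
    for keyword, label in FILENAME_LABELS.items():
        if re.search(rf"(^|[_\-.]){keyword}([_\-.]|$)", name):
            return label
    return UNKNOWN_LABEL


def resolve_label(parsed_label: Optional[int], fallback_label: int) -> int:
    """Characteristic 2105 is the source of truth; the filename label only fills gaps."""
    return parsed_label if parsed_label is not None else fallback_label
```

The delimiter check keeps a name such as "session_running" from being read as a "run" label.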

gemini-code-assist · 2026-05-25T12:06:28Z

+            try:
+                # Expecting legacy format LBX_LOGS_YYYY-MM-DD_...
+                # Try to extract the date from the file name, e.g. LBX_LOGS_2020-11-20_...
+                date_str = os.path.basename(path_log).split("_")[2].replace("-", "")


The date extraction logic is highly dependent on a specific filename format (LBX_LOGS_YYYY-MM-DD_...). If the filename structure changes, this will raise an error. While the try-except block catches the failure, a more flexible approach (such as using a regular expression) would improve the robustness of the data ingestion pipeline against variations in naming conventions.
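A regex-based alternative to the split("_")[2] lookup, as suggested above. It accepts a YYYY-MM-DD date anywhere in the basename, rejects impossible dates, and returns the same YYYYMMDD string as the original code. date_from_filename is a hypothetical name.

```python
import os
import re
from datetime import datetime
from typing import Optional

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def date_from_filename(path: str) -> Optional[str]:
    """Return the first valid YYYY-MM-DD date in the file name as 'YYYYMMDD', or None."""
    name = os.path.basename(path)
    for match in DATE_RE.finditer(name):
        try:
            parsed = datetime(*(int(g) for g in match.groups()))
        except ValueError:
            continue  # e.g. 2020-13-45 matches the pattern but is not a real date
        return parsed.strftime("%Y%m%d")
    return None
```

Returning None instead of raising lets the caller decide whether a missing date is fatal or just leaves the field empty.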


nakmuaycoder · 2026-05-25T13:45:29Z

/gemini review

gemini-code-assist

Code Review

This pull request modernizes the classiAcceleration project by migrating from a legacy TensorFlow 1.x approach to a professional PyTorch workflow. Key additions include a parameterizable 1D-CNN architecture, a robust data augmentation pipeline using Rodrigues' rotation formula, and a modernized preprocessing CLI. The update also integrates Hydra and Optuna for advanced configuration and Neural Architecture Search, alongside a comprehensive unit testing suite. Review feedback identifies a critical race condition when tracking the best model across parallel trials and a discrepancy between the implementation and documentation regarding PCA alignment granularity. Additionally, it is recommended to log metrics from the best-performing epoch rather than the final state to ensure more accurate hyperparameter evaluation.

gemini-code-assist · 2026-05-25T13:50:14Z

+            best_dir = os.path.join(cfg.output_dir, "best")
+            os.makedirs(best_dir, exist_ok=True)
+            best_acc_file = os.path.join(best_dir, "best_accuracy.txt")
+
+            # Read overall best accuracy across runs/trials
+            overall_best_acc = 0.0
+            if os.path.exists(best_acc_file):
+                try:
+                    with open(best_acc_file) as f:
+                        overall_best_acc = float(f.read().strip())
+                except Exception:
+                    pass
+
+            if final_acc > overall_best_acc:
+                overall_best_acc = final_acc
+                best_model_path = os.path.join(best_dir, f"best_pipeline{suffix}.pth")
+                torch.save(base_model.state_dict(), best_model_path)
+                try:
+                    with open(best_acc_file, "w") as f:
+                        f.write(f"{overall_best_acc:.6f}\n")
+                except Exception as e:
+                    print(f"Warning: Could not write overall best accuracy: {e}")


In a multirun environment with parallel jobs (n_jobs: 5), reading and writing to a shared best_accuracy.txt file and saving the best model to a shared directory without any file locking mechanism introduces a significant race condition. Multiple trials may attempt to update the overall best accuracy and model simultaneously, leading to corrupted files or incorrect selection of the best model across the search space.
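An alternative that avoids shared mutable state entirely: each trial writes its metrics only inside its own output directory, and the best trial is selected once the sweep has finished. This sketch assumes one subdirectory per job under the sweep directory (Hydra's multirun layout does this by default); write_trial_result, select_best_trial, and the metrics.json file name are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Optional


def write_trial_result(trial_dir: str, metrics: dict, model_file: str) -> None:
    """Write this trial's metrics atomically inside its own directory (no shared files)."""
    path = Path(trial_dir)
    path.mkdir(parents=True, exist_ok=True)
    tmp = path / "metrics.json.tmp"
    tmp.write_text(json.dumps({**metrics, "model_file": model_file}))
    os.replace(tmp, path / "metrics.json")  # readers never see a half-written file


def select_best_trial(sweep_dir: str, monitor: str = "f1") -> Optional[dict]:
    """After the sweep, scan <sweep_dir>/*/metrics.json and return the best trial's record."""
    best = None
    for metrics_file in Path(sweep_dir).glob("*/metrics.json"):
        result = json.loads(metrics_file.read_text())
        if best is None or result[monitor] > best[monitor]:
            best = {**result, "trial_dir": str(metrics_file.parent)}
    return best
```

Since no file is written by more than one job, no locking is needed, and the "best" checkpoint can be copied or symlinked after the sweep.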

gemini-code-assist · 2026-05-25T13:50:14Z

+            if self.use_pca:
+                pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=3))
+                data = pca_pipeline.fit_transform(data).astype(np.float32)


There is a discrepancy between the implementation and the documentation. The README states that PCA alignment is performed 'on each window', but the code currently fits and transforms the PCA pipeline on the entire file at once. If the sensor orientation changes within a single log file, per-file PCA will fail to align coordinates correctly. Consider moving the PCA transformation inside the sliding window loop if window-wise alignment is required.
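If window-wise alignment is the intended behaviour, the per-window version can be sketched with a plain SVD instead of refitting a scikit-learn pipeline for every window. This assumes (n_windows, seq_len, 3) input; the sign convention is an assumption added so that the output is deterministic, since SVD axes are only defined up to sign.

```python
import numpy as np


def pca_align_window(window: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize one (seq_len, 3) window and project it onto its own principal axes."""
    scaled = (window - window.mean(axis=0)) / (window.std(axis=0) + eps)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    # SVD signs are arbitrary: make the largest-magnitude loading of each axis positive.
    idx = np.abs(vt).argmax(axis=1)
    signs = np.sign(vt[np.arange(vt.shape[0]), idx])
    vt = vt * signs[:, None]
    return (scaled @ vt.T).astype(np.float32)


def align_windows(windows: np.ndarray) -> np.ndarray:
    """Apply window-wise PCA alignment to a (n_windows, seq_len, 3) array."""
    return np.stack([pca_align_window(w) for w in windows])
```

The output channels are uncorrelated within each window and ordered by variance, which makes them independent of how the sensor was mounted during that window.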

gemini-code-assist · 2026-05-25T13:50:14Z

+        "use_norm": cfg.data.use_norm,
+        "total_params": total_params,
+    }
+    writer.add_hparams(hparams, {"hparam/accuracy": best_acc, "hparam/f1": final_f1})


The add_hparams call logs the final_f1 score from the last epoch of training. For a more accurate comparison of architectures during search, it is recommended to log the metrics (Accuracy and F1) corresponding to the best model found during the run, rather than the final state which might have overfitted or degraded.
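A sketch of tracking the best epoch so that add_hparams receives the metrics of the selected model rather than those of the last epoch. BestEpochTracker is a hypothetical helper; monitoring F1 by default follows the later commit that moved early stopping and model selection to F1.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BestEpochTracker:
    monitor: str = "f1"
    best_epoch: Optional[int] = None
    best_metrics: Dict[str, float] = field(default_factory=dict)

    def update(self, epoch: int, metrics: Dict[str, float]) -> bool:
        """Record one epoch's metrics; return True if it is the new best."""
        if self.best_epoch is None or metrics[self.monitor] > self.best_metrics[self.monitor]:
            self.best_epoch = epoch
            self.best_metrics = dict(metrics)
            return True
        return False

    def hparam_metrics(self) -> Dict[str, float]:
        """Best-epoch metrics, keyed the way the PR passes them to add_hparams."""
        return {f"hparam/{name}": value for name, value in self.best_metrics.items()}
```

At the end of training, a call such as writer.add_hparams(hparams, tracker.hparam_metrics()) would replace the current one; SummaryWriter.add_hparams takes a hyperparameter dict and a metric dict. The checkpoint should be saved whenever update returns True so the logged metrics and the saved weights come from the same epoch.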


nakmuaycoder added 11 commits April 5, 2026 16:22

refactoring

6ecf271

feat: data parsing utilities

13f1d44

Add make format

9051ea8

feat: repo formatting

7900246

feat: implement 3D rotation augmentation for accelerometer data

7ff5f59

feat: implement TinyML augmentation, and preprocessing modules.

b17476d

feat: CI

2c81100

feat: implement dataset

af6ab5b

feat: log parsing utilities

544b969

refactor: replace legacy log preprocessing scripts

28367d7

refactor: drop analysis notebooks

7f3e727

nakmuaycoder self-assigned this Apr 6, 2026

gemini-code-assist Bot reviewed Apr 6, 2026


Comment thread ml/src/parser.py Outdated

Comment thread ml/src/parser.py Outdated

Comment thread ml/src/parser.py Outdated

Comment thread ml/src/parser.py Outdated

fix: Pr review

c7c64cb

gemini-code-assist Bot reviewed Apr 6, 2026


Comment thread ml/src/model.py Outdated

Comment thread ml/src/model.py Outdated

Comment thread ml/src/model.py

Comment thread ml/src/parser.py

nakmuaycoder and others added 8 commits April 6, 2026 13:16

Update ml/src/parser.py

b236aa3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update ml/src/model.py

084352b

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update ml/src/model.py

cf508a9

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update ml/src/model.py

8c81a77

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

feat: optuna

197a8fe

fix: fix labelling bug

49f768b

feat: implement training pipeline

5518933

feat: NAS configuration targets, and expand hyperparameter search space

7b9ed89

gemini-code-assist Bot reviewed Apr 6, 2026


Comment thread ml/src/parser.py Outdated

Comment thread ml/src/train.py Outdated

Comment thread ml/src/dataset.py

nakmuaycoder and others added 3 commits April 6, 2026 21:16

feat: add default training configuration for TinyML research

2aa512f

Update ml/src/train.py

36beb2a

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update ml/src/dataset.py

f74320b

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

nakmuaycoder added 3 commits May 25, 2026 11:50

aug

f767bb4

feat: update Hydra configuration with directory management and expand Optuna search space with augmentation parameters

bc9c1c6

feat: implement row validation logic in AccelLogParser and add unit tests to ensure data integrity

0802e77

gemini-code-assist Bot reviewed May 25, 2026


nakmuaycoder added 12 commits May 25, 2026 14:07

chore: remove unused requirements.txt file

18f3e63

chore: update Hydra output directory configuration and revise documentation for training pipeline features

c79c993

feat: add data.seq_len to hyperparameter search space in config

61139dc

feat: add experiment_name configuration to improve output directory organization and tracking

728bdad

chore: increase hyperparameter optimization search to 150 trials

a6a4f1f

chore: remove tensorboard dependency from pyproject.toml and uv.lock

cec13b4

Revert "chore: increase hyperparameter optimization search to 150 trials"

45a634a

This reverts commit a6a4f1f.

chore: increase n_jobs to 4 in study configuration for faster trial processing

96c6257

feat: add hydra-joblib-launcher to dependencies and configure joblib launcher in config

63b857b

chore: increase n_trials to 150 and n_jobs to 5 in study configuration

659c8ae

feat: implement Hydra multirun logging support and suppress redundant console output during sweep trials

29bb105

feat: add dynamic window-wise min-max normalization to MinMaxNormalize layer

81ec643

gemini-code-assist Bot reviewed May 25, 2026


feat: implement class balancing with WeightedRandomSampler and update training metrics to use F1-score for early stopping and model selection

8368942

