rmpt-phys/Hydro_Time_Series_Processing_and_Modeling_with_a_ML_Framework: End-to-end pipeline for hydrometeorological time-series processing and flash-flood modeling with an ML framework. Integrates rain-gauge and river-stage data through advanced preprocessing (with outlier detection and noise filtering) for flood forecasting with classical ML models, followed by performance evaluation and exploratory/statistical analysis of the results.

Hydrometeorological Time-Series Processing and Flash-Flood Modeling Using a Cross-Validated ML Framework

About this repository:

This repository implements an end-to-end pipeline for reading, analyzing, and preprocessing rain-gauge and river-stage data, together with flash-flood modeling and performance evaluation of machine learning algorithms.

The core component of the repository is the notebook Project_Summary.ipynb. It summarizes the main contributions of the author during a four-month research project carried out at the National Center for Natural Disaster Monitoring and Alerts (Centro Nacional de Monitoramento e Alertas de Desastres Naturais - CEMADEN), in collaboration with other researchers. The notebook serves as a structured and self-contained guide covering the problem statement, motivation, a brief literature overview, and the adopted methodology. The workflow is organized into two main parts:

Part 1: Conversion of raw gauge measurements into unified hydrometeorological datasets, including data cleaning, filtering, temporal alignment, and preparation for modeling;
Part 2: Flash-flood modeling using the ML4FF framework, including a step-by-step guide to configuring the auxiliary Python script Run_ML4FF_Code.py (input data definition, cross-validation setup, training/test splits, and output management), followed by a structured analysis of model performance and predictions.
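ML4FF's own cross-validation scheme is configured through Run_ML4FF_Code.py and is not reproduced here. As a generic illustration only, the sketch below shows the usual principle for time series: splits keep chronological order, so no model is validated on data older than what it was trained on. The helpers `chronological_split` and `expanding_window_folds`, the 20% test fraction, and the fold count are hypothetical choices, not ML4FF parameters.

```python
import numpy as np


def chronological_split(n_samples: int, test_fraction: float = 0.2):
    """Hold out the most recent samples as the test set (no shuffling)."""
    n_test = int(round(n_samples * test_fraction))
    cut = n_samples - n_test
    return np.arange(cut), np.arange(cut, n_samples)


def expanding_window_folds(n_train: int, n_folds: int = 3):
    """Yield (train, validation) index pairs; each fold trains on all earlier data."""
    fold_size = n_train // (n_folds + 1)
    for k in range(1, n_folds + 1):
        end = k * fold_size
        # Validation block immediately follows the training block in time.
        yield np.arange(end), np.arange(end, end + fold_size)


# Example: 10 time steps, last 2 held out, 3 expanding folds on the first 8.
train_idx, test_idx = chronological_split(10, test_fraction=0.2)
folds = list(expanding_window_folds(len(train_idx), n_folds=3))
```

Shuffled k-fold splits would let a model "see the future" of a flood event during training, which is why time-ordered folds are the common default for forecasting tasks.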

The notebook loads station-separated datasets, aligns them onto a unified 10-minute time grid, and explicitly handles missing data through NaN insertion. It then performs exploratory and statistical analyses to extract key time-series characteristics for each station, producing a curated collection of data frames restricted to stations that satisfy predefined activity and data-quality criteria.
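A minimal pandas sketch of the alignment and station-selection step, assuming each station arrives as a `pd.Series` indexed by timestamps. The helper names `align_to_grid`, `completeness`, and `select_stations`, the last-reading-per-slot rule, and the 80% completeness threshold are illustrative assumptions; the notebook's actual activity and data-quality criteria may differ.

```python
from __future__ import annotations

import pandas as pd


def align_to_grid(series: pd.Series, freq: str = "10min") -> pd.Series:
    """Place a station's readings on a regular grid (hypothetical helper).

    Every slot between the first and last reading is kept; slots without
    any reading become NaN, so gaps stay explicit instead of being dropped.
    """
    # Keep the last reading in each slot (an assumption; rainfall totals
    # would more naturally be summed per slot).
    return series.sort_index().resample(freq).last()


def completeness(series: pd.Series) -> float:
    """Fraction of grid slots holding a valid (non-NaN) value."""
    return float(series.notna().mean())


def select_stations(
    stations: dict[str, pd.Series], min_completeness: float = 0.8
) -> dict[str, pd.Series]:
    """Align every station and keep those meeting a hypothetical threshold."""
    aligned = {name: align_to_grid(s) for name, s in stations.items()}
    return {
        name: s for name, s in aligned.items() if completeness(s) >= min_completeness
    }


# Example: three readings spanning 40 minutes.
idx = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:10", "2024-01-01 00:40"])
grid = align_to_grid(pd.Series([1.0, 2.0, 3.0], index=idx))
# grid has 5 slots (00:00 to 00:40); 00:20 and 00:30 are NaN
```

Inserting NaN rather than silently skipping missing slots keeps every station on the same time axis, which makes later joins between rain and stage series index-aligned.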

River-stage records are processed through a two-step pipeline: outlier detection and removal using the Hampel filter, followed by zero-phase low-pass filtering to obtain smooth, noise-reduced signals. Rainfall data are treated using a separate procedure tailored to their discrete and event-driven nature.
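The two river-stage steps can be sketched as follows, assuming a stage series already on the 10-minute grid. The Hampel identifier flags points more than `n_sigmas` robust standard deviations (1.4826 × the local median absolute deviation) from a centered rolling median; zero-phase filtering is obtained by running a Butterworth low-pass forward and backward with SciPy's `sosfiltfilt`. The window half-width, threshold, filter order, 0.5 cycles-per-hour cutoff, and linear gap filling before filtering are all hypothetical choices, not the notebook's settings.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, sosfiltfilt


def hampel_mask(x: pd.Series, half_window: int = 3, n_sigmas: float = 3.0) -> pd.Series:
    """Flag points farther than n_sigmas robust deviations from the local median."""
    window = 2 * half_window + 1
    rolling = x.rolling(window, center=True, min_periods=1)
    local_median = rolling.median()
    # MAD of each window, computed about that window's own median.
    local_mad = rolling.apply(
        lambda w: np.nanmedian(np.abs(w - np.nanmedian(w))), raw=True
    )
    sigma = 1.4826 * local_mad  # scales MAD to a std. dev. for Gaussian noise
    return (x - local_median).abs() > n_sigmas * sigma


def remove_outliers(x: pd.Series, half_window: int = 3, n_sigmas: float = 3.0) -> pd.Series:
    """Replace Hampel-flagged points with NaN."""
    return x.mask(hampel_mask(x, half_window, n_sigmas))


def zero_phase_lowpass(
    x: pd.Series, cutoff_per_hour: float = 0.5, fs_per_hour: float = 6.0, order: int = 4
) -> pd.Series:
    """Butterworth low-pass applied forward and backward (no phase lag)."""
    # sosfiltfilt cannot handle NaN, so gaps are linearly interpolated first
    # (an assumption; long gaps may deserve different treatment).
    filled = x.interpolate(limit_direction="both").to_numpy()
    # 10-minute sampling gives 6 samples per hour.
    sos = butter(order, cutoff_per_hour, btype="low", fs=fs_per_hour, output="sos")
    return pd.Series(sosfiltfilt(sos, filled), index=x.index)
```

Filtering forward and backward cancels the phase shift of a single pass, so peaks in the smoothed stage stay at the same timestamps as in the raw record, which matters when stage peaks are matched against rainfall timing.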

The processed rain-gauge and river-stage datasets are exported as structured CSV files, and dedicated training datasets are generated for integration with the ML4FF framework. The repository also includes example results (ML4FF_Results) and provides a reproducible workflow for building forecasting models and rigorously evaluating their predictive performance.
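The exact input layout expected by ML4FF is defined by the framework and Run_ML4FF_Code.py, not here. Purely as an illustration, the sketch below shows one common way to turn aligned rain and stage series into a supervised training table: lagged predictors plus a future-stage target, exported as CSV. The function `build_training_frame`, its column names, the number of lags, and the one-step horizon are hypothetical.

```python
import io

import numpy as np
import pandas as pd


def build_training_frame(
    rain: pd.Series, stage: pd.Series, n_lags: int = 3, horizon: int = 1
) -> pd.DataFrame:
    """Lagged rain/stage predictors plus a future-stage target (hypothetical layout)."""
    columns = {}
    for lag in range(n_lags):
        # Lag 0 is the current 10-minute step, lag 1 the previous one, etc.
        columns[f"rain_lag{lag}"] = rain.shift(lag)
        columns[f"stage_lag{lag}"] = stage.shift(lag)
    # Target: river stage `horizon` steps after the current step.
    columns[f"stage_ahead{horizon}"] = stage.shift(-horizon)
    # Rows whose lags or target fall outside the record are dropped.
    return pd.DataFrame(columns).dropna()


index = pd.date_range("2024-01-01", periods=10, freq="10min")
rain = pd.Series(np.arange(10, dtype=float), index=index)
stage = pd.Series(np.arange(10, dtype=float) * 0.1, index=index)
training = build_training_frame(rain, stage)

buffer = io.StringIO()  # stands in for a CSV file on disk
training.to_csv(buffer, index_label="timestamp")
```

With 10 steps, 3 lags, and a one-step horizon, the first two rows (incomplete lags) and the last row (no future target) are dropped, leaving 7 training rows.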

Code author:

- Rafael Marques Paes Teixeira 
- Orcid: 0000-0001-7290-3573

Important contributions to the notebook were made by:

- Leonardo Bacelar Lima Santos (project manager)
- Orcid: 0000-0002-3129-772X

Collaborators who contributed important code, data, and discussions:

- Andrea S. Viteri López     (0000-0002-9929-391X)
- Lidiane S. Lima            (0000-0001-5490-3975)
- Elton V. Escobar Silva     (0000-0002-9437-9351)
- Jaqueline A. J. P. Soares  (0000-0002-2569-7620)
- Kleber L. Rocha-Filho      (0009-0001-7558-1108)
- Glauston R. T. Lima        (0000-0002-6854-7921)
- Cristiano W. Eichholz      (0000-0001-7123-5438)

Contact emails: rafael.mpt@gmail.com, santoslbl@gmail.com

Preparing for a complete run of the notebook:

The ML4FF framework and the data needed to reproduce the results presented and analyzed in this notebook can be obtained from the links below:

https://zenodo.org/records/17654660

https://github.com/jaqueline-soares/ML4FF-framework

Repository contents:

- Aux_Files
- ML4FF_Results
- Project_Summary.ipynb
- README.md
- Run_ML4FF_Code.py
- TRW_Gauge_Map.png
- cemaden.png
- conda_env.yml
- iFast.png
