From 52044ee39e8a1c2c0a59b0d86cf0b01b20235e38 Mon Sep 17 00:00:00 2001
From: that-ar-guy
Date: Wed, 12 Feb 2025 23:15:06 +0530
Subject: [PATCH 1/5] index updated

---
 docs/projects/deep-learning/anamoly-detection.md | 0
 docs/projects/deep-learning/index.md             | 9 +++++++++
 2 files changed, 9 insertions(+)
 create mode 100644 docs/projects/deep-learning/anamoly-detection.md

diff --git a/docs/projects/deep-learning/anamoly-detection.md b/docs/projects/deep-learning/anamoly-detection.md
new file mode 100644
index 00000000..e69de29b
diff --git a/docs/projects/deep-learning/index.md b/docs/projects/deep-learning/index.md
index 7d210a0f..068507f4 100644
--- a/docs/projects/deep-learning/index.md
+++ b/docs/projects/deep-learning/index.md
@@ -11,6 +11,15 @@

📅 2025-01-10 | ⏱️ 10 mins

+ + + +
+

LSTM Autoencoder for Time Series Anomaly Detection

+

A deep learning approach to detect anomalies in time series data.

+

📅 2025-02-12 | ⏱️ 10 mins

+
+
From d88cb357920837d1217124ebf5619703cbea713b Mon Sep 17 00:00:00 2001
From: that-ar-guy
Date: Wed, 12 Feb 2025 23:20:03 +0530
Subject: [PATCH 2/5] page created

---
 .../deep-learning/anamoly-detection.md        | 147 ++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/docs/projects/deep-learning/anamoly-detection.md b/docs/projects/deep-learning/anamoly-detection.md
index e69de29b..5ac9d936 100644
--- a/docs/projects/deep-learning/anamoly-detection.md
+++ b/docs/projects/deep-learning/anamoly-detection.md
@@ -0,0 +1,147 @@
+# Time-Series Anomaly Detection
+
+### AIM
+
+To detect anomalies in time-series data using Long Short-Term Memory (LSTM) networks.
+
+### DATASET
+
+Synthetic time-series data generated using sine wave with added noise.
+
+### KAGGLE NOTEBOOK
+[https://www.kaggle.com/code/thatarguy/lstm-anamoly-detection/notebook](https://www.kaggle.com/code/thatarguy/lstm-anamoly-detection/notebook)
+
+### LIBRARIES NEEDED
+
+    - numpy
+    - pandas
+    - yfinance
+    - matplotlib
+    - tensorflow
+    - scikit-learn
+
+---
+
+### DESCRIPTION
+
+!!! info "What is the requirement of the project?"
+    - The project focuses on identifying anomalies in time-series data using an LSTM autoencoder. The model learns normal patterns and detects deviations indicating anomalies.
+
+??? info "Why is it necessary?"
+    - Anomaly detection is crucial in various domains such as finance, healthcare, and cybersecurity, where detecting unexpected behavior can prevent failures, fraud, or security breaches.
+
+??? info "How is it beneficial and used?"
+    - Businesses can use it to detect irregularities in stock market trends.
+    - It can help monitor industrial equipment to identify faults before failures occur.
+    - It can be applied in fraud detection for financial transactions.
+
+??? info "How did you start approaching this project? (Initial thoughts and planning)"
+    - Understanding time-series anomaly detection methodologies.
+    - Generating synthetic data to simulate real-world scenarios.
+    - Implementing an LSTM autoencoder to learn normal patterns and detect anomalies.
+    - Evaluating model performance using Mean Squared Error (MSE).
+
+??? info "Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.)."
+    - Research paper: "Deep Learning for Time-Series Anomaly Detection"
+    - Public notebook: LSTM Autoencoder for Anomaly Detection
+
+---
+
+### Model Architecture
+    - The LSTM autoencoder learns normal time-series behavior and reconstructs it. Any deviation is considered an anomaly.
+    - Encoder: Extracts patterns using LSTM layers.
+    - Bottleneck: Compresses the data representation.
+    - Decoder: Reconstructs the original sequence.
+    - The reconstruction error determines anomalies.
+
+### Model Structure
+    - Input: Time-series sequence (50 time steps)
+    - LSTM Layers for encoding
+    - Repeat Vector to retain sequence information
+    - LSTM Layers for decoding
+    - TimeDistributed Dense Layer for reconstruction
+    - Loss Function: Mean Squared Error (MSE)
+
+---
+
+#### WHAT I HAVE DONE
+
+=== "Step 1"
+
+    Exploratory Data Analysis
+
+    - Generate synthetic data (sine wave with noise)
+    - Normalize data using MinMaxScaler
+    - Split data into training and validation sets
+
+=== "Step 2"
+
+    Data Cleaning and Preprocessing
+
+    - Create sequential data using a rolling window approach
+    - Reshape data for LSTM compatibility
+
+=== "Step 3"
+
+    Feature Engineering and Selection
+
+    - Use LSTM layers for sequence modeling
+    - Implement autoencoder-based reconstruction
+
+=== "Step 4"
+
+    Modeling
+
+    - Train an LSTM autoencoder
+    - Optimize loss function using Adam optimizer
+    - Monitor validation loss for overfitting prevention
+
+=== "Step 5"
+
+    Result Analysis
+
+    - Compute reconstruction error for anomaly detection
+    - Identify threshold for anomalies using percentile-based method
+    - Visualize detected anomalies using Matplotlib
+
+---
+
+#### PROJECT TRADE-OFFS AND SOLUTIONS
+
+=== "Trade Off 1"
+
+    **Reconstruction Error Threshold Selection:**
+    Setting a high threshold may miss subtle anomalies, while a low threshold might increase false positives.
+
+    - **Solution**: Use the 95th percentile of reconstruction errors as the threshold to balance false positives and false negatives.
+
+---
+
+### CONCLUSION
+
+#### WHAT YOU HAVE LEARNED
+
+!!! tip "Insights gained from the data"
+    - Time-series anomalies often appear as sudden deviations from normal patterns.
+
+??? tip "Improvements in understanding machine learning concepts"
+    - Learned about LSTM autoencoders and their ability to reconstruct normal sequences.
+
+??? tip "Challenges faced and how they were overcome"
+    - Handling high reconstruction errors by tuning model hyperparameters.
+    - Selecting an appropriate anomaly threshold using statistical methods.
+
+---
+
+#### USE CASES OF THIS MODEL
+
+=== "Application 1"
+
+    - Financial fraud detection through irregular transaction patterns.
+
+=== "Application 2"
+
+    - Predictive maintenance in industrial settings by identifying equipment failures.
+
+---
+
From 68284bdaee14dc87fdc532e3f62089c6d3735a43 Mon Sep 17 00:00:00 2001
From: that-ar-guy
Date: Tue, 18 Feb 2025 23:09:30 +0530
Subject: [PATCH 3/5] ss to be added

---
 .../deep-learning/anamoly-detection.md        | 145 ++++++++++--------
 1 file changed, 84 insertions(+), 61 deletions(-)

diff --git a/docs/projects/deep-learning/anamoly-detection.md b/docs/projects/deep-learning/anamoly-detection.md
index 5ac9d936..f0fe3e8c 100644
--- a/docs/projects/deep-learning/anamoly-detection.md
+++ b/docs/projects/deep-learning/anamoly-detection.md
@@ -1,28 +1,34 @@
-# Time-Series Anomaly Detection
+# 📜 Time-Series Anomaly Detection
 
-### AIM
+
+
+
+## 🎯 AIM
 
 To detect anomalies in time-series data using Long Short-Term Memory (LSTM) networks.
 
-### DATASET
+## 📊 DATASET LINK
+[NOT USED]
 
-Synthetic time-series data generated using sine wave with added noise.
-
-### KAGGLE NOTEBOOK
+## 📓 KAGGLE NOTEBOOK
 [https://www.kaggle.com/code/thatarguy/lstm-anamoly-detection/notebook](https://www.kaggle.com/code/thatarguy/lstm-anamoly-detection/notebook)
 
-### LIBRARIES NEEDED
+??? Abstract "Kaggle Notebook"
+
+
+
 
-    - numpy
-    - pandas
-    - yfinance
-    - matplotlib
-    - tensorflow
-    - scikit-learn
+## ⚙️ TECH STACK
+
+| **Category**             | **Technologies**                            |
+|--------------------------|---------------------------------------------|
+| **Languages**            | Python                                      |
+| **Libraries/Frameworks** | TensorFlow, Keras, scikit-learn, numpy, pandas, matplotlib |
+| **Tools**                | Jupyter Notebook, VS Code                   |
 
 ---
 
-### DESCRIPTION
+## 📝 DESCRIPTION
 
 !!! info "What is the requirement of the project?"
     - The project focuses on identifying anomalies in time-series data using an LSTM autoencoder. The model learns normal patterns and detects deviations indicating anomalies.
@@ -47,79 +53,98 @@ Synthetic time-series data generated using sine wave with added noise.
 
 ---
 
-### Model Architecture
-    - The LSTM autoencoder learns normal time-series behavior and reconstructs it. Any deviation is considered an anomaly.
-    - Encoder: Extracts patterns using LSTM layers.
-    - Bottleneck: Compresses the data representation.
-    - Decoder: Reconstructs the original sequence.
-    - The reconstruction error determines anomalies.
-
-### Model Structure
-    - Input: Time-series sequence (50 time steps)
-    - LSTM Layers for encoding
-    - Repeat Vector to retain sequence information
-    - LSTM Layers for decoding
-    - TimeDistributed Dense Layer for reconstruction
-    - Loss Function: Mean Squared Error (MSE)
+## 🔍 PROJECT EXPLANATION
+
+### 🧩 DATASET OVERVIEW & FEATURE DETAILS
+
+??? example "📂 Synthetic dataset"
+
+    - The dataset consists of a sine wave with added noise.
+
+    | Feature Name | Description | Datatype |
+    |--------------|-------------|:------------:|
+    | time | Timestamp | int64 |
+    | value | Sine wave value with noise | float64 |
 
 ---
 
-#### WHAT I HAVE DONE
+### 🛤 PROJECT WORKFLOW
 
-=== "Step 1"
+!!! success "Project workflow"
 
-    Exploratory Data Analysis
+    ``` mermaid
+      graph LR
+        A[Start] --> B{Generate Data};
+        B --> C[Normalize Data];
+        C --> D[Create Sequences];
+        D --> E[Train LSTM Autoencoder];
+        E --> F[Compute Reconstruction Error];
+        F --> G[Identify Anomalies];
+    ```
 
+=== "Step 1"
     - Generate synthetic data (sine wave with noise)
     - Normalize data using MinMaxScaler
     - Split data into training and validation sets
 
 === "Step 2"
-
-    Data Cleaning and Preprocessing
-
     - Create sequential data using a rolling window approach
     - Reshape data for LSTM compatibility
 
 === "Step 3"
+    - Implement LSTM autoencoder for anomaly detection
+    - Optimize model using Adam optimizer
 
-    Feature Engineering and Selection
+=== "Step 4"
+    - Compute reconstruction error for anomaly detection
+    - Identify threshold for anomalies using percentile-based method
 
-    - Use LSTM layers for sequence modeling
-    - Implement autoencoder-based reconstruction
+=== "Step 5"
+    - Visualize detected anomalies using Matplotlib
 
-=== "Step 4"
+---
 
-    Modeling
+### 🖥 CODE EXPLANATION
 
-    - Train an LSTM autoencoder
-    - Optimize loss function using Adam optimizer
-    - Monitor validation loss for overfitting prevention
+=== "LSTM Autoencoder"
+    - The model consists of an encoder, bottleneck, and decoder.
+    - It learns normal time-series behavior and reconstructs it.
+    - Deviations from normal patterns are considered anomalies.
 
-=== "Step 5"
+---
 
-    Result Analysis
+### ⚖️ PROJECT TRADE-OFFS AND SOLUTIONS
 
-    - Compute reconstruction error for anomaly detection
-    - Identify threshold for anomalies using percentile-based method
-    - Visualize detected anomalies using Matplotlib
+=== "Reconstruction Error Threshold Selection"
+    - Setting a high threshold may miss subtle anomalies, while a low threshold might increase false positives.
+    - **Solution**: Use the 95th percentile of reconstruction errors as the threshold to balance false positives and false negatives.
 
 ---
 
-#### PROJECT TRADE-OFFS AND SOLUTIONS
+## 🖼 SCREENSHOTS
 
-=== "Trade Off 1"
+!!! tip "Visualizations and EDA of different features"
 
-    **Reconstruction Error Threshold Selection:**
-    Setting a high threshold may miss subtle anomalies, while a low threshold might increase false positives.
+    === "Synthetic Data Plot"
+
 
-    - **Solution**: Use the 95th percentile of reconstruction errors as the threshold to balance false positives and false negatives.
+??? example "Model performance graphs"
+
+    === "Reconstruction Error Plot"
 
 ---
 
-### CONCLUSION
+## 📉 MODELS USED AND THEIR EVALUATION METRICS
 
-#### WHAT YOU HAVE LEARNED
+| Model            | Reconstruction Error (MSE) |
+|------------------|---------------------------|
+| LSTM Autoencoder | 0.015                     |
+
+---
+
+## ✅ CONCLUSION
+
+### 🔑 KEY LEARNINGS
 
 !!! tip "Insights gained from the data"
     - Time-series anomalies often appear as sudden deviations from normal patterns.
@@ -133,15 +158,13 @@ Synthetic time-series data generated using sine wave with added noise.
 
 ---
 
-#### USE CASES OF THIS MODEL
-
-=== "Application 1"
+### 🌍 USE CASES
 
-    - Financial fraud detection through irregular transaction patterns.
+=== "Financial Fraud Detection"
+    - Detect irregular transaction patterns using anomaly detection.
 
-=== "Application 2"
+=== "Predictive Maintenance"
+    - Identify equipment failures in industrial settings before they occur.
 
-    - Predictive maintenance in industrial settings by identifying equipment failures.
 
----
 
From a921c2f4d345da80a9bdb81986468d6d102247da Mon Sep 17 00:00:00 2001
From: Mohammed Abdul Rahman <130785777+that-ar-guy@users.noreply.github.com>
Date: Tue, 18 Feb 2025 23:13:29 +0530
Subject: [PATCH 4/5] added images need to check locally

---
 docs/projects/deep-learning/anamoly-detection.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/projects/deep-learning/anamoly-detection.md b/docs/projects/deep-learning/anamoly-detection.md
index f0fe3e8c..2cbf0954 100644
--- a/docs/projects/deep-learning/anamoly-detection.md
+++ b/docs/projects/deep-learning/anamoly-detection.md
@@ -126,12 +126,13 @@ To detect anomalies in time-series data using Long Short-Term Memory (LSTM) netw
 !!! tip "Visualizations and EDA of different features"
 
     === "Synthetic Data Plot"
-
+        ![img](https://github.com/user-attachments/assets/4ff144a9-756a-43e3-aba2-609d92cbacd2)
+
 
 ??? example "Model performance graphs"
 
     === "Reconstruction Error Plot"
-
+        ![img](https://github.com/user-attachments/assets/e33a0537-9e23-4e21-b0e5-153a78ac4000)
 ---
 
 ## 📉 MODELS USED AND THEIR EVALUATION METRICS
From ec659bc5ef11983dea81716b3b1a7468f6dc4021 Mon Sep 17 00:00:00 2001
From: that-ar-guy
Date: Tue, 18 Feb 2025 23:15:26 +0530
Subject: [PATCH 5/5] images are correct

---
 docs/projects/deep-learning/anamoly-detection.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/projects/deep-learning/anamoly-detection.md b/docs/projects/deep-learning/anamoly-detection.md
index 2cbf0954..5615d8b4 100644
--- a/docs/projects/deep-learning/anamoly-detection.md
+++ b/docs/projects/deep-learning/anamoly-detection.md
@@ -126,13 +126,13 @@ To detect anomalies in time-series data using Long Short-Term Memory (LSTM) netw
 !!! tip "Visualizations and EDA of different features"
 
     === "Synthetic Data Plot"
-        ![img](https://github.com/user-attachments/assets/4ff144a9-756a-43e3-aba2-609d92cbacd2)
+        ![img](https://github.com/user-attachments/assets/e33a0537-9e23-4e21-b0e5-153a78ac4000)
 
 
 ??? example "Model performance graphs"
 
     === "Reconstruction Error Plot"
-        ![img](https://github.com/user-attachments/assets/e33a0537-9e23-4e21-b0e5-153a78ac4000)
+        ![img](https://github.com/user-attachments/assets/4ff144a9-756a-43e3-aba2-609d92cbacd2)
 ---
 
 ## 📉 MODELS USED AND THEIR EVALUATION METRICS
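
Reviewer note (not part of the patch series): the page's trade-off section picks the 95th percentile of reconstruction errors as the anomaly threshold. A minimal, dependency-free Python sketch of that rule follows, under loud assumptions. The window length of 50 comes from the page, but `make_windows`, `mse`, `percentile`, the sine frequency, the noise level, the injected spike, and above all the use of the noise-free sine as a stand-in for the autoencoder's reconstruction are hypothetical choices made for illustration. It is not the notebook's code and it trains no LSTM.

```python
import math
import random


def make_windows(series, window=50):
    """Slice a 1-D series into overlapping windows of length `window` (stride 1)."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Synthetic data: sine wave plus Gaussian noise (parameters are assumptions).
random.seed(0)
n = 500
clean = [math.sin(0.1 * t) for t in range(n)]
noisy = [v + random.gauss(0.0, 0.05) for v in clean]
noisy[300] += 3.0  # hypothetical injected anomaly

# ASSUMPTION: the noise-free sine stands in for the autoencoder's output.
windows = make_windows(noisy)
reconstructions = make_windows(clean)
errors = [mse(w, r) for w, r in zip(windows, reconstructions)]

# Percentile rule from the page: flag windows whose error exceeds the 95th percentile.
threshold = percentile(errors, 95)
anomalous = [i for i, e in enumerate(errors) if e > threshold]
```

Under these assumptions, only windows that overlap the injected spike are flagged. One consequence that may be worth stating on the page: a percentile threshold computed on the same errors it is applied to flags roughly 5% of windows by construction, even on clean data, so deriving the threshold from errors on known-normal data may be a safer variant.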