diff --git a/docs/algorithms/deep-learning/neural-networks/multilayer-perceptron.md b/docs/algorithms/deep-learning/neural-networks/multilayer-perceptron.md
new file mode 100644
index 00000000..62412ed7
--- /dev/null
+++ b/docs/algorithms/deep-learning/neural-networks/multilayer-perceptron.md
@@ -0,0 +1,133 @@
+![image](https://github.com/user-attachments/assets/018c5462-5977-415f-8600-65f5560722fd)
+
+# Multilayer Perceptron (MLP)
+
+---
+
+## **What is a Multilayer Perceptron (MLP)?**
+
+A **Multilayer Perceptron (MLP)** is a type of **artificial neural network (ANN)** that consists of multiple layers of neurons, designed to learn and map relationships between input data and output predictions. It is a foundational building block of deep learning.
+
+### **Key Characteristics of MLP**:
+- **Fully Connected Layers**: Each neuron in one layer is connected to every neuron in the next layer.
+- **Non-linear Activation Functions**: Introduce non-linearity so the model can learn complex patterns.
+- **Supervised Learning**: Typically trained on labeled data with **backpropagation** and optimization algorithms such as **Stochastic Gradient Descent (SGD)** or **Adam**.
+
+---
+
+## **Architecture of MLP**
+
+An MLP consists of three main types of layers:
+
+1. **Input Layer**:
+   - Accepts the input features (e.g., pixels of an image, numerical data).
+   - Each neuron corresponds to one input feature.
+
+2. **Hidden Layers**:
+   - Perform intermediate computations to learn the patterns and relationships in the data.
+   - An MLP can have one or more hidden layers, depending on the complexity of the problem.
+
+3. **Output Layer**:
+   - Produces the final prediction.
+   - The number of neurons corresponds to the number of output classes for classification tasks, or is a single neuron for regression tasks.
+
+### **Flow of Data in MLP**:
+1. **Linear transformation**: \( z = W \cdot x + b \)
+   - \( W \): Weight matrix
+   - \( x \): Input
+   - \( b \): Bias
+2. **Non-linear activation**: \( a = f(z) \), where \( f \) is an activation function (e.g., ReLU, sigmoid, or tanh).
+
+---
+
+## **Applications of Multilayer Perceptron**
+
+### **Classification**:
+- Handwritten digit recognition (e.g., the MNIST dataset).
+- Sentiment analysis of text.
+- Image classification for small datasets.
+
+### **Regression**:
+- Predicting house prices based on features such as area and location.
+- Forecasting time-series data such as stock prices or weather.
+
+### **Healthcare**:
+- Disease diagnosis based on patient records.
+- Predicting patient outcomes in hospitals.
+
+### **Finance**:
+- Fraud detection in credit card transactions.
+- Risk assessment and loan approval.
+
+### **Speech and Audio**:
+- Voice recognition.
+- Music genre classification.
+
+---
+
+## **Key Concepts in MLP**
+
+### **1. Activation Functions**:
+- Introduce non-linearity into the model, enabling it to learn complex patterns.
+- Commonly used:
+  - **ReLU (Rectified Linear Unit)**: \( f(x) = \max(0, x) \)
+  - **Sigmoid**: \( f(x) = \frac{1}{1 + e^{-x}} \)
+  - **Tanh**: \( f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \)
+
+### **2. Loss Functions**:
+- Measure the difference between predicted and actual values.
+- Common examples:
+  - **Mean Squared Error (MSE)**: Used for regression.
+  - **Categorical Crossentropy**: Used for classification.
+
+### **3. Backpropagation**:
+- A technique used to compute gradients for updating the weights.
+- Consists of (see the sketch after this list):
+  1. **Forward pass**: Calculate the output.
+  2. **Backward pass**: Compute gradients using the chain rule.
+  3. **Weight update**: Adjust the weights using an optimizer.
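+
+The forward-pass and backpropagation steps above can be illustrated with a minimal NumPy sketch. This is an added illustration rather than part of the Keras workflow shown later; the layer sizes, sigmoid activation, MSE loss, and learning rate are arbitrary assumptions.
+
+```python
+import numpy as np
+
+# Tiny MLP: 3 inputs -> 4 hidden units -> 1 output (illustrative sizes).
+rng = np.random.default_rng(0)
+x = rng.normal(size=(3, 1))          # input column vector
+y = np.array([[1.0]])                # target
+
+W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))
+W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))
+sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
+
+# Forward pass: linear transformation z = W·x + b followed by activation a = f(z).
+z1 = W1 @ x + b1
+a1 = sigmoid(z1)
+z2 = W2 @ a1 + b2
+a2 = sigmoid(z2)
+loss = np.mean((a2 - y) ** 2)        # MSE loss
+
+# Backward pass: gradients via the chain rule.
+dz2 = 2 * (a2 - y) * a2 * (1 - a2)
+dW2, db2 = dz2 @ a1.T, dz2
+dz1 = (W2.T @ dz2) * a1 * (1 - a1)
+dW1, db1 = dz1 @ x.T, dz1
+
+# Weight update: one step of plain gradient descent (the "optimizer").
+lr = 0.1
+W1 -= lr * dW1; b1 -= lr * db1
+W2 -= lr * dW2; b2 -= lr * db2
+print(f"loss before update: {loss:.4f}")
+```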
+
+### **4. Optimizers**:
+- Algorithms that adjust the weights to minimize the loss function and improve the model.
+- Examples: **SGD**, **Adam**, **RMSprop**.
+
+---
+
+## **Advantages of MLP**
+- Can model non-linear relationships between inputs and outputs.
+- Versatile for solving both classification and regression problems.
+- Can approximate any continuous function (Universal Approximation Theorem).
+
+---
+
+## **Limitations of MLP**
+- Computationally expensive for large datasets.
+- Prone to overfitting if not regularized properly.
+- Less effective for image or sequential data without specialized architectures (e.g., CNNs, RNNs).
+
+---
+
+## **Code Example: Implementing MLP Using Keras**
+
+```python
+from tensorflow.keras.models import Sequential
+from tensorflow.keras.layers import Dense
+
+# Build the MLP
+model = Sequential([
+    Dense(128, activation='relu', input_shape=(20,)),  # First hidden layer (expects 20 input features)
+    Dense(64, activation='relu'),                      # Second hidden layer
+    Dense(1, activation='sigmoid')                     # Output layer (binary classification)
+])
+
+# Compile the model
+model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
+
+# Summary of the model
+model.summary()
+```
+
+---
+
+# **Applications in Real-world Projects**
+* Use an MLP for datasets in tabular or vector format (e.g., CSV files).
+* Fine-tune the architecture by adjusting the number of neurons and layers based on your dataset.
diff --git a/docs/algorithms/deep-learning/neural-networks/recurrent-neural-networks.md b/docs/algorithms/deep-learning/neural-networks/recurrent-neural-networks.md
new file mode 100644
index 00000000..221ab80f
--- /dev/null
+++ b/docs/algorithms/deep-learning/neural-networks/recurrent-neural-networks.md
@@ -0,0 +1,128 @@
+# Recurrent Neural Networks (RNN)
+
+---
+
+## **What is a Recurrent Neural Network (RNN)?**
+
+A **Recurrent Neural Network (RNN)** is a type of artificial neural network designed for modeling **sequential data**. Unlike traditional feedforward networks, RNNs can retain information from previous time steps, making them well-suited for tasks involving temporal or sequential relationships.
+
+### **Key Characteristics of RNN**:
+- **Sequential Processing**: Processes inputs sequentially, one step at a time.
+- **Memory Capability**: Uses hidden states to store information about previous steps.
+- **Shared Weights**: The same weights are applied across all time steps, reducing complexity.
+
+---
+
+## **Architecture of RNN**
+
+### **Components of RNN**:
+1. **Input Layer**:
+   - Accepts sequential input data (e.g., time-series data, text, or audio signals).
+
+2. **Hidden Layer with Recurrence**:
+   - Maintains a **hidden state** \( h_t \), which is updated at each time step based on the input and the previous hidden state (see the sketch after this list).
+   - Formula:
+     \[
+     h_t = f(W_h \cdot h_{t-1} + W_x \cdot x_t + b)
+     \]
+     Where:
+     - \( h_t \): Current hidden state.
+     - \( h_{t-1} \): Previous hidden state.
+     - \( x_t \): Input at time step \( t \).
+     - \( W_h, W_x \): Weight matrices.
+     - \( b \): Bias.
+     - \( f \): Activation function (e.g., tanh or ReLU).
+
+3. **Output Layer**:
+   - Produces output based on the current hidden state.
+   - Formula:
+     \[
+     y_t = g(W_y \cdot h_t + c)
+     \]
+     Where:
+     - \( y_t \): Output at time step \( t \).
+     - \( W_y \): Output weight matrix.
+     - \( c \): Output bias.
+     - \( g \): Activation function (e.g., softmax or sigmoid).
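+
+The recurrence and output formulas above can be made concrete with a small NumPy sketch. This is an added illustration; the hidden size, sequence length, random weights, and the choice of tanh/sigmoid are assumptions for demonstration only.
+
+```python
+import numpy as np
+
+# Toy RNN cell: 1 input feature, 8 hidden units, 1 output, over 10 time steps.
+rng = np.random.default_rng(0)
+W_x = rng.normal(scale=0.1, size=(8, 1))   # input-to-hidden weights
+W_h = rng.normal(scale=0.1, size=(8, 8))   # hidden-to-hidden weights (shared across steps)
+W_y = rng.normal(scale=0.1, size=(1, 8))   # hidden-to-output weights
+b, c = np.zeros((8, 1)), np.zeros((1, 1))
+
+x_seq = rng.normal(size=(10, 1, 1))        # sequence of 10 inputs
+h = np.zeros((8, 1))                       # initial hidden state
+
+for x_t in x_seq:
+    # h_t = f(W_h · h_{t-1} + W_x · x_t + b), with f = tanh
+    h = np.tanh(W_h @ h + W_x @ x_t + b)
+    # y_t = g(W_y · h_t + c), with g = sigmoid here
+    y_t = 1.0 / (1.0 + np.exp(-(W_y @ h + c)))
+
+print("final output:", y_t.item())
+```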
+
+---
+
+## **Types of RNNs**
+
+### **1. Vanilla RNN**:
+- Standard RNN that processes sequential data using the hidden state.
+- Struggles with long-term dependencies due to the **vanishing gradient problem**.
+
+### **2. Long Short-Term Memory (LSTM)**:
+- A specialized type of RNN that can learn long-term dependencies by using **gates** to control the flow of information.
+- Components:
+  - **Forget Gate**: Decides what to forget.
+  - **Input Gate**: Decides what to store.
+  - **Output Gate**: Controls the output.
+
+### **3. Gated Recurrent Unit (GRU)**:
+- A simplified version of the LSTM that combines the forget and input gates into a single **update gate**.
+
+---
+
+## **Applications of RNN**
+
+### **1. Natural Language Processing (NLP)**:
+- Text generation (e.g., predictive typing, chatbots).
+- Sentiment analysis.
+- Language translation.
+
+### **2. Time-Series Analysis**:
+- Stock price prediction.
+- Weather forecasting.
+- Energy demand forecasting.
+
+### **3. Speech and Audio Processing**:
+- Speech-to-text transcription.
+- Music generation.
+
+### **4. Video Analysis**:
+- Video captioning.
+- Action recognition.
+
+---
+
+## **Advantages of RNN**
+- Can handle sequential and time-dependent data.
+- Shared weights reduce model complexity.
+- Effective for tasks with context dependencies, such as language modeling.
+
+---
+
+## **Limitations of RNN**
+- **Vanishing Gradient Problem**:
+  - Makes it difficult to learn long-term dependencies.
+- Computationally expensive for long sequences.
+- Struggles with parallelization compared to other architectures such as CNNs.
+
+---
+
+## **Code Example: Implementing RNN Using Keras**
+
+```python
+from tensorflow.keras.models import Sequential
+from tensorflow.keras.layers import SimpleRNN, Dense
+
+# Build the RNN model
+model = Sequential([
+    SimpleRNN(128, activation='tanh', input_shape=(10, 1)),  # 10 timesteps, 1 feature
+    Dense(1, activation='sigmoid')                           # Output layer (binary classification)
+])
+
+# Compile the model
+model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
+
+# Summary of the model
+model.summary()
+```
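+
+As a usage sketch (not part of the original example), the `model` compiled above could be trained on dummy data shaped `(samples, timesteps, features)`; the array sizes, epoch count, and batch size below are placeholder assumptions.
+
+```python
+import numpy as np
+
+# Dummy dataset: 200 sequences of 10 timesteps with 1 feature each, binary labels.
+X_train = np.random.rand(200, 10, 1)
+y_train = np.random.randint(0, 2, size=(200,))
+
+# Train the `model` compiled in the block above.
+model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.2)
+```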
+
+---
+
+# **Applications in Real-world Projects**
+* Use RNNs for tasks involving sequential data where past information influences future predictions.
+* Prefer LSTM or GRU over a vanilla RNN for learning long-term dependencies.
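+
+Following the last recommendation, the `SimpleRNN` layer in the example above can be swapped for an `LSTM` (or `GRU`) layer with no other changes. A minimal sketch, assuming the same toy input shape:
+
+```python
+from tensorflow.keras.models import Sequential
+from tensorflow.keras.layers import LSTM, GRU, Dense
+
+# Same architecture as the SimpleRNN example, but with an LSTM cell
+# (use GRU instead of LSTM for the lighter-weight gated variant).
+model = Sequential([
+    LSTM(128, activation='tanh', input_shape=(10, 1)),
+    Dense(1, activation='sigmoid')
+])
+
+model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
+model.summary()
+```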