133 changes: 133 additions & 0 deletions docs/algorithms/deep-learning/neural-networks/multilayer-perceptron.md
![image](https://github.com/user-attachments/assets/018c5462-5977-415f-8600-65f5560722fd)

# Multilayer Perceptron (MLP)

---

## **What is a Multilayer Perceptron (MLP)?**

A **Multilayer Perceptron (MLP)** is a type of **artificial neural network (ANN)** that consists of multiple layers of neurons and learns to map input data to output predictions. It is a foundational building block of deep learning.

### **Key Characteristics of MLP**:
- **Fully Connected Layers**: Each neuron in one layer is connected to every neuron in the next layer.
- **Non-linear Activation Functions**: Introduces non-linearity to help the model learn complex patterns.
- **Supervised Learning**: Typically trained using labeled data with **backpropagation** and optimization algorithms like **Stochastic Gradient Descent (SGD)** or **Adam**.

---

## **Architecture of MLP**

An MLP consists of three main types of layers:

1. **Input Layer**:
- Accepts the input features (e.g., pixels of an image, numerical data).
   - Each neuron corresponds to one input feature.

2. **Hidden Layers**:
- Perform intermediate computations to learn the patterns and relationships in data.
- Can have one or more layers depending on the complexity of the problem.

3. **Output Layer**:
- Produces the final prediction.
- The number of neurons corresponds to the number of output classes (for classification tasks) or a single neuron for regression tasks.

### **Flow of Data in MLP**:
1. **Linear transformation**: \( z = W \cdot x + b \)
- \( W \): Weight matrix
- \( x \): Input
- \( b \): Bias
2. **Non-linear activation**: \( a = f(z) \), where \( f \) is an activation function (e.g., ReLU, sigmoid, or tanh).
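
As a rough illustration, this forward pass can be sketched in a few lines of NumPy. The layer sizes, random weights, and the choice of ReLU below are illustrative assumptions, not part of the MLP definition:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                         # input with 4 features (illustrative)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)  # hidden layer: 3 neurons
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # output layer: 1 neuron

z1 = W1 @ x + b1   # linear transformation: z = W · x + b
a1 = relu(z1)      # non-linear activation: a = f(z)
z2 = W2 @ a1 + b2  # the output layer repeats the same pattern
```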

---

## **Applications of Multilayer Perceptron**

### **Classification**:
- Handwritten digit recognition (e.g., MNIST dataset).
- Sentiment analysis of text.
- Image classification for small datasets.

### **Regression**:
- Predicting house prices based on features like area, location, etc.
- Forecasting time-series data such as stock prices or weather.

### **Healthcare**:
- Disease diagnosis based on patient records.
- Predicting patient outcomes in hospitals.

### **Finance**:
- Fraud detection in credit card transactions.
- Risk assessment and loan approval.

### **Speech and Audio**:
- Voice recognition.
- Music genre classification.

---

## **Key Concepts in MLP**

### **1. Activation Functions**:
- Introduce non-linearity into the model, enabling it to learn complex patterns.
- Commonly used:
- **ReLU (Rectified Linear Unit)**: \( f(x) = \max(0, x) \)
- **Sigmoid**: \( f(x) = \frac{1}{1 + e^{-x}} \)
- **Tanh**: \( f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \)
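
A minimal NumPy sketch of these three activations (the test values are arbitrary):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)  # same as (e^x - e^-x) / (e^x + e^-x)

z = np.array([-2.0, 0.0, 3.0])  # arbitrary test values
print(relu(z), sigmoid(z), tanh(z))
```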

### **2. Loss Functions**:
- Measure the difference between the predicted and actual values.
- Common examples:
- **Mean Squared Error (MSE)**: Used for regression.
- **Categorical Crossentropy**: Used for classification.
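
Both losses can be sketched directly in NumPy. This sketch assumes one-hot labels and predicted class probabilities for the cross-entropy case, and the sample values are made up for illustration:

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels, y_pred: predicted class probabilities
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])
print(mse(y_true, y_pred), categorical_crossentropy(y_true, y_pred))
```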

### **3. Backpropagation**:
- A technique used to compute gradients for updating weights.
- Consists of:
1. **Forward pass**: Calculate the output.
2. **Backward pass**: Compute gradients using the chain rule.
3. **Weight update**: Optimize weights using an optimizer.
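
A toy sketch of one backpropagation step for a single linear neuron trained with MSE (the input, target, and learning rate are arbitrary; a real MLP applies the same chain rule layer by layer):

```python
import numpy as np

x = np.array([1.0, 2.0])           # input
y = 3.0                            # target
w, b = np.array([0.5, -0.5]), 0.0  # parameters
lr = 0.1                           # learning rate

# 1. Forward pass
y_pred = w @ x + b
loss = (y_pred - y) ** 2

# 2. Backward pass (chain rule): dL/dw = 2*(y_pred - y)*x, dL/db = 2*(y_pred - y)
grad_w = 2 * (y_pred - y) * x
grad_b = 2 * (y_pred - y)

# 3. Weight update (plain gradient descent)
w -= lr * grad_w
b -= lr * grad_b
```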

### **4. Optimizers**:
- Algorithms that adjust the weights to minimize the loss function and improve the model.
- Examples: **SGD**, **Adam**, **RMSprop** (a configuration sketch follows below).
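
In Keras, the optimizer is configured once and passed to `model.compile`; a minimal sketch, with learning rates chosen only for illustration:

```python
from tensorflow.keras.optimizers import SGD, Adam, RMSprop

# Any of these objects can be passed to model.compile(optimizer=...)
sgd = SGD(learning_rate=0.01, momentum=0.9)
adam = Adam(learning_rate=0.001)
rmsprop = RMSprop(learning_rate=0.001)
```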

---

## **Advantages of MLP**
- Can model non-linear relationships between inputs and outputs.
- Versatile for solving both classification and regression problems.
- Can approximate any continuous function to arbitrary accuracy given enough hidden neurons (Universal Approximation Theorem).

---

## **Limitations of MLP**
- Computationally expensive for large datasets.
- Prone to overfitting if not regularized properly.
- Less effective for image or sequential data without specialized architectures (e.g., CNNs, RNNs).

---

## **Code Example: Implementing MLP Using Keras**

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Build the MLP
model = Sequential([
    Dense(128, activation='relu', input_shape=(20,)),  # first hidden layer; expects 20 input features
    Dense(64, activation='relu'),                      # second hidden layer
    Dense(1, activation='sigmoid')                     # output layer (binary classification)
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Summary of the model
model.summary()
```
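
A hedged usage sketch: training the model above on synthetic data shaped to match `input_shape=(20,)`. The sample count, epochs, and batch size are arbitrary choices:

```python
import numpy as np

# Synthetic binary-classification data: 500 samples, 20 features
X = np.random.rand(500, 20)
y = np.random.randint(0, 2, size=500)

model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
loss, accuracy = model.evaluate(X, y)
```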
---

# **Applications in Real-world Projects**
* Use MLP for datasets where data is in tabular or vector format (e.g., CSV files).
* Fine-tune the architecture by adjusting the number of neurons and layers based on your dataset.
# Recurrent Neural Networks (RNN)

---

## **What is a Recurrent Neural Network (RNN)?**

A **Recurrent Neural Network (RNN)** is a type of artificial neural network designed for modeling **sequential data**. Unlike traditional feedforward networks, RNNs have the capability to remember information from previous time steps, making them well-suited for tasks involving temporal or sequential relationships.

### **Key Characteristics of RNN**:
- **Sequential Processing**: Processes inputs sequentially, one step at a time.
- **Memory Capability**: Uses hidden states to store information about previous steps.
- **Shared Weights**: The same weights are applied across all time steps, reducing complexity.

---

## **Architecture of RNN**

### **Components of RNN**:
1. **Input Layer**:
- Accepts sequential input data (e.g., time-series data, text, or audio signals).

2. **Hidden Layer with Recurrence**:
- Maintains a **hidden state** \( h_t \), which is updated at each time step based on the input and the previous hidden state.
- Formula:
\[
h_t = f(W_h \cdot h_{t-1} + W_x \cdot x_t + b)
\]
Where:
- \( h_t \): Current hidden state.
- \( h_{t-1} \): Previous hidden state.
- \( x_t \): Input at time step \( t \).
- \( W_h, W_x \): Weight matrices.
- \( b \): Bias.
- \( f \): Activation function (e.g., tanh or ReLU).

3. **Output Layer**:
- Produces output based on the current hidden state.
- Formula:
\[
y_t = g(W_y \cdot h_t + c)
\]
Where:
- \( y_t \): Output at time step \( t \).
- \( W_y \): Output weight matrix.
- \( c \): Output bias.
- \( g \): Activation function (e.g., softmax or sigmoid).
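
A minimal NumPy sketch of this recurrence over a toy sequence; the hidden size, random weights, and the tanh/sigmoid pairing are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 1
W_h = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights
W_x = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
b = np.zeros(hidden_size)
W_y, c = rng.normal(size=(1, hidden_size)), np.zeros(1)

x_seq = rng.normal(size=(10, input_size))  # 10 time steps, 1 feature each
h = np.zeros(hidden_size)                  # initial hidden state

for x_t in x_seq:
    h = np.tanh(W_h @ h + W_x @ x_t + b)   # h_t = f(W_h · h_{t-1} + W_x · x_t + b)
    y_t = sigmoid(W_y @ h + c)             # y_t = g(W_y · h_t + c)
```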

---

## **Types of RNNs**

### **1. Vanilla RNN**:
- Standard RNN that processes sequential data using the hidden state.
- Struggles with long-term dependencies due to the **vanishing gradient problem**.

### **2. Long Short-Term Memory (LSTM)**:
- A specialized type of RNN that can learn long-term dependencies by using **gates** to control the flow of information.
- Components:
- **Forget Gate**: Decides what to forget.
- **Input Gate**: Decides what to store.
- **Output Gate**: Controls the output.

### **3. Gated Recurrent Unit (GRU)**:
- A simplified version of LSTM that combines the forget and input gates into a single **update gate**.

---

## **Applications of RNN**

### **1. Natural Language Processing (NLP)**:
- Text generation (e.g., predictive typing, chatbots).
- Sentiment analysis.
- Language translation.

### **2. Time-Series Analysis**:
- Stock price prediction.
- Weather forecasting.
- Energy demand forecasting.

### **3. Speech and Audio Processing**:
- Speech-to-text transcription.
- Music generation.

### **4. Video Analysis**:
- Video captioning.
- Action recognition.

---

## **Advantages of RNN**
- Can handle sequential and time-dependent data.
- Shared weights reduce model complexity.
- Effective for tasks with context dependencies, such as language modeling.

---

## **Limitations of RNN**
- **Vanishing Gradient Problem**:
- Makes it difficult to learn long-term dependencies.
- Computationally expensive for long sequences.
- Struggles with parallelization compared to other architectures like CNNs.

---

## **Code Example: Implementing RNN Using Keras**

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Build the RNN model
model = Sequential([
    SimpleRNN(128, activation='tanh', input_shape=(10, 1)),  # recurrent layer; 10 timesteps, 1 feature
    Dense(1, activation='sigmoid')                           # output layer (binary classification)
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Summary of the model
model.summary()
```
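
A hedged usage sketch: fitting the model above on synthetic sequences shaped to match `input_shape=(10, 1)`; the sample count, epochs, and batch size are arbitrary:

```python
import numpy as np

# Synthetic sequence data: 200 samples, 10 timesteps, 1 feature
X = np.random.rand(200, 10, 1)
y = np.random.randint(0, 2, size=200)

model.fit(X, y, epochs=5, batch_size=16, validation_split=0.2)
```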

---

# **Applications in Real-world Projects**
* Use RNN for tasks involving sequential data where past information impacts the future.
* Prefer LSTM or GRU over a vanilla RNN for learning long-term dependencies (see the sketch below).
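
As a minimal sketch (not a tuned model), here is the same architecture with an LSTM layer in place of `SimpleRNN`; swap `LSTM` for `GRU` to use a GRU instead:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense  # swap LSTM for GRU to try a GRU layer

model = Sequential([
    LSTM(128, input_shape=(10, 1)),  # gated recurrent layer; handles longer dependencies than SimpleRNN
    Dense(1, activation='sigmoid')   # output layer (binary classification)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

The gating mechanism is what allows LSTM and GRU layers to retain information over longer sequences than a vanilla RNN.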