diff --git a/docs/algorithms/machine-learning/supervised/classifications/index.md b/docs/algorithms/machine-learning/supervised/classifications/index.md index 06267036..eef10a31 100644 --- a/docs/algorithms/machine-learning/supervised/classifications/index.md +++ b/docs/algorithms/machine-learning/supervised/classifications/index.md @@ -1,5 +1,4 @@ # Classification Algorithms 🤖 -
@@ -8,4 +7,4 @@

There are no items available at this time. Check back again later.

-
+
\ No newline at end of file diff --git a/docs/algorithms/machine-learning/supervised/regressions/Logistic_Regression.md b/docs/algorithms/machine-learning/supervised/regressions/Logistic_Regression.md deleted file mode 100644 index d265be0a..00000000 --- a/docs/algorithms/machine-learning/supervised/regressions/Logistic_Regression.md +++ /dev/null @@ -1,124 +0,0 @@ -# Logistic Regression - -This module contains an implementation of Logistic Regression, a popular algorithm for binary classification. - -## Parameters - -- `learning_rate`: Step size for gradient descent. -- `n_iterations`: Number of iterations for gradient descent. - -## Scratch Code - -- logistic_regression.py file - -```py -import numpy as np - -class LogisticRegression: - def __init__(self, learning_rate=0.01, n_iterations=1000): - """ - Constructor for the LogisticRegression class. - - Parameters: - - learning_rate: The step size for gradient descent. - - n_iterations: The number of iterations for gradient descent. - """ - self.learning_rate = learning_rate - self.n_iterations = n_iterations - self.weights = None - self.bias = None - - def _sigmoid(self, z): - """ - Sigmoid activation function. - - Parameters: - - z: Linear combination of input features and weights. - - Returns: - - Sigmoid of z. - """ - return 1 / (1 + np.exp(-z)) - - def _initialize_parameters(self, n_features): - """ - Initialize weights and bias. - - Parameters: - - n_features: Number of input features. - - Returns: - - Initialized weights and bias. - """ - self.weights = np.zeros(n_features) - self.bias = 0 - - def fit(self, X, y): - """ - Fit the Logistic Regression model to the input data. - - Parameters: - - X: Input features (numpy array). - - y: Target labels (numpy array). - """ - n_samples, n_features = X.shape - self._initialize_parameters(n_features) - - for _ in range(self.n_iterations): - # Linear combination of features and weights - linear_combination = np.dot(X, self.weights) + self.bias - - # Predictions using the sigmoid function - predictions = self._sigmoid(linear_combination) - - # Update weights and bias using gradient descent - dw = (1 / n_samples) * np.dot(X.T, (predictions - y)) - db = (1 / n_samples) * np.sum(predictions - y) - - self.weights -= self.learning_rate * dw - self.bias -= self.learning_rate * db - - def predict(self, X): - """ - Make predictions on new data. - - Parameters: - - X: Input features for prediction (numpy array). - - Returns: - - Predicted labels (numpy array). 
- """ - linear_combination = np.dot(X, self.weights) + self.bias - predictions = self._sigmoid(linear_combination) - - # Convert probabilities to binary predictions (0 or 1) - return np.round(predictions) -``` - -- logistic_regression_test.py file - -```py -import numpy as np -import unittest -from LogisticRegression import LogisticRegression - -class TestLogisticRegression(unittest.TestCase): - def setUp(self): - # Generate synthetic data for testing - np.random.seed(42) - self.X_train = np.random.rand(100, 2) - self.y_train = (np.random.rand(100) > 0.5).astype(int) - - self.X_test = np.random.rand(20, 2) - - def test_fit_predict(self): - model = LogisticRegression(learning_rate=0.01, n_iterations=1000) - model.fit(self.X_train, self.y_train) - predictions = model.predict(self.X_test) - - self.assertEqual(predictions.shape, (20,)) - self.assertTrue(np.all(predictions == 0) or np.all(predictions == 1)) - -if __name__ == '__main__': - unittest.main() -``` diff --git a/docs/algorithms/machine-learning/supervised/regressions/index.md b/docs/algorithms/machine-learning/supervised/regressions/index.md index 9fd9026a..85452e30 100644 --- a/docs/algorithms/machine-learning/supervised/regressions/index.md +++ b/docs/algorithms/machine-learning/supervised/regressions/index.md @@ -21,6 +21,14 @@

📅 2025-01-19 | ⏱️ 3 mins

- + + + +
+

Logistic Regression

+

Classifying data into discrete categories.

+

📅 2025-01-19 | ⏱️ 2 mins

+
+
diff --git a/docs/algorithms/machine-learning/supervised/regressions/logistic-regression.md b/docs/algorithms/machine-learning/supervised/regressions/logistic-regression.md new file mode 100644 index 00000000..f39f703f --- /dev/null +++ b/docs/algorithms/machine-learning/supervised/regressions/logistic-regression.md @@ -0,0 +1,174 @@ +# 🧮 Logistic Regression Algorithm + +
*Logistic Regression Poster*
## 🎯 Objective
Logistic Regression is a supervised learning algorithm used for classification tasks. It predicts the probability of a data point belonging to a particular class, mapping the input to a value between 0 and 1 using a logistic (sigmoid) function.

## 📚 Prerequisites
- Basic understanding of Linear Algebra and Probability.
- Familiarity with the concept of classification.
- Libraries: NumPy, Pandas, Matplotlib, Scikit-learn.

---

## 🧩 Inputs
- **Input Dataset**: A structured dataset with features (independent variables) and corresponding labels (dependent variable).
- The dependent variable should be categorical (binary or multiclass).
- Example: A CSV file with columns like `age`, `income`, and `purchased` (label).

## 📤 Outputs
- **Predicted Class**: The model outputs the probability of each data point belonging to a class, which is thresholded (typically at 0.5) to assign a label.
- **Binary Classification**: Outputs 0 or 1 (e.g., Yes or No).
- **Multiclass Classification**: Outputs probabilities for multiple categories.

---

## 🏛️ Algorithm Architecture

### 1. Hypothesis Function
The hypothesis function of Logistic Regression applies the sigmoid function:

\[
h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}
\]

---

### 2. Cost Function
The cost function used in Logistic Regression is the log-loss (or binary cross-entropy):

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]
\]

---

### 3. Gradient Descent
The parameters of the logistic regression model are updated using the gradient descent algorithm:

\[
\theta := \theta - \alpha \frac{\partial J(\theta)}{\partial \theta}
\]

A from-scratch sketch of this update appears after the code explanation below.

---

## 🏋️‍♂️ Training Process
- **Model**: The `LogisticRegression` model from scikit-learn.
- **Validation Strategy**: A separate portion of the dataset can be reserved for validation (e.g., 20%), but this is not explicitly implemented in the current code.
- **Training Data**: The model is trained on the entire provided dataset.

---

## 📊 Evaluation Metrics
- Accuracy is used to evaluate the classification performance of the model.

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]

Where:

- **TP**: True Positives
- **TN**: True Negatives
- **FP**: False Positives
- **FN**: False Negatives

---

## 💻 Code Implementation

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate an example dataset
np.random.seed(42)
X = np.random.rand(100, 2)                # Features
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # Labels: 0 or 1 based on the sum of the features

# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X, y)

# Predictions
y_pred = model.predict(X)
accuracy = accuracy_score(y, y_pred)

# Output accuracy
print("Accuracy:", accuracy)
```

## 🔍 Code Explanation
1. **Dataset Generation**:

    - A random dataset with 100 samples and 2 features is created.

    - Labels (`y`) are binary, determined by whether the sum of the feature values is greater than 1.

2. **Model Training**:

    - The `LogisticRegression` model from `sklearn` is initialized and trained on the dataset using the `fit` method.

3. **Predictions**:

    - The model predicts labels for the input data (`X`) using the `predict` method.

    - The `accuracy_score` function evaluates the accuracy of the predictions (here measured on the training data itself).

4. **Output**:

    - The calculated accuracy is printed to the console.
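### 🔧 From-Scratch Sketch (NumPy)

To connect the formulas in the Algorithm Architecture section with code, below is a minimal from-scratch sketch in plain NumPy. It is illustrative rather than part of the documented implementation: the helper names (`sigmoid`, `fit_logistic_regression`) and the default `learning_rate` / `n_iterations` values are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid hypothesis: h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    return 1 / (1 + np.exp(-z))

def fit_logistic_regression(X, y, learning_rate=0.1, n_iterations=1000):
    """Fit weights and bias by minimizing the log-loss with batch gradient descent.

    learning_rate and n_iterations are illustrative defaults, not tuned values.
    """
    m, n = X.shape
    theta = np.zeros(n)
    bias = 0.0
    for _ in range(n_iterations):
        probs = sigmoid(X @ theta + bias)            # h_theta(x) for every sample
        error = probs - y                            # per-sample derivative of the log-loss w.r.t. theta^T x + b
        theta -= learning_rate * (X.T @ error) / m   # theta := theta - alpha * dJ/dtheta
        bias  -= learning_rate * error.mean()        # same update for the intercept
    return theta, bias

# Same synthetic data as the scikit-learn snippet above
np.random.seed(42)
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

theta, bias = fit_logistic_regression(X, y)
y_pred = (sigmoid(X @ theta + bias) >= 0.5).astype(int)
print("Training accuracy:", (y_pred == y).mean())
```

Batch gradient descent is used here because it maps one-to-one onto the update rule shown above; in practice, scikit-learn's built-in solvers (e.g., `lbfgs`) converge faster and should be preferred.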
### 🛠️ Example Usage: Predicting Customer Retention

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Example data: features (e.g., hours spent on platform, number of purchases)
X = np.array([[5.0, 20.0], [2.0, 10.0], [8.0, 50.0], [1.0, 5.0]])  # Features
y = np.array([1, 0, 1, 0])                                         # Labels: 1 (retained), 0 (not retained)

# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X, y)

# Predict retention for new customers
X_new = np.array([[3.0, 15.0], [7.0, 30.0]])
y_pred = model.predict(X_new)

print("Predicted Retention (1 = Retained, 0 = Not Retained):", y_pred)
```

- This demonstrates how Logistic Regression can be applied to predict customer retention from behavioral data, showing its practicality for real-world binary classification tasks.

---

## 🌟 Advantages

- Simple and efficient for binary classification problems.

- Outputs probabilities, allowing flexibility in decision thresholds (see the threshold sketch at the end of this page).

- Easily extendable to multiclass classification using the one-vs-rest (OvR) or multinomial approach.

## ⚠️ Limitations

- Assumes a linear relationship between the features and the log-odds of the target.

- Performs poorly when features are highly correlated (multicollinearity) or when the relationship between features and log-odds is strongly non-linear.

## 🚀 Applications

=== "Application 1"
    **Medical Diagnosis**: Predicting the likelihood of a disease based on patient features.


=== "Application 2"
    **Marketing**: Determining whether a customer will purchase a product based on demographic and behavioral data.
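### 🎚️ Choosing a Decision Threshold

Because the model outputs probabilities (one of the advantages listed above), the decision threshold can be adjusted instead of always using the default 0.5. The sketch below reuses the customer-retention data from the example usage section; the 0.3 threshold is an illustrative assumption, not a recommended value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Customer-retention data from the example usage section above
X = np.array([[5.0, 20.0], [2.0, 10.0], [8.0, 50.0], [1.0, 5.0]])
y = np.array([1, 0, 1, 0])

model = LogisticRegression()
model.fit(X, y)

X_new = np.array([[3.0, 15.0], [7.0, 30.0]])

# Column 1 of predict_proba holds P(class = 1), i.e. P(retained)
proba = model.predict_proba(X_new)[:, 1]

default_pred = (proba >= 0.5).astype(int)  # equivalent to model.predict(X_new)
lenient_pred = (proba >= 0.3).astype(int)  # lower threshold: predicts class 1 more readily
                                           # (higher recall for class 1, lower precision)

print("P(retained):         ", np.round(proba, 3))
print("Threshold 0.5 labels:", default_pred)
print("Threshold 0.3 labels:", lenient_pred)
```

The right threshold depends on the relative cost of false positives versus false negatives; for applications such as medical screening, a lower threshold for the positive class is often preferred so that fewer true cases are missed.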