diff --git a/docs/index.mdx b/docs/index.mdx
index 20061af..51029e1 100644
--- a/docs/index.mdx
+++ b/docs/index.mdx
@@ -102,7 +102,7 @@ Select a technology below to dive into our structured tutorials. Each path is de
Learn NoSQL database concepts with MongoDB. Store, query, and manage data efficiently for modern applications.
-Explore artificial intelligence, machine learning, and neural networks with beginner-friendly examples.
+
+Scatter plots display individual data points on an XY plane. They are the first step in identifying **Correlation**.
+* **Linear Relationship:** Points cluster along a straight line.
+* **Non-linear Relationship:** Points follow a curved pattern.
+* **No Relationship:** Points look like a random cloud.
+
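+The sketch below (using matplotlib and NumPy, with purely synthetic data) shows how these three patterns typically look and how the correlation coefficient `r` differs between them:
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+rng = np.random.default_rng(42)
+x = rng.uniform(0, 10, 200)
+
+# Three synthetic relationships: linear, non-linear, and none
+linear = 2 * x + rng.normal(0, 2, 200)             # clusters along a line
+nonlinear = (x - 5) ** 2 + rng.normal(0, 2, 200)   # follows a curve
+noise = rng.normal(0, 5, 200)                      # random cloud
+
+fig, axes = plt.subplots(1, 3, figsize=(12, 4))
+for ax, y, title in zip(axes, [linear, nonlinear, noise],
+                        ["Linear", "Non-linear", "No relationship"]):
+    ax.scatter(x, y, alpha=0.6)
+    ax.set_title(f"{title} (r = {np.corrcoef(x, y)[0, 1]:.2f})")
+plt.tight_layout()
+plt.show()
+```
+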
+### B. Bar Charts vs. Pie Charts
+* **Bar Charts:** Best for comparing a numerical value across different categories.
+* **Pie Charts:** Best for showing parts of a whole (though bar charts are often preferred for readability).
+
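+To see the difference in practice, here is a minimal sketch that plots the same made-up category counts as a bar chart and as a pie chart:
+
+```python
+import matplotlib.pyplot as plt
+
+# Hypothetical category counts, purely for illustration
+categories = ["SUV", "Sedan", "Hatchback", "Truck"]
+counts = [420, 380, 150, 90]
+
+fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
+ax1.bar(categories, counts)
+ax1.set_title("Bar chart: easy to compare categories")
+ax2.pie(counts, labels=categories, autopct="%1.0f%%")
+ax2.set_title("Pie chart: parts of a whole")
+plt.tight_layout()
+plt.show()
+```
+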
+---
+
+## 3. Visualizing Multiple Variables (Multivariate)
+
+### A. Heatmaps (Correlation Matrices)
+In ML, we often have dozens of features. A heatmap uses color to represent the correlation coefficient between every pair of features. This helps in **Feature Selection** by identifying redundant variables.
+
+
+
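+Assuming pandas and seaborn are available, a correlation heatmap takes only a couple of lines; the feature names below are invented for illustration:
+
+```python
+import numpy as np
+import pandas as pd
+import seaborn as sns
+import matplotlib.pyplot as plt
+
+# Synthetic feature matrix; in practice this would be your real DataFrame
+rng = np.random.default_rng(0)
+size = rng.normal(100, 20, 500)
+df = pd.DataFrame({
+    "size_m2": size,
+    "rooms": size / 25 + rng.normal(0, 0.5, 500),  # strongly correlated with size
+    "age_years": rng.uniform(0, 50, 500),          # unrelated
+})
+
+corr = df.corr()  # pairwise correlation coefficients
+sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
+plt.title("Correlation matrix")
+plt.show()
+```
+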
+### B. Pair Plots
+A pair plot is a grid of scatter plots, one for every pair of features in a dataset. It lets you see relationships across the entire dataset at once.
+
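+With seaborn this is a one-liner. The sketch below uses seaborn's bundled `iris` sample dataset (downloaded on first use) so it runs as-is:
+
+```python
+import seaborn as sns
+import matplotlib.pyplot as plt
+
+# Grid of scatter plots for every pair of numeric columns,
+# colored by the categorical "species" column
+iris = sns.load_dataset("iris")
+sns.pairplot(iris, hue="species")
+plt.show()
+```
+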
+---
+
+## 4. Anscombe's Quartet: Why Visualization Matters
+The most famous example of why we visualize is **Anscombe's Quartet**. It consists of four datasets that have nearly identical descriptive statistics (mean, variance, correlation), yet look completely different when graphed.
+
+
+
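+You can verify this yourself: seaborn ships the quartet as a sample dataset, so a short sketch can print the near-identical statistics and then plot the four very different shapes:
+
+```python
+import seaborn as sns
+import matplotlib.pyplot as plt
+
+# Anscombe's quartet ships with seaborn (fetched on first use)
+df = sns.load_dataset("anscombe")
+
+# Nearly identical summary statistics for all four datasets...
+for name, group in df.groupby("dataset"):
+    print(name,
+          "x mean:", round(group["x"].mean(), 2),
+          "y mean:", round(group["y"].mean(), 2),
+          "y var:", round(group["y"].var(), 2),
+          "corr:", round(group["x"].corr(group["y"]), 2))
+
+# ...yet the four scatter plots look completely different
+sns.lmplot(data=df, x="x", y="y", col="dataset", col_wrap=2, height=3)
+plt.show()
+```
+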
+:::tip ML Best Practice
+Never start training a model before visualizing your data. Plots often reveal data quality issues (like sensors being stuck at a maximum value) that summary statistics would miss.
+:::
+
+---
+
+Visualizing our data often reveals a specific "bell-shaped" curve that appears everywhere in nature and math. Understanding this curve is our next major step.
\ No newline at end of file
diff --git a/docs/machine-learning/statistics/descriptive-statistics.mdx b/docs/machine-learning/statistics/descriptive-statistics.mdx
index e69de29..d74ed47 100644
--- a/docs/machine-learning/statistics/descriptive-statistics.mdx
+++ b/docs/machine-learning/statistics/descriptive-statistics.mdx
@@ -0,0 +1,70 @@
+---
+title: "Descriptive Statistics"
+sidebar_label: Descriptive Statistics
+description: "Mastering measures of central tendency (mean, median, mode) and dispersion (variance, standard deviation, range) to summarize and understand data distributions."
+tags: [statistics, mean, median, variance, standard-deviation, descriptive-statistics, mathematics-for-ml]
+---
+
+Descriptive statistics allow us to summarize large volumes of raw data into a few meaningful numbers. In Machine Learning, we use these to understand the "center" and the "spread" of our features, which is essential for data cleaning and feature scaling.
+
+## 1. Measures of Central Tendency
+
+These measures tell us where the "middle" of the data lies.
+
+
+
+### A. Mean (Average)
+The sum of all values divided by the total number of values. It is highly sensitive to **outliers**.
+$$ \mu = \frac{\sum x_i}{N} $$
+
+### B. Median
+The middle value when the data is sorted. It is **robust** to outliers, making it better for skewed distributions (like house prices or salaries).
+
+### C. Mode
+The value that appears most frequently. Useful for categorical data (e.g., finding the most common car color).
+
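+A quick sketch with pandas shows how a single outlier drags the mean but barely moves the median (the salary figures are made up):
+
+```python
+import pandas as pd
+
+# Hypothetical salaries (in thousands); the last value is an outlier
+salaries = pd.Series([40, 42, 45, 47, 50, 52, 55, 400])
+
+print("Mean:  ", salaries.mean())    # 91.375 — dragged up by the outlier
+print("Median:", salaries.median())  # 48.5   — robust to the outlier
+
+# Mode is most useful for categorical data
+colors = pd.Series(["red", "blue", "blue", "black", "blue"])
+print("Mode:  ", colors.mode()[0])   # 'blue'
+```
+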
+---
+
+## 2. Measures of Dispersion (Spread)
+
+Knowing the center isn't enough; we need to know how "spread out" the data is.
+
+### A. Range
+The difference between the maximum and minimum values. Simple, but very sensitive to extreme outliers.
+
+### B. Variance ($\sigma^2$)
+The average of the squared differences from the mean. It quantifies how spread out the data is, but the result is in **squared units** of the original data.
+$$ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} $$
+
+### C. Standard Deviation ($\sigma$)
+The square root of the variance. It is the most common measure of spread because it is in the **same units** as the original data.
+
+* **Low $\sigma$:** Data points are close to the mean.
+* **High $\sigma$:** Data points are spread out over a wide range.
+
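+Here is a short NumPy sketch of all three measures. Note that `np.var` and `np.std` default to the population formulas shown above (dividing by $N$); pass `ddof=1` for the sample versions:
+
+```python
+import numpy as np
+
+data = np.array([4, 8, 6, 5, 3, 7, 9, 6])
+
+data_range = data.max() - data.min()
+variance = data.var()   # population variance (divides by N)
+std_dev = data.std()    # population standard deviation
+
+print(f"Range: {data_range}")
+print(f"Variance: {variance:.2f}")
+print(f"Std Dev: {std_dev:.2f}")
+
+# For a *sample* estimate, use ddof=1 (divides by N - 1)
+print(f"Sample Std Dev: {data.std(ddof=1):.2f}")
+```
+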
+---
+
+## 3. Measures of Shape
+
+Beyond center and spread, we look at the symmetry and "peakedness" of the data.
+
+### A. Skewness
+Measures the asymmetry of the distribution.
+* **Positive (Right) Skew:** Long tail on the right side.
+* **Negative (Left) Skew:** Long tail on the left side.
+
+### B. Kurtosis
+Measures how "fat" or "thin" the tails of the distribution are compared to a normal distribution. High kurtosis indicates the presence of frequent outliers.
+
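+Both measures are available in `scipy.stats`; the sketch below compares a symmetric and a right-skewed synthetic distribution:
+
+```python
+import numpy as np
+from scipy import stats
+
+rng = np.random.default_rng(1)
+symmetric = rng.normal(0, 1, 10_000)        # bell-shaped
+right_skewed = rng.exponential(1, 10_000)   # long tail on the right
+
+print("Skewness (normal):     ", round(stats.skew(symmetric), 2))     # ≈ 0
+print("Skewness (exponential):", round(stats.skew(right_skewed), 2))  # ≈ 2, positive skew
+
+# Fisher's definition: a normal distribution has excess kurtosis ≈ 0
+print("Kurtosis (normal):     ", round(stats.kurtosis(symmetric), 2))
+print("Kurtosis (exponential):", round(stats.kurtosis(right_skewed), 2))  # heavier tails
+```
+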
+---
+
+## 4. Why this matters for ML
+
+1. **Handling Outliers:** If the Mean and Median are far apart, you likely have outliers that could skew your model's training.
+2. **Missing Value Imputation:** When filling in missing data, we often choose the **Mean** (for normal data), **Median** (for skewed data), or **Mode** (for categorical data).
+3. **Feature Scaling:** Techniques like **Z-Score Normalization** (Standardization) directly use the Mean and Standard Deviation to rescale features (see the sketch below):
+ $$ z = \frac{x - \mu}{\sigma} $$
+
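+A minimal sketch of z-score standardization with NumPy (in practice you would typically reach for `sklearn.preprocessing.StandardScaler`, which applies the same formula):
+
+```python
+import numpy as np
+
+feature = np.array([50.0, 60.0, 55.0, 80.0, 45.0])
+
+mu = feature.mean()
+sigma = feature.std()
+
+z_scores = (feature - mu) / sigma
+print(z_scores)                          # centered at 0 with unit spread
+print(z_scores.mean(), z_scores.std())   # ≈ 0.0 and 1.0
+```
+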
+---
+
+Visualizing these numbers is often more intuitive than reading a table. Next, we’ll explore the most important probability distribution in all of science and ML.
\ No newline at end of file
diff --git a/docs/machine-learning/statistics/inferential-statistics.mdx b/docs/machine-learning/statistics/inferential-statistics.mdx
index e69de29..a075c47 100644
--- a/docs/machine-learning/statistics/inferential-statistics.mdx
+++ b/docs/machine-learning/statistics/inferential-statistics.mdx
@@ -0,0 +1,108 @@
+---
+title: "Inferential Statistics"
+sidebar_label: Inferential Statistics
+description: "Understanding how to make predictions and inferences about populations using samples, hypothesis testing, and p-values."
+tags: [statistics, inference, hypothesis-testing, p-value, confidence-intervals, mathematics-for-ml]
+---
+
+In Descriptive Statistics, we describe the data we have. In **Inferential Statistics**, we use that data to make "educated guesses" or predictions about data we *don't* have. This is the foundation of scientific discovery and model validation in Machine Learning.
+
+## 1. The Core Workflow
+
+Inferential statistics allows us to take a small sample and project those findings onto a larger population.
+
+```mermaid
+sankey-beta
+ %% source,target,value
+ Population,Sample,30
+ Sample,Analysis,30
+ Analysis,Point Estimates,15
+ Analysis,Confidence Intervals,15
+ Point Estimates,Population Inference,15
+ Confidence Intervals,Population Inference,15
+
+```
+
+## 2. Point Estimation
+
+A **Point Estimate** is a single value (a statistic) used to estimate a population parameter.
+
+* **Sample Mean ($\bar{x}$)** estimates the **Population Mean ($\mu$)**.
+* **Sample Variance ($s^2$)** estimates the **Population Variance ($\sigma^2$)**.
+
+However, because samples are smaller than populations, point estimates are rarely 100% accurate. We use **Confidence Intervals** to express our uncertainty.
+
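+A small simulation makes this concrete: because the "population" below is synthetic, we know the true parameters and can see how close the sample estimates get:
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(7)
+
+# Simulated population with known parameters (mu = 170, sigma = 10)
+population = rng.normal(170, 10, 1_000_000)
+
+# We can only afford to measure a small sample
+sample = rng.choice(population, size=100, replace=False)
+
+print("Population mean:", population.mean())
+print("Sample mean (point estimate):", sample.mean())
+
+# ddof=1 gives the unbiased *sample* variance estimator
+print("Population variance:", population.var())
+print("Sample variance (point estimate):", sample.var(ddof=1))
+```
+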
+## 3. Hypothesis Testing
+
+Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics.
+
+### The Two Hypotheses
+
+1. **Null Hypothesis ($H_0$):** The "status quo." It assumes there is no effect or no difference. (e.g., "This new feature does not improve model accuracy.")
+2. **Alternative Hypothesis ($H_a$):** What we want to prove. (e.g., "This new feature improves model accuracy.")
+
+### The Decision Process
+
+We use the **P-value** to decide whether to reject the Null Hypothesis.
+
+```mermaid
+flowchart TD
+ Start["State Hypotheses H0 and Ha"] --> Alpha[Set Significance Level α - usually 0.05]
+ Alpha --> Test[Perform Statistical Test - t-test, Z-test]
+ Test --> PVal{Calculate P-value}
+ PVal -- "P < α" --> Reject[Reject H0: Results are Statistically Significant]
+ PVal -- "P ≥ α" --> Fail[Fail to Reject H0: No significant effect found]
+
+```
+
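+As a concrete sketch, here is that workflow as a two-sample t-test with `scipy.stats` (the accuracy numbers are invented for illustration):
+
+```python
+from scipy import stats
+
+# Hypothetical cross-validation accuracies for two model variants
+model_a = [0.81, 0.79, 0.84, 0.80, 0.82, 0.78, 0.83]
+model_b = [0.85, 0.86, 0.83, 0.88, 0.84, 0.87, 0.86]
+
+alpha = 0.05
+t_stat, p_value = stats.ttest_ind(model_a, model_b)
+
+print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
+if p_value < alpha:
+    print("Reject H0: the difference in accuracy is statistically significant.")
+else:
+    print("Fail to reject H0: no significant effect found.")
+```
+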
+## 4. Confidence Intervals
+
+A **Confidence Interval (CI)** provides a range of values that is likely to contain the population parameter.
+
+$$
+\text{CI} = \text{Point Estimate} \pm (\text{Critical Value} \times \text{Standard Error})
+$$
+
+:::note Example
+We are 95% confident that the true accuracy of our model on all future data is between 88% and 92%.
+:::
+
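+A sketch of computing such an interval for a mean with the t-distribution from `scipy.stats` (the accuracy values are invented):
+
+```python
+import numpy as np
+from scipy import stats
+
+# Hypothetical model accuracies from repeated evaluation runs
+accuracies = np.array([0.89, 0.91, 0.90, 0.88, 0.92, 0.90, 0.89, 0.91])
+
+point_estimate = accuracies.mean()
+standard_error = stats.sem(accuracies)                      # s / sqrt(n)
+critical_value = stats.t.ppf(0.975, df=len(accuracies) - 1)
+
+margin = critical_value * standard_error
+print(f"95% CI: {point_estimate - margin:.3f} to {point_estimate + margin:.3f}")
+
+# Equivalent one-liner
+print(stats.t.interval(0.95, df=len(accuracies) - 1,
+                       loc=point_estimate, scale=standard_error))
+```
+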
+## 5. Common Statistical Tests in ML
+
+| Test | Use Case | Example in ML |
+| --- | --- | --- |
+| **Z-Test** | Comparing means with a large sample size (n > 30). | Comparing the average spend of two large user groups. |
+| **T-Test** | Comparing means when the sample is small (n < 30) or the population variance is unknown. | Comparing performance of two model architectures on a small dataset. |
+| **Chi-Square Test** | Testing relationships between categorical variables. | Is the "Click" rate independent of the "Device Type"? |
+| **ANOVA** | Comparing means across 3 or more groups. | Does the choice of optimizer (Adam, SGD, RMSprop) significantly change accuracy? |
+
+## 6. Type I and Type II Errors
+
+When making inferences, we can be wrong in two ways:
+
+```mermaid
+quadrantChart
+ title Statistical Decision Matrix
+ x-axis "Null Hypothesis is True" --> "Null Hypothesis is False"
+ y-axis "Reject Null" --> "Fail to Reject"
+ quadrant-1 "Type I Error (False Positive)"
+ quadrant-2 "Correct Decision (True Positive)"
+ quadrant-3 "Correct Decision (True Negative)"
+ quadrant-4 "Type II Error (False Negative)"
+
+```
+
+