DPS (Data Processing and Data Statistics)

This repository contains practical exercises and mini-projects related to data preprocessing and data statistics.

📂 Structure

📂 Data-Processing/

📂 Data Practice/
Exercises focused on data preprocessing techniques.

  • Titanic_Data Cleaning: Data cleaning with Pandas, covering missing values, outliers, duplicates, and text and datetime processing.
  • Salary_Data Transformation: Data transformation with Pandas, covering dataset merging, summarization, missing-value and outlier handling, aggregation, pivot tables, log transformation, one-hot encoding, scaling, and PCA.
  • Speed Dating_Feature Engineering: Feature engineering with Pandas on the Speed Dating dataset, deriving new features from the provided columns.
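
As a rough illustration of the kind of steps these exercises cover, here is a minimal Pandas sketch. The toy data, column names, and the reference date are hypothetical and are not taken from the notebooks in this repository; the IQR rule and min-max scaling are just one common choice for each step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the columns are illustrative, not from the repository.
df = pd.DataFrame({
    "name": [" Alice ", "Bob", "Bob", "Cara", "Dan"],
    "city": ["Seoul", "Busan", "Busan", "Seoul", None],
    "salary": [3000.0, 3200.0, 3200.0, np.nan, 90000.0],
    "joined": ["2021-01-05", "2020-06-30", "2020-06-30", "2019-11-11", "2022-03-01"],
})

# Cleaning: drop exact duplicates, trim text, parse dates, fill missing values.
clean = df.drop_duplicates().copy()
clean["name"] = clean["name"].str.strip()
clean["joined"] = pd.to_datetime(clean["joined"])
clean["city"] = clean["city"].fillna("Unknown")
clean["salary"] = clean["salary"].fillna(clean["salary"].median())

# Outlier flag using the 1.5 * IQR rule (one common convention among several).
q1, q3 = clean["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["is_outlier"] = ~clean["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Transformation: log transform, min-max scaling, one-hot encoding.
clean["log_salary"] = np.log1p(clean["salary"])
log_min, log_max = clean["log_salary"].min(), clean["log_salary"].max()
clean["salary_scaled"] = (clean["log_salary"] - log_min) / (log_max - log_min)
encoded = pd.get_dummies(clean, columns=["city"], prefix="city")

# Feature engineering: derive new columns from the parsed datetime.
reference_date = pd.Timestamp("2023-01-01")  # hypothetical reference date
encoded["join_year"] = encoded["joined"].dt.year
encoded["tenure_days"] = (reference_date - encoded["joined"]).dt.days

print(encoded.head())
```

The order matters in this sketch: missing salaries are filled before the quantiles are computed, so the outlier bounds are based on the imputed column.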

📂 Data Project/
Projects that apply data preprocessing techniques to real-world datasets.

  • Segmentation_Project: Analysis based on RFM (Recency, Frequency, Monetary) segmentation.
  • TaxiFare_Project: Analysis based on data cleaning techniques.
  • Used Car Prices_Project: Analysis based on data transformation techniques.
  • Credit Transaction Anomaly Detection_Project: Analysis based on feature engineering techniques.
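
For readers unfamiliar with RFM, the following is a minimal sketch of how Recency, Frequency, and Monetary values can be computed with Pandas. The transaction table, its column names, and the rank-based scoring are assumptions made for illustration; the Segmentation_Project notebook may use different data and a different scoring scheme (for example quantile bins).

```python
import pandas as pd

# Hypothetical transactions; not the dataset used in Segmentation_Project.
tx = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2023-11-20",
        "2024-02-10", "2024-02-25", "2024-03-10",
    ]),
    "amount": [50.0, 70.0, 20.0, 30.0, 40.0, 100.0],
})

# Measure recency relative to the day after the last observed order.
snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),  # days since last order
    frequency=("order_date", "count"),                            # number of orders
    monetary=("amount", "sum"),                                   # total spend
)

# Simple rank-based scores: a higher score is better on every dimension.
rfm["r_score"] = rfm["recency"].rank(ascending=False, method="first").astype(int)
rfm["f_score"] = rfm["frequency"].rank(method="first").astype(int)
rfm["m_score"] = rfm["monetary"].rank(method="first").astype(int)
rfm["rfm_score"] = rfm[["r_score", "f_score", "m_score"]].sum(axis=1)

print(rfm.sort_values("rfm_score", ascending=False))
```

On larger datasets, scores are usually assigned with quantile bins such as `pd.qcut` rather than raw ranks; ranks are used here only because the toy table has three customers.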

📂 Data-Statistics/

  • AARRR & Statistical Analysis: Statistical analysis structured around the AARRR (Acquisition, Activation, Retention, Referral, Revenue) framework.
  • Basic Statistics: Understanding the fundamentals of statistics.
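
To give a sense of what AARRR-style metrics look like in code, here is a small sketch with Pandas. The per-user table and the metric definitions (activation as a boolean flag, retention measured among activated users, ARPU as revenue per acquired user) are assumptions for illustration; definitions vary by product, and the notebooks here may define them differently.

```python
import pandas as pd

# Hypothetical per-user summary table; columns are illustrative only.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "activated": [True, True, False, True, False],
    "returned_week1": [True, False, False, True, False],
    "revenue": [12.0, 0.0, 0.0, 30.0, 0.0],
})

acquired = len(users)                                   # Acquisition: users who signed up
activation_rate = users["activated"].mean()             # Activation: share who activated
retention_rate = users.loc[users["activated"], "returned_week1"].mean()  # Retention among activated
arpu = users["revenue"].sum() / acquired                # Revenue: average revenue per user

print(f"acquired={acquired}, activation={activation_rate:.2f}, "
      f"retention={retention_rate:.2f}, ARPU={arpu:.2f}")
```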

✨ Skills Covered

🛠️ Tech Stack
Pandas, NumPy, SQL, BigQuery, Jupyter Notebook, Google Colab

📊 Techniques

  • Missing value handling, Outlier detection, Duplicate removal, One-hot encoding, Scaling, Principal Component Analysis (PCA), Feature engineering
  • Acquisition, Activation, Retention, Revenue (ARPU), Revenue (CLV), Distribution Visualization, One-Sample t-Test, Independent Samples t-Test, Paired Samples t-Test, Sampling, Confidence Interval, Hypothesis Testing, A/B Testing
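
The three t-tests and the confidence interval listed above can be sketched as follows. This example assumes SciPy is available, which is not part of the tech stack listed in this README, and uses small hypothetical samples; Welch's variant is chosen for the independent test as a cautious default, not because the notebooks necessarily use it.

```python
import numpy as np
from scipy import stats  # assumption: SciPy is installed; it is not listed in the tech stack

# Hypothetical measurements, for illustration only.
control = np.array([10.0, 12.0, 9.0, 11.0, 13.0])
variant = np.array([12.0, 14.0, 10.0, 13.0, 15.0])

# One-sample t-test: is the control mean different from 10?
t_one, p_one = stats.ttest_1samp(control, popmean=10.0)

# Independent samples t-test (Welch's version, no equal-variance assumption),
# as one might use for an A/B test with two separate groups.
t_ind, p_ind = stats.ttest_ind(variant, control, equal_var=False)

# Paired samples t-test: treats the two arrays as before/after values
# for the same five subjects.
t_rel, p_rel = stats.ttest_rel(variant, control)

# 95% confidence interval for the control mean, based on the t distribution.
mean = control.mean()
sem = stats.sem(control)
ci_low, ci_high = stats.t.interval(0.95, len(control) - 1, loc=mean, scale=sem)

print(f"one-sample:  t={t_one:.3f}, p={p_one:.3f}")
print(f"independent: t={t_ind:.3f}, p={p_ind:.3f}")
print(f"paired:      t={t_rel:.3f}, p={p_rel:.4f}")
print(f"95% CI for control mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The same two arrays give a much larger t statistic in the paired test than in the independent test, because pairing removes between-subject variation. Which test applies depends on how the data was collected, not on which result looks stronger.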