This repo helps Excel-native analysts get started with Python and Databricks for everyday analysis.
It walks through loading data, building pivot-like summaries, calculating KPIs, joining datasets, and saving outputs, all using Python and pandas.
The repo contains three folders:
- notebooks/: A walkthrough .ipynb notebook for analysts
- scripts/: Reusable helper functions in .py format
- data/: Sample retail data in CSV format

The walkthrough covers:
- Reading and exploring CSVs
- Groupby summaries and filtering
- Calculating metrics like AOV
- Joining datasets (like Excel VLOOKUP)
- Exporting clean results
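
For readers coming from Excel, the topics above map onto a handful of pandas calls. The following is a minimal sketch, not the notebook's actual code: the in-memory tables, the column names (`order_id`, `store_id`, `revenue`, `region`), and the output filename are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; these column names are assumptions,
# not the schema of the repo's retail CSV.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "store_id": ["A", "A", "B", "B"],
    "revenue": [20.0, 40.0, 30.0, 50.0],
})
stores = pd.DataFrame({
    "store_id": ["A", "B"],
    "region": ["North", "South"],
})

# Explore: row/column counts and missing values per column
print(orders.shape)
print(orders.isna().sum())

# Filter rows (like an Excel filter), then build a pivot-like summary
big_orders = orders[orders["revenue"] >= 30]
revenue_by_store = orders.groupby("store_id", as_index=False)["revenue"].sum()

# AOV (average order value) = total revenue / number of distinct orders
aov = orders["revenue"].sum() / orders["order_id"].nunique()

# VLOOKUP-style join: bring each store's region onto its orders
enriched = orders.merge(stores, on="store_id", how="left")

# Export the cleaned result (hypothetical filename)
enriched.to_csv("clean_orders.csv", index=False)
```

A left join is used here because it keeps every order row even when a store has no match, which is closest to how a VLOOKUP column behaves in a spreadsheet.
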

This repo is aimed at:
- Analysts new to Databricks, Python, or SQL
- Teams transitioning from Excel-based workflows
- Data folks looking to document best practices
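
The usage snippet that follows imports three helpers from `scripts/analysis_helpers`, whose implementations are not shown in this README. As a rough idea of what such helpers might look like, here is a hypothetical sketch. The function bodies, the `order_col` and `revenue_col` parameters, and their default column names are assumptions; the real helpers in the repo may differ.

```python
import pandas as pd


def load_csv(path):
    """Read a CSV file into a DataFrame (hypothetical implementation)."""
    return pd.read_csv(path)


def null_summary(df):
    """Return the count and percentage of missing values per column."""
    return pd.DataFrame({
        "null_count": df.isna().sum(),
        "null_pct": (df.isna().mean() * 100).round(2),
    })


def aov_summary(df, order_col="order_id", revenue_col="revenue"):
    """Return total revenue, distinct order count, and average order value.

    The default column names are assumptions, not the repo's schema.
    """
    total = df[revenue_col].sum()      # missing values are skipped by default
    orders = df[order_col].nunique()   # distinct orders, not row count
    aov = total / orders if orders else float("nan")
    return {"total_revenue": total, "orders": orders, "aov": aov}
```

Counting distinct order IDs rather than rows matters when the data has one row per line item, since dividing by the row count would understate AOV.
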
from scripts.analysis_helpers import load_csv, null_summary, aov_summary
df = load_csv('data/retail_sample_data.csv')
print(null_summary(df))
print(aov_summary(df))

James Witcher
LinkedIn