From 131a57dd6888c34cf4b87b91a82f0d6065ad55b1 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 5 Nov 2025 21:18:43 +0000 Subject: [PATCH 1/3] Add CLAUDE.md documentation for AI-assisted development Create comprehensive documentation file to guide Claude Code when working in this repository. Includes development setup, testing commands, core architecture overview, and common workflow patterns. --- CLAUDE.md | 210 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 210 insertions(+) create mode 100644 CLAUDE.md diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 000000000..d8108af03 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,210 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Overview + +Quantipy is a Python 2.7-based data processing, analysis and reporting library for survey and market research data (people data). It extends pandas and numpy with specialized features for multiple choice variables, weighted analysis, metadata-driven operations, and exports to various formats. + +**Note**: This is the Python 2.7 version. A Python 3 port exists in a separate repository. 
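The delimited multiple-choice support mentioned above follows a simple storage convention (all selected codes kept in a single cell, e.g. `"1;3;5;"`). A plain-Python sketch of the idea — this illustrates the storage format only, not quantipy's own API:

```python
# Plain-Python sketch of the delimited-set convention (not quantipy's API):
# one multiple-choice answer per cell, integer codes separated by ";".
cells = ["1;3;5;", "2;", "1;2;3;", ""]  # four hypothetical respondents

def codes(cell):
    """Return the set of integer codes stored in one delimited cell."""
    if not cell:
        return set()
    return set(int(c) for c in cell.rstrip(";").split(";"))

# Which respondents picked code 1?
chose_1 = [1 in codes(cell) for cell in cells]  # [True, False, True, False]
```

In the library itself, DataSet methods interpret these cells against the value metadata, so user code rarely parses the strings by hand.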
+ +## Development Setup + +### Creating Development Environment + +**Windows:** +```bash +conda create -n envqp python=2.7 numpy==1.11.3 scipy==0.18.1 +conda activate envqp +pip install -r requirements_dev.txt +``` + +**Linux:** +```bash +conda create -n envqp python=2.7 +conda activate envqp +pip install -r requirements_dev.txt +``` + +Or use the provided script: +```bash +bash install_dev.sh +``` + +### Key Dependencies +- Python 2.7.8 +- numpy==1.11.3 +- scipy==0.18.1 +- pandas==0.19.2 +- Additional: xlsxwriter, python-pptx, lxml, ftfy, xmltodict + +## Testing + +### Run All Tests +```bash +python -m unittest discover +``` + +Or with pytest: +```bash +pytest +``` + +### Run Tests with Coverage +```bash +coverage run -m unittest discover +coverage html +# View reports in htmlcov/index.html +``` + +### Run Tests with Multiple Cores +```bash +pytest -n auto +``` + +### Auto-Run Tests on File Changes +```bash +python autotests.py +``` + +## Core Architecture + +Quantipy uses a hierarchical object structure for managing survey data analysis: + +### Primary Objects Hierarchy + +**DataSet** → **Batch** → **Stack** → **Link** → **View** + +1. **DataSet** (`quantipy/core/dataset.py`) + - Main container for case data (pandas DataFrame) and metadata (JSON structure) + - Handles data import/export, variable creation, recoding, and transformations + - Metadata format describes variables, their types (single, delimited, array), and values + - Methods: `derive()`, `recode()`, `merge()`, `crosstab()`, `variables()`, `meta()` + +2. **Batch** (`quantipy/core/batch.py`) + - Subclass of DataSet for defining analysis specifications + - Structures which variables to cross-tabulate (x vs y variables) + - Stores batch definitions in dataset metadata under `_meta['sets']['batches']` + - Methods: `add_x()`, `add_y()`, `add_filter()` + +3. 
**Stack** (`quantipy/core/stack.py`) + - Nested dictionary container holding Link objects with View aggregations + - Structure: `stack[data_key][filter][x_variable][y_variable][view_key]` + - Created by calling `dataset.populate()` based on Batch definitions + - Methods: `add_data()`, `add_link()`, `aggregate()`, `add_stats()`, `describe()` + +4. **Link** (`quantipy/core/link.py`) + - Subclassed dictionary representing a single data/filter/x/y relationship + - Each Link contains multiple View aggregations of the same variable pairing + - Accessed as: `link = stack[data_key][filter][x][y]` + +5. **View** (`quantipy/core/view.py`) + - Represents a specific aggregation/analysis (counts, percentages, means, tests) + - Stored as pandas DataFrames within Link objects + - View types: frequency counts, column/row percentages, means, statistical tests + +6. **Chain** (`quantipy/core/chain.py`) + - Container for ordered Link definitions and associated Views + - Used for organizing and concatenating multiple analyses along an axis + - Supports serialization to/from `.chain` files using cPickle + +7. 
**Cluster** (`quantipy/core/cluster.py`) + - Higher-level container for managing multiple Chain objects + - Used for structured reporting and analysis workflows + +### Key Supporting Modules + +**Data Processing Tools** (`quantipy/core/tools/dp/`) +- `io.py`: Import/export functions for all supported formats +- `prep.py`: Data preparation utilities (merge, recode, frequency, crosstab) +- `query.py`: Logic-based filtering and subsetting +- `spss/`: SPSS .sav file reader/writer (uses savReaderWriter) +- `dimensions/`: Dimensions .ddf/.mdd file support +- `decipher/`: Decipher format support +- `ascribe/`: Ascribe format support + +**View Tools** (`quantipy/core/tools/view/`) +- `agg.py`: Aggregation methods +- `logic.py`: Logical operators (has_any, has_all, is_gt, union, intersection) +- `query.py`: View-level filtering + +**Export Builders** (`quantipy/core/builds/`) +- `excel/excel_painter.py`: ExcelPainter for XLSX exports with formatting +- `powerpoint/pptx_painter.py`: PowerPointPainter for PPTX chart/table exports + +**Weighting** (`quantipy/core/weights/`) +- `rim.py`: Rim weighting (iterative proportional fitting) +- `weight_engine.py`: Weight computation engine + +**Analysis Engine** (`quantipy/core/quantify/`) +- `engine.py`: Quantity and Test classes for advanced aggregations and statistical tests + +### Variable Types + +Quantipy distinguishes between three core variable types in metadata: + +- **single**: Single-choice categorical variables +- **delimited**: Multiple-choice variables (stored as delimited strings like "1;3;5;") +- **array**: Grids/matrices with multiple items sharing the same response scale + - Array items stored as separate columns but grouped in `_meta['masks']` + +### Metadata Structure + +Metadata is stored in `dataset._meta` as a nested dictionary: +- `_meta['columns']`: Column-level metadata (type, text, values) +- `_meta['masks']`: Array/grid definitions +- `_meta['sets']`: Named sets including batch definitions +- `_meta['lib']`: 
Shared value definitions + +## Common Workflow Patterns + +### Typical Analysis Workflow +1. Load data: `dataset = qp.DataSet('name'); dataset.read_quantipy(json_path, csv_path)` +2. Create batch: `batch = dataset.add_batch('batch_name')` +3. Define axes: `batch.add_x(['q1', 'q2']); batch.add_y(['gender', 'age'])` +4. Populate stack: `stack = dataset.populate()` +5. Add aggregations: `stack.aggregate(['counts', 'c%'])` +6. Export: `painter = qp.ExcelPainter(stack); painter.write_xlsx(path)` + +### Variable Manipulation +- Use `dataset.derive()` to create new variables from existing ones +- Use `dataset.recode()` to remap variable values +- Use `frange()` helper for range specifications: `frange('1-5, 97, 99')` + +### Accessing Results +```python +# Get specific link +link = stack[data_key][filter][x_var][y_var] + +# Get specific view from link +df = link[view_key] + +# Use Quantity engine for custom aggregations +q = qp.Quantity(link) +q.count() # Returns grouped DataFrame +``` + +## File I/O Formats + +Quantipy supports reading from: +- Native Quantipy (.json metadata + .csv data) +- SPSS .sav files +- Dimensions .ddf/.mdd files +- Decipher tab-delimited files +- Ascribe files + +Quantipy supports exporting to: +- Native Quantipy format +- SPSS .sav +- Dimensions .ddf/.mdd +- Excel .xlsx (with ExcelPainter) +- PowerPoint .pptx (with PowerPointPainter) + +Use functions from `quantipy.core.tools.dp.io` for all I/O operations. + +## Code Style Notes + +- This is Python 2.7 code - print statements, not print functions +- Uses `cPickle` for serialization +- Relies on older pandas 0.19.2 API (e.g., `.ix` accessor instead of `.loc`/`.iloc`) +- Extensive use of nested dictionaries and defaultdict for data structures From 35599489902da6e4740b92a9d79d78ca8add3159 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 5 Nov 2025 21:27:42 +0000 Subject: [PATCH 2/3] Add Streamlit GUI for Quantipy Create a comprehensive web-based interface for Quantipy using Streamlit. 
This GUI provides an interactive, user-friendly way to work with Quantipy's data analysis features without requiring code. Features: - Multi-page application with intuitive navigation - Data Loader: Import from multiple formats (Quantipy, CSV, SPSS) - Data Explorer: Browse variables, view metadata, create crosstabs - Analysis: Create batches, configure analyses, run aggregations - Results: View and export results to Excel, CSV, and Quantipy format Files added: - streamlit_app.py: Main application entry point - pages/01_Data_Loader.py: Data import functionality - pages/02_Data_Explorer.py: Dataset exploration tools - pages/03_Analysis.py: Batch creation and analysis configuration - pages/04_Results.py: Results viewing and export - requirements_streamlit.txt: Streamlit-specific dependencies - run_streamlit.sh: Launcher script for easy startup - STREAMLIT_README.md: Comprehensive GUI documentation Updated README.md with Streamlit GUI quick start section. --- README.md | 17 ++ STREAMLIT_README.md | 227 ++++++++++++++++++++++++ pages/01_Data_Loader.py | 260 +++++++++++++++++++++++++++ pages/02_Data_Explorer.py | 353 +++++++++++++++++++++++++++++++++++++ pages/03_Analysis.py | 306 ++++++++++++++++++++++++++++++++ pages/04_Results.py | 350 ++++++++++++++++++++++++++++++++++++ requirements_streamlit.txt | 23 +++ run_streamlit.sh | 42 +++++ streamlit_app.py | 155 ++++++++++++++++ 9 files changed, 1733 insertions(+) create mode 100644 STREAMLIT_README.md create mode 100644 pages/01_Data_Loader.py create mode 100644 pages/02_Data_Explorer.py create mode 100644 pages/03_Analysis.py create mode 100644 pages/04_Results.py create mode 100644 requirements_streamlit.txt create mode 100755 run_streamlit.sh create mode 100644 streamlit_app.py diff --git a/README.md b/README.md index c5a328e60..8e624a9c2 100644 --- a/README.md +++ b/README.md @@ -19,6 +19,23 @@ Quantipy is an open-source data processing, analysis and reporting software proj ### Python 3 compatability Efforts are 
underway to port Quantipy to Python 3 in a [seperate repository](https://www.github.com/quantipy/quantipy3). +## Streamlit GUI +A user-friendly web interface for Quantipy is now available! The Streamlit GUI provides an interactive way to: +- Load and explore datasets +- Create batch analyses +- View and export results +- Generate Excel and other format exports + +**Quick Start:** +```bash +pip install -r requirements_streamlit.txt +streamlit run streamlit_app.py +# Or use the launcher script: +./run_streamlit.sh +``` + +See [STREAMLIT_README.md](STREAMLIT_README.md) for detailed documentation. + ## Docs [View the documentation at readthedocs.org](http://quantipy.readthedocs.io/) diff --git a/STREAMLIT_README.md b/STREAMLIT_README.md new file mode 100644 index 000000000..98d5c1f94 --- /dev/null +++ b/STREAMLIT_README.md @@ -0,0 +1,227 @@ +# Quantipy Streamlit GUI + +A user-friendly web interface for Quantipy data analysis, built with Streamlit. + +## Overview + +This Streamlit application provides an interactive graphical user interface for Quantipy's data processing and analysis capabilities. It's designed for researchers and analysts working with survey and market research data. 
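Conceptually, the frequency tables offered by the Data Explorer reduce to value counts plus percentages over the case data. A minimal stdlib sketch of that idea (the app itself delegates the real work to Quantipy and pandas):

```python
from collections import Counter

answers = [1, 2, 1, 3, 1, 2]  # hypothetical single-choice answers

counts = Counter(answers)          # absolute frequencies per code
base = sum(counts.values())        # unweighted base (here 6)
table = {code: (n, round(100.0 * n / base, 1))
         for code, n in sorted(counts.items())}
# table[1] == (3, 50.0): code 1 was chosen 3 times, 50.0% of the base
```

Quantipy's own aggregations add value labels, weighting, and missing-value handling on top of this basic counting.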
+ +## Features + +### 📁 Data Loader +- Load example datasets for quick exploration +- Import Quantipy files (JSON metadata + CSV data) +- Upload CSV files with automatic metadata inference +- Import SPSS .sav files +- View dataset summary and preview + +### 🔍 Data Explorer +- **Variable Browser**: Browse and search all dataset variables +- **Frequency Tables**: Generate frequency distributions for individual variables +- **Crosstabs**: Create cross-tabulations between variables +- **Metadata Viewer**: Inspect complete dataset metadata structure +- Interactive charts and visualizations + +### 📊 Analysis +- Create and manage multiple batch definitions +- Configure X variables (columns) and Y variables (rows) +- Define analysis specifications +- Run aggregations (counts, percentages, means, etc.) +- Generate comprehensive analysis stacks + +### 📈 Results +- Browse all analysis results interactively +- View individual crosstabs and statistics +- Export to Excel (XLSX) with formatting +- Export to Quantipy format (JSON + CSV) +- Download individual results as CSV +- View analysis summary and statistics + +## Installation + +### Prerequisites + +1. **Python 2.7** (as required by Quantipy) +2. **Conda** (recommended for managing Python 2.7 environment) + +### Setup + +1. Create and activate a Python 2.7 environment: + +```bash +# Create environment +conda create -n quantipy_gui python=2.7 numpy==1.11.3 scipy==0.18.1 + +# Activate environment +conda activate quantipy_gui +``` + +2. Install Quantipy and dependencies: + +```bash +pip install -r requirements_dev.txt +``` + +3. Install Streamlit and additional dependencies: + +```bash +pip install -r requirements_streamlit.txt +``` + +## Running the Application + +### Start the Streamlit App + +From the project root directory, run: + +```bash +streamlit run streamlit_app.py +``` + +The application will open in your default web browser at `http://localhost:8501` + +### Using the Example Data + +1. 
Click "Load Example Data & Explore" on the home page, or +2. Navigate to the Data Loader page and click "Load Example Dataset" + +This will load the built-in example dataset (Example Data A) from the tests folder. + +## Application Structure + +``` +quantipy_GUI/ +├── streamlit_app.py # Main application entry point +├── pages/ +│ ├── 01_Data_Loader.py # Data import page +│ ├── 02_Data_Explorer.py # Data exploration page +│ ├── 03_Analysis.py # Analysis configuration page +│ └── 04_Results.py # Results viewing and export page +├── requirements_streamlit.txt # Streamlit-specific requirements +└── STREAMLIT_README.md # This file +``` + +## Usage Guide + +### Basic Workflow + +1. **Load Data** + - Go to "Data Loader" page + - Choose a data source (Example, Quantipy files, CSV, or SPSS) + - Upload your files or load example data + +2. **Explore Data** + - Navigate to "Data Explorer" + - Browse variables and view metadata + - Create frequency tables and crosstabs + - Examine data distributions + +3. **Configure Analysis** + - Go to "Analysis" page + - Create a new batch + - Add X variables (columns) and Y variables (rows) + - Select aggregation types + +4. **Run Analysis** + - Click "Run Analysis" to generate results + - Wait for processing to complete + +5. **View Results** + - Navigate to "Results" page + - Browse individual results + - Export to Excel, CSV, or Quantipy format + +### Tips + +- **Session Persistence**: Your loaded dataset and analysis results persist across pages during your session +- **Navigation**: Use the page links in the sidebar or navigation buttons at the bottom of each page +- **Example Data**: Start with the example dataset to familiarize yourself with the interface +- **Error Details**: Most error messages include expandable details for troubleshooting + +## Limitations + +### Python 2.7 Compatibility + +This application is built for Python 2.7 to maintain compatibility with Quantipy. Some modern Streamlit features may not be available. 
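One concrete Python 2.7 constraint visible throughout the pages: user-facing strings are built with `str.format()` rather than f-strings, since f-strings require Python 3.6+. For example (the values below are hypothetical):

```python
# str.format() runs on Python 2.7 and 3.x alike; an f-string here would be
# a SyntaxError under 2.7.  Name and case count are hypothetical examples.
dataset_name = "Example Data (A)"
n_cases = 8255  # hypothetical case count
status = "Loaded '{}' with {} cases".format(dataset_name, n_cases)
```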
+ +### Performance + +- Large datasets (>100,000 rows) may take longer to process +- Complex batch configurations may require significant processing time +- Excel export with ExcelPainter can be memory-intensive + +### Known Issues + +- Some advanced Quantipy features (statistical tests, complex filters) are simplified in the GUI +- Chain and Cluster functionality not yet exposed in the interface +- ZIP archive export for multiple CSV files requires additional implementation + +## Troubleshooting + +### App Won't Start + +```bash +# Verify Streamlit is installed +pip list | grep streamlit + +# Reinstall if needed +pip install streamlit +``` + +### Import Errors + +```bash +# Ensure all dependencies are installed +pip install -r requirements_streamlit.txt +pip install -r requirements_dev.txt +``` + +### Display Issues + +- Clear Streamlit cache: Press 'C' in the app or use the menu +- Restart the Streamlit server +- Check browser compatibility (Chrome, Firefox, Safari recommended) + +### Memory Issues with Large Datasets + +- Use data subsets for exploration +- Close other applications +- Consider upgrading your system RAM + +## Advanced Features + +### Custom Filters + +While basic filtering is available through the interface, complex filters can be defined in the Analysis page using Quantipy's logic expressions. + +### Weighting + +Weight variables can be selected in the Analysis configuration. The weights will be applied during aggregation. + +### Export Formats + +- **Excel**: Uses Quantipy's ExcelPainter for formatted workbooks +- **Quantipy Format**: Preserves all metadata for future use +- **CSV**: Universal format for use in other tools + +## Contributing + +This GUI is part of the Quantipy project. For issues or suggestions: + +1. Report bugs via GitHub issues +2. Submit feature requests +3. 
Contribute improvements via pull requests + +## Support + +- **Quantipy Documentation**: http://quantipy.readthedocs.io/ +- **Streamlit Documentation**: https://docs.streamlit.io/ +- **Python 3 Version**: https://github.com/quantipy/quantipy3 + +## License + +Same license as the main Quantipy project. + +--- + +**Note**: This GUI is designed for Python 2.7 compatibility. For Python 3 support, consider using the Quantipy3 fork. diff --git a/pages/01_Data_Loader.py b/pages/01_Data_Loader.py new file mode 100644 index 000000000..7880f527e --- /dev/null +++ b/pages/01_Data_Loader.py @@ -0,0 +1,260 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +""" +Quantipy GUI - Data Loader Page +Load datasets from various formats +""" + +import streamlit as st +import pandas as pd +import quantipy as qp +import os +import json +import tempfile + +st.set_page_config(page_title="Data Loader", page_icon="📁", layout="wide") + +st.title("📁 Data Loader") +st.markdown("Load your dataset from various formats or use the example data.") + +# Initialize session state +if 'dataset' not in st.session_state: + st.session_state.dataset = None +if 'dataset_name' not in st.session_state: + st.session_state.dataset_name = None + +# Check if we should load example data +if st.session_state.get('load_example', False): + st.session_state.load_example = False # Reset flag + try: + with st.spinner("Loading example dataset..."): + test_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'tests') + name = 'Example Data (A)' + dataset = qp.DataSet(name, False) + dataset.read_quantipy( + os.path.join(test_path, '{}.json'.format(name)), + os.path.join(test_path, '{}.csv'.format(name)) + ) + st.session_state.dataset = dataset + st.session_state.dataset_name = name + st.success("✅ Example dataset loaded successfully!") + except Exception as e: + st.error("Error loading example data: {}".format(str(e))) + +# Data source selection +st.subheader("Select Data Source") + +data_source = st.radio( + 
"Choose how to load your data:", + ["Example Data", "Quantipy Files (JSON + CSV)", "CSV File Only", "SPSS File"], + horizontal=True +) + +st.markdown("---") + +# Example Data +if data_source == "Example Data": + st.info("đŸ“Ļ Load the built-in example dataset to explore Quantipy features") + + if st.button("Load Example Dataset", type="primary"): + try: + with st.spinner("Loading example dataset..."): + test_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'tests') + name = 'Example Data (A)' + dataset = qp.DataSet(name, False) + dataset.read_quantipy( + os.path.join(test_path, '{}.json'.format(name)), + os.path.join(test_path, '{}.csv'.format(name)) + ) + st.session_state.dataset = dataset + st.session_state.dataset_name = name + st.success("✅ Example dataset loaded successfully!") + st.rerun() + except Exception as e: + st.error("Error loading example data: {}".format(str(e))) + +# Quantipy Files +elif data_source == "Quantipy Files (JSON + CSV)": + st.info("📄 Upload both JSON metadata and CSV data files") + + col1, col2 = st.columns(2) + + with col1: + json_file = st.file_uploader("Upload JSON Metadata", type=['json']) + + with col2: + csv_file = st.file_uploader("Upload CSV Data", type=['csv']) + + dataset_name = st.text_input("Dataset Name", value="My Dataset") + + if st.button("Load Quantipy Dataset", type="primary"): + if json_file is None or csv_file is None: + st.error("Please upload both JSON and CSV files") + else: + try: + with st.spinner("Loading dataset..."): + # Create temporary files + with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as tmp_json: + json_content = json.load(json_file) + json.dump(json_content, tmp_json) + tmp_json_path = tmp_json.name + + with tempfile.NamedTemporaryFile(mode='wb', suffix='.csv', delete=False) as tmp_csv: + tmp_csv.write(csv_file.getvalue()) + tmp_csv_path = tmp_csv.name + + # Load dataset + dataset = qp.DataSet(dataset_name, False) + dataset.read_quantipy(tmp_json_path,
tmp_csv_path) + + # Clean up temp files + os.unlink(tmp_json_path) + os.unlink(tmp_csv_path) + + st.session_state.dataset = dataset + st.session_state.dataset_name = dataset_name + st.success("✅ Dataset loaded successfully!") + st.rerun() + except Exception as e: + st.error("Error loading dataset: {}".format(str(e))) + import traceback + st.code(traceback.format_exc()) + +# CSV Only +elif data_source == "CSV File Only": + st.info("📊 Upload a CSV file (limited functionality without metadata)") + st.warning("âš ī¸ Loading CSV without metadata limits Quantipy features. Consider creating metadata or using full Quantipy format.") + + csv_file = st.file_uploader("Upload CSV File", type=['csv']) + dataset_name = st.text_input("Dataset Name", value="My CSV Dataset") + + if st.button("Load CSV", type="primary"): + if csv_file is None: + st.error("Please upload a CSV file") + else: + try: + with st.spinner("Loading CSV..."): + # Read CSV into DataFrame + df = pd.read_csv(csv_file) + + # Create basic dataset with minimal metadata + dataset = qp.DataSet(dataset_name, False) + dataset._data = df + + # Create basic metadata structure + dataset._meta = { + 'columns': {}, + 'masks': {}, + 'sets': {'data file': {'items': df.columns.tolist()}}, + 'lib': {'default text': 'en-GB', 'values': {}} + } + + # Infer basic metadata for each column + for col in df.columns: + dataset._meta['columns'][col] = { + 'name': col, + 'type': 'string', + 'text': {dataset._meta['lib']['default text']: col} + } + + dataset.text_key = dataset._meta['lib']['default text'] + + st.session_state.dataset = dataset + st.session_state.dataset_name = dataset_name + st.success("✅ CSV loaded successfully!") + st.warning("Note: Basic metadata was created.
You may want to add variable types and value labels.") + st.rerun() + except Exception as e: + st.error("Error loading CSV: {}".format(str(e))) + import traceback + st.code(traceback.format_exc()) + +# SPSS File +elif data_source == "SPSS File": + st.info("📈 Upload an SPSS .sav file") + + sav_file = st.file_uploader("Upload SPSS File", type=['sav']) + dataset_name = st.text_input("Dataset Name", value="My SPSS Dataset") + + if st.button("Load SPSS File", type="primary"): + if sav_file is None: + st.error("Please upload an SPSS file") + else: + try: + with st.spinner("Loading SPSS file..."): + # Create temporary file + with tempfile.NamedTemporaryFile(mode='wb', suffix='.sav', delete=False) as tmp_sav: + tmp_sav.write(sav_file.getvalue()) + tmp_sav_path = tmp_sav.name + + # Load SPSS file + dataset = qp.DataSet(dataset_name, False) + dataset.read_spss(tmp_sav_path) + + # Clean up temp file + os.unlink(tmp_sav_path) + + st.session_state.dataset = dataset + st.session_state.dataset_name = dataset_name + st.success("✅ SPSS file loaded successfully!") + st.rerun() + except Exception as e: + st.error("Error loading SPSS file: {}".format(str(e))) + import traceback + st.code(traceback.format_exc()) + +# Display current dataset info +st.markdown("---") +st.subheader("📊 Current Dataset") + +if st.session_state.dataset is not None: + dataset = st.session_state.dataset + + col1, col2, col3 = st.columns(3) + + with col1: + st.metric("Dataset Name", st.session_state.dataset_name) + + with col2: + st.metric("Number of Cases", len(dataset._data)) + + with col3: + num_vars = len(dataset._meta.get('columns', {})) + len(dataset._meta.get('masks', {})) + st.metric("Number of Variables", num_vars) + + # Show data preview + st.markdown("#### Data Preview (First 10 Rows)") + st.dataframe(dataset._data.head(10), use_container_width=True) + + # Show variable list + with st.expander("📋 View All Variables"): + try: + columns = dataset.columns() + masks = dataset.masks() + + col1, col2 = 
st.columns(2) + + with col1: + st.markdown("**Regular Variables ({} total)**".format(len(columns))) + st.write(columns) + + with col2: + st.markdown("**Array Variables ({} total)**".format(len(masks))) + st.write(masks) + except Exception as e: + st.error("Error displaying variables: {}".format(str(e))) + + # Navigation + st.markdown("---") + st.success("✅ Dataset ready! Navigate to **Data Explorer** or **Analysis** to continue.") + + col1, col2 = st.columns(2) + with col1: + if st.button("🔍 Explore Data", use_container_width=True): + st.switch_page("pages/02_Data_Explorer.py") + with col2: + if st.button("📊 Start Analysis", use_container_width=True): + st.switch_page("pages/03_Analysis.py") + +else: + st.info("👆 No dataset loaded. Please select a data source above and load your data.") diff --git a/pages/02_Data_Explorer.py b/pages/02_Data_Explorer.py new file mode 100644 index 000000000..e1f10ca9e --- /dev/null +++ b/pages/02_Data_Explorer.py @@ -0,0 +1,353 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +""" +Quantipy GUI - Data Explorer Page +Explore dataset variables, metadata, and crosstabs +""" + +import streamlit as st +import pandas as pd +import quantipy as qp +from quantipy.core.tools.dp.prep import frequency, crosstab +import plotly.express as px +import plotly.graph_objects as go + +st.set_page_config(page_title="Data Explorer", page_icon="🔍", layout="wide") + +st.title("🔍 Data Explorer") +st.markdown("Explore your dataset variables, metadata, and create crosstabulations.") + +# Check if dataset is loaded +if 'dataset' not in st.session_state or st.session_state.dataset is None: + st.warning("âš ī¸ No dataset loaded.
Please load a dataset first.") + if st.button("📁 Go to Data Loader"): + st.switch_page("pages/01_Data_Loader.py") + st.stop() + +dataset = st.session_state.dataset + +# Sidebar for variable selection +st.sidebar.header("🔧 Explorer Options") + +explorer_mode = st.sidebar.radio( + "Select Exploration Mode:", + ["Variable Browser", "Frequency Tables", "Crosstabs", "Metadata Viewer"] +) + +st.markdown("---") + +# Variable Browser Mode +if explorer_mode == "Variable Browser": + st.subheader("📋 Variable Browser") + + # Get all variables + try: + columns = dataset.columns() + masks = dataset.masks() + all_vars = columns + masks + except: + all_vars = dataset._data.columns.tolist() + columns = all_vars + masks = [] + + # Variable type filter + var_type_filter = st.radio( + "Filter by Type:", + ["All Variables", "Regular Variables Only", "Array Variables Only"], + horizontal=True + ) + + if var_type_filter == "Regular Variables Only": + display_vars = columns + elif var_type_filter == "Array Variables Only": + display_vars = masks + else: + display_vars = all_vars + + st.info("Showing {} variables".format(len(display_vars))) + + # Search functionality + search = st.text_input("🔍 Search variables", "") + if search: + display_vars = [v for v in display_vars if search.lower() in v.lower()] + st.caption("Found {} matching variables".format(len(display_vars))) + + # Display variables in a table + if display_vars: + var_info = [] + for var in display_vars: + try: + # Get variable type + if var in dataset._meta.get('columns', {}): + var_type = dataset._meta['columns'][var].get('type', 'unknown') + var_text = dataset._meta['columns'][var].get('text', {}).get(dataset.text_key, var) + elif var in dataset._meta.get('masks', {}): + var_type = 'array' + var_text = dataset._meta['masks'][var].get('text', {}).get(dataset.text_key, var) + else: + var_type = 'unknown' + var_text = var + + # Get unique count + if var in dataset._data.columns: + unique_count = dataset._data[var].nunique() + 
else: + unique_count = '-' + + var_info.append({ + 'Variable': var, + 'Label': var_text, + 'Type': var_type, + 'Unique Values': unique_count + }) + except: + var_info.append({ + 'Variable': var, + 'Label': var, + 'Type': 'unknown', + 'Unique Values': '-' + }) + + df_vars = pd.DataFrame(var_info) + st.dataframe(df_vars, use_container_width=True, height=400) + + # Select a variable to view details + st.markdown("---") + st.subheader("📊 Variable Details") + + selected_var = st.selectbox("Select a variable to view details:", display_vars) + + if selected_var: + col1, col2 = st.columns([1, 1]) + + with col1: + st.markdown("**Variable Information**") + try: + meta_info = dataset.meta(selected_var) + st.json(meta_info) + except Exception as e: + st.info("Metadata not available: {}".format(str(e))) + + with col2: + st.markdown("**Value Distribution**") + try: + if selected_var in dataset._data.columns: + freq_df = dataset._data[selected_var].value_counts().reset_index() + freq_df.columns = ['Value', 'Count'] + st.dataframe(freq_df, use_container_width=True) + + # Simple bar chart + if len(freq_df) <= 20: # Only show chart for reasonable number of values + fig = px.bar(freq_df.head(15), x='Value', y='Count', + title='Top 15 Values') + st.plotly_chart(fig, use_container_width=True) + except Exception as e: + st.error("Error showing distribution: {}".format(str(e))) + + else: + st.info("No variables to display") + +# Frequency Tables Mode +elif explorer_mode == "Frequency Tables": + st.subheader("📊 Frequency Tables") + + try: + columns = dataset.columns() + all_vars = columns + dataset.masks() + except: + all_vars = dataset._data.columns.tolist() + + selected_var = st.selectbox("Select variable for frequency table:", all_vars) + + col1, col2 = st.columns([3, 1]) + + with col2: + show_text = st.checkbox("Show Text Labels", value=True) + show_counts = st.checkbox("Show Counts", value=True) + show_pct = st.checkbox("Show Percentages", value=True) + + if st.button("Generate 
Frequency Table", type="primary"): + try: + with st.spinner("Generating frequency table..."): + # Use dataset.crosstab for single variable frequency + result = dataset.crosstab(selected_var, text=show_text) + st.dataframe(result, use_container_width=True) + + # Create visualization + if selected_var in dataset._data.columns: + freq_data = dataset._data[selected_var].value_counts().head(15) + + fig = go.Figure(data=[ + go.Bar(x=freq_data.index.astype(str), y=freq_data.values) + ]) + fig.update_layout( + title='Frequency Distribution: {}'.format(selected_var), + xaxis_title='Value', + yaxis_title='Count', + height=400 + ) + st.plotly_chart(fig, use_container_width=True) + + except Exception as e: + st.error("Error generating frequency table: {}".format(str(e))) + import traceback + with st.expander("Show Error Details"): + st.code(traceback.format_exc()) + +# Crosstabs Mode +elif explorer_mode == "Crosstabs": + st.subheader("📈 Crosstabulations") + + st.markdown("Create cross-tabulations between two variables") + + try: + columns = dataset.columns() + all_vars = columns + dataset.masks() + except: + all_vars = dataset._data.columns.tolist() + + col1, col2 = st.columns(2) + + with col1: + x_var = st.selectbox("Select X Variable (columns):", all_vars, key='x_var') + + with col2: + y_var = st.selectbox("Select Y Variable (rows):", ['@'] + all_vars, key='y_var') + st.caption("@ = base (total)") + + col1, col2, col3 = st.columns(3) + + with col1: + show_text = st.checkbox("Show Text Labels", value=True, key='ct_text') + + with col2: + show_counts = st.checkbox("Show Counts", value=True, key='ct_counts') + + with col3: + pct_type = st.selectbox("Percentage Type:", ["None", "Column %", "Row %", "Total %"]) + + if st.button("Generate Crosstab", type="primary"): + try: + with st.spinner("Generating crosstab..."): + result = dataset.crosstab(x_var, y_var, text=show_text) + st.dataframe(result, use_container_width=True) + + # Download option + csv = result.to_csv() + 
st.download_button( + label="đŸ“Ĩ Download as CSV", + data=csv, + file_name="crosstab_{}_{}.csv".format(x_var, y_var), + mime="text/csv" + ) + + except Exception as e: + st.error("Error generating crosstab: {}".format(str(e))) + import traceback + with st.expander("Show Error Details"): + st.code(traceback.format_exc()) + +# Metadata Viewer Mode +elif explorer_mode == "Metadata Viewer": + st.subheader("đŸ—‚ī¸ Metadata Viewer") + + st.markdown("View the complete metadata structure of your dataset") + + metadata_section = st.selectbox( + "Select metadata section:", + ["Overview", "Columns", "Masks", "Sets", "Library"] + ) + + if metadata_section == "Overview": + st.markdown("### Dataset Overview") + + col1, col2, col3 = st.columns(3) + + with col1: + st.metric("Dataset Name", st.session_state.dataset_name) + st.metric("Number of Cases", len(dataset._data)) + + with col2: + num_cols = len(dataset._meta.get('columns', {})) + st.metric("Regular Variables", num_cols) + num_masks = len(dataset._meta.get('masks', {})) + st.metric("Array Variables", num_masks) + + with col3: + text_key = dataset.text_key if dataset.text_key else 'Not set' + st.metric("Text Key", text_key) + num_sets = len(dataset._meta.get('sets', {})) + st.metric("Variable Sets", num_sets) + + st.markdown("### Data Shape") + st.write("Rows: {}, Columns: {}".format(dataset._data.shape[0], dataset._data.shape[1])) + + elif metadata_section == "Columns": + st.markdown("### Column Metadata") + columns_meta = dataset._meta.get('columns', {}) + + if columns_meta: + var_name = st.selectbox("Select variable:", list(columns_meta.keys())) + if var_name: + st.json(columns_meta[var_name]) + else: + st.info("No column metadata available") + + elif metadata_section == "Masks": + st.markdown("### Array/Mask Metadata") + masks_meta = dataset._meta.get('masks', {}) + + if masks_meta: + mask_name = st.selectbox("Select array:", list(masks_meta.keys())) + if mask_name: + st.json(masks_meta[mask_name]) + else: + st.info("No 
mask metadata available") + + elif metadata_section == "Sets": + st.markdown("### Variable Sets") + sets_meta = dataset._meta.get('sets', {}) + + if sets_meta: + set_name = st.selectbox("Select set:", list(sets_meta.keys())) + if set_name: + st.json(sets_meta[set_name]) + else: + st.info("No sets metadata available") + + elif metadata_section == "Library": + st.markdown("### Value Library") + lib_meta = dataset._meta.get('lib', {}) + + if lib_meta: + st.json(lib_meta) + else: + st.info("No library metadata available") + + # Full metadata export + st.markdown("---") + if st.button("đŸ“Ĩ Download Full Metadata as JSON"): + import json + meta_json = json.dumps(dataset._meta, indent=2) + st.download_button( + label="Download JSON", + data=meta_json, + file_name="{}_metadata.json".format(st.session_state.dataset_name), + mime="application/json" + ) + +# Navigation +st.markdown("---") +col1, col2, col3 = st.columns(3) + +with col1: + if st.button("📁 Back to Data Loader", use_container_width=True): + st.switch_page("pages/01_Data_Loader.py") + +with col2: + if st.button("📊 Go to Analysis", use_container_width=True): + st.switch_page("pages/03_Analysis.py") + +with col3: + if st.button("📈 View Results", use_container_width=True): + st.switch_page("pages/04_Results.py") diff --git a/pages/03_Analysis.py b/pages/03_Analysis.py new file mode 100644 index 000000000..8fbcc68a8 --- /dev/null +++ b/pages/03_Analysis.py @@ -0,0 +1,306 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +""" +Quantipy GUI - Analysis Page +Create batches, define analyses, and run aggregations +""" + +import streamlit as st +import pandas as pd +import quantipy as qp + +st.set_page_config(page_title="Analysis", page_icon="📊", layout="wide") + +st.title("📊 Analysis") +st.markdown("Create batch definitions and run analyses on your dataset.") + +# Check if dataset is loaded +if 'dataset' not in st.session_state or st.session_state.dataset is None: + st.warning("âš ī¸ No dataset loaded. 
Please load a dataset first.")
+    if st.button("📁 Go to Data Loader"):
+        st.switch_page("pages/01_Data_Loader.py")
+    st.stop()
+
+dataset = st.session_state.dataset
+
+# Initialize session state for analysis
+if 'batches' not in st.session_state:
+    st.session_state.batches = {}
+if 'current_batch' not in st.session_state:
+    st.session_state.current_batch = None
+if 'stack' not in st.session_state:
+    st.session_state.stack = None
+
+# Get available variables
+try:
+    columns = dataset.columns()
+    masks = dataset.masks()
+    all_vars = columns + masks
+except Exception:
+    all_vars = dataset._data.columns.tolist()
+
+st.markdown("---")
+
+# Analysis workflow tabs
+tab1, tab2, tab3 = st.tabs(["📋 Create Batch", "⚙️ Configure Analysis", "▶️ Run Analysis"])
+
+# Tab 1: Create Batch
+with tab1:
+    st.subheader("Create or Select Batch")
+
+    # Batch management
+    col1, col2 = st.columns([2, 1])
+
+    with col1:
+        batch_name = st.text_input("Batch Name", value="batch_1")
+
+    with col2:
+        if st.button("➕ Create New Batch", type="primary"):
+            try:
+                if batch_name in st.session_state.batches:
+                    st.warning("Batch '{}' already exists. 
Updating...".format(batch_name)) + + batch = dataset.add_batch(batch_name) + st.session_state.batches[batch_name] = batch + st.session_state.current_batch = batch_name + st.success("✅ Batch '{}' created!".format(batch_name)) + st.rerun() + except Exception as e: + st.error("Error creating batch: {}".format(str(e))) + + # Show existing batches + if st.session_state.batches: + st.markdown("### Existing Batches") + + batch_list = list(st.session_state.batches.keys()) + selected_batch = st.selectbox("Select a batch to work with:", batch_list) + + if selected_batch: + st.session_state.current_batch = selected_batch + batch = st.session_state.batches[selected_batch] + + # Show batch configuration + st.markdown("#### Batch Configuration") + + col1, col2 = st.columns(2) + + with col1: + st.markdown("**X Variables (Columns)**") + if hasattr(batch, 'x_variables') and batch.x_variables: + st.write(batch.x_variables) + else: + st.info("No x variables defined") + + with col2: + st.markdown("**Y Variables (Rows)**") + if hasattr(batch, 'y_variables') and batch.y_variables: + st.write(batch.y_variables) + else: + st.info("No y variables defined") + + else: + st.info("👆 No batches created yet. 
Create a new batch to start.")
+
+# Tab 2: Configure Analysis
+with tab2:
+    st.subheader("Configure Batch Variables")
+
+    if st.session_state.current_batch is None:
+        st.warning("⚠️ Please create or select a batch first.")
+    else:
+        batch = st.session_state.batches[st.session_state.current_batch]
+        st.info("Configuring batch: **{}**".format(st.session_state.current_batch))
+
+        # X Variables (columns)
+        st.markdown("### X Variables (Column Variables)")
+        st.caption("These variables will appear as columns in your crosstabs")
+
+        x_vars = st.multiselect(
+            "Select X variables:",
+            all_vars,
+            key='x_vars_select',
+            help="Choose variables to appear as columns"
+        )
+
+        if st.button("Add X Variables", key='add_x'):
+            if x_vars:
+                try:
+                    batch.add_x(x_vars)
+                    st.success("✅ Added {} x variables".format(len(x_vars)))
+                    st.rerun()
+                except Exception as e:
+                    st.error("Error adding x variables: {}".format(str(e)))
+            else:
+                st.warning("Please select at least one variable")
+
+        st.markdown("---")
+
+        # Y Variables (rows)
+        st.markdown("### Y Variables (Row Variables)")
+        st.caption("These variables will appear as rows in your crosstabs")
+
+        # Add @ option for base
+        y_var_options = ['@'] + all_vars
+
+        y_vars = st.multiselect(
+            "Select Y variables:",
+            y_var_options,
+            key='y_vars_select',
+            help="Choose variables to appear as rows. '@' represents the base/total." 
+ ) + + if st.button("Add Y Variables", key='add_y'): + if y_vars: + try: + batch.add_y(y_vars) + st.success("✅ Added {} y variables".format(len(y_vars))) + st.rerun() + except Exception as e: + st.error("Error adding y variables: {}".format(str(e))) + else: + st.warning("Please select at least one variable") + + st.markdown("---") + + # Filters (optional) + st.markdown("### Filters (Optional)") + st.caption("Apply filters to subset your data") + + with st.expander("Add Filter (Advanced)"): + st.info("Filter functionality available - requires logic expressions") + filter_alias = st.text_input("Filter Name/Alias") + st.text_area( + "Filter Logic", + help="Example: {'gender': [1]} for males only", + placeholder="Enter filter logic as dict" + ) + st.button("Add Filter", disabled=True, help="Advanced feature - coming soon") + + # Weights (optional) + st.markdown("### Weighting (Optional)") + + weight_options = ['None'] + all_vars + weight_var = st.selectbox("Select weight variable:", weight_options) + + if weight_var and weight_var != 'None': + st.info("Weight variable: **{}** will be applied".format(weight_var)) + +# Tab 3: Run Analysis +with tab3: + st.subheader("Run Analysis & Generate Results") + + if st.session_state.current_batch is None: + st.warning("âš ī¸ Please create and configure a batch first.") + else: + batch = st.session_state.batches[st.session_state.current_batch] + st.info("Ready to analyze batch: **{}**".format(st.session_state.current_batch)) + + # Show current configuration + st.markdown("#### Current Configuration") + + col1, col2 = st.columns(2) + + with col1: + st.markdown("**X Variables**") + if hasattr(batch, 'x_variables') and batch.x_variables: + for var in batch.x_variables: + st.write("- " + var) + else: + st.caption("None defined") + + with col2: + st.markdown("**Y Variables**") + if hasattr(batch, 'y_variables') and batch.y_variables: + for var in batch.y_variables: + st.write("- " + var) + else: + st.caption("None defined") + + 
st.markdown("---") + + # Analysis options + st.markdown("#### Analysis Options") + + col1, col2 = st.columns(2) + + with col1: + aggregations = st.multiselect( + "Select aggregations:", + ['counts', 'c%', 'r%', 'mean', 'median', 'stddev'], + default=['counts', 'c%'], + help="Choose which statistics to calculate" + ) + + with col2: + add_stats = st.checkbox("Add statistical tests", value=False) + if add_stats: + st.caption("âš ī¸ Advanced feature") + + # Run button + st.markdown("---") + + if st.button("â–ļī¸ Run Analysis", type="primary", use_container_width=True): + # Check if batch is configured + if not hasattr(batch, 'x_variables') or not batch.x_variables: + st.error("❌ Please add X variables first") + elif not hasattr(batch, 'y_variables') or not batch.y_variables: + st.error("❌ Please add Y variables first") + else: + try: + with st.spinner("Running analysis... This may take a moment."): + # Create stack from batch + stack = dataset.populate() + + # Add aggregations + if aggregations: + stack.aggregate(aggregations, verbose=False) + + # Store in session state + st.session_state.stack = stack + + st.success("✅ Analysis complete!") + st.balloons() + + # Show quick summary + st.markdown("#### Analysis Summary") + try: + desc = stack.describe() + st.dataframe(desc.head(20), use_container_width=True) + + st.info("📊 {} link(s) created. 
Go to **Results** page to view details.".format(len(desc))) + except Exception as e: + st.warning("Summary not available: {}".format(str(e))) + + # Navigation to results + if st.button("📈 View Results", type="primary"): + st.switch_page("pages/04_Results.py") + + except Exception as e: + st.error("Error running analysis: {}".format(str(e))) + import traceback + with st.expander("Show Error Details"): + st.code(traceback.format_exc()) + + # Show existing stack info + if st.session_state.stack is not None: + st.markdown("---") + st.success("✅ Analysis results are available") + + if st.button("📈 Go to Results Page", use_container_width=True): + st.switch_page("pages/04_Results.py") + +# Navigation +st.markdown("---") +col1, col2, col3 = st.columns(3) + +with col1: + if st.button("📁 Back to Data Loader", use_container_width=True): + st.switch_page("pages/01_Data_Loader.py") + +with col2: + if st.button("🔍 Back to Explorer", use_container_width=True): + st.switch_page("pages/02_Data_Explorer.py") + +with col3: + if st.button("📈 View Results", use_container_width=True): + st.switch_page("pages/04_Results.py") diff --git a/pages/04_Results.py b/pages/04_Results.py new file mode 100644 index 000000000..340cbce1c --- /dev/null +++ b/pages/04_Results.py @@ -0,0 +1,350 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +""" +Quantipy GUI - Results Page +View and export analysis results +""" + +import streamlit as st +import pandas as pd +import quantipy as qp +import tempfile +import os + +st.set_page_config(page_title="Results", page_icon="📈", layout="wide") + +st.title("📈 Analysis Results") +st.markdown("View and export your analysis results.") + +# Check if dataset and stack are loaded +if 'dataset' not in st.session_state or st.session_state.dataset is None: + st.warning("âš ī¸ No dataset loaded. 
Please load a dataset first.") + if st.button("📁 Go to Data Loader"): + st.switch_page("pages/01_Data_Loader.py") + st.stop() + +if 'stack' not in st.session_state or st.session_state.stack is None: + st.warning("âš ī¸ No analysis results available. Please run an analysis first.") + if st.button("📊 Go to Analysis"): + st.switch_page("pages/03_Analysis.py") + st.stop() + +dataset = st.session_state.dataset +stack = st.session_state.stack + +st.markdown("---") + +# Results display tabs +tab1, tab2, tab3 = st.tabs(["📊 View Results", "đŸ“Ĩ Export Data", "📋 Summary"]) + +# Tab 1: View Results +with tab1: + st.subheader("Browse Analysis Results") + + # Get stack description + try: + desc = stack.describe() + st.markdown("#### Available Results") + st.dataframe(desc, use_container_width=True) + + st.markdown("---") + st.markdown("#### Select Result to View") + + # Filter options + col1, col2, col3 = st.columns(3) + + with col1: + # Get unique data keys + data_keys = desc['data'].unique().tolist() if 'data' in desc.columns else [] + selected_data = st.selectbox("Data:", data_keys if data_keys else ['No data']) + + with col2: + # Get filters for selected data + if selected_data and selected_data != 'No data': + filters = desc[desc['data'] == selected_data]['filter'].unique().tolist() + selected_filter = st.selectbox("Filter:", filters if filters else ['no_filter']) + else: + selected_filter = 'no_filter' + + with col3: + # Get x variables + if selected_data and selected_data != 'No data': + x_vars = desc[desc['data'] == selected_data]['x'].unique().tolist() + selected_x = st.selectbox("X Variable:", x_vars if x_vars else ['None']) + else: + selected_x = 'None' + + # Y variable selection + if selected_data and selected_data != 'No data' and selected_x and selected_x != 'None': + y_vars = desc[(desc['data'] == selected_data) & (desc['x'] == selected_x)]['y'].unique().tolist() + selected_y = st.selectbox("Y Variable:", y_vars if y_vars else ['@']) + else: + selected_y = '@' + 
+ # View selection + if (selected_data and selected_data != 'No data' and + selected_x and selected_x != 'None'): + try: + link = stack[selected_data][selected_filter][selected_x][selected_y] + view_keys = link.keys() + selected_view = st.selectbox("View:", list(view_keys) if view_keys else ['No views']) + except: + selected_view = 'No views' + st.warning("Unable to access link") + + st.markdown("---") + + # Display selected result + if st.button("📊 Display Result", type="primary"): + try: + with st.spinner("Loading result..."): + link = stack[selected_data][selected_filter][selected_x][selected_y] + + if selected_view and selected_view != 'No views': + result_df = link[selected_view] + + st.markdown("#### Result: {} × {} [{}]".format(selected_x, selected_y, selected_view)) + + # Check if result has dataframe attribute + if hasattr(result_df, 'dataframe'): + display_df = result_df.dataframe + else: + display_df = result_df + + st.dataframe(display_df, use_container_width=True) + + # Download button + csv = display_df.to_csv() + st.download_button( + label="đŸ“Ĩ Download as CSV", + data=csv, + file_name="result_{}_{}_{}.csv".format(selected_x, selected_y, selected_view.replace('|', '_')), + mime="text/csv" + ) + else: + st.warning("No view selected or available") + + except Exception as e: + st.error("Error displaying result: {}".format(str(e))) + import traceback + with st.expander("Show Error Details"): + st.code(traceback.format_exc()) + + except Exception as e: + st.error("Error accessing results: {}".format(str(e))) + import traceback + with st.expander("Show Error Details"): + st.code(traceback.format_exc()) + +# Tab 2: Export Data +with tab2: + st.subheader("Export Results") + + st.markdown("Export your analysis results to various formats.") + + export_format = st.radio( + "Select export format:", + ["Excel (XLSX)", "Quantipy Format", "CSV (All Results)"], + horizontal=True + ) + + st.markdown("---") + + # Excel Export + if export_format == "Excel (XLSX)": + 
st.markdown("#### Export to Excel") + + st.info("💡 Excel export uses the ExcelPainter to create formatted workbooks") + + excel_name = st.text_input("Workbook name:", value="quantipy_results") + + col1, col2 = st.columns(2) + + with col1: + include_formats = st.checkbox("Include formatting", value=True) + + with col2: + separate_sheets = st.checkbox("Separate sheets per variable", value=True) + + if st.button("📊 Generate Excel File", type="primary"): + try: + with st.spinner("Generating Excel file... This may take a moment."): + # Create temporary directory + temp_dir = tempfile.mkdtemp() + output_path = os.path.join(temp_dir, "{}.xlsx".format(excel_name)) + + # Use ExcelPainter + try: + painter = qp.ExcelPainter(stack) + painter.write_xlsx(output_path) + + # Read the file and offer download + with open(output_path, 'rb') as f: + excel_data = f.read() + + st.success("✅ Excel file generated!") + + st.download_button( + label="đŸ“Ĩ Download Excel File", + data=excel_data, + file_name="{}.xlsx".format(excel_name), + mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" + ) + + # Cleanup + os.unlink(output_path) + os.rmdir(temp_dir) + + except Exception as e: + st.error("Excel export error: {}".format(str(e))) + st.info("Note: ExcelPainter may require specific stack structure. 
Try using basic CSV export instead.") + + except Exception as e: + st.error("Error generating Excel: {}".format(str(e))) + import traceback + with st.expander("Show Error Details"): + st.code(traceback.format_exc()) + + # Quantipy Format Export + elif export_format == "Quantipy Format": + st.markdown("#### Export to Quantipy Format") + + st.info("💡 Saves dataset as JSON metadata + CSV data files") + + export_name = st.text_input("Dataset name:", value=st.session_state.dataset_name) + + if st.button("💾 Export to Quantipy Format", type="primary"): + try: + with st.spinner("Exporting to Quantipy format..."): + # Create temporary files + temp_dir = tempfile.mkdtemp() + json_path = os.path.join(temp_dir, "{}.json".format(export_name)) + csv_path = os.path.join(temp_dir, "{}.csv".format(export_name)) + + # Write files + dataset.write_quantipy(json_path, csv_path) + + # Read files for download + with open(json_path, 'r') as f: + json_data = f.read() + + with open(csv_path, 'r') as f: + csv_data = f.read() + + st.success("✅ Files ready for download!") + + col1, col2 = st.columns(2) + + with col1: + st.download_button( + label="đŸ“Ĩ Download JSON Metadata", + data=json_data, + file_name="{}.json".format(export_name), + mime="application/json" + ) + + with col2: + st.download_button( + label="đŸ“Ĩ Download CSV Data", + data=csv_data, + file_name="{}.csv".format(export_name), + mime="text/csv" + ) + + # Cleanup + os.unlink(json_path) + os.unlink(csv_path) + os.rmdir(temp_dir) + + except Exception as e: + st.error("Error exporting: {}".format(str(e))) + import traceback + with st.expander("Show Error Details"): + st.code(traceback.format_exc()) + + # CSV Export All + elif export_format == "CSV (All Results)": + st.markdown("#### Export All Results to CSV") + + st.info("💡 Exports each result as a separate CSV file in a zip archive") + + if st.button("đŸ“Ļ Generate CSV Archive", type="primary"): + st.warning("This feature requires additional implementation for ZIP creation") + 
st.info("For now, use the 'View Results' tab to download individual results as CSV") + +# Tab 3: Summary +with tab3: + st.subheader("Analysis Summary") + + try: + desc = stack.describe() + + # Statistics + col1, col2, col3, col4 = st.columns(4) + + with col1: + st.metric("Total Links", len(desc)) + + with col2: + num_x_vars = desc['x'].nunique() if 'x' in desc.columns else 0 + st.metric("X Variables", num_x_vars) + + with col3: + num_y_vars = desc['y'].nunique() if 'y' in desc.columns else 0 + st.metric("Y Variables", num_y_vars) + + with col4: + num_views = desc['view'].nunique() if 'view' in desc.columns else 0 + st.metric("View Types", num_views) + + st.markdown("---") + + # Breakdown by variable + st.markdown("#### Results by X Variable") + if 'x' in desc.columns: + x_summary = desc.groupby('x').size().reset_index(name='count') + st.dataframe(x_summary, use_container_width=True) + + st.markdown("#### Results by Y Variable") + if 'y' in desc.columns: + y_summary = desc.groupby('y').size().reset_index(name='count') + st.dataframe(y_summary, use_container_width=True) + + st.markdown("#### Results by View Type") + if 'view' in desc.columns: + view_summary = desc.groupby('view').size().reset_index(name='count') + st.dataframe(view_summary, use_container_width=True) + + st.markdown("---") + + # Full description table + st.markdown("#### Complete Results Listing") + st.dataframe(desc, use_container_width=True, height=400) + + # Download summary + csv = desc.to_csv() + st.download_button( + label="đŸ“Ĩ Download Summary as CSV", + data=csv, + file_name="analysis_summary.csv", + mime="text/csv" + ) + + except Exception as e: + st.error("Error generating summary: {}".format(str(e))) + +# Navigation +st.markdown("---") +col1, col2, col3 = st.columns(3) + +with col1: + if st.button("📁 Back to Data Loader", use_container_width=True): + st.switch_page("pages/01_Data_Loader.py") + +with col2: + if st.button("🔍 Back to Explorer", use_container_width=True): + 
st.switch_page("pages/02_Data_Explorer.py")
+
+with col3:
+    if st.button("📊 Back to Analysis", use_container_width=True):
+        st.switch_page("pages/03_Analysis.py")
diff --git a/requirements_streamlit.txt b/requirements_streamlit.txt
new file mode 100644
index 000000000..7bf4e7254
--- /dev/null
+++ b/requirements_streamlit.txt
@@ -0,0 +1,23 @@
+# Streamlit GUI Requirements
+# Install with: pip install -r requirements_streamlit.txt
+
+# Core Streamlit
+streamlit>=0.89.0
+
+# Visualization
+plotly>=4.14.0
+
+# Quantipy dependencies (if not already installed)
+numpy==1.11.3
+pandas==0.19.2
+scipy==0.18.1
+ftfy==4.4.3
+xmltodict
+lxml
+xlsxwriter
+prettytable
+decorator
+watchdog
+requests
+python-pptx
+functools32
diff --git a/run_streamlit.sh b/run_streamlit.sh
new file mode 100755
index 000000000..6aeae1eb3
--- /dev/null
+++ b/run_streamlit.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+
+# Quantipy Streamlit GUI Launcher
+# This script starts the Streamlit application
+
+echo "========================================="
+echo "Quantipy Streamlit GUI"
+echo "========================================="
+echo ""
+
+# Check if streamlit is installed
+if ! command -v streamlit &> /dev/null
+then
+    echo "⚠️ Streamlit is not installed!"
+    echo ""
+    echo "Please install requirements:"
+    echo "  pip install -r requirements_streamlit.txt"
+    echo ""
+    exit 1
+fi
+
+# Check if quantipy is available
+python -c "import quantipy" 2>/dev/null
+if [ $? -ne 0 ]; then
+    echo "⚠️ Quantipy is not installed!"
+    echo ""
+    echo "Please install requirements:"
+    echo "  pip install -r requirements_dev.txt"
+    echo ""
+    exit 1
+fi
+
+echo "✅ Dependencies found"
+echo ""
+echo "Starting Streamlit application..."
+echo "The app will open in your default browser at http://localhost:8501" +echo "" +echo "Press Ctrl+C to stop the server" +echo "" + +# Run streamlit +streamlit run streamlit_app.py diff --git a/streamlit_app.py b/streamlit_app.py new file mode 100644 index 000000000..3cdfeefc2 --- /dev/null +++ b/streamlit_app.py @@ -0,0 +1,155 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +""" +Quantipy GUI - Streamlit Application +Main entry point for the Quantipy data analysis interface +""" + +import streamlit as st +import sys +import os + +# Configure page +st.set_page_config( + page_title="Quantipy GUI", + page_icon="📊", + layout="wide", + initial_sidebar_state="expanded" +) + +# Add custom CSS +st.markdown(""" + +""", unsafe_allow_html=True) + +# Main page +st.markdown('

📊 Quantipy GUI

', unsafe_allow_html=True) +st.markdown('

Python for People Data - Interactive Analysis Interface

', unsafe_allow_html=True) + +# Introduction +st.markdown(""" +## Welcome to Quantipy GUI + +This interactive application provides a user-friendly interface for **Quantipy**, +a powerful data processing and analysis tool designed for survey and market research data. + +### Features: +- 📁 **Data Import**: Load data from multiple formats (Quantipy, SPSS, CSV) +- 🔍 **Data Exploration**: Browse variables, view metadata, and crosstabs +- 📊 **Analysis**: Create batches, define cross-tabulations, and generate statistics +- 📈 **Visualization**: View results with interactive charts and tables +- 💾 **Export**: Save results to Excel, SPSS, or other formats + +### Getting Started: + +Use the sidebar navigation to access different features: + +1. **📁 Data Loader** - Import and manage your datasets +2. **🔍 Data Explorer** - Explore variables and metadata +3. **📊 Analysis** - Create batches and run analyses +4. **📈 Results** - View and export results + +--- + +### Quick Start Example: + +""") + +# Show example workflow +col1, col2, col3 = st.columns(3) + +with col1: + st.markdown(""" + #### Step 1: Load Data + - Upload CSV/JSON files + - Or use example data + - Select SPSS or other formats + """) + +with col2: + st.markdown(""" + #### Step 2: Explore + - Browse variables + - View crosstabs + - Check metadata + """) + +with col3: + st.markdown(""" + #### Step 3: Analyze + - Create batches + - Define x/y variables + - Generate results + """) + +st.markdown("---") + +# Sample data section +st.subheader("đŸ“Ļ Load Example Data") +st.markdown(""" +Click below to load the example dataset and start exploring immediately: +""") + +if st.button("🚀 Load Example Data & Explore", type="primary"): + st.session_state.load_example = True + st.switch_page("pages/01_Data_Loader.py") + +st.markdown("---") + +# Information section +with st.expander("â„šī¸ About Quantipy"): + st.markdown(""" + **Quantipy** is an open-source data processing, analysis and reporting software project + that builds on 
the excellent pandas and numpy libraries. Aimed at people data, Quantipy + offers support for: + + - Native handling of special data types like multiple choice variables + - Statistical analysis using case or observation weights + - DataFrame metadata management + - Pretty data exports to multiple formats + + **Key Features:** + - Multiple data format support (CSV, SPSS, Dimensions, Decipher, Ascribe) + - Open metadata format to describe and manage datasets + - Powerful, metadata-driven data cleaning and transformation + - Computation and assessment of data weights + - Easy-to-use analysis interface with Batch definitions + - Structured reporting via Chain and Cluster containers + - Export to SPSS, Excel, PowerPoint with flexible layouts + + **Documentation:** [readthedocs.org/quantipy](http://quantipy.readthedocs.io/) + """) + +with st.expander("💡 Tips for Using This App"): + st.markdown(""" + - **Navigation**: Use the sidebar to switch between different pages + - **Data Persistence**: Your loaded dataset stays in memory across pages + - **Session State**: The app maintains your work during the session + - **File Upload**: Supports drag-and-drop for easy file loading + - **Export**: Generate reports in multiple formats from the Results page + - **Example Data**: Use the built-in example dataset to explore features + """) + +# Footer +st.markdown("---") +st.markdown(""" +
+<div style="text-align: center; color: grey;">
+    Quantipy GUI | Built with Streamlit | Python 2.7 Compatible
+</div>
+""", unsafe_allow_html=True) From b77c16ece365f5b1ce90131ab9c9ab87963336f9 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 5 Nov 2025 21:59:40 +0000 Subject: [PATCH 3/3] Update CLAUDE.md to include Streamlit GUI documentation Add a new section documenting the Streamlit GUI that was recently added to the project. Includes: - How to run the Streamlit app - GUI architecture overview - Multi-page structure explanation - Key implementation notes - Reference to detailed STREAMLIT_README.md This ensures future Claude Code instances are aware of the GUI component and can work with it effectively. --- CLAUDE.md | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/CLAUDE.md b/CLAUDE.md index d8108af03..d5bfa214e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -38,6 +38,43 @@ bash install_dev.sh - pandas==0.19.2 - Additional: xlsxwriter, python-pptx, lxml, ftfy, xmltodict +## Streamlit GUI + +A web-based graphical interface is available for Quantipy, providing interactive access to core functionality without requiring code. 
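Each GUI page guards its shared objects with an initialize-once check on `st.session_state` before using them. A minimal sketch of that idiom (a plain dict stands in for `st.session_state` here so the snippet runs without Streamlit; the key names mirror the ones the pages use):

```python
# Sketch of the initialize-once guards the GUI pages apply to st.session_state.
# A plain dict stands in for st.session_state so this runs without Streamlit.
def ensure_defaults(state):
    """Create the shared keys only if no page has set them yet."""
    defaults = {'dataset': None, 'stack': None, 'batches': {}}
    for key, value in defaults.items():
        if key not in state:  # mirrors: if 'dataset' not in st.session_state
            state[key] = value
    return state

state = ensure_defaults({})
state['dataset'] = 'loaded'   # a later page sees the same object
ensure_defaults(state)        # re-running never clobbers existing values
```

Because Streamlit re-executes a page script on every interaction, guarding each key this way is what keeps the loaded dataset and any computed stack alive across reruns and page switches.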
+
+### Running the Streamlit App
+
+```bash
+# Install Streamlit dependencies
+pip install -r requirements_streamlit.txt
+
+# Start the application
+streamlit run streamlit_app.py
+
+# Or use the launcher script
+./run_streamlit.sh
+```
+
+The app will open at `http://localhost:8501`.
+
+### GUI Architecture
+
+The Streamlit app uses a multi-page architecture with session state for data persistence:
+
+- **`streamlit_app.py`**: Main entry point and home page
+- **`pages/01_Data_Loader.py`**: Data import from multiple formats (Quantipy, CSV, SPSS)
+- **`pages/02_Data_Explorer.py`**: Variable browsing, frequencies, crosstabs, metadata viewing
+- **`pages/03_Analysis.py`**: Batch creation and analysis configuration
+- **`pages/04_Results.py`**: Results viewing and export (Excel, CSV, Quantipy format)
+
+Key implementation notes:
+- Session state (`st.session_state`) maintains the dataset, stack, and batches across pages
+- Uses Plotly for interactive visualizations
+- Uses temporary files for upload/download operations (cleaned up after use)
+- Written to stay compatible with the project's Python 2.7 codebase
+
+See `STREAMLIT_README.md` for detailed GUI documentation.
+
 ## Testing
 
 ### Run All Tests