Kedro pipeline #3
Open
VJausovec wants to merge 70 commits into
…umentation
- Added initial documentation structure in index.rst
- Created .gitkeep in notebooks directory to track empty folder
- Set up pyproject.toml for project dependencies and configuration
- Added requirements.txt for additional dependencies
- Implemented __init__.py and __main__.py for package execution
- Registered pipelines for data ingestion, feature engineering, and risk scoring
- Developed nodes for data ingestion, feature processing, and risk scoring calculations
- Established settings.py for project configuration
…efactor on rissk/utils extract_zip to clean up password handling
…e in data ingestion
… ingestion notebook execution counts
…a, questionnaires, and microdata; update catalog paths to include survey name
…talog paths for hies2024 survey
- Renamed nodes in the feature engineering pipeline for clarity:
  - `process_paradata_timestamps` to `process_paradata_node`
  - `filter_active_events` to `filter_active_paradata_node`
- Updated inputs and outputs in the pipeline nodes to reflect changes in data structure.
- Modified test ingestion notebook to adjust execution counts and outputs for consistency.
- Enhanced output comparison in the notebook to reflect changes in data structure and ensure accurate testing.
…catalog not a raw path parameter - file cleanup
… extraction, change README and upgrade to python 3.13
- implement FolderDataset
- enhance zip extraction
- update catalog configuration
…enhance error handling; update nodes.py to remove unnecessary debug logging; modify test_ingestion.ipynb to reflect changes in data structure and update expected outputs.
…dling; update catalog.yml and globals.yml for dynamic data paths; modify parameters.yml to pull questionnaire config from globals; clean up nodes.py and pipeline.py; remove unused test files.
…g.yml and globals.yml for dynamic data paths; modify parameters.yml to pull questionnaire config from globals; clean up nodes.py and pipeline.py
…pdate read_microdata_file to avoid direct import dependency on StataMissingValue; adjust globals.yml for questionnaire version consistency
…tures for comment handling and translation positions; update input parameters for nodes; enhance data validation logic in microdata retrieval.
… functions to public; update pipeline and configuration files for legacy data handling; enhance parameters for new features.
…r testing add legacy microdata and paradata to catalog
…g answer removed feature; enhance GPS data extraction and validation logic.
…pdate catalog with new dataframes for scoring.
…; add new nodes for scoring logic and restructure pipeline definition.
…tion parameter handling and improve GPS score calculations.
…ce contamination parameter handling and GPS score calculations.
…ataFrames; update file types for risk scores to CSV format.
…ltering; update related feature functions for consistency in handling numeric data.
…actor questionnaire loading and processing logic.
… feature engineering components and enhance logging for questionnaire processing.
…nhance documentation for clarity.
…re and update aggregation in unit processing to maintain legacy compatibility.
…s from unit output
…ration; ensure legacy behavior for responsible score columns by filling NaNs with 0.
…clarify handling of removed_answers and improve legacy compatibility.
…redundant functions and enhance output structure by adding 'qnr' to responsible scores.
…equirement, clarify scoring functions, and improve GUI launcher scripts.
…rity; update scoring logic to capture two decimal digits.
…ML files for clarity and maintenance.
…uppress RuntimeWarnings in calculate_first_decimals_score for cleaner logging. Co-authored-by: Copilot <copilot@github.com>
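For readers unfamiliar with the technique, scoped warning suppression can be sketched with the standard library as below. This is only an illustration: `noisy_mean` and `quiet_mean` are hypothetical stand-ins, and the actual body of `calculate_first_decimals_score` is not shown in this PR page.

```python
import math
import warnings


def noisy_mean(values):
    """Hypothetical computation that emits a RuntimeWarning on empty input."""
    if not values:
        warnings.warn("mean of empty sequence", RuntimeWarning)
        return math.nan
    return sum(values) / len(values)


def quiet_mean(values):
    """Call noisy_mean with RuntimeWarning silenced only inside this scope."""
    with warnings.catch_warnings():
        # The filter is restored when the block exits, so other code
        # still sees RuntimeWarnings as usual.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return noisy_mean(values)
```

Keeping the filter inside a `catch_warnings()` block limits the suppression to one call instead of silencing the warning category for the whole process.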
- Updated `requirements.txt` to include new Kedro framework dependencies and remove outdated ones.
- Refactored `item_processing_kedro.py` and `unit_processing_kedro.py` to add commented out legacy scoring functions not yet ported.
- Modified `SETUP.md` for clearer instructions on environment setup and questionnaire configuration.
- Adjusted `catalog.yml` to use `questionnaire.name` instead of `survey.name` for data paths.
- Updated `globals.yml` to reflect questionnaire configuration instead of survey configuration.
- Changed `parameters.yml` to align with new questionnaire structure.
- Simplified `pipeline_registry.py` by removing questionnaire loading logic and directly registering pipelines.
- Refined `data_ingestion` and `feature_creation` nodes to work with the new questionnaire structure.
- Enhanced `rissk_scoring` pipeline to include consent filtering based on the updated questionnaire configuration.

Co-authored-by: Copilot <copilot@github.com>
…nctions; update environment configuration for clarity and maintenance. Co-authored-by: Copilot <copilot@github.com>
responsible score: don't fill with zero before multiplying (if one responsible score is missing, the combined score is NaN, same as legacy). transform_multi: minor tweak for textlist questions, where we now keep N/A if the question is enabled and all answers are N/A. This was previously dropped. In practice no such questions exist in the test data, so the effect is negligible. Co-authored-by: Copilot <copilot@github.com>
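The difference between the two behaviours mentioned in the commit above can be illustrated with a small sketch. It assumes the combined score is a plain product of per-responsible scores. `combined_score` and its `fill_missing_with_zero` flag are hypothetical names for illustration, not functions from this PR.

```python
import math


def combined_score(scores, fill_missing_with_zero=False):
    """Multiply per-responsible scores (hypothetical helper).

    With fill_missing_with_zero=False a single missing (NaN) score makes
    the whole product NaN; with True the missing score becomes 0 and
    forces the product to 0.
    """
    product = 1.0
    for score in scores:
        if fill_missing_with_zero and math.isnan(score):
            score = 0.0
        product *= score
    return product


# One responsible score is missing.
scores = [0.5, math.nan]
without_fill = combined_score(scores)
with_fill = combined_score(scores, fill_missing_with_zero=True)
```

Leaving the NaN in place keeps "missing" distinguishable from a genuine score of zero in the output.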
… ensure responsible_score presence in calculations and reorder columns in final csv output Co-authored-by: Copilot <copilot@github.com>
…; update requirements to remove loguru dependency.
- Removed obsolete env.yaml and environment_kedro.yml files.
- Updated environment.yml to reflect new package versions and dependencies.
- Deleted main.py, pipeline.yaml, and rissk_readme.ipynb as part of project restructuring.
- Cleaned up unnecessary imports and code related to previous project setup.
…, and unit processing modules to streamline the codebase and improve maintainability.
- Deleted `file_manager_utils.py`, `file_process_utils.py`, `import_utils.py`, `stats_utils.py` as they are no longer needed.
- Removed `run_gui.bat` and `run_gui.sh` scripts for GUI launching.
- Cleaned up `setup.py` and `setup.cfg` as part of the project restructuring.
- Updated `SETUP.md` to reflect changes in GUI launch instructions.

Co-authored-by: Copilot <copilot@github.com>
…ce installation instructions, clarify GUI launch commands, and correct output file naming. Co-authored-by: Copilot <copilot@github.com>
Kedro pipeline version of RISSK Project