Remove redundant code, fix bugs, and improve efficiency #32

Open
benhayes21 wants to merge 1 commit into develop from clean-up

@benhayes21
Contributor

Summary

  • Bug fixes: Fixed split_azure_url using parts.get() instead of query.get(), fixed broken urllib.parse.urlunparse references, and fixed shallow inheritance check in load_class that only checked direct parents
  • Deduplicated code: Extracted shared helpers for pandas reader (_read_with), pyarrow reader source resolution, root dir normalization (_normalize_root_dir), StrictRootDirFs safe checks, and log level parsing
  • Efficiency improvements: Replaced O(n*m) column mapping in OasisDaskReader.apply_sql with O(n+m) dict lookup, fixed unbounded pre_sql_columns growth, switched URL downloads to streaming via shutil.copyfileobj
  • Cleanup: Removed redundant httpx exception subclasses, dead code in ComplexData.run, unnecessary Python 3.8 version guards, unused import, and inconsistent logging.info vs self.logger usage
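
As a reviewer aid for the `apply_sql` efficiency item, here is a minimal sketch of replacing a nested-loop column match with a single dictionary lookup. The function names and the case-insensitive matching rule are assumptions made for illustration; they are not taken from the actual `OasisDaskReader` code.

```python
def map_columns_nested(result_cols, original_cols):
    """O(n*m): scan every original column for each result column."""
    mapped = []
    for rc in result_cols:
        for oc in original_cols:
            if oc.lower() == rc.lower():
                mapped.append(oc)
                break
        else:
            # No match found: keep the result column name unchanged.
            mapped.append(rc)
    return mapped


def map_columns_dict(result_cols, original_cols):
    """O(n+m): build the lookup once, then resolve each column in O(1)."""
    lookup = {}
    for oc in original_cols:
        # setdefault keeps the first match, mirroring the nested loop's break.
        lookup.setdefault(oc.lower(), oc)
    return [lookup.get(rc.lower(), rc) for rc in result_cols]


# Hypothetical column names, used only to compare the two approaches.
original = ["LocID", "AreaCode", "TIV"]
result = ["locid", "tiv", "extra"]
mapped = map_columns_dict(result, original)
```

Both functions return the same mapping for the same inputs; only the amount of work differs as the column lists grow.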

Net result: 13 files changed, 159 insertions, 147 deletions.
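
The streaming download change from the summary can be sketched roughly as follows. `save_stream` and its `chunk_size` default are hypothetical names chosen for this example, and an in-memory `io.BytesIO` stands in for the HTTP response so the sketch runs without network access.

```python
import io
import os
import shutil
import tempfile


def save_stream(src, dest_path, chunk_size=64 * 1024):
    """Copy a readable binary stream to dest_path in fixed-size chunks.

    Only one chunk is held in memory at a time, unlike src.read(),
    which would load the whole payload first.
    """
    with open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest, chunk_size)


# Stand-in payload; a real caller would pass an open HTTP response instead.
payload = b"x" * 200_000
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "download.bin")
    save_stream(io.BytesIO(payload), path)
    copied_size = os.path.getsize(path)
```

Any file-like object with a `read()` method should work as `src`, for example the response returned by `urllib.request.urlopen`.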

Test plan

  • All local storage tests pass (18/18)
  • All df_reader tests pass (39 passed, 7 skipped due to dask-sql/geodatasets not installed)
  • All complex module tests pass (29/29 non-cloud)
  • All caching tests pass (11/11 local context)
  • Verify S3 and Azure backend tests pass in CI (failures in local env are pre-existing LocalStack region config issues)
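
For reviewers checking the `split_azure_url` fix, the class of bug can be reproduced with the standard library alone. This is a sketch under assumptions: the function name `split_url_sketch`, the `sig` query parameter, and the returned tuple are illustrative and do not reflect the real function's signature.

```python
from urllib.parse import parse_qs, urlparse


def split_url_sketch(url):
    """Split a URL into host, path, and one query parameter."""
    parts = urlparse(url)          # ParseResult: a named tuple, no .get()
    query = parse_qs(parts.query)  # dict mapping each key to a list of values

    # Calling parts.get("sig") would raise AttributeError, because the
    # parsed URL is not a dict. The lookup belongs on the query dict.
    token = query.get("sig", [None])[0]
    return parts.netloc, parts.path.lstrip("/"), token


# Hypothetical URL, used only to exercise the sketch.
host, path, token = split_url_sketch(
    "https://acct.blob.core.windows.net/container/file.csv?sig=abc"
)
```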

🤖 Generated with Claude Code

- Deduplicate read_csv/read_parquet in OasisPandasReader via _read_with helper
- Consolidate triplicated dataset logic in OasisPyarrowReader.read_parquet
- Extract _normalize_root_dir to BaseStorage, replacing duplicated stripping in S3/Azure backends
- Extract _safe_check helper in StrictRootDirFs for repeated try/except pattern
- Extract _parse_log_level helper in log.py for duplicated log level parsing
- Remove unnecessary Python 3.8 version guards in config modules
- Remove redundant httpx exception subclasses in RestComplexData
- Fix dead ternary branch in ComplexData.run
- Fix O(n*m) column mapping in OasisDaskReader.apply_sql with dict-based O(n+m) lookup
- Fix unbounded pre_sql_columns growth in apply_sql by using local list
- Fix inconsistent logging.info vs self.logger usage in BaseStorage
- Use shutil.copyfileobj for streaming URL downloads instead of reading into memory
- Fix bug: split_azure_url used parts.get() instead of query.get()
- Fix unused import and broken urllib.parse.urlunparse references in filestore.py
- Fix shallow inheritance check in load_class using issubclass()
- Add CLAUDE.md for Claude Code guidance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
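
For context on the `load_class` item in the commit message, the difference between a direct-parent check and `issubclass()` can be shown with a three-level hierarchy. The class and function names below are hypothetical, and the use of `__bases__` for the old check is an assumption about what "only checked direct parents" means.

```python
class Base:
    pass


class Middle(Base):
    pass


class Leaf(Middle):
    pass


def is_direct_child(cls, base):
    """Shallow check: only looks at the immediate parents of cls."""
    return base in cls.__bases__


def is_subclass_of(cls, base):
    """Deep check: walks the full inheritance chain via issubclass()."""
    # The isinstance guard avoids a TypeError when cls is not a class.
    return isinstance(cls, type) and issubclass(cls, base)


shallow = is_direct_child(Leaf, Base)  # misses the grandparent
deep = is_subclass_of(Leaf, Base)      # finds it
```

With the shallow check, a class that inherits from the expected base through an intermediate class is rejected; `issubclass()` accepts it.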
