fix(data): provide _dataset_uri fallback on DatasetProvider #2214

Open
genisis0x wants to merge 1 commit into microsoft:main from genisis0x:fix/local-dataset-uri-1843

@genisis0x

Summary

Fixes #1843. When qlib runs without a `DatasetCache` wrapper, the global `DatasetD` `Wrapper` is registered with a bare `LocalDatasetProvider` instance. `LocalProvider.features_uri` (`qlib/data/data.py:1221`) then calls:

```python
return DatasetD._dataset_uri(instruments, fields, start_time, end_time, freq, disk_cache)
```

That call goes through `Wrapper.__getattr__`, which looks up `_dataset_uri` on the registered provider, and `LocalDatasetProvider` doesn't define one. The cache-aware implementation lives only on `DatasetCache` / `DiskDatasetCache`, so the no-cache code path crashes with:

```
AttributeError: 'LocalDatasetProvider' object has no attribute '_dataset_uri'
```
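
For reviewers who want to see the mechanism in isolation, here is a minimal sketch of the failure. The `Wrapper` and `LocalDatasetProvider` classes below are simplified stand-ins written for this illustration, not qlib's actual implementations, and the argument values are placeholders.

```python
class Wrapper:
    """Simplified stand-in: forwards unknown attribute lookups to a registered provider."""

    def __init__(self):
        self._provider = None

    def register(self, provider):
        self._provider = provider

    def __getattr__(self, key):
        # Python calls __getattr__ only after normal lookup on the Wrapper fails.
        provider = self.__dict__.get("_provider")
        if provider is None:
            raise AttributeError("Please register a provider first")
        return getattr(provider, key)


class LocalDatasetProvider:
    """Stand-in provider that, like the pre-fix class, defines no _dataset_uri."""

    def dataset(self, instruments, fields):
        return f"data for {instruments}"


DatasetD = Wrapper()
DatasetD.register(LocalDatasetProvider())

error_message = None
try:
    # The lookup is delegated to the provider, which has no such attribute.
    DatasetD._dataset_uri("csi300", ["$close"], None, None, "day", 1)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The error names the provider class rather than the wrapper because the failing `getattr` happens on the delegated object.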

Approach

Add a base `_dataset_uri` to `DatasetProvider` that returns `""` by convention. That mirrors the `""` = "no URI, fetch directly" signal that `DiskDatasetCache._dataset_uri` already emits on its `disk_cache == 0` branch (`qlib/data/cache.py:750`), so the contract is not new: the caller already handles `""` correctly.

Cache subclasses keep their explicit overrides, so the wrapped-with-cache path is unchanged. The fallback also covers `ClientDatasetProvider` and any third-party providers that subclass `DatasetProvider` without routing through a cache, not just `LocalDatasetProvider` specifically.
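
A rough sketch of the shape of this change, under stated assumptions: the classes are stand-ins rather than the real qlib classes, the parameter defaults are guesses, `StandInDiskCache` is a hypothetical name, and the SHA-256 key is an illustrative URI format, not the one qlib's cache uses.

```python
import hashlib


class DatasetProvider:
    """Stand-in base class carrying the new fallback."""

    def _dataset_uri(self, instruments, fields, start_time=None, end_time=None,
                     freq="day", disk_cache=1):
        # Fallback: "" means "no URI, fetch directly", whatever disk_cache is.
        return ""


class LocalDatasetProvider(DatasetProvider):
    """Inherits the fallback; no override needed."""


class StandInDiskCache:
    """Hypothetical cache wrapper that keeps its own explicit _dataset_uri."""

    def __init__(self, provider):
        self.provider = provider

    def _dataset_uri(self, instruments, fields, start_time=None, end_time=None,
                     freq="day", disk_cache=1):
        if disk_cache == 0:
            # Same "" signal the fallback returns.
            return ""
        # Illustrative URI only: a digest of the request parameters.
        key = repr((instruments, tuple(fields), start_time, end_time, freq))
        return hashlib.sha256(key.encode("utf-8")).hexdigest()


provider = LocalDatasetProvider()
cache = StandInDiskCache(provider)

fallback_uri = provider._dataset_uri("csi300", ["$close"])
cache_uri_disabled = cache._dataset_uri("csi300", ["$close"], disk_cache=0)
cache_uri_enabled = cache._dataset_uri("csi300", ["$close"], disk_cache=1)
```

With this layout, a provider used without a cache always reports `""`, while a cache-wrapped path still produces a non-empty URI whenever `disk_cache` is non-zero.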

Test

New `tests/misc/test_dataset_provider_uri.py` with three regressions:

  • attribute presence on `LocalDatasetProvider` (the AttributeError pinned exactly)
  • the empty-string return for the standard parameter shape
  • stability across `disk_cache` values 0/1/2 so a future refactor can't reintroduce the crash by adding a misplaced branch

```
$ .venv/bin/python -m pytest tests/misc/test_dataset_provider_uri.py -v
tests/misc/test_dataset_provider_uri.py::DatasetProviderURITest::test_disk_cache_value_is_ignored_in_fallback PASSED
tests/misc/test_dataset_provider_uri.py::DatasetProviderURITest::test_local_dataset_provider_has_dataset_uri PASSED
tests/misc/test_dataset_provider_uri.py::DatasetProviderURITest::test_local_dataset_provider_returns_empty_uri PASSED
============================== 3 passed in 0.94s ===============================
```
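
For orientation, the three regressions could look roughly like the sketch below. It reuses the test names from the run above, but the bodies are assumptions, and it exercises stand-in classes instead of importing qlib, so it is not the content of `tests/misc/test_dataset_provider_uri.py`.

```python
import unittest


class DatasetProvider:
    """Stand-in base class with the fallback under test."""

    def _dataset_uri(self, instruments, fields, start_time=None, end_time=None,
                     freq="day", disk_cache=1):
        return ""


class LocalDatasetProvider(DatasetProvider):
    """Stand-in for the provider that previously lacked _dataset_uri."""


class DatasetProviderURITest(unittest.TestCase):
    # Placeholder arguments in the standard parameter order.
    ARGS = ("csi300", ["$close"], "2020-01-01", "2020-12-31", "day")

    def test_local_dataset_provider_has_dataset_uri(self):
        self.assertTrue(hasattr(LocalDatasetProvider(), "_dataset_uri"))

    def test_local_dataset_provider_returns_empty_uri(self):
        self.assertEqual(LocalDatasetProvider()._dataset_uri(*self.ARGS, 1), "")

    def test_disk_cache_value_is_ignored_in_fallback(self):
        for disk_cache in (0, 1, 2):
            with self.subTest(disk_cache=disk_cache):
                uri = LocalDatasetProvider()._dataset_uri(*self.ARGS, disk_cache)
                self.assertEqual(uri, "")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DatasetProviderURITest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```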

Fixes #1843

Fixes microsoft#1843. When qlib runs without a `DatasetCache` wrapper the
`DatasetD` `Wrapper` registers a bare `LocalDatasetProvider` instance.
`LocalProvider.features_uri` then calls `DatasetD._dataset_uri(...)`
unconditionally, which goes through `Wrapper.__getattr__` to look up
`_dataset_uri` on the provider — and `LocalDatasetProvider` doesn't
have one. The cache-aware override lives on `DatasetCache` /
`DiskDatasetCache` only, so the no-cache code path crashes with
`AttributeError: 'LocalDatasetProvider' object has no attribute
'_dataset_uri'`.

Add a base `_dataset_uri` to `DatasetProvider` that returns `""` by
convention — the same "no URI, fetch directly" signal that
`DiskDatasetCache._dataset_uri` already emits on its `disk_cache == 0`
branch. Cache subclasses continue to override this with a real URI
implementation, so the wrapped-with-cache path is unchanged.

The new fallback covers any provider that subclasses `DatasetProvider`
without going through a cache (LocalDatasetProvider,
ClientDatasetProvider, plus any third-party providers users register).

Adds `tests/misc/test_dataset_provider_uri.py` with three regressions:
attribute presence, the empty-string return, and stability across
`disk_cache` values so future refactors can't reintroduce the crash.
@genisis0x

Read through the CLA — all good. @microsoft-github-policy-service agree
