113 changes: 113 additions & 0 deletions docs/LAZY_LOADING.md
@@ -0,0 +1,113 @@
# Lazy Loading for Changesets

## Overview

The `ChangesetsGet` method now supports lazy loading to handle queries that return more than 100 changesets. The OpenStreetMap API limits responses to 100 changesets per request, so lazy loading automatically fetches additional batches as needed.

## Usage

### Default Behavior (Lazy Loading Enabled)

By default, `ChangesetsGet` returns a `ChangesetsResponse` object that behaves like a dictionary but loads data on demand:

```python
import osmapi

api = osmapi.OsmApi()

# Returns a ChangesetsResponse object
# Only the first batch (up to 100 changesets) is loaded immediately
changesets = api.ChangesetsGet(username="someuser")

# Accessing data triggers loading of additional batches if needed
for changeset_id in changesets:
    print(f"Changeset {changeset_id}: {changesets[changeset_id]}")
```

### Dict-Like Interface

The `ChangesetsResponse` object supports the usual dictionary read operations. Some of them trigger loading of the remaining batches, as noted in the comments:

```python
# Length (returns count of currently loaded changesets)
print(len(changesets))

# Iteration (loads additional batches as needed)
for changeset_id in changesets:
    print(changeset_id)

# Key access (loads all remaining batches if the changeset is not among those already loaded)
changeset = changesets[12345]

# Contains check
if 12345 in changesets:
    print("Found!")

# Dict methods (load all remaining data)
keys = changesets.keys()
values = changesets.values()
items = changesets.items()

# Get with default
changeset = changesets.get(12345, None)
```

### Converting to Regular Dict

If you need a regular Python dictionary (e.g., for JSON serialization):

```python
# Load all data and return as a regular dict
regular_dict = changesets.as_dict()
```
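
As a rough sketch of the serialization case: the snippet below uses a hand-written stand-in for the result of `as_dict()` (the field names and values are made up for illustration) and assumes the changeset data may contain `datetime` values, which `json.dumps` cannot encode on its own. Passing `default=str` is one simple way around that; note that JSON object keys are always strings, so integer changeset IDs come back as strings after a round trip.

```python
import json
from datetime import datetime

# Stand-in for `changesets.as_dict()`; the contents are illustrative only.
regular_dict = {
    12345: {
        "id": 12345,
        "created_at": datetime(2020, 1, 1, 12, 0, 0),
        "open": False,
    },
}

# default=str is called for any value json cannot encode (here: datetime).
payload = json.dumps(regular_dict, default=str, indent=2)

# Round trip: the integer key 12345 is now the string "12345".
restored = json.loads(payload)
print(sorted(restored))
```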

### Disabling Lazy Loading (Backward Compatibility)

If you want the original behavior (single API call, maximum 100 results):

```python
# Returns a regular dict with only the first batch
changesets = api.ChangesetsGet(username="someuser", lazy_load=False)
```

## Benefits

1. **Automatic Pagination**: No need to manually handle multiple API requests
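2. **Memory Efficient**: Batches beyond the first are fetched only when they are accessed
3. **Backward Compatible**: Existing code that treats the result as a dict continues to work; pass `lazy_load=False` to get a plain dict as before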
4. **Transparent**: The response object behaves like a regular dict

## Implementation Details

- **First Batch Loaded Eagerly**: The first batch (up to 100 changesets) is loaded immediately when `ChangesetsGet` is called
- **Subsequent Batches Loaded on Demand**: Additional batches are fetched automatically when you iterate or access data not yet loaded
- **Pagination Strategy**: Uses the timestamp of the last loaded changeset to request the next batch with `order=oldest`
- **End Detection**: Stops loading when a batch returns fewer than 100 changesets
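
The points above can be illustrated with a minimal sketch. This is not the actual `ChangesetsResponse` code: `LazyBatches`, `fetch_batch` and `fake_fetch` are hypothetical names, and the fake fetcher pages by integer ID rather than by timestamp so that the example runs without network access. It only shows the general shape: eager first batch, further batches on demand, and a short batch as the end marker.

```python
from collections.abc import Mapping

BATCH_SIZE = 100  # page size described above


class LazyBatches(Mapping):
    """Hypothetical dict-like container that fetches batches on demand."""

    def __init__(self, fetch_batch):
        # fetch_batch(after) returns a list of (key, value) pairs;
        # `after` is the last key already seen, or None for the first call.
        self._fetch_batch = fetch_batch
        self._data = {}
        self._last_key = None
        self._exhausted = False
        self._load_next()  # the first batch is loaded eagerly

    def _load_next(self):
        if self._exhausted:
            return
        batch = self._fetch_batch(self._last_key)
        for key, value in batch:
            self._data[key] = value
            self._last_key = key
        # A short batch means there is nothing left to fetch.
        if len(batch) < BATCH_SIZE:
            self._exhausted = True

    def __getitem__(self, key):
        # Unknown key: load everything before giving up with KeyError.
        while key not in self._data and not self._exhausted:
            self._load_next()
        return self._data[key]

    def __iter__(self):
        seen = 0
        while True:
            keys = list(self._data)
            yield from keys[seen:]
            seen = len(keys)
            if self._exhausted:
                return
            self._load_next()

    def __len__(self):
        # Counts only what has been loaded so far.
        return len(self._data)


def fake_fetch(after, total=250):
    """Pretend API: serves IDs 1..total in pages of BATCH_SIZE."""
    start = 1 if after is None else after + 1
    stop = min(start + BATCH_SIZE, total + 1)
    return [(i, {"id": i}) for i in range(start, stop)]


lazy = LazyBatches(fake_fetch)
print(len(lazy))        # only the first batch is loaded at this point
print(len(list(lazy)))  # iterating pulls in the remaining batches
```

Deriving from `collections.abc.Mapping` means `keys()`, `values()`, `items()`, `get()` and `in` come for free once `__getitem__`, `__iter__` and `__len__` are defined, which is one way to get a dict-like interface with little code.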

## Example: Loading All Changesets for a User

```python
import osmapi

api = osmapi.OsmApi()

# Get all changesets for a user (may be >100)
changesets = api.ChangesetsGet(username="metaodi")

# Iterate through all changesets - additional batches loaded automatically
for changeset_id, changeset_data in changesets.items():
    print(f"Changeset {changeset_id} created at {changeset_data['created_at']}")

# Total count (all batches loaded at this point)
print(f"Total changesets: {len(changesets)}")

api.close()
```

## Comparison with Similar Libraries

This implementation follows a pattern similar to:
- [swissparlpy](https://github.com/metaodi/swissparlpy) - Swiss Parliament OData API wrapper
- [sruthi](https://github.com/metaodi/sruthi) - SRU client with DataLoader pattern

The lazy loading approach provides a clean API while efficiently handling large result sets.
104 changes: 104 additions & 0 deletions examples/lazy_loading_changesets.py
@@ -0,0 +1,104 @@
#!/usr/bin/env python3
"""
Example demonstrating lazy loading of changesets using osmapi.

When there are more than 100 changesets matching the query, the OSM API
returns results in batches of up to 100 items. The ChangesetsGet method
with lazy_load=True (default) automatically handles pagination, fetching
additional batches as needed.

This example shows how to:
1. Get changesets with automatic lazy loading (default behavior)
2. Iterate through all changesets without loading all data upfront
3. Access specific changesets by ID
4. Convert to a regular dict when needed
"""

import osmapi

# Create an API client
api = osmapi.OsmApi()

print("Example 1: Lazy loading (default behavior)")
print("=" * 60)

# Get changesets for a user - returns a ChangesetsResponse object
# Only the first batch (up to 100 changesets) is loaded immediately
result = api.ChangesetsGet(username="metaodi")

# The response object behaves like a dict
print(f"Type: {type(result)}")
print(f"Number of changesets loaded so far: {len(result)}")

# Accessing a specific changeset by ID
# If it's not in the loaded data, more batches will be fetched automatically
first_id = next(iter(result), None)  # None when the query matched nothing
if first_id is not None:
    print(f"\nFirst changeset ID: {first_id}")
    print(f"Changeset data: {result[first_id]}")

print("\n" + "=" * 60)
print("Example 2: Iterating through all changesets")
print("=" * 60)

# When iterating, additional batches are loaded automatically as needed
count = 0
for changeset_id in result:
    count += 1
    if count <= 3:
        print(f"Changeset {changeset_id}: {result[changeset_id].get('created_at')}")
    elif count == 4:
        print("...")

print(f"\nTotal changesets iterated: {count}")

print("\n" + "=" * 60)
print("Example 3: Dict-like operations")
print("=" * 60)

# You can use dict methods like keys(), values(), items()
# These load all remaining data
print(f"Keys: {list(result.keys())[:5]}...")
print(f"Has specific ID: {first_id in result}")

# Get with default value
print(f"Get non-existent: {result.get(99999999, 'Not found')}")

print("\n" + "=" * 60)
print("Example 4: Convert to regular dict")
print("=" * 60)

# If you need a regular dict (e.g., for serialization), use as_dict()
regular_dict = result.as_dict()
print(f"Type after conversion: {type(regular_dict)}")
print(f"Number of changesets: {len(regular_dict)}")

print("\n" + "=" * 60)
print("Example 5: Disable lazy loading (backward compatibility)")
print("=" * 60)

# If you want the old behavior (load once, return dict), use lazy_load=False
# This will only return the first batch (up to 100 changesets)
result_eager = api.ChangesetsGet(username="metaodi", lazy_load=False)
print(f"Type: {type(result_eager)}")
print(f"Number of changesets: {len(result_eager)}")
print("Note: With lazy_load=False, you only get the first 100 changesets")

print("\n" + "=" * 60)
print("Example 6: Filter by other criteria")
print("=" * 60)

# You can combine filters - lazy loading works with all parameters
result_filtered = api.ChangesetsGet(
    username="metaodi",
    only_closed=True,
    # closed_after="2020-01-01T00:00:00Z",
)
print(f"Filtered changesets: {len(result_filtered)} (first batch loaded)")

# Clean up
api.close()

print("\n" + "=" * 60)
print("Example complete!")
print("=" * 60)
74 changes: 53 additions & 21 deletions osmapi/OsmApi.py
@@ -38,6 +38,7 @@
 from . import http
 from . import parser
 from . import xmlbuilder
+from . import response


 logger = logging.getLogger(__name__)
@@ -1432,9 +1433,10 @@ def ChangesetsGet( # noqa
         created_before=None,
         only_open=False,
         only_closed=False,
+        lazy_load=True,
     ):
         """
-        Returns a dict with the id of the changeset as key
+        Returns a dict-like object with the id of the changeset as key
         matching all criteria:

             #!python
@@ -1445,37 +1447,67 @@ def ChangesetsGet( # noqa
             }

         All parameters are optional.
+
+        If lazy_load is True (default), returns a ChangesetsResponse object that
+        loads data from the API on demand. This is useful when there are more than
+        100 changesets, as the API limits responses to 100 items per request.
+
+        If lazy_load is False, returns a regular dict with all changesets loaded
+        immediately (backward compatible behavior, but limited to first 100 results).
         """

-        uri = "/api/0.6/changesets"
-        params = {}
+        base_params = {}
         if min_lon or min_lat or max_lon or max_lat:
-            params["bbox"] = f"{min_lon},{min_lat},{max_lon},{max_lat}"
+            base_params["bbox"] = f"{min_lon},{min_lat},{max_lon},{max_lat}"
         if userid:
-            params["user"] = userid
+            base_params["user"] = userid
         if username:
-            params["display_name"] = username
+            base_params["display_name"] = username
         if closed_after and not created_before:
-            params["time"] = closed_after
+            base_params["time"] = closed_after
         if created_before:
             if not closed_after:
                 closed_after = "1970-01-01T00:00:00Z"
-            params["time"] = f"{closed_after},{created_before}"
+            base_params["time"] = f"{closed_after},{created_before}"
         if only_open:
-            params["open"] = 1
+            base_params["open"] = 1
         if only_closed:
-            params["closed"] = 1
-
-        if params:
-            uri += "?" + urllib.parse.urlencode(params)
-
-        data = self._session._get(uri)
-        changesets = dom.OsmResponseToDom(data, tag="changeset")
-        result = {}
-        for curChangeset in changesets:
-            tmpCS = dom.DomParseChangeset(curChangeset)
-            result[tmpCS["id"]] = tmpCS
-        return result
+            base_params["closed"] = 1
+
+        if lazy_load:
+            # Return a lazy loading response object
+            def uri_builder(next_timestamp=None):
+                uri = "/api/0.6/changesets"
+                params = base_params.copy()
+
+                if next_timestamp:
+                    # For pagination, we need to get changesets after this timestamp
+                    params["time"] = f"{next_timestamp},"
+                    # Request oldest first to properly paginate
+                    params["order"] = "oldest"
+
+                if params:
+                    uri += "?" + urllib.parse.urlencode(params)
+                return uri
+
+            return response.ChangesetsResponse(
+                session=self._session,
+                uri_builder=uri_builder,
+                params=base_params,
+            )
+        else:
+            # Original implementation: load once and return dict
+            uri = "/api/0.6/changesets"
+            if base_params:
+                uri += "?" + urllib.parse.urlencode(base_params)
+
+            data = self._session._get(uri)
+            changesets = dom.OsmResponseToDom(data, tag="changeset")
+            result = {}
+            for curChangeset in changesets:
+                tmpCS = dom.DomParseChangeset(curChangeset)
+                result[tmpCS["id"]] = tmpCS
+            return result

     def ChangesetComment(self, ChangesetId, comment):
         """
1 change: 1 addition & 0 deletions osmapi/__init__.py
@@ -2,3 +2,4 @@

 from .OsmApi import *  # noqa
 from .errors import *  # noqa
+from . import response  # noqa