Bug: Downloading multiple data tables is slow #218

@sergiopeixoto-seequent

Bug Description

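Retrieving multiple data tables from a Geoscience Object through the SDK is slow. In the reproduction below, downloading the tables with `download_table` takes roughly ten times longer than fetching the same data URLs directly with `aiohttp`.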

Steps to Reproduce

```python
import asyncio
import time
from io import BytesIO

import aiohttp
import pyarrow as pa
import pyarrow.parquet as pq

# Uses top-level await, so run this in a notebook or another async context.
# object_api_client, manager, and object_uuid must already be defined.
dhc = await object_api_client.download_object_by_id(object_uuid)
dhc_dict = dhc.as_dict()

data_client = object_api_client.get_data_client(manager.cache)

# Collect one SDK download coroutine and one raw data URL per attribute table.
tables_coroutines = []
data_urls = []
for collection in dhc_dict["collections"]:
    if collection["name"] == "DerivedCPTReadings":
        for attribute in collection["distance"]["attributes"]:
            tables_coroutines.append(dhc.download_table(attribute["values"]))
            data_urls.append(dhc._urls_by_name[attribute["values"]["data"]])

# Time the concurrent download through the SDK.
sdk_start = time.perf_counter()
sdk_tables = await asyncio.gather(*tables_coroutines)
sdk_elapsed = time.perf_counter() - sdk_start
print(f"Download with SDK took {sdk_elapsed:.2f}")


async def _download_one(session: aiohttp.ClientSession, url: str) -> pa.Table:
    # Fetch one Parquet file over HTTP and parse it into an Arrow table.
    async with session.get(url) as resp:
        resp.raise_for_status()
        return pq.read_table(BytesIO(await resp.read()))


# Time the concurrent download of the same URLs without the SDK.
async with aiohttp.ClientSession() as session:
    non_sdk_start = time.perf_counter()
    tables = await asyncio.gather(*[_download_one(session, url) for url in data_urls])
    non_sdk_elapsed = time.perf_counter() - non_sdk_start
    print(f"Download without SDK took {non_sdk_elapsed:.2f}")
```

Expected Behavior

Download with SDK took 0.99
Download without SDK took 0.99

Actual Behavior

Download with SDK took 10.30
Download without SDK took 0.99

Environment

  • Windows 11
  • Python 3.12.3
  • evo-compute==0.0.1rc3
  • evo-sdk-common==0.5.15

Acceptance Criteria

Downloading multiple tables through the SDK takes about as long as downloading the same tables with raw HTTP requests.
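
One way this criterion might be checked is with a small timing helper that runs a batch of downloads concurrently and compares the two elapsed times. The sketch below is only an illustration: `timed_gather`, `within_tolerance`, and `fake_download` are hypothetical names, the 1.5x tolerance is an arbitrary assumption rather than an agreed threshold, and the downloads are simulated with `asyncio.sleep` instead of calling the SDK.

```python
import asyncio
import time
from typing import Awaitable, Callable, Sequence


async def timed_gather(make_coros: Callable[[], Sequence[Awaitable]]) -> tuple[list, float]:
    """Run the awaitables concurrently and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*make_coros())
    return list(results), time.perf_counter() - start


def within_tolerance(sdk_elapsed: float, raw_elapsed: float, max_ratio: float = 1.5) -> bool:
    """True if the SDK path is at most max_ratio times slower than the raw path.

    max_ratio is an assumed tolerance, not a value taken from this issue.
    """
    return sdk_elapsed <= raw_elapsed * max_ratio


async def fake_download(delay: float) -> str:
    # Hypothetical stand-in for a table download; it only simulates latency.
    await asyncio.sleep(delay)
    return "table"


async def main() -> None:
    # Both paths are simulated here, so they should take about the same time.
    _, sdk_elapsed = await timed_gather(lambda: [fake_download(0.05) for _ in range(10)])
    _, raw_elapsed = await timed_gather(lambda: [fake_download(0.05) for _ in range(10)])
    ok = within_tolerance(sdk_elapsed, raw_elapsed)
    print(f"sdk={sdk_elapsed:.2f} raw={raw_elapsed:.2f} within tolerance: {ok}")


if __name__ == "__main__":
    asyncio.run(main())
```

In a real check, the two lambdas would build the `dhc.download_table(...)` coroutines and the `_download_one(session, url)` coroutines from the reproduction. With the timings reported above (10.30 versus 0.99), `within_tolerance` returns `False`.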
