From 6dcc5d02ff848ef01ec6763eb5af042ae38cd5f4 Mon Sep 17 00:00:00 2001 From: Jaladh Singhal Date: Tue, 26 May 2026 21:00:05 -0700 Subject: [PATCH 1/7] Add tutorial for accessing ZTF DR24 light curves from HATS catalog --- tutorials/ztf/ztf_lightcurves.md | 434 +++++++++++++++++++++++++++++++ 1 file changed, 434 insertions(+) create mode 100644 tutorials/ztf/ztf_lightcurves.md diff --git a/tutorials/ztf/ztf_lightcurves.md b/tutorials/ztf/ztf_lightcurves.md new file mode 100644 index 00000000..19f3e612 --- /dev/null +++ b/tutorials/ztf/ztf_lightcurves.md @@ -0,0 +1,434 @@ +--- +authors: +- name: Jaladh Singhal +- name: IRSA Data Science Team +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.19.3 +kernelspec: + display_name: irsa-tutorials + language: python + name: python3 +--- + +(ztf-lightcurves-lsdb)= +# Access ZTF DR24 Light Curves from HATS Catalog + ++++ + +## Learning Goals + +By the end of this tutorial, you will learn how to: + +- Open ZTF DR24 HATS catalogs for light curves and the Objects Table using `lsdb`. +- Retrieve light curves for specific sources by ZTF object IDs using an index search. +- Retrieve light curves for sources in a sky region using a cone search on RA and Dec. +- Cross-reference the Objects Table to enrich cone search results with per-source variability statistics. +- Plot ZTF light curves filtered by variability. + ++++ + +## Introduction + +The ZTF DR24 enhanced data products at IRSA include two [HATS](https://irsa.ipac.caltech.edu/docs/parquet_catalogs/#hats) (Hierarchical Adaptive Tiling Scheme) catalogs hosted on AWS S3: + +- **Lightcurves catalog**: one row per ZTF object, with a nested column storing the full photometry time series — timestamps, magnitudes, uncertainties, and quality flags. 
+- **Objects Table**: one row per ZTF object per band, with collapsed light curve metrics such as magnitude RMS, chi-squared variability statistic, number of good observations, and mean magnitude. + +These HATS catalogs offer a scalable, cloud-native alternative to the ZTF light curve service, enabling efficient access especially when the service is overloaded. +The [lsdb](https://docs.lsdb.io/en/latest/index.html) Python library provides a convenient interface for working with HATS catalogs, including spatial queries and object-ID-based lookups. + +This tutorial covers two common entry points for accessing ZTF light curves: + +1. **Object IDs**: you have specific ZTF object IDs — from a previous query, a catalog crossmatch, or a published source list — and want their light curves directly. +2. **RA/Dec**: you have sky coordinates and want all ZTF sources within a given radius. + +Both approaches are demonstrated below. An optional section then shows how to join the position search results with the Objects Table to select and plot the most variable sources using robust variability statistics. + +For more context on ZTF DR24 data products, refer to the [ZTF DR24 release notes](https://irsa.ipac.caltech.edu/data/ZTF/docs/releases/ztf_release_notes_latest) and [explanatory supplement](https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf) at IRSA. + ++++ + +## Imports + +```{code-cell} ipython3 +# Uncomment the next line to install dependencies if needed. +# !pip install s3fs "lsdb>=0.6.6,<0.8" pyarrow pandas numpy astropy matplotlib +``` + +```{code-cell} ipython3 +import s3fs +import lsdb +import pyarrow.parquet as pq +from astropy.coordinates import SkyCoord +import numpy as np +import pandas as pd +from astropy import units as u +import os +import matplotlib.pyplot as plt +from dask.distributed import Client +``` + +```{code-cell} ipython3 +pd.set_option("display.max_colwidth", None) +pd.set_option("display.min_rows", 18) +``` + +## 1. 
Locate ZTF DR24 HATS Catalogs in the Cloud + +From IRSA's [cloud data access page](https://irsa.ipac.caltech.edu/cloud_access/), we identify the S3 bucket and path prefixes for the ZTF DR24 HATS catalogs: + +```{code-cell} ipython3 +ztf_bucket = "ipac-irsa-ztf" +ztf_lc_hats_prefix = "ztf/enhanced/dr24/lc/hats" # Light curves catalog +ztf_objects_hats_prefix = "ztf/enhanced/dr24/objects/hats" # Objects table +``` + +[s3fs](https://s3fs.readthedocs.io/en/latest/) provides a filesystem-like Python interface for AWS S3 buckets. +First, we create an S3 client: + +```{code-cell} ipython3 +s3 = s3fs.S3FileSystem(anon=True) +``` + +Let's list the contents of the ZTF DR24 lightcurves HATS **collection**: + +```{code-cell} ipython3 +s3.ls(f"{ztf_bucket}/{ztf_lc_hats_prefix}") +``` + +In this collection, you can see collection properties, catalog, index table, and margin cache in order. +You can explore more directories to see how this HATS collection follows the directory structure described in IRSA's documentation on [HATS partitioning and HATS Collections](https://irsa.ipac.caltech.edu/docs/parquet_catalogs/#hats). + +As per the documentation, the Parquet file containing the schema for this catalog is stored in `dataset/_common_metadata`. +Let's save its path for later use (using the catalog name identified from the listing above): + +```{code-cell} ipython3 +ztf_lc_schema_path = "ztf_dr24_lc-hats/dataset/_common_metadata" # ztf_dr24_lc-hats is the catalog name identified above +``` + +Similarly, let's list the ZTF DR24 Objects Table HATS collection: + +```{code-cell} ipython3 +s3.ls(f"{ztf_bucket}/{ztf_objects_hats_prefix}") +``` + +```{code-cell} ipython3 +ztf_objects_schema_path = "ztf_dr24_objects-hats/dataset/_common_metadata" # ztf_dr24_objects-hats is the catalog name identified above +``` + +## 2. Explore the Catalog Schemas + +Before querying the catalogs, let's inspect what columns are available in each. 
+We read schemas from the `_common_metadata` files, which also contain column metadata such as units and descriptions: + +```{code-cell} ipython3 +def pq_schema_to_df(schema): + """Convert a PyArrow schema to a Pandas DataFrame.""" + return pd.DataFrame( + [ + ( + field.name, + str(field.type), + (field.metadata or {}).get(b"unit", b"").decode(), + (field.metadata or {}).get(b"description", b"").decode() + ) + for field in schema + ], + columns=["name", "type", "unit", "description"] + ) +``` + +```{code-cell} ipython3 +ztf_lc_schema = pq.read_schema( + f"s3://{ztf_bucket}/{ztf_lc_hats_prefix}/{ztf_lc_schema_path}", + filesystem=s3 +) +ztf_lc_schema_df = pq_schema_to_df(ztf_lc_schema) +ztf_lc_schema_df +``` + +Notice the `lightcurve` column — this is a **nested column** that stores the full photometric time series for each ZTF object. +Each element of `lightcurve` is itself a table with columns including: + +- `hmjd`: Heliocentric-based Modified Julian Date of each observation +- `mag` / `magerr`: Magnitude and its uncertainty +- `clrcoeff`: Linear color coefficient term from photometric calibration +- `catflags`: Photometric/image quality flags encoded as bits (described in the [explanatory supplement](https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf) section 13.6; set `catflags == 0` to keep only clean epochs) + +```{code-cell} ipython3 +ztf_lc_columns = ["objectid", "objra", "objdec", "filterid", "nepochs", "lightcurve"] +``` + +## 3. Get Light Curves by Object ID + +If you have specific ZTF object IDs, you can retrieve their light curves directly using an index search — no spatial filter needed. +This is the fastest approach for targeted lookups. + +### 3.1 Open the Light Curves Catalog + +We open the ZTF DR24 light curves HATS catalog. 
No data is read yet — lsdb opens catalogs [lazily](https://docs.lsdb.io/en/latest/tutorials/lazy_operations.html): + +```{code-cell} ipython3 +ztf_lc_catalog = lsdb.open_catalog( + f"s3://{ztf_bucket}/{ztf_lc_hats_prefix}", + columns=ztf_lc_columns +) +ztf_lc_catalog +``` + +### 3.2 Identify the Index Column + +The ZTF DR24 light curves HATS catalog ships with an ancillary index table that enables fast lookups by object ID. +Let's identify which column is indexed: + +```{code-cell} ipython3 +ztf_lc_idx_column = list(ztf_lc_catalog.hc_collection.all_indexes.keys())[0] +print(f"Index column: {ztf_lc_idx_column}") +``` + +### 3.3 Perform an Index Search + +We use the same object IDs from the [ZTF light curve API docs](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) multi-object example — you can compare results from this tutorial directly with that service. +In your workflow, these IDs might come from a previous query, a catalog crossmatch, or a published source table: + +```{code-cell} ipython3 +object_ids = [686103400034440, 686103400106565] +object_ids +``` + +```{code-cell} ipython3 +ztf_lcs_by_id = ztf_lc_catalog.id_search(values={ztf_lc_idx_column: object_ids}) +ztf_lcs_by_id +``` + +### 3.4 Compute and Inspect the Results + +```{code-cell} ipython3 +ztf_lcs_by_id_df = ztf_lcs_by_id.compute() +ztf_lcs_by_id_df +``` + +```{code-cell} ipython3 +print(f"Found {len(ztf_lcs_by_id_df)} light curves for {len(object_ids)} objects.") +``` + +Each row is one ZTF object. The `lightcurve` column contains a nested DataFrame per object. 
+Let's inspect the light curve of the first object: + +```{code-cell} ipython3 +ztf_lcs_by_id_df['lightcurve'].iloc[0] +``` + +### 3.5 Plot Light Curves + +```{code-cell} ipython3 +fig, axs = plt.subplots(len(ztf_lcs_by_id_df), 1, + figsize=(10, 4 * len(ztf_lcs_by_id_df)), + constrained_layout=True) + +if len(ztf_lcs_by_id_df) == 1: + axs = [axs] + +for ax, (_, row) in zip(axs, ztf_lcs_by_id_df.iterrows()): + lc = row['lightcurve'].query("catflags == 0") + title = f"ZTF Object {row['objectid']} (RA={row['objra']:.4f}°, Dec={row['objdec']:.4f}°)" + pts = ax.plot(lc['hmjd'], lc['mag'], '.', markersize=4, zorder=3) + ax.errorbar( + lc['hmjd'], lc['mag'], yerr=lc['magerr'], + fmt='none', ecolor=pts[0].get_color(), elinewidth=0.8, alpha=0.3, zorder=2 + ) + ax.set_ylabel("Magnitude") + ax.set_xlabel("HMJD") + ax.invert_yaxis() + ax.set_title(title, fontsize=10) + +fig.suptitle("ZTF DR24 Light Curves — Object ID Search Results", fontsize=13, y=1.02) +plt.show() +``` + +## 4. Get Light Curves by Sky Position + +If you have sky coordinates and want all ZTF sources within a given area, use a cone search. + +### 4.1 Define a Spatial Filter + +We use the same sky position as the [ZTF light curve API docs](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) positional example: + +```{code-cell} ipython3 +target = SkyCoord(ra=298.0025, dec=29.87147, unit="deg") # same as ZTF light curve API docs positional example +search_radius = 5 * u.arcsec +``` + +Using lsdb, we define a cone [search object](https://docs.lsdb.io/en/latest/tutorials/region_selection.html#4.-The-Search-object) for this region: + +```{code-cell} ipython3 +spatial_filter = lsdb.ConeSearch( + ra=target.ra.deg, + dec=target.dec.deg, + radius_arcsec=search_radius.to(u.arcsec).value +) +``` + +### 4.2 Define Row Filters + +In addition to the spatial filter, we can pre-filter rows using Parquet column statistics. 
+Here we keep only objects with more than 100 epochs, focusing on well-sampled light curves: + +```{code-cell} ipython3 +row_filters = [["nepochs", ">", 100]] +``` + +### 4.3 Open the Filtered Light Curves Catalog + +We open the catalog with both filters applied. lsdb evaluates this lazily — no data is read yet: + +```{code-cell} ipython3 +ztf_lc_cone = lsdb.open_catalog( + f"s3://{ztf_bucket}/{ztf_lc_hats_prefix}", + search_filter=spatial_filter, + columns=ztf_lc_columns, + filters=row_filters +) +ztf_lc_cone +``` + +Notice that only the partitions overlapping the cone are included, avoiding reads of the full catalog. + +### 4.4 Compute and Inspect the Results + +Now we execute the query by calling `compute()`. The ZTF DR24 LC catalog stores full nested light curves per HATS partition — each partition can be several gigabytes regardless of cone size. We create a Dask client with `memory_limit=None` to avoid per-worker memory caps: + +```{code-cell} ipython3 +def get_nworkers(catalog): + return min(os.cpu_count(), catalog.npartitions + 1) + +with Client(n_workers=get_nworkers(ztf_lc_cone), + threads_per_worker=1, + memory_limit=None # each partition can be several GB; avoid per-worker cap + ) as client: + print(f"You can monitor progress in the Dask dashboard at {client.dashboard_link}") + ztf_lc_cone_df = ztf_lc_cone.compute() +``` + +```{code-cell} ipython3 +ztf_lc_cone_df +``` + +```{code-cell} ipython3 +print(f"Found {len(ztf_lc_cone_df)} ZTF light curves for the search criteria.") +``` + +Each row corresponds to one ZTF object. The `lightcurve` column contains a nested DataFrame per object: + +```{code-cell} ipython3 +ztf_lc_cone_df['lightcurve'].iloc[0] +``` + +## 5. [Optional] Look Up Additional Info from the Objects Table + +```{note} +This section is optional — skip it if you only need the raw light curves from section 4. +``` + +### 5.1 Explore the Objects Table Schema + +The Objects Table contains per-band summary statistics for each ZTF source. 
+Let's inspect its schema to identify columns of interest: + +```{code-cell} ipython3 +ztf_objects_schema = pq.read_schema( + f"s3://{ztf_bucket}/{ztf_objects_hats_prefix}/{ztf_objects_schema_path}", + filesystem=s3 +) +pq_schema_to_df(ztf_objects_schema) +``` + +We'll select a subset of columns useful for characterizing variable sources: + +```{code-cell} ipython3 +ztf_objects_columns = ['oid', 'ra', 'dec', 'filtercode', 'ngoodobsrel', 'chisq', 'magrms', 'meanmag', 'medianabsdev'] +``` + +### 5.2 Open the Objects Table + +We reuse the same `spatial_filter` from section 4 to retrieve Objects Table entries for the same sky region: + +```{code-cell} ipython3 +ztf_objects_cone = lsdb.open_catalog( + f"s3://{ztf_bucket}/{ztf_objects_hats_prefix}", + search_filter=spatial_filter, + columns=ztf_objects_columns +) +ztf_objects_cone +``` + +### 5.3 Compute and Inspect + +```{code-cell} ipython3 +with Client(n_workers=get_nworkers(ztf_objects_cone), + threads_per_worker=1, + memory_limit=None) as client: + ztf_objects_cone_df = ztf_objects_cone.compute() +``` + +```{code-cell} ipython3 +ztf_objects_cone_df +``` + +### 5.4 Merge Objects Table Info into Light Curves + +We join the Objects Table with the position search light curves on the shared object ID: + +```{code-cell} ipython3 +objects_cols_to_merge = ['oid', 'filtercode', 'ngoodobsrel', 'chisq', 'magrms', 'meanmag', 'medianabsdev'] +combined_df = ztf_lc_cone_df.merge( + ztf_objects_cone_df[objects_cols_to_merge], + left_on='objectid', + right_on='oid', + how='inner' +) +combined_df +``` + +## 6. 
Plot Most Variable Light Curves from the Position Search + +Using the `chisq` column from the Objects Table, we select the top 3 most variable sources from the position search and plot their light curves annotated with summary statistics: + +```{code-cell} ipython3 +most_variable = combined_df.nlargest(3, 'chisq') + +fig, axs = plt.subplots(len(most_variable), 1, + figsize=(10, 4 * len(most_variable)), + constrained_layout=True) + +if len(most_variable) == 1: + axs = [axs] + +for ax, (_, row) in zip(axs, most_variable.iterrows()): + lc = row['lightcurve'].query("catflags == 0") + title = (f"ZTF Object {row['objectid']} ({row['filtercode']} band)\n" + f"χ²={row['chisq']:.2f}, RMS mag={row['magrms']:.4f}, " + f"mean mag={row['meanmag']:.3f}, N good obs={int(row['ngoodobsrel'])}") + pts = ax.plot(lc['hmjd'], lc['mag'], '.', markersize=4, zorder=3) + ax.errorbar( + lc['hmjd'], lc['mag'], yerr=lc['magerr'], + fmt='none', ecolor=pts[0].get_color(), elinewidth=0.8, alpha=0.3, zorder=2 + ) + ax.set_ylabel("Magnitude") + ax.set_xlabel("HMJD") + ax.invert_yaxis() + ax.set_title(title, fontsize=10) + +fig.suptitle("Most Variable ZTF DR24 Sources from Position Search (annotated with Objects Table data)", fontsize=13, y=1.02) +plt.show() +``` + +## About this notebook + +Updated: 2026-05-26 + +Contact: the [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or to report problems. 
From 9192ca336ef2ac60f833f140ab7c3eaf25089b9a Mon Sep 17 00:00:00 2001 From: Jaladh Singhal Date: Wed, 27 May 2026 16:09:41 -0700 Subject: [PATCH 2/7] Fix narrative and some cleanup --- tutorials/ztf/ztf_lightcurves.md | 74 ++++++++++++++++++-------------- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/tutorials/ztf/ztf_lightcurves.md b/tutorials/ztf/ztf_lightcurves.md index 19f3e612..47142c9a 100644 --- a/tutorials/ztf/ztf_lightcurves.md +++ b/tutorials/ztf/ztf_lightcurves.md @@ -1,7 +1,8 @@ --- authors: - name: Jaladh Singhal -- name: IRSA Data Science Team +- name: Troy Raen +- name: "Brigitta Sip\u0151cz" jupytext: text_representation: extension: .md @@ -14,7 +15,7 @@ kernelspec: name: python3 --- -(ztf-lightcurves-lsdb)= +(ztf-lightcurves)= # Access ZTF DR24 Light Curves from HATS Catalog +++ @@ -25,9 +26,9 @@ By the end of this tutorial, you will learn how to: - Open ZTF DR24 HATS catalogs for light curves and the Objects Table using `lsdb`. - Retrieve light curves for specific sources by ZTF object IDs using an index search. -- Retrieve light curves for sources in a sky region using a cone search on RA and Dec. +- Retrieve light curves for sources in a sky region using a cone search. - Cross-reference the Objects Table to enrich cone search results with per-source variability statistics. -- Plot ZTF light curves filtered by variability. +- Plot ZTF light curves (filtered by variability statistics). +++ @@ -38,7 +39,7 @@ The ZTF DR24 enhanced data products at IRSA include two [HATS](https://irsa.ipac - **Lightcurves catalog**: one row per ZTF object, with a nested column storing the full photometry time series — timestamps, magnitudes, uncertainties, and quality flags. - **Objects Table**: one row per ZTF object per band, with collapsed light curve metrics such as magnitude RMS, chi-squared variability statistic, number of good observations, and mean magnitude. 
-These HATS catalogs offer a scalable, cloud-native alternative to the ZTF light curve service, enabling efficient access especially when the service is overloaded. +These HATS catalogs offer a scalable, cloud-native alternative to the [ZTF light curve service](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html), enabling efficient access especially when the service is overloaded. The [lsdb](https://docs.lsdb.io/en/latest/index.html) Python library provides a convenient interface for working with HATS catalogs, including spatial queries and object-ID-based lookups. This tutorial covers two common entry points for accessing ZTF light curves: @@ -56,7 +57,7 @@ For more context on ZTF DR24 data products, refer to the [ZTF DR24 release notes ```{code-cell} ipython3 # Uncomment the next line to install dependencies if needed. -# !pip install s3fs "lsdb>=0.6.6,<0.8" pyarrow pandas numpy astropy matplotlib +# !pip install s3fs "lsdb>=0.6.6,<0.8" pyarrow pandas astropy matplotlib ``` ```{code-cell} ipython3 @@ -64,7 +65,6 @@ import s3fs import lsdb import pyarrow.parquet as pq from astropy.coordinates import SkyCoord -import numpy as np import pandas as pd from astropy import units as u import os @@ -120,7 +120,7 @@ s3.ls(f"{ztf_bucket}/{ztf_objects_hats_prefix}") ztf_objects_schema_path = "ztf_dr24_objects-hats/dataset/_common_metadata" # ztf_dr24_objects-hats is the catalog name identified above ``` -## 2. Explore the Catalog Schemas +## 2. Explore the Catalog Schema Before querying the catalogs, let's inspect what columns are available in each. We read schemas from the `_common_metadata` files, which also contain column metadata such as units and descriptions: @@ -152,12 +152,8 @@ ztf_lc_schema_df ``` Notice the `lightcurve` column — this is a **nested column** that stores the full photometric time series for each ZTF object. 
-Each element of `lightcurve` is itself a table with columns including: - -- `hmjd`: Heliocentric-based Modified Julian Date of each observation -- `mag` / `magerr`: Magnitude and its uncertainty -- `clrcoeff`: Linear color coefficient term from photometric calibration -- `catflags`: Photometric/image quality flags encoded as bits (described in the [explanatory supplement](https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf) section 13.6; set `catflags == 0` to keep only clean epochs) +Each element of `lightcurve` is itself a table with columns including `hmjd`, `mag`,`magerr`, `clrcoeff` and `catflags`. +We save the list of columns interesting to us for later use when opening the catalog with `lsdb`: ```{code-cell} ipython3 ztf_lc_columns = ["objectid", "objra", "objdec", "filterid", "nepochs", "lightcurve"] @@ -197,7 +193,6 @@ In your workflow, these IDs might come from a previous query, a catalog crossmat ```{code-cell} ipython3 object_ids = [686103400034440, 686103400106565] -object_ids ``` ```{code-cell} ipython3 @@ -207,6 +202,8 @@ ztf_lcs_by_id ### 3.4 Compute and Inspect the Results +Now we execute the query we planned in the previous steps by calling `compute()`. This is where the data is read into memory as a Pandas DataFrame. + ```{code-cell} ipython3 ztf_lcs_by_id_df = ztf_lcs_by_id.compute() ztf_lcs_by_id_df @@ -224,6 +221,7 @@ ztf_lcs_by_id_df['lightcurve'].iloc[0] ``` ### 3.5 Plot Light Curves +When plotting the light curves, we apply the `catflags == 0` filter to keep only clean epochs (as described in section 13.6 of the [explanatory supplement](https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf)). 
```{code-cell} ipython3 fig, axs = plt.subplots(len(ztf_lcs_by_id_df), 1, @@ -234,7 +232,7 @@ if len(ztf_lcs_by_id_df) == 1: axs = [axs] for ax, (_, row) in zip(axs, ztf_lcs_by_id_df.iterrows()): - lc = row['lightcurve'].query("catflags == 0") + lc = row['lightcurve'].query("catflags == 0") # to keep only clean epochs title = f"ZTF Object {row['objectid']} (RA={row['objra']:.4f}°, Dec={row['objdec']:.4f}°)" pts = ax.plot(lc['hmjd'], lc['mag'], '.', markersize=4, zorder=3) ax.errorbar( @@ -256,10 +254,10 @@ If you have sky coordinates and want all ZTF sources within a given area, use a ### 4.1 Define a Spatial Filter -We use the same sky position as the [ZTF light curve API docs](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) positional example: +We use the same sky position as the [ZTF light curve API docs](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) positional example, but you can specify any coordinates and search radius you want: ```{code-cell} ipython3 -target = SkyCoord(ra=298.0025, dec=29.87147, unit="deg") # same as ZTF light curve API docs positional example +target = SkyCoord(ra=298.0025, dec=29.87147, unit="deg") search_radius = 5 * u.arcsec ``` @@ -276,10 +274,13 @@ spatial_filter = lsdb.ConeSearch( ### 4.2 Define Row Filters In addition to the spatial filter, we can pre-filter rows using Parquet column statistics. -Here we keep only objects with more than 100 epochs, focusing on well-sampled light curves: +Here we keep only objects with more than 50 epochs, focusing on well-sampled light curves: ```{code-cell} ipython3 -row_filters = [["nepochs", ">", 100]] +row_filters = [ + ["nepochs", ">", 50], + # additional filters can be added here if desired + ] ``` ### 4.3 Open the Filtered Light Curves Catalog @@ -300,7 +301,7 @@ Notice that only the partitions overlapping the cone are included, avoiding read ### 4.4 Compute and Inspect the Results -Now we execute the query by calling `compute()`. 
The ZTF DR24 LC catalog stores full nested light curves per HATS partition — each partition can be several gigabytes regardless of cone size. We create a Dask client with `memory_limit=None` to avoid per-worker memory caps: +Now we execute the query by calling `compute()`. The ZTF DR24 LC catalog stores full nested light curves per HATS partition. We wrap the compute call in a Dask client to parallelize if multiple partitions are involved, and to monitor progress in the Dask dashboard. ```{code-cell} ipython3 def get_nworkers(catalog): @@ -315,11 +316,11 @@ with Client(n_workers=get_nworkers(ztf_lc_cone), ``` ```{code-cell} ipython3 -ztf_lc_cone_df +print(f"Found {len(ztf_lc_cone_df)} ZTF light curves for the search criteria.") ``` ```{code-cell} ipython3 -print(f"Found {len(ztf_lc_cone_df)} ZTF light curves for the search criteria.") +ztf_lc_cone_df.head(5) ``` Each row corresponds to one ZTF object. The `lightcurve` column contains a nested DataFrame per object: @@ -331,7 +332,7 @@ ztf_lc_cone_df['lightcurve'].iloc[0] ## 5. [Optional] Look Up Additional Info from the Objects Table ```{note} -This section is optional — skip it if you only need the raw light curves from section 4. +This section is optional — skip it if you don't need additional information beyond the raw light curves from section 4. 
``` ### 5.1 Explore the Objects Table Schema @@ -347,7 +348,7 @@ ztf_objects_schema = pq.read_schema( pq_schema_to_df(ztf_objects_schema) ``` -We'll select a subset of columns useful for characterizing variable sources: +We'll select a subset of columns useful for characterizing and annotating variable sources: ```{code-cell} ipython3 ztf_objects_columns = ['oid', 'ra', 'dec', 'filtercode', 'ngoodobsrel', 'chisq', 'magrms', 'meanmag', 'medianabsdev'] @@ -355,7 +356,7 @@ ztf_objects_columns = ['oid', 'ra', 'dec', 'filtercode', 'ngoodobsrel', 'chisq', ### 5.2 Open the Objects Table -We reuse the same `spatial_filter` from section 4 to retrieve Objects Table entries for the same sky region: +We reuse the same `spatial_filter` from section 4 to retrieve Objects Table entries for the same sky region. This is important for ensuring we only retrieve rows relevant to the light curves we got from the position search. ```{code-cell} ipython3 ztf_objects_cone = lsdb.open_catalog( @@ -381,12 +382,11 @@ ztf_objects_cone_df ### 5.4 Merge Objects Table Info into Light Curves -We join the Objects Table with the position search light curves on the shared object ID: +We merge the Objects Table with the position search light curves on the shared object ID via an inner join: ```{code-cell} ipython3 -objects_cols_to_merge = ['oid', 'filtercode', 'ngoodobsrel', 'chisq', 'magrms', 'meanmag', 'medianabsdev'] combined_df = ztf_lc_cone_df.merge( - ztf_objects_cone_df[objects_cols_to_merge], + ztf_objects_cone_df, left_on='objectid', right_on='oid', how='inner' @@ -396,11 +396,17 @@ combined_df ## 6. Plot Most Variable Light Curves from the Position Search -Using the `chisq` column from the Objects Table, we select the top 3 most variable sources from the position search and plot their light curves annotated with summary statistics: +Using the `chisq` column, we rudimentarily select the top 3 most variable sources from the position search results combined with objects table. 
```{code-cell} ipython3 +# Requires combined_df from section 5: the plot titles below read Objects Table columns (filtercode, chisq, magrms, meanmag, ngoodobsrel) most_variable = combined_df.nlargest(3, 'chisq') +most_variable +``` + +Then we plot their light curves annotated with summary statistics: +```{code-cell} ipython3 fig, axs = plt.subplots(len(most_variable), 1, figsize=(10, 4 * len(most_variable)), constrained_layout=True) @@ -409,7 +415,7 @@ if len(most_variable) == 1: axs = [axs] for ax, (_, row) in zip(axs, most_variable.iterrows()): - lc = row['lightcurve'].query("catflags == 0") + lc = row['lightcurve'].query("catflags == 0") # to keep only clean epochs title = (f"ZTF Object {row['objectid']} ({row['filtercode']} band)\n" f"χ²={row['chisq']:.2f}, RMS mag={row['magrms']:.4f}, " f"mean mag={row['meanmag']:.3f}, N good obs={int(row['ngoodobsrel'])}") @@ -429,6 +435,8 @@ plt.show() ## About this notebook -Updated: 2026-05-26 +**Updated:** 2026-05-27 + +**Contact:** the [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or to report problems. -Contact: the [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or to report problems. +**AI Acknowledgement:** This tutorial was developed with the assistance of AI tools. 
From bfad5882cdf342598ec1abfe87129139d4b6cdb2 Mon Sep 17 00:00:00 2001 From: Jaladh Singhal Date: Wed, 27 May 2026 16:26:19 -0700 Subject: [PATCH 3/7] Add ztf notebook to TOC --- toc.yml | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/toc.yml b/toc.yml index e70ab95c..804533be 100644 --- a/toc.yml +++ b/toc.yml @@ -60,6 +60,10 @@ project: - title: Spitzer children: - file: tutorials/spitzer/plot_Spitzer_IRS_spectra.md + - title: ZTF + children: + - title: DR24 Light Curves (HATS) + file: tutorials/ztf/ztf_lightcurves.md - title: Simulated Data file: tutorials/simulated-data/simulated.md children: From 716de78371e5ac7a36e9bd2a7baaca7cc5d405ee Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Brigitta=20Sip=C5=91cz?= Date: Wed, 27 May 2026 20:39:27 -0700 Subject: [PATCH 4/7] Adding new lsdb hats notebook to the ignore list for oldestdeps test --- tox.ini | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tox.ini b/tox.ini index 53d20428..474cad22 100644 --- a/tox.ini +++ b/tox.ini @@ -50,7 +50,7 @@ install_command = # lsdb has tighter minimum dependencies, deal with it here for now, long term handle it from the notebook metadata # We need to do this here before the dependencies are installed to work around deps conflicts # SED fitting notebook uses numpy 2.0+ functionality, ignore it from the oldest job - oldestdeps: bash -c "echo tutorials/techniques-and-tools/irsa-hats-with-lsdb >> ignore_testing; echo tutorials/simulated-data/OpenUniverse2024/openuniverse2024_SED_fit.md >> ignore_testing; sed -i -e 's|lsdb|\#lsdb|g' tutorial_requirements.txt && python -I -m pip install $@" + oldestdeps: bash -c "echo tutorials/techniques-and-tools/irsa-hats-with-lsdb >> ignore_testing; echo tutorials/ztf/ztf_lightcurves >> ignore_testing; echo tutorials/simulated-data/OpenUniverse2024/openuniverse2024_SED_fit.md >> ignore_testing; sed -i -e 's|lsdb|\#lsdb|g' tutorial_requirements.txt && python -I -m pip install $@" # Adding back the default install command; commented 
out version for clear cases, more complex one if we need to add more conditional skips # !oldestdeps: python -I -m pip install {opts} {packages} From 386c8ab6cf1fe7e4627c2864c7014b71011d5eae Mon Sep 17 00:00:00 2001 From: Jaladh Singhal Date: Tue, 2 Jun 2026 16:03:49 -0700 Subject: [PATCH 5/7] Apply suggestions from @troyraen review --- tutorials/ztf/ztf_lightcurves.md | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/tutorials/ztf/ztf_lightcurves.md b/tutorials/ztf/ztf_lightcurves.md index 47142c9a..852eebd7 100644 --- a/tutorials/ztf/ztf_lightcurves.md +++ b/tutorials/ztf/ztf_lightcurves.md @@ -25,9 +25,9 @@ kernelspec: By the end of this tutorial, you will learn how to: - Open ZTF DR24 HATS catalogs for light curves and the Objects Table using `lsdb`. -- Retrieve light curves for specific sources by ZTF object IDs using an index search. -- Retrieve light curves for sources in a sky region using a cone search. -- Cross-reference the Objects Table to enrich cone search results with per-source variability statistics. +- Retrieve light curves for specific objects by ZTF object IDs using an index search. +- Retrieve light curves for objects in a sky region using a cone search. +- Cross-reference the Objects Table to enrich cone search results with per-object variability statistics. - Plot ZTF light curves (filtered by variability statistics). +++ @@ -37,7 +37,7 @@ By the end of this tutorial, you will learn how to: The ZTF DR24 enhanced data products at IRSA include two [HATS](https://irsa.ipac.caltech.edu/docs/parquet_catalogs/#hats) (Hierarchical Adaptive Tiling Scheme) catalogs hosted on AWS S3: - **Lightcurves catalog**: one row per ZTF object, with a nested column storing the full photometry time series — timestamps, magnitudes, uncertainties, and quality flags. 
-- **Objects Table**: one row per ZTF object per band, with collapsed light curve metrics such as magnitude RMS, chi-squared variability statistic, number of good observations, and mean magnitude. +- **Objects Table**: one row per ZTF object, with collapsed light curve metrics such as magnitude RMS, chi-squared variability statistic, number of good observations, and mean magnitude. These HATS catalogs offer a scalable, cloud-native alternative to the [ZTF light curve service](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html), enabling efficient access especially when the service is overloaded. The [lsdb](https://docs.lsdb.io/en/latest/index.html) Python library provides a convenient interface for working with HATS catalogs, including spatial queries and object-ID-based lookups. @@ -83,8 +83,8 @@ From IRSA's [cloud data access page](https://irsa.ipac.caltech.edu/cloud_access/ ```{code-cell} ipython3 ztf_bucket = "ipac-irsa-ztf" -ztf_lc_hats_prefix = "ztf/enhanced/dr24/lc/hats" # Light curves catalog -ztf_objects_hats_prefix = "ztf/enhanced/dr24/objects/hats" # Objects table +ztf_lc_hats_prefix = "ztf/enhanced/dr24/lc/hats" # Lightcurves catalog +ztf_objects_hats_prefix = "ztf/enhanced/dr24/objects/hats" # Objects Table ``` [s3fs](https://s3fs.readthedocs.io/en/latest/) provides a filesystem-like Python interface for AWS S3 buckets. @@ -94,7 +94,7 @@ First, we create an S3 client: s3 = s3fs.S3FileSystem(anon=True) ``` -Let's list the contents of the ZTF DR24 lightcurves HATS **collection**: +Let's list the contents of the ZTF DR24 Lightcurves HATS **Collection**: ```{code-cell} ipython3 s3.ls(f"{ztf_bucket}/{ztf_lc_hats_prefix}") @@ -151,7 +151,7 @@ ztf_lc_schema_df = pq_schema_to_df(ztf_lc_schema) ztf_lc_schema_df ``` -Notice the `lightcurve` column — this is a **nested column** that stores the full photometric time series for each ZTF object. 
+Notice the `lightcurve` column — this is a **[nested](https://nested-pandas.readthedocs.io/) column** that stores the full photometric time series for each ZTF object. Each element of `lightcurve` is itself a table with columns including `hmjd`, `mag`,`magerr`, `clrcoeff` and `catflags`. We save the list of columns interesting to us for later use when opening the catalog with `lsdb`: @@ -337,7 +337,7 @@ This section is optional — skip it if you don't need additional information be ### 5.1 Explore the Objects Table Schema -The Objects Table contains per-band summary statistics for each ZTF source. +The Objects Table contains summary statistics for each ZTF object. Let's inspect its schema to identify columns of interest: ```{code-cell} ipython3 @@ -348,6 +348,11 @@ ztf_objects_schema = pq.read_schema( pq_schema_to_df(ztf_objects_schema) ``` +```{important} `objectid` == `oid` +ZTF's object ID column is named `objectid` in Lightcurves and `oid` in Objects Table. +Despite this difference, the two columns are the same and can be used to join the catalogs. 
+``` + We'll select a subset of columns useful for characterizing and annotating variable sources: ```{code-cell} ipython3 @@ -416,7 +421,7 @@ if len(most_variable) == 1: for ax, (_, row) in zip(axs, most_variable.iterrows()): lc = row['lightcurve'].query("catflags == 0") # to keep only clean epochs - title = (f"ZTF Object {row['objectid']} ({row['filtercode']} band)\n" + title = (f"ZTF Object {row['objectid']} ({row['filtercode']} filter)\n" f"χ²={row['chisq']:.2f}, RMS mag={row['magrms']:.4f}, " f"mean mag={row['meanmag']:.3f}, N good obs={int(row['ngoodobsrel'])}") pts = ax.plot(lc['hmjd'], lc['mag'], '.', markersize=4, zorder=3) From 9d61f1ae9c551db4cc931153374d6a32c21d02e1 Mon Sep 17 00:00:00 2001 From: Jaladh Singhal Date: Tue, 2 Jun 2026 16:18:18 -0700 Subject: [PATCH 6/7] Fix languaging of objects and proper nouns --- tutorials/ztf/ztf_lightcurves.md | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/tutorials/ztf/ztf_lightcurves.md b/tutorials/ztf/ztf_lightcurves.md index 852eebd7..4d02cf83 100644 --- a/tutorials/ztf/ztf_lightcurves.md +++ b/tutorials/ztf/ztf_lightcurves.md @@ -44,10 +44,10 @@ The [lsdb](https://docs.lsdb.io/en/latest/index.html) Python library provides a This tutorial covers two common entry points for accessing ZTF light curves: -1. **Object IDs**: you have specific ZTF object IDs — from a previous query, a catalog crossmatch, or a published source list — and want their light curves directly. -2. **RA/Dec**: you have sky coordinates and want all ZTF sources within a given radius. +1. **Object IDs**: you have specific ZTF object IDs — from a previous query, a catalog crossmatch, or a published object list — and want their light curves directly. +2. **RA/Dec**: you have sky coordinates and want all ZTF objects within a given radius. -Both approaches are demonstrated below. 
An optional section then shows how to join the position search results with the Objects Table to select and plot the most variable sources using robust variability statistics. +Both approaches are demonstrated below. An optional section then shows how to join the position search results with the Objects Table to select and plot the most variable objects using robust variability statistics. For more context on ZTF DR24 data products, refer to the [ZTF DR24 release notes](https://irsa.ipac.caltech.edu/data/ZTF/docs/releases/ztf_release_notes_latest) and [explanatory supplement](https://irsa.ipac.caltech.edu/data/ZTF/docs/ztf_explanatory_supplement.pdf) at IRSA. @@ -101,7 +101,7 @@ s3.ls(f"{ztf_bucket}/{ztf_lc_hats_prefix}") ``` In this collection, you can see collection properties, catalog, index table, and margin cache in order. -You can explore more directories to see how this HATS collection follows the directory structure described in IRSA's documentation on [HATS partitioning and HATS Collections](https://irsa.ipac.caltech.edu/docs/parquet_catalogs/#hats). +You can explore more directories to see how this HATS Collection follows the directory structure described in IRSA's documentation on [HATS partitioning and HATS Collections](https://irsa.ipac.caltech.edu/docs/parquet_catalogs/#hats). As per the documentation, the Parquet file containing the schema for this catalog is stored in `dataset/_common_metadata`. 
Let's save its path for later use (using the catalog name identified from the listing above): @@ -110,7 +110,7 @@ Let's save its path for later use (using the catalog name identified from the li ztf_lc_schema_path = "ztf_dr24_lc-hats/dataset/_common_metadata" # ztf_dr24_lc-hats is the catalog name identified above ``` -Similarly, let's list the ZTF DR24 Objects Table HATS collection: +Similarly, let's list the ZTF DR24 Objects Table HATS Collection: ```{code-cell} ipython3 s3.ls(f"{ztf_bucket}/{ztf_objects_hats_prefix}") @@ -164,9 +164,9 @@ ztf_lc_columns = ["objectid", "objra", "objdec", "filterid", "nepochs", "lightcu If you have specific ZTF object IDs, you can retrieve their light curves directly using an index search — no spatial filter needed. This is the fastest approach for targeted lookups. -### 3.1 Open the Light Curves Catalog +### 3.1 Open the Lightcurves Catalog -We open the ZTF DR24 light curves HATS catalog. No data is read yet — lsdb opens catalogs [lazily](https://docs.lsdb.io/en/latest/tutorials/lazy_operations.html): +We open the ZTF DR24 Lightcurves HATS catalog. No data is read yet — lsdb opens catalogs [lazily](https://docs.lsdb.io/en/latest/tutorials/lazy_operations.html): ```{code-cell} ipython3 ztf_lc_catalog = lsdb.open_catalog( @@ -178,7 +178,7 @@ ztf_lc_catalog ### 3.2 Identify the Index Column -The ZTF DR24 light curves HATS catalog ships with an ancillary index table that enables fast lookups by object ID. +The ZTF DR24 Lightcurves HATS catalog ships with an ancillary index table that enables fast lookups by object ID. Let's identify which column is indexed: ```{code-cell} ipython3 @@ -189,7 +189,7 @@ print(f"Index column: {ztf_lc_idx_column}") ### 3.3 Perform an Index Search We use the same object IDs from the [ZTF light curve API docs](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) multi-object example — you can compare results from this tutorial directly with that service. 
-In your workflow, these IDs might come from a previous query, a catalog crossmatch, or a published source table: +In your workflow, these IDs might come from a previous query, a catalog crossmatch, or a published object catalog: ```{code-cell} ipython3 object_ids = [686103400034440, 686103400106565] @@ -250,7 +250,7 @@ plt.show() ## 4. Get Light Curves by Sky Position -If you have sky coordinates and want all ZTF sources within a given area, use a cone search. +If you have sky coordinates and want all ZTF objects within a given area, use a cone search. ### 4.1 Define a Spatial Filter @@ -283,7 +283,7 @@ row_filters = [ ] ``` -### 4.3 Open the Filtered Light Curves Catalog +### 4.3 Open the Filtered Lightcurves Catalog We open the catalog with both filters applied. lsdb evaluates this lazily — no data is read yet: @@ -353,7 +353,7 @@ ZTF's object ID column is named `objectid` in Lightcurves and `oid` in Objects T Despite this difference, the two columns are the same and can be used to join the catalogs. ``` -We'll select a subset of columns useful for characterizing and annotating variable sources: +We'll select a subset of columns useful for characterizing and annotating variable objects: ```{code-cell} ipython3 ztf_objects_columns = ['oid', 'ra', 'dec', 'filtercode', 'ngoodobsrel', 'chisq', 'magrms', 'meanmag', 'medianabsdev'] @@ -401,7 +401,7 @@ combined_df ## 6. Plot Most Variable Light Curves from the Position Search -Using the `chisq` column, we rudimentarily select the top 3 most variable sources from the position search results combined with objects table. +Using the `chisq` column, we rudimentarily select the top 3 most variable objects from the position search results combined with the Objects Table. 
```{code-cell} ipython3 # most_variable = ztf_lc_cone_df # uncomment if you skipped section 5, and comment the line below @@ -434,7 +434,7 @@ for ax, (_, row) in zip(axs, most_variable.iterrows()): ax.invert_yaxis() ax.set_title(title, fontsize=10) -fig.suptitle("Most Variable ZTF DR24 Sources from Position Search (annotated with Objects Table data)", fontsize=13, y=1.02) +fig.suptitle("Most Variable ZTF DR24 Objects from Position Search (annotated with Objects Table data)", fontsize=13, y=1.02) plt.show() ``` From dc1aa6bbb1058542f92a00b76cde540bc80cbb3a Mon Sep 17 00:00:00 2001 From: Jaladh Singhal Date: Tue, 2 Jun 2026 18:18:22 -0700 Subject: [PATCH 7/7] Improve dask client usage and refine filtering criteria --- tutorials/ztf/ztf_lightcurves.md | 46 ++++++++++++++++++++++---------- 1 file changed, 32 insertions(+), 14 deletions(-) diff --git a/tutorials/ztf/ztf_lightcurves.md b/tutorials/ztf/ztf_lightcurves.md index 4d02cf83..e7c32c02 100644 --- a/tutorials/ztf/ztf_lightcurves.md +++ b/tutorials/ztf/ztf_lightcurves.md @@ -202,10 +202,20 @@ ztf_lcs_by_id ### 3.4 Compute and Inspect the Results -Now we execute the query we planned in previous steps by calling `compute()`. This is where the data is read into memory as a Pandas DataFrame. +Now we execute the query by calling `compute()`, which reads the data into memory as a Pandas DataFrame. We use a Dask client to parallelize across partitions, manage +memory, and monitor progress in the Dask dashboard. 
```{code-cell} ipython3 -ztf_lcs_by_id_df = ztf_lcs_by_id.compute() +def get_nworkers(catalog): + return min(os.cpu_count() or 1, catalog.npartitions + 1) + +with Client(n_workers=get_nworkers(ztf_lcs_by_id), + threads_per_worker=1, + memory_limit=None # each partition can be several GB; avoid per-worker cap + ) as client: + print(f"You can monitor progress in the Dask dashboard at {client.dashboard_link}") + ztf_lcs_by_id_df = ztf_lcs_by_id.compute() + ztf_lcs_by_id_df ``` @@ -252,13 +262,23 @@ plt.show() If you have sky coordinates and want all ZTF objects within a given area, use a cone search. +```{important} ZTF objects are defined per (filter, field, quadrant) +ZTF objects (i.e., unique object IDs) are defined _per_ (filter, field, quadrant). +This means that observations of a single _astrophysical_ object are usually spread out amongst several different _ZTF_ objects. + +In the simplest case, a given astrophysical object is represented by up to 3 ZTF objects, one per filter (g, r, and i). +The per-filter observations may themselves be separated into additional ZTF objects if the astrophysical object lies near the boundary of a ZTF field and/or quadrant. + +ZTF's pixel scale is 1"/pixel (see [ZTF Technical Specifications](https://www.ptf.caltech.edu/page/ztf_technical)), so combining all ZTF objects within a 1" cone search may be reasonable for a given astrophysical object. 
+``` + ### 4.1 Define a Spatial Filter We use the same sky position as the [ZTF light curve API docs](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) positional example but you can specify any coordinates and search radius you want: ```{code-cell} ipython3 target = SkyCoord(ra=298.0025, dec=29.87147, unit="deg") -search_radius = 5 * u.arcsec +search_radius = 1 * u.arcsec # to keep the runtime minimal for this tutorial ``` Using lsdb, we define a cone [search object](https://docs.lsdb.io/en/latest/tutorials/region_selection.html#4.-The-Search-object) for this region: @@ -274,11 +294,11 @@ spatial_filter = lsdb.ConeSearch( ### 4.2 Define Row Filters In addition to the spatial filter, we can pre-filter rows using Parquet column statistics. -Here we keep only objects with more than 50 epochs, focusing on well-sampled light curves: +Here we keep only objects with more than 25 epochs, focusing on well-sampled light curves: ```{code-cell} ipython3 row_filters = [ - ["nepochs", ">", 50], + ["nepochs", ">", 25], # additional filters can be added here if desired ] ``` @@ -301,12 +321,10 @@ Notice that only the partitions overlapping the cone are included, avoiding read ### 4.4 Compute and Inspect the Results -Now we execute the query by calling `compute()`. The ZTF DR24 LC catalog stores full nested light curves per HATS partition. We wrap the compute call in a Dask client to parallelize if multiple partitions are involved, and to monitor progress in the Dask dashboard. +Now we execute the query by calling `compute()`, which reads the data into memory as a Pandas DataFrame. We use a Dask client to parallelize across partitions, manage +memory, and monitor progress in the Dask dashboard. 
```{code-cell} ipython3 -def get_nworkers(catalog): - return min(os.cpu_count(), catalog.npartitions + 1) - with Client(n_workers=get_nworkers(ztf_lc_cone), threads_per_worker=1, memory_limit=None # each partition can be several GB; avoid per-worker cap @@ -348,7 +366,7 @@ ztf_objects_schema = pq.read_schema( pq_schema_to_df(ztf_objects_schema) ``` -```{important} `objectid` == `oid` +```{important} objectid == oid ZTF's object ID column is named `objectid` in Lightcurves and `oid` in Objects Table. Despite this difference, the two columns are the same and can be used to join the catalogs. ``` @@ -401,11 +419,11 @@ combined_df ## 6. Plot Most Variable Light Curves from the Position Search -Using the `chisq` column, we rudimentarily select the top 3 most variable objects from the position search results combined with the Objects Table. +Using the `chisq` column, we rudimentarily select the top 2 most variable objects from the position search results combined with the Objects Table. ```{code-cell} ipython3 -# most_variable = ztf_lc_cone_df # uncomment if you skipped section 5, and comment the line below -most_variable = combined_df.nlargest(3, 'chisq') +# most_variable = ztf_lc_cone_df.iloc[:2] # uncomment if you skipped section 5, and comment the line below +most_variable = combined_df.nlargest(2, 'chisq') most_variable ``` @@ -440,7 +458,7 @@ plt.show() ## About this notebook -**Updated:** 2026-05-27 +**Updated:** 2026-06-02 **Contact:** the [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or to report problems.