Create tutorial to get ZTF light curves from HATS catalog by jaladh-singhal · Pull Request #325 · Caltech-IPAC/irsa-tutorials

jaladh-singhal · 2026-05-27T23:28:04Z

jaladh-singhal · 2026-05-27T23:36:04Z

+
+with Client(n_workers=get_nworkers(ztf_lc_cone),
+            threads_per_worker=1,
+            memory_limit=None  # each partition can be several GB; avoid per-worker cap


It was running out of memory locally without this. Let's see how it performs on CI.

jaladh-singhal · 2026-05-27T23:43:50Z

+    right_on='oid',
+    how='inner'
+)
+combined_df


Just FYI, I tried 3 more approaches before settling on this one (which takes 2m±5s for both compute calls combined):

- `ztf_objects_cone.join(ztf_lc_cone, ...).compute()` takes 2m±5s
- `ztf_objects_cone.merge(ztf_lc_cone, ...).compute()` takes 2m±5s
- `ztf_objects_cone.compute(); ztf_lc_cone.id_search(values={'objectid': list(ztf_objects_cone_df.oid)}).compute()` took 2m50s

I kept this approach because of time as well as to maintain the narrative of keeping objects table search optional.
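For anyone wanting to repeat this kind of comparison, here is a minimal timing sketch. The `time_call` helper is hypothetical (it is not part of the notebook), and the workload is a stand-in; in the notebook you would pass a callable that wraps the relevant `.compute()` call.

```python
import time
from statistics import mean, stdev


def time_call(func, repeats=3):
    """Return (mean, stdev) of wall-clock seconds over `repeats` runs of func()."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    # stdev needs at least two samples
    spread = stdev(durations) if len(durations) > 1 else 0.0
    return mean(durations), spread


# Stand-in workload (assumption); replace with e.g. lambda: some_catalog.compute()
avg, spread = time_call(lambda: sum(range(100_000)))
print(f"{avg:.4f}s ± {spread:.4f}s")
```

Repeating each run a few times gives a rough spread like the "±5s" quoted above, though network and cache effects can dominate for remote catalogs.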

bsipocz · 2026-05-28T03:40:20Z

I've pushed a commit directly that fixes the oldestdeps job failure, review is to come separately

bsipocz · 2026-05-28T03:43:24Z

I'm not sure what's going on in CircleCI; we may actually be hitting that memory limit even though the graph doesn't show it (the resolution of the graph is pretty bad, so it's still my prime suspect for the failure). Keep an eye on the GHA buildhtml job instead for now.

jaladh-singhal · 2026-05-28T21:54:50Z

@bsipocz buildhtml job is getting skipped (I tried re-triggering it)

troyraen

Thanks @jaladh-singhal! I'm requesting changes that are important but small. The meat of this is great. Thanks for putting it together so fast.

I ran this on Fornax and found that we can reduce memory usage to a max of <8G by using a dask client for the index search (in addition to the others that you already noticed). Hopefully that will be enough for it to run in CI. 🤞

I flagged two things that often confuse users (how objects are defined, and how the object ID column is named) and suggest we add admonitions for those.

There are two wording things I think we should change. I commented on only the first instance of each of these, so look for other instances throughout the notebook.

- I think "object" or "target" will be clearer to ZTF users than "source". "source" means different things to different people throughout astronomy, so your usage isn't wrong per se, but in my experience with time-domain use cases and surveys like ZTF it almost always means a single observation (i.e., one point in the light curve).
- "Objects Table", "Lightcurves", and "HATS Collection" are proper nouns, so use those spellings and capitalizations consistently. (In case it's confusing, "Lightcurves" is the name of the catalog, while a "light curve" is a time series of data points for a given object (and can also be plural). It's usually obvious which is meant, so that spelling should be used. But there are cases where a sentence/phrase is equally correct either way, so then you can just pick one.
  ...And then there's also the column name, which is spelled "lightcurve" 😅.)

troyraen · 2026-06-01T16:32:02Z

+
+```{code-cell} ipython3
+# Uncomment the next line to install dependencies if needed.
+# !pip install s3fs "lsdb>=0.6.6,<0.8" pyarrow pandas astropy matplotlib


Is lsdb<0.8 just for our own CI and due to other notebooks? I wonder if/how we can handle that without implying to end users that this particular notebook requires lsdb<0.8.

troyraen · 2026-06-01T16:33:35Z

+ztf_lc_hats_prefix = "ztf/enhanced/dr24/lc/hats" # Light curves catalog
+ztf_objects_hats_prefix = "ztf/enhanced/dr24/objects/hats" # Objects table


Suggested change

ztf_lc_hats_prefix = "ztf/enhanced/dr24/lc/hats" # Light curves catalog

ztf_objects_hats_prefix = "ztf/enhanced/dr24/objects/hats" # Objects table

ztf_lc_hats_prefix = "ztf/enhanced/dr24/lc/hats" # Lightcurves catalog

ztf_objects_hats_prefix = "ztf/enhanced/dr24/objects/hats" # Objects Table

Use the proper nouns. (ZTF named these products "Lightcurves" and "Objects Table".)

troyraen · 2026-06-01T16:36:20Z

+ztf_lc_schema_df
+```
+
+Notice the `lightcurve` column — this is a **nested column** that stores the full photometric time series for each ZTF object.


Suggested change

Notice the `lightcurve` column — this is a **nested column** that stores the full photometric time series for each ZTF object.

Notice the `lightcurve` column — this is a **[nested](https://nested-pandas.readthedocs.io/) column** that stores the full photometric time series for each ZTF object.

Maybe link here. I find myself needing to refer to those docs to figure out/remember how to work with nested columns.

troyraen · 2026-06-01T16:41:05Z

+
+### 5.1 Explore the Objects Table Schema
+
+The Objects Table contains per-band summary statistics for each ZTF source.


Suggested change

The Objects Table contains per-band summary statistics for each ZTF source.

The Objects Table contains summary statistics for each ZTF object.

troyraen · 2026-06-02T00:02:50Z

+ztf_lcs_by_id_df = ztf_lcs_by_id.compute()
+ztf_lcs_by_id_df


I ran this on Fornax to check memory usage and found that Dask holds onto the memory it grabs here. Adding `del ztf_lcs_by_id_df` (and similar for the catalog objects) doesn't help. Wrapping this in a client context does help (same as you figured out for sec. 4 below).

11.5G = Max memory usage of notebook without Client here
7.1G = Max memory usage of notebook with Client here

I used this:

Suggested change

ztf_lcs_by_id_df = ztf_lcs_by_id.compute()

ztf_lcs_by_id_df

def get_nworkers(object_ids):
    return min(os.cpu_count(), len(object_ids))

with Client(n_workers=get_nworkers(object_ids),
            threads_per_worker=1,
            memory_limit=None  # each partition can be several GB; avoid per-worker cap
            ) as client:
    print(f"You can monitor progress in the Dask dashboard at {client.dashboard_link}")
    ztf_lcs_by_id_df = ztf_lcs_by_id.compute()

ztf_lcs_by_id_df
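One caveat with the helper in this suggestion: `os.cpu_count()` can return `None` on some platforms, and `min(None, n)` would then raise a `TypeError`. Below is a slightly more defensive sketch of the same idea. The `max_workers` parameter is an addition for illustration, not something from the notebook, and the IDs are placeholders.

```python
import os


def get_nworkers(items, max_workers=None):
    """Pick a Dask worker count: one per item, capped by the available CPUs."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    if max_workers is not None:
        cpus = min(cpus, max_workers)  # optional extra cap, e.g. for CI runners
    return max(1, min(cpus, len(items)))  # never ask for zero workers


object_ids = [101, 102, 103]  # placeholder values, not real ZTF object IDs
print(get_nworkers(object_ids))
```

The result can be passed to `Client(n_workers=...)` exactly as in the suggestion above.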

troyraen · 2026-06-02T00:36:38Z

+We save the list of columns interesting to us for later use when opening the catalog with `lsdb`:
+
+```{code-cell} ipython3
+ztf_lc_columns = ["objectid", "objra", "objdec", "filterid", "nepochs", "lightcurve"]


I hoped we could save more memory by selecting only the lightcurve columns that actually get used, but I tried this and the savings is insignificant (we need 4 out of the 5 columns, so I suppose that makes sense). Leaving this suggestion in case you want to show that it's possible to load only some of the nested columns. Also fine to ignore this.

Suggested change

ztf_lc_columns = ["objectid", "objra", "objdec", "filterid", "nepochs", "lightcurve"]

ztf_lc_columns = ["objectid", "objra", "objdec", "filterid", "nepochs",
                  "lightcurve.hmjd", "lightcurve.mag", "lightcurve.magerr", "lightcurve.catflags"]

troyraen · 2026-06-02T01:20:32Z

+## 4. Get Light Curves by Sky Position
+
+If you have sky coordinates and want all ZTF sources within a given area, use a cone search.
+


Let's add this admonition (or similar) to describe how ZTF objects are defined. This is the first place where it's directly relevant, but you could move it up to the introduction if you prefer.

Suggested change

:::{important} ZTF objects are defined per (filter, field, quadrant)

ZTF objects (i.e., unique object IDs) are defined _per_ (filter, field, quadrant).

This means that observations of a single _astrophysical_ object are usually spread out amongst several different _ZTF_ objects.

At minimum, a given astrophysical object will be represented by up to 3 ZTF objects, one per filter (g, r, and i).

The per-filter observations may themselves be separated into additional ZTF objects if the astrophysical object lies near the boundary of a ZTF field and/or quadrant.

ZTF's pixel scale is 1"/pixel (see [ZTF Technical Specifications](https://www.ptf.caltech.edu/page/ztf_technical)), so combining all ZTF objects within a 1" cone search may be reasonable for a given astrophysical object.

:::

You can choose whether you want to change the actual code below here or not. I think it's fine to leave as-is as long as we provide this admonition. If changing, I would probably reduce search_radius to 1 arcsec, remove the ["nepochs", ">", 50] row filter, and group by filter (ie, band) before plotting the light curves. (FWIW, our Fornax light-curve-collector notebook justifies the 1" cone search by citing Graham et al., 2024. Unfortunately, ADS returns 79 papers for "Graham et al., 2024" and I don't know which one that came from.)
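To make the "combine ZTF objects within 1 arcsec, then group by filter" idea concrete, here is a small standard-library sketch. The rows are made up, the tuple layout is an assumption for illustration, and the separation uses the haversine formula rather than whatever `lsdb` does internally for its cone search.

```python
import math
from collections import defaultdict


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation via the haversine formula; inputs in degrees, output in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600


# Made-up rows: (objectid, filter, ra_deg, dec_deg)
ztf_objects = [
    (1, "g", 150.00000, 2.00000),
    (2, "r", 150.00005, 2.00003),
    (3, "r", 150.00002, 1.99998),  # same filter, e.g. a different field/quadrant
    (4, "i", 150.01000, 2.00000),  # roughly 36 arcsec away, so excluded
]
target_ra, target_dec, radius_arcsec = 150.0, 2.0, 1.0

# Keep only ZTF objects inside the cone, grouped by filter
by_filter = defaultdict(list)
for objectid, filt, ra, dec in ztf_objects:
    if separation_arcsec(target_ra, target_dec, ra, dec) <= radius_arcsec:
        by_filter[filt].append(objectid)

print(dict(by_filter))
```

Each filter's list then holds the ZTF object IDs whose light curves would be concatenated before plotting.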

troyraen · 2026-06-02T01:40:05Z

+
+for ax, (_, row) in zip(axs, most_variable.iterrows()):
+    lc = row['lightcurve'].query("catflags == 0")  # to keep only clean epochs
+    title = (f"ZTF Object {row['objectid']}  ({row['filtercode']} band)\n"


Maybe we should stick with "filter". Most people will understand that band == filter but newbies may be confused if we don't explain it.

Suggested change

title = (f"ZTF Object {row['objectid']} ({row['filtercode']} band)\n"

title = (f"ZTF Object {row['objectid']} ({row['filtercode']} filter)\n"

troyraen · 2026-06-02T01:45:55Z

+- Retrieve light curves for specific sources by ZTF object IDs using an index search.
+- Retrieve light curves for sources in a sky region using a cone search.
+- Cross-reference the Objects Table to enrich cone search results with per-source variability statistics.


I think we should use either "object" or "target" instead of "source".

Suggested change

- Retrieve light curves for specific sources by ZTF object IDs using an index search.

- Retrieve light curves for sources in a sky region using a cone search.

- Cross-reference the Objects Table to enrich cone search results with per-source variability statistics.

- Retrieve light curves for specific objects by ZTF object IDs using an index search.

- Retrieve light curves for objects in a sky region using a cone search.

- Cross-reference the Objects Table to enrich cone search results with per-object variability statistics.

troyraen · 2026-06-02T02:03:47Z

+```
+
+We'll select a subset of columns useful for characterizing and annotating variable sources:
+


This is another thing that trips people up, so let's add an admonition. Could move this up to the intro (or elsewhere) if you prefer.

Suggested change

:::{important} `objectid` == `oid`

ZTF's object ID column is named `objectid` in Lightcurves and `oid` in Objects Table.

Despite this difference, the two columns are the same and can be used to join the catalogs.

:::
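As a toy illustration of joining on the differently named ID columns, here is a pandas sketch. The two frames are made-up stand-ins for the computed Lightcurves and Objects Table results, the non-ID column names are illustrative only, and `left_on="objectid"` is assumed to pair with the `right_on='oid'` shown in the diff above.

```python
import pandas as pd

# Made-up stand-ins for the two computed tables
lc_df = pd.DataFrame({"objectid": [11, 12, 13], "nepochs": [120, 75, 60]})
objects_df = pd.DataFrame({"oid": [12, 13, 14], "medianmag": [17.2, 18.9, 16.4]})

# Inner join keeps only the IDs present in both tables
combined_df = lc_df.merge(objects_df, left_on="objectid", right_on="oid", how="inner")
print(combined_df)
```

Because the key columns have different names, the result carries both `objectid` and `oid`; one of them can be dropped afterwards if the duplication is distracting.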

jaladh-singhal added 3 commits May 26, 2026 21:00

Add tutorial for accessing ZTF DR24 light curves from HATS catalog

6dcc5d0

Fix narrative and some cleanup

9192ca3

Add ztf notebook to TOC

bfad588

jaladh-singhal requested review from bsipocz and troyraen May 27, 2026 23:28

jaladh-singhal self-assigned this May 27, 2026

jaladh-singhal added the content Content related issues/PRs. label May 27, 2026

jaladh-singhal commented May 27, 2026

Adding new lsdb hats notebook to the ignore list for oldestdeps test

716de78

bsipocz added the GHA buildhtml Enable extra buildhtml job on GHA label May 28, 2026

troyraen requested changes Jun 2, 2026

troyraen added content: parquet Content related issues/PRs for notebooks with parquet/HATS relevance content: ztf Content related issues/PRs for notebooks with ZTF relevance labels Jun 2, 2026


3 participants
