Roman DC2 + HLWAS selection-function products #53
Open
psferguson wants to merge 57 commits into
black/isort
Merge MultiSurveyInjector into StreamInjector: one class now accepts a
single survey or several (name/Survey, {namespace: spec} dict, or list).
Output columns are always survey-namespaced (<survey>_<band>_true/_obs/_err,
<survey>_flag_observed), even for a single survey. The model emits
<survey>_<band>_true uniformly (single-survey isochrone is just the
one-survey case).
- observed.py: StreamInjector takes one-or-many surveys; shared
_inject_one_survey/detect_flag now take an explicit survey; public
complete_data() fills missing geometry/ra-dec/true-mags from the config;
MultiSurveyInjector removed.
- model.py: complete_catalog never overwrites present values (fills only
missing rows, per column); add `dist` (scalar/vector) to set distances
directly without phi1 / a distance_modulus model.
- columns.py docs, scene yaml, demo builder + regenerated notebook updated.
- tests: namespaced columns; new TestCompleteCatalogPermutations.
- plan doc: single class, always-namespace, exact-nstars agreed; flagged
for removal (migrate to docs) before merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The reference band (survey.completeness_band) has its SNR>=5 cut baked into the survey selection functions, so the per-band loop was double-applying it (idempotent today, but conceptually double-counted and fragile), and a special-cased "force" block was needed for the perfect-galstarsep flag because the detection-efficiency curve does not bake the cut in. Now the reference-band cut is applied exactly once (to both flags) and the detection_mag_cut loop defaults to all injected bands except the reference band, which is skipped. Behaviour-preserving (same bands cut, ref counted once); removes the double-count and the flag asymmetry. Document the path from (b) to (a) -- folding the cut into the detection-efficiency curve itself -- in roman_multisurvey_plan.md, including which data products must be regenerated (per-survey detection_eff tables) and how. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
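A minimal sketch of the band-selection rule described in this commit, assuming the incoming flag already carries the reference-band S/N>=5 cut. The helper names (`default_cut_bands`, `apply_cuts`) and the plain-list data layout are hypothetical and are not the streamobs API.

```python
def default_cut_bands(injected_bands, reference_band):
    """Bands that still need an explicit magnitude cut (hypothetical helper).

    The reference band is skipped because its cut is assumed to be applied
    elsewhere, exactly once.
    """
    return [band for band in injected_bands if band != reference_band]


def apply_cuts(flag, mags, maglims, injected_bands, reference_band):
    """AND an existing detection flag with a per-band magnitude cut.

    `flag` is assumed to already include the reference-band cut, so that
    band is never re-applied here.
    """
    out = list(flag)
    for band in default_cut_bands(injected_bands, reference_band):
        out = [f and (m < maglims[band]) for f, m in zip(out, mags[band])]
    return out
```

Skipping the reference band in the loop is what removes the double count: its magnitudes are never compared against a limit a second time.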
Adding test + formatting
…t, isochrone masses
Responds to MatthieuPE's review on the roman_multisurvey PR.
API simplification:
- Collapse survey_bands+bands into one `bands` arg (list | {survey: bands} dict)
- Fold _complete_shared into the public complete_data; inject() delegates to it
- Make `survey` a required arg of detect_flag (drop the primary fallback)
Release-everywhere namespacing (Decision 1):
- Add Survey.namespace ({name}_{release}); injector keys surveys by it so the
same survey at two releases yields distinct, non-colliding columns
- _load_survey accepts a {"survey":, "release":} spec dict; _inject_one_survey
derives the namespace from survey.namespace
- `primary` is now the primary Survey; namespace string is primary_namespace
- Model: single-survey isochrone path is release-aware; _build_iso strips
`release` before the ugali factory
SNR cut (Decision 2): the ref-band S/N>=5 cut is baked into both selection-
function curves, so remove the redundant re-application in _inject_one_survey
and correct the comment/docstring.
Isochrone/mass:
- Raise _MASS_STEPS 1000 -> 4000 (convergence check: ~600 vs ~220 distinct
masses for a 5000-star stream; documented)
- Collapse single/multi isochrone builders into one _build_isochrones path
- sample_multisurvey accepts optional masses and returns (mags, masses);
complete_catalog exposes a `mass` column and reuses a provided one
Tests: release-namespaced column updates; new multi-survey complete_data,
unified-bands, mandatory-survey, and isochrone-mass tests. 35 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
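The release-everywhere namespacing rule above can be sketched as follows. `SurveySketch` and `observed_columns` are hypothetical stand-ins for illustration, not the real `Survey` class. Only the `{name}_{release}` rule and the `_obs` / `_err` / `_flag_observed` suffixes come from the commit messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveySketch:
    """Hypothetical stand-in for a survey with a name and a release."""

    name: str
    release: str

    @property
    def namespace(self) -> str:
        # Namespacing rule from the commit message: {name}_{release}
        return f"{self.name}_{self.release}"


def observed_columns(survey, bands):
    """Namespaced observed/error/flag column names for one survey."""
    ns = survey.namespace
    cols = []
    for band in bands:
        cols += [f"{ns}_{band}_obs", f"{ns}_{band}_err"]
    cols.append(f"{ns}_flag_observed")
    return cols
```

Because the release is part of the namespace, the same survey at two releases produces disjoint column sets. For example, `SurveySketch("lsst", "yr1")` and a second, illustrative release would never collide.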
Doc staleness introduced by the PR #47 behavior changes:
- column_convention.md / quickstart.md / multisurvey.md: column examples are now
release-namespaced ({name}_{release}); state the namespacing rule. Quickstart
Example 3 (lsst/yr1) no longer errors on copy-paste.
- Correct the false "the dict key is the column namespace" claim in
multisurvey.md, the StreamInjector.__init__ docstring, and
roman_rubin_demo.yaml (keys are containers; namespace is re-derived from each
Survey).
- Document the {"survey":,"release":} spec-dict input form, the bands-dict
validation, the mass column / user-supplied masses, _MASS_STEPS, and
primary/primary_namespace.
- Reword the S/N "applied once" text: the reference-band cut is owned by the
selection-function curves, not re-applied by the injector.
- Note the multi-survey isochrone requirement that surveys: keys equal the
injector namespaces.
Code:
- set_completeness now accepts both "classification_eff" and the legacy
misspelled "classifiction_eff" header, so the correct spelling documented in
new_survey.md works without breaking the current (misspelled) data package.
Also fixed the docstring's stale "eff_star" -> "detection_eff".
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
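One way to accept both header spellings is a small alias lookup, sketched below. The function name and the tuple of accepted headers are assumptions for illustration; the actual `set_completeness` implementation is not shown in this PR page.

```python
# Correct spelling first, then the legacy misspelling kept for old data files.
ACCEPTED_HEADERS = ("classification_eff", "classifiction_eff")


def find_classification_column(columns):
    """Return whichever accepted header is present in `columns`.

    Hypothetical helper: prefers the correct spelling and falls back to the
    legacy one, raising KeyError if neither is found.
    """
    for name in ACCEPTED_HEADERS:
        if name in columns:
            return name
    raise KeyError(f"none of {ACCEPTED_HEADERS} found in table columns")
```

Preferring the correct spelling means a regenerated data package takes effect immediately, while existing files with the misspelled header keep loading.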
- SplineStreamModel._create_model called an undefined self._create_distance();
rename to self._create_distance_modulus() (model.py:981). This made every
spline-stream run (e.g. bin/generate_spline_stream.py) raise AttributeError.
- plotting.plot_inject read bare, non-namespaced columns (flag_observed, r_obs,
...) and so failed on the injector's {name}_{release}-namespaced output; derive
the namespace from survey.namespace and use it for all flag/true/obs columns.
Tests: test_spline_model.py (instantiate + sample, skipped if the spline data
file is absent) and test_plotting.py (plot_inject smoke test on namespaced
output). 37 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The DES survey config is name=des, release=yr6 (namespace des_yr6) and the data directory is data/surveys/des_yr6/, matching the LSST yr* convention. DES.md told users to load release='y6' and use a des_y6/ folder, which fails (no des_y6 config). Standardize the release/dir on yr6 in DES.md and the new_survey.md example. (Data-file basenames in the package remain des_y6_*; that's a separate data-package rename.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Notes claimed magnitudes (like velocities) overwrite whole columns. In fact phi1/phi2/dist, magnitudes, and the shared mass column fill only missing rows (the preserve-existing contract); only velocities are recomputed wholesale. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
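The preserve-existing contract described above amounts to a per-row fill. This is a minimal sketch using plain Python lists, with NaN or `None` treated as "missing" by assumption; `fill_missing` is a hypothetical name and not the `complete_catalog` code.

```python
import math


def fill_missing(existing, computed):
    """Keep every present value; take the computed value only where missing.

    A row counts as missing if it is None or NaN (an assumption of this sketch).
    """
    return [
        c if (e is None or math.isnan(e)) else e
        for e, c in zip(existing, computed)
    ]
```

For example, `fill_missing([1.0, float("nan"), 3.0], [9.0, 2.0, 9.0])` keeps the first and third rows and fills only the second. Velocities, by contrast, would simply be replaced by the computed column.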
Correcting bug and moving notebooks
…) and fix test_surveys collection error (pytest >=9)
- config/surveys/roman_hlwas.yaml: survey config keyed to the official 5-sigma
point-source depth convention (DC2-derived maglim map, completeness and
photo-error tables in F158)
- notebooks/create_streamobs_files_hlwas.ipynb: full derivation from the
Roman-Rubin DC2 mock (paper-exact flags==0 / S/N>5 / matched cuts,
truth-duplicate and tile-margin handling, desqr-style depth maps,
streamobs-format tables)
- notebooks/build_roman_dc2_det_truth.py: builder for the det->truth matched
catalog the derivation runs on
- docs: new "Roman HLWAS survey files" page documenting the derivation, depth
conventions (STScI HLWAS medians), and caveats
- rewrite the Roman HLWAS docs page as a self-contained data-section style
derivation (no notebook references), documenting the matching, selection,
completeness, photo-error, and depth-normalization choices
- embed diagnostic figures (mag distributions, combined star efficiency +
galaxy misclassification, photo-error, maglim maps, LSST comparison), exported
by the derivation notebook into docs/source/_static/roman_hlwas/
- notebook: merge the galaxy misclassification curve onto the star
detection/classification efficiency plot
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…odel
Validating the reported SExtractor magerr against the truth shows it
underestimates the real scatter of (observed - true) by a flat factor ~1.9-2.0
in all four bands - the apparent excess depth over the official 5-sigma values
is not real. Accordingly:
- photo-error table is now built from the truth-based scatter ((p84-p16)/2 of
mag_auto - truth mag), not the reported magerr
- normalized _5sigps maglim maps removed; maps stay in the measured convention
and the tables' delta_mag is keyed to the measured F158 map median, one
internal convention throughout
- new error-validation figure (reported vs truth-based, per band) in the
notebook and docs; docs depth/error sections rewritten accordingly
- config points at the un-normalized maglim map
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
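The truth-based scatter estimator named above, (p84-p16)/2 of the observed-minus-truth residuals, can be sketched with the standard library. The function name and list inputs are assumptions; the production script may bin by magnitude and use a different percentile interpolation.

```python
import statistics


def truth_scatter(observed, truth):
    """Robust scatter: half the 16th-to-84th percentile width of (obs - true).

    For Gaussian residuals this is close to the standard deviation, but it is
    far less sensitive to outliers.
    """
    resid = [o - t for o, t in zip(observed, truth)]
    # 99 cut points; index 15 is the 16th percentile, index 83 the 84th.
    pct = statistics.quantiles(resid, n=100, method="inclusive")
    return (pct[83] - pct[15]) / 2.0
```

Comparing this value with the median reported magerr in the same magnitude bin gives the kind of flat correction factor the commit describes.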
… extinction docs
From a full re-read of Troxel+23:
- saturation IS modeled in the mock (clip ~1.1e5 e-, effects brighter than
mag~17, Fig 7); config saturation 15 -> 17 (delta_saturation -10.1)
- restrict survey bands to f106/f129/f158 (F184 is deep-tier-only in the
community HLWAS and has a known unresolved chromatic calibration issue)
- docs: detection is two-stage (2.5-sigma/minarea-5 segmentation + S/N>5
catalog cut), not a single 5-sigma threshold
- docs: flags==0 documented as the observational pure-star-sample selection
(removes blends), deliberately stricter than the paper's relaxed flags<=2
analysis cuts
- docs: official Roman WFI AB zeropoints table (2024-03-01 effective areas) +
grey calibration offsets; extinction-coefficient section (CCM89 absolute
values, band ratios validated to ~1% against the truth dereddening
corrections; amplitudes not validatable due to the mock's dust bug)
- caveats: detector-optimistic simple model, unmasked diffraction spikes
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e 3)
A_band/E(B-V) = 1.1495/0.8497/0.6140 for F106/F129/F158 (synphot, solar
Phoenix spectrum), replacing the CCM89 estimates (which agreed to 1-2%). Docs
cite the document and cross-reference its zeropoints against the
effective-area ecsv values.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The star-classified sample is galaxy-dominated faintward of F158~25.5 (69% true galaxies at 25.5, 99% at 26.5), inflating the apparent scatter (0.33 vs 0.15 mag at 25.5). New cell shows the same diagnostics for true stars passing the star classification. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ename
- all selection-function products now use TRUE stars passing the star
classification (photo-error table, depth maps; efficiency already did): the
observationally star-classified sample is galaxy-dominated faintward of
F158~25.5 and doubles the apparent faint-end scatter
- depth maps truth-anchored: desqr spatial structure from reported errors,
absolute scale set where the truth-based scatter reaches S/N=5 (medians
26.19/26.19/25.98/25.26); photo-error model = 0.217 mag at delta_mag=0 by
construction; config delta_saturation re-keyed (-9.0)
- two-panel photo-error figure (reported errors | sigma(true-obs) vs reported);
single-panel magnitude distributions (color=band, linestyle=true/obs)
- derivation notebook converted to scripts/roman/create_streamobs_files_hlwas.py
and removed from the repo (kept locally, gitignored); roman build scripts
moved to scripts/roman/
- docs page renamed roman_hlwas -> roman_dc2, figures under _static/roman_dc2/,
depth/error sections rewritten for the anchored convention
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The catalog's S/N>5 selection uses the reported detection-image flux errors; truth-validating them (scatter of mag_auto - truth_bb_mag for clean stars, bright-end floor removed in quadrature) gives an error factor of 1.71, so a true S/N=5 selection is reported det_sn > 8.6 (keeps 81.6% of detections). Applied to every product selection; the error-validation cell intentionally keeps the raw catalog. Effect: detection efficiency now falls off at the true depth (0.82/0.59/0.18 at delta_mag = 0/+0.5/+1) instead of extending ~1 mag past it; the photo-error table ends at delta ~+0.9 and still evaluates to sigma = 0.217 at delta_mag = 0; anchored maglims unchanged. Combined 50% completeness now at F158 ~ 26.3. The notebook copy is regenerated from the script and rerun (both stay local/gitignored, including the converter helper). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
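The two numbers quoted above follow from simple arithmetic, sketched here under the usual small-error approximation sigma_mag ≈ (2.5 / ln 10) / (S/N). The function names are illustrative only.

```python
import math

# Conversion between fractional flux error and magnitude error (~1.0857).
POGSON = 2.5 / math.log(10.0)


def mag_err_from_snr(snr):
    """Magnitude error for a given S/N in the small-error approximation."""
    return POGSON / snr


def reported_snr_threshold(true_snr, error_factor):
    """Reported S/N needed for a given true S/N when reported errors are
    underestimated by `error_factor`."""
    return true_snr * error_factor
```

With these, `mag_err_from_snr(5.0)` is about 0.217 mag, matching the photo-error value at delta_mag = 0, and `reported_snr_threshold(5.0, 1.71)` is 8.55, consistent with the quoted det_sn > 8.6 after rounding.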
The selection-function products are DC2-derived, so they now live under
the roman 'dc2' release: config/surveys/roman_dc2.yaml with data in
data/surveys/roman_dc2/ (tables + maglim maps moved there; the survey
loads and smoke-tests via SurveyFactory.create_survey('roman', 'dc2')).
config/surveys/roman_hlwas.yaml is a commented placeholder for the real
HLWAS footprint (exptime-scaled maglim maps in the same truth-anchored
convention, reusing the DC2 tables). Script OUT_DIR updated; notebook
regenerated and rerun.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
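The exposure-time scaling for the HLWAS maglim maps can be sketched from the relation quoted in this PR's description, depth = 26.375 + 1.25·log10(t/770 s). The constant and function names are illustrative; only the two numbers and the functional form come from the PR text.

```python
import math

REF_DEPTH_F158 = 26.375  # truth-anchored DC2 F158 median depth (from the PR text)
REF_EXPTIME_S = 770.0    # DC2 reference exposure time in seconds (from the PR text)


def hlwas_depth(exptime_s):
    """Depth at exposure time `exptime_s` (seconds) under the quoted scaling.

    1.25*log10(t) equals 2.5*log10(sqrt(t)), i.e. S/N growing as sqrt(t).
    """
    return REF_DEPTH_F158 + 1.25 * math.log10(exptime_s / REF_EXPTIME_S)
```

At the reference exposure the function returns the anchor depth itself, and a tenfold longer exposure gains 1.25 mag.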
…ction
Replace the scalar class_star>0.5 cut with a single-band F158 size envelope as
the production star classifier in create_streamobs_files_hlwas.py: a two-sided
band in log(size) about the per-mag stellar locus, tuned to 0.875 purity
(DES Y6 0<=EXT_XGB<=1), bright-capped by the stellar log-size scatter,
single-peaked, frozen faintward of mag 24, with the upper bound flaring to
0.15" at mag 18. Detection is now treated as single-band F158 (true S/N>5 from
magerr_auto_H158, error factor 1.59) instead of the 4-band det_sn gate; the
envelope replaces class_star in every product selection.
Also wires the two-curve photo-error model into config/surveys/roman_dc2.yaml
(log_photo_error_catalog + log_photo_error_sample) and has the generator write
roman_photoerror_f158_catalog.csv.
Adds the classifier-comparison figure (envelope vs class_star>0.5 vs per-mag
optimized class_star; classifiction_eff + purity-on-secax) and updates
docs/source/roman_dc2.md for the new classifier, H-band detection, the shifted
depths (F158 maglim 25.98 -> 26.38, error factor 1.71 -> 1.59) and a quantified
explanation of the bright-end completeness plateau.
Note: running the generator also (re)writes the survey data products under
data/surveys/roman_dc2/ -- roman_photoerror_f158{,_catalog}.csv,
roman_stellar_efficiency_cutf158.csv, roman_dc2_maglim_f*_nside1024.fits.gz.
Those live beyond a symlink, outside the tracked tree, and are NOT in this
commit; regenerate them by running the script.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keep the rendered-notebook builders (build_*_nb.py) un-tracked; the production scripts and their products are the source of truth. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- fixing + testing reproducibility
- testing isochrone model
Collaborator
I added some tests and fixed a reproducibility issue in injection / sampling.
MatthieuPE
reviewed
Jun 19, 2026
Comment on lines +43 to +44
> `classifiction_eff` (the misspelling is intentional and load-bearing — the loader
> greps that exact string). Keep it when re-deriving products for other surveys.
Collaborator
Haha I did see that before, I guess it was intentional, we could / should solve it
Roman DC2 + HLWAS selection-function products
Stacked on `roman_multisurvey` (the unified `StreamInjector` / namespacing refactor); retarget to `main` after that merges.
Summary
Adds the full Roman catalog-level injection product suite to streamobs, derived from the Roman–Rubin DC2 image simulations (Troxel et al. 2023). This lets streams be injected and "observed" through Roman's depth, photometric errors, and star/galaxy selection — alongside LSST/DES via the unified multi-survey injector — and documents the derivation as a reusable methodology so LSST/DES products can later be re-derived self-consistently.
Surveys added
- roman_dc2
- roman_hlwas_wide
- roman_hlwas_medium
- roman_hlwas_all
(F129 isn't in the HLWAS exposure-time maps; F184 excluded for a known chromatic-calibration issue. Wide/medium are F158-only by survey design.)
What's included
- … `config/surveys/roman_photoerror_corrections.yaml`) to clean the measured faint end reproducibly.
- … depth = 26.375 + 1.25·log10(t/770 s), anchored to the DC2 truth-anchored F158 median and the DC2 HLIS reference exposure (Troxel §3.1), which keeps HLWAS on the same depth convention as the DC2-derived tables.
- … size_true < 0.3″), via a positional Roman↔LSST-DC2 match and cosmoDC2 `size_true` (joined by `cosmodc2_id`). 100% size coverage of matched galaxies.
- … `roman_star_classifier` used by both the product generator and the misclassification script (no drift).
- … `bin/build_data_archive.py`
Conventions
- … `roman_F158_true`, release-independent); observed/error/flag columns key on the full namespace (`roman_dc2_F158_obs`, `roman_dc2_flag_observed`).
- … `F106`/`F129`/`F158`), matching ugali and the literature.
Testing
- `tests/test_roman.py` + Roman entries in `SURVEY_REGISTRY` (incl. a true/obs column-convention round-trip and Vega→AB checks).
- … `des_yr6` Y-band threshold, also present on the base branch).
Data
Data files are not committed. Runtime products ship via Zenodo (`data.zip` built by `bin/build_data_archive.py`). Update `BASE_DATA_URL` in `download_data.py` after upload.
Deferred to follow-up PRs