Skip to content

Roman multisurvey#47

Open
psferguson wants to merge 31 commits into
mainfrom
roman_multisurvey
Open

Roman multisurvey#47
psferguson wants to merge 31 commits into
mainfrom
roman_multisurvey

Conversation

@psferguson

Copy link
Copy Markdown
Collaborator

Multi-survey (Roman + Rubin) stream injection
Lets a single mock stream carry both Roman and Rubin/LSST photometry, with each band drawing its errors and detection from its own survey's maglim maps. Four phases (see docs/source/roman_multisurvey_plan.md):

Phase 1 — Survey holds two error curves: catalog (reported magerr, drives the S/N cut) and optional sample (true scatter, drives the noise draw).
Phase 2 — injector de-hardcoded off {r, g}; all column names route through a new columns.py with a uniform _true/_obs/err scheme.
Phase 3 — IsochroneModel is multi-survey: masses drawn once and interpolated into every survey's bands (same physical star, consistent across surveys); Roman bands auto-converted Vega→AB.
Phase 4 — new MultiSurveyInjector: one shared sky placement + true-mag fill, then per-survey
_obs/_err and _flag_observed. Adds a demo scene and a runnable Phases 1–4 notebook.
⚠️ Behaviour changes
nstars now returns exactly N stars (was an emergent IMF count).
S/N cut applies to all injected bands (was hard-coded ["g"]).
Column names changed to _true/obs/err — not backward compatible with the old mag/magerr. Public inject() API unchanged.
Other
Isochrone configs switched Bressan2012 → Marigo2017; repo-wide black/isort pass.
Testing
pytest green; demo notebook executes with 0 errors. Real Roman/Rubin maglim maps aren't committed, so MultiSurveyInjector was validated against a stub survey (swapping in Survey.load(...) needs no injector changes).

Comment thread streamobs/observed.py
@psferguson

Copy link
Copy Markdown
Collaborator Author

Make sure a unit test exists that tests the complete_data method in observed.py/StreamInjectior for the multi survey case.

@MatthieuPE

Copy link
Copy Markdown
Collaborator

Do we really need to do a new instance for multi survey injector, rather than a single one which could take a list of surveys as input

psferguson and others added 4 commits June 16, 2026 11:19
Merge MultiSurveyInjector into StreamInjector: one class now accepts a
single survey or several (name/Survey, {namespace: spec} dict, or list).
Output columns are always survey-namespaced (<survey>_<band>_true/_obs/_err,
<survey>_flag_observed), even for a single survey. The model emits
<survey>_<band>_true uniformly (single-survey isochrone is just the
one-survey case).

- observed.py: StreamInjector takes one-or-many surveys; shared
  _inject_one_survey/detect_flag now take an explicit survey; public
  complete_data() fills missing geometry/ra-dec/true-mags from the config;
  MultiSurveyInjector removed.
- model.py: complete_catalog never overwrites present values (fills only
  missing rows, per column); add `dist` (scalar/vector) to set distances
  directly without phi1 / a distance_modulus model.
- columns.py docs, scene yaml, demo builder + regenerated notebook updated.
- tests: namespaced columns; new TestCompleteCatalogPermutations.
- plan doc: single class, always-namespace, exact-nstars agreed; flagged
  for removal (migrate to docs) before merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The reference band (survey.completeness_band) has its SNR>=5 cut baked into
the survey selection functions, so the per-band loop was double-applying it
(idempotent today, but conceptually double-counted and fragile), and a
special-cased "force" block was needed for the perfect-galstarsep flag
because the detection-efficiency curve does not bake the cut in.

Now the reference-band cut is applied exactly once (to both flags) and the
detection_mag_cut loop defaults to all injected bands except the reference
band, which is skipped. Behaviour-preserving (same bands cut, ref counted
once); removes the double-count and the flag asymmetry.

Document the path from (b) to (a) -- folding the cut into the
detection-efficiency curve itself -- in roman_multisurvey_plan.md, including
which data products must be regenerated (per-survey detection_eff tables)
and how.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread streamobs/observed.py Outdated
Comment thread streamobs/model.py Outdated
Comment thread streamobs/model.py
Comment thread streamobs/model.py Outdated
Comment thread streamobs/observed.py Outdated
Comment thread streamobs/observed.py Outdated
Comment thread streamobs/observed.py Outdated
Comment thread streamobs/observed.py Outdated
Comment thread streamobs/columns.py
Comment thread streamobs/observed.py Outdated
Comment thread streamobs/observed.py Outdated
MatthieuPE and others added 8 commits June 17, 2026 16:24
…t, isochrone masses

Responds to MatthieuPE's review on the roman_multisurvey PR.

API simplification:
- Collapse survey_bands+bands into one `bands` arg (list | {survey: bands} dict)
- Fold _complete_shared into the public complete_data; inject() delegates to it
- Make `survey` a required arg of detect_flag (drop the primary fallback)

Release-everywhere namespacing (Decision 1):
- Add Survey.namespace ({name}_{release}); injector keys surveys by it so the
  same survey at two releases yields distinct, non-colliding columns
- _load_survey accepts a {"survey":, "release":} spec dict; _inject_one_survey
  derives the namespace from survey.namespace
- `primary` is now the primary Survey; namespace string is primary_namespace
- Model: single-survey isochrone path is release-aware; _build_iso strips
  `release` before the ugali factory

SNR cut (Decision 2): the ref-band S/N>=5 cut is baked into both selection-
function curves, so remove the redundant re-application in _inject_one_survey
and correct the comment/docstring.

Isochrone/mass:
- Raise _MASS_STEPS 1000 -> 4000 (convergence check: ~600 vs ~220 distinct
  masses for a 5000-star stream; documented)
- Collapse single/multi isochrone builders into one _build_isochrones path
- sample_multisurvey accepts optional masses and returns (mags, masses);
  complete_catalog exposes a `mass` column and reuses a provided one

Tests: release-namespaced column updates; new multi-survey complete_data,
unified-bands, mandatory-survey, and isochrone-mass tests. 35 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Doc staleness introduced by the PR #47 behavior changes:
- column_convention.md / quickstart.md / multisurvey.md: column examples are now
  release-namespaced ({name}_{release}); state the namespacing rule. Quickstart
  Example 3 (lsst/yr1) no longer errors on copy-paste.
- Correct the false "the dict key is the column namespace" claim in
  multisurvey.md, the StreamInjector.__init__ docstring, and roman_rubin_demo.yaml
  (keys are containers; namespace is re-derived from each Survey).
- Document the {"survey":,"release":} spec-dict input form, the bands-dict
  validation, the mass column / user-supplied masses, _MASS_STEPS, and
  primary/primary_namespace.
- Reword the S/N "applied once" text: the reference-band cut is owned by the
  selection-function curves, not re-applied by the injector.
- Note the multi-survey isochrone requirement that surveys: keys equal the
  injector namespaces.

Code:
- set_completeness now accepts both "classification_eff" and the legacy
  misspelled "classifiction_eff" header, so the correct spelling documented in
  new_survey.md works without breaking the current (misspelled) data package.
  Also fixed the docstring's stale "eff_star" -> "detection_eff".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- SplineStreamModel._create_model called an undefined self._create_distance();
  rename to self._create_distance_modulus() (model.py:981). This made every
  spline-stream run (e.g. bin/generate_spline_stream.py) raise AttributeError.
- plotting.plot_inject read bare, non-namespaced columns (flag_observed, r_obs,
  ...) and so failed on the injector's {name}_{release}-namespaced output; derive
  the namespace from survey.namespace and use it for all flag/true/obs columns.

Tests: test_spline_model.py (instantiate + sample, skipped if the spline data
file is absent) and test_plotting.py (plot_inject smoke test on namespaced
output). 37 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The DES survey config is name=des, release=yr6 (namespace des_yr6) and the data
directory is data/surveys/des_yr6/, matching the LSST yr* convention. DES.md told
users to load release='y6' and use a des_y6/ folder, which fails (no des_y6
config). Standardize the release/dir on yr6 in DES.md and the new_survey.md
example. (Data-file basenames in the package remain des_y6_*; that's a separate
data-package rename.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Notes claimed magnitudes (like velocities) overwrite whole columns. In fact
phi1/phi2/dist, magnitudes, and the shared mass column fill only missing rows
(the preserve-existing contract); only velocities are recomputed wholesale.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@psferguson

psferguson commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator Author

Ok, I think I/cdawg addressed all of your comments.
Points to flag:

  • everything is required to have <survey>_<release> naming (including true magnitudes) and internally we call the <survey>_<release> just <namespace>
  • Upped the mass steps to 4k (happy to go higher)
  • I removed the reference band S/N cut so that we dont double count it.
  • single/multi isochrone builders collapsed into one path
  • sample_multisurvey accepts/returns masses + a mass column
  • Fixed some bugs and updated the documentation to match the code in a couple of places as well as adding more unit test coverage.

Comment thread streamobs/model.py Outdated
@MatthieuPE

Copy link
Copy Markdown
Collaborator

PR Ok for merging for me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants