@cmungall
Member

Add comprehensive family income valueset for U.S. health surveys

Closes #27

Summary

  • Added FamilyIncomeEnum to demographics schema with income brackets commonly used in ACS/BRFSS/NHANES/NIH CDEs
  • Includes all requested income ranges from <$10,000 to $150,000+
  • Added survey response options "Prefer not to answer" and "Don't know"
  • Follows LinkML conventions and includes metadata about year-versioning and granularity
  • References authoritative sources: CDC BRFSS, Census ACS, CDC/NCHS NHANES

Test Plan

  • Run just validate to ensure schema compliance
  • Run just site to verify documentation generation
  • Verify enum appears correctly in generated artifacts

Generated with Claude Code

- Added comprehensive family income valueset with standard U.S. health survey brackets
- Includes income ranges from <$10,000 to $150,000+ as specified in ACS/BRFSS/NHANES/NIH CDEs
- Added "Prefer not to answer" and "Don't know" options for survey completeness
- Follows LinkML conventions with CamelCase enum name and UPPER_CASE permissible values
- Includes metadata about year-versioning and granularity considerations
- Referenced authoritative sources: CDC BRFSS, Census ACS, CDC/NCHS NHANES

Co-authored-by: madanucd <madanucd@users.noreply.github.com>

🤖 Generated with [Claude Code](https://claude.ai/code)
@claude
Contributor

claude bot commented Nov 26, 2025

Claude Code is working…

I'll analyze this and get back to you.

View job run

Copilot AI review requested due to automatic review settings December 13, 2025 00:56
@claude
Contributor

claude bot commented Dec 13, 2025

Claude finished @cmungall's task —— View job


Code Review Complete ✅

I've reviewed the PR that adds FamilyIncomeEnum to the demographics schema. Here's my comprehensive feedback:

Todo List:

  • Read and analyze the changed demographics.yaml file
  • Review code quality and LinkML compliance
  • Check naming conventions and structure
  • Validate ontology mappings and references
  • Test schema validation if possible
  • Provide comprehensive feedback

Code Quality Assessment

Overall: This is a well-structured addition that follows LinkML conventions and project guidelines.

Strengths:

  • Proper LinkML Structure: Correct enum syntax and organization
  • Comprehensive Coverage: Income brackets from <$10K to $150K+ cover typical survey needs
  • Survey Best Practices: Includes "Prefer not to answer" and "Don't know" options
  • Good Documentation: Comments explain year-specificity and granularity considerations
  • Authoritative Sources: References CDC BRFSS, Census ACS, and CDC/NCHS NHANES
  • Consistent Naming: Follows CamelCase convention for enum name
  • Proper Integration: Slot definition correctly added (lines 38-40)

Issues and Recommendations

Critical: Missing Ontology Mappings

Issue: Per the CLAUDE.md guidelines (lines 43-47 in CLAUDE.md), permissible values should carry meaning: mappings to ontology terms; the permissible values of FamilyIncomeEnum have none.

Example from other enums in the file:

MARRIED:
  description: A current marriage contract is active
  meaning: HL7:marital-status#M  # ← Missing for FamilyIncomeEnum

Recommendation: Add ontology mappings using established vocabularies like SNOMED, LOINC, or HL7
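
A minimal sketch of what such a mapping could look like on a single permissible value. The value name PREFER_NOT_TO_ANSWER and the CURIE EXAMPLE:0000001 are placeholders assumed for illustration; neither is taken from the PR or from a real vocabulary. An actual mapping would need a term verified in SNOMED, LOINC, or HL7, with its prefix declared in the schema's prefixes block.

```yaml
# Sketch only: the value name and the CURIE below are hypothetical placeholders.
FamilyIncomeEnum:
  permissible_values:
    PREFER_NOT_TO_ANSWER:
      description: Respondent declined to report family income
      meaning: EXAMPLE:0000001  # placeholder; replace with a verified ontology term
```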

⚠️ Missing Metadata Fields

Issues:

  1. No title field (line 290) - All other enums have titles
  2. No status field - Should be DRAFT to match other enums
  3. No contributors field - Should include author ORCID
  4. No instantiates field - Should reference valuesets_meta:ValueSetEnumDefinitionWithSource

Suggested additions, following the pattern of other enums:

FamilyIncomeEnum:
  title: Family Income  # ← Add this
  status: DRAFT  # ← Add this
  contributors:  # ← Add this
  - orcid:0000-0002-6601-2165
  instantiates:  # ← Add this
  - valuesets_meta:ValueSetEnumDefinitionWithSource

🔧 Minor Improvements

  1. Source URL Format: The source field currently holds several space-separated URLs; consider a single, more specific source URL instead

    • Current: source: https://www.cdc.gov/brfss/ https://www.census.gov/programs-surveys/acs/ https://www.cdc.gov/nchs/nhanes/
    • Suggestion: Pick primary source or format as array
  2. Value Naming: Consider consistency in numerical formatting:

    • TWENTY_FIVE_TO_29999 vs THIRTY_TO_34999 (inconsistent hyphenation in text)
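
One way to address the source formatting, sketched under the assumption that BRFSS is chosen as the primary source (the PR does not say which survey is primary). In the LinkML metamodel, source holds a single URI while see_also accepts a list, so the remaining URLs can move there.

```yaml
FamilyIncomeEnum:
  # Assumption: BRFSS is treated as the primary source; the PR does not name one.
  source: https://www.cdc.gov/brfss/
  see_also:
  - https://www.census.gov/programs-surveys/acs/
  - https://www.cdc.gov/nchs/nhanes/
```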

Security Considerations

No security concerns identified - This is a static value set with no executable code or sensitive data.

Performance Considerations

No performance issues - Static enum definitions have minimal impact.

Test Coverage Recommendations

Since validation commands require approval, please manually verify:

  1. Run just validate to ensure LinkML schema compliance
  2. Run just site to verify documentation generation
  3. Check that the enum appears in generated artifacts

Summary

This is a solid contribution that adds valuable standardized income brackets for health surveys. The main issues are:

  • Missing ontology mappings (critical for LinkML best practices)
  • Missing standard metadata fields (title, status, contributors, instantiates)

These are easily addressed and don't affect the core functionality. The income ranges and structure are appropriate for the intended use case.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new FamilyIncomeEnum valueset to the demographics schema, providing standardized income bracket ranges for U.S. health surveys. The enum includes 10 income ranges from less than $10,000 to $150,000+, along with "Prefer not to answer" and "Don't know" response options. The implementation references authoritative sources (CDC BRFSS, Census ACS, CDC/NCHS NHANES) and includes helpful comments about year-specificity and granularity considerations.

Key Changes:

  • Added family_income slot definition to demographics schema
  • Created comprehensive FamilyIncomeEnum with 12 permissible values covering standard U.S. health survey income brackets
  • Included metadata documenting source, year-versioning considerations, and granularity guidance
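
For orientation, a slot bound to the new enum would typically look like the sketch below. Only the slot name family_income and the enum name FamilyIncomeEnum come from the PR; the description wording is an assumption, and the actual three-line definition in demographics.yaml may differ.

```yaml
slots:
  family_income:
    description: Annual family income bracket reported by the respondent  # assumed wording
    range: FamilyIncomeEnum  # binds the slot to the new enum
```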

Comment on lines +290 to +296
FamilyIncomeEnum:
  description: Annual household income ranges commonly used in U.S. health surveys (ACS, BRFSS, NHANES, NIH CDEs)
  comments:
  - Income brackets are derived from commonly used ranges in U.S. health surveys
  - Values should be considered year-specific as income brackets shift with inflation
  - This represents a standard granularity level; coarser bins may be needed for some applications
  source: https://www.cdc.gov/brfss/ https://www.census.gov/programs-surveys/acs/ https://www.cdc.gov/nchs/nhanes/

Copilot AI Dec 13, 2025

The new FamilyIncomeEnum is missing several required metadata fields that all other enums in this file have. Please add the following fields after line 291:

  • title: A human-readable title for the enum (e.g., "Family Income")
  • status: Should be "DRAFT" to match other enums in this file
  • contributors: List of contributors (should include your ORCID and/or github reference like other enums)
  • instantiates: Should be "valuesets_meta:ValueSetEnumDefinitionWithSource" to match other enums

These fields are present in all other enums in demographics.yaml (EducationLevel, MaritalStatus, EmploymentStatus, HousingStatus, GenderIdentity, OmbRaceCategory, OmbEthnicityCategory) and should be included for consistency.


Development

Successfully merging this pull request may close these issues.

Add valuesets for family income

2 participants