Add personal data and confidentiality indicators (fixes #58, #59)#66

Open
deeeeeepesh wants to merge 4 commits into open-semantic-interchange:main from deeeeeepesh:main

Conversation

@deeeeeepesh

Summary

This PR implements two feature requests, #58 and #59.

Changes

Added two new optional boolean attributes to both Field and Dataset schemas:

| Attribute | Purpose |
| --- | --- |
| contains_personal_data | Flags fields/datasets containing PII for privacy compliance (GDPR, CCPA) |
| is_confidential | Indicates data that should not be exposed to LLMs or external tools |

Files Modified

  • core-spec/osi-schema.json - JSON Schema definitions
  • core-spec/spec.yaml - YAML specification
  • core-spec/spec.md - Documentation with examples

Example Usage

fields:
  - name: email
    expression:
      dialects:
        - dialect: ANSI_SQL
          expression: email
    contains_personal_data: true
    is_confidential: false
    
  - name: ssn
    expression:
      dialects:
        - dialect: ANSI_SQL
          expression: social_security_number
    contains_personal_data: true
    is_confidential: true
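
As a rough illustration of how a consuming tool might act on these flags, the sketch below filters field definitions before they are handed to an LLM. The helper names (`fields_safe_for_llm`, `personal_data_field_names`) are hypothetical and not part of the spec or this PR, and treating a missing attribute as `false` is an assumption, since the PR only states that both attributes are optional.

```python
from typing import Any

# Hypothetical helpers for illustration only; not defined by the OSI spec.
# Assumption: an absent flag is treated as False, because both attributes
# are optional and the PR does not state a default.

def fields_safe_for_llm(fields: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the fields that are not marked is_confidential."""
    return [f for f in fields if not f.get("is_confidential", False)]


def personal_data_field_names(fields: list[dict[str, Any]]) -> list[str]:
    """Return the names of fields marked contains_personal_data."""
    return [f["name"] for f in fields if f.get("contains_personal_data", False)]


# Field definitions mirroring the YAML example above, already parsed into
# dicts, plus one unflagged field (order_total) invented for the sketch.
fields = [
    {"name": "email", "contains_personal_data": True, "is_confidential": False},
    {"name": "ssn", "contains_personal_data": True, "is_confidential": True},
    {"name": "order_total"},
]

safe_names = [f["name"] for f in fields_safe_for_llm(fields)]
pii_names = personal_data_field_names(fields)
```

With this input, `ssn` is dropped from the LLM-facing list while `email` stays visible but is still reported as personal data, which shows that the two flags are independent of each other.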

Copilot AI and others added 4 commits February 17, 2026 09:43
…d Dataset schemas

Co-authored-by: deeeeeepesh <85902051+deeeeeepesh@users.noreply.github.com>
Co-authored-by: deeeeeepesh <85902051+deeeeeepesh@users.noreply.github.com>
…confidential-indicator

Add data sensitivity attributes: contains_personal_data and is_confidential
Comment thread core-spec/spec.yaml
# Optional: Human-readable description of the logical dataset
description: string

# Optional: Indicates if this dataset contains personal data (PII) subject to privacy regulations

Should this be extended to a classification_type rather than just an indicator of whether the data is personal or not? Or should the expression be used as the classification type? Here are some classification types that could be considered contains_personal_data:
https://github.com/ananthdurai/schemata/blob/main/src/opencontract/v1/org/schemata/protobuf/schemata.proto#L87

Member

Does GDPR have an official standard for data classification?

@khush-bhatia
Member

deeeeeepesh, thanks for this PR. However, a catalog working group is putting together a full proposal for catalog, sensitive data classification, and governance. Let's wait for that working group's proposal so the spec can be updated cohesively. You are welcome to join the working group as well.

Slack Join link


Development

Successfully merging this pull request may close these issues.

New attribute : confidential indicator
New attribute : contain personal data

4 participants