Integration with GreatExpectations, validation of lab models #217
julianam-w wants to merge 28 commits into `main` from `gx-lab-validation`
Conversation
Summary of Changes
Hello @julianam-w, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces Great Expectations to the project, establishing a robust data validation framework for lab-related data models. It sets up a comprehensive system for defining, running, and reporting on data quality expectations, ensuring the reliability of ds__lab_requests and ds__lab_tests datasets, including their sensitive counterparts.
Highlights
- Great Expectations Framework: A new gx/ directory is added, containing Python scripts (setup.py, validate.py) and configuration files (great_expectations.yml, a checkpoint, and expectation suites) to integrate Great Expectations for data quality checks.
- Lab Data Model Validation: Dedicated expectation suites (ds__lab_requests_expectations, ds__lab_tests_expectations, and their sensitive versions) are introduced to validate the integrity and consistency of lab request and lab test data.
- Refactored dbt Models: Existing ds__lab_requests.sql and ds__lab_tests.sql dbt models are refactored to leverage new macros (lab_requests_dataset, lab_tests_dataset), enabling better separation and validation of sensitive and non-sensitive lab data.
- New Sensitive Data Models: New dbt models (ds__sensitive_lab_requests.sql, ds__sensitive_lab_tests.sql) are created specifically for sensitive lab data, along with their corresponding YAML schemas.
- Reporting Updates: SQL reports that consume lab data are updated to correctly reference the new sensitive/non-sensitive lab data models, ensuring accurate reporting based on the validated datasets.
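For context on what an expectation suite in this setup might contain, here is a minimal sketch in Great Expectations' JSON suite format, built as a plain Python dict. The column names and choice of expectations are illustrative assumptions, not copied from the PR's actual gx/expectations/ files:

```python
import json

# Hypothetical suite for ds__lab_requests; column names and expectation
# choices are illustrative, not the PR's real suite contents.
suite = {
    "expectation_suite_name": "ds__lab_requests_expectations",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "lab_request_id"},
        },
        {
            "expectation_type": "expect_column_values_to_be_unique",
            "kwargs": {"column": "lab_request_id"},
        },
    ],
}

# Round-trip through JSON, as the checked-in suite files would be stored.
print(json.dumps(suite, indent=2))
```

Each entry pairs an `expectation_type` with the `kwargs` it validates against; the checked-in JSON files under gx/expectations/ follow this same shape.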
Code Review
This pull request introduces a comprehensive data validation setup using Great Expectations, which is a fantastic addition for ensuring data quality. The modular structure with separate scripts for setup and validation is well-designed. However, there are a few critical issues to address, most importantly a security vulnerability involving hardcoded credentials. I've also included suggestions to improve error handling, repository hygiene, and code robustness.
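On the hardcoded-credentials point, one conventional fix is to build the datasource connection string from environment variables. A minimal sketch, where the variable names are assumptions and not taken from the PR:

```python
import os

def pg_connection_string() -> str:
    """Build the warehouse connection string from environment variables
    rather than hardcoding credentials in great_expectations.yml or
    setup.py. The TAMANU_DB_* variable names are assumptions."""
    user = os.environ["TAMANU_DB_USER"]          # required: fail fast if unset
    password = os.environ["TAMANU_DB_PASSWORD"]  # required: never checked in
    host = os.environ.get("TAMANU_DB_HOST", "localhost")
    port = os.environ.get("TAMANU_DB_PORT", "5432")
    dbname = os.environ.get("TAMANU_DB_NAME", "tamanu")
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{dbname}"
```

Required variables use direct indexing so a missing secret raises `KeyError` immediately instead of silently connecting with a default.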
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: julianam-w <77252335+julianam-w@users.noreply.github.com>
…l/tamanu-source-dbt into gx-lab-validation
/gemini review
Code Review
This PR introduces a comprehensive data validation framework using Great Expectations, which is a great addition for ensuring data quality. The separation of setup and validation logic into distinct scripts (setup.py, validate.py) is well-thought-out. The use of Python scripts to define expectation suites is also a good practice for maintainability.
My review focuses on improving code quality, fixing potential bugs, and ensuring consistency. I've identified a critical syntax error, some bad practices such as bare `except` clauses, and opportunities to reduce code duplication. I've also noted that the version of great-expectations is quite old and suggest an update.
One architectural point to consider is the dual representation of expectation suites (both .py and .json files are checked in). This can lead to them getting out of sync. It would be better to either have the Python scripts as the single source of truth and generate the JSON files at runtime (and add gx/expectations/*.json to .gitignore), or commit only the JSON files if they are to be edited manually.
Addressing these points will make the new validation framework more robust and easier to maintain.
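The "Python as single source of truth" option suggested above could be sketched as a small exporter that regenerates the JSON suite files from Python definitions, so the two representations cannot drift apart. The suite contents and function name below are placeholders, not the PR's actual code:

```python
import json
from pathlib import Path
from tempfile import mkdtemp

# Placeholder suite definitions; the real ones would live in the Python
# setup scripts the review describes.
SUITES = {
    "ds__lab_requests_expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null",
         "kwargs": {"column": "lab_request_id"}},
    ],
    "ds__lab_tests_expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null",
         "kwargs": {"column": "lab_test_id"}},
    ],
}

def export_suites(output_dir: Path) -> list[Path]:
    """Write each suite to <output_dir>/<name>.json and return the paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, expectations in sorted(SUITES.items()):
        payload = {"expectation_suite_name": name, "expectations": expectations}
        path = output_dir / f"{name}.json"
        path.write_text(json.dumps(payload, indent=2))
        written.append(path)
    return written

# Write to a temporary directory here; the real target would be gx/expectations/.
paths = export_suites(Path(mkdtemp()) / "expectations")
```

With this layout, gx/expectations/*.json could be added to .gitignore and regenerated as part of the setup step.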
/gemini review
Code Review
This pull request introduces a comprehensive data validation framework using Great Expectations, which is a fantastic step towards ensuring data quality for the lab models. The changes are well-structured, with separate scripts for setup and validation, and clear documentation in the new gx/README.md.
My review focuses on a few critical correctness issues in the expectation definitions and some medium-severity improvements related to code practices, such as avoiding broad exception handling and wildcard imports. I've also noted that the Great Expectations version is quite old and recommend planning an upgrade. Addressing these points will make the new validation system more robust and reliable.
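On the broad-exception-handling point raised here and in the earlier review, the usual fix is to catch only the failure modes the caller can handle and let everything else propagate. A sketch against a hypothetical checkpoint call (the function name and caught exception types are illustrative, not the PR's actual validate.py):

```python
import sys

def run_checkpoint(run_fn) -> bool:
    """Run a checkpoint callable and report expected failures without a
    bare `except`. `run_fn` stands in for the PR's actual checkpoint
    invocation; the caught exception types are illustrative."""
    try:
        result = run_fn()
    except (ConnectionError, TimeoutError) as exc:
        # Expected operational failures: report and signal a failed run.
        print(f"validation run failed: {exc}", file=sys.stderr)
        return False
    # Anything else (e.g. a programming error) propagates unchanged.
    return bool(result)
```

Programming errors such as a `TypeError` still surface with a full traceback instead of being swallowed, which is the main hazard of a bare `except`.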
Code Review
This pull request introduces a robust data validation framework using Great Expectations, which is a fantastic addition for ensuring data quality in the lab models. The structure with separate setup and validation scripts, along with modular utility functions, is well-designed. I've identified a critical logical error in two of the expectation queries that needs to be addressed. Additionally, there's a potential security risk in how new datasources are created, and a few medium-severity issues related to code maintainability and a typo. Overall, this is a great feature, and with these fixes, it will be a solid and secure implementation.
MAUI-5788
Setup script
Validate script