GitHub - Health-Informatics-UoN/carrot-transform: Streamlined Data Transformation to OMOP

Streamlined Data Transformation to OMOP

Carrot Transform automates data transformation processes and facilitates the standardisation of datasets to the OMOP vocabulary, simplifying the integration of diverse data sources.


Carrot Mapper is a webapp which allows the user to use the metadata (as output by WhiteRabbit) from a dataset to produce mapping rules to the OMOP standard, in the JSON format. These can be ingested by Carrot Transform to perform the mapping of the contents of the dataset to OMOP.

Carrot Transform transforms input data into tab-separated values (TSV) files of standard OMOP tables, with concepts mapped according to the provided rules (generated from Carrot Mapper).

Quick Start

To have the project up and running, please follow the Quick Start Guide.

If you need to perform development, there's a brief guide here to get the tool up and running.

Formatting and Linting

This project uses ruff for formatting and linting. The only dependency is the uv command line tool. The .vscode/tasks.json file contains a task to run this tool on the currently open file. The commands can also be run on their own (in the root folder) like this:

# reformat all the files in `./`
λ uv run ruff format .

# run linting checks on all the files in `./`
λ uv run ruff check .

# check and fix all the files in `./`
λ uv run ruff check --fix .

# check and fix all the files in `./`, but do so more aggressively
λ uv run ruff check --fix --unsafe-fixes .

SQLAlchemy Workflow

Carrot Transform can read input tables from a database through SQLAlchemy. This feature is experimental and requires passing a connection string as --input-db-url instead of an input directory. The --person-file parameter and the Carrot Mapper workflow should still be used as if working with .csv files, even though carrot-transform reads the data from the database.
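
SQLAlchemy connection strings generally take the form dialect://username:password@host:port/database, and special characters in the credentials need to be percent-encoded. The sketch below shows one way to assemble such a string with the Python standard library. The helper name build_db_url, the credentials, and the database name are all hypothetical and are not part of carrot-transform.

```python
from urllib.parse import quote_plus


def build_db_url(dialect, user, password, host, port, database):
    """Assemble a SQLAlchemy-style URL: dialect://user:password@host:port/database."""
    # Percent-encode the credentials so characters such as "@" or "/"
    # are not mistaken for URL delimiters.
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials and database name, for illustration only.
url = build_db_url("postgresql", "carrot", "p@ss/word", "localhost", 5432, "hospital")
print(url)
```

The resulting string is the kind of value that would be passed as --input-db-url.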

1. Extract/export some rows from the various tables
   - something like SELECT column_name(s) FROM patients LIMIT 1000; is written to patients.csv
2. The usual scan reports are performed on these subsets
3. When carrot-transform is invoked, one specifies --input-db-url with a database connection string instead of --input-dir
   - the --person-file parameter should still point to the equivalent of person_tablename.csv
   - the --rules-file parameter needs to refer to a file on the disk as usual
4. carrot-transform will still write data to --output-dir and otherwise operate as normal
   - The following parameters have undefined behaviour with this functionality
     - --write-mode
     - --saved-person-id-file
     - --use-input-person-ids
     - --last-used-ids-file
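
The first step of the workflow above (writing a limited sample of each table to a .csv file) could be scripted roughly as follows. This is a minimal sketch, not part of carrot-transform: it uses the standard-library sqlite3 module as a stand-in for the real database, selects every column rather than a chosen subset, and the function name export_sample, the table layout, and the sample rows are made up for illustration.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_sample(conn, table, out_dir, limit=1000):
    """Write the first `limit` rows of `table` to <out_dir>/<table>.csv."""
    # Table names cannot be bound as query parameters, so only plain
    # identifiers are accepted here.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
    header = [column[0] for column in cursor.description]
    out_path = Path(out_dir) / f"{table}.csv"
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(cursor)
    return out_path


# Hypothetical in-memory database standing in for the real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (person_id INTEGER, birth_year INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, 1980), (2, 1975), (3, 1990)],
)

sample_path = export_sample(conn, "patients", tempfile.mkdtemp(), limit=2)
print(sample_path.read_text())
```

The exported files are only used to produce the scan reports; the transform itself would still read from the database via --input-db-url.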

Release Procedure

To release a new version of carrot-transform follow the steps outlined on the documentation website.

License

This repository's source code is available under the MIT license.
