Research Data Engineer
Building tools for oceanographic data curation, metadata compliance, and FAIR data publication.

- Slocum glider mission archive pipeline. Ingests IOOS Glider DAC segments, merges into CF-1.8 trajectory NetCDF, computes TEOS-10 derived variables, applies QARTOD QC, and produces archive packages with interactive maps, reports, and DataCite DOI metadata.
- ISO 19115-2 metadata compliance engine. XSD structure checks, Schematron policy rules, CF-1.8 and ACDD-1.3 attribute validation, YAML rules DSL for custom policies, FAIR self-scoring, and a FastAPI web dashboard.
- CTD cast processing pipeline. Reads Sea-Bird CNV files, applies TEOS-10 conversions via GSW, flags outliers, bins to standard depths, and exports CF-compliant NetCDF with full provenance.
- NDBC buoy observation ETL pipeline. Extracts Gulf of Mexico station data, transforms with unit normalization and QC flagging, loads into partitioned Parquet with DuckDB analytics.
- Reusable framework for building oceanographic data curation pipelines. Provides base classes for ingest, transform, validate, and publish stages with plugin architecture, checksum verification, and structured logging.

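
As a rough illustration of the staged design described for the reusable pipeline framework, the sketch below shows one way base classes, a plugin registry, checksum verification, and structured logging could fit together. It is not the framework's actual API: every name here (`Stage`, `register`, `STAGE_REGISTRY`, `run_pipeline`, `verify`, the sample stages, and the record fields) is hypothetical, the sample values are invented for the example, and only the Python standard library is assumed. The publish step is reduced to returning a package with a SHA-256 checksum.

```python
import hashlib
import json
import logging
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Type

logger = logging.getLogger("pipeline")

Records = List[Dict[str, Any]]

# Hypothetical plugin registry: maps a stage name to its class.
STAGE_REGISTRY: Dict[str, Type["Stage"]] = {}


def register(name: str) -> Callable[[Type["Stage"]], Type["Stage"]]:
    """Class decorator that adds a stage class to the registry under `name`."""
    def decorator(cls: Type["Stage"]) -> Type["Stage"]:
        STAGE_REGISTRY[name] = cls
        return cls
    return decorator


def sha256_of(records: Records) -> str:
    """Checksum of the records, serialized as JSON with sorted keys."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Stage(ABC):
    """Base class: each stage takes a list of records and returns a new list."""

    @abstractmethod
    def run(self, records: Records) -> Records:
        ...


@register("ingest")
class Ingest(Stage):
    def run(self, records: Records) -> Records:
        # Drop records that have no temperature reading.
        return [dict(r) for r in records if r.get("temp_f") is not None]


@register("transform")
class FahrenheitToCelsius(Stage):
    def run(self, records: Records) -> Records:
        # Unit normalization: degrees Fahrenheit to degrees Celsius.
        return [
            {"station": r["station"], "temp_c": round((r["temp_f"] - 32) * 5 / 9, 2)}
            for r in records
        ]


@register("validate")
class RangeCheck(Stage):
    def run(self, records: Records) -> Records:
        # Illustrative plausibility limits only, not an official QC threshold.
        for r in records:
            if not -5.0 <= r["temp_c"] <= 45.0:
                raise ValueError(f"temperature out of range: {r}")
        return records


def run_pipeline(stage_names: List[str], records: Records) -> Dict[str, Any]:
    """Run the named stages in order, logging one JSON line per stage."""
    for name in stage_names:
        records = STAGE_REGISTRY[name]().run(records)
        logger.info(json.dumps({
            "stage": name,
            "n_records": len(records),
            "sha256": sha256_of(records),
        }))
    # "Publish" here just means packaging the records with their checksum.
    return {"records": records, "sha256": sha256_of(records)}


def verify(package: Dict[str, Any]) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    return sha256_of(package["records"]) == package["sha256"]
```

Calling `run_pipeline(["ingest", "transform", "validate"], raw)` on a list of dicts with `station` and `temp_f` keys returns a package that `verify` accepts until its records are modified. Looking stages up by name is what would let a plugin add a stage without touching the runner.
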
| Category | Tools |
|---|---|
| Languages | Python, Bash, SQL |
| Ocean Data | xarray, netCDF4, GSW (TEOS-10), CF Conventions, ERDDAP |
| Standards | ISO 19115-2, ACDD-1.3, IOOS QARTOD, DataCite 4.4, FAIR |
| Data | Pandas, NumPy, DuckDB, Parquet |
| Web | FastAPI, Jinja2, Folium |
| Quality | pytest, ruff, GitHub Actions CI |
| XML/Schema | lxml, XSD, Schematron, XPath |