Skip to content

[389] Add support for bio data ingest from postgres db#415

Closed
dbashford-NOAA wants to merge 12 commits intoOSOceanAcoustics:mainfrom
dbashford-NOAA:389_support_bio_data_ingest_from_db
Closed

[389] Add support for bio data ingest from postgres db#415
dbashford-NOAA wants to merge 12 commits intoOSOceanAcoustics:mainfrom
dbashford-NOAA:389_support_bio_data_ingest_from_db

Conversation

@dbashford-NOAA
Copy link
Copy Markdown

Resolves #389

  • Added a function to load bio data from a postgres db as an alternative to loading from a CSV
  • Uses psycopg package to connect to postgres

@leewujung
Copy link
Copy Markdown
Member

Quick recap from email exchanges between me and @dbashford-NOAA :

  • the current workflow has been tested (separately from the rest of the workflow) to pull data from the existing db in AWS set up by @aliciabillings-noaa
  • @dbashford-NOAA will create a mock db to simulate the connection to enable having tests for this new functionality

@dbashford-NOAA dbashford-NOAA self-assigned this Nov 6, 2025
@brandynlucca
Copy link
Copy Markdown
Collaborator

@dbashford-NOAA I see that you are running into similar issues with jupyter-book as I did when drafting #416 earlier this week. Jupyter Book put out a major release this week that made rendering the RTD pages a frustrating experience. The easiest fix is to pin jupyter-book==1.0.4.post1, which will allow the docs test to pass and doesn't require a complete overhaul of the configuration files. I believe you will also have to pin docutils>=0.18,<0.22 [ref], but that may have been the source of an unrelated error on my end.

@dbashford-NOAA dbashford-NOAA force-pushed the 389_support_bio_data_ingest_from_db branch from 6db6f63 to 0e8370f Compare November 7, 2025 19:03
@dbashford-NOAA
Copy link
Copy Markdown
Author

@brandynlucca Thanks for the help! I ended up pinning jupyter-book and docutils to resolve it. Definitely frustrating to troubleshoot.

@review-notebook-app
Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@dbashford-NOAA dbashford-NOAA force-pushed the 389_support_bio_data_ingest_from_db branch from 33572ae to be85236 Compare November 21, 2025 20:05
@dbashford-NOAA
Copy link
Copy Markdown
Author

@leewujung @brandynlucca It looks like the tests are failing because of the psycopg dependency for my changes.
I've added it to requirements.txt but I only see requirements-dev.txt in the PR workflow.
Do I need to add psycopg somewhere else for the workflow?

@brandynlucca
Copy link
Copy Markdown
Collaborator

@leewujung @brandynlucca It looks like the tests are failing because of the psycopg dependency for my changes. I've added it to requirements.txt but I only see requirements-dev.txt in the PR workflow. Do I need to add psycopg somewhere else for the workflow?

You need to also add it to pyproject.toml.

@dbashford-NOAA dbashford-NOAA force-pushed the 389_support_bio_data_ingest_from_db branch 3 times, most recently from 1f7d2e9 to 6d492ae Compare November 25, 2025 19:44
@dbashford-NOAA
Copy link
Copy Markdown
Author

@leewujung @brandynlucca
These changes pass testing fine in my local dev environment but I'm having some trouble with passing for each platform.
Testing with a local postgres db has been the main part of the issue.
My first testing implementation was using testing.postgresql to setup a temporary db with test data which works fine in my local environment and passes testing on macos and linux but not windows because of permissions issues.

Execution of PostgreSQL by a user with administrative permissions is not permitted.
The server must be started under an unprivileged user ID to prevent possible system security compromises. 
See the documentation for more information on how to properly start the server.

There may be a way around this by running the windows tests in the workflow as a non-admin user but that could cause problems with other part of the process.

I also tried setting up a docker container service to run the postgres db in the workflow but that's only supported for linux so the tests failed for macos and windows.

Container operations are only supported on Linux runners

Another solution could be to have an external postgres db setup to test against but I'm not sure where that would live or how we would manage credentials.

Let me know if you have any ideas!

@leewujung
Copy link
Copy Markdown
Member

@dbashford-NOAA : this is interesting - looks like this action would be helpful? https://github.com/marketplace/actions/setup-postgresql-for-linux-macos-windows

@dbashford-NOAA dbashford-NOAA force-pushed the 389_support_bio_data_ingest_from_db branch from 100b803 to 6c8ca04 Compare December 9, 2025 17:02
@dbashford-NOAA dbashford-NOAA marked this pull request as draft December 11, 2025 16:36
@dbashford-NOAA dbashford-NOAA force-pushed the 389_support_bio_data_ingest_from_db branch from 5e29792 to a280419 Compare December 11, 2025 17:10
@dbashford-NOAA
Copy link
Copy Markdown
Author

@leewujung
It took some testing to figure out but I got that action to work and fixed the issue.
It's passing all the tests and is ready for review now.

@dbashford-NOAA dbashford-NOAA marked this pull request as ready for review December 11, 2025 20:11
@leewujung
Copy link
Copy Markdown
Member

leewujung commented Dec 11, 2025

@dbashford-NOAA : Awesome! I am glad it worked out! Looking forward to reviewing this (and all the other reviews I owe you...).

Comment thread echopop/ingest/biological.py Outdated
Comment thread echopop/ingest/biological.py Outdated
Copy link
Copy Markdown
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dbashford-NOAA: Thanks for the PR and making postgres tests in CI work!

I think this PR is almost ready to go, just have some consistency issues that I marked in inline comments.

In addition, how about changing the biodata loading function names so that it is clear where the data are from?

  • load_biological_data --> load_biological_data_excel
  • load_db_biological_data --> load_biological_data_database

@dbashford-NOAA
Copy link
Copy Markdown
Author

@leewujung
I've made those changes so it should be ready now!

Copy link
Copy Markdown
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbashford-NOAA : Great, thanks! Everything looks fine. I will merge this after the hackday tomorrow in case of breaking something due to the function name change for the bio data ingestion part.

@leewujung
Copy link
Copy Markdown
Member

For visibility: After the hack day we found out that the database that has been worked on was not the correct one, so more changes is needed on the database end. I'll hold on to reviewing / merging this until that is resolved. We have a meeting tomorrow (Jan 30) that will hopefully resolve this.

@ElizabethMPhillips
Copy link
Copy Markdown

Following up on this and after checking with Wu-Jung today, @dbashford-NOAA please ensure that the updated materialized views in the nwfsc_public_prod.acoustics schema match the previous views developed, and then go ahead and update the
code to access the nwfsc_public_prod.acoustics schema. After that is complete, we can test pulling the new, merged length view (that includes both length and specimen in one table)

@aliciabillings-noaa
Copy link
Copy Markdown
Collaborator

aliciabillings-noaa commented Feb 2, 2026 via email

@leewujung
Copy link
Copy Markdown
Member

leewujung commented Feb 19, 2026

Let's flesh this out at our meeting tomorrow. I thought we agreed to use
trimmed down versions of just catch and specimen (that includes all length
fish as well).

Yes - this is confirmed before/at the Feb 19 meeting.

From Feb 19 notes, these below can be added to the materialized view:

  • Will add SPECIES_CODE into the view
  • Can try to generate UID in the materialized view

Current UID: {SHIP}-{SURVEY_ID}-{SPECIES_CODE}-{HAUL_NUMBER}

Also @dbashford-NOAA : How are the views from the database coming along? Do you think you can finish this soon? There are recent codebase changes, so please merge the upstream to your PR. Since it's been so long and there are going to be new changes coming up from @brandynlucca to use the catch and specimen, to avoid confusion, I think it'll be better if you revert the function name load_biological_data_excel back to load_biological_data. Thanks.

@dbashford-NOAA
Copy link
Copy Markdown
Author

@leewujung
I've updated the ingest and testing code to not expect the length view but was waiting on the changes from @brandynlucca before finalizing it to make sure it'll be compatible. I don't think any further changes will take much time though.

I'll work on pulling in the upstream changes and I can revert the function name too.

@leewujung
Copy link
Copy Markdown
Member

Sounds good - thanks @dbashford-NOAA !

@leewujung
Copy link
Copy Markdown
Member

@dbashford-NOAA : Just a quick note that once #463 is merged you could go ahead to make the changes. That PR should include all the handling needed to use the single specimen file (combining the previous specimen and length files) plus the catch file for running the workflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add the functionality to ingest biological data directly from database views

5 participants