move postUpload logs to user's uploadDir#1318
Conversation
f754689 to
de26c09
Compare
de26c09 to
89238ac
Compare
There was a problem hiding this comment.
praise: Log files are moved to the $user's directory after spreadsheet upload has finished execution.
issue: In a clean staging directory when instantiating my staging environment, I got an error: An argument named create_random_password is not expected here in rds-instance.tf line 33, in module "db". This is probably caused by my environment using the latest Terraform AWS modules. The fix is to replace the out of date create_random_password variable with manage_master_user_password = false.
issue: When running sudo /home/centos/datasetUpload.sh to process xls spreadsheet, I get 3 errors in java.log:
[lily@ip-10-99-0-221 uploadDir]$ sudo /home/centos/datasetUpload.sh
[lily@ip-10-99-0-221 uploadDir]$ more java.log
java.io.FileNotFoundException: time.txt (Permission denied)
org.postgresql.util.PSQLException: The authentication type 10 is not supported. Check that you have configured the pg_hba.conf
file to include the client's IP address or subnet, and that it is using an authentication scheme supported by the driver.
at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:403)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:108)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:125)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:22)
at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:32)
at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:393)
at org.postgresql.Driver.connect(Driver.java:267)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at Database.<init>(Database.java:32)
at Validation.<init>(Validation.java:80)
at Main.processExcel(Main.java:121)
at Main.main(Main.java:168)
**Begin: file 1 : GigaDBUpload_102203_GIGA-D-21-00197_hamster.xls in process...
java.io.FileNotFoundException: /tool/uploadDir/GigaDBUpload_102203_GIGA-D-21-00197_hamster.xls (Permission denied)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:163)
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:145)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:87)
at Excel2Database.<init>(Excel2Database.java:134)
at Main.processExcel2Database(Main.java:43)
at Main.processExcel(Main.java:130)
issue: java.io.FileNotFoundException: /tool/uploadDir/GigaDBUpload_102203_GIGA-D-21-00197_hamster.xls (Permission denied) is probably caused by the spreadsheet xls file in /home/centos/uploadDir still belonging to lily user:
[centos@ip-10-99-0-221 ~]$ ls -lh uploadDir/
total 72K
-rwx------. 1 lily lily 72K Jul 10 06:36 GigaDBUpload_102203_GIGA-D-21-00197_hamster.xls
Could be fixed by adding chown centos:centos /home/centos/uploadDir/* after mv $uploadDir/* /home/centos/uploadDir/ in execute.sh.
issue: The org.postgresql.util.PSQLException: The authentication type 10 is not supported. Check that you have configured the pg_hba.conf error is probably caused by the ExceltoGigaDB tool requiring its lib/postgresql-9.1-901.jdbc4.jar to be replaced with a more recent jdbc driver. I managed to get spreadsheet upload working with RDS PostgreSQL 14 using postgresql-42.6.0.jar which I downloaded from https://jdbc.postgresql.org/download/ and selecting Java 8, 42.6.0 since we are using Java 8 in the excel upload tool container.
praise: The postUpload.sh script seems to be working fine:
[lily@ip-10-99-0-172 ~]$ sudo /home/centos/postUpload.sh 102203
[License]
All files and data are distributed under the CC0 1.0 Universal (CC0 1.0) Public
Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/), unless
specifically stated otherwise, see http://gigadb.org/site/term for more details.
[Comments]
[End]
Done with creating the README file for 102203. The README file is saved in file: /home/centos/uploadLogs/readme-102203.txt
All postUpload logs have been moved to: /home/lily/uploadDir
PostUpload jobs done!
[lily@ip-10-99-0-172 ~]$ ls uploadDir/
invalid-urls-102203.txt javac.log updating-file-size-102203.txt
java.log readme-102203.txt updating-md5checksum-102203.txt
|
Hi @kencho51 relevant email from @only1chunts: |
|
Hi @pli888, All the issues have been addressed. |
There was a problem hiding this comment.
Hi @kencho51
This is not complete review yet, but I wanted to put this out first before I continue tomorrow in order to save time
praise: code change looks good at first glance
praise: logo changes look fine
issue: change to rds-instance.tf is not complete and caused an error
$ terraform version
Terraform v1.5.3
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v4.18.0
+ provider registry.terraform.io/hashicorp/external v2.2.2
+ provider registry.terraform.io/hashicorp/random v3.3.1
$ ../../../scripts/tf_init.sh --project gigascience/forks/rija-gigadb-website --env staging
You need to specify an AWS region: eu-west-3
Initializing the backend...
Initializing modules...
Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/external from the dependency lock file
- Using previously-installed hashicorp/aws v4.18.0
- Using previously-installed hashicorp/random v3.3.1
- Using previously-installed hashicorp/external v2.2.2
Terraform has been successfully initialized!
$ terraform plan
╷
│ Error: Unsupported argument
│
│ on ../../modules/rds-instance/rds-instance.tf line 33, in module "db":
│ 33: manage_master_user_password = false
│
│ An argument named "manage_master_user_password" is not expected here.
suggestion: It's better to revert the change made to rds-instance.tf.
That work is already part of @pli888's PR #1316, and if you look at that PR, you will see it's not just that line that need change, there are other changes to that file that need to be made together and tested properly.
I think that's not the purpose of your PR (especially, there's already additional changes to the frontend you have taken on this PR, plus the PostgreSQL jar file).
If we have the setup necessary for #1316, and we want to test this PR without getting into terraform errors, I reckon it should be easy enough to delete and recreate the ops/infrastructure/envs/staging directory or make another checkout of the codebase.
This reverts commit 874f355.
For info, part of my sentence is not correct.
And that's it, the rest is as usual. |
rija
left a comment
There was a problem hiding this comment.
Hi @kencho51,
praise: postUpload.sh script is now correctly copying its output in the user's uploadDir directory
praise: I was able to build a brand new AWS staging environment without errors
issue: the behaviour of datasetUpload.sh has regressed.
Before, when users upload a spreadsheet and the process failed they will know immediately without having to look at the log because they will still see the spreadsheet in the upload directory.
It also means that when they don't see the spreadsheet in the upload directory anymore, they know the run was successful.
I've just tried with a spreadsheet I knew will fail, but I didn't see it in lily's upload directory. After checking in centos user's upload directory I saw it there. It wasn't copied over to lily's after the run.
suggestion: do revert the change you made to execute.sh.
The line you didn't think was useful was there to allow the aforementioned feature to happen.
Admittedly, It's not a very elegant code (it very much look like I rushed that one), and you cannot be faulted for thinking it was an error, so If you can quickly come up with a better implementation for doing the same thing, please do so.
I'm attaching in a comment below the spreadsheet I used to reproduce the issue, if it helps reproducing the issue
|
Spreadsheet that would fail the datasetUpload.sh run for me: |
…m kencho51/update-jdbc-driver) The postgresql-9.1-901.jdbc4.jar has been replaced with postgresql-42.6.0.jar in order for the spreadsheet upload tool to work with PostgreSQL 14. Reviewed-by: @pli888, @rija Refs: gigascience/gigadb-website#1318
pli888
left a comment
There was a problem hiding this comment.
The org.postgresql.util.PSQLException: The authentication type 10 is not supported error has disappeared with the new PostgreSQL jdbc jar being used now by the consultant's spreadsheet upload tool when I upload a spreadsheet on my staging server. I also do not see java.io.FileNotFoundException error anymore either.
I approve this PR.
Pull request for issue: #1292 and #1317
This is a pull request for the following functionalities:
To fix #1292, this PR will move all the logs produced by the
postUpload.shin/home/centos/uploadLogsto/home/$user/uploadDir, so $user can access the logs in their own directory.To fix #1317, this PR will remove all logos of BGI and CNGB from the gigadb-website public pages.
How to test?
Describe how the new functionalities can be tested by PR reviewers
To access the postUpload logs
Spin up staging gigadb-website
Create a user account in the bastion server
Execute the datasetUpload.sh on normal spreadsheet
Execute the datasetUpload.sh on faulty spreadsheet
Execute postUpload.sh
To check the logos of BGI and CNGB have been removed from the gigadb website page's footer by going to:
To check the logos of BGI and CNGB have been removed from the gigadb job page's footer by going to:
The PR is at here: gigascience/gigascience.github.io#7.
How have functionalities been implemented?
Before the fix, the log files produced by the
postUpload.shwere stored in thehome/centos/uploadLogsafter the execution, where$userdoes not have the access permission.So, by moving all the logs to the
/home/$user/uploadDir/during the./potUpload.shprocess, individual user can get hold of the postUpload logs.Any issues with implementation?
Describe any problems with your implementation
Any changes to automated tests?
None.
Any changes to documentation?
None.
Any technical debt repayment?
The
datasetUpload.shwill intake xls file from theuploadDirand ingest the data into the database, and onlyjava.logandjavac.logwill be produced inuploadLogsdir,mv /home/centos/uploadDir/* $uploadDir/in line 28 seems to be redundant, so it is removed.Any improvements to CI/CD pipeline?
Describe any improvements to the Gitlab pipeline