How to submit metadata and upload data using morphic util
morphic-util is a command-line tool that helps you upload data to MorPhiC DRACC AWS S3 storage. To use morphic-util you will need:
- Python 3 installed on your computer
- Basic knowledge of how to navigate the command line and run commands in a terminal, e.g. cd, ls, pwd
- An AWS username and password, provided to you by the DRACC team
If you are missing any of the prerequisites above, please contact the MorPhiC DRACC helpdesk at helpdesk@morphic.bio or on the dedicated Slack channel.
Check that you have Python 3 installed by opening a terminal and running the following command:
which python3
It should return the location of your Python 3 installation, e.g.
/path/to/bin/python3
If nothing is returned, Python 3 is not installed.
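The same check can also be done with `command -v`, the POSIX-standard equivalent of which that is built into sh, bash and zsh. The small function below is a sketch; check_cmd is a hypothetical helper name, not part of morphic-util.

```shell
# Sketch: report whether a command is on your PATH and where it lives.
# `command -v` is the POSIX-standard alternative to `which`.
check_cmd() {
  if cmd_path=$(command -v "$1"); then
    echo "found $1 at $cmd_path"
  else
    echo "missing: $1" >&2
    return 1
  fi
}
```

For example, check_cmd python3 && python3 --version prints the location and version of Python 3, and check_cmd pip3 confirms that pip3 (used to install morphic-util below) is available.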
There are many online tutorials for installing Python; follow one that suits your operating system, or contact your system administrator for help. One example can be found here: https://realpython.com/installing-python/
Once you have installed Python 3, run the following command in your terminal to install morphic-util:
pip3 install morphic-util
Once installed, you can see the available commands by typing the following command in your terminal.
morphic-util -h
If this succeeds, continue to the configuration section.
Depending on your Python installation, you may need to use sudo to install the tool in your system directory. If the pip3 install command above gives a Permission denied error, try the following command instead:
sudo pip3 install morphic-util
You will then be prompted for your system password. If this succeeds, continue to the configuration section below.
You will need to configure the morphic-util tool the first time you use it. The configuration creates an AWS profile called morphic-util that will give you the appropriate permissions to upload to your upload area.
To configure your morphic-util tool, use the following command with the AWS username and password that you have obtained from your DRACC administrator.
morphic-util config AWS_USERNAME AWS_PASSWORD
This would look something like:
$ morphic-util config myUserName my#Pass#12345
Credentials saved.
Valid credentials
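Because the config command takes the password as a command-line argument, typing it directly leaves it in your shell history. The following is a sketch in POSIX sh of a hypothetical helper, configure_morphic, that prompts for the password with echo turned off and quotes it so characters such as # or $ reach morphic-util unchanged. The password is still passed to morphic-util as an argument, so on a shared machine it may be briefly visible in process listings.

```shell
# Sketch (hypothetical helper): prompt for the AWS password with echo turned
# off, then pass it to `morphic-util config`. The ( ... ) body runs in a
# subshell, so the password variable does not linger in your session.
configure_morphic() (
  username=$1
  # If interrupted with Ctrl+C, turn terminal echo back on before exiting
  trap '[ -t 0 ] && stty echo; exit 130' INT
  printf 'AWS password for %s: ' "$username" >&2
  [ -t 0 ] && stty -echo          # hide typing when reading from a terminal
  IFS= read -r password
  [ -t 0 ] && stty echo           # turn echo back on
  printf '\n' >&2
  # Double quotes pass characters such as #, $, ! and spaces through unchanged
  morphic-util config "$username" "$password"
)
```

Usage: configure_morphic myUserName, then type the password at the prompt.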
You can prepare your study metadata file in TSV, CSV or JSON format. An example TSV file will look similar to this file.
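For a TSV file, a quick structural check before submitting can catch rows with missing or extra tab-separated columns. This awk-based sketch only compares each row's column count with the header's; it does not know which metadata fields MorPhiC expects, and check_tsv_columns is a hypothetical name. CSV and JSON files would need a different check.

```shell
# Sketch: check that every non-empty row of a TSV file has as many
# tab-separated columns as the header row. Prints offending lines and
# returns non-zero if any are found.
check_tsv_columns() {
  awk -F '\t' '
    NR == 1 { ncol = NF; next }      # remember the header column count
    NF == 0 { next }                 # ignore blank lines
    NF != ncol {
      printf "line %d: %d columns, header has %d\n", NR, NF, ncol
      bad = 1
    }
    END { exit bad ? 1 : 0 }
  ' "$1"
}
```

Usage: check_tsv_columns <local_path_to_your_study_metadata_file> prints nothing and succeeds when all rows line up.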
To submit your study metadata, run:
morphic-util submit --type study --file <local_path_to_your_study_metadata_file>
Your unique study ID is shown in the response. Keep a note of it, as you will need it in the next step:
Study created successfully: 664f1ef55a564312eb478177
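Instead of copying the study ID by hand, you can capture it in a shell variable. This is a sketch that assumes morphic-util prints the success line on standard output in exactly the format shown above; submit_study and STUDY_ID are hypothetical names.

```shell
# Sketch: submit study metadata and keep the returned study ID in STUDY_ID.
# Assumes the success line matches the example above:
#   Study created successfully: <id>
submit_study() {
  study_output=$(morphic-util submit --type study --file "$1") || return 1
  printf '%s\n' "$study_output"     # still show the tool's own messages
  STUDY_ID=$(printf '%s\n' "$study_output" | sed -n 's/^Study created successfully: //p')
  if [ -z "$STUDY_ID" ]; then
    echo "no study ID found in the output above" >&2
    return 1
  fi
}
```

Usage: submit_study <local_path_to_your_study_metadata_file> && echo "$STUDY_ID"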
A dataset does not require any metadata, but if you want to provide some, use the --file option with the path to a file containing the dataset metadata.
Passing the study ID while creating a dataset
morphic-util submit --type dataset --study 664f1ef55a564312eb478177
Your unique dataset ID is shown in the response, and the dataset is automatically linked to the study:
Dataset created successfully: 664f1fd35a564312eb478179
Linking dataset 664f1fd35a564312eb478179 to study 664f1ef55a564312eb478177
Dataset linked successfully to study: 664f1ef55a564312eb478177
Interactively creating and linking a dataset to a study
morphic-util submit --type dataset
Dataset created successfully: 664f20265a564312eb47817b
Do you want to link this dataset to a study? (yes/no): yes
Input study id: 664f1ef55a564312eb478177
Linking dataset 664f20265a564312eb47817b to study 664f1ef55a564312eb478177
Dataset linked successfully to study: 664f1ef55a564312eb478177
- Note: creating a dataset also creates an upload area in Amazon S3, named after the dataset ID, for uploading your data files.
Next, select your upload area using the dataset ID you created in the previous step:
morphic-util select UPLOADAREANAME
For example:
$ morphic-util select 664f20265a564312eb47817b
Selected 664f20265a564312eb47817b
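Creating a dataset for an existing study and selecting its upload area can be chained. This sketch assumes the `Dataset created successfully: <id>` line appears on standard output as in the examples above; create_and_select_dataset and DATASET_ID are hypothetical names.

```shell
# Sketch: create a dataset linked to an existing study, then select the
# upload area named after the new dataset ID. Assumes the success line
# matches the example above: "Dataset created successfully: <id>".
create_and_select_dataset() {
  dataset_output=$(morphic-util submit --type dataset --study "$1") || return 1
  printf '%s\n' "$dataset_output"
  DATASET_ID=$(printf '%s\n' "$dataset_output" | sed -n 's/^Dataset created successfully: //p')
  if [ -z "$DATASET_ID" ]; then
    echo "no dataset ID found in the output above" >&2
    return 1
  fi
  morphic-util select "$DATASET_ID"
}
```

Usage: create_and_select_dataset 664f1ef55a564312eb478177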
You are now ready to upload files to your upload area!
Once your upload area is selected, use the upload command to upload the files related to your project. You can specify a path to a single file, a space-separated list of file paths, or a path to a directory. Sub-directories of a provided directory path are ignored.
To upload a single file or a space-separated list of files, specify the relative or absolute path to each file after the upload command. If file names contain spaces, they must be escaped or enclosed in quotes:
morphic-util upload /path/to/file/file1.txt "/path/to/file/file 2.txt"
This could look something like:
$ morphic-util upload /path/to/file/sample1_R1.fastq.gz /path/to/file/sample1_R2.fastq.gz
Uploading...
/path/to/file/sample1_R1.fastq.gz 2845965046 / 2845965046.0 (100.00%)
/path/to/file/sample1_R2.fastq.gz 2845965046 / 2845965046.0 (100.00%)
With file names containing spaces, quote or escape each name:
$ morphic-util upload /path/to/file/sample1_R1.fastq.gz "/path/to/file/dissociation protocol.pdf" /path/to/file/enrichment\ protocol.pdf
Uploading...
/path/to/file/sample1_R1.fastq.gz 2845965046 / 2845965046.0 (100.00%)
/path/to/file/dissociation protocol.pdf 354 / 354.0 (100.00%)
/path/to/file/enrichment protocol.pdf 354 / 354.0 (100.00%)
Successful upload.
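When many files share a naming pattern, letting the shell expand a glob avoids typing and quoting each name: every match becomes a separate argument, so names with spaces stay intact. The sketch below is a hypothetical helper, upload_matching, that collects the matching regular files in a directory and passes them to a single morphic-util upload call.

```shell
# Sketch: upload every regular file in DIR whose name matches PATTERN,
# passing each path as one argument so spaces in names are preserved.
upload_matching() (
  dir=$1 pattern=$2
  set --                               # reuse "$@" as the list of file paths
  for f in "$dir"/$pattern; do         # unquoted $pattern lets the glob expand
    [ -f "$f" ] && set -- "$@" "$f"
  done
  if [ "$#" -eq 0 ]; then
    echo "no files matching $pattern in $dir" >&2
    exit 1
  fi
  morphic-util upload "$@"
)
```

Usage: upload_matching /path/to/file '*.pdf'. Keep the pattern in quotes so the glob is expanded inside the function rather than in your current directory.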
To upload all files in a directory, specify the path to the directory, or use . to upload all files in your current working directory:
morphic-util upload .
This would look something like:
$ morphic-util upload .
Uploading...
sample1_R1.fastq.gz 2845965046 / 2845965046.0 (100.00%)
sample1_R2.fastq.gz 2845965046 / 2845965046.0 (100.00%)
sample2_R1.fastq.gz 2845965046 / 2845965046.0 (100.00%)
sample2_R2.fastq.gz 2845965046 / 2845965046.0 (100.00%)
dissociation protocol.pdf 354 / 354.0 (100.00%)
enrichment protocol.pdf 354 / 354.0 (100.00%)
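Because files in sub-directories are skipped, it can help to see in advance which files a directory upload would leave out. This is a sketch with a hypothetical helper name; it relies on find's -mindepth option, which GNU and BSD find support although it is not part of POSIX.

```shell
# Sketch: list files that `morphic-util upload DIR` would skip because they
# sit in sub-directories of DIR (only top-level files are uploaded).
files_skipped_by_upload() {
  find "${1:-.}" -mindepth 2 -type f | sort
}
```

Usage: files_skipped_by_upload . prints nothing if every file sits directly in the current directory.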
To check that all the files you expected to upload are present in your upload area, use the list command. This should look something like:
$ morphic-util list
664f20265a564312eb47817b/sample1_R1.fastq.gz
664f20265a564312eb47817b/sample1_R2.fastq.gz
664f20265a564312eb47817b/sample2_R1.fastq.gz
664f20265a564312eb47817b/sample2_R2.fastq.gz
664f20265a564312eb47817b/dissociation protocol.pdf
664f20265a564312eb47817b/enrichment protocol.pdf
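For larger uploads, the listing can be compared with your local directory. This sketch assumes, as in the example above, that each line printed by morphic-util list has the form `<upload area>/<file name>` with no extra header lines; missing_from_upload_area is a hypothetical name.

```shell
# Sketch: print top-level files in DIR whose names do not appear in the
# output of `morphic-util list`.
missing_from_upload_area() (
  dir=${1:-.}
  local_list=$(mktemp) remote_list=$(mktemp)
  # Top-level files only, since sub-directories are not uploaded
  find "$dir" -mindepth 1 -maxdepth 1 -type f -exec basename {} \; | sort > "$local_list"
  # Strip the "<upload area>/" prefix from each listed path
  morphic-util list | sed 's|^[^/]*/||' | sort > "$remote_list"
  comm -23 "$local_list" "$remote_list"   # lines only in the local list
  rm -f "$local_list" "$remote_list"
)
```

Usage: missing_from_upload_area /path/to/dir prints nothing when every top-level file is present in the upload area.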
By default the upload command won't upload files that have the same name as files already present in the upload area. If you do need to overwrite an uploaded file with a file of the same name, you will need to use the -o flag. For example:
$ morphic-util upload -o /path/to/file/sample1_R1.fastq.gz
Uploading...
/path/to/file/sample1_R1.fastq.gz 2845965046 / 2845965046.0 (100.00%)
Notes:
* To cancel an upload in progress, press Ctrl+C.
* Sub-directories within a provided folder are ignored, so make sure all files to be uploaded are directly within the provided path.
* File names containing spaces must be quoted in any command, for example morphic-util upload 'a file name' 'another file'.
If you have any issues with uploading files or using the morphic-util tool, or wish to discuss more options for transferring data, please contact the MorPhiC DRACC helpdesk at helpdesk@morphic.bio or on the dedicated Slack channel.
The tool is updated periodically to fix bugs and release new features. You can install the latest version with pip's --upgrade option:
$ pip3 install --upgrade --no-cache-dir morphic-util
Successfully installed morphic-util-0.0.2
We suggest using the --no-cache-dir option so that pip downloads a fresh copy instead of reusing an old package from its cache.
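To confirm which version is installed before or after upgrading, you can read the Version line that pip3 show prints for an installed package. installed_morphic_version is a hypothetical helper name used for this sketch.

```shell
# Sketch: print the installed morphic-util version from `pip3 show`.
# Prints nothing if the package is not installed.
installed_morphic_version() {
  pip3 show morphic-util 2>/dev/null | sed -n 's/^Version: //p'
}
```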