Installation

Docker

ProCogGraph is both a pipeline for the analysis of structures and a database of cognate ligand-domain mappings. The easiest way to get started, described below, is to run ProCogGraph in a Docker container. For bare-metal installation of the database, see the Neo4j section; for running the Nextflow pipeline, see the ProCogGraph Pipeline section. As part of the installation process, the latest flat files are downloaded from Zenodo. These files are currently 175.9 MB, so the download may take some time depending on your internet connection. The database is approximately 4 GB once built, so ensure you have sufficient disk space available before beginning the install.
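
Before starting, it may be worth confirming that the target disk has room for both the download and the built database. The following is a minimal sketch using only the Python standard library. The function name `has_enough_space` is illustrative and not part of ProCogGraph, and the sizes are the approximate figures quoted above.

```python
import shutil

# Approximate figures quoted in this guide; treat them as rough guides only.
DOWNLOAD_BYTES = round(175.9 * 1000**2)  # flat files download, ~175.9 MB
DATABASE_BYTES = 4 * 1000**3             # built database, ~4 GB


def has_enough_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    needed = DOWNLOAD_BYTES + DATABASE_BYTES
    if has_enough_space(".", needed):
        print("Enough free disk space for the download and the built database.")
    else:
        print(f"Less than {needed / 1000**3:.1f} GB free; free up space before installing.")
```

Run it from the directory where you intend to clone the repository, since the check applies to the filesystem containing the given path.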

NOTE: The NeoDash Docker image does not currently include a build for ARM-based Mac devices. There is an open issue in NeoDash related to this, and until the developers fix it, ProCogGraph cannot be set up via Docker on ARM-based Macs. On Mac, the Docker steps therefore work only on x86 devices. ProCogGraph can still be installed directly on ARM-based Macs by following the bare-metal steps in the Neo4j section and using the NeoDash web app hosted by Neo4j.
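
If you are unsure which kind of Mac you have, the check below may help. It is a small sketch built on Python's `platform` module. The function `docker_setup_supported` is a hypothetical helper, and it encodes only the limitation described in the note (ARM-based Macs). Other ARM systems are not covered by the note and are not flagged here.

```python
import platform


def docker_setup_supported(system: str, machine: str) -> bool:
    """Return False for ARM-based Macs, where the NeoDash image has no build."""
    is_mac = system == "Darwin"  # platform.system() reports "Darwin" on macOS
    is_arm = machine.lower() in {"arm64", "aarch64"}
    return not (is_mac and is_arm)


if __name__ == "__main__":
    if docker_setup_supported(platform.system(), platform.machine()):
        print("The Docker route should be usable on this machine.")
    else:
        print("ARM-based Mac detected: use the bare-metal Neo4j instructions instead.")
```

Note that a Python interpreter running under x86 emulation on an ARM Mac may report an x86 machine type, so treat the result as a hint rather than a guarantee.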

Download and install Docker from the Docker website

Clone the ProCogGraph repository:

```
git clone https://github.com/bashton-lab/ProCogGraph.git
cd ProCogGraph
```

On Linux or macOS, run the setup script to download the latest flat files and create the necessary directories and Docker Compose files:
```
./setup_docker_linux.sh
```
On Windows, run the following in PowerShell with administrator privileges:
```
Set-ExecutionPolicy Unrestricted
./setup_docker_windows.ps1
Set-ExecutionPolicy Restricted
```
This script creates the directories needed to set up the database, downloads the latest flat files from Zenodo, and produces two YAML files: one to build the database (run the first time only) and one to run the database (run each time you want to start it).
Run the build command:
```
docker compose -f compose-build.yml up
```
Run the database:
```
docker compose -f compose-run.yml up
```
Running the Docker Compose run file starts three containers: one for the Neo4j database, one for the NeoDash dashboard, and an Nginx server that serves the iframe visualisations available within the dashboard. Access the database by navigating to http://localhost:7474 in a web browser to open the Neo4j Browser, or connect to ProCogDash via localhost:5005.

Memory allocation for the Neo4j database can be specified in compose-run.yml and adjusted as necessary for your system. The setup script does not currently set these values, so Neo4j runs with the memory configured in Docker. To adjust them, add the following lines to the environment section of compose-run.yml:
```
  - NEO4J_server_memory_heap_initial__size=3600m
  - NEO4J_server_memory_heap_max__size=3600m
  - NEO4J_server_memory_pagecache_size=2g
  - NEO4J_server_jvm_additional=-XX:+ExitOnOutOfMemoryError
```
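
Once the containers are up, a quick way to confirm that the two ports mentioned above are accepting connections is a plain TCP check. This is a sketch only: `port_open` is an illustrative helper, and a successful connection shows that something is listening on the port, not that Neo4j or the dashboard has finished starting.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved host names.
        return False


if __name__ == "__main__":
    # Ports taken from this guide: 7474 for the Neo4j Browser, 5005 for ProCogDash.
    for name, port in [("Neo4j Browser", 7474), ("ProCogDash", 5005)]:
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port}: {state}")
```
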
Access the dashboard. The ProCogDash dashboard is built using NeoDash, a Neo4j plugin. With the database running in Docker, open the dashboard at localhost:5005. The dashboard requires a username and password, which default to neo4j and procoggraph respectively.

Neo4j

Installation instructions for running the database on bare metal, rather than Docker, are described below.

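Download the latest database flat files from Zenodo and clone the ProCogGraph repository:

```
git clone https://github.com/Bashton-Lab/ProCogGraph.git
# Quote the URL so the shell does not treat "?" as a glob character.
curl -L "https://zenodo.org/records/13165852/files/procoggraph_flat_files_v1-0.zip?download=1" -o /PATH/TO/DATABASE_FLAT_FILES/procoggraph_flat_files_v1-0.zip
# Extract next to the archive so the copy step below finds the files.
unzip /PATH/TO/DATABASE_FLAT_FILES/procoggraph_flat_files_v1-0.zip -d /PATH/TO/DATABASE_FLAT_FILES/
```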

Download and install Neo4j community edition from the Neo4j website. The database was built using Neo4j version 5.

Copy the build script from the repository to the Neo4j database directory (e.g. neo4j-5.4.0) and the database flat files to the import directory:

```
cp -r /PATH/TO/PROCOGGRAPH_REPOSITORY/nextflow/bin/import_neo4j_data.sh /PATH/TO/NEO4J_DATABASE/
cp -r /PATH/TO/DATABASE_FLAT_FILES/* /PATH/TO/NEO4J_DATABASE/import/
```

Run the build script:

```
cd /PATH/TO/NEO4J_DATABASE/
./import_neo4j_data.sh
```

Start the Neo4j database:
```
bin/neo4j start
```
Access the database by navigating to http://localhost:7474 in a web browser. Log in with the default credentials (username neo4j, password neo4j) and update the default password.
Access ProCogDash via NeoDash. To load the dashboard into NeoDash, expand the menu in the bottom left of the screen, click the + icon, and import the dashboard from a JSON file. Upload the file found in the repository at procogdash/dashboard.json.

ProCogGraph Pipeline

The ProCogGraph pipeline is built using Nextflow for workflow management. To run the pipeline, follow these steps:

The pipeline utilises data from a number of different sources to build the ProCogGraph database. To begin, prepare a data files directory with the following:

| File | Description | Source |
| --- | --- | --- |
| pdb_chain_enzyme.tsv.gz | Protein chain EC ID annotation from SIFTS for PDBe structures | SIFTS |
| assemblies_data.csv.gz | Assembly data for PDBe structures from PDBe-KB | PDBe-KB |
| enzclass.txt | Enzyme classification hierarchy | ExPASy |
| enzyme.dat | ENZYME database records | ExPASy |
| cath-names.txt | CATH domain names | CATH |
| cath-domain-description-file.txt | CATH domain descriptions | CATH |
| dir.des.scop.1_75.txt | SCOP domain descriptions | SCOP |
| dir.cla.scop.1_75.txt | SCOP domain classifications | SCOP |
| clan_membership.txt.gz | Pfam clan membership | InterPro |
| clan.txt.gz | Pfam clan descriptions | InterPro |
| interpro.xml.gz | InterPro domain annotations | InterPro |
| rhea-reaction-smiles.tsv | RHEA reaction SMILES strings | RHEA |
| rhea2ec.tsv | RHEA to EC number mappings | RHEA |
| rhea-directions.tsv | RHEA reaction directions | RHEA |
| chebi_names.tsv.gz | ChEBI names | ChEBI |
| relation.tsv | ChEBI relations | ChEBI |
| ChEBI_Results.tsv | ChEBI records with database cross-references to KEGG GLYCAN and KEGG COMPOUND, where a structure exists for the record, generated with the advanced search function | ChEBI |
| scop2-cla-latest.txt | SCOP2 domain classifications | EBI |
| scop2-des-latest.txt | SCOP2 domain descriptions | EBI |
| ccd.cif | Chemical Component Dictionary structures | CCD |
| pubchem_substance_id_mapping.txt | PubChem substance ID mappings from a PubChem search for the KEGG data source | PubChem |
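
With this many inputs, it is easy to miss one. The sketch below lists any files from the table that are absent from a data directory. The file names are copied from the table, while `missing_files` and the placeholder path are illustrative and not part of the ProCogGraph repository.

```python
from pathlib import Path

# File names copied from the table above.
REQUIRED_FILES = [
    "pdb_chain_enzyme.tsv.gz",
    "assemblies_data.csv.gz",
    "enzclass.txt",
    "enzyme.dat",
    "cath-names.txt",
    "cath-domain-description-file.txt",
    "dir.des.scop.1_75.txt",
    "dir.cla.scop.1_75.txt",
    "clan_membership.txt.gz",
    "clan.txt.gz",
    "interpro.xml.gz",
    "rhea-reaction-smiles.tsv",
    "rhea2ec.tsv",
    "rhea-directions.tsv",
    "chebi_names.tsv.gz",
    "relation.tsv",
    "ChEBI_Results.tsv",
    "scop2-cla-latest.txt",
    "scop2-des-latest.txt",
    "ccd.cif",
    "pubchem_substance_id_mapping.txt",
]


def missing_files(data_dir: str) -> list[str]:
    """Return the required file names that are absent from `data_dir`."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Placeholder path; replace with your own data directory.
    missing = missing_files("/PATH/TO/DATA_DIR")
    print("All data files present." if not missing else "Missing: " + ", ".join(missing))
```

The check only confirms that each file exists. It does not validate file contents or versions.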

Clone this repository and install dependencies:

```
git clone https://github.com/m-crown/ProCogGraph.git
cd ProCogGraph
conda env create -f nextflow/envs/environment.yml
```

Preprocess RHEA reaction files:

```
cd /PATH/TO/DATA_DIR/
python3 preprocess_rhea.py \
    --rhea_ec_mapping rhea2ec.tsv \
    --rhea_reaction_directions rhea-directions.tsv \
    --rd_dir rd/ \
    --outdir . \
    --chebi_names chebi_names.tsv.gz
```

Produce the final manifest file of structures to be processed:

```
python3 download_mmcif.py \
    --sifts_file /PATH/TO/DATA_DIR/pdb_chain_enzyme.tsv.gz \
    --assemblies_file /PATH/TO/DATA_DIR/assemblies_data.csv.gz \
    --chunk_size 100 \
    --output_dir /PATH/TO/STRUCTURES_DIR
```

Run the Nextflow pipeline:

To configure the Nextflow pipeline, modify the nextflow.config file within the repository. It includes a SLURM cluster profile called 'crick', which is specific to the development of the pipeline within the Bashton Group at Northumbria. The standard profile is designed for running the pipeline on a local machine and is configured by default with a large amount of memory and CPU resources. Adjust these to suit your machine before running.

Four additional parameters, specific to the user's environment, must be set:
- params.data_dir - the path to the data directory containing the data files described above.
- params.cache_in - the path to the cache directory for the pipeline, if the pipeline has been run previously.
- params.output_dir - the desired output directory.
- params.manifest - the path to the manifest file produced by download_mmcif.py above.
```
cd /PATH/TO/PROCOGGRAPH_REPOSITORY/nextflow
nextflow run main.nf -resume -profile standard
```
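
As an alternative to editing nextflow.config, Nextflow generally lets pipeline parameters be overridden on the command line as `--name value`. The sketch below assembles such a command for the four parameters listed above. It assumes the pipeline reads them only through `params.*`, which this guide does not confirm. `nextflow_command` and the placeholder paths for the cache and manifest are hypothetical.

```python
import shlex


def nextflow_command(params: dict[str, str], profile: str = "standard") -> list[str]:
    """Build a `nextflow run` argument list with pipeline params given as --name value."""
    cmd = ["nextflow", "run", "main.nf", "-resume", "-profile", profile]
    for name, value in params.items():
        # A double dash marks a pipeline parameter; single-dash options belong to Nextflow.
        cmd += [f"--{name}", value]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace each with the location on your system.
    example = {
        "data_dir": "/PATH/TO/DATA_DIR",
        "cache_in": "/PATH/TO/CACHE_DIR",
        "output_dir": "/PATH/TO/OUTPUT_DIR",
        "manifest": "/PATH/TO/MANIFEST_FILE",
    }
    print(shlex.join(nextflow_command(example)))
```

The printed command is meant to be run from the nextflow directory of the repository, as in the block above. Keeping the values in nextflow.config remains the route this guide describes.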
