A Comprehensive Protein-DNA Interface Generation Tool with Residue Propensity Map analysis

Welcome to the Protein<>DNA Interface Generation repository! 🚀


Table of Contents

  1. Introduction
  2. Repository Structure
  3. Workflow Stages
  4. Installation and Dependencies
  5. Usage
  6. Docker Usage
  7. Testing
  8. Contributing
  9. License

Introduction

This project offers a pipeline to:

  • Process multi-chain PDB files placed in the input/ folder.
  • Split them into chain-specific files in split_chain/.
  • Use Naccess to generate .asa, .rsa, and .int outputs for both the entire complex and individual chains in rsa/.
  • Produce final residue propensity maps and other interface analysis results in CSV format under interface/.

It leverages:

  • Python: Data parsing and scripting tasks.
  • Fortran: Performance-heavy computations.
  • Shell: Workflow automation.
  • Docker: Reproducible and consistent environment.
  • Snakemake: Automated workflow orchestration.

Repository Structure

Protein_DNA_Interface_Generation/
├── input/                  # Raw PDB or other input files
├── split_chain/            # Contains split chain PDB files
├── rsa/                    # Naccess outputs (.asa, .rsa, .int) for complex & chains
├── interface/              # Final residue propensity maps (CSV) & summary outputs
├── scripts/                # Python, Shell, Fortran scripts
├── docker/                 # Docker configuration and resources
├── Snakefile               # Main Snakemake workflow definition
└── README.md               # Project documentation (this file)

  • input/: Place your raw PDB files here to be processed.
  • split_chain/: Contains the individual chain-specific PDB files generated by the workflow.
  • rsa/: Holds the output from Naccess (e.g., .asa, .rsa, .int) run on both the entire complex and individual chains.
  • interface/: Stores final CSV results with residue-based interface metrics and any summary files.
  • scripts/: Key scripts for chain splitting, interface analysis, and more.
  • docker/: Docker setup to help package and run the entire pipeline in a container.
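
The chain-splitting step that fills split_chain/ can be pictured with a short Python sketch. This is not the repository's actual script: the function names and the <stem>_<chain>.pdb naming are assumptions made for illustration. It relies only on the PDB format convention that the chain ID sits in column 22 of ATOM and HETATM records.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def split_pdb_by_chain(pdb_text: str) -> Dict[str, str]:
    """Group ATOM/HETATM records by the chain ID found in column 22."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            # Column 22 is index 21; blank chain IDs are grouped under "_".
            chain_id = line[21].strip() or "_"
            chains[chain_id].append(line)
    return {cid: "\n".join(lines) + "\nEND\n" for cid, lines in chains.items()}


def write_chains(pdb_path: Path, out_dir: Path) -> List[Path]:
    """Write one file per chain as <stem>_<chain>.pdb (hypothetical naming)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for cid, text in split_pdb_by_chain(pdb_path.read_text()).items():
        target = out_dir / f"{pdb_path.stem}_{cid}.pdb"
        target.write_text(text)
        written.append(target)
    return written
```

Under these assumptions, calling write_chains(Path("input/complex.pdb"), Path("split_chain")) on a hypothetical input file would write one PDB file per chain into split_chain/.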

Workflow Stages

  1. Input Parsing
    • Reads .pdb files from input/.
  2. Chain Splitting
    • Splits each file by chain, outputting them to split_chain/.
  3. Naccess Runs
    • Computes accessible surface areas for the complex and for each individual chain; results go to rsa/.
  4. Interface Computation
    • Uses Naccess outputs to identify interface residues and compute relevant metrics.
  5. Results Aggregation
    • Final CSV files summarizing residue-based interface stats are written into interface/.
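
A common way to identify interface residues from accessible surface areas is to compare each residue's ASA in the isolated chain with its ASA in the complex: residues that lose surface area on complex formation are taken to be at the interface. The sketch below illustrates that idea only. The repository's scripts may use a different criterion, and the function name, the dictionary layout, and the 0.1 Å² threshold are all assumptions.

```python
from typing import Dict, List, Tuple

ResidueKey = Tuple[str, int]  # (chain ID, residue number)


def interface_residues(
    asa_isolated: Dict[ResidueKey, float],
    asa_complex: Dict[ResidueKey, float],
    threshold: float = 0.1,  # hypothetical cutoff in square angstroms
) -> List[Tuple[ResidueKey, float]]:
    """Return residues whose ASA drops by more than `threshold` in the complex."""
    buried = []
    for key, free_asa in asa_isolated.items():
        # A residue missing from the complex data is treated as unchanged.
        delta = free_asa - asa_complex.get(key, free_asa)
        if delta > threshold:
            buried.append((key, round(delta, 2)))
    return sorted(buried)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    isolated = {("A", 10): 120.0, ("A", 11): 45.5}
    in_complex = {("A", 10): 60.0, ("A", 11): 45.5}
    print(interface_residues(isolated, in_complex))
```

The per-residue ASA values would come from the Naccess outputs in rsa/; parsing those files is left out here so the sketch does not depend on file-format details.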

Installation and Dependencies

  1. Clone the Repository

    git clone https://github.com/mhtjsh/Protein_DNA_Interface_Generation.git
    cd Protein_DNA_Interface_Generation

  2. Install Dependencies

    • Python (3.7+ recommended)
    • Snakemake (install via pip or conda):
      pip install snakemake
    • Fortran Compiler (e.g., gfortran)
    • Shell (usually installed by default)
    • Docker (optional, but recommended for reproducible runs)
  3. Check Installation

    snakemake --version

    A valid Snakemake version (e.g., 7.x) should be displayed.


Usage

  1. Prepare Input

    • Place your raw .pdb files in input/.
  2. Run the Workflow

    snakemake --cores 1 --latency-wait 10

    This command runs every step: splitting PDB files, running Naccess, and generating interface results.

  3. Customization (Optional)

    • Modify or add rules in the Snakefile.
    • Update any scripts in scripts/ to customize the pipeline.

Common Snakemake Options

  • Dry Run

    snakemake --cores 1 --dry-run

    Shows the planned jobs without executing them.

  • Force All Steps

    snakemake --cores 1 --latency-wait 10 --forceall

    Re-runs every rule, even when its outputs already exist.

  • Workflow DAG

    snakemake --dag | dot -Tpng > dag.png

    Exports a directed acyclic graph (DAG) of the workflow.


Docker Usage

We provide a ready-to-use Docker image to facilitate a reproducible environment. Below are instructions to pull or build the image, and run the pipeline inside a container.

Pull the Pre-built Image (Recommended)

docker pull mhtjsh/protein-dna-interface

Build the Image Yourself (Optional)

git clone https://github.com/mhtjsh/Protein_DNA_Interface_Generation.git
cd Protein_DNA_Interface_Generation
docker build -t mhtjsh/protein-dna-interface .

Running the Container

Basic Run Using Example Data

docker run --rm -it mhtjsh/protein-dna-interface

Mounting Input and Output Folders

To process your own input PDB files and retrieve outputs on your host system:

docker run --rm -it \
  -v /home/mhtjsh/Protein_DNA_Interface_Generation/input:/app/input \
  -v /home/mhtjsh/Protein_DNA_Interface_Generation/output:/app/output \
  mhtjsh/protein-dna-interface

  • Ensure your local input/ directory has .pdb files before running.
  • Results will be placed in output/.
  • Adjust the local path (/home/mhtjsh/Protein_DNA_Interface_Generation) to your actual directory if needed.

Testing

  1. Manual Testing

    • Place a test PDB file in input/.
    • Run Snakemake or the Docker container, verifying outputs in split_chain/, rsa/, and interface/.
  2. Automated Testing

    • Create minimal test data and a test rule in the Snakefile or a CI configuration (e.g., GitHub Actions).
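
As a starting point for an automated check, the sketch below reports which of the pipeline's output folders are missing or empty after a run. It is only an illustration: the function name is hypothetical, and it checks nothing beyond the directory names listed in this README, because the exact output file names are not documented here.

```python
from pathlib import Path
from typing import List

# Output folders named in this README.
EXPECTED_DIRS = ("split_chain", "rsa", "interface")


def empty_output_dirs(root) -> List[str]:
    """Return the expected output folders under `root` that are missing or hold no files."""
    problems = []
    for name in EXPECTED_DIRS:
        folder = Path(root) / name
        if not folder.is_dir() or not any(p.is_file() for p in folder.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    missing = empty_output_dirs(".")
    if missing:
        raise SystemExit("No output files found in: " + ", ".join(missing))
    print("All expected output folders contain files.")
```

Run from the repository root after the workflow finishes, the script exits with an error message if any of the three folders has no files, which makes it usable as a final step in a CI job.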

Contributing

All contributions are welcome! To contribute:

  1. Fork this repository.
  2. Create a new feature branch.
  3. Submit a pull request with your changes.

License

This project is distributed under an open-source license (e.g., MIT). See LICENSE for details.