🧬 Virtual Screening Data Pipeline

A computational drug discovery pipeline that automates the retrieval, processing, and analysis of chemical potential drug candidates. This project compares Natural Substrates vs. Synthetic Inhibitors to analyze binding affinity trends.

🚀 Key Features

Automated Data Retrieval: Fetches real-time chemical data (SMILES, Molecular Weight) from the PubChem PUG REST API.
Cheminformatics: Calculates key molecular descriptors like LogP (Lipophilicity) using RDKit.
Database Management: Stores structured data in a local SQLite database (results.db).
Virtual Screening Simulation: Simulates docking scores to model binding affinity.
Visualization: Generates a professional dashboard comparing Natural vs. Synthetic compounds.

🛠️ Tech Stack

Python 3
RDKit: For molecular descriptor calculation.
Pandas & Matplotlib: For data analysis and visualization.
SQLite: For lightweight relational database storage.
PubChem API: For chemical data sourcing.

📦 Installation

Clone the repository

git clone https://github.com/yourusername/virtual-screening-pipeline.git
cd virtual-screening-pipeline

Install dependencies
```
pip install -r requirements.txt
```

⚡ Usage

1. Run the Pipeline

Execute the main script to fetch data, calculate properties, and populate the database.

python3 main.py

Output: Creates results.db and exports natvssynt.csv.

2. Generate the Dashboard

Create the visualization suite to analyze the results.

python3 visualization.py

Output: Generates dashboard.png.

📊 Data Analysis

The pipeline analyzes two distinct groups:

Natural Substrates (e.g., Folic Acid, Dihydrofolate)
Synthetic Inhibitors (e.g., Methotrexate, Pemetrexed)

Hypothesis: Synthetic inhibitors are designed to bind more tightly (lower score) than natural substrates. Check the "Avg Score by Type" graph to verify!

📝 SQL Exploration

Want to query the database directly?

-- Find the top 3 strongest binders
SELECT name, score FROM screening_results ORDER BY score ASC LIMIT 3;

(See sql_queries.md for more examples)

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
assets		assets
data		data
output		output
script		script
.DS_Store		.DS_Store
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧬 Virtual Screening Data Pipeline

🚀 Key Features

🛠️ Tech Stack

📦 Installation

⚡ Usage

1. Run the Pipeline

2. Generate the Dashboard

📊 Data Analysis

📝 SQL Exploration

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🧬 Virtual Screening Data Pipeline

🚀 Key Features

🛠️ Tech Stack

📦 Installation

⚡ Usage

1. Run the Pipeline

2. Generate the Dashboard

📊 Data Analysis

📝 SQL Exploration

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages