A next-generation ESG (Environmental, Social, Governance) analysis platform that leverages cutting-edge Retrieval-Augmented Generation (RAG) technology to revolutionize sustainability reporting and compliance analysis. Built for the 2025 regulatory landscape including CSRD, EU Taxonomy, and other evolving ESG frameworks.
| Framework | Coverage | Status |
|---|---|---|
| CSRD | Complete ESRS standards | ✅ Full |
| GRI | Universal Standards 2021 | ✅ Full |
| SASB | Industry standards | ✅ Full |
| TCFD | 2023 framework | ✅ Full |
| EU Taxonomy | Environmental objectives | ✅ Full |
| SEC Climate | Proposed rules | ✅ Full |
- Environmental: Climate, pollution, water, biodiversity, circular economy
- Social: Workforce, value chain, communities, consumers
- Governance: Business conduct, risk management, ethics
- Hierarchical Retrieval: Multi-stage document retrieval with re-ranking
- Query Decomposition: Break complex ESG questions into sub-queries
- Hybrid Search: Combines semantic and keyword search for optimal results
- Dynamic Context: Adaptive context window based on query complexity
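The hybrid search idea above can be sketched as a weighted blend of a dense (semantic) similarity and a sparse (keyword) match score. This is an illustrative sketch only, not the platform's implementation — the `hybrid_score` helper, the toy similarity functions, and the 0.7 default weight are assumptions:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two dense vectors (the semantic signal)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, doc):
    """Fraction of query terms found in the document (the sparse signal)."""
    q = Counter(query.lower().split())
    d = set(doc.lower().split())
    hits = sum(c for t, c in q.items() if t in d)
    return hits / sum(q.values())

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.7):
    """Blend dense and sparse relevance; alpha weights the semantic part."""
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * keyword_score(query, doc)
```

In practice the dense vectors come from the embedding model and the sparse score from BM25 or similar; the blend lets exact regulatory terms (e.g. "ESRS E1") rank well even when embeddings are fuzzy.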
- Framework Compliance: CSRD, GRI, SASB, TCFD, EU Taxonomy, SEC Climate
- Gap Analysis: Automated identification of compliance gaps and risks
- Best Practices: AI-powered recommendations based on industry standards
- Double Materiality: Support for impact and financial materiality assessment
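Double materiality scores each topic from two directions — the company's impact on people and environment, and the topic's effect on enterprise value. A minimal sketch of that data structure (the class, field names, and 0–5 scale are illustrative assumptions, not the platform's schema):

```python
from dataclasses import dataclass

@dataclass
class MaterialityAssessment:
    """Double materiality: impact (inside-out) and financial (outside-in), scored 0-5."""
    topic: str
    impact_score: float     # company's effect on people and environment
    financial_score: float  # topic's effect on enterprise value

    def is_material(self, threshold: float = 3.0) -> bool:
        # Under double materiality, either perspective alone makes a topic material.
        return self.impact_score >= threshold or self.financial_score >= threshold

climate = MaterialityAssessment("Climate change", impact_score=4.5, financial_score=2.0)
```

Note the `or`: a topic with high impact but low financial relevance (or vice versa) still must be reported under CSRD.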
- Microservices Architecture: Scalable, containerized services
- Multiple Vector Stores: ChromaDB, Pinecone, Qdrant support
- Comprehensive Monitoring: LangSmith integration with structured logging
- Production Security: API authentication, rate limiting, data encryption
- RESTful API: Comprehensive API with OpenAPI documentation
- Streaming Responses: Real-time response generation
- Batch Processing: Efficient bulk document processing
- Interactive UI: Modern Streamlit interface with analytics dashboard
┌─────────────────────────────────────────────────────────────┐
│ ESG Analysis Platform │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Frontend UI │ FastAPI Core │ Background Services │
│ │ │ │
│ • Streamlit UI │ • RAG Service │ • Document Processing │
│ • REST API │ • Query Engine │ • Vector Indexing │
│ • Admin Panel │ • Auth & Rate │ • Monitoring & Logging │
│ │ Limiting │ │
├─────────────────┼─────────────────┼─────────────────────────┤
│ Vector Store │ LLM Services │ Data Storage │
│ │ │ │
│ • ChromaDB │ • OpenAI GPT-4 │ • PostgreSQL │
│ • Pinecone │ • Anthropic │ • Redis Cache │
│ • Qdrant │ • Claude │ • File Storage │
└─────────────────┴─────────────────┴─────────────────────────┘
# Clone the repository
git clone https://github.com/mominalix/ESG-Analysis-Using-Retrieval-Augmented-Generation-Engine.git
cd ESG-Analysis-Using-Retrieval-Augmented-Generation-Engine
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys and configuration
# Start all services
docker-compose up -d
# Access the application
# API: http://localhost:8000
# UI: http://localhost:8501
# Grafana: http://localhost:3000 (admin/admin)

# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Set environment variables
cp .env.example .env
# Edit .env with your configuration
# Run the API server
uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000
# Run the Streamlit UI (optional)
streamlit run ui/main.py --server.port 8501

ESG-Analysis-Platform/
├── src/ # Source code
│ ├── api/ # FastAPI application
│ │ ├── main.py # Main API application
│ │ ├── models.py # Pydantic models
│ │ └── routers/ # API route modules
│ ├── core/ # Core utilities
│ │ ├── config.py # Configuration management
│ │ ├── logging.py # Structured logging
│ │ └── exceptions.py # Custom exceptions
│ └── services/ # Business logic services
│ ├── llm_service.py # LLM provider management
│ ├── vector_store_service.py # Vector database management
│ ├── document_service.py # Document processing
│ └── rag_service.py # Advanced RAG implementation
├── sample_data/ # Sample ESG documents and frameworks
│ ├── frameworks/ # ESG framework documents
│ ├── policies/ # Sample company policies
│ └── reports/ # Example ESG reports
├── ui/ # Streamlit user interface
├── monitoring/ # Monitoring and observability
├── requirements.txt # Python dependencies
├── docker-compose.yml # Container orchestration
├── Dockerfile # Main application container
└── README.md # This file
Key configuration options in .env:
# LLM Configuration
OPENAI_API_KEY="your-openai-api-key"
DEFAULT_MODEL="gpt-4o-mini"
TEMPERATURE=0.1
# Vector Database
VECTOR_DB_PROVIDER="chroma" # chroma, pinecone, qdrant
CHROMA_PERSIST_DIRECTORY="./data/vector_db"
# Document Processing
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
MAX_DOCUMENT_SIZE_MB=50
# Monitoring
LANGSMITH_API_KEY="your-langsmith-key" # Optional
ENABLE_TRACING=true

ChromaDB (default):
- Best for: Local development, small to medium datasets
- Setup: No additional configuration required
- Storage: Local file system

Pinecone:
- Best for: Production, large-scale deployments
- Setup: Requires Pinecone API key and index creation
- Storage: Cloud-managed

Qdrant:
- Best for: Self-hosted production, hybrid deployments
- Setup: Included in docker-compose
- Storage: Self-managed
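The CHUNK_SIZE and CHUNK_OVERLAP settings in the configuration above control how documents are split before embedding. A minimal character-based sketch of overlapping chunking (real pipelines usually split on tokens or sentence boundaries; this helper is illustrative only):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split text into fixed-size chunks whose tails overlap, mirroring
    CHUNK_SIZE / CHUNK_OVERLAP. Overlap preserves context across boundaries
    so a sentence cut at a chunk edge still appears whole in a neighbor."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults, consecutive chunks share their last/first 200 characters, which costs some index space but noticeably improves retrieval of passages that straddle chunk boundaries.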
POST /api/v1/query
{
"question": "What are CSRD climate disclosure requirements?",
"esg_framework": "CSRD",
"search_strategy": "hybrid",
"k": 5,
"use_query_decomposition": false
}

POST /api/v1/upload
Content-Type: multipart/form-data
- file: <ESG document>
- esg_framework: "CSRD"
- document_type: "policy"

POST /api/v1/documents/search
{
"query": "climate risk assessment",
"esg_framework": "TCFD",
"k": 10
}

GET /health

When running in development mode, access interactive API documentation at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
# Upload CSRD framework document
curl -X POST "http://localhost:8000/api/v1/upload" \
-H "Content-Type: multipart/form-data" \
-F "file=@sample_data/frameworks/csrd/csrd_overview.md" \
-F "esg_framework=CSRD" \
-F "document_type=standard"
# Batch upload company policies
curl -X POST "http://localhost:8000/api/v1/documents/batch-upload" \
-H "Content-Type: multipart/form-data" \
-F "files=@sample_data/policies/environmental_management_policy.md" \
-F "esg_framework=Multi-Framework" \
-F "document_type=policy"curl -X POST "http://localhost:8000/api/v1/query" \
-H "Content-Type: application/json" \
-d '{
"question": "What are the key climate disclosure requirements under CSRD?",
"esg_framework": "CSRD",
"search_strategy": "hybrid",
"k": 5
}'

curl -X POST "http://localhost:8000/api/v1/query" \
-H "Content-Type: application/json" \
-d '{
"question": "How do our environmental policies compare against CSRD requirements and what gaps exist in our climate risk reporting?",
"esg_framework": "CSRD",
"use_query_decomposition": true,
"k": 10
}'

curl -X POST "http://localhost:8000/api/v1/query/stream" \
-H "Content-Type: application/json" \
-d '{
"question": "Provide a comprehensive analysis of TCFD climate risk disclosure requirements",
"esg_framework": "TCFD"
}'

- Structured Logging: JSON-formatted logs with correlation IDs
- Performance Metrics: Response times, confidence scores, retrieval accuracy
- Health Checks: Service health monitoring and dependency checks
- Error Tracking: Comprehensive error logging and alerting
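The `/api/v1/query/stream` endpoint shown earlier delivers the answer incrementally. Assuming server-sent-events framing on the wire (an assumption — confirm against the API docs), a minimal client-side parser:

```python
def parse_sse(lines):
    """Collect `data:` payloads from a server-sent-events stream, yielding one
    concatenated message per blank-line-delimited event. The `data:` framing
    is an assumption about the stream format, not a documented contract."""
    buf = []
    for raw in lines:
        line = raw.decode() if isinstance(raw, bytes) else raw
        if line.startswith("data:"):
            buf.append(line[5:].strip())
        elif line == "" and buf:
            yield "\n".join(buf)
            buf = []
    if buf:
        yield "\n".join(buf)
```

With `requests`, feed it from a streaming response: `for event in parse_sse(requests.post(url, json=payload, stream=True).iter_lines()): ...`.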
Configure LangSmith for advanced tracing:
export LANGSMITH_API_KEY="your-langsmith-key"
export LANGSMITH_PROJECT="esg-analysis-platform"
export ENABLE_TRACING=true

Access monitoring dashboards at http://localhost:3000 (admin/admin):
- API Performance Metrics
- Document Processing Statistics
- Query Analysis Trends
- System Resource Usage
- API Keys: Use environment variables or secret management
- Authentication: Implement proper JWT or OAuth2
- Rate Limiting: Configure appropriate limits for your use case
- HTTPS: Enable TLS encryption for all external communications
- Data Privacy: Ensure compliance with GDPR and other regulations
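Rate limiting from the checklist above is often a simple token bucket per API key. An in-process sketch (illustrative only — production deployments typically enforce limits at the gateway or with a shared Redis counter so limits hold across instances):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens/second,
    allowing bursts up to `capacity` requests."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

One bucket per API key (e.g. in a dict keyed by the key hash) gives each client an independent allowance.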
add_header X-Content-Type-Options nosniff;
add_header X-Frame-Options DENY;
add_header X-XSS-Protection "1; mode=block";
add_header Strict-Transport-Security "max-age=31536000";

import requests
# Initialize ESG Analysis client
class ESGAnalysisClient:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url

    def query(self, question, framework=None):
        response = requests.post(
            f"{self.base_url}/api/v1/query",
            json={
                "question": question,
                "esg_framework": framework,
            },
        )
        response.raise_for_status()
        return response.json()

# Usage
client = ESGAnalysisClient()
result = client.query(
    "What are the mandatory climate disclosures?",
    framework="CSRD",
)

# Webhook endpoint for document processing
@app.post("/webhooks/document-processed")
async def handle_document_processed(payload: dict):
    # Handle completed document processing
    document_id = payload["document_id"]
    status = payload["status"]
    if status == "completed":
        # Trigger downstream processes
        await trigger_compliance_analysis(document_id)

- API Scaling: Multiple FastAPI instances behind load balancer
- Vector Database: Distributed vector stores (Pinecone, Weaviate clusters)
- Background Processing: Celery workers for document processing
- Caching: Redis clusters for response caching
- Chunking Strategy: Optimize chunk size and overlap for your use case
- Embedding Caching: Cache embeddings to reduce computation
- Query Caching: Cache frequent query results
- Async Processing: Leverage async/await for I/O operations
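Query caching from the list above can be sketched as a TTL-bounded memoizer. A minimal in-process version (real deployments would back this with the Redis cache mentioned earlier, so entries are shared across API instances):

```python
import time

def ttl_cache(ttl_seconds: float):
    """Memoize results for ttl_seconds, so repeated identical ESG queries
    skip retrieval and generation until the entry expires."""
    def decorator(fn):
        store = {}
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and now - hit[1] < ttl_seconds:
                return hit[0]        # fresh cached value
            value = fn(*args)
            store[args] = (value, now)
            return value
        return wrapper
    return decorator
```

A short TTL (minutes, not days) keeps answers responsive to newly indexed documents while still absorbing bursts of repeated questions.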
# Install development dependencies
pip install -r requirements.txt
pip install -e .
# Run tests
pytest tests/
# Code formatting
black src/
isort src/
# Type checking
mypy src/

Built with ❤️ for a sustainable future