```python
class AIEngineer:
    def __init__(self):
        self.name = "Shashibhushan Singh"
        self.role = "AI Engineer — LLM + Backend"
        self.location = "India 🇮🇳"
        self.experience = ["AWS Cloud Engineer @ Amazon", "AI/LLM Engineer"]
        self.focus = ["LLM Applications", "RAG Systems", "AI Agents", "Backend APIs"]
        self.currently = "Building production-grade LLM backends"
        self.learning = ["Fine-tuning (LoRA/QLoRA)", "Multi-agent systems", "MLOps"]
        self.ask_me = ["LangChain", "FastAPI", "Vector DBs", "AWS", "Prompt Engineering"]
        self.contact = "itshashi.io@gmail.com"

    def __str__(self):
        return "From cloud infrastructure to intelligent AI systems 🚀"
```

Designed, deployed, and maintained scalable cloud infrastructure on AWS
- Architected multi-region, highly available systems on EC2, ECS, Lambda, and EKS
- Built automated CI/CD pipelines using CodePipeline, CodeBuild, and GitHub Actions
- Managed data infrastructure with S3, RDS, DynamoDB, and Redshift
- Implemented infrastructure-as-code with Terraform and AWS CloudFormation
- Set up observability stacks using CloudWatch, X-Ray, and Grafana
- Enforced security best practices: IAM roles, VPC design, KMS, GuardDuty
- Reduced infrastructure costs by optimizing Reserved Instances and Auto Scaling policies
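The cost-optimization point above can be made concrete with a tiny back-of-the-envelope helper. The hourly rates in the example are illustrative only, not official AWS pricing:

```python
def reserved_savings(on_demand_hourly: float, reserved_hourly: float,
                     hours_per_month: int = 730) -> dict:
    """Compare the monthly cost of an always-on on-demand instance vs. a
    Reserved Instance, and report the percentage saved."""
    on_demand = on_demand_hourly * hours_per_month
    reserved = reserved_hourly * hours_per_month
    return {
        "on_demand_monthly": round(on_demand, 2),
        "reserved_monthly": round(reserved, 2),
        "savings_pct": round(100 * (on_demand - reserved) / on_demand, 1),
    }

# Illustrative (made-up) rates for a single instance:
print(reserved_savings(on_demand_hourly=0.096, reserved_hourly=0.060))
```

The same arithmetic, applied fleet-wide, is what makes Reserved Instance planning worth automating.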
Building production AI systems powered by large language models
- Designing RAG pipelines over enterprise document corpora with retrieval + reranking
- Building LLM agents with tool use, persistent memory, and multi-step reasoning
- Shipping async FastAPI backends with streaming (SSE/WebSocket) for real-time AI
- Integrating OpenAI, Anthropic Claude, and open-source models (Mistral, LLaMA)
- Implementing LLM observability with LangSmith and Langfuse
- Exploring LoRA/QLoRA fine-tuning for domain-specific model adaptation
📄 RAG Pipelines → Ingest docs · Chunk · Embed · Retrieve · Rerank · Answer
🤖 AI Agents → Tool use · ReAct loops · Multi-agent orchestration
⚡ LLM APIs → FastAPI · Streaming (SSE) · Async · Auth · Rate limiting
☁️ Cloud Infrastructure → AWS multi-region · IaC · Auto-scaling · Cost optimization
🧪 Eval Systems → RAGAS · LLM-as-judge · Regression testing
🔐 Safe AI → Guardrails · Prompt injection defense · Output validation
📦 Fine-tuning → LoRA · QLoRA · PEFT · Instruction datasets
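The RAG flow above (chunk → embed → retrieve) fits in a few lines once you stub the embedding step. The bag-of-words "embedding" below is a toy stand-in; a real pipeline would call an embedding model and a vector store such as Qdrant:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real pipelines use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and return the top-k for the LLM."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Qdrant is a vector database for similarity search.",
    "FastAPI is an async Python web framework.",
    "Reranking reorders retrieved chunks before answering.",
]
print(retrieve("which vector database to use", chunks, k=1))
```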
End-to-end RAG pipeline over enterprise documents
- PDF/HTML ingestion → semantic chunking → Qdrant vector store
- Hybrid search (BM25 + dense) + Cohere reranking
- Streaming FastAPI backend with LangSmith tracing
- Deployed on AWS ECS + S3 with CloudWatch monitoring
FastAPI · LangChain · Qdrant · OpenAI · Docker · AWS
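Hybrid search needs a way to merge the BM25 and dense rankings into one list. Reciprocal Rank Fusion is one common choice, shown here as an illustration (not necessarily the exact fusion this project uses):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked lists (e.g. BM25 and dense
    retrieval) by scoring each doc as the sum of 1 / (k + rank) across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]   # lexical hits
dense_ranking = ["doc_b", "doc_c", "doc_a"]  # semantic hits
print(rrf([bm25_ranking, dense_ranking]))
```

A doc that ranks decently in both lists beats one that tops only a single list, which is exactly the behavior you want before handing candidates to a reranker.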
Multi-tool LLM agent with persistent memory
- ReAct loop with web search, code execution, and DB query tools
- Long-term memory via Redis + embeddings
- Streamed responses over WebSocket
- Serverless deployment on AWS Lambda + API Gateway
LangGraph · FastAPI · Redis · OpenAI · PostgreSQL · AWS
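The ReAct loop at the heart of an agent like this is small: the model alternates Thought → Action → Observation until it emits a final answer. In the sketch below `fake_llm` is a stub standing in for a real model call, and the calculator is a toy tool:

```python
def calculator(expr: str) -> str:
    # Toy tool; never eval untrusted input in production.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def fake_llm(transcript: str) -> str:
    """Stub model: requests the calculator once, then answers."""
    if "Observation:" not in transcript:
        return "Action: calculator[2 + 3]"
    return "Final Answer: 5"

def react(question: str, max_steps: int = 4) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[arg]", run the tool, feed the result back in.
        tool, arg = step.removeprefix("Action: ").rstrip("]").split("[", 1)
        observation = TOOLS[tool](arg)
        transcript += f"\n{step}\nObservation: {observation}"
    return "gave up"

print(react("What is 2 + 3?"))
```

Swapping `fake_llm` for a real model call and adding search/DB tools gives the structure the project describes; LangGraph formalizes the same loop as a state graph.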
Unified API gateway across OpenAI, Anthropic, and open-source models
- Smart routing by cost, latency, and capability
- Semantic caching to cut costs by ~40%
- Per-user rate limiting and API key management
- Infra managed via Terraform on AWS ECS + ElastiCache
LiteLLM · FastAPI · Redis · Docker · Terraform · AWS
AWS cost analysis and auto-optimization tool (from Amazon experience)
- Automated detection of idle EC2, oversized RDS, and unused Elastic IPs
- Scheduled Lambda jobs to snapshot and right-size instances
- Savings report dashboard via CloudWatch + QuickSight
Python · AWS Lambda · Boto3 · CloudFormation · CloudWatch
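The idle-detection step boils down to a threshold over sampled CPU utilization. The sketch below shows only that classification logic; the real tool would first pull the datapoints from CloudWatch via Boto3, and the instance IDs and numbers here are made up:

```python
def find_idle_instances(cpu_stats: dict[str, list[float]],
                        threshold_pct: float = 5.0) -> list[str]:
    """Flag instances whose average CPU over the sampled window stays below threshold."""
    idle = []
    for instance_id, datapoints in cpu_stats.items():
        if datapoints and sum(datapoints) / len(datapoints) < threshold_pct:
            idle.append(instance_id)
    return idle

# Hypothetical sampled CPU averages (percent) over the lookback window:
stats = {
    "i-0abc": [1.2, 0.8, 2.1],     # idle candidate
    "i-0def": [45.0, 60.2, 38.7],  # busy, leave alone
}
print(find_idle_instances(stats))
```

Flagged instances then feed the scheduled Lambda jobs that snapshot and right-size them.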
✅ AWS Cloud Engineer @ Amazon
└─ EC2 · ECS · Lambda · EKS · S3 · RDS · DynamoDB
└─ Terraform · CloudFormation · IAM · VPC · CloudWatch
✅ LLM Foundations
└─ Transformers · Attention · Tokenization · Embeddings
✅ LLM APIs & Prompt Engineering
└─ OpenAI · Anthropic · Few-shot · CoT · Structured outputs
✅ RAG Systems
└─ Chunking · Vector DBs · LangChain · LlamaIndex · Reranking
✅ FastAPI AI Backends
└─ Async · Streaming · WebSockets · Pydantic · Auth
🔄 Fine-tuning (LoRA / QLoRA) ← currently here
🔄 Multi-agent systems (LangGraph, CrewAI)
⬜ LLM inference optimization (vLLM, TensorRT-LLM)
⬜ MLOps at scale (BentoML, Kubeflow)
⬜ Multimodal models (vision + language)
I'm open to collaborating on LLM projects, AI tooling, cloud architecture, and backend systems.
Building something interesting? Let's talk.
"Cloud gives you the scale. AI gives you the intelligence. Together — that's the future."


