Hi, I'm Sunil

Senior Backend & Data Engineer | AWS Certified Data Engineer | Ex-Fivetran

I bring ~5 years of industry experience in data engineering and backend systems, designing and operating scalable, cloud-native, and high-performance data platforms used in production by enterprise customers. I recently completed an MS in Computer Science at the University of Illinois Chicago, where I specialised in data engineering, cloud computing, distributed systems, and big data technologies, building directly on my industry background.

I am an AWS Certified Data Engineer – Associate with strong hands-on experience building secure, cost-efficient, and high-throughput data pipelines using services such as Amazon S3, Glue, EMR, Kinesis, Redshift, Athena, and DynamoDB. My expertise spans ETL/ELT pipelines, data lake and lakehouse architectures, data governance, and real-time analytics at scale.

Previously, I worked as a Senior Software Engineer at Fivetran, collaborating in a startup environment across multiple teams within the data pipeline platform and contributing to both source connectors and destination writers. My work spanned API-based connectors as well as database connectors such as DynamoDB and MongoDB, focusing on scalability, correctness, and performance. I led the design of a high-performance DynamoDB incremental sync engine, achieving 15× faster syncs, and implemented MongoDB Change Streams–based CDC incremental syncing, delivering a 5× performance improvement. On the destination side, I worked on data warehouse writers, including BigQuery and Snowflake.

Beyond individual connectors, I authored and designed a reliability framework that was adopted across 10+ engineering teams, improving consistency and fault tolerance across the data pipeline platform. I also won multiple internal hackathons, delivering features focused on product improvements, developer productivity, and platform innovation. In addition, I mentored and onboarded interns through a structured training program and regularly participated in technical interviews for engineering roles.

I enjoy working on high-impact data infrastructure problems, building systems that are scalable, reliable, and cost-efficient from day one. I am currently open to full-time roles in data engineering, backend systems, and cloud infrastructure, particularly in fast-moving, product-focused teams.

Core Strengths

  • Data Engineering & Pipelines: Designing and implementing scalable ETL/ELT pipelines, schema evolution, data modeling, connector development, orchestration, and real-time data processing.
  • Distributed Systems & Processing: Apache Spark, Hadoop MapReduce, Apache Flink, Kafka, AWS Kinesis, and gRPC for processing large-scale datasets in both batch and real-time systems.
  • Cloud Platforms & Services:
    • AWS: EC2, Lambda, S3, EMR, Glue, Step Functions, RDS, DynamoDB, Redshift, Athena, EventBridge, IAM, KMS, CloudWatch, CloudTrail.
    • GCP: BigQuery, Compute Engine, Pub/Sub, Cloud Storage.
    • Azure: Azure VM, Azure Blob Storage.
  • Databases & Warehousing: Experience with modern data warehouses and databases like Snowflake, BigQuery, Redshift, DynamoDB, MySQL, PostgreSQL, MongoDB, and SQL Server.
  • Programming & Backend Development: Java (advanced), SQL (advanced), Scala, Python, C++, and shell scripting, with expertise in backend architecture, REST APIs, and service frameworks.
  • DevOps & Infrastructure: Skilled in Docker, Kubernetes, Terraform, GitHub Actions, and CI/CD, using New Relic and CloudWatch for observability and SonarQube for code quality in production systems.
  • Data Quality & Governance: Ensuring data integrity and reliability through validation frameworks, governance practices, and monitoring across the data lifecycle.
  • Collaboration & Leadership: Onboarded interns with structured training, led design efforts, and contributed to engineering hiring processes.

Certification

AWS Certified Data Engineer – Associate

Verified on Credly
Demonstrates ability to design, build, secure, and maintain data analytics solutions on AWS that are efficient, scalable, and cost-optimized. Proficient in:

  • Data lake and lakehouse architecture
  • Real-time and batch data ingestion
  • Data transformation using Glue, EMR
  • Querying with Athena, Redshift
  • Secure access via IAM, encryption, and governance

Education

University of Illinois Chicago

Master of Science in Computer Science | Aug 2024 – Dec 2025

RV College of Engineering, Bengaluru

Bachelor of Engineering in Computer Science | Aug 2016 – May 2020

Experience

Senior Software Engineer

Fivetran · Bengaluru, India
Mar 2023 – Aug 2024 · 1 yr 6 mos

  • Redesigned and developed a new BigQuery data writer aligned with SQL-based writers, eliminating 90% of maintenance overhead.
  • Enhanced Warehouse Data Writer throughput by 30% by implementing multithreaded concurrent processing for split files.
  • Added support for JSON data types in BigQuery, ensuring seamless schema evolution and data compatibility.
  • Introduced partitioning and clustering in BigQuery writer to reduce customer costs by ~90% — a hackathon-winning optimization.
  • Led infrastructure improvements across distributed data pipelines and contributed to system-level performance gains.
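The multithreaded split-file loading mentioned above can be sketched roughly as below. This is an illustrative sketch only, not Fivetran's actual writer code: `load_chunk` is a hypothetical stand-in for uploading one split file to the warehouse.

```python
from concurrent.futures import ThreadPoolExecutor

def load_chunk(chunk):
    """Hypothetical stand-in for uploading one split file.

    A real writer would stream the chunk to BigQuery/Snowflake; here it
    just counts rows so the sketch stays self-contained.
    """
    return len(chunk)

def load_concurrently(chunks, max_workers=4):
    """Load split files concurrently instead of sequentially.

    I/O-bound uploads overlap, so wall-clock time approaches the
    slowest chunk rather than the sum of all chunks.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_chunk, chunks))

# Example: three "split files" of rows
chunks = [["r1", "r2"], ["r3"], ["r4", "r5", "r6"]]
print(load_concurrently(chunks))  # [2, 1, 3]
```

`pool.map` preserves input order, so results line up with the original split files even though uploads finish out of order.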

Software Engineer 2

Fivetran · Bengaluru, India
Sep 2021 – Mar 2023 · 1 yr 7 mos

  • Engineered a high-performance DynamoDB connector with 15× speedup in incremental syncs.
  • Improved MongoDB connector using Change Streams to achieve 5× faster data ingestion with reduced latency.
  • Designed support for Azure CosmosDB for MongoDB API, expanding Fivetran’s connector catalog.
  • Built Data Preview functionality using the IES framework to simplify customer onboarding and demo experiences.
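The Change Streams–based CDC sync above boils down to replaying a stream of change events against a replica. A minimal sketch, assuming simplified event dicts shaped like MongoDB change-stream documents (real connectors also handle resume tokens, `updateDescription` deltas, retries, and schema changes):

```python
def apply_change(replica, event):
    """Apply one simplified change-stream event to a local replica.

    `event` mimics the shape of MongoDB change-stream documents
    ({"operationType", "documentKey", "fullDocument"}); this sketch
    assumes fullDocument is always present for non-delete ops.
    """
    key = event["documentKey"]["_id"]
    op = event["operationType"]
    if op in ("insert", "update", "replace"):
        replica[key] = event["fullDocument"]
    elif op == "delete":
        replica.pop(key, None)
    return replica

replica = {}
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "a"}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "b"}},
    {"operationType": "delete", "documentKey": {"_id": 1}},
]
for e in events:
    apply_change(replica, e)
print(replica)  # {} -- the insert and update were superseded by the delete
```

Because each event is applied idempotently by `_id`, the replica converges to the source state no matter how the events are batched between syncs.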

Software Engineer

Fivetran · Bengaluru, India
Jun 2020 – Aug 2021 · 1 yr 3 mos

  • Authored Isolated Endpoint Sync (IES) — a hackathon-winning framework now adopted by 500+ connectors and 10+ teams.
  • Built a public Shopify connector app with OAuth-based merchant onboarding, GraphQL extraction, and failover capabilities.
  • Enhanced Stripe connector with multithreading and connected accounts support for scale and fault-tolerance.
  • Developed an ETL connector for ADP REST APIs with complete ERD-based schema documentation.

Software Engineering Intern

Fivetran · Bengaluru, India
Jan 2020 – May 2020 · 5 mos

  • Built webhook-based incremental sync mechanism for Recharge connector, achieving a 10× increase in extract performance.
  • Benchmarked performance of full ETL pipelines using Snowflake, delivering optimization insights for production rollouts.
  • Contributed to multiple API-based connectors and gained hands-on experience with Fivetran’s connector lifecycle.
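Incremental syncs like the Recharge one above typically persist a cursor between runs so each sync pulls only new records. A hedged sketch with hypothetical names (`incremental_sync`, `fake_fetch`) and an in-memory stand-in for the source API:

```python
def incremental_sync(fetch_page, state):
    """Pull only records newer than the stored cursor.

    `fetch_page(cursor)` is a stand-in for a webhook- or API-backed
    source returning (records, new_cursor). The connector persists
    state["cursor"] between syncs so each run resumes where the
    previous one stopped.
    """
    records, new_cursor = fetch_page(state.get("cursor"))
    state["cursor"] = new_cursor
    return records

# In-memory stand-in for the source API
DATA = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]

def fake_fetch(cursor):
    cursor = cursor or 0
    fresh = [r for r in DATA if r["updated_at"] > cursor]
    new_cursor = max([r["updated_at"] for r in fresh], default=cursor)
    return fresh, new_cursor

state = {}
first = incremental_sync(fake_fetch, state)   # both records on the initial sync
second = incremental_sync(fake_fetch, state)  # nothing new the second time
print(len(first), len(second))  # 2 0
```

The speedup in the bullet above comes from exactly this shape: after the first full pull, every subsequent sync touches only the delta.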

Projects

Mini Database Management System Internals Implementation

GitHub: View Project
Implemented core database engine internals including page-based storage, buffer management, record storage, and B+ tree indexing as part of an academic DBMS project.
Tech: C, Storage Manager, Buffer Manager (FIFO/LRU), Record Manager, B+ Tree, Valgrind
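The LRU buffer manager at the heart of this project can be sketched as follows (in Python rather than the project's C, purely for illustration; pinning, dirty-page write-back, and concurrency are omitted):

```python
from collections import OrderedDict

class LRUBufferPool:
    """Tiny sketch of an LRU page-replacement buffer pool.

    Pages are fetched from `read_page` on a miss; when the pool is
    full, the least-recently-used frame is evicted.
    """
    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page   # page_id -> page contents
        self.frames = OrderedDict()  # page_id -> contents, in LRU order

    def fetch(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # hit: mark recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)   # miss on full pool: evict LRU
        self.frames[page_id] = self.read_page(page_id)
        return self.frames[page_id]

pool = LRUBufferPool(2, read_page=lambda pid: f"page-{pid}")
pool.fetch(1); pool.fetch(2); pool.fetch(1)
pool.fetch(3)  # pool is full, so page 2 (least recently used) is evicted
print(list(pool.frames))  # [1, 3]
```

An OrderedDict gives O(1) hit, insert, and evict; a FIFO policy (also in the project) would simply drop the `move_to_end` call on hits.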


AWS vs GCP Data Pipeline Benchmarking

GitHub: View Project
Benchmarks real-time data pipelines on AWS and GCP using a common IoT workload. Evaluates performance, cost, and sustainability.
Tech: AWS Kinesis, GCP Pub/Sub, Lambda, Dataflow, Python


Visual Analytics and Interactive Dashboards for LinkedIn Postings

GitHub: View Project

Developed an interactive visual analytics platform analyzing 124K+ LinkedIn job postings to uncover trends in skill demand, salaries, geography, experience levels, and remote work. Built reproducible data pipelines in Python and designed linked dashboards using Altair/Vega-Lite, including geospatial salary maps, skill–salary–industry views, and embedding-based job similarity exploration using PCA and UMAP.
Tech: Python, Pandas, Altair, Vega-Lite, PCA, UMAP, Jupyter, GitHub Pages


AWS Bedrock LLM Conversation API with Ollama

GitHub: View Project
Built a cloud-native conversational API using AWS Bedrock and Ollama for multi-turn LLM-based dialogue.
Tech: Scala, Akka HTTP, gRPC, AWS Lambda, Docker


Distributed Neural Network Training & Sentence Generation

GitHub: View Project
Built a Spark-based deep learning pipeline to train and generate text using DL4J and AWS EMR.
Tech: Scala, Apache Spark, DL4J, AWS EMR


Social-Aware Movie Revenue Prediction

GitHub: View Project
A machine learning pipeline that predicts movie box office revenue by combining traditional metadata (e.g., budget, genre, cast) with sentiment and emotion signals extracted from Reddit and YouTube.
Tech: Python, scikit-learn, NLP, Reddit & YouTube API, Data Visualization, EDA


Hadoop-based LLM Tokenization & Embeddings

GitHub: View Project
Created a distributed NLP pipeline using custom tokenizers and Hadoop MapReduce to generate text embeddings.
Tech: Scala, Hadoop, AWS EMR


Help Session Activity Management System

GitHub: View Project
Designed the backend data model for scheduling and managing academic help sessions between TAs and students.
Tech: SQL, Database Design, ER Diagram
