
Spark Kindling Framework

Version: 0.2.1 | Platforms: Microsoft Fabric, Azure Synapse Analytics, Databricks

Overview

Spark Kindling Framework is a comprehensive solution for building robust data pipelines on Apache Spark, designed for cross-platform solutions with notebook-first or Python-first development. It provides a declarative, dependency-injection-driven approach to defining and executing data transformations while maintaining strong governance and robust observability.

Key Capabilities

  • Multi-Platform Support - Unified API across Fabric, Synapse, and Databricks
  • Data Apps & Job Deployment - Package and deploy apps as Spark jobs
  • Hierarchical Configuration - Platform, workspace, and environment-specific configs
  • Extensibility - Plugin system for custom telemetry and integrations
  • Enterprise Observability - Built-in logging, tracing, and Azure Monitor integration

Documentation

  • Core Features
  • Advanced Features
  • Platform & Development
  • Extensions

Core Modules

The framework consists of several modular components:

  • Dependency Injection Engine - Provides an IoC container for loose coupling
  • Data Entities - Entity registry and storage abstraction
  • Data Pipes - Transformation pipeline definition and execution
  • Data Apps - Package and deploy apps as .kda archives
  • Job Deployment - Deploy apps as Spark jobs across platforms
  • Configuration System - Hierarchical YAML configuration (platform/workspace/environment)
  • Platform Services - Unified abstraction for Fabric/Synapse/Databricks
  • Watermarking - Change tracking for incremental processing (see the upsert sketch after this list)
  • File Ingestion - File pattern discovery and loading
  • Stage Processing - Orchestration of multi-stage pipelines
  • Common Transforms - Reusable data transformation utilities
  • Logging & Tracing - Comprehensive observability features
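
For context on the Watermarking entry above: incremental processing typically lands changed rows as an upsert keyed on merge columns. A minimal sketch with the open Delta Lake API (not Kindling's internals; the table path, source DataFrame, and key column are assumptions):

# Plain Delta Lake upsert keyed on a merge column.
# Path, updates_df, and column name are illustrative; assumes a
# SparkSession `spark` with Delta Lake configured.
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "/lake/bronze/customers_raw")

(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()     # update rows whose key already exists
    .whenNotMatchedInsertAll()  # insert brand-new rows
    .execute()
)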

Quickstart

# Import Kindling framework
from kindling.data_entities import *
from kindling.data_pipes import *
from kindling.injection import get_kindling_service

from pyspark.sql.functions import col
from pyspark.sql.types import StringType, StructField, StructType

# Example schema for the raw entity (adjust to match your source data)
customer_schema = StructType([
    StructField("customer_id", StringType(), nullable=False),
    StructField("country", StringType(), nullable=True),
])

# Define data entity
@DataEntities.entity(
    entityid="customers.raw",
    name="Raw Customer Data",
    partition_columns=["country"],
    merge_columns=["customer_id"],
    tags={"domain": "customer", "layer": "bronze"},
    schema=customer_schema
)

# Create data transformation pipe
@DataPipes.pipe(
    pipeid="customers.transform",
    name="Transform Customers",
    tags={"layer": "silver"},
    input_entity_ids=["customers.raw"],
    output_entity_id="customers.silver",
    output_type="table"
)
def transform_customers(customers_raw):
    # Transformation logic here, e.g. keep rows with a known country
    return customers_raw.filter(col("country").isNotNull())

# Execute pipeline
executor = get_kindling_service(DataPipesExecution)
executor.run_datapipes(["customers.transform"])

Key Features

Core Framework

  • Declarative definitions for data entities and transformations
  • Dependency injection for loose coupling and testability (sketch after this list)
  • Delta Lake integration for reliable storage and time travel
  • Watermarking for change tracking and incremental loads
  • Pluggable providers for storage, paths, and execution strategies
  • Observability through logging and tracing
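
Kindling's DI builds on the inject library (listed under Dependencies). As a minimal sketch of why that aids testability, using inject directly rather than Kindling's own binding helpers, and with hypothetical service names:

# Rebinding a service to a fake for a test, via the inject library.
# StorageProvider / FakeStorageProvider are hypothetical names.
import inject

class StorageProvider:
    def read(self, path: str) -> list:
        raise NotImplementedError

class FakeStorageProvider(StorageProvider):
    def read(self, path: str) -> list:
        return ["row-1", "row-2"]  # canned data for the test

# Bind the interface to the fake for this test run.
inject.clear_and_configure(
    lambda binder: binder.bind(StorageProvider, FakeStorageProvider())
)

provider = inject.instance(StorageProvider)
assert provider.read("/any/path") == ["row-1", "row-2"]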

Multi-Platform Support

  • Unified API across Microsoft Fabric, Azure Synapse, and Databricks
  • Platform abstraction - write once, deploy anywhere
  • Platform-specific optimizations via configuration
  • Smart platform detection - automatic platform identification (illustrative sketch below)
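
The platform SDKs listed under Dependencies are auto-detected. As an illustration of the idea only (not the framework's actual detection code; module names come from the Dependencies section below):

# Illustrative platform probe, not Kindling's real detection logic.
import importlib.util

def detect_platform() -> str:
    # Fabric and Synapse expose importable utility modules.
    if importlib.util.find_spec("notebookutils"):
        return "fabric"  # also present on Synapse; real detection is finer-grained
    if importlib.util.find_spec("mssparkutils"):
        return "synapse"
    # Databricks injects dbutils into the notebook's global scope.
    if "dbutils" in globals():
        return "databricks"
    return "unknown"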

Data Apps & Deployment

  • Package apps as .kda archives (Kindling Data Apps) - see the packaging sketch below
  • Deploy as Spark jobs - automated job creation and execution
  • Cross-platform deployment - same app runs on all platforms
  • Artifact management - wheels, configs, and dependencies
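
As a rough sketch of what packaging involves, assuming a .kda is a zip-style archive that bundles wheels, configs, and dependencies (the actual .kda layout is defined by the Data Apps module, and the paths here are made up):

# Rough sketch: bundling app artifacts into a zip-style archive.
# The real .kda layout is defined by Kindling; paths are illustrative.
import zipfile
from pathlib import Path

def package_app(app_dir: str, out_path: str) -> None:
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as kda:
        for artifact in Path(app_dir).rglob("*"):
            if artifact.is_file():  # wheels, YAML configs, entry points, ...
                kda.write(artifact, artifact.relative_to(app_dir))

package_app("./my_app", "./dist/my_app.kda")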

Configuration System

  • Hierarchical YAML configs - settings, platform, workspace, environment layers (example after this list)
  • Auto-detection - platform and workspace ID discovery
  • Flexible overrides - bootstrap config for runtime changes
  • Multi-team support - workspace-specific configurations
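
Kindling's configuration builds on dynaconf (listed under Dependencies). A hedged sketch of layered YAML merging using dynaconf directly; the file names and setting key are illustrative, not the framework's required layout:

# Sketch of layered config merging with dynaconf.
# File names are illustrative; Kindling defines its own layer discovery.
from dynaconf import Dynaconf

settings = Dynaconf(
    settings_files=[
        "settings.yaml",     # base settings
        "platform.yaml",     # platform-specific overrides
        "workspace.yaml",    # workspace-specific overrides
        "environment.yaml",  # environment-specific overrides (highest wins)
    ],
    merge_enabled=True,      # later files merge over earlier ones
)

print(settings.get("lake_root"))  # resolved from the highest layer that sets it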

Extensibility

  • Extension system - load custom packages via configuration
  • Azure Monitor integration - via kindling-otel-azure extension
  • Custom providers - implement your own storage backends
  • Signal/event system - blinker-based pub/sub for custom workflows (sketch below)
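
The pub/sub layer builds on blinker (listed under Dependencies). A minimal sketch of the pattern; the signal name and payload are hypothetical, not Kindling's actual signal catalog:

# Minimal blinker pub/sub sketch; signal name and payload are
# hypothetical, not Kindling's actual signal catalog.
from blinker import signal

pipe_completed = signal("pipe-completed")

@pipe_completed.connect
def on_pipe_completed(sender, **kwargs):
    print(f"{sender} finished, rows={kwargs.get('row_count')}")

# A publisher fires the signal when work is done.
pipe_completed.send("customers.transform", row_count=42)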

Requirements

  • Python 3.10+
  • Apache Spark 3.4+
  • One of: Microsoft Fabric, Azure Synapse Analytics, or Databricks

Dependencies

This framework builds upon several excellent open source projects:

  • Apache Spark - Unified analytics engine for large-scale data processing (Apache 2.0)
  • Delta Lake - Storage framework for reliable data lakes (Apache 2.0)
  • inject - Python dependency injection framework (Apache 2.0)
  • blinker - Python signal/event framework for pub/sub (MIT)
  • dynaconf - Configuration management for Python (MIT)
  • pytest - Testing framework (MIT)

Platform SDKs (auto-detected, only one required):

  • notebookutils / mssparkutils - Microsoft Fabric & Synapse (MIT)
  • dbutils - Databricks utilities (Databricks)

Extensions (optional):

  • azure-monitor-opentelemetry - Azure Monitor integration (MIT)
  • opentelemetry-api/sdk - OpenTelemetry tracing (Apache 2.0)

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Support

This is open source software provided without warranty or guaranteed support.

Commercial Support: Professional support, training, and consulting services are available from Software Engineering Professionals, Inc. Contact us via www.sep.com.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Developed By

Software Engineering Professionals, Inc.
16080 Westfield Blvd.
Carmel, IN 46033
www.sep.com

Acknowledgments

This framework was developed to solve real-world data processing challenges encountered across multiple enterprise engagements. We're grateful to our clients who have helped shape the requirements and validate the approach.


Note: This framework is maintained by SEP and used across multiple projects. If you're using this framework and encounter issues or have suggestions, please open an issue or submit a pull request.
