Unleashing the Power of Open Source: Databricks Transforms Data Engineering with a Declarative Pipeline Framework

Published: 13 Jun 2025
Databricks is entering new territory, open-sourcing its flagship ETL framework to bring faster pipeline development to the entire Apache Spark community.

Redefining the boundaries of data engineering, Databricks has made a bold move by open-sourcing its core declarative ETL (Extract, Transform, Load) framework. The technology underpinning Delta Live Tables (DLT), now renamed Apache Spark Declarative Pipelines, will soon be available to the entire Apache Spark community. The move underscores the company's commitment to openness, while intensifying its rivalry with Snowflake, which recently launched its own Openflow service for data integration.

Databricks' Declarative Pipelines aim to simplify data engineering by tackling three primary pain points: complex pipeline authoring, manual operational overhead, and the need to maintain separate systems for batch and streaming workloads. With Spark Declarative Pipelines, engineers declare what their pipeline should do using SQL or Python, and Apache Spark manages the execution. The framework cuts inefficiencies by automatically tracking dependencies between tables and handling operational tasks such as parallel execution.
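The declarative idea can be illustrated with a toy sketch in plain Python: each function declares a table and the upstream tables it reads from, and a tiny planner derives the execution order automatically. The names here (`table`, `run_pipeline`, the example tables) are hypothetical and for illustration only, not the actual Spark Declarative Pipelines API.

```python
# Conceptual sketch of declarative pipelines: definitions say *what* each
# table is and what it reads from; the runner works out *how* and *when*
# to build them. Not the real Spark Declarative Pipelines API.
from graphlib import TopologicalSorter

registry = {}  # table name -> (builder function, upstream table names)

def table(*reads):
    """Register a table definition along with its declared inputs."""
    def decorator(fn):
        registry[fn.__name__] = (fn, reads)
        return fn
    return decorator

@table()
def raw_orders():
    # Source table: in a real pipeline this would read from storage.
    return [{"order_id": 1, "amount": 120}, {"order_id": 2, "amount": 80}]

@table("raw_orders")
def large_orders(raw_orders):
    # Derived table: its dependency on raw_orders is declared, not scripted.
    return [row for row in raw_orders if row["amount"] > 100]

def run_pipeline():
    """Resolve declared dependencies and materialize tables in a valid order."""
    graph = {name: set(deps) for name, (_, deps) in registry.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = registry[name]
        results[name] = fn(*(results[d] for d in deps))
    return results

tables = run_pipeline()
print(tables["large_orders"])  # → [{'order_id': 1, 'amount': 120}]
```

The point of the sketch is the inversion of control: the author never writes "run raw_orders before large_orders"; the ordering falls out of the declared dependencies, which is the same property that lets the real framework parallelize independent tables.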

While the framework is only now being contributed to the Spark codebase, it has already been proven in production by thousands of enterprises. Built on the robust Spark Structured Streaming engine, Declarative Pipelines let teams tune pipelines to their latency requirements, making the framework well suited to a wide range of data engineering tasks, from daily batch reporting to real-time streaming applications.