
Databricks Serverless: Simplifying Compute in the Lakehouse

Overview

When working with data engineering, analytics, or machine learning workloads, infrastructure management often takes up more time than it should. Enter Databricks Serverless, a fully managed, on-demand compute environment designed to reduce overhead and allow you to focus on what really matters—working with data.

Whether you're running Serverless SQL Warehouses or the newly introduced Serverless Jobs (in preview), these compute options automatically scale with your workloads, minimising idle costs and making scaling infrastructure effortless.

Key Features of Databricks Serverless:

  • Automatic Scaling: Dynamically adjusts resources to meet the demands of your workloads.
  • Fully Managed: Say goodbye to configuring, scaling, or maintaining clusters.
  • Cost-Efficient: Only pay for what you use—perfect for intermittent or unpredictable workloads.

Why Use Serverless? Identifying the Best Fit

Serverless compute is ideal for workloads that are intermittent, require rapid scaling, or benefit from low operational overhead. Here are a few scenarios where Serverless shines:

  1. Ad-Hoc Analytics
    Running sporadic SQL queries? Serverless SQL Warehouses are ideal. They spin up quickly (typically under 10 seconds) and terminate when idle, ensuring cost-efficiency. In contrast, provisioned clusters can take several minutes to start and require manual management, which can lead to higher costs from idle resources.
  2. Scheduled ETL Pipelines
    Automate your ETL processes with Serverless Jobs, which allocate resources just for the duration of the task and shut down automatically when the job completes.
  3. Interactive Exploration
    Need a quick session to analyse a dataset? Serverless Compute for Notebooks provides resources on-demand, so you can dive into data exploration without worrying about cluster setup.
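As a concrete sketch of the ad-hoc pattern, the snippet below builds a request for the Databricks SQL Statement Execution API (`/api/2.0/sql/statements`) against a Serverless SQL Warehouse, using only the Python standard library. The hostname, warehouse ID, and token are placeholders for your own workspace; the request is built but not sent here:

```python
import json
import urllib.request

# Sketch: running an ad-hoc query on a Serverless SQL Warehouse via the
# SQL Statement Execution API. Host, warehouse ID, and token below are
# placeholders, not real values.

def build_statement_request(host, warehouse_id, token, statement):
    """Build (but do not send) the HTTP request for a SQL statement."""
    payload = json.dumps({
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",  # block up to 30s waiting for the result
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_statement_request(
    "adb-1234567890123456.7.azuredatabricks.net",  # placeholder workspace
    "abc123def456",                                 # placeholder warehouse ID
    "<personal-access-token>",
    "SELECT current_date() AS today",
)
# Sending it would be: urllib.request.urlopen(req)  (requires a live workspace)
```

Because the warehouse is serverless, the same request works whether or not compute is currently running; the first statement after idle simply absorbs the short spin-up.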

What About Networking? Meet NCC

One of the standout features of Serverless compute is its support for Network Connectivity Configurations (NCC), which allow you to connect securely to your Azure resources. NCC ensures private, managed connectivity between Databricks Serverless compute and your data sources, such as Azure Data Lake Storage (ADLS).

How NCC Works

  • NCC leverages Azure Private Link to create managed private endpoints.
  • This ensures that all communication between Serverless compute and your Azure resources happens securely, without exposing traffic to the public internet.
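In practice, Private Link means the storage endpoint's FQDN resolves to a private IP from serverless compute. A quick, hedged sanity check you could run from a notebook (the storage account name is a hypothetical placeholder) is to resolve the hostname and test whether the address is private:

```python
import ipaddress
import socket

# Sketch: checking that a storage endpoint resolves to a private address,
# which is what you'd expect once NCC/Private Link is in place.

def resolves_privately(hostname: str) -> bool:
    """True if the hostname resolves to a private or loopback address."""
    ip = socket.gethostbyname(hostname)
    # .is_private covers the RFC 1918 ranges as well as loopback
    return ipaddress.ip_address(ip).is_private

# From serverless compute you would check something like:
# resolves_privately("mystorageacct.dfs.core.windows.net")  # expect True with NCC
print(resolves_privately("127.0.0.1"))  # loopback counts as private -> True
```

If the FQDN still resolves to a public IP, traffic is not flowing over the managed private endpoint and the NCC configuration needs revisiting.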

Limitations to Consider

  • No On-Prem Connectivity: NCC cannot directly connect to on-premises systems. If this is a requirement, consider using tools like Azure Data Factory for data replication.
  • Azure-Native Only: NCC is tailored for Azure services (ADLS, Blob Storage, Cosmos DB). Connecting to non-Azure services will require alternative solutions.

Pro Tip: If your workflows depend on on-premises data sources, explore hybrid solutions like Azure Arc to bridge the gap.

Managing Dependencies Without Init-Scripts

A common question we hear is: What happens to init-scripts in a Serverless world? In traditional clusters, init-scripts allow you to pre-configure environments. Serverless removes this level of access, but there are some great alternatives:

  1. Notebook-Scoped Libraries
    Use %pip install to dynamically install Python libraries directly within your notebooks. 

    For example: %pip install /dbfs/FileStore/wheels/my-library-0.1-py3-none-any.whl
  2. Jobs API
    For tasks outside of notebooks, define library dependencies dynamically in your job configuration.

```json
{
  "libraries": [
    { "pypi": { "package": "pandas" } },
    { "whl": "dbfs:/path/to/my-library.whl" }
  ]
}
```

  3. Centralized Management with Unity Catalog
    For larger teams, manage shared resources and dependencies centrally using Unity Catalog, ensuring consistency and governance.
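For option 2, the sketch below shows how a `libraries` block like the one above might sit inside a full Jobs API 2.1 job definition. The job name, notebook path, and wheel path are hypothetical placeholders, and newer serverless jobs may manage Python dependencies via environment specs instead, so treat this as illustrative rather than definitive:

```python
import json

# Sketch: a Jobs API 2.1 job-create payload that attaches libraries at
# submission time instead of via init-scripts. All names and paths are
# placeholders for illustration.
job_spec = {
    "name": "serverless-etl-example",          # hypothetical job name
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {
                "notebook_path": "/Repos/team/my_etl"  # hypothetical path
            },
            "libraries": [
                {"pypi": {"package": "pandas"}},
                {"whl": "dbfs:/path/to/my-library.whl"},
            ],
        }
    ],
}
print(json.dumps(job_spec, indent=2))
```

The payload would then be POSTed to `/api/2.1/jobs/create` with a workspace token, exactly as with any other Jobs API call.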

Hydr8 and Serverless

At Advancing Analytics, we’ve been exploring how Serverless fits into our Hydr8 framework. While Serverless simplifies infrastructure management, transitioning from traditional clusters requires some adjustments:

  • Replacing Init-Scripts: We now use notebook-scoped libraries and Jobs API for managing dependencies.
  • Adapting to Shared Clusters: Serverless shifts from single-use clusters to shared environments. Using Unity Catalog has been key to ensuring secure, multi-user access.
  • On-Prem Data: For on-prem data sources, we've explored solutions like Azure Data Factory and hybrid architectures to bring data into Azure for seamless processing.

Are There Drawbacks?

While Serverless compute is fantastic for many use cases, it’s not a one-size-fits-all solution. Here are a few limitations to keep in mind:

  • Logging and Monitoring: Spark logs and the Spark UI are not available.
  • External Data Ingestion: Because serverless compute does not support JAR file installation, you cannot use a JDBC or ODBC driver to ingest data from an external data source.
  • Limited Customization: No direct access to cluster-level configurations like Spark settings or init-scripts.
  • Higher Costs for Continuous Workloads: For always-on tasks, provisioned clusters might be more cost-effective.
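The cost trade-off in the last bullet can be made concrete with a toy model. The rates below are illustrative placeholders, not real Databricks pricing: serverless bills only for active time, while a provisioned cluster bills for the whole time it stays up:

```python
# Illustrative break-even sketch (rates are made-up placeholders, NOT
# real Databricks pricing). Serverless pays a higher rate but only for
# active time; provisioned pays a lower rate for the full hour.
serverless_rate = 0.70   # $/hour while active, placeholder
provisioned_rate = 0.40  # $/hour always-on, placeholder

def hourly_cost(active_fraction):
    """Cost per wall-clock hour for each model at a given utilisation."""
    serverless = serverless_rate * active_fraction  # pay only for use
    provisioned = provisioned_rate * 1.0            # pay for the whole hour
    return serverless, provisioned

for frac in (0.1, 0.5, 0.9):
    s, p = hourly_cost(frac)
    winner = "serverless" if s < p else "provisioned"
    print(f"active {frac:.0%}: serverless ${s:.2f} vs provisioned ${p:.2f} -> {winner}")
```

The crossover point depends entirely on your real rates and utilisation, but the shape of the comparison holds: the more continuously a workload runs, the more an always-on cluster closes the gap.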

Wrapping Up

Databricks Serverless is an exciting step toward simplifying data engineering and analytics workflows. By reducing the need for manual infrastructure management, it allows teams to focus on building value, not managing compute.

At Advancing Analytics, we see Serverless as a game-changer for dynamic, on-demand workloads. Whether you’re optimising ETL pipelines, enabling ad-hoc analytics, or scaling exploratory analysis, Serverless offers flexibility, efficiency, and cost savings.

Ready to streamline your data engineering and analytics? Visit our Analytics page for more info or contact us today to learn how we can help you leverage data analytics for your business.


Author

Mo Uddin