In the fast-paced world of data engineering, efficiency and coherence are paramount. Databricks Asset Bundles (DABs) have emerged as a powerful tool to streamline the deployment and management of Databricks projects. However, to truly harness their potential, teams must adopt best practices that allow developers to build and deploy in parallel without stepping on each other's work. In this blog, we'll explore the top strategies for leveraging Databricks Asset Bundles effectively, ensuring your team stays on track and your projects remain scalable and maintainable.
Databricks Asset Bundles (DABs) are a structured way to package and deploy Databricks resources, including notebooks, workflows, and configurations. They enable teams to version-control their projects, automate deployments, and maintain consistency across environments. Think of them as a blueprint for your Databricks projects, ensuring that every piece of the puzzle fits together seamlessly.
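To make that blueprint concrete, here's a rough sketch of a bundle's entry point, the databricks.yml file at the root of the repository (the project name and folder layout are assumptions, not requirements):

```yaml
# databricks.yml — the blueprint for the project
bundle:
  name: my_project

include:
  # Jobs, pipelines, and model definitions live in separate YAML files
  - resources/*.yml
```

Every resource the bundle references is version-controlled alongside this file, which is what keeps deployments consistent across environments.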
Forget long, complex CI/CD pipelines: deploy project code by running one command. Simply run `databricks bundle deploy` to deploy notebooks, pipelines, workflows, and ML models to Databricks.
A key design pattern in data engineering projects is to use at least two environments, development and production (with perhaps an extra staging/test/UAT). By default, when you create a DAB using the CLI's `databricks bundle init` command, a target called ‘dev’ is created that looks like so:
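A sketch of the generated target (the workspace host is a placeholder for your own):

```yaml
targets:
  dev:
    # Created by `databricks bundle init` by default
    mode: development
    default: true
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net
```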
The issue here is that we may expect the target named ‘dev’ to correspond to the development branch and host the code for our development release. By default, however, this is not the case.
The "mode:development" in DABs prepends resource names with a dev prefix including the developers name, allowing deployments of the same code for each team member. This is not what we want to happen for the code corresponding to the development branch.
In contrast, a “mode:production” enforces stricter validations and typically uses service principals for deployments, so there is one code deployment per target. Developers cannot directly publish to these targets.
A way to overcome this is to define a “user” target, which acts as a pre-development environment. This target should be marked as `mode: development` in the YAML, with all other targets set to `mode: production`. This means we can deploy our development code to mirror the development branch in source control through CI/CD. I've seen this confuse many people; just think of `mode: production` as deploying through a service principal. So instead of the above, you should have something that looks more like:
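A minimal sketch, assuming an Azure Databricks workspace and service principals for the shared targets (hosts, paths, and service-principal IDs are placeholders):

```yaml
targets:
  user:
    # Personal sandbox: resources are prefixed with the developer's name
    mode: development
    default: true
    workspace:
      host: https://adb-1111111111111111.1.azuredatabricks.net

  dev:
    # Mirrors the development branch; deployed by CI/CD as a service principal
    mode: production
    workspace:
      host: https://adb-1111111111111111.1.azuredatabricks.net
      root_path: /Shared/.bundle/dev/${bundle.name}
    run_as:
      service_principal_name: 00000000-0000-0000-0000-000000000000

  prod:
    # Mirrors the main branch; deployed by CI/CD as a service principal
    mode: production
    workspace:
      host: https://adb-2222222222222222.2.azuredatabricks.net
      root_path: /Shared/.bundle/prod/${bundle.name}
    run_as:
      service_principal_name: 11111111-1111-1111-1111-111111111111
```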
As shown above, you can define environment-specific variables and configuration inside the databricks.yml file, managing the settings for every deployment environment in one place. This ensures that your code behaves consistently across development, staging, and production environments whilst minimising code changes; for example, enable automated pipeline triggers in production whilst keeping development environments ad hoc.
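As a sketch of per-target variables (the variable name and catalog values here are assumptions for illustration):

```yaml
variables:
  catalog:
    description: Unity Catalog to read from and write to
    default: dev_catalog

targets:
  dev:
    variables:
      catalog: dev_catalog
  prod:
    variables:
      catalog: prod_catalog
```

Resources can then reference the value as `${var.catalog}`, so the same job definition points at the right catalog in each environment.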
Key Configurations to Include:

- `mode` (`development` or `production`) for each target
- Workspace host and root path per environment
- Variables for environment-specific values such as catalog or schema names
- Job schedules and triggers: automated in production, paused or ad hoc in development
Remember that any YAML defined inside the databricks.yml file overrides whatever is written in the underlying resource files, which enables us to override with environment-specific configuration.
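As an illustrative sketch, assuming a job called `daily_etl` is defined in the bundle's resource files, production could switch its schedule on like so:

```yaml
targets:
  prod:
    mode: production
    resources:
      jobs:
        daily_etl:
          # Production-only override: run the job every morning at 06:00 UTC
          schedule:
            quartz_cron_expression: "0 0 6 * * ?"
            timezone_id: "UTC"
```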
Find out more about parameterising asset bundles on my previous blog here.
Version control is crucial for Databricks Asset Bundles. Use Git (and GitFlow for efficient branching) to track changes, enable rollbacks, and facilitate collaboration. Two main strategies exist: keeping all bundles in a single repository, or giving each bundle its own repository.
The best choice depends on your project's structure: a single repo for tightly coupled workflows, separate repos for independent ones.
Arguably the biggest benefit of using DABs is the ability for developers to deploy asset bundles with one line. Use this feature to transform code reviews by enabling easy, isolated deployments: simply run `databricks bundle deploy` on the feature branch under review to deploy a complete, named instance of your changes. This allows reviewers to run the code, inspect the generated workflows, and validate behaviour in isolation.
Make this a standard part of your pull request process for better code quality and team alignment.
A common request I get working with clients new to DABs is:
“Can I make edits directly in deployed Databricks notebooks or workflows? This feels faster than going through a DAB deployment."
Often coupled with... "It's only a small change..."
It's tempting to quickly fix issues directly in deployed Databricks notebooks. However, this seemingly minor shortcut introduces significant risks: the next `databricks bundle deploy` will overwrite your edits, the workspace drifts away from what is in source control, and untested changes reach shared environments without review.
Always develop and test in your local environment or a version-controlled system before deploying via Databricks Asset Bundles. This ensures traceability, consistency, and maintainability.
Integrate Databricks Asset Bundles into your CI/CD pipeline to automate deployments. Tools like GitHub Actions, Jenkins, or Azure DevOps can be used to trigger deployments whenever changes are pushed to specific branches.
Sample CI/CD Workflow:
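A minimal sketch using GitHub Actions (branch names, secret names, and targets are assumptions; the same pattern applies in Jenkins or Azure DevOps):

```yaml
# .github/workflows/deploy-bundle.yml — a sketch, not a drop-in pipeline
name: deploy-bundle

on:
  push:
    branches: [develop, main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # Service principal credentials stored as repository secrets
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_SP_CLIENT_ID }}
      DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_SP_CLIENT_SECRET }}
    steps:
      - uses: actions/checkout@v4
      # Installs the Databricks CLI used by the bundle commands
      - uses: databricks/setup-cli@main
      - name: Validate bundle
        run: databricks bundle validate
      - name: Deploy to dev
        if: github.ref == 'refs/heads/develop'
        run: databricks bundle deploy -t dev
      - name: Deploy to prod
        if: github.ref == 'refs/heads/main'
        run: databricks bundle deploy -t prod
```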
Run the `databricks bundle deploy` command as part of the CI/CD. By integrating the DAB release into your existing CI/CD, we shorten the development cycle for each developer, while also ensuring at the project level that code is tested and that code-quality requirements are upheld throughout the Databricks workspace.
When multiple developers are working on the same Databricks project, conflicts and inconsistencies can arise. Avoid them by giving each developer their own isolated `mode: development` target, keeping every change in version control, and releasing to shared environments only through CI/CD.
Remember, the key to success lies in consistency, automation, and clear communication. Start implementing these best practices today and watch your Databricks projects thrive!
If you want to understand how to speed up your team's Databricks deployments, reach out to us!