Automated testing is one of the most impactful changes you can make to improve your engineering function. Faster delivery, increased team efficiency, and reduction of risk are just some of the benefits of including automated tests as part of your workflow.
In a mature testing environment, a collection of automated tests should run each time a developer opens a pull request to merge their feature branch. Together, these tests offer a fast, repeatable way of ensuring the code continues to work as expected. Teams can work at speed with confidence, making substantial changes to code with the security that frequently run tests will keep them on track.
One of the main challenges in adopting this approach has been the difficulty of establishing a framework that runs these tests as part of the code promotion process. Developers often need knowledge of a number of additional tools to set up a testing framework outside of Databricks. This is particularly true for analytical teams where, even though their pipelines are desperately in need of rigorous testing, the team may not have the skill set to establish it.
Databricks Asset Bundles (DABs) are a compelling recent addition to the Databricks offering. Asset bundles provide a low-friction way for teams to adopt software engineering best practices with a familiar set of tools.
Asset bundles allow all aspects of a Databricks project to be managed programmatically: all resources, code, and configuration for a project are defined in the bundle. Under the hood, asset bundles leverage the power of Terraform and the Databricks CLI. Critically, teams don't need to learn these tools and can instead use a combination of YAML files and their existing code to define their project. You can find more detail on Databricks Asset Bundles here: What are Databricks Asset Bundles.
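As a rough illustration, a bundle is defined by a databricks.yml file at the root of the project. The sketch below is a minimal example only; the bundle name, workspace URLs, and resources folder are placeholders rather than a definitive layout.

```yaml
# databricks.yml - minimal bundle definition
# (bundle name, hosts, and paths are illustrative placeholders)
bundle:
  name: my_pipeline

# Resource definitions (jobs, pipelines, etc.) live in separate YAML files
include:
  - resources/*.yml

# Each target describes an environment the bundle can be deployed to
targets:
  test:
    mode: development
    workspace:
      host: https://test-workspace.cloud.databricks.com
  production:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com
```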
Combining asset bundles with automated testing can supercharge your team's delivery and offers an accessible way for teams to adopt these best practices quickly.
Because asset bundles allow you to easily deploy and run your code in different locations, they offer a simple way to implement CI/CD. When a developer makes a pull request, the asset bundle is validated to ensure the configuration is correct:
databricks bundle validate
This offers a fast method to catch issues before running tests. If the bundle is valid then the project is deployed into our test workspace (or a test folder within the same workspace).
databricks bundle deploy -t test
In this fresh environment, the asset bundle can run unit tests and integration tests within Databricks.
databricks bundle run -t test integration_tests_job
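The integration_tests_job referenced above would itself be declared as a resource in the bundle. Below is a hedged sketch of what that definition might look like; the task key, notebook path, and cluster settings are assumptions for illustration, not a prescribed setup.

```yaml
# resources/integration_tests_job.yml - illustrative job definition
# (task key, notebook path, and cluster settings are assumptions)
resources:
  jobs:
    integration_tests_job:
      name: integration-tests
      tasks:
        - task_key: run_integration_tests
          notebook_task:
            # Notebook that executes the test suite inside the workspace
            notebook_path: ../tests/run_integration_tests.py
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 1
```

If the job run fails, the run command exits with an error, which is what allows a CI pipeline to stop before promoting anything to production.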
Running inside Databricks like this is much simpler than trying to emulate the Spark environment in a build agent, and it also leads to more realistic tests.
If all tests pass, the asset bundle can be deployed into our production workspace just as simply:
databricks bundle deploy -t production
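To show how these commands hang together, here is a sketch of a CI pipeline using GitHub Actions. The workflow name, secret names, triggers, and the CLI setup step are assumptions; your CI system and authentication approach may differ.

```yaml
# .github/workflows/bundle-ci.yml - illustrative CI pipeline
# (workflow name, secrets, and triggers are assumptions)
name: bundle-ci

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  validate-test-deploy:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
    steps:
      - uses: actions/checkout@v4

      # Install the Databricks CLI
      - uses: databricks/setup-cli@main

      # Fail fast if the bundle configuration is invalid
      - run: databricks bundle validate

      # Deploy to the test target and run the tests inside Databricks
      - run: databricks bundle deploy -t test
      - run: databricks bundle run -t test integration_tests_job

      # Promote to production only on a merge to main
      - run: databricks bundle deploy -t production
        if: github.event_name == 'push'
```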
Combining automated tests with the easy deployment of resources offered by asset bundles means that we can quickly and simply take a big step up the data maturity ladder.