Introduction
Any data platform must be designed with security in mind: both protecting from outside attacks and restricting who can see what data. The requirements may be complex and granular: requiring permissions down to the row- and column-level based on individual users.
The great news is that all of this can be handled within Databricks’ Unity Catalog; no matter whether the data is structured, semi-structured or unstructured. You don't have to worry about defining permissions on both the storage accounts and your Unity Catalog objects: allow fine-grained access control through Unity Catalog only and lock down those underlying storage accounts. Read on to learn more.
Unity Catalog TL;DR
Unity Catalog is Databricks' data governance solution. It is open source and unified across the full Databricks platform, and it is the default choice for all Databricks accounts. Unity Catalog’s security model is based on standard SQL, so permissions can be granted and denied using familiar syntax; or you can use the Catalog Explorer UI. Unity Catalog spans workspaces, with one metastore per account per region. For more information check out the documentation.
Implementing Fine-Grained Access Control
At a very high level, to implement a row filter or a column mask in a Unity Catalog registered table you must:
- Register a SQL User-Defined Function with Unity Catalog
- Assign this function to the table
- This can ONLY be done by the owner of the table
Both row filters and column masks are functions that evaluate at runtime and it is therefore important to consider the compute impact of your logic. We give some advice in the Requirements and Recommendations section.
In the sections below we will demonstrate the results with a "user" table of data generated by the Faker Python package. To learn more check out our blog all about the package! Here is an excerpt of the table; one of the rows within the table corresponds to my own email address.
Id | Name | DOB | Address | |
1 | Justin Ortiz | justinortiz@sandoval.com | 2006-06-11 | 989 Karenfort |
2 | Michael Mayer | michaelmayer@wagner.com | 1939-02-20 | 767 East Justin |
3 | Dawn Robbins | dawnrobbins@estrada.com | 1958-09-03 | 7787 James Court |
4 | Brandon Mueller | brandonmueller@maddox.com | 1965-05-04 | 416 Rebekahport |
5 | Dakota Wade | dakotawade@fisher.com | 1979-01-03 | 889 Stevenland |
6 | Lucy Budge | lucybudge@domain.com | 1981-06-23 | 6843 Glover Passage |
User-Defined Functions in Unity Catalog
Don’t worry, we will get to the fine-grained access control shortly, however we will first cover user-defined functions in Unity Catalog.
"User-Defined Function" (UDF) is a generic term across languages and platforms, and is self-explanatory. A SQL UDF is registered with Unity Catalog using the CREATE FUNCTION syntax. For example,
CREATE FUNCTION user_filter(email STRING)
RETURN IF (is_account_group_member('metastore_admins'), TRUE, email == current_user())
Which will return TRUE if the user is a member of the "metastore_admins" group or if the value of the email input matches their email address. There are some useful built-in functions that can be used within your UDF for determining what the current user is able to access:
-
current_user() returns the email address of the current user
-
is_account_group_member() will return TRUE if the user is a member of the specified account-level group.
Row Filters
A row filter defines which rows are returned when a user runs a SELECT query on a table. Any data which is filtered out will be completely invisible to the user; they will not be able to discover whether those rows exist or how many there are.
The basic goal of a UDF for a row function is to decide whether to return the row or not: it is a true/false outcome. Perhaps you want users to see their own data only, but also for a certain group of users to have access to the full data. Using the UDF from above, we can then our function to a table, either on creation:
CREATE TABLE user (
Id INT,
Name STRING,
Email STRING,
DOB DATE,
Address STRING
)
WITH ROW FILTER user_filter ON (Email)
or afterwards using ALTER TABLE:
ALTER TABLE user
SET ROW FILTER user_filter ON (Email)
It is also possible to set a row filter through the Catalog Explorer UI. We first navigate to our "user" table and select "Add filter":
Then we select the previously created function, and choose which column is passed in to the email parameter:
A table can have at most one row filter assigned. Once the filter has been applied, when I run SELECT on the user table I get the below results:
UserId | Name | DOB | Address | |
6 | Lucy Budge | lucybudge@domain.com | 1981-06-23 | 6843 Glover Passage |
I am not a member of the "metastore_admins" group, and so I see only the row corresponding to my email.
To remove a row filter run
ALTER TABLE user DROP ROW FILTER;
Or remove it through the Catalog Explorer UI.
Column Masks
A column mask hides sensitive data by controlling what value is shown to a user when running a query. A user will be able to see that the column exists in the table, but they will see the values as those returned by the mask, not necessarily the true values.
A column mask UDF must take the value of the column in each row as an input and return a value of the same type. You can use additional columns as input to the UDF to be used in the logic.
CREATE FUNCTION mask_pii_string(pii STRING, email STRING)
RETURN CASE WHEN is_member('metastore_admins') OR email == current_user()
THEN pii
ELSE 'REDACTED'
END;
This function can then be set as the column mask, again either on creation of the table:
CREATE TABLE user (
Id INT,
Name STRING,
Email STRING,
DOB DATE,
Address STRING MASK mask_pii_string USING COLUMNS (Email)
)
or using the ALTER command:
ALTER TABLE user
ALTER COLUMN Address
SET MASK mask_pii_string USING COLUMNS (Email)
Similarly to the row filter, a column mask can also be applied through Catalog Exporer; from the table object this time we choose "Add column mask" in the row for the relevant column:
We then choose the function and select the column to be passed to the email parameter:
Each column can have at most one column filter applied. Once the above mask has been applied (and no row filters exist on the table), when I run SELECT on the user table I get the below results:
UserId | Name | DOB | Address | |
1 | Justin Ortiz | justinortiz@sandoval.com | 2006-06-11 | REDACTED |
2 | Michael Mayer | michaelmayer@wagner.com | 1939-02-20 | REDACTED |
3 | Dawn Robbins | dawnrobbins@estrada.com | 1958-09-03 | REDACTED |
4 | Brandon Mueller | brandonmueller@maddox.com | 1965-05-04 | REDACTED |
5 | Dakota Wade | dakotawade@fisher.com | 1979-01-03 | REDACTED |
6 | Lucy Budge | lucybudge@domain.com | 1981-06-23 | 6843 Glover Passage |
To remove a column mask run
ALTER TABLE user
ALTER COLUMN Address
DROP MASK;
Or remove it through Catalog Explorer.
When Should I Use a Dynamic View?
A view is a read-only object that is the result of a query over one or more tables and views in Unity Catalog. Dynamic views can be used to apply the same data masking and/or row filtering logic to data. This adds an extra object to the catalog: all queries must point to this object rather than the underlying table(s), and any changes made to the table(s) must be propagated through to this view where required.
There are a few situations where a dynamic view is preferable to row filters and/or column masks:
-
If the table you need to add filters/masks to is read-only (or you are not the owner), and it is acceptable to create a new object with a new name;
-
If you want to filter the data when sharing with Delta Sharing;
-
You want to create an object that combines data from several tables;
-
Users don't or can't have read permissions on the underlying table(s).
Databricks Security Groups
This is a very brief overview of the types of groups within Databricks, as this may be useful terminology to know when considering how to implement fine-grained access control. Once groups have been set-up and users (or other groups) assigned, membership of the current user can be checked with is_account_group_member()
.
Account Groups: Use to grant access to data in a Unity Catalog metastore, roles on service principals and permissions for identity federated workspaces.
Workspace-Local Groups: These are legacy groups and are only used in the context of the workspace they were created in.
External Groups: Created by your identity provider using a SCIM provisioning connector and stay up to date.
System Groups: Created and maintained by Databricks. Each account has an account system group called account users
which contains all users. Each workspace has two workspace-level system groups: users
and admins
. System groups cannot be deleted.
There are several benefits of using external groups. They can be automatically synced with the identity provider, therefore requiring very little maintenance, and the same groups can be used in multiple places; for example to implement row level security in Power BI. The permissions have to be defined in both places, but the users and groups are only defined and managed in one.
Requirements and Recommendations
Compute Requirements
You must be using one of:
-
SQL Warehouse
-
Shared access mode on Databricks Runtime 12.2 or above
-
Single User access mode on Databricks Runtime 15.4 or above
If you are using Single User access mode the workspace must be enabled for serverless compute as the data filtering functionality is carried out on serverless compute. You will be charged for this compute.
If you attempt to access a table with a row filter or column masks using an unsupported runtime version it will fail securely; no data will be returned.
Performance Recommendations
-
Use simple functions: the query engine will always choose security over optimization to ensure no information is leaked. If possible, use a simple
CASE WHEN
statement over an expression subquery or mapping table. -
Ensure all function arguments are necessary and used.
-
Use deterministic expressions that cannot throw errors: for example, use
try_divide
to avoid "division by zero" errors that could reveal information about hidden data.
Conclusion
Row filters and column masks are powerful data security tools and can step up the security of your data platform. Using this level of fine-grained access control within Unity Catalog can simplify your security model by controlling all data access through UC.
Contact us to discuss how we can help you secure your data using Unity Catalog.
Topics Covered :

Author
Lucy Budge