Showing content from https://developers.google.com/bigquery/docs/column-data-masking-intro below:

Introduction to data masking | BigQuery


Note: This feature may not be available when using reservations that are created with certain BigQuery editions. For more information about which features are enabled in each edition, see Introduction to BigQuery editions.

BigQuery supports data masking at the column level. You can use data masking to selectively obscure column data for user groups, while still allowing them access to the column. Data masking functionality is built on top of column-level access control, so you should familiarize yourself with that feature before you proceed.

When you use data masking in combination with column-level access control, you can configure a range of access to column data, from full access to no access, based on the needs of different user groups. For example, for tax ID data, you might want to grant your accounting group full access, your analyst group masked access, and your sales group no access.

Benefits

Data masking provides the following benefits:

Data masking workflow

There are two ways of masking data. You can create a taxonomy and policy tags, then configure data policies on the policy tags. Alternatively, you can set a data policy directly on a column (Preview). This lets you map a data masking rule to your data without handling policy tags or creating additional taxonomies.

Set a data policy directly on a column

You can configure dynamic data masking directly on a column (Preview). To do so, perform the following steps:

  1. Create a data policy.

  2. Assign a data policy to a column.

Mask data with policy tags

Figure 1 shows the workflow for configuring data masking:

Figure 1. Data masking components.

You configure data masking with the following steps:

  1. Set up a taxonomy and one or more policy tags.
  2. Configure data policies for the policy tags. A data policy maps a data masking rule and one or more principals, which represent users or groups, to the policy tag.

    When creating a data policy by using the Google Cloud console, you create the data masking rule and specify the principals in one step. When creating a data policy by using the BigQuery Data Policy API, you create the data policy and data masking rule in one step, and specify the principals for the data policy in a second step.

  3. Assign the policy tags to columns in BigQuery tables to apply the data policies.

  4. Assign users who should have access to masked data to the BigQuery Masked Reader role. As a best practice, assign the BigQuery Masked Reader role at the data policy level. Assigning the role at the project level or higher grants users permissions to all data policies under the project, which can lead to issues caused by excess permissions.

    The policy tag that is associated with a data policy can also be used for column-level access control. In that case, the policy tag is also associated with one or more principals who are granted the Data Catalog Fine-Grained Reader role. This enables these principals to access the original, unmasked column data.

Figure 2 shows how column-level access control and data masking work together:

Figure 2. Data masking components.

For more information about role interaction, see How Masked Reader and Fine-Grained Reader roles interact. For more information about policy tag inheritance, see Roles and policy tag hierarchy.

Data masking rules

When you use data masking, a data masking rule is applied to a column at query runtime, based on the role of the user running the query. Masking takes precedence over any other operations involved in the query. The data masking rule determines the type of data masking applied to the column data.

You can use the following data masking rules:

Data masking rule hierarchy

You can configure up to nine data policies for a policy tag, each with a different data masking rule associated with it. One of these policies is reserved for column-level access control settings. This makes it possible for several data policies to be applied to a column in a user's query, based on the groups that the user is a member of. When this happens, BigQuery chooses which data masking rule to apply based on the following hierarchy:

  1. Custom masking routine
  2. Hash (SHA-256)
  3. Email mask
  4. Last four characters
  5. First four characters
  6. Date year mask
  7. Default masking value
  8. Nullify

For example, user A is a member of both the employees and the accounting groups. User A runs a query that includes the sales_total field, which has the confidential policy tag applied. The confidential policy tag has two data policies associated with it: one that has the employees group as the principal and applies the nullify data masking rule, and one that has the accounting group as the principal and applies the hash (SHA-256) data masking rule. In this case, the hash (SHA-256) data masking rule is prioritized over the nullify data masking rule, so the hash (SHA-256) rule is applied to the sales_total field value in user A's query.

Figure 3 shows this scenario:

Figure 3. Data masking rule prioritization.
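The prioritization described above can be modeled locally. The following is a minimal sketch, not BigQuery code: effective_rule and the (group, rule) pair layout are hypothetical names chosen for illustration, and only the priority order itself comes from the hierarchy list.

```python
# Priority order copied from the hierarchy list above (index 0 wins).
RULE_PRIORITY = [
    "custom masking routine",
    "hash (SHA-256)",
    "email mask",
    "last four characters",
    "first four characters",
    "date year mask",
    "default masking value",
    "nullify",
]


def effective_rule(user_groups, data_policies):
    """Return the masking rule that wins for a user, or None if no policy applies.

    data_policies is a list of (principal_group, rule) pairs attached to one
    policy tag. This layout is a hypothetical stand-in, not a BigQuery type.
    """
    applicable = [rule for group, rule in data_policies if group in user_groups]
    if not applicable:
        return None
    # The rule that appears earliest in RULE_PRIORITY takes precedence.
    return min(applicable, key=RULE_PRIORITY.index)


# The scenario from the text: user A belongs to both groups.
confidential_policies = [
    ("employees", "nullify"),
    ("accounting", "hash (SHA-256)"),
]
print(effective_rule({"employees", "accounting"}, confidential_policies))
```

For user A, the sketch selects the hash (SHA-256) rule, matching the outcome described in the example.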

Roles and permissions

Roles for managing taxonomies and policy tags

You need the Data Catalog Policy Tag Admin role to create and manage taxonomies and policy tags.

Role/ID: Data Catalog Policy Tag Admin/datacatalog.categoryAdmin

Permissions:

datacatalog.categories.getIamPolicy
datacatalog.categories.setIamPolicy
datacatalog.taxonomies.create
datacatalog.taxonomies.delete
datacatalog.taxonomies.get
datacatalog.taxonomies.getIamPolicy
datacatalog.taxonomies.list
datacatalog.taxonomies.setIamPolicy
datacatalog.taxonomies.update
resourcemanager.projects.get
resourcemanager.projects.list

Description: Applies at the project level.

This role grants the ability to do the following:

Roles for creating and managing data policies

You need one of the following BigQuery roles to create and manage data policies:

Role/ID:

BigQuery Data Policy Admin/bigquerydatapolicy.admin
BigQuery Admin/bigquery.admin
BigQuery Data Owner/bigquery.dataOwner

Permissions:

bigquery.dataPolicies.create
bigquery.dataPolicies.delete
bigquery.dataPolicies.get
bigquery.dataPolicies.getIamPolicy
bigquery.dataPolicies.list
bigquery.dataPolicies.setIamPolicy
bigquery.dataPolicies.update

Description: The bigquery.dataPolicies.create and bigquery.dataPolicies.list permissions apply at the project level. The other permissions apply at the data policy level.

This role grants the ability to do the following:

You also need the datacatalog.taxonomies.get permission, which you can get from several of the Data Catalog predefined roles.

Roles for attaching policy tags to columns

You need the datacatalog.taxonomies.get and bigquery.tables.setCategory permissions to attach policy tags to columns. datacatalog.taxonomies.get is included in the Data Catalog Policy Tag Admin and Viewer roles. bigquery.tables.setCategory is included in the BigQuery Admin (roles/bigquery.admin) and BigQuery Data Owner (roles/bigquery.dataOwner) roles.

Roles for querying masked data

You need the BigQuery Masked Reader role to query the data from a column that has data masking applied.

Role/ID: Masked Reader/bigquerydatapolicy.maskedReader

Permissions: bigquery.dataPolicies.maskedGet

Description: Applies at the data policy level. This role grants the ability to view the masked data of a column that is associated with a data policy.

Additionally, a user must have appropriate permissions to query the table. For more information, see Required permissions.

How Masked Reader and Fine-Grained Reader roles interact

Data masking builds on top of column-level access control. For a given column, it is possible to have some users with the BigQuery Masked Reader role that allows them to read masked data, some users with the Data Catalog Fine-Grained Reader role that allows them to read unmasked data, some users with both, and some users with neither. These roles interact as follows:

In the case where a table has columns that are secured or secured and masked, in order to run a SELECT * FROM statement on that table, a user must be a member of appropriate groups such that they are granted Masked Reader or Fine-Grained Reader roles on all of these columns.

A user who is not granted these roles must instead specify only columns that they have access to in the SELECT statement, or use SELECT * EXCEPT (restricted_columns) FROM to exclude the secured or masked columns.
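As a rough illustration of the SELECT * EXCEPT option, the helper below assembles such a statement from a list of columns. It is only a sketch: build_select, the column names, and the Accounts table reference are hypothetical, and the function builds a string rather than calling BigQuery.

```python
def build_select(table, all_columns, readable_columns):
    """Build a query that skips the columns the user cannot read.

    readable_columns holds unrestricted columns plus those on which the user
    has the Masked Reader or Fine-Grained Reader role.
    """
    restricted = [c for c in all_columns if c not in readable_columns]
    if not restricted:
        return f"SELECT * FROM {table}"
    return f"SELECT * EXCEPT ({', '.join(restricted)}) FROM {table}"


# Hypothetical schema: the user has no role on ssn or salary.
columns = ["id", "name", "ssn", "salary"]
print(build_select("myDataset.Accounts", columns, {"id", "name"}))
```

Listing the excluded columns explicitly keeps the query valid for users who hold neither role on the secured or masked columns.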

Authorization inheritance in a policy tag hierarchy

Roles are evaluated starting at the policy tag associated with a column, and then checked at each ascending level of the taxonomy, until the user either is determined to have appropriate permissions or the top of the policy tag hierarchy is reached.

For example, take the policy tag and data policy configuration shown in Figure 4:

Figure 4. Policy tag and data policy configuration.

You have a table column that is annotated with the Financial policy tag, and a user who is a member of both the ftes@example.com and analysts@example.com groups. When this user runs a query that includes the annotated column, their access is determined by the hierarchy defined in the policy tag taxonomy. Because the user is granted the Data Catalog Fine-Grained Reader role by the Financial policy tag, the query returns unmasked column data.

If another user who is only a member of the ftes@example.com group runs a query that includes the annotated column, the query returns column data that has been hashed using the SHA-256 algorithm, because the user is granted the BigQuery Masked Reader role by the Confidential policy tag, which is the parent of the Financial policy tag.

A user who is not a member of either of those groups gets an access denied error if they try to query the annotated column.

In contrast with the preceding scenario, take the policy tag and data policy configuration shown in Figure 5:

Figure 5. Policy tag and data policy configuration.

You have the same situation as shown in Figure 4, but the user is granted the Fine-Grained Reader role at a higher level of the policy tag hierarchy, and the Masked Reader role at a lower level of the policy tag hierarchy. Because of this, the query returns masked column data for this user. This happens even though the user is granted the Fine-Grained Reader role further up the tag hierarchy, because the service uses the first assigned role it encounters as it ascends the policy tag hierarchy to check for user access.
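The upward walk through the hierarchy can be sketched as follows. This is a local model under stated assumptions, not the service's implementation: resolve_access and the grants layout are hypothetical, the group-to-role mapping for Figure 4 is inferred from the text, the mapping for Figure 5 is assumed, and the tie-break for both roles on the same tag is an assumption.

```python
# Hypothetical taxonomy: child tag -> parent tag (None marks the top).
PARENT = {"Financial": "Confidential", "Confidential": None}


def resolve_access(tag, user_groups, grants):
    """Walk up from the column's tag and stop at the first role found.

    grants maps a tag to (group, role) pairs, where role is
    "fine_grained_reader" or "masked_reader".
    Returns "unmasked", "masked", or "denied".
    """
    while tag is not None:
        roles = {role for group, role in grants.get(tag, []) if group in user_groups}
        # Assumption: if both roles are found on the same tag, unmasked wins.
        if "fine_grained_reader" in roles:
            return "unmasked"
        if "masked_reader" in roles:
            return "masked"
        tag = PARENT[tag]
    return "denied"


# Figure 4 as described: Fine-Grained Reader on Financial, Masked Reader on its parent.
figure_4 = {
    "Financial": [("analysts@example.com", "fine_grained_reader")],
    "Confidential": [("ftes@example.com", "masked_reader")],
}
# Figure 5 (assumed mapping): the two roles trade places in the hierarchy.
figure_5 = {
    "Financial": [("analysts@example.com", "masked_reader")],
    "Confidential": [("ftes@example.com", "fine_grained_reader")],
}

both = {"ftes@example.com", "analysts@example.com"}
print(resolve_access("Financial", both, figure_4))
print(resolve_access("Financial", both, figure_5))
```

In this model the same user reads unmasked data under the Figure 4 layout and masked data under the Figure 5 layout, because the walk stops at the first tag that grants a role.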

If you want to create a single data policy and have it apply to several levels of a policy tag hierarchy, you can set the data policy on the policy tag that represents the topmost hierarchy level to which it should apply. For example, take a taxonomy with the following structure:

If you want a data policy to apply to all of these policy tags, set the data policy on policy tag 1. If you want a data policy to apply to policy tag 1b and its children, set the data policy on policy tag 1b.

Data masking with incompatible features

When you use BigQuery features that aren't compatible with data masking, the service treats the masked column as a secured column, and only grants access to users who have the Data Catalog Fine-Grained Reader role.

For example, take the policy tag and data policy configuration shown in Figure 6:

Figure 6. Policy tag and data policy configuration.

You have a table column that is annotated with the Financial policy tag, and a user who is a member of the analysts@example.com group. When this user tries to access the annotated column through one of the incompatible features, they get an access denied error. This is because they are granted the BigQuery Masked Reader role by the Financial policy tag, but in this case, they must have the Data Catalog Fine-Grained Reader role. Because the service has already determined an applicable role for the user, it does not continue to check farther up the policy tag hierarchy for additional permissions.
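The behavior for incompatible features can be modeled with a small variation on a hierarchy walk. The sketch below is illustrative only: the function name and data layout are hypothetical, and the Fine-Grained Reader grant on the parent tag is an assumption added to show that the walk does not reach it.

```python
def access_via_incompatible_feature(tag, user_groups, grants, parent):
    """Return "unmasked" or "denied" for a feature that cannot apply masking."""
    while tag is not None:
        roles = {role for group, role in grants.get(tag, []) if group in user_groups}
        if "fine_grained_reader" in roles:
            return "unmasked"
        if "masked_reader" in roles:
            # An applicable role was found, so the walk stops here. The
            # feature cannot return masked data, so access is denied.
            return "denied"
        tag = parent.get(tag)
    return "denied"


parent = {"Financial": "Confidential", "Confidential": None}
figure_6 = {
    "Financial": [("analysts@example.com", "masked_reader")],
    # Assumed for illustration: a grant higher up that is never consulted.
    "Confidential": [("analysts@example.com", "fine_grained_reader")],
}
print(access_via_incompatible_feature(
    "Financial", {"analysts@example.com"}, figure_6, parent))
```

The call returns "denied" even though a Fine-Grained Reader grant exists on the parent tag in this hypothetical layout.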

Data masking example with output

To see how tags, principals, and roles work together, consider this example.

At example.com, basic access is granted through the data-users@example.com group. All employees who need regular access to BigQuery data are members of this group, which is assigned all the necessary permissions to read from tables as well as the BigQuery Masked Reader role.

Employees are assigned to additional groups that provide access to secured or masked columns where that is required for their work. All members of these additional groups are also members of data-users@example.com. You can see how these groups are associated with appropriate roles in Figure 7:

Figure 7. Policy tags and data policies for example.com.

The policy tags are then associated with table columns, as shown in Figure 8:

Figure 8. Example.com policy tags associated with table columns.

Given the tags that are associated with the columns, running SELECT * FROM Accounts; leads to the following results for the different groups:

Cost considerations

Data masking might indirectly affect the number of bytes processed, and therefore affect the cost of the query. If a user queries a column that is masked for them with the Nullify or Default Masking Value rules, then that column isn't scanned at all, resulting in fewer bytes processed.
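A back-of-the-envelope model of this effect is shown below. It is a sketch with hypothetical column sizes and names, and it ignores every other factor that influences bytes processed.

```python
# Rules for which the text says the column is not scanned.
NOT_SCANNED_RULES = {"nullify", "default masking value"}


def bytes_processed(column_bytes, selected, masking_rule_for_user):
    """Sum the bytes of the selected columns, skipping columns that are not scanned."""
    total = 0
    for column in selected:
        if masking_rule_for_user.get(column) in NOT_SCANNED_RULES:
            continue
        total += column_bytes[column]
    return total


# Hypothetical per-column sizes in bytes.
column_bytes = {"id": 800, "email": 4000, "ssn": 1100}
rules = {"email": "hash (SHA-256)", "ssn": "nullify"}
print(bytes_processed(column_bytes, ["id", "email", "ssn"], rules))  # 4800
print(bytes_processed(column_bytes, ["id", "email", "ssn"], {}))     # 5900
```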

Restrictions and limitations

The following sections describe the categories of restrictions and limitations that data masking is subject to.

Data policy management

Policy tags

Set access control

After a taxonomy has a data policy associated with at least one of its policy tags, access control is automatically enforced. If you want to turn off access control, you must first delete all of the data policies associated with the taxonomy.

Materialized views and repeated record masking queries

If you have existing materialized views, repeated record masking queries on the associated base table fail. To resolve this issue, delete the materialized view. If the materialized view is needed for other reasons, you can create it in another dataset.

Query masked columns in partitioned tables

Queries that include data masking on the partitioned or clustered columns are not supported.

SQL dialects

Legacy SQL is not supported.

Custom masking routines

Custom masking routines are subject to the following limitations:

Compatibility with other BigQuery features

BigQuery API

Not compatible with the tabledata.list method. To call tabledata.list, you need full access to all of the columns returned by this method. The Data Catalog Fine-Grained Reader role grants appropriate access.

BigLake tables

Compatible. Data masking policies are enforced on BigLake tables.

BigQuery Storage Read API

Compatible. Data masking policies are enforced in the BigQuery Storage Read API.

BigQuery BI Engine

Compatible. Data masking policies are enforced in BI Engine. Queries that have data masking in effect are not accelerated by BI Engine. Use of such queries in Looker Studio might cause related reports or dashboards to become slower and more expensive.

BigQuery Omni

Compatible. Data masking policies are enforced on BigQuery Omni tables.

Collation

Partially compatible. You can apply dynamic data masking (DDM) to collated columns, but masking is applied before collation. This order of operations can lead to unexpected results, as collation might not affect the masked values as intended (for example, case-insensitive matching might not work after masking). Workarounds are possible, such as using custom masking routines that normalize data before applying the masking function.
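The effect of masking before collation, and the normalization workaround, can be demonstrated locally. The sketch below uses Python's hashlib as a stand-in for a hash-based masking rule; hash_mask and normalized_hash_mask are hypothetical names, and the hex output format is an assumption that makes no claim about how BigQuery encodes masked values.

```python
import hashlib


def hash_mask(value):
    """Stand-in for a hash masking rule: SHA-256 of the UTF-8 bytes, as hex."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def normalized_hash_mask(value):
    """Stand-in for a custom routine that normalizes case before hashing."""
    return hash_mask(value.casefold())


a, b = "Jane.Doe@example.com", "jane.doe@example.com"
# Values that a case-insensitive collation would treat as equal hash differently.
print(hash_mask(a) == hash_mask(b))                        # False
# Normalizing first makes the masked values comparable again.
print(normalized_hash_mask(a) == normalized_hash_mask(b))  # True
```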

Copy jobs

Not compatible. To copy a table from the source to the destination, you need to have full access to all of the columns on the source table. The Data Catalog Fine-Grained Reader role grants appropriate access.

Data export

Compatible. If you have the BigQuery Masked Reader role, then the exported data is masked. If you have the Data Catalog Fine-Grained Reader role, then the exported data is not masked.

Row-level security

Compatible. Data masking is applied on top of row-level security. For example, if there is a row access policy applied on location = "US" and location is masked, then users are able to see rows where location = "US" but the location field is masked.
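The ordering in this example can be sketched in a few lines: the row filter is evaluated against the original values, and masking is applied to what is returned. This is a local model with hypothetical names and data, using None as a stand-in for the Nullify rule.

```python
def query_rows(rows, row_filter, masked_columns, mask):
    """Filter on the original values first, then mask columns in the output."""
    visible = [row for row in rows if row_filter(row)]
    return [
        {col: (mask(val) if col in masked_columns else val) for col, val in row.items()}
        for row in visible
    ]


rows = [
    {"id": 1, "location": "US"},
    {"id": 2, "location": "DE"},
    {"id": 3, "location": "US"},
]
result = query_rows(
    rows,
    row_filter=lambda row: row["location"] == "US",  # row access policy
    masked_columns={"location"},
    mask=lambda value: None,  # stand-in for the Nullify rule
)
print(result)
```

The result contains only rows 1 and 3, and the location value in both is masked.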

Search in BigQuery

Partially compatible. You can call the SEARCH function on indexed or unindexed columns that have data masking applied.

When you call the SEARCH function on columns that have data masking applied, you must use search criteria compatible with your level of access. For example, if you have Masked Reader access with a Hash (SHA-256) data masking rule, you would use the hash value in your SEARCH clause, similar to the following:

SELECT * FROM myDataset.Customers WHERE SEARCH(Email, "sg172y34shw94fujaweu");

If you have Fine-Grained Reader access, you would use the actual column value in your SEARCH clause, similar to the following:

SELECT * FROM myDataset.Customers WHERE SEARCH(Email, "jane.doe@example.com");

Searching is less likely to be useful if you have Masked Reader access to a column where the data masking rule used is Nullify or Default Masking Value. This is because the masked results you would use as search criteria, such as NULL or "", aren't sufficiently unique to be useful.

When searching on an indexed column that has data masking applied, the search index is only used if you have Fine-Grained Reader access to the column.

Snapshots

Not compatible. To create a snapshot of a table, you need full access to all of the columns on the source table. The Data Catalog Fine-Grained Reader role grants appropriate access.

Table renaming

Compatible. Table renaming is not affected by data masking.

Time travel

Compatible with both time decorators and the FOR SYSTEM_TIME AS OF option in SELECT statements. The policy tags for the current dataset schema are applied to the retrieved data.

Query caching

Partially compatible. BigQuery caches query results for approximately 24 hours, although the cache is invalidated if changes are made to the table data or schema before that. In the following circumstance, it is possible that a user who does not have the Data Catalog Fine-Grained Reader role granted on a column can still see the column data when they run a query:

  1. A user has been granted the Data Catalog Fine-Grained Reader role on a column.
  2. The user runs a query that includes the restricted column and the data is cached.
  3. Within 24 hours of Step 2, the user is granted the BigQuery Masked Reader role, and has the Data Catalog Fine-Grained Reader role revoked.
  4. Within 24 hours of Step 2, the user runs that same query, and the cached data is returned.

Wildcard table queries

Not compatible. You need full access to all of the referenced columns on all of the tables matching the wildcard query. The Data Catalog Fine-Grained Reader role grants appropriate access.
