GCP BigQuery Dynamic Data Masking

BigQuery Dynamic Data Masking

GCP BigQuery offers dynamic data masking at column level. By using data masking we obscure columns that holds sensitive data for users, groups and service accounts, while still allowing access to data in those columns.

What is Data Masking?

Data masking is a method of modifying sensitive data while presenting it to an end user or group in such a way that it is of no value to unauthorized user while still being usable by authorized users and programs.

How to Use BigQuery Dynamic Data Masking to obscure Table data

To mask sensitive data in a BigQuery table we need to follow these steps.

  • Create a data catalog taxonomy with at least one policy tag.
  • Create a data policy for the policy tag to map the masking rule with one or more principles.
  • Attach policy tags to sensitive columns in a BigQuery table.

Dynamic data masking workflow

Create a Data Catalog Taxonomy For Masking Table Data

Enable Data Catalog API, if it is not already enabled and navigate to Data Catalog console to create a Taxonomy. Click on “Create Taxonomy” button. It will open a wizard to enter details of Taxonomy.

taxonomy creation

Give a name to Taxonony and select location and fill details for policy tags. In this Example we are creating a Taxonomy with name “demo-taxonamy” in “US” location with “ID”, “Email” and “Name” policy tags. Finally click on “Create” button.

policy tag creation

In the below screen shot you can see details and policy tags of the taxonomy that we created.

policy tag details

Create Data Policies with Dynamic Data Masking Rules

Now let’s create Data Policies for Policy tags. Once you select a policy tag “Manage Data Policies” button gets active and click on that button to masking rules and principles.

data policy creation

Click on “ADD RULE” button.

data policy with masking rule

Give a name to data policy and select one of the 3 masking rules. we are using “Hash(SHA256)” rule for Email policy tag. Finally add a principle(user or group) for this rule. Repeat same process for ID policy tag and using “Default_Value” masking rule.

Once data policies are created policy tags, they look like below.

policy tags with data policies

Now let’s create a BigQuery table to apply these Policy tags and data policies. Crete a dataset called “masking_demo” and run below SQL query to create a table called sample table with ID, Name and Email columns.

CREATE TABLE masking_demo.sample_table(
  ID INTEGER,
  Name STRING,
  Email STRING) as
SELECT * FROM (
SELECT 4321 AS ID, 'Sam' AS Name,'sam@devopscounsel.com' AS Email
UNION ALL 
SELECT 9876 AS ID, 'Tom' AS Name,'tom@devopscounsel.com' AS Email
)

Attach Policy tags to BigQuery Table Columns

Following are the details of the table.

BigQuery table schema

To apply policy tags, we need to click on “EDIT SCHEMA” option. Then select a column -> “Add Policy Tag” -> Select a tag from demo-taxonomy -> click on “Select” -> Click on “SAVE”.

applying policy tags to Bigquery columns for masking data

Repeat same process for ID column, once policy tags are attached table schema looks like below.

policy tags on table columns

Now if we query the table we will see masked results.

BigQuery table output when Dynamic data masking is implemented

You can use below terraform code to achieve what we did above from console.

Note: At the time of writing this article, data policy creation resource block is not available in terraform google provider. after applying below terraform code manually data policies from Dataplex/DataCatalog console, like we did in above steps.

resource "google_data_catalog_taxonomy" "taxonomy" {
  provider               = google-beta
  project                = "devops-counsel-demo"
  region                 = "us"
  display_name           = "demo_tf_taxonomy"
  activated_policy_types = ["FINE_GRAINED_ACCESS_CONTROL"]
}

resource "google_data_catalog_policy_tag" "email_policy" {
  provider     = google-beta
  display_name = "email"
  taxonomy     = google_data_catalog_taxonomy.taxonomy.id
  description  = "A policy tag category used for high security access"
}

resource "google_data_catalog_policy_tag" "id_policy" {
  provider     = google-beta
  display_name = "id"
  taxonomy     = google_data_catalog_taxonomy.taxonomy.id
}

resource "google_data_catalog_policy_tag" "name_policy" {
  provider     = google-beta
  display_name = "name"
  taxonomy     = google_data_catalog_taxonomy.taxonomy.id
}

resource "google_bigquery_dataset" "dataset" {
  dataset_id = "demo_ddm"
  project    = "devops-counsel-demo"
}

resource "google_bigquery_table" "table" {
  dataset_id          = google_bigquery_dataset.dataset.dataset_id
  table_id            = "ddm_demo_table"
  deletion_protection = false
  schema              = <<EOF
[
  {
    "name": "ID",
    "type": "INTEGER",
    "policyTags": {
      "names": [
        "${google_data_catalog_policy_tag.id_policy.name}"
      ]
    }
  },
  {
    "name": "Name",
    "type": "STRING"
  },
  {
    "name": "Email",
    "type": "STRING",
    "policyTags": {
      "names": [
        "${google_data_catalog_policy_tag.email_policy.name}"
      ]
    }
  }
]
EOF
}
resource "random_id" "random" {
  byte_length = 8
}
resource "google_bigquery_job" "job" {
  job_id = random_id.random.id
  query {
    query = "INSERT  ${google_bigquery_table.table.table_id} (ID INTEGER, Name STRING, Email STRING) as SELECT * FROM (SELECT 4321 AS ID, 'Sam' AS Name,'sam@devopscounsel.com' AS Email UNION ALL SELECT 9876 AS ID, 'Tom' AS Name,'tom@devopscounsel.com' AS Email)"
  }
}

Log out of Terraform apply

random_id.random: Creating...
random_id.random: Creation complete after 0s [id=_W8wwEe8NdM]
google_data_catalog_taxonomy.taxonomy: Creating...
google_bigquery_dataset.dataset: Creating...
google_bigquery_dataset.dataset: Creation complete after 1s [id=projects/devops-counsel-demo/datasets/demo_ddm]
google_data_catalog_taxonomy.taxonomy: Creation complete after 2s [id=projects/devops-counsel-demo/locations/us/taxonomies/5631638234245625642]
google_data_catalog_policy_tag.name_policy: Creating...
google_data_catalog_policy_tag.email_policy: Creating...
google_data_catalog_policy_tag.id_policy: Creating...
google_data_catalog_policy_tag.name_policy: Creation complete after 1s [id=projects/devops-counsel-demo/locations/us/taxonomies/5631638234245625642/policyTags/5903016807678663757]
google_data_catalog_policy_tag.id_policy: Creation complete after 1s [id=projects/devops-counsel-demo/locations/us/taxonomies/5631638234245625642/policyTags/9015328222810608432]
google_data_catalog_policy_tag.email_policy: Creation complete after 1s [id=projects/devops-counsel-demo/locations/us/taxonomies/5631638234245625642/policyTags/6722565208645079226]
google_bigquery_table.table: Creating...
google_bigquery_table.table: Creation complete after 1s [id=projects/devops-counsel-demo/datasets/demo_ddm/tables/ddm_demo_table]
google_bigquery_job.job: Creating...
google_bigquery_job.job: Creation complete after 0s [id=projects/devops-counsel-demo/jobs/_W8wwEe8NdM]

Apply complete! Resources: 8 added, 0 changed, 0 destroyed.

When we try to query the table that we created with above Terraform code, we will get access denied error because of no data policies attached. To fix the error and get masked results, manually created data policies with masking rules for tag polices.

Conclusion

In this quick start demo, we created DataCatalog taxonomy with data polies and masking rules and applied them on BigQuery table columns to mask sensitive data. We also used terraform to deploy them.Read BigQuery official documentation for more details.

For more on BigQuery:

Search Indexes in BigQuery

Leave a Reply

%d