What is Pseudonymization?
Pseudonymization is a data de-identification procedure that replaces personally identifiable information (PII) with artificial values.
Pseudonymization makes a data record less identifiable while still allowing data analysis. Unlike anonymization, pseudonymization allows the data to be re-identified with the help of additional information. With pseudonymization, we can mask BigQuery table data when sharing it with other users by creating BigQuery views.
Pseudonymization of BigQuery Table Data
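We can produce pseudonymized values for PII with the help of the SHA256 and TO_HEX BigQuery functions. SHA256 returns the hash as BYTES, and TO_HEX converts those bytes into a readable hexadecimal string.
The following SQL query generates pseudonymized data for the “en_name” column: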
SELECT en_name, TO_HEX(SHA256(en_name)) as pseudonymised_en_name FROM `bigquery-public-data.google_ads.geotargets`
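But this query generates the same pseudonymized data every time, because SHA256 always returns the same hash for the same input.
By mixing the current date into the hashed value with BigQuery date functions, we can generate new pseudonymized data every day, which makes it harder to link records shared on different days.
The following SQL query generates new pseudonymized data whenever the calendar date changes (CURRENT_DATE() uses UTC unless a time zone is given):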
SELECT en_name, TO_HEX(SHA256(CONCAT(en_name, FORMAT_DATE('%Y%m%d', CURRENT_DATE())))) as pseudonymised_en_name FROM `bigquery-public-data.google_ads.geotargets`
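Because the date is the only input that changes, pseudonyms from an earlier day can be reproduced by anyone who can read the source table and knows that date. This is the "additional information" that makes re-identification possible. A minimal sketch, assuming a hypothetical variable name (pseudonym_date) and an arbitrary example date:

```sql
-- Hypothetical example date; replace it with the day the pseudonyms were generated.
DECLARE pseudonym_date DATE DEFAULT DATE '2024-01-15';

-- Rebuild the mapping from original value to that day's pseudonym.
SELECT
  en_name,
  TO_HEX(SHA256(CONCAT(en_name, FORMAT_DATE('%Y%m%d', pseudonym_date)))) AS pseudonymised_en_name
FROM
  `bigquery-public-data.google_ads.geotargets`;
```

Joining this result to the shared data on pseudonymised_en_name would recover the original names, so access to the source table should stay restricted.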
By creating views that use the above BigQuery functions and leave out the original column, we can share pseudonymized PII with other users for data analysis.
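Such a view might look like the sketch below. The project and dataset names (my_project, my_dataset) and the view name are hypothetical placeholders, and the view exposes only the hashed column so the raw en_name value is not visible to users of the view.

```sql
-- Hypothetical destination: replace my_project.my_dataset with your own project and dataset.
CREATE OR REPLACE VIEW `my_project.my_dataset.geotargets_pseudonymized` AS
SELECT
  -- Only the hash is exposed; the raw en_name column is deliberately left out.
  TO_HEX(SHA256(en_name)) AS pseudonymised_en_name
FROM
  `bigquery-public-data.google_ads.geotargets`;
```

Other non-sensitive columns can be added to the SELECT list as needed, for example with SELECT * EXCEPT (en_name).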
In this quick-start demo, we used BigQuery hash functions to mask data while querying a table. For more information about hash functions, read the official BigQuery documentation.