Automation of BigQuery Table Snapshot Creation

A BigQuery table snapshot preserves the contents of a table at a particular time. You can create a snapshot of a current table, or create a snapshot of a table as it was at any time in the past seven days. A table snapshot can have an expiration. When the configured amount of time has passed since the table snapshot was created, BigQuery deletes the table snapshot. You can restore a standard table from a table snapshot in the event of a base table data corruption.

You can take a table snapshot from BigQuery console or by running bq commands or by running BigQuery SQL statements. if you are maintaining large number of projects/datasets/tables you can’t take snapshots manually. you need a script to automate this process. following script will help you to automate this job.

Bash Script to Create Table Snapshots

#!/bin/bash

dataset_snapshot() {
        check_snapshot_dataset $1
        bq ls --project_id $1 $2|awk '{print $1}'|tail -n +3 |while read table;
        do
               echo "Creating snapshot of ${1}.${2}.${table} table with 30day expiration"
               bq cp --snapshot --no_clobber --project_id $1 --expiration=3888000 ${2}.${table} backup.${2}_${table}_`date +%d-%m-%Y-%H%M`
        done
}

project_snapshot() {
        check_snapshot_dataset $1
        bq ls --project_id $1 | awk '{print $1}'|tail -n +3 |grep -v ^backup|while read dataset
        do
                dataset_snapshot $1 $dataset
        done
}


check_snapshot_dataset() {
        STATUS=`bq ls --project_id $1 |grep -w backup`
        if [ $? -eq 0 ]
        then
                :
        else
                echo "Backup dataset does not exists, Creating it"
                bq mk --project_id $1 backup
        fi
}

help() {
        BLUE='\033[0;34m'
        NC='\033[0m'
        clear
        echo -e "\nScript creates bigquery table snapshots for all the datasets in a project or for a single dataset\n"
        echo -e "Before running the run gcloud auth login/activate-service-account \n"
        echo -e "This script takes table snapshots in "backup" dataset in each project or it will create if it does not exists\n"
        echo -e "To take a snapshot of a dataset\n"
        echo -e "${BLUE}Usage: bigquery-snapshot.sh <project-id> <dataset-id>\n${NC}"
        echo -e "To take a snapshot of all datasets in a project\n"
        echo -e "${BLUE}Usage: bigquery-snapshot.sh <project-id>\n${NC}"

}


if [ $# -eq 1 ]
then
        project_snapshot "$1"
elif [ $# == 2 ]
then
        dataset_snapshot "$1" "$2"
else
        help
fi

Save the above script into a file called “bigquery-snapshot.sh” and run like below.

First I have authenticated with my service account key file and then executed the script to backup all the tables in devops-counsel-demo project.

here you can see snapshots for each table under backup dataset.

You can run this script daily or weekly from crontab or as a Airflow DAG to create snapshots of BigQuery tables for a given project.

For more on BigQuery:

Pseudonymization of BigQuery Table Data

Leave a ReplyCancel reply

Exit mobile version
%%footer%%