Automation of BigQuery Table Snapshot Creation

A BigQuery table snapshot preserves the contents of a table at a particular time. You can create a snapshot of a current table, or create a snapshot of a table as it was at any time in the past seven days. A table snapshot can have an expiration. When the configured amount of time has passed since the table snapshot was created, BigQuery deletes the table snapshot. You can restore a standard table from a table snapshot in the event of a base table data corruption.

You can take a table snapshot from BigQuery console or by running bq commands or by running BigQuery SQL statements. if you are maintaining large number of projects/datasets/tables you can’t take snapshots manually. you need a script to automate this process. following script will help you to automate this job.

Bash Script to Create Table Snapshots

#!/bin/bash

dataset_snapshot() {
        check_snapshot_dataset $1
        bq ls --project_id $1 $2|awk '{print $1}'|tail -n +3 |while read table;
        do
               echo "Creating snapshot of ${1}.${2}.${table} table with 30day expiration"
               bq cp --snapshot --no_clobber --project_id $1 --expiration=3888000 ${2}.${table} backup.${2}_${table}_`date +%d-%m-%Y-%H%M`
        done
}

project_snapshot() {
        check_snapshot_dataset $1
        bq ls --project_id $1 | awk '{print $1}'|tail -n +3 |grep -v ^backup|while read dataset
        do
                dataset_snapshot $1 $dataset
        done
}


check_snapshot_dataset() {
        STATUS=`bq ls --project_id $1 |grep -w backup`
        if [ $? -eq 0 ]
        then
                :
        else
                echo "Backup dataset does not exists, Creating it"
                bq mk --project_id $1 backup
        fi
}

help() {
        BLUE='\033[0;34m'
        NC='\033[0m'
        clear
        echo -e "\nScript creates bigquery table snapshots for all the datasets in a project or for a single dataset\n"
        echo -e "Before running the run gcloud auth login/activate-service-account \n"
        echo -e "This script takes table snapshots in "backup" dataset in each project or it will create if it does not exists\n"
        echo -e "To take a snapshot of a dataset\n"
        echo -e "${BLUE}Usage: bigquery-snapshot.sh <project-id> <dataset-id>\n${NC}"
        echo -e "To take a snapshot of all datasets in a project\n"
        echo -e "${BLUE}Usage: bigquery-snapshot.sh <project-id>\n${NC}"

}


if [ $# -eq 1 ]
then
        project_snapshot "$1"
elif [ $# == 2 ]
then
        dataset_snapshot "$1" "$2"
else
        help
fi

Save the above script into a file called “bigquery-snapshot.sh” and run like below.

First I have authenticated with my service account key file and then executed the script to backup all the tables in devops-counsel-demo project.

here you can see snapshots for each table under backup dataset.

Automation of BigQuery Table Snapshot Creation

You can run this script daily or weekly from crontab or as a Airflow DAG to create snapshots of BigQuery tables for a given project.

For more on BigQuery:

Pseudonymization of BigQuery Table Data

Leave a Reply

%d