GCP BigQuery Python Code Samples

In this quick demo of handling BigQuery resources with the Python SDK, we are going to create datasets, tables, and snapshots, and load data into tables.

Importing the Python BigQuery Module and Initializing a Client

from google.cloud import bigquery

bq_client = bigquery.Client(project='devops-counsel-demo')

Create a BigQuery Dataset

Create a BigQuery dataset called “demo_dataset” in the EU location with a two-day default table expiry.

from google.cloud import bigquery

bq_client = bigquery.Client(project='devops-counsel-demo')

dataset = bigquery.Dataset("devops-counsel-demo.demo_dataset")
dataset.location = 'EU'
dataset.default_table_expiration_ms = 172800000  # 2 days, in milliseconds

bq_client.create_dataset(dataset)

Note: if you don’t set a location, the dataset is created in the default location, US. By default, tables never expire.
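The default_table_expiration_ms value is expressed in milliseconds, so a quick sketch of how the two-day figure above works out:

```python
# default_table_expiration_ms is milliseconds:
# days * hours * minutes * seconds * milliseconds
two_days_ms = 2 * 24 * 60 * 60 * 1000
print(two_days_ms)  # 172800000
```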

List BigQuery Datasets

from google.cloud import bigquery

bq_client = bigquery.Client(project='devops-counsel-demo')

dataset_list = bq_client.list_datasets()

for dataset in dataset_list:
    print(dataset.dataset_id)

Get Dataset

from google.cloud import bigquery

bq_client = bigquery.Client(project='devops-counsel-demo')

dataset = bq_client.get_dataset('devops-counsel-demo.demo_dataset')

print("dataset id is '{}'".format(dataset.dataset_id))
print("dataset location is '{}'".format(dataset.location))

Update BigQuery Dataset

from google.cloud import bigquery

bq_client = bigquery.Client(project='devops-counsel-demo')

dataset = bq_client.get_dataset('devops-counsel-demo.demo_dataset')

dataset.description = "production dataset, don't delete"

bq_client.update_dataset(dataset, ["description"])

updated_dataset = bq_client.get_dataset('devops-counsel-demo.demo_dataset')

print("updated description of demo_dataset is: " + updated_dataset.description)

Creating a BigQuery Table

We are going to create a table called “demo_table” under “demo_dataset”.

from google.cloud import bigquery

bq_client = bigquery.Client(project='devops-counsel-demo')

schema = [
    bigquery.SchemaField("fruit_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("count", "INTEGER", mode="REQUIRED"),
]
table = bigquery.Table('devops-counsel-demo.demo_dataset.demo_table', schema=schema)

bq_client.create_table(table)

List BigQuery Tables

from google.cloud import bigquery

bq_client = bigquery.Client(project='devops-counsel-demo')

tables_list = bq_client.list_tables('devops-counsel-demo.demo_dataset')

for table in tables_list:
    print("Table ID is: " + table.table_id)

Get Table

from google.cloud import bigquery

bq_client = bigquery.Client(project='devops-counsel-demo')

table = bq_client.get_table('demo_dataset.demo_table')

print("Table schema is: " + str(table.schema))

Load data into table from a CSV file

We are using an example CSV file to load data into a table. Create a file called fruits.csv and add the lines below to it.

fruit_name,count
Apple,10
Grape,20
Banana,50
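As an optional local sanity check before loading, you can parse the file with Python’s standard csv module. This sketch also writes fruits.csv so it runs standalone; skip the first step if you created the file by hand:

```python
import csv

# create the example file from the lines above (skip if it already exists)
with open('fruits.csv', 'w', newline='') as f:
    f.write("fruit_name,count\nApple,10\nGrape,20\nBanana,50\n")

# parse it back to check the header and row count before loading
with open('fruits.csv', newline='') as f:
    rows = list(csv.DictReader(f))

print(len(rows))              # 3 data rows
print(rows[0]['fruit_name'])  # Apple
```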

Run the following program to load the above CSV file into the “demo_table” table we created in the earlier steps.

from google.cloud import bigquery

bq_client = bigquery.Client(project='devops-counsel-demo')

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    write_disposition='WRITE_TRUNCATE',
)
with open('fruits.csv', "rb") as source_file:
    job = bq_client.load_table_from_file(source_file, 'demo_dataset.demo_table', job_config=job_config)

job.result()

table = bq_client.get_table('demo_dataset.demo_table')
print("Loaded number of rows: " + str(table.num_rows))

Query a Table

from google.cloud import bigquery

bq_client = bigquery.Client(project='devops-counsel-demo')

query_job = bq_client.query('SELECT * FROM demo_dataset.demo_table')

print("The query data:")
for row in query_job:
    # row values can also be accessed by field name, e.g. row["fruit_name"]
    print(row)

Create a Table Snapshot

In this example we are creating a snapshot with a 30-day expiration.

from google.cloud import bigquery
import datetime

bq_client = bigquery.Client(project='devops-counsel-demo')

# snapshot expires 30 days from today
expiry_date = datetime.date.today() + datetime.timedelta(days=30)
snap_query = f"""
    CREATE SNAPSHOT TABLE IF NOT EXISTS demo_dataset.demo_snapshot
    CLONE demo_dataset.demo_table
    OPTIONS(expiration_timestamp = TIMESTAMP "{expiry_date} 00:00:00.00-00:00")
    """
snap_job = bq_client.query(snap_query)
snap_job.result()  # wait for the snapshot job to finish
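For reference, here is a small standalone sketch of the timestamp literal the query embeds (a fixed date is assumed here purely for illustration):

```python
import datetime

# assume today is 2024-01-01 for a reproducible example
expiry_date = datetime.date(2024, 1, 1) + datetime.timedelta(days=30)

# the literal interpolated into the OPTIONS clause
literal = f'TIMESTAMP "{expiry_date} 00:00:00.00-00:00"'
print(literal)  # TIMESTAMP "2024-01-31 00:00:00.00-00:00"
```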

Conclusion

With the help of the above code samples you can play around with BigQuery datasets, tables, and table snapshots. You can find more information about BigQuery here.

More topics on BigQuery:

Search Indexes in BigQuery

Automation of BigQuery Table Snapshot Creation
